A School for all Seasons on Trustworthy Machine Learning

List curated by Reza Shokri (National University of Singapore) and Nicolas Papernot (University of Toronto and Vector Institute)

Machine learning algorithms are trained on potentially sensitive data, and are increasingly being used in critical decision making processes. Can we trust machine learning frameworks to have access to personal data? Can we trust the models not to reveal personal information or sensitive decision rules? In the settings where training data is noisy or adversarially crafted, can we trust the algorithms to learn robust decision rules? Can we trust them to make correct predictions on adversarial or noisy data? Bias affecting some groups in the population underlying a dataset can arise from both a lack of representation in data but also poor choices of learning algorithms. Can we build trustworthy algorithms that remove disparities and provide fair predictions for all groups? To identify various issues with machine learning algorithms and establish trust, can we provide informative interpretation of machine learning decisions? These are the major questions that the emerging research field of trustworthy machine learning aims to respond.

We have selected different sub-topics and key related research papers (as starting points) to help a student learn about this research area. There are so many good papers which are being published in this domain. This list is by no means comprehensive. Papers are selected here with the intention of maximizing coverage of the techniques introduced in the literature in as few papers as possible. Students are encouraged to dive deeper by reading the follow-up research papers.

Privacy and Confidentiality

Data Inference Attacks

Background
- Cynthia Dwork, Adam Smith, Thomas Steinke, and Jonathan Ullman. “Exposed! a survey of attacks on private data.” In Annual Review of Statistics and Its Application, 2017. [paper]
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. “Membership inference attacks against machine learning models.” In IEEE Symposium on Security and Privacy (SP), 2017. [paper] [conference talk] [citations]
Milad Nasr, Reza Shokri, and Amir Houmansadr. “Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning.” In IEEE Symposium on Security and Privacy (SP), 2019. [paper] [conference talk] [citations]
Congzheng Song, and Ananth Raghunathan. “Information Leakage in Embedding Models.” In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2020. [paper]
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts et al. “Extracting training data from large language models.” In Proceedings of USENIX Security, 2021. [paper]

Memorization

Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. “The secret sharer: Evaluating and testing unintended memorization in neural networks.” In USENIX Security Symposium, 2019. [paper] [conference talk] [citations]
Vitaly Feldman. “Does learning require memorization? a short tale about a long tail.” In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, 2020. [paper] [talk] [follow-up paper] [citations]

Model Inference Attacks

Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. “Stealing machine learning models via prediction APIs.” In USENIX Security Symposium, 2016. [paper] [conference talk] [citations]
Lejla Batina, Shivam Bhasin, Dirmanto Jap, and Stjepan Picek. “CSI NN: Reverse Engineering of Neural Network Architectures Through Electromagnetic Side Channel.” In USENIX Security Symposium, 2019. [paper] [conference talk]
Varun Chandrasekaran, Kamalika Chaudhuri, Irene Giacomelli, Somesh Jha, and Songbai Yan. “Exploring Connections Between Active Learning and Model Extraction.” In USENIX Security Symposium, 2020. [paper]
Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, and Nicolas Papernot. “High Accuracy and High Fidelity Extraction of Neural Networks.” In USENIX Security Symposium, 2020. [paper]

Privacy-Preserving Learning

Background
- Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. “Calibrating noise to sensitivity in private data analysis.” In Theory of cryptography conference, 2006. [paper]
- Cynthia Dwork, and Aaron Roth. “The algorithmic foundations of differential privacy.” Foundations and Trends in Theoretical Computer Science 9, 2014. [book]
Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. “Differentially private empirical risk minimization.” In Journal of Machine Learning Research 12, no. 3, 2011. [paper] [tutorial talk at NIPS 2017] [citations]
Raef Bassily, Adam Smith, and Abhradeep Thakurta. “Private empirical risk minimization: Efficient algorithms and tight error bounds.” In IEEE 55th Annual Symposium on Foundations of Computer Science, 2014. [paper] [talk by AT at MSR] [follow-up paper] [citations]
Martin Abadi, Andy Chu, Ian Goodfellow, Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. “Deep learning with differential privacy.” In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2016. [paper] [conference talk] [citations]
Nicolas Papernot, Martin Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar. “Semi-supervised knowledge transfer for deep learning from private training data.” In Proceedings of the 5th International Conference on Learning Representations, 2017. [paper] [conference talk] [follow-up paper] [citations]
Milad Nasr, Reza Shokri, and Amir Houmansadr. “Machine learning with membership privacy using adversarial regularization.” In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018. [paper] [conference talk] [citations]
Florian Tramèr, and Dan Boneh. “Differentially Private Learning Needs Better Features (or Much More Data).” In International Conference on Learning Representations, 2020. [paper
Overview Talks
- Adam Smith, “Differential Privacy.” 2019. [Tutorial]
- Kunal Talwar, “Large-Scale Private Learning.” 2019. [Part I], [Part II]
- Ilya Mironov, “Rényi Differential Privacy.” 2018. [Talk]

Confidential Computing

Payman Mohassel, and Peter Rindal. “ABY3: A mixed protocol framework for machine learning.” In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018. [paper] [conference talk] [citations]
Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. “GAZELLE: A low latency framework for secure neural network inference.” In 27th USENIX Security Symposium, 2018. [paper] [conference talk] [citations]
Olga Ohrimenko, Felix Schuster, Cédric Fournet, Aastha Mehta, Sebastian Nowozin, Kapil Vaswani, and Manuel Costa. “Oblivious multi-party machine learning on trusted processors.” In 25th USENIX Security Symposium, 2016. [paper] [conference talk] [citations]

Machine Unlearning

Lucas Bourtoule, Varun Chandrasekaran, Christopher Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. “Machine unlearning.” arXiv preprint arXiv:1912.03817, 2019. [paper]
Seth Neel, Aaron Roth, and Saeed Sharifi-Malvajerdi. “Descent-to-Delete: Gradient-Based Methods for Machine Unlearning.” arXiv preprint arXiv:2007.02923, 2020. [paper]

Decentralized (Collaborative, Federated) Learning

Reza Shokri, and Vitaly Shmatikov. “Privacy-preserving deep learning.” In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, 2015. [paper] [citations]
Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. “Communication-efficient learning of deep networks from decentralized data.” In Artificial Intelligence and Statistics, 2017. [paper] [citations]
Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, and Ameet S. Talwalkar. “Federated multi-task learning.” In Advances in Neural Information Processing Systems, 2017. [paper] [citations]
Peter Kairouz et al. “Advances and open problems in federated learning.” arXiv preprint arXiv:1912.04977, 2019. [paper]

Law and Policy

Kobbi Nissim, Aaron Bembenek, Alexandra Wood, Mark Bun, Marco Gaboardi, Urs Gasser, David R. O’Brien, Thomas Steinke, and Salil Vadhan. “Bridging the gap between computer science and legal approaches to privacy.” Harvard Journal of Law & Technology, Volume 31, 2018. [paper] [talk by KN] [citations]

Tools and Libraries

Privacy in Statistics and Machine Learning, Adam Smith and Jonathan Ullman, 2021.
Data Privacy: Foundations and Applications, Simons Institute for the Theory of Computing, 2019.
Applied Privacy for Data Science, James Honaker and Salil Vadhan, Harvard, 2019.
Privacy in Machine Learning and Statistical Inference, Adam Smith, Boston University, 2018.
The Algorithmic Foundations of Adaptive Data Analysis, Adam Smith and Aaron Roth, UPenn and Boston University, 2017.
Algorithms for Private Data Analysis, Gautam Kamath, University of Waterloo, 2020.

Robustness

Training Phase

Background
- Ilias Diakonikolas, and Daniel M. Kane. “Recent advances in algorithmic high-dimensional robust statistics.” arXiv preprint arXiv:1911.05911, 2019. [paper]
- Ankur Moitra, “Robustness Meets Algorithms.” 2019. [Talk]
Battista Biggio, Blaine Nelson, Pavel Laskov. “Poisoning Attacks against Support Vector Machines”. In International Conference on Machine Learning, 2012. [paper]
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. “Understanding deep learning requires rethinking generalization.” In International Conference on Learning Representations, 2017. [paper] [conference talk] [citations]
Jacob Steinhardt, Pang Wei W. Koh, and Percy S. Liang. “Certified defenses for data poisoning attacks.” In Advances in neural information processing systems, 2017. [paper] [citations]
Matthew Jagielski, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, Bo Li. “Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning” In IEEE Symposium on Security and Privacy, 2018. [paper]
Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, Alistair Stewart. Sever: A Robust Meta-Algorithm for Stochastic Optimization. In ICML 2019. [paper]

Inference Phase

Integrity

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. “Explaining and harnessing adversarial examples.” International Conference on Learning Representations, 2015. [paper] [citations] [follow-ups: universal perturbation, physical world, random steps in iterative adversarial training, attacks on question answering, attacks on audio and semantic segmentation]
Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, Ananthram Swami. “Practical Black-Box Attacks against Machine Learning”. In Asia Conference on Computer and Communications Security, 2017. [paper] [follow-ups: transferability, gradient-free black-box attacks]
Nicholas Carlini, and David Wagner. “Towards evaluating the robustness of neural networks.” In IEEE symposium on security and privacy (SP), 2017. [paper] [conference talk] [citations] [follow-ups: evading detection]
Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. “Certified Adversarial Robustness via Randomized Smoothing.” In International Conference on Machine Learning, 2019. [paper] [talk by ZK] [citations]
Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, John C. Duchi, and Percy S. Liang. “Unlabeled data improves adversarial robustness.” In Advances in Neural Information Processing Systems, 2019. [paper] [citations]
Overview Talks
- Ian Goodfellow, “Adversarial Examples and Adversarial Training.” 2017. [Lecture]
- Zico Kolter and Aleksander Madry, “Adversarial Robustness - Theory and Practice.” 2018. [NeurIPS Tutorial]
Benchmarks
- Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu. “Towards Deep Learning Models Resistant to Adversarial Attacks.” In ICLR, 2018. [paper]

Availability

Sanghyun Hong, Pietro Frigo, Yiğitcan Kaya, Cristiano Giuffrida, and Tudor Dumitraș. “Terminal brain damage: Exposing the graceless degradation in deep neural networks under hardware fault attacks.” In USENIX Security Symposium, 2019. [paper] [conference talk]
Ilia Shumailov, Yiren Zhao, Daniel Bates, Nicolas Papernot, Robert Mullins, Ross Anderson, “Sponge Examples: Energy-Latency Attacks on Neural Networks”. Preprint, 2020. [paper]

Testing and Verification

Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. “Reluplex: An efficient SMT solver for verifying deep neural networks.” In International Conference on Computer Aided Verification, 2017. [paper] [conference talk] [citations]
Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. “An abstract domain for certifying neural networks.” Proceedings of the ACM on Programming Languages (POPL), 2019. [paper] [conference talk] [citations]
Xiaowei Huang Marta Kwiatkowska Sen Wang Min Wu. “Safety Verification of Deep Neural Networks”. In Computer Aided Verification, 2017. [paper]

Tools and Libraries

Law and Policy

Ram Shankar Siva Kumar, Jonathon Penney, Bruce Schneier, Kendra Albert. “Legal Risks of Adversarial Machine Learning Research” In ICML 2020 Workshop on Law & Machine Learning. [paper]

Reliable and Interpretable Artificial Intelligence, Martin Vechev, ETH Zurich, 2020.

Algorithmic Fairness

Overview
- Alexandra Chouldechova, and Aaron Roth. “The frontiers of fairness in machine learning.” arXiv preprint arXiv:1810.08810, 2018. [paper]
- Solon Barocas, Moritz Hardt, and Arvind Narayanan. “Fairness and machine learning: Limitations and Opportunities.” Work in progress book, 2019. [book]
Overview Talks
- Solon Barocas and Moritz Hardt, “Fairness in machine learning.” [Tutorial at NIPS], 2017.
- Arvind Narayanan, “21 fairness definitions and their politics.” 2018. [Tutorial]
- Cynthia Dwork, “The Emerging Theory of Algorithmic Fairness.” 2018. [Talk]
- Moritz Hardt, “Fairness.” [Part I], [Part II]
- Suresh Venkatasubramanian, “Algorithmic Fairness and Unfairness: A New Research Area.” 2019. [Talk]

Measures

Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. “Fairness through awareness.” In Proceedings of the 3rd innovations in theoretical computer science conference, 2012. [paper] [citations]
Moritz Hardt, Eric Price, and Nati Srebro. “Equality of opportunity in supervised learning.” In Advances in neural information processing systems, 2016. [paper] [citations]
Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. “Counterfactual fairness.” In Advances in neural information processing systems, 2017. [paper] [talk by MK] [citations]

Mechanisms

Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. “Learning fair representations.” In International Conference on Machine Learning, 2013. [paper] [citations]
Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. “Certifying and removing disparate impact.” In proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, 2015. [paper] [conference talk] [citations]
Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rogriguez, and Krishna P. Gummadi. “Fairness constraints: Mechanisms for fair classification.” In Artificial Intelligence and Statistics, 2017. [paper] [citations]
Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, and Hanna Wallach. “A reductions approach to fair classification.” In International Conference on Machine Learning, 2018. [paper] [citations]
B. H. Zhang, B. Lemoine, and M. Mitchell, “Mitigating Unwanted Biases with Adversarial Learning.” in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society - AIES ’18 [paper]

Analysis

Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. “Inherent trade-offs in the fair determination of risk scores.” arXiv preprint arXiv:1609.05807, 2016. [paper] [talk by JK] [citations]
Sam Corbett-Davies, Emma Pierson], Avi Feller, Sharad Goel, and Aziz Huq. “Algorithmic decision making and the cost of fairness.” In Proceedings of the 23rd acm SIGKDD international conference on knowledge discovery and data mining, 2017. [paper] [talk by SCD] [citations]
Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. “Delayed impact of fair machine learning.” In International Conference on Machine Learning, 2018. [paper] [talk by LL] [citations]
Awad, Edmond, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan. “The Moral Machine Experiment”. in Nature 2018. [paper]

Robustness

Avrim Blum, and Kevin Stangl. “Recovering from biased data: Can fairness constraints improve accuracy?.” In 1st Symposium on Foundations of Responsible Computing (FORC), 2020. [paper] [conference talk]
Heinrich Jiang, and Ofir Nachum. “Identifying and correcting label bias in machine learning.” In International Conference on Artificial Intelligence and Statistics, 2020. [paper]
Hongyan Chang, Ta Duy Nguyen, Sasi Kumar Murakonda, Ehsan Kazemi, and Reza Shokri. “On Adversarial Bias and the Robustness of Fair Machine Learning.” arXiv preprint arXiv:2006.08669, 2020. [paper]

Recent Developments in Research on Fairness, Simons Institute for the Theory of Computing, 2019.
Fairness in Machine Learning, Moritz Hardt, UC Berkeley, 2017.
Fairness in Machine Learning, Arvind Narayanan, Princeton, 2017.
Human-Centered Machine Learning, Krishna P. Gummadi, MPI-Software, 2018.

Tools and libraries

Algorithmic Transparency

Finale Doshi-Velez, and Been Kim. “Towards a rigorous science of interpretable machine learning.” arXiv preprint arXiv:1702.08608, 2017. [paper] [talk by FDV]
Brent Mittelstadt, Chris Russell, and Sandra Wachter. “Explaining explanations in AI.” In Proceedings of the conference on fairness, accountability, and transparency, 2019. [paper]
Overview talks
- Cynthia Rudin. “Do Simpler Models Exist and How Can We Find Them?.” [keynote talk at KDD]
- Rich Caruana. “Friends Don’t Let Friends Deploy Black-Box Models: Intelligibility in Machine Learning for Bias Detection and Correction.” [talk]

Model Explanation

Sandra Wachter, Brent Mittelstadt, and Chris Russell. “Counterfactual explanations without opening the black box: Automated decisions and the GDPR.” Harvard Journal of Law & Technology, 2017. [paper] [citations]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ““Why should I trust you?” Explaining the predictions of any classifier.” In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016. [paper] [conference talk] [talk by SS] [citations]
Scott M. Lundberg, and Su-In Lee. “A unified approach to interpreting model predictions.” In Advances in neural information processing systems, 2017. [paper] [citations]
Anupam Datta, Shayak Sen, and Yair Zick. “Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems.” In IEEE symposium on security and privacy (SP), 2016. [paper] [talk by AD] [citations]

Interpretability

Cynthia Rudin. “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.” Nature Machine Intelligence 1, 2019. [paper]

Recourse

Berk Ustun, Alexander Spangher, and Yang Liu. “Actionable recourse in linear classification.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019. [paper] [conference talk] [citations]

Robustness

Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. “Sanity checks for saliency maps.” In Advances in Neural Information Processing Systems, 2018. [paper] [conference talk, starts at 53’] [citations]
Amirata Ghorbani, Abubakar Abid, and James Zou. “Interpretation of neural networks is fragile.” In Proceedings of the AAAI Conference on Artificial Intelligence, 2019. [paper] [conference talk] [citations]
Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. “Fooling lime and shap: Adversarial attacks on post hoc explanation methods.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2020. [paper] [talk by HL] [citations]

Privacy and Confidentiality

Reza Shokri, Martin Strobel, and Yair Zick. “On the Privacy Risks of Model Explanations.” AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021. [paper] [citations]
Neel Patel, Reza Shokri, and Yair Zick. “Model Explanations with Differential Privacy.” arXiv preprint arXiv:2006.09129, 2020. [paper]
Smitha Milli, Ludwig Schmidt, Anca D. Dragan, and Moritz Hardt. “Model reconstruction from model explanations.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019. [paper] [conference talk] [citations]

Analysis

Jon Kleinberg, and Sendhil Mullainathan. “Simplicity creates inequity: implications for fairness, stereotypes, and interpretability.” In Proceedings of the 2019 ACM Conference on Economics and Computation, 2019. [paper] [conference talk] [citations]

Law and Policy

Andrew D. Selbst, and Solon Barocas. “The Intuitive Appeal of Explainable Machines.” Fordham law review 87, no. 3, 2018. [paper]

Interpretability and Explainability in Machine Learning, Himabindu Lakkaraju, Harvard, 2019.

A School for all Seasons on Trustworthy Machine Learning

Privacy and Confidentiality

Data Inference Attacks

Memorization

Model Inference Attacks

Privacy-Preserving Learning

Confidential Computing

Machine Unlearning

Decentralized (Collaborative, Federated) Learning

Law and Policy

Tools and Libraries

Related Courses and Schools

Robustness

Training Phase

Inference Phase

Integrity

Availability

Testing and Verification

Tools and Libraries

Law and Policy

Related Courses and Schools

Algorithmic Fairness

Measures

Mechanisms

Analysis

Robustness

Related Courses and Schools

Tools and libraries

Algorithmic Transparency

Model Explanation

Interpretability

Recourse

Robustness

Privacy and Confidentiality

Analysis

Law and Policy

Related Courses and Schools