Towards Life 3.0 - Ethics and Technology in the 21st Century: Privacy and Accuracy in Data Science: Do We Need to Choose Only One?


Monday, April 22, 2019, 5:30pm to 6:45pm


Wexner Room 102, 79 JFK Street, Cambridge, MA 02138

Towards Life 3.0: Ethics and Technology in the 21st Century is a new talk series organized and facilitated by Mathias Risse, Director of the Carr Center for Human Rights Policy and Lucius N. Littauer Professor of Philosophy and Public Administration. Taking its name from the title of Max Tegmark’s book, Life 3.0: Being Human in the Age of Artificial Intelligence, the series brings together scholars, technology leaders, and public interest technologists to address the ethical aspects of the long-term impact of artificial intelligence on society and human life.

Held on select Monday evenings at 5:30 – 6:45 in Wexner 102, and occasionally on other weekdays, the series will also be shared on Facebook Live and on the Carr Center website. A light dinner will be served.

James H. Waldo, Gordon McKay Professor of Practice of Computer Science at the John A. Paulson School of Engineering and Applied Sciences, will give a talk titled "Privacy and Accuracy in Data Science: Do We Need to Choose Only One?"



The availability of large data sets and recent innovations in machine learning promise huge payoffs in the social sciences and education. But basic ethics, and a number of laws, require that we protect the identities of the people whose data is in these sets. Different laws set different requirements, and none of them can guarantee that the data cannot be re-identified. More worrying, de-identifying these data sets often introduces considerable statistical bias, which calls into question the conclusions drawn from them with data science techniques.

This talk will center on this trade-off as it plays out in a particular data set: the data generated by students taking massive open online courses (MOOCs) offered by HarvardX. We will discuss what the law requires before such sets can be shared, what we discovered about the de-identified data, and efforts to square the circle of achieving both privacy and accuracy.
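As a toy illustration of how de-identification can bias results, the following Python sketch applies one common technique, suppressing records until every combination of quasi-identifiers appears at least k times (k-anonymity), to a handful of made-up course records. It is not the HarvardX procedure discussed in the talk; the records, the choice of country and age band as quasi-identifiers, and the threshold k=3 are all hypothetical assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records: (country, age_band, final_grade).
# These values are invented for illustration only.
records = [
    ("US", "20-29", 0.62), ("US", "20-29", 0.58), ("US", "20-29", 0.65),
    ("US", "30-39", 0.55), ("US", "30-39", 0.60), ("US", "30-39", 0.57),
    ("IN", "20-29", 0.70), ("IN", "20-29", 0.72), ("IN", "20-29", 0.68),
    ("NG", "20-29", 0.91), ("BR", "40-49", 0.88), ("KE", "30-39", 0.93),
]


def suppress_to_k_anonymity(rows, k):
    """Drop every row whose (country, age_band) combination occurs fewer than k times."""
    counts = Counter((country, age) for country, age, _ in rows)
    return [row for row in rows if counts[(row[0], row[1])] >= k]


kept = suppress_to_k_anonymity(records, k=3)
print(f"rows kept: {len(kept)} of {len(records)}")
print(f"mean grade before: {mean(g for *_, g in records):.3f}")
print(f"mean grade after:  {mean(g for *_, g in kept):.3f}")
```

In this made-up data the suppressed rows are the learners with rare country and age combinations, who also happen to score higher, so the mean grade drops after suppression. The point of the sketch is only that suppressed records are rarely a random sample, so estimates computed on the de-identified set can shift.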