STAT 188: Variations, Information, and Privacy
Undergraduate Course, Teaching Assistant, Harvard University, Department of Statistics, 2024
Instructor: Xiao-Li Meng
Offered: Fall 2024
Course Abstract
This course delves into the intriguing realms of variations, information, and privacy, with attention to both their qualitative conceptualizations, such as contextual integrity, and their quantitative specifications, exemplified by differential privacy. Our primary goal is to examine these concepts through a foundational statistical lens, and to study statistics from the dual perspectives of creating and limiting information from data. At the heart of our exploration is the concept of variations, serving as a unifying theme that intricately links information (revelatory variations) with uncertainty (obfuscatory variations). This nuanced approach enables us to recognize that the principles governing how we restrict the flow of information mirror those involved in generating information (the traditional focus of statistics).
A considerable portion of the course will focus on an in-depth study of differential privacy (DP). First, we will dissect its mathematical framework through theory and examples, identify five key elements that define a general DP specification, and understand what DP guarantees and what it does not. Second, we will delve into the intricacies of implementing DP through the case of the 2020 U.S. Census, along with the social and legal perspectives on privacy that it raises. Third, we will learn how to apply missing data methodologies to properly analyze differentially privatized data. Throughout the course, we will confront the challenge of meaningfully defining and quantifying individual privacy and information.
By the conclusion of this course, students are expected to have developed a deeper appreciation for the complex interplay among variations, information, and privacy. They will be equipped with foundational analytical tools and statistical insights, empowering them to navigate the theoretical and practical challenges associated with revealing and concealing information in data for statistical inference and learning.