This assignment is intended to cultivate a habit of evidence-based AI skepticism. By experiencing firsthand the difficulty of the task that facial recognition systems are asked to perform, and by auditing an AI’s own argumentation against primary research data, students develop the capacity to evaluate AI-generated content not as a finished product, but as a draft subject to human scrutiny.
In 2018, Joy Buolamwini, a researcher at the MIT Media Lab and founder of the Algorithmic Justice League, published a landmark study that would reverberate across the technology industry, the legal community, and beyond. In “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification” (Buolamwini & Gebru, 2018), Buolamwini and her coauthor, Timnit Gebru, showed that three leading commercial gender classification systems, offered by Microsoft, IBM, and Megvii (Face++), misclassified darker-skinned women at dramatically higher rates than lighter-skinned men. The systems were nearly perfect on lighter-skinned male faces, with a maximum error rate of 0.8%, while error rates on darker-skinned female faces reached as high as 34.7%. The disparity was not a random bug. It reflected the composition of the data used to train and evaluate such systems: the authors found that widely used benchmark datasets were overwhelmingly made up of lighter-skinned subjects.
Buolamwini’s methodology centered on a benchmark she constructed herself: the Pilot Parliaments Benchmark (PPB), a dataset of 1,270 headshots of parliamentarians from three African and three European countries, balanced by gender and skin type. Skin type was labeled with the Fitzpatrick scale, a dermatological classification system that ranges from Type I (lightest) to Type VI (darkest). Crucially, Buolamwini did not rely solely on automated classification to label gender in her dataset. She coded each image by hand, a process that underscored one of the project’s most provocative findings: even humans, examining a simple headshot in isolation, frequently struggle to determine gender with confidence. If the task is difficult for a person, doing it reliably and equitably is a far harder challenge for an AI system than it might first appear.
The Gender Shades dataset is publicly accessible at gs.ajl.org. Visitors to the site can browse the benchmark images and examine disaggregated performance data for each of the commercial systems Buolamwini evaluated. Since its publication, the study has prompted vendor responses, federal legislative hearings, and ongoing debate about whether facial recognition technology should be deployed in high-stakes settings at all—and if so, under what conditions. Buolamwini expanded this body of work in her 2023 book Unmasking AI: My Mission to Protect What Is Human in a World of Machines, a memoir-driven account of algorithmic bias and the human cost of invisible errors (Buolamwini, 2023).
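To make “disaggregated performance data” concrete, here is a minimal Python sketch of the kind of tally such an audit rests on: instead of reporting one overall error rate, it computes a separate error rate for each intersection of skin group and gender. It is an illustration, not the Gender Shades analysis code. The `Record` type, the function names, and the toy records are all hypothetical, and the records are invented rather than drawn from the benchmark. Grouping Fitzpatrick Types I to III as lighter and IV to VI as darker follows the split Buolamwini and Gebru used when reporting their results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One audited image (hypothetical structure, for illustration only)."""
    fitzpatrick: int       # skin type, 1 (Type I) through 6 (Type VI)
    true_gender: str       # reference label, e.g. "female" or "male"
    predicted_gender: str  # label returned by the classifier being audited


def skin_group(fitzpatrick: int) -> str:
    """Bin Fitzpatrick Types I-III as 'lighter' and IV-VI as 'darker'."""
    if not 1 <= fitzpatrick <= 6:
        raise ValueError(f"Fitzpatrick type must be 1-6, got {fitzpatrick}")
    return "lighter" if fitzpatrick <= 3 else "darker"


def overall_error_rate(records: list[Record]) -> float:
    """Share of all images the classifier got wrong (one aggregate figure)."""
    wrong = sum(r.predicted_gender != r.true_gender for r in records)
    return wrong / len(records)


def disaggregated_error_rates(records: list[Record]) -> dict[tuple[str, str], float]:
    """Error rate computed separately for each (skin group, gender) subgroup."""
    totals = defaultdict(int)  # images seen per subgroup
    errors = defaultdict(int)  # misclassified images per subgroup
    for r in records:
        key = (skin_group(r.fitzpatrick), r.true_gender)
        totals[key] += 1
        if r.predicted_gender != r.true_gender:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Invented toy records: NOT data from the Gender Shades benchmark.
TOY = [
    # lighter-skinned men: all classified correctly
    Record(1, "male", "male"), Record(2, "male", "male"),
    Record(2, "male", "male"), Record(3, "male", "male"),
    # lighter-skinned women: all classified correctly
    Record(1, "female", "female"), Record(2, "female", "female"),
    Record(3, "female", "female"), Record(3, "female", "female"),
    # darker-skinned men: all classified correctly
    Record(4, "male", "male"), Record(5, "male", "male"),
    Record(6, "male", "male"), Record(6, "male", "male"),
    # darker-skinned women: two of four misclassified as male
    Record(4, "female", "female"), Record(5, "female", "female"),
    Record(5, "female", "male"), Record(6, "female", "male"),
]

if __name__ == "__main__":
    print(f"overall: {overall_error_rate(TOY):.1%}")
    for (group, gender), rate in sorted(disaggregated_error_rates(TOY).items()):
        print(f"{group} {gender}: {rate:.1%}")
```

On the toy data, the overall error rate is 12.5%, which looks respectable, while the error rate for darker-skinned women is 50%. A single aggregate number can hide exactly the kind of subgroup gap that Gender Shades brought to light.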
This assignment unfolds in three phases. Each phase builds directly on the one before it. Read all three phases before beginning.
Write a short reflection of 400–600 words that does three things:
The following criteria reflect the assignment’s core learning outcomes: the ability to critically examine AI-generated claims using empirical evidence, to recognize and evaluate algorithmic bias, to engage directly with primary data sources, and to connect personal observations to broader ethical and institutional questions about AI deployment.
Student completed the manual classification exercise and wrote a genuine, specific reflection on the experience. Evidence of honest uncertainty is valued over false confidence.
Student’s institutional context is clearly defined, and the prompt given to the AI is specific enough to elicit substantive argumentation on both sides.
Student’s position is clearly stated and directly supported by specific benchmark figures from the Gender Shades dataset. Vague references to “bias” without data citations do not meet this criterion.
Student identifies a concrete gap or inaccuracy in the AI’s argumentation and quotes the AI’s language directly. The critique is analytical, not impressionistic.
Student draws a meaningful link between their Phase 1 experience and their broader argument. This connection should feel specific to the student’s own encounter with the dataset, not generic.
Algorithmic Justice League. (2018). Gender Shades. https://gs.ajl.org/
Buolamwini, J. (2016). How I’m fighting bias in algorithms [TED Talk]. TED Conferences. https://www.ted.com/talks/joy_buolamwini_how_i_m_fighting_bias_in_algorithms
Buolamwini, J. (2023). Unmasking AI: My mission to protect what is human in a world of machines. Random House.
Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 77–91.
