Gender Bias in Word-Level Predictions: Human Behavior vs. Large Language Models

Overview

This project investigates how both humans and large language models (LLMs) interpret gendered pronouns in political contexts—an area where subtle biases can shape comprehension and decision-making.

It builds on behavioral data I collected during my PhD, extending that analysis to ask whether LLMs exhibit the same predictive biases as human readers, and whether those biases can be mitigated through targeted fine-tuning.

The goal is to demonstrate not only key findings about language and bias, but also the analytical, modeling, and communication skills that data scientists use to extract meaningful insights from complex datasets.

The project is divided into three parts, each answering a distinct research question and demonstrating different technical tools and methodologies. Interactive dashboards accompany each part to make the results accessible for non-technical audiences.

Note

The behavioral data I analyze in Part 1 of this project is the result of joint work I did during my PhD with my colleagues at MIT: Roger Levy, Titus von der Malsburg, Veronica Boyce, and Chelsea Ajunwa.

| Part   | Focus                                  | Status                                  |
|--------|----------------------------------------|-----------------------------------------|
| Part 1 | Analysis of human behavioral data      | Analysis complete; write-up in progress |
| Part 2 | LLM surprisal analysis on same stimuli | Design finalized; implementation next   |
| Part 3 | Low-shot fine-tuning of LLMs           | Exploring methods and model scope       |
Note

This project is being developed incrementally. Check back regularly for updates, or follow along on GitHub for real-time progress.


Part 1: Human Behavior in Sentence Processing

Research question:
Do human readers show predictive biases based on gendered expectations in role nouns (e.g., “the next president… she”)?

Methods and tools:
- Data cleaning and pre-processing in R
- Mixed-effects modeling with brms (see the sketch after this list)
- Data visualization with ggplot2
- Interactive Shiny app for exploring response times
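
The Part 1 models are fit in R with brms, but for a concrete sense of the kind of specification involved, here is a minimal, analogous Python sketch using the bambi library (an lme4/brms-style formula front end to PyMC). The file name and the columns `log_rt`, `pronoun_match`, `subject`, and `item` are hypothetical placeholders, not the project's actual variables.

```python
# Minimal sketch of a Bayesian mixed-effects model for reading times.
# All file and column names below are hypothetical placeholders.
import arviz as az
import bambi as bmb
import pandas as pd

df = pd.read_csv("reading_times.csv")  # hypothetical: one row per trial

# Log reading time as a function of whether the pronoun matches the
# stereotypical gender of the role noun, with by-subject and by-item
# random intercepts and slopes.
model = bmb.Model(
    "log_rt ~ pronoun_match + (pronoun_match | subject) + (pronoun_match | item)",
    data=df,
)
idata = model.fit(draws=2000, chains=4)

# Posterior summaries (includes group-level terms).
print(az.summary(idata))
```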

Links:
- Full report (Coming soon)
- Shiny app (Coming soon)


Part 2: How Do LLMs Compare? (Planned)

Research question:
Do LLMs like GPT-4 or LLaMA exhibit human-like gender biases when processing the same sentence structures?

Planned methods and tools:
- Prompting and probing HuggingFace models
- Surprisal calculation and token-level analysis (see the sketch after this list)
- Streamlit dashboard for exploring model outputs
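
To make the planned surprisal analysis concrete, the sketch below computes per-token surprisal (negative log probability, in bits) from a causal language model with the Hugging Face transformers library. The model (`gpt2`) and the example sentence pair are placeholders for illustration; the actual analysis would run over the Part 1 stimuli.

```python
# Minimal sketch: per-token surprisal from a causal LM via transformers.
# Model choice and example sentences are illustrative placeholders.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; larger open models could be swapped in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def token_surprisals(text: str):
    """Return (token, surprisal in bits) pairs for every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits              # [1, seq_len, vocab]
    # Log-probability of each actual next token given its left context.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    next_ids = ids[:, 1:]
    token_lp = log_probs.gather(2, next_ids.unsqueeze(-1)).squeeze(-1)
    surprisal_bits = -token_lp.squeeze(0) / math.log(2)  # nats -> bits
    tokens = tokenizer.convert_ids_to_tokens(next_ids.squeeze(0).tolist())
    return list(zip(tokens, surprisal_bits.tolist()))

# Compare surprisal at the pronoun across otherwise identical sentences.
for sentence in [
    "The next president said that she would address the nation.",
    "The next president said that he would address the nation.",
]:
    print(sentence)
    for tok, s in token_surprisals(sentence):
        print(f"  {tok!r}: {s:.2f} bits")
```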

Links:
- Full report (Coming soon)
- Streamlit app (Coming soon)


Part 3: Fine-Tuning for Bias Mitigation (Coming soon)

Research question:
Can targeted fine-tuning reduce bias in LLM behavior while preserving linguistic competence?

Planned methods and tools:
- Data curation and minimal-exposure training
- Fine-tuning open-source LLMs (see the sketch after this list)
- Re-running behavioral tests on fine-tuned models
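
The exact fine-tuning recipe is still being scoped, but as a rough sketch of what low-shot, minimal-exposure training could look like with transformers and datasets, here is a small setup trained on a handful of counter-stereotypical sentences. The model, hyperparameters, and example sentences are all placeholders, not the project's final design.

```python
# Minimal sketch of low-shot fine-tuning on counter-stereotypical sentences.
# Model, hyperparameters, and the tiny example corpus are placeholders.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder open-source model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A handful of counter-stereotypical continuations (illustrative only).
texts = [
    "The next president announced that she would visit the troops.",
    "The nurse said that he had already checked the charts.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True), remove_columns=["text"]
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=3,
                           per_device_train_batch_size=2, learning_rate=5e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
# The fine-tuned model would then be re-run through the Part 2 surprisal pipeline.
```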

Links:
- Full report (Coming soon)
- Streamlit app (Coming soon)


Project Repository

All code, data, and reports are publicly available in this GitHub repository.