Data Science Projects
Below are selected projects designed to demonstrate my technical skills and applied experience in data science, natural language processing, and statistical modeling. Each project highlights a specific set of tools and methods, and links to a live demo or interactive report where available.
Gender Bias in Language Prediction: Humans vs. Large Language Models
This multi-part portfolio project investigates how humans and large language models (LLMs) respond to gendered pronouns in political contexts, particularly when interpreting role nouns like “the next president… she.”
The project extends a large-scale psycholinguistic experiment I conducted during my PhD, comparing human response patterns to model behavior and exploring whether predictive biases can be modified through targeted fine-tuning.
Key highlights:
- Statistical modeling using brms and reaction time data from 2,000+ human participants
- LLM surprisal analysis using HuggingFace models in Python
- Interactive dashboards built with Shiny (R) and Streamlit (Python)
- Strong focus on reproducibility, transparency, and structured reasoning
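
For readers unfamiliar with surprisal, here is a minimal sketch of the quantity itself: the negative log probability a model assigns to a word given its context. The probabilities below are made-up toy numbers, not output from any model used in the project, and the names `surprisal_bits` and `toy_next_token_probs` are hypothetical and chosen only for illustration.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Return the surprisal of an event with probability `prob`, in bits."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log2(prob)


# Hypothetical next-token probabilities after a context that mentions
# "the next president". These are illustrative values, not model output.
toy_next_token_probs = {"he": 0.5, "they": 0.25, "she": 0.125, "it": 0.125}

# Less probable continuations receive higher surprisal.
surprisals = {tok: surprisal_bits(p) for tok, p in toy_next_token_probs.items()}

# Difference in surprisal between the two pronouns of interest.
pronoun_gap = surprisals["she"] - surprisals["he"]
```

With a real language model, the probabilities would instead come from a softmax over the model's next-token logits at the pronoun position; the comparison of surprisal across pronouns would follow the same arithmetic.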
📄 View the live project website
🛠 Tools: R, brms, ggplot2, tidyverse, Python, HuggingFace, Streamlit, Shiny, Quarto, GitHub Pages