NLP & Data Science Projects

Below are selected projects designed to demonstrate my technical skills and applied experience in data science, natural language processing, and statistical modeling. Each project highlights a specific set of tools and methods, and links to a live demo or interactive report where available.


Building on Cresta’s FECT paper: A QUD-based approach to factuality evaluation with conversational transcripts

I extended the 3D paradigm proposed in Cresta’s KDD paper by Hagyeong Shin and her colleagues. Instead of the compositional semantics approach that drives the 3D paradigm, I direct the LLM to parse the conversational transcript through the lens of the Question Under Discussion (QUD) framework. Initial results suggest that a QUD-based transformation of the transcript helps when evaluating the factuality of analytical claims about conversational data, with performance roughly matching that of the 3D paradigm proposed in the original paper.
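
To make the idea concrete, here is a minimal sketch of how a QUD-oriented prompt could be assembled before it is sent to an LLM. It is an illustration under stated assumptions, not the prompt used in the project: the instruction wording, the names `QUD_INSTRUCTIONS`, `format_transcript`, and `build_qud_prompt`, and the SUPPORTED / NOT SUPPORTED labels are all hypothetical. The sketch only builds the prompt string and makes no API call.

```python
from typing import List, Tuple

# Hypothetical instruction text: illustrative wording only,
# not the actual prompt from the project.
QUD_INSTRUCTIONS = (
    "Rewrite the transcript as a sequence of Questions Under Discussion (QUDs). "
    "For each stretch of dialogue, state the explicit or implicit question it "
    "addresses and the answer the speakers give. Then judge whether the claim "
    "is supported by those question-answer pairs."
)


def format_transcript(turns: List[Tuple[str, str]]) -> str:
    """Render (speaker, utterance) pairs as numbered lines so turns can be cited."""
    return "\n".join(
        f"[{i}] {speaker}: {text}"
        for i, (speaker, text) in enumerate(turns, start=1)
    )


def build_qud_prompt(turns: List[Tuple[str, str]], claim: str) -> str:
    """Combine the instructions, the numbered transcript, and the claim to check."""
    return (
        f"{QUD_INSTRUCTIONS}\n\n"
        f"Transcript:\n{format_transcript(turns)}\n\n"
        f"Claim: {claim}\n"
        "Answer with SUPPORTED or NOT SUPPORTED and cite the turn numbers."
    )


if __name__ == "__main__":
    # Made-up example turns, for illustration only.
    example_turns = [
        ("Agent", "How can I help?"),
        ("Customer", "My order is late."),
    ]
    print(build_qud_prompt(example_turns, "The customer reported a late order."))
```

Numbering the turns is a design choice in this sketch: it lets the model point back to specific parts of the conversation when it justifies a factuality judgment.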

Key highlights:

📄 View the live project website

📂 GitHub repository

🛠 Tools: Python, OpenAI API, ipynb reports, matplotlib/seaborn, LLM prompt engineering


Gender Bias in Language Prediction: Humans vs. Large Language Models

This multi-part portfolio project investigates how humans and large language models (LLMs) respond to gendered pronouns in political contexts, particularly when a gendered pronoun follows a role noun, as in “the next president… she.”

The project extends a large-scale psycholinguistic experiment I conducted during my PhD, comparing human response patterns to model behavior and exploring whether predictive biases can be modified through targeted fine-tuning.

Key highlights:

📄 View the live project website

📂 GitHub repository

🛠 Tools: R, brms, ggplot2, tidyverse, Python, HuggingFace, Streamlit, Shiny, Quarto, GitHub Pages