Jenny S. Wang

PhD Student

About Me

Hello, I'm Jenny. Welcome to my website!

I am a current PhD student in the Technology & Operations Management program at Harvard Business School, supported by an NSF Graduate Research Fellowship.

Previously, I was a Pre-Doctoral Researcher in the Computational Social Science Lab at Microsoft Research working with David Rothschild, Jake Hofman, and Dan Goldstein. Before that, I was an undergraduate student at Wellesley College, where I graduated with majors in Computer Science and Economics.

I am broadly interested in improving existing methods used to answer social and policy-relevant questions. Recently, I've been thinking about how LLMs can be used to transform social science research methods.

Research
Shopping Without Shoppers: How AI Agents Navigate Product Assortments
Nil Karacaoglu, Antonio (Toni) Moreno, Jenny S. Wang (alphabetical) — Work in Progress
Abstract

AI agents are increasingly capable of autonomously executing complex, multi-step tasks. In the retail sector, this means consumers can now delegate their shopping to AI agents that browse assortments, evaluate tradeoffs, and even initiate checkout. As companies like OpenAI, Google, and Amazon deploy these tools at scale, the market impact of this technological shift remains poorly understood. While humans face high search costs and rarely browse deeply—allowing top-ranked products to capture most sales—AI agents promise to lower search costs and enable more efficient product discovery. Because agents can evaluate large assortments at near-zero marginal cost, classic intuition from search theory suggests that reducing search costs should decrease concentration: consumers (or their agents) should be more willing to explore and discover a wider set of products. Alternatively, AI agents, which are trained on historical corpora, may encode systematic priors or biases toward particular brands or products, which could instead increase concentration. We study whether delegating search to AI agents expands or concentrates demand, and through what mechanisms. To understand how AI agents may change user behavior, we must first understand how humans currently conduct search on e-commerce platforms.

In Your Own Words: Computationally Identifying Interpretable Themes in Free-Text Survey Data
Jenny S. Wang, Aliya Saperstein, Emma Pierson — Work in Progress
Abstract

Free-text survey responses can provide nuance often missed by structured questions, but they remain difficult to analyze statistically. To address this, we introduce In Your Own Words, a computational framework for exploratory analysis of free-text survey data that identifies structured, interpretable themes in free-text responses more precisely than previous computational approaches, facilitating systematic analysis. To illustrate the benefits of this approach, we apply it to free-text descriptions of race, gender, and sexual orientation from 1,004 U.S. participants. The themes our approach learns have three practical applications in survey research. First, the themes can suggest structured questions to add to future surveys by surfacing salient constructs—such as belonging and identity fluidity—that existing surveys do not capture. Second, the themes reveal heterogeneity within standardized categories, explaining additional variation in health, well-being, and identity importance. Third, the themes illuminate systematic discordance between self-identified and perceived identities, highlighting mechanisms of misrecognition that existing measures do not reflect. More broadly, our framework can be deployed in a wide range of survey settings to identify interpretable themes from free text, complementing existing qualitative methods.

The Media Bias Detector: A Framework for Annotating and Analyzing the News at Scale
Samar Haider*, Amir Tohidi*, Jenny S. Wang*, Timothy Dörr, David M. Rothschild, Chris Callison-Burch, Duncan J. Watts — R&R at Science Advances
Abstract

Mainstream news organizations shape public perception not only directly through the articles they publish but also through the choices they make about which topics to cover (or ignore) and how to frame the issues they do decide to cover. However, measuring these subtle forms of media bias at scale remains a challenge. Here, we introduce a large, ongoing (from January 1, 2024 to present), near-real-time dataset and computational framework developed to enable systematic study of selection and framing bias in news coverage. Our pipeline integrates large language models (LLMs) with scalable, near-real-time news scraping to extract structured annotations—including political lean, tone, topics, article type, and major events—across hundreds of articles per day. We quantify these dimensions of coverage at multiple levels—the sentence level, the article level, and the publisher level—expanding the ways in which researchers can analyze media bias in the modern news landscape. In addition to a curated dataset, we also release an interactive web platform for convenient exploration of these data. Together, these contributions establish a reusable methodology for studying media bias at scale, providing empirical resources for future research. Leveraging the breadth of the corpus over time and across publishers, we also present examples (focused on the 150,000+ articles examined in 2024) that illustrate how this novel dataset can reveal insightful patterns in news coverage and bias, supporting academic research and real-world efforts to improve media accountability.

Pre-PhD Projects
Ambient
Project