Media consumption
We've surveyed thousands of voters across the country about their media consumption habits, asking whether they've recently used specific platforms, listened to specific podcasts, read specific publications, or watched specific video programs.
After conducting this survey, we used the responses to build a set of 28 distinct predictive models to estimate the probability a given person would consume content from a particular media outlet.
In this documentation, we explain how we built these models, describe how they might be used in practice, and review detailed evaluation statistics.
A growing body of research has linked media consumption habits to intensifying partisan polarization. Polarized media erodes trust in mainstream news, leads audience members to fortify their own social filter bubbles, and radicalizes some audience members into more extreme positions. Together with its feedback loop with an increasingly polarized audience, it is playing a consequential role in our politics.
As such, a voter's media consumption diet provides critical insight into what they might believe, what messaging frames might break through to them, what myths they might accept as fact, and what types of targeting might be leveraged to reach them. Our hope is that these predictions will open the door to smarter campaigning by making users more aware of these factors.
The media outlets we focused on, broken out by content type, include:
- Audio: Ben Shapiro Show, The Daily, Joe Rogan Experience, NPR, Pod Save America, Tucker Carlson Show
- Social: Facebook, Instagram, Nextdoor, Reddit, Snapchat, TikTok, Truth Social, X, YouTube
- Text: Daily Wire, Huffington Post, local newspaper, MSN.com, New York Times, USA Today, Wall Street Journal, Yahoo! News
- Video: CNN, Fox News, Last Week Tonight, local broadcast news, MSNBC, national broadcast news
As an increasing number of voters turn away from mass media and toward more niche media products created by partisans, conspiracy theorists, and amateur content creators, it's become more difficult to know (1) what information is informing a voter's views, (2) what social attitudes they subscribe to, and (3) how to reach them via paid or earned media pushes.
These scores are an attempt to ease these challenges. For example, you might use these scores to determine where specific targeted audiences consume media so that you could place ads with that outlet or seek an earned media opportunity. You might also use these scores to combat conspiracy theories spread by a given outlet, tailor messaging based on the attitudes a voter seems attuned to, or build on positive coverage by activating an outlet's audience.
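As a rough illustration of the ad-placement use case above, the sketch below ranks outlets by their average predicted score within a target audience. The record layout, field names (`in_target`, `scores`), outlet keys, and score values are all hypothetical and made up for this example; they are not the actual schema or real model output.

```python
from statistics import mean

# Hypothetical records: one dict per voter. All values are invented.
voters = [
    {"voter_id": "A1", "in_target": True,
     "scores": {"npr": 0.62, "fox_news": 0.10, "reddit": 0.40}},
    {"voter_id": "A2", "in_target": True,
     "scores": {"npr": 0.48, "fox_news": 0.20, "reddit": 0.55}},
    {"voter_id": "A3", "in_target": False,
     "scores": {"npr": 0.05, "fox_news": 0.90, "reddit": 0.10}},
]


def rank_outlets(voters):
    """Rank outlets by mean predicted consumption score among target voters."""
    target = [v for v in voters if v["in_target"]]
    outlets = target[0]["scores"].keys()
    averages = {o: mean(v["scores"][o] for v in target) for o in outlets}
    # Highest average score first: the outlets this audience most likely uses.
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_outlets(voters)
for outlet, avg_score in ranking:
    print(f"{outlet}: {avg_score:.3f}")
```

Averaging probabilities is only one possible summary. Counting voters above a score threshold would be a reasonable alternative when the goal is audience size rather than average propensity.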
To gather training data, we asked survey respondents to select which media outlets they generally consume. Following guidance from Pew Research, we asked respondents to select outlets they "typically" turn to in order to avoid biases based on recent news events or a respondent's recent routine. We declined to ask respondents to quantify their consumption of each outlet – both because of the fluid, ongoing nature of contemporary media consumption and because we wanted to mitigate nonresponse bias from less engaged media consumers whose available attention spans might be shorter.
A review of the average consumption rates we found for each outlet is below.
After collecting the survey data, we put it through a cleaning and preprocessing phase, joining survey responses with respondents' personal traits from the Stacks Voter File.
We then trained dense neural networks to predict each respondent's typical media consumption choices. For each model, we searched our training data for the best combination of predictors using a method called Variable Selection Using Random Forests (VSURF).
The deep learning hyperparameters used to configure each model are detailed below.
Model | Activation | Optimizer | Loss |
---|---|---|---|
Ben Shapiro | exponential | sgd | binary crossentropy |
The Daily | hard sigmoid | adam | binary crossentropy |
Joe Rogan | softplus | adam | binary crossentropy |
NPR | sigmoid | nadam | binary crossentropy |
Pod Save America | mish | adam | binary focal crossentropy |
Tucker Carlson | sigmoid | adam | binary crossentropy |
Model | Activation | Optimizer | Loss |
---|---|---|---|
Facebook | softplus | rmsprop | binary crossentropy |
Instagram | exponential | adamax | binary crossentropy |
Nextdoor | softplus | adamax | poisson |
Reddit | exponential | adam | binary crossentropy |
Snapchat | selu | sgd | binary focal crossentropy |
TikTok | sigmoid | rmsprop | poisson |
Truth Social | sigmoid | adam | poisson |
X | sigmoid | rmsprop | poisson |
YouTube | sigmoid | nadam | binary focal crossentropy |
Model | Activation | Optimizer | Loss |
---|---|---|---|
Daily Wire | hard sigmoid | nadam | binary crossentropy |
Huffington Post | exponential | adam | poisson |
Local newspaper | sigmoid | adam | binary crossentropy |
MSN.com | hard sigmoid | adam | poisson |
New York Times | hard sigmoid | adam | binary focal crossentropy |
USA Today | sigmoid | sgd | binary crossentropy |
Wall Street Journal | sigmoid | adamax | poisson |
Yahoo! News | hard silu | adamax | binary focal crossentropy |
Model | Activation | Optimizer | Loss |
---|---|---|---|
CNN | softplus | sgd | binary crossentropy |
Fox News | exponential | adamax | poisson |
Last Week Tonight | hard sigmoid | adam | binary crossentropy |
Local broadcast news | softplus | sgd | poisson |
MSNBC | exponential | sgd | binary focal crossentropy |
National broadcast news | sigmoid | nadam | binary focal crossentropy |
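
The tables above can be carried into code as a small lookup. The sketch below is an assumption-laden illustration: the documentation does not name a framework, but the labels resemble Keras-style string identifiers, so the helper converts them to snake_case (for example, "hard sigmoid" becomes `hard_sigmoid`). The names `HYPERPARAMETERS`, `to_identifier`, and `compile_kwargs` are hypothetical, and only a few rows are copied in. Which layer each activation applies to is not stated in the tables.

```python
# A few rows copied from the tables: (activation, optimizer, loss).
HYPERPARAMETERS = {
    "Ben Shapiro": ("exponential", "sgd", "binary crossentropy"),
    "The Daily": ("hard sigmoid", "adam", "binary crossentropy"),
    "Pod Save America": ("mish", "adam", "binary focal crossentropy"),
    "Nextdoor": ("softplus", "adamax", "poisson"),
}


def to_identifier(label):
    """Convert a table label to a snake_case identifier (assumed Keras-style)."""
    return label.strip().lower().replace(" ", "_")


def compile_kwargs(model_name):
    """Return one model's settings as identifier strings."""
    activation, optimizer, loss = HYPERPARAMETERS[model_name]
    return {
        "activation": to_identifier(activation),
        "optimizer": to_identifier(optimizer),
        "loss": to_identifier(loss),
    }


print(compile_kwargs("Pod Save America"))
```

Keeping the settings in one mapping makes it easy to loop over every outlet with the same training code while varying only these three choices.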
To validate these models, we held out 20% of our survey respondents as a test group. (An additional 10% of responses were held out and used as validation samples within the model design process.) We then ran the models on the test group and compared our predictions to the respondents' actual choices.
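A minimal sketch of a split with these proportions (20% test, 10% validation, 70% training) follows. The function name, the fixed seed, and the use of simple random shuffling are assumptions for illustration; the documentation does not say how respondents were assigned to groups, for example whether the split was stratified.

```python
import random


def split_respondents(respondent_ids, test_frac=0.20, val_frac=0.10, seed=0):
    """Randomly partition respondents into test, validation, and train groups."""
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_test = round(len(ids) * test_frac)
    n_val = round(len(ids) * val_frac)
    return {
        "test": ids[:n_test],
        "validation": ids[n_test:n_test + n_val],
        "train": ids[n_test + n_val:],
    }


splits = split_respondents(range(1000))
print({name: len(group) for name, group in splits.items()})
```

Because each respondent lands in exactly one group, the test set stays unseen during both training and hyperparameter tuning.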
While we focused on a range of evaluation metrics when deciding whether to keep a model, the metrics that mattered most to us were:
- Area under the ROC curve (AUC): The probability that a model scores a randomly chosen positive case (a respondent who consumes the outlet) higher than a randomly chosen negative case.
- Gain captured by the model: The percentage of theoretical lift over random performance that a model achieves.
- Huber loss: A measure of prediction error with protections against distortion by outliers.
Below, we share these values for each model. Models with higher AUCs and gains, and lower Huber losses, are higher performing.
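To make the three metrics concrete, here is a small pure-Python sketch. The AUC and Huber loss follow their standard textbook definitions. The exact formulas and settings used for the reported numbers are not given, so the Huber `delta` of 1.0 and the gain-captured formula (area between the model's cumulative gains curve and the random baseline, divided by the same area for a perfect ranking) are assumptions. The labels and scores are made-up toy values.

```python
def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def huber_loss(labels, scores, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear beyond delta."""
    total = 0.0
    for y, s in zip(labels, scores):
        err = abs(y - s)
        total += 0.5 * err**2 if err <= delta else delta * (err - 0.5 * delta)
    return total / len(labels)


def gain_captured(labels, scores):
    """Assumed definition: model lift area over random / perfect lift area over random."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: -t[0])]
    n, n_pos = len(ranked), sum(ranked)
    area, captured = 0.0, 0
    for y in ranked:
        previous = captured
        captured += y
        # Trapezoid under the cumulative gains curve for this step.
        area += (previous + captured) / (2 * n_pos * n)
    perfect_area = 1 - n_pos / (2 * n)
    return (area - 0.5) / (perfect_area - 0.5)


labels = [1, 0, 1, 0, 0]            # toy ground truth
scores = [0.9, 0.8, 0.7, 0.3, 0.2]  # toy predicted probabilities
print(auc(labels, scores), gain_captured(labels, scores), huber_loss(labels, scores))
```

Under this assumed definition, gain captured works out to 2 × AUC − 1 when scores have no ties, which is roughly the relationship visible between the AUC and gain columns in the tables.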
The models generally perform well, though in some cases – such as Facebook and YouTube – the audience for an outlet is so broad that our models were not able to achieve significant differentiation. On the flip side, more niche outlets – such as Last Week Tonight and Tucker Carlson – have among the best evaluation statistics due to their smaller, more distinctive audiences.
Model | AUC | Gain captured | Huber loss |
---|---|---|---|
Ben Shapiro | 0.66 | 0.31 | 0.15 |
The Daily | 0.63 | 0.26 | 0.15 |
Joe Rogan | 0.72 | 0.43 | 0.15 |
NPR | 0.70 | 0.39 | 0.13 |
Pod Save America | 0.69 | 0.38 | 0.16 |
Tucker Carlson | 0.75 | 0.51 | 0.14 |
Model | AUC | Gain captured | Huber loss |
---|---|---|---|
Facebook | 0.57 | 0.13 | 0.14 |
Instagram | 0.66 | 0.31 | 0.15 |
Nextdoor | 0.65 | 0.31 | 0.16 |
Reddit | 0.74 | 0.48 | 0.15 |
Snapchat | 0.70 | 0.41 | 0.15 |
TikTok | 0.68 | 0.36 | 0.15 |
Truth Social | 0.70 | 0.40 | 0.16 |
X | 0.64 | 0.29 | 0.14 |
YouTube | 0.55 | 0.11 | 0.15 |
Model | AUC | Gain captured | Huber loss |
---|---|---|---|
Daily Wire | 0.63 | 0.26 | 0.13 |
Huffington Post | 0.62 | 0.25 | 0.16 |
Local newspaper | 0.60 | 0.21 | 0.15 |
MSN.com | 0.62 | 0.24 | 0.16 |
New York Times | 0.68 | 0.36 | 0.10 |
USA Today | 0.57 | 0.14 | 0.16 |
Wall Street Journal | 0.60 | 0.20 | 0.16 |
Yahoo! News | 0.60 | 0.19 | 0.16 |
Model | AUC | Gain captured | Huber loss |
---|---|---|---|
CNN | 0.62 | 0.24 | 0.15 |
Fox News | 0.75 | 0.50 | 0.07 |
Last Week Tonight | 0.74 | 0.49 | 0.15 |
Local broadcast news | 0.61 | 0.22 | 0.14 |
MSNBC | 0.65 | 0.31 | 0.14 |
National broadcast news | 0.63 | 0.27 | 0.15 |