Model documentation
Media consumption
We've surveyed thousands of voters across the country about their media consumption habits, asking whether they've recently used specific platforms, listened to specific podcasts, read specific publications, or watched specific video programs. We used the responses to build a set of 28 distinct predictive models, each estimating the probability that a given person consumes content from a particular media outlet. In this documentation, we explain how we built these models, describe how they might be used in practice, and review detailed evaluation statistics.

What we're attempting to predict

A growing body of research has linked media consumption habits to intensifying partisan polarization. Studies published in PNAS (https://www.pnas.org/doi/abs/10.1073/pnas.2013464118), the Journal of Communication (https://academic.oup.com/joc/article/72/2/214/6548826), and the American Journal of Political Science (https://onlinelibrary.wiley.com/doi/full/10.1111/ajps.12886) make it clear that our increasingly polarized media (and its feedback loop with an increasingly polarized audience) is playing a consequential role in our politics. As such, a voter's media diet provides critical insight into what they might believe, which messaging frames might break through to them, which myths they might accept as fact, and which types of targeting might be used to reach them. Our hope is that these predictions will open the door to smarter campaigning by making users more aware of these factors.

The media outlets we focused on, broken out by content type, are:

Audio: Ben Shapiro Show, The Daily, Joe Rogan Experience, NPR, Pod Save America, Tucker Carlson Show
Social: Facebook, Instagram, Nextdoor, Reddit, Snapchat, Truth Social, X, YouTube
Text: Daily Wire, Huffington Post, local newspaper, MSN.com, New York Times, USA Today, Wall Street Journal, Yahoo! News
Video: CNN, Fox News, Last Week Tonight, local broadcast news, MSNBC, national broadcast news

Use cases

As more voters turn away from mass media and toward niche media products created by partisans, conspiracy theorists, and amateur content creators, it has become more difficult to know (1) what information is shaping a voter's views, (2) what social attitudes they subscribe to, and (3) how to reach them through paid or earned media. These scores are an attempt to ease those challenges. For example, you might use them to determine where specific target audiences consume media, so that you can place ads with those outlets or pursue earned media opportunities there. You might also use the scores to counter conspiracy theories spread by a given outlet, tailor messaging to the attitudes a voter seems attuned to, or build on positive coverage by activating an outlet's audience.

Survey

To gather training data, we asked survey respondents to select which media outlets they generally consume. Following Pew Research Center's work on measuring news consumption (https://www.pewresearch.org/journalism/2020/12/08/assessing-different-survey-measurement-approaches-for-news-consumption/), we asked respondents to select the outlets they "typically" turn to, in order to avoid biases driven by recent news events or a respondent's recent routine. We did not ask respondents to quantify their consumption of each outlet, both because contemporary media consumption is fluid and ongoing and because we wanted to limit nonresponse bias from less engaged media consumers, whose available attention spans might be shorter.

The average consumption rates we found for each outlet are reviewed below.

[Average consumption rates by outlet]

Processing and analysis

After collecting the survey data, we cleaned and preprocessed it, joining survey responses with respondents' personal traits from a linked data source (docid ktyqegfp6f2n4xi farro). We then trained dense neural networks, a form of deep learning (https://en.wikipedia.org/wiki/Deep_learning), to predict each respondent's typical media consumption choices. For each model, we searched the training data for the best combination of predictors using the variable selection method described at https://hal.archives-ouvertes.fr/file/index/docid/755489/filename/prlv4.pdf. The deep learning hyperparameters used to configure each model are detailed below.

[Table: hyperparameters for each model, grouped by content type: Audio, Social, Text, Video]

Evaluation

To validate these models, we held out 20% of our survey respondents as a test group. (An additional 10% of responses were held out and used as validation samples during model design.) We then ran the models on the test group and compared our predictions with the respondents' actual choices.

While we considered a range of evaluation metrics when deciding whether to keep a model, the ones that mattered most to us were:

Area under the ROC curve (AUC): the probability that the model ranks a randomly chosen positive case higher than a randomly chosen negative case.
Gain captured by the model: the percentage of the theoretical lift over random performance that the model achieves.
Huber loss: a measure of prediction error that limits distortion from outliers.

Below, we share these values for each model. Models with higher AUCs and gains, and lower Huber losses, perform better. The models are generally high quality, though in some cases, such as Facebook and YouTube, an outlet's audience is so broad that our models could not achieve much differentiation. On the flip side, more niche outlets, such as Last Week Tonight and the Tucker Carlson Show, have the best evaluation statistics because of their smaller, more distinctive audiences.

[Table: AUC, gain captured, and Huber loss for each model, grouped by content type: Audio, Social, Text, Video]
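The evaluation procedure described in the Evaluation section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used to build these models. The 20% test and 10% validation shares come from this documentation; the shuffling scheme and the function names (`split_respondents`, `auc`, `gain_captured`, `huber_loss`) are hypothetical. "Gain captured" is interpreted here as a gains capture ratio: the area between the model's cumulative gains curve and the random-targeting diagonal, divided by the same area for a perfect ranking. The documentation does not give its exact formula, and it does not state the Huber threshold `delta`, so both are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_respondents(
    ids: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle respondent IDs and split them into train / validation / test.

    Shares follow the documentation (20% test, 10% validation, the rest for
    training); the shuffle and the fixed seed are illustrative assumptions.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) * 20 // 100
    n_val = len(shuffled) * 10 // 100
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def _check_classes(labels: Sequence[int]) -> None:
    # AUC and gain captured are undefined unless both classes are present.
    if 1 not in labels or 0 not in labels:
        raise ValueError("labels must contain at least one 1 and one 0")


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties = 1/2)."""
    _check_classes(labels)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:  # O(|pos| * |neg|): fine for a sketch, slow for large samples
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def _gains_area(ordered_labels: Sequence[int]) -> float:
    """Trapezoid area under a cumulative gains curve, labels in targeting order."""
    n, total_pos = len(ordered_labels), sum(ordered_labels)
    area, captured, prev_y = 0.0, 0, 0.0
    for y in ordered_labels:
        captured += y
        cur_y = captured / total_pos  # share of all positives reached so far
        area += (prev_y + cur_y) / 2 / n
        prev_y = cur_y
    return area


def gain_captured(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Assumed reading of "gain captured": the model's lift over random divided
    by a perfect ranking's lift over random (1.0 = perfect, 0.0 = random).
    Multiply by 100 for a percentage. Tied scores keep their input order."""
    _check_classes(labels)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    model_area = _gains_area([labels[i] for i in order])
    perfect_area = _gains_area(sorted(labels, reverse=True))
    random_area = 0.5  # diagonal: targeting x% of people finds x% of positives
    return (model_area - random_area) / (perfect_area - random_area)


def huber_loss(
    labels: Sequence[int], probs: Sequence[float], delta: float = 1.0
) -> float:
    """Mean Huber loss: quadratic for |residual| <= delta, linear beyond it.

    With 0/1 labels and probabilities, residuals never exceed 1, so the
    default delta=1.0 reduces to half the squared error. The delta used for
    these models is not documented; this default is an assumption.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        r = abs(y - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(labels)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.1]
    print(auc(labels, scores))            # 0.75
    print(gain_captured(labels, scores))  # 0.5
    print(huber_loss(labels, scores, delta=0.5))
```

In the toy example, three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75. Under the assumed gains definition, the model recovers half of the lift a perfect ranking would achieve. The quadratic AUC loop is chosen for readability; a production evaluation would more likely use a rank-based or library implementation.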
