Issue salience & support
In August 2024, we surveyed thousands of voters across the country to understand how they would manage trade-offs between different policy positions when deciding which candidate to vote for. We focused on abortion access, crime, immigration, labor economics, and “wokeness” in schools. We also tested the degree to which a candidate’s party might matter more or less than their stances on a particular issue.
After conducting this survey, we used the responses to build a set of predictive models to estimate each voter’s take on the aforementioned issues – with separate models for voters’ ideological positioning on a given issue (i.e., “support”) and the importance they place on that issue relative to others (i.e., “salience”).
In this documentation, we explain how we built these models, describe how they might be used in practice, and review detailed evaluation statistics for each issue’s support and salience models.
While vote choices are motivated by a range of factors, research indicates that candidates' policy stances matter more than the conventional wisdom might suggest. However, the mechanisms behind this influence are complex. Candidates and voters each hold a range of (often conflicting) views across a given set of issues, and some of those issues carry far more weight than others.
For example, imagine a voter who supports abortion access and is highly motivated by that issue. They are facing a choice between a Democratic state legislative candidate who opposes abortion rights and a Republican who supports them. However, on all other issues, the voter aligns much more closely with the Democrat. And to complicate matters further, the voter knows the Democratic Party generally supports abortion rights while the Republican Party generally opposes them, which means it might be more strategic to help the Democrats gain a majority even if this one candidate is imperfect on the issue. Who is this voter most likely to support? Which arguments are most likely to sway them?
To better understand these dynamics, we've assembled a suite of six "support" and "salience" scores – indicating the likelihood that a voter (1) holds progressive stances in a given issue area and (2) considers a given issue area more important than others when determining which candidate to vote for.
Issue support scores have been a staple in the progressive data ecosystem for decades. They are often used to segment audiences so that voters get exposed to messages they're more likely to support. But audiences are rarely segmented based on the issues they're likely to care the most about – because we haven't had effective tools to drive that segmentation.
Let's say a given candidate is seen as too soft on crime. These scores would allow a campaign to target only the voters most likely to (1) prefer a tough-on-crime stance and (2) choose a candidate based primarily on their approach to crime. In this scenario, the campaign avoids raising the salience of an issue that isn't working for it by reaching only those for whom the issue is already salient.
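For illustration, here is a minimal sketch of that kind of two-dimensional segmentation using a hypothetical scored voter file. The column names, the 0 to 100 score scale, and the thresholds are placeholders, not the actual deliverables.

```python
import pandas as pd

# Hypothetical scored voter file; the column names and the 0-100 score scale
# are illustrative placeholders, not the actual score names we deliver.
voters = pd.read_csv("scored_voter_file.csv")

# Target only voters who (1) likely prefer a tough-on-crime stance (a low
# progressive-support score) and (2) likely weigh crime heavily when choosing
# a candidate (a high salience score). The cutoffs are arbitrary examples a
# campaign would tune to its budget and risk tolerance.
crime_persuasion_universe = voters[
    (voters["crime_support_score"] < 30)
    & (voters["crime_salience_score"] > 70)
]

print(f"Targetable voters: {len(crime_persuasion_universe):,}")
```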
These scores could also help campaigns expand their persuasion audiences in smarter ways. For example, our scores indicate that millions of Republicans support abortion access and care deeply about the issue. Those voters might appear to be deeply entrenched in their views according to a normal candidate support model – but the right message could potentially win them over.
To gather training data, we deployed a national conjoint analysis survey to understand how voters would make trade-offs between different positions held by hypothetical Democratic and Republican candidates. The issue areas included abortion access, crime, immigration, labor economics, and “wokeness” in schools. In each area, we drafted several positions a real candidate might hold and assigned those positions to a left-to-right ideological scale.
Survey respondents saw five randomly constructed match-ups between two candidates, with each candidate holding a unique position in each of the five aforementioned issue areas. We were then able to analyze which issues drove voter decisions. (Or, alternatively, whether voters seemed motivated only by party affiliation.)
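To make the design concrete, here is a minimal sketch of how match-ups like these can be generated at random. The position wordings and helper functions are illustrative stand-ins, not the actual survey items.

```python
import random

# Illustrative left-to-right position options per issue area; the real survey
# used several hand-drafted positions per issue that we don't reproduce here.
ISSUE_POSITIONS = {
    "abortion":    ["ban in nearly all cases", "allow with limits", "protect access"],
    "crime":       ["tough on crime", "balanced approach", "focus on reform"],
    "immigration": ["sharply restrict", "secure border, expand legal paths", "expand access"],
    "labor":       ["limit union power", "keep the status quo", "strengthen unions"],
    "wokeness":    ["ban 'woke' curricula", "leave it to local districts", "support inclusive curricula"],
}

def random_candidate(party: str) -> dict:
    """Assemble one hypothetical candidate profile for a conjoint match-up."""
    return {"party": party,
            **{issue: random.choice(options) for issue, options in ISSUE_POSITIONS.items()}}

def build_matchups(n_matchups: int = 5) -> list:
    """Each respondent saw five randomly constructed Democrat-vs-Republican match-ups."""
    return [(random_candidate("Democrat"), random_candidate("Republican"))
            for _ in range(n_matchups)]

for dem, rep in build_matchups():
    print(dem["party"], dem["abortion"], "| vs |", rep["party"], rep["abortion"])
```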
The chart below shows the average support for a candidate with each issue stance we tested.
After collecting the survey data, we put it through a cleaning and preprocessing phase. We summarized the complex conjoint responses into a set of numeric values for each voter indicating their ideological position on each issue area and the relative importance they appeared to attach to a candidate’s views in each issue category.
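We don't reproduce the exact summarization here, but the sketch below shows one crude way per-voter support and salience targets could be derived from conjoint choices. All column names are hypothetical, and the salience proxy in particular is a simplification of what a production pipeline would do.

```python
import pandas as pd

# Hypothetical long-format responses: one row per respondent x match-up, with
# the left-to-right position rank each candidate held on each issue and which
# candidate was chosen. Column names are illustrative.
responses = pd.read_csv("conjoint_responses.csv")
ISSUES = ["abortion", "crime", "immigration", "labor", "wokeness"]

rows = []
for respondent_id, grp in responses.groupby("respondent_id"):
    summary = {"respondent_id": respondent_id}
    for issue in ISSUES:
        # Support proxy: share of contested match-ups in which the respondent
        # chose the candidate holding the more progressive (lower-rank) position.
        rank_gap = grp[f"chosen_{issue}_rank"] - grp[f"rejected_{issue}_rank"]
        contested = rank_gap != 0
        support = (rank_gap[contested] < 0).mean() if contested.any() else 0.5
        summary[f"{issue}_support"] = support
        # Salience proxy: how far choices on this issue depart from a coin flip.
        summary[f"{issue}_salience"] = abs(support - 0.5) * 2
    rows.append(summary)

targets = pd.DataFrame(rows)  # one row per voter: the training targets
```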
We then trained dense neural networks to predict each voter's issue stances and relative prioritizations. For each model, we scanned our training data (the survey responses joined with personal traits from the Stacks Voter File) for an optimal combination of predictors using a method called Variable Selection Using Random Forests (VSURF).
The deep learning hyperparameters used to configure each model are detailed below, followed by a brief sketch of how one such configuration might be assembled in code.
| Model | Activation | Optimizer | Loss |
|---|---|---|---|
| Abortion | hard silu | nadam | mean abs. error |
| Crime | hard silu | nadam | huber |
| Immigration | softplus | nadam | mean sq. log. error |
| Labor | selu | sgd | mean sq. error |
| Party | mish | nadam | mean abs. error |
| Wokeness | hard silu | rmsprop | huber |

| Model | Activation | Optimizer | Loss |
|---|---|---|---|
| Abortion | selu | nadam | mean sq. error |
| Crime | hard silu | sgd | mean abs. error |
| Immigration | sigmoid | adam | mean sq. error |
| Labor | sigmoid | nadam | mean abs. error |
| Party | hard silu | sgd | mean abs. error |
| Wokeness | sigmoid | adamax | cosine similarity |
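As a concrete example, a model using the abortion configuration listed above with selu activation, the nadam optimizer, and a mean squared error loss could be assembled roughly as follows. Only those three settings come from the table; the layer widths, depth, output activation, and feature count are illustrative assumptions, and the predictors are assumed to have already been narrowed by the VSURF step described earlier.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_abortion_model(n_features: int, hidden_units=(64, 32)) -> keras.Model:
    """Dense network using one abortion configuration from the tables above:
    selu activation, nadam optimizer, mean squared error loss. The layer
    widths, depth, and sigmoid output are illustrative guesses, not the
    production architecture."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_features,)))
    for units in hidden_units:
        model.add(layers.Dense(units, activation="selu"))
    # Scores are bounded, so squash the output into [0, 1].
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="nadam", loss="mean_squared_error", metrics=["mae"])
    return model

# Example: a model over 40 VSURF-selected predictors (the count is hypothetical).
model = build_abortion_model(n_features=40)
model.summary()
```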
To validate these models, we held out 20% of our survey respondents as a test group. (An additional 10% of responses were set aside as a validation sample during model design.) We then scored the test group with each model and compared our predictions to the respondents' actual choices.
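A respondent-level split along those lines might look like this (the ID array and random seed are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

respondent_ids = np.arange(10_000)  # hypothetical respondent IDs

# Hold out 20% of respondents for final testing...
train_ids, test_ids = train_test_split(respondent_ids, test_size=0.20, random_state=42)
# ...then carve out 10% of the total (12.5% of the remaining 80%) for validation.
train_ids, val_ids = train_test_split(train_ids, test_size=0.125, random_state=42)
```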
While we focused on a range of evaluation metrics when deciding whether to keep a model, the metrics that mattered most to us were:
- Area under the ROC curve (AUC): The probability that the model ranks a randomly chosen positive example above a randomly chosen negative one.
- Gain captured by the model: The percentage of theoretical lift over random performance that a model achieves.
- Mean absolute error (MAE): The average absolute difference between actual values and the model's predictions.
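The sketch below shows how these metrics might be computed on the hold-out group. AUC and MAE come directly from scikit-learn; the gain-captured calculation is one plausible cumulative-gains formulation that may differ in detail from the exact definition we used, and binarizing the continuous targets at 0.5 for AUC and gain is also an assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, mean_absolute_error

def gain_captured(y_true_binary, y_score):
    """Share of the theoretically possible lift over random targeting that the
    model achieves, read off a cumulative gains curve."""
    order = np.argsort(-np.asarray(y_score))
    hits = np.asarray(y_true_binary)[order]
    cum_hits = np.cumsum(hits) / hits.sum()               # model gains curve
    pop_frac = np.arange(1, len(hits) + 1) / len(hits)
    model_area = np.trapz(cum_hits, pop_frac)
    random_area = 0.5                                      # diagonal baseline
    perfect_area = 1 - hits.mean() / 2                     # all positives ranked first
    return (model_area - random_area) / (perfect_area - random_area)

# y_true: hold-out targets in [0, 1]; y_pred: model scores for the same voters.
y_true = np.random.rand(1_000)          # placeholder data for illustration
y_pred = np.clip(y_true + np.random.normal(0, 0.2, 1_000), 0, 1)
y_true_binary = (y_true > 0.5).astype(int)

print("AUC: ", roc_auc_score(y_true_binary, y_pred))
print("Gain:", gain_captured(y_true_binary, y_pred))
print("MAE: ", mean_absolute_error(y_true, y_pred))
```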
Below, we share these values for each model. Models with higher AUCs and gains, and lower MAEs, are higher performing.
You'll notice that some models are better than others. For example, the abortion salience and support models are quite good – meaning that it's relatively easy to distinguish between both (1) supporters and opponents of abortion rights and (2) those who care deeply about abortion rights and those who do not.
In other cases, salience is easier to predict than support (e.g., the salience of wokeness in schools is more differentiated than support for policies related to that topic) and vice versa (e.g., the salience of issues related to labor economics and immigration is less differentiated than the views people hold in each issue area). In these cases, weaker evaluation metrics indicate a less polarized electorate, though the models still provide some sorting utility.
Salience models:

| Model | AUC | Gain captured | MAE |
|---|---|---|---|
| Abortion | 0.69 | 0.38 | 0.26 |
| Crime | 0.68 | 0.35 | 0.26 |
| Immigration | 0.59 | 0.16 | 0.29 |
| Labor | 0.55 | 0.10 | 0.32 |
| Party | 0.57 | 0.15 | 0.20 |
| Wokeness | 0.61 | 0.21 | 0.29 |

Support models:

| Model | AUC | Gain captured | MAE |
|---|---|---|---|
| Abortion | 0.72 | 0.44 | 0.24 |
| Crime | 0.66 | 0.33 | 0.19 |
| Immigration | 0.75 | 0.51 | 0.24 |
| Labor | 0.77 | 0.54 | 0.28 |
| Party | 0.76 | 0.52 | 0.20 |
| Wokeness | 0.54 | 0.09 | 0.22 |