Presidential support (2024)
We've surveyed thousands of voters across the country about who they plan to vote for in the November 2024 general election for US President.
We have used those responses to build a predictive model estimating the probability that a given person would cast a vote for Kamala Harris, presuming they turn out to vote.
In this documentation, we explain how we built this model, describe how it might be used in practice, and review detailed evaluation statistics.
This model allows organizations to identify voters who are very likely to support Kamala Harris, which could be useful in turnout, fundraising, or volunteer recruitment. It could also be used in conjunction with down-ballot support models to identify voters who might support a down-ballot Democrat, but not Vice President Harris (or vice versa).
This model could also be used to screen likely Harris supporters out of a target universe, which could be useful in a persuasion program. For that use case, this score could be used in conjunction with our Issue salience & support scores.
Note: mid-range support scores do not indicate persuadability. Instead, they indicate that we lack sufficient information to say confidently whether a given voter is a Harris supporter. In theory, every voter either would or would not vote for Harris if given the chance. This model estimates the probability that a voter is on one end of that spectrum or the other – not where a voter's support falls on a gradient of intensity.
Our initial survey responses for this model were collected between August 5 and August 12, 2024 – after the Republican National Convention and the selection of Tim Walz as Harris's running mate, but before the Democratic National Convention and the quasi-suspension of Robert F. Kennedy Jr.'s campaign. (To handle the RFK factor, we removed RFK supporters from our training sample.)
As of August 12, our survey indicated that 50% of registered voters would vote for Kamala Harris, 42% for Donald Trump, 4% for Robert F. Kennedy, Jr., and 4% for some other candidate. (However, compared with a similar survey we conducted in June, before Harris became the Democratic nominee, we saw significant differences in partisan nonresponse. So while these results are useful for modeling, the toplines should be taken with a grain of salt.)
We intend to gather additional survey responses in September and October to keep the model up to date.
After collecting the survey data, we put it through a cleaning and preprocessing phase, joining survey responses with respondents' personal traits from the Stacks Voter File. We also joined in economic data from our Context dataset, historic local election results from our Results dataset, and recent ZIP code-level donation trends from our Campaign finance dataset. (These additional contextual factors significantly improved the accuracy of the model.)
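As a rough illustration of that joining step, here is a minimal sketch in Python with pandas. All file names, join keys, and column names (voter_id, county_fips, precinct_id, zip_code, vote_choice) are hypothetical stand-ins, not the actual schema of our datasets.

```python
import pandas as pd

# Hypothetical file and column names, for illustration only.
surveys = pd.read_csv("survey_responses.csv")        # one row per respondent
voter_file = pd.read_csv("stacks_voter_file.csv")    # personal traits
context = pd.read_csv("context_economic.csv")        # economic indicators
results = pd.read_csv("historic_results.csv")        # local election results
finance = pd.read_csv("campaign_finance_zip.csv")    # ZIP-level donation trends

# Join survey responses to voter-file traits on a shared voter ID,
# then layer in contextual data by geography.
training = (
    surveys
    .merge(voter_file, on="voter_id", how="left")
    .merge(context, on="county_fips", how="left")
    .merge(results, on="precinct_id", how="left")
    .merge(finance, on="zip_code", how="left")
)

# Remove RFK supporters from the training sample, as described above.
training = training[training["vote_choice"] != "rfk"]
```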
We scanned our training data for optimal combinations of predictors using a method called Variable Selection Using Random Forests (VSURF). We then trained dense neural networks to predict whether each respondent would vote for Vice President Harris in November.
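The reference implementation of VSURF is an R package; the sketch below is a loose Python analogue of the underlying idea, assuming scikit-learn: fit a random forest on the candidate predictors and keep the highest-importance features. The impurity-based importance measure and the cutoff of 50 predictors are illustrative assumptions, not the procedure's actual thresholding rules.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Candidate predictors (numeric only, for simplicity) and the target:
# 1 if the respondent said they would vote for Harris, else 0.
X = (training.drop(columns=["voter_id", "vote_choice"])
             .select_dtypes("number")
             .fillna(0))
y = (training["vote_choice"] == "harris").astype(int)

# Rank predictors by impurity-based importance from a random forest.
forest = RandomForestClassifier(n_estimators=500, random_state=42)
forest.fit(X, y)

importances = pd.Series(forest.feature_importances_, index=X.columns)
selected = importances.sort_values(ascending=False).head(50).index.tolist()
```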
The final model uses the sigmoid activation function on its output layer, RMSprop as its optimizer, and binary cross-entropy as its loss function. It was trained with a batch size of 8, with several dropout layers to reduce overfitting. The model has six dense layers with descending unit counts; most layers use the ReLU activation function.
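The description above does not name a framework, but assuming a Keras implementation, the architecture might look like the following sketch. The specific unit counts and dropout rates are illustrative assumptions; only the layer count, activations, optimizer, and loss come from the description.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Six dense layers with descending units, ReLU activations, dropout
# layers to reduce overfitting, and a sigmoid output.
model = keras.Sequential([
    keras.Input(shape=(len(selected),)),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="rmsprop",
    loss="binary_crossentropy",
    metrics=[keras.metrics.AUC(name="auc")],
)
```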
To validate this model, we withheld 20% of our survey respondents as a hold-out group for testing. (An additional 10% of responses were withheld and used as a validation sample during model design.) We then ran the model on the testing group and compared its predictions to the respondents' actual presidential vote choices.
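A minimal sketch of that partitioning and training run, assuming scikit-learn for the splits: because 12.5% of the remaining 80% equals 10% of the full sample, the second split uses test_size=0.125. The epoch count is an assumption.

```python
from sklearn.model_selection import train_test_split

# Hold out 20% for final testing; reserve another 10% of the full
# sample (12.5% of the remainder) for validation during design.
X_rest, X_test, y_rest, y_test = train_test_split(
    X[selected], y, test_size=0.20, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.125, stratify=y_rest, random_state=42
)

# Train with the batch size noted above, then score the hold-out group.
model.fit(X_train, y_train, batch_size=8, epochs=30,
          validation_data=(X_val, y_val))
test_scores = model.predict(X_test).ravel()
```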
While we focused on a range of evaluation metrics during this phase, the metrics that mattered most to us were the three below (a short sketch after the list shows how each can be computed):
- Area under the ROC curve (AUC): The probability that the model ranks a randomly chosen positive example above a randomly chosen negative one.
- Gain captured by the model: The percentage of the theoretically achievable lift over random selection that the model captures.
- Huber loss: A measure of prediction error that resists distortion by outliers.
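As a rough illustration, these three metrics could be computed on the hold-out group as below. This sketch uses one common formulation of gain captured (the accuracy ratio, 2 * AUC - 1) and a default Huber delta of 1.0; both are illustrative assumptions about the exact definitions.

```python
import tensorflow as tf
from sklearn.metrics import roc_auc_score

y_true = y_test.to_numpy().astype("float32")

# AUC: probability the model ranks a random supporter above a random
# non-supporter.
auc = roc_auc_score(y_true, test_scores)

# Gain captured, via the accuracy ratio: the share of a perfect model's
# lift over random selection that this model achieves.
gain_captured = 2 * auc - 1

# Huber loss: squared error for small residuals, linear for large ones,
# limiting the influence of outliers.
huber_loss = tf.keras.losses.Huber(delta=1.0)(y_true, test_scores).numpy()

print(f"AUC {auc:.2f} | gain captured {gain_captured:.0%} | "
      f"Huber loss {huber_loss:.2f}")
```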
The model performs very well. Against hold-out data, the model's AUC was 0.88, it captured 75% of the theoretically achievable gain, and its Huber loss was 0.07. According to a lift chart, a voter in the top decile of scores would be about 60% more likely than a randomly selected voter to support Kamala Harris for president.
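For reference, a top-decile lift like the one quoted can be computed from the hold-out scores roughly as follows (the rank-based binning simply guards against ties in the scores):

```python
import pandas as pd

# Compare the support rate in the top score decile to the overall base rate.
eval_df = pd.DataFrame({"score": test_scores, "actual": y_test.to_numpy()})
eval_df["decile"] = pd.qcut(eval_df["score"].rank(method="first"),
                            10, labels=False)  # 9 = top decile

base_rate = eval_df["actual"].mean()
top_rate = eval_df.loc[eval_df["decile"] == 9, "actual"].mean()
lift = top_rate / base_rate  # e.g., 1.6 means 60% more likely than random
```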