People

voters

We work with partners to collect and process data from each state's voter file. We then join that data with consumer data (including high quality email addresses and phone numbers) and our own predictive models to provide users with a dataset that can power a variety of targeting, messaging, outreach, and modeling use cases.

Our core national voter file comes in two varieties:

voters_full

A record of all registered voters – taking each state's data at face value – with PII, contact information, geographic and district info, consumer data, vote history, and more joined in.

voters

Using entity resolution, we've identified voters who have multiple registration records across different states and have removed outdated, inactive records. PII from multiple records is combined into the most complete version of the current record.
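To make the difference between voters_full and voters concrete, here is a minimal sketch of the "combine into the most complete version of the current record" step. It is an illustration only, not the actual entity resolution pipeline: the function name merge_records is hypothetical, the matching step that decides which records belong to the same person is not shown, and treating the latest reg_date as the current registration is an assumption.

```python
from datetime import date

def merge_records(records):
    """Collapse one person's registration records into a single record.

    Assumes every record in `records` belongs to the same person and that
    the latest reg_date marks the current registration. Both assumptions
    are for illustration only.
    """
    # Newest registration first; it becomes the base of the merged record.
    ordered = sorted(records, key=lambda r: r["reg_date"], reverse=True)
    merged = dict(ordered[0])
    # Fill fields that are missing on the current record from older ones.
    for older in ordered[1:]:
        for field, value in older.items():
            if merged.get(field) is None and value is not None:
                merged[field] = value
    return merged

# Two made-up registrations for the same person in different states.
records = [
    {"state": "OH", "reg_date": date(2012, 9, 1), "middle_name": "ANN", "email": None},
    {"state": "PA", "reg_date": date(2020, 8, 15), "middle_name": None, "email": "a@example.com"},
]
current = merge_records(records)
print(current["state"], current["middle_name"])
```

In this toy example the newer Pennsylvania registration is kept as the current record, and its missing middle name is filled in from the older Ohio record.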

Schema

Column

Description

voter_id

Unique voter ID

state_voter_id

State supplied voter ID, format differs from state to state

county_voter_id

County supplied voter ID, format differs by county and state

first_name

First name in uppercase with spaces, numbers, and special characters removed. Accented characters have been replaced with non-accented versions

middle_name

Middle name in uppercase with spaces, numbers, and special characters removed. Accented characters have been replaced with non-accented versions

middle_init

Middle initial in uppercase

last_name

Last name in uppercase with spaces, numbers, and special characters removed. Accented characters have been replaced with non-accented versions

name_suffix

Name suffix in uppercase, e.g. JR, III

dob

Date of birth formatted as date (YYYY-MM-DD). DOB comes from voter registration data and commercial data

myob

Month and year of birth formatted as a 6 digit integer (YYYYMM). MYOB comes from voter registration data and commercial data. Some states truncate DOB to the first of the month, making MYOB better for matching in those cases

yob

Year of birth formatted as a 4 digit integer (YYYY). YOB comes from voter registration data and commercial data. Some states truncate DOB to the first of the year, making YOB better for matching in those cases

reg_date

Voter registration date formatted as date (YYYY-MM-DD). Where a state or county overwrites the registration date whenever a voter record is updated, reg_date is calculated as 30 days prior to the earliest recorded vote date

gender

Voter gender from voter registration data. M, F, or NULL

ethnicity

Ethnicity, self-reported on the voter file where available, otherwise modeled. NULL in cases where the model isn't confident. Values: AAPI, Black, Latino, Native American, White

ethnicity_source

Ethnicity source: voterfile, modeled, previous registration

modeled_race_aapi

Indigo race model, score from 0-1 with the probability that a voter is AAPI

modeled_race_black

Indigo race model, score from 0-1 with the probability that a voter is Black

modeled_race_latino

Indigo race model, score from 0-1 with the probability that a voter is Hispanic or Latino/a

modeled_race_native_american

Indigo race model, score from 0-1 with the probability that a voter is Native American

modeled_race_white

Indigo race model, score from 0-1 with the probability that a voter is White

religion

Modeled religion from L2 based on name and census data: Buddhist, Catholic, Christian, Eastern Orthodox, Greek Orthodox, Hindu, Islamic, Jewish, Lutheran, Mormon, Protestant, Shinto, Sikh

state_fips

Registration state fips code, two digit fips as determined by the census formatted as a string

state

State abbreviation

county_fips

Registration address county fips code, three digit fips as determined by the census formatted as a string

county

Registration address county name in uppercase

precinct

Registration precinct name

reg_address

Registration address

reg_city

Registration city name

reg_state

Registration state abbreviation

reg_zip

Registration address zip5 as string

reg_zip4

Registration address zip4 as string

reg_lat

Registration address latitude

reg_long

Registration address longitude

reg_latlong_accuracy

Accuracy of reg_lat and reg_long columns, ordered from most to least accurate: GeoMatch9Digit, GeoMatchRooftop, GeoMatchBuilding, RangeInterpolation, ExactMatch, AverageOfApartments, ParcelCenter, GeoMatch5Digit, KnownAlternateName, DirectionPrefixRemoved, DirectionSuffixRemoved, StreetCenter, Intersection

mailing_address

Mailing address

mailing_city

Mailing city name

mailing_zip

Mailing address zip5 as string

mailing_zip4

Mailing address zip4 as string

mailing_state

Mailing address state abbreviation

phone

Best phone number for voter, prioritizing cell phones over landlines, 9 digits formatted as a string

phone_type

Type of phone: CELL, LANDLINE

phone_confidence_code

Confidence in quality of phone number with 1 being highest confidence and 5 being lowest confidence

phone_cell

Cell phone number, 9 digits formatted as a string

phone_cell_confidence_code

Confidence in quality of phone_cell number with 1 being highest confidence and 5 being lowest confidence

phone_landline

Landline phone number, 9 digits formatted as a string

phone_landline_confidence_code

Confidence in quality of phone_landline number with 1 being highest confidence and 5 being lowest confidence

email

Email address

party

Party identification, based on voterfile and modeled data

party_3way

Party identification grouped into DEM, REP, and IND based on voterfile and modeled data

party_source

Source of party and party_3way data: voterfile, modeled

district_congressional_2020

Congressional district, three digits zero padded, e.g. 002, 011, 024

district_congressional_2010

2010 congressional district, three digits zero padded, e.g. 002, 011, 024

district_congressional_proposed_2024

Proposed 2024 congressional district where available, three digits zero padded, e.g. 002, 011, 024

district_stleg_upper_2020

Upper state legislative district (state senate). For numeric districts, district names are three digits and zero padded, e.g. 003, 021, 041B. For non-numeric district names, strings are uppercase.

district_stleg_upper_2010

2010 upper state legislative district, three digits zero padded

district_stleg_upper_proposed_2024

Proposed 2024 upper state legislative district where available, three digits zero padded

district_stleg_lower_2020

Lower state legislative district, including state house or state assembly depending on the state. For numeric districts, district names are three digits and zero padded, e.g. 003, 021, 041B. For non-numeric district names, strings are uppercase.

district_stleg_lower_2010

2010 lower state legislative district, three digits zero padded

district_stleg_lower_proposed_2024

Proposed 2024 lower state legislative district where available, three digits zero padded

district_stleg_floterial_2020

Floterial district, only used in New Hampshire

district_stleg_floterial_2010

2010 floterial district, only used in New Hampshire

commercial_hh_donatestocharity

Binary if someone in the household donates to charity based on commercial data

commercial_dwellingtype_duplex

Binary if home is a duplex based on commercial data

commercial_dwellingtype_apartment

Binary if home is an apartment based on commercial data

commercial_dwellingtype_singlefamilyhome

Binary if home is a single family home based on commercial data

commercial_edu_hsonly

Binary if education level is high school or less based on commercial data

commercial_edu_somecollege

Binary if education level is some college based on commercial data

commercial_edu_bachdegree

Binary if education level is bachelors degree based on commercial data

commercial_edu_graddegree

Binary if education level is graduate degree based on commercial data

commercial_hh_income

Estimated household income based on commercial data

commercial_homepurchasedate

Estimated home purchase date based on commercial data

commercial_homepurchaseprice

Estimated home purchase price based on commercial data

commercial_ispsa

Index of social position for small areas, a mix of education and income data that estimates where a voter lies on a scale from 0 to 9 on the socio-economic ladder

commercial_gun_owner

Binary if voter is a gun owner based on gun registrations and subscriptions to hunting / gun magazines

commercial_veteran

Binary if voter is a veteran based on commercial data

census_block_2020

Census block ID, 15 digits formatted as a string

census_area_medianincome

Median household income in census block based on census data

census_area_medianhousingvalue

Median home value in census block based on census data

census_area_pctspanishspeaking

Pct of census block that is Spanish speaking based on census data

fec_avg_donation_amount

Average donation dollar amount in federal races over the last four election cycles

fec_total_donation_amount

Total dollar amount donated in federal races over the last four election cycles

fec_last_donation_date

Date of most recent donation in a federal race over the last four election cycles

fec_primary_party

Partisanship of the candidate or organization to which the voter donated the largest amount over the last four election cycles: D, R

modeled_turnout_midterm_primary

Indigo midterm primary turnout model, score from 0-1 with the probability that a voter will turn out to vote in a midterm primary election

modeled_turnout_midterm_general

Indigo midterm general turnout model, score from 0-1 with the probability that a voter will turn out to vote in a midterm general election

modeled_turnout_presidential_primary

Indigo presidential primary turnout model, score from 0-1 with the probability that a voter will turn out to vote in a presidential primary election

modeled_turnout_presidential_general

Indigo presidential general turnout model, score from 0-1 with the probability that a voter will turn out to vote in a presidential general election

modeled_dem_partisanship

Indigo partisanship model, score from 0-1 with the probability that a voter identifies as a Democrat

g08_voted*

1 if voted, 0 if registered to vote but didn't vote, null if wasn't registered to vote

g08_election_date*

Date of election formatted as date YYYY-MM-DD

g08_ballot_type*

Indicates the method of voting, if voted. Note that not all states report type of ballot for all historic elections; a null value indicates lack of reporting

*For general, primary, and presidential primary elections from 2008 to the present day, we have _voted, _election_date, and _ballot_type columns for each election. We only include statewide and federal elections, and in instances where more than one election of the same type occurred in one election year, we chose the election with the highest turnout level. In situations where presidential and normal primaries are combined into a single election, they are represented as normal primaries. The naming convention for these columns is [election stage - g/p/pp][election year - 09/14/22]_[column type], e.g. pp09_voted, p14_election_date, g22_ballot_type.
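The naming convention for the vote history columns can be handled programmatically. The sketch below parses a column name into its parts and builds one from its parts. The helper names parse_election_column and election_column are hypothetical and not part of any provided tooling, and reading two-digit years as 20xx is an assumption based on coverage starting in 2008.

```python
import re

# Stage prefixes from the naming convention: g = general, p = primary,
# pp = presidential primary.
STAGES = {"g": "general", "p": "primary", "pp": "presidential primary"}

# "pp" is listed before "p" so the longer prefix is tried first.
COLUMN_RE = re.compile(r"^(pp|p|g)(\d{2})_(voted|election_date|ballot_type)$")

def parse_election_column(name):
    """Return (stage, year, column_type), or None if the name does not match."""
    match = COLUMN_RE.match(name)
    if match is None:
        return None
    prefix, yy, column_type = match.groups()
    # Assumption: two-digit years are 20xx, since coverage starts in 2008.
    return STAGES[prefix], 2000 + int(yy), column_type

def election_column(stage_prefix, year, column_type):
    """Build a column name such as g22_ballot_type from its parts."""
    return f"{stage_prefix}{year % 100:02d}_{column_type}"

print(parse_election_column("pp09_voted"))
print(election_column("g", 2022, "ballot_type"))
```

A helper like this is useful for selecting, for example, every _voted column for general elections without hard-coding the list of years.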