Modeling: For a lack of voter data infrastructure

by | Aug 21

Candidates and consultants are always looking for the next big thing in campaigns, and in 2014 it seems that new thing is here: Modeling.

Modeling has been around for decades. Its underpinnings are in statistical analysis that pairs voter datapoints with polling results or election outcomes and derives a mathematical formula describing a voter’s political leaning or likelihood of turning out to vote. These formulas are then run across the entire voter file, giving every voter a score that can be used to direct campaign resources.
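As a rough illustration of what running a formula across the voter file means, the sketch below applies a logistic formula to a few made-up voter records. The field names, coefficients, and records are hypothetical and invented for the example; they are not PDI fields or anyone's real model.

```python
import math

# Hypothetical coefficients for a turnout formula; a real model would
# estimate these from polling results or past election outcomes.
INTERCEPT = -2.0
WEIGHTS = {"age_decades": 0.3, "voted_last_primary": 1.5, "homeowner": 0.5}

def turnout_score(voter):
    """Return a 0-100 score from a logistic formula over the voter's datapoints."""
    z = INTERCEPT + sum(w * voter[name] for name, w in WEIGHTS.items())
    return round(100 / (1 + math.exp(-z)))

# A made-up three-record "voter file".
voter_file = [
    {"voter_id": "A1", "age_decades": 2.5, "voted_last_primary": 0, "homeowner": 0},
    {"voter_id": "B2", "age_decades": 4.5, "voted_last_primary": 1, "homeowner": 0},
    {"voter_id": "C3", "age_decades": 7.0, "voted_last_primary": 1, "homeowner": 1},
]

# Write a score back to every record, then rank voters by it.
for voter in voter_file:
    voter["turnout_score"] = turnout_score(voter)

ranked = sorted(voter_file, key=lambda v: v["turnout_score"], reverse=True)
```

Once every record carries a score, the campaign can sort or threshold on it like any other field in the file.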

In some states there is little voter data infrastructure, and in the absence of a strong voter file, modeling can be the only tool available to a campaign. The best example could be states where voters can register at the polls and don’t declare a permanent partisan status – they just pick the party they want to be with on Election Day. In that data-poor landscape, modeling must be used to find the voters who should be Democrats or Republicans, and the voters who are likely to vote, based on statistical regressions of income, age, ethnicity, geography and other available factors against historic turnout and vote histories within each precinct.
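A minimal sketch of the kind of regression described above, assuming a tiny invented dataset and a bare-bones logistic regression fitted by gradient descent. A real project would use a statistics package and precinct-level history; the two features, the data, the learning rate, and the step count here are illustrative assumptions only.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=0.5, steps=2000):
    """Fit an intercept and one weight per feature by gradient descent on log-loss."""
    n_features = len(rows[0])
    intercept, weights = 0.0, [0.0] * n_features
    for _ in range(steps):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for x, y in zip(rows, labels):
            error = sigmoid(intercept + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        intercept -= learning_rate * grad_b / len(rows)
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
    return intercept, weights

def predict(model, x):
    """Estimated probability of turning out for one feature tuple."""
    intercept, weights = model
    return sigmoid(intercept + sum(w * xi for w, xi in zip(weights, x)))

# Invented rows: (age scaled 0-1, income scaled 0-1); label 1 = voted last election.
rows = [(0.1, 0.2), (0.2, 0.5), (0.3, 0.1), (0.6, 0.7),
        (0.4, 0.6), (0.7, 0.8), (0.8, 0.4), (0.9, 0.9)]
voted = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_logistic(rows, voted)
older_wealthier = predict(model, (0.9, 0.9))
younger_poorer = predict(model, (0.1, 0.1))
```

In this toy data, older and higher-income records turned out more often, so the fitted formula gives the first profile a higher probability than the second. The fitted intercept and weights are what would then be run across every voter on the file.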

In some states, modeling is the only way to target. But in California, where the PDI voter file is already robust, the hope with modeling is to create so-called “marginal” gains. A few percentage points of better targeting here, and a couple more finely tuned messages over there, and these marginal gains start adding up to real value for the campaign. A marginal gain of 3-5 percentage points might not seem like a lot, but when an election is decided 52% to 48%, it can be the deciding factor.

Here are some tips from PDI based on our experience with voter models:

Know your data first
A couple of years ago an East Coast firm sent us modeling data splitting the state in half and asked for two mail files based on those scores. We found out later that they had modeled who was likely to have a vote-by-mail ballot. But, of course, PDI actually knows who has a ballot – not by a model, but because we get that data from each county. So the modeling might have been ingenious, except that they were trying to model something for which we already had actual data.

In another case, a vendor built a model for Asian ethnicity without first working with us to identify all the data we already had: Asian surname flags, and data on voters who live with a person of identified ethnicity, were born in an Asian country, or live in a household that has requested foreign-language materials. The mathematics behind the model may have been great for the data they had, but the vendor could not create the best model without knowing up front all the data already written to the PDI Voter File.

In short, when modeling it is important to know your data for two reasons: 1) to make sure you’re not wasting time modeling something that is already in the file, and 2) to make sure the best available datapoints go into the front end of your model.

Know why you are modeling
Modeling in California is often done to help campaigns narrowly target voter universes. It can maximize the efficiency of voter contact and save money in the long run if a campaign can send less mail with greater impact. Some examples of why a campaign could model include:

Targeting Effective Messages – In a poll, a campaign may find that the “no new taxes” message works with one subset of voters but actually works against the campaign with another. Modeling which voters the message moves toward your measure or candidate lets the campaign deliver it to a niche set of identifiable voters.

Finding a hard-to-reach group – What if your campaign is trying to reach parents of kids attending public schools? PDI has some commercial data on likely parents and, of course, we can identify voters by household, by age, by homeownership, and by other datapoints that could be used to model “Likely Parent of Public School Kid” for targeted outreach. The same kind of model could target “Likely Blue-Collar Worker,” “Likely Work Commuter,” or other groups that a campaign could find useful in targeting messages or get-out-the-vote efforts.

Modeling Turnout – This comes with the caveat that PDI’s turnout universes are already highly effective, and we have a free voter model score for campaigns that request it. But a campaign could create its own turnout model to dig more deeply into a subset of voters, such as modeling which infrequent-voting Democratic Armenians could be most effectively persuaded to vote through phone calls and precinct walking.

Always use a fresh and complete voter file
We are talking about marginal gains, so it would make absolutely no sense to begin with a voter file that is six months old. The “error” rate on an older voter file generally used by other vendors is about 5%. On a statewide file, that means the combination of dead/moved voters on the rolls, and newer registrants missing from the rolls, is about 900,000 voters. The impact of that bad data would totally wash out the potential benefit of a great model.
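The arithmetic behind that figure can be checked in a few lines. The size of the statewide file is not stated above; the number computed here is simply what a 5% error rate and 900,000 bad records imply, so treat it as a back-calculated assumption. The comparison with the 3-5 point marginal gain is a rough one, since the two percentages measure different things.

```python
# Figures quoted in the text.
error_rate_percent = 5      # share of records that are wrong on an older file
bad_records = 900_000       # dead/moved voters plus missing new registrants

# Implied statewide file size (inferred from the two figures, not quoted).
implied_file_size = bad_records * 100 // error_rate_percent

# The marginal gain modeling hopes to deliver, in percentage points.
marginal_gain_points = (3, 5)

# An error rate at or above the top of that range can cancel the gain.
error_swamps_gain = error_rate_percent >= max(marginal_gain_points)
```

Under these figures the implied file is about 18 million records, and the stale-file error rate is as large as the best-case gain from modeling.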

The truism is “good data in, good data out” and you cannot effectively model if you’re settling for a substandard voter file for either the creation of the model formulas or the writing of model scores back to the voter file.

Get more datapoints
There have been countless stories in national media about how companies have targeted customers based on their consumer behavior, or how Starbucks uses “big data” to determine where to open a new store. The same dataset used by these companies is available matched against the voter file for modeling and voter targeting.

We aren’t hunting for Starbucks locations, but in a political campaign we might want data about public broadcasting donations to put into a model for a public school bond. And we might be interested to find out whether a voter being a surfer is statistically correlated with increased support for a water bond. With the PDI voter file matched against commercial data, consultants have done finely tuned political targeting based in part on characteristics like a voter being a coupon cutter. There are literally hundreds of fields for different types of socioeconomic data, along with datapoints that describe the purchasing habits, consumer behavior, and even hobbies of voters.

More isn’t always better. In the end, the modeling usually comes down to a small set of 5-8 criteria that are most strongly correlated with an outcome. But starting with 200 criteria and winnowing down to 5-8 allows the statisticians to identify the very best data to incorporate into the final model.
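One simple way to picture the winnowing step is to rank every candidate datapoint by the strength of its correlation with the outcome and keep the top few. The text does not say how statisticians actually do this, so ranking by Pearson correlation is an assumption, and the survey responses and datapoints below are invented for illustration, not findings.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def strongest_criteria(candidates, outcome, keep):
    """Rank candidate datapoints by |correlation| with the outcome; keep the top few."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], outcome)),
        reverse=True,
    )
    return ranked[:keep]

# Invented survey sample: 1 = respondent supports the measure.
supports = [1, 1, 1, 0, 0, 0, 1, 0]
candidates = {
    "public_broadcasting_donor": [1, 1, 1, 0, 0, 0, 1, 0],  # tracks support
    "coupon_cutter":             [0, 0, 0, 1, 1, 1, 0, 0],  # mostly opposite
    "surfer":                    [1, 0, 1, 0, 1, 0, 1, 0],  # weaker signal
    "homeowner":                 [1, 1, 1, 1, 1, 1, 1, 1],  # constant: no signal
}

top = strongest_criteria(candidates, supports, keep=2)
```

A negative correlation is as useful as a positive one, which is why the ranking uses the absolute value. With 200 candidates and `keep` set between 5 and 8, the same function would produce the short list the final model is built on.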

Know where you are housing your data
PDI provides free model housing, with model scores written to all voters on the voter file, for campaigns that purchase their modeling data and polling samples through us. PDI-housed data can be used by your campaign in several ways: to purchase lists meeting certain model scores; on the online campaign center to do counts, cut universes, and combine the model scores with other factors; or by your pollster in crosstabs for current or even past voter surveys.
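To make "cutting a universe" concrete, here is a small sketch that combines a model score with ordinary voter-file factors and then counts the result. The record layout, field names, and thresholds are hypothetical; this is not how the online campaign center is implemented.

```python
# Hypothetical records with a model score already written back to the file.
voters = [
    {"voter_id": "A1", "party": "DEM", "elections_voted_of_last_5": 1, "persuasion_score": 82},
    {"voter_id": "B2", "party": "DEM", "elections_voted_of_last_5": 5, "persuasion_score": 91},
    {"voter_id": "C3", "party": "REP", "elections_voted_of_last_5": 2, "persuasion_score": 77},
    {"voter_id": "D4", "party": "DEM", "elections_voted_of_last_5": 2, "persuasion_score": 45},
    {"voter_id": "E5", "party": "DEM", "elections_voted_of_last_5": 0, "persuasion_score": 70},
]

def cut_universe(voters, min_score, party, max_elections_voted):
    """Select voters who meet a score threshold plus ordinary voter-file filters."""
    return [
        v for v in voters
        if v["persuasion_score"] >= min_score
        and v["party"] == party
        and v["elections_voted_of_last_5"] <= max_elections_voted
    ]

# Infrequent-voting Democrats with a high model score.
universe = cut_universe(voters, min_score=70, party="DEM", max_elections_voted=2)
count = len(universe)
```

The point of housing scores on the voter file is that the score becomes one more filter alongside party, vote history, and everything else already there.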

Housing data offline or with other data vendors that do not have the most complete and up-to-date files can hamper the effectiveness of the model, both by restricting the ways the data can be integrated into the campaign’s efforts and by losing model scores for voters because of old or missing data.
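The loss of scores from old or missing data can be pictured as a failed match. In this sketch, scores built on a stale file are written to a current one, and the function reports which current voters end up unscored and which scores no longer match anyone. The voter IDs and scores are invented for the example.

```python
def write_scores(fresh_file_ids, scores_by_id):
    """Attach scores to voters on the current file and report the mismatches."""
    fresh = set(fresh_file_ids)
    scored = {vid: scores_by_id[vid] for vid in fresh_file_ids if vid in scores_by_id}
    unscored = [vid for vid in fresh_file_ids if vid not in scores_by_id]
    orphaned = [vid for vid in scores_by_id if vid not in fresh]
    return scored, unscored, orphaned

# Scores were built on an older file that still listed V2 and lacked V6.
scores_from_old_file = {"V1": 64, "V2": 31, "V3": 88, "V4": 12, "V5": 57}

# The current file: V2 has moved away, V6 is a new registrant.
fresh_file = ["V1", "V3", "V4", "V5", "V6"]

scored, unscored, orphaned = write_scores(fresh_file, scores_from_old_file)
coverage = len(scored) / len(fresh_file)
```

Here one new registrant has no score and one score belongs to a voter who is no longer on the rolls, so only four of the five current voters can be targeted with the model.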