Updated November 2015
In 2013, before the first open enrollment period (OE1) under the Affordable Care Act began, Enroll America partnered with Civis Analytics to create a model that predicts the likelihood that an individual is uninsured. The model was updated last year after OE1 and again this year after OE2. Over the last three years, the uninsured model has proven to be timely and accurate, and it has played a critical role in our ability to understand who and where the uninsured are and to get them the help they need to enroll in coverage.
This year, the uninsured model estimates the insurance status of the 180 million non-elderly American adults in Enroll America’s consumer database. Each individual in the database is given an uninsured score between 0 and 100, representing the probability that the individual is uninsured going into OE3. This score can be used to rank individuals in order of who is most likely to be uninsured, enabling prioritization of outreach efforts. This scoring procedure also enables us to estimate uninsured rates amongst particular demographic and geographic subgroups — from the national level down to ZIP code. Our model estimates the uninsured rate, not the number of people eligible to enroll in coverage.
2015 Model Methodology
The 2015 Enroll America/Civis Analytics uninsured model was constructed from 12,461 completed live phone survey responses collected nationally in May 2015, where each respondent was asked “Are you currently covered by a health insurance plan?” The survey sample size used to build the 2015 uninsured model was much larger than in past years – 8,191 survey respondents in 2014 and 10,020 survey respondents in 2013. This year’s larger survey size allows us to feel more confident about the model, especially given the dynamic nature of the uninsured landscape and the dramatic changes that have occurred over the last two years.
After uninsurance status was collected from the survey, a training dataset was built by matching a random portion of survey responses to a database of consumer information, public and administrative data, past uninsured models, and other sources, containing over 800 data points describing each individual. Matching survey responses to consumer data gives us deep knowledge of the characteristics of both insured and uninsured survey respondents.
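As a rough illustration of how survey respondents might be divided into a training portion and a holdout portion, the following sketch performs a seeded random split. The function name, the 20 percent holdout fraction, and the use of integer respondent IDs are assumptions made for illustration; the source does not state the actual split proportion or procedure.

```python
import random


def split_train_holdout(respondent_ids, holdout_fraction=0.2, seed=0):
    """Randomly split respondent IDs into (training, holdout) lists.

    holdout_fraction is a hypothetical value; the real proportion
    used for the 2015 model is not given in the text.
    """
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_holdout = int(round(len(ids) * holdout_fraction))
    return ids[n_holdout:], ids[:n_holdout]


# 12,461 completed survey responses, identified here by placeholder integers.
train_ids, holdout_ids = split_train_holdout(range(12461))
```

Only the training portion would then be matched to consumer data for model fitting; the holdout portion is set aside for validation.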
Next, a two-step machine-learning process was used to develop the uninsured model from the training dataset. The first step was to determine which data features were important predictors to include in the final model. In the second step, the selected predictors were used to construct a logistic regression model using the ridge regression method. Civis considers this process a best practice for the prediction of binary outcomes like insurance status.
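The second step can be sketched as a logistic regression with an L2 (ridge) penalty. The example below is a minimal illustration, not the Civis implementation: it fits the model by plain batch gradient descent on synthetic data with two hypothetical predictors, and the penalty strengths, learning rate, and all function names are assumptions. The feature-selection step is not shown because the text does not say which method was used.

```python
import math
import random


def sigmoid(z):
    """Logistic function, with the input clipped to avoid overflow."""
    z = max(-35.0, min(35.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_ridge_logistic(X, y, l2=1.0, lr=0.1, epochs=500):
    """Fit logistic regression with an L2 (ridge) penalty by gradient descent.

    X is a list of feature rows, y a list of 0/1 labels (1 = uninsured).
    The intercept is left unpenalized. Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * p
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        # The l2 * wj term is the ridge penalty shrinking weights toward zero.
        w = [wj - lr * (gw + l2 * wj) / n for wj, gw in zip(w, grad_w)]
    return b, w


def uninsured_score(b, w, row):
    """Convert a fitted model's prediction to a 0-100 uninsured score."""
    return 100.0 * sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


# Synthetic training data: two made-up predictors and a simulated outcome.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    p_true = sigmoid(-1.5 + 1.2 * x1 - 0.8 * x2)
    X.append([x1, x2])
    y.append(1 if rng.random() < p_true else 0)

b_weak, w_weak = fit_ridge_logistic(X, y, l2=0.1)       # light penalty
b_strong, w_strong = fit_ridge_logistic(X, y, l2=200.0)  # heavy penalty
```

A heavier penalty pulls the coefficients closer to zero, which trades a little bias for more stable estimates when many correlated predictors are available.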
The final model includes 54 variables that cover a range of themes, including individual demographics, socio-economic status, voter history, geography, consumer history, address history, and household characteristics.
To validate the model’s accuracy, the model was used to predict uninsured scores for the portion of initial phone survey respondents who were not included in the training dataset (the holdout sample), and these scores were then compared to the respondents’ actual answers. When predicted uninsured scores are compared with actual uninsured rates in the holdout sample, the model rank-orders respondents correctly and accurately predicts uninsured status. For example, among survey respondents in the top decile of uninsured scores, the model predicts that 23.0 percent will be uninsured, which closely matches the actual survey responses (23.4 percent report being uninsured). An example validation plot is shown below.
Figure 1. Model Validation by Uninsured Score Decile.
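A decile comparison of this kind can be computed as in the sketch below, which sorts holdout respondents by predicted score, cuts them into equal-sized bins, and reports the mean predicted score next to the share who actually reported being uninsured. The function name and the tiny two-bin dataset are hypothetical and serve only to show the calculation.

```python
def decile_calibration(scores, outcomes, n_bins=10):
    """Compare predicted and actual uninsured rates within score bins.

    scores: predicted uninsured scores on the 0-100 scale.
    outcomes: 1 if the respondent reported being uninsured, else 0.
    Returns (bin, mean_predicted_pct, actual_pct, n) tuples, ordered
    from the lowest-scoring bin to the highest.
    """
    pairs = sorted(zip(scores, outcomes), key=lambda t: t[0])
    n = len(pairs)
    rows = []
    for d in range(n_bins):
        chunk = pairs[d * n // n_bins:(d + 1) * n // n_bins]
        if not chunk:
            continue
        predicted = sum(s for s, _ in chunk) / len(chunk)
        actual = 100.0 * sum(o for _, o in chunk) / len(chunk)
        rows.append((d + 1, predicted, actual, len(chunk)))
    return rows


# Made-up holdout data: 10 people scored 10, 10 people scored 30.
scores = [10] * 10 + [30] * 10
outcomes = [1] + [0] * 9 + [1] * 3 + [0] * 7
rows = decile_calibration(scores, outcomes, n_bins=2)
```

In a well-calibrated model, the predicted and actual percentages in each bin are close, and the actual rate rises from the lowest bin to the highest.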
Applying the Model
The final model was used to assign over 180 million non-senior adults in Enroll America’s consumer database a score from zero to 100, indicating the individual’s modeled probability of being uninsured.
The uninsured score can be interpreted as the probability that the individual does not currently have health insurance. For example, if you were to contact 100 people from the database who were each assigned a score of 30, you could expect about 30 of them to be uninsured. Furthermore, individual scores can be averaged to compute the estimated uninsured rate in a geographic region or demographic group.
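The arithmetic described above can be illustrated with a short sketch. The record layout, the placeholder ZIP codes, and the function names are assumptions for illustration only, not the structure of Enroll America’s database.

```python
from collections import defaultdict


def estimated_rate_by_group(records, key):
    """Average uninsured scores within each group (e.g. by ZIP code).

    Returns a dict mapping group -> estimated uninsured rate in percent.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [score sum, count]
    for rec in records:
        entry = totals[rec[key]]
        entry[0] += rec["score"]
        entry[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


def expected_uninsured(scores):
    """Expected number of uninsured people among those contacted."""
    return sum(scores) / 100.0


# Hypothetical scored records with placeholder ZIP codes.
records = [
    {"zip": "00001", "score": 30},
    {"zip": "00001", "score": 10},
    {"zip": "00002", "score": 50},
    {"zip": "00002", "score": 20},
    {"zip": "00002", "score": 5},
]

rates = estimated_rate_by_group(records, "zip")
# Rank individuals so that outreach starts with the highest scores.
prioritized = sorted(records, key=lambda rec: rec["score"], reverse=True)
contacts = expected_uninsured([30] * 100)
```

Here `rates` holds an estimated uninsured rate of 20 percent for the first placeholder ZIP code and 25 percent for the second, and `contacts` is 30.0, matching the example of reaching 100 people who each have a score of 30.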
While every survey methodology has strengths and weaknesses, Enroll America’s uninsured model has several distinct advantages over other data sources. First, the model provides an up-to-date (but still accurate) picture of the uninsured landscape, whereas other data sources can require months and even years of development before uninsured rates can be released. Furthermore, the uninsured model enables Enroll America to estimate uninsured rates for small geographic areas and custom subgroups, which is not feasible using most other data on insurance status in the US.
It is important to note that the model is used to estimate the uninsured rate in a given area, not the number of people who are uninsured, nor the number or percentage of the uninsured who are eligible to enroll in coverage. Some of those whom the model predicts are likely to be uninsured may be undocumented, in the Medicaid gap, or not eligible for affordable coverage for other reasons.
This year, Enroll America estimates that 10.7 percent of Americans aged 18-64 are uninsured. This is comparable to other national estimates from Gallup, the Urban Institute, and the Department of Health and Human Services (HHS): it is slightly lower than HHS’ uninsured estimate of 12.6 percent and Gallup’s July 2015 uninsured estimate of 11.4 percent, and slightly higher than the March 2015 Urban Institute estimate of 10.1 percent. All of these estimates are based on different survey methodologies and slightly different population samples, so some variation is expected. Gallup measures the uninsured rate of all adults ages 18 and over, while the other estimates shown below are based on adults ages 18 to 64.
Figure 2. Public Estimates of Uninsured Rates.
For more data from our model, please visit our maps, state snapshots, and upcoming blog posts. If you’d like to learn more about how Enroll America uses this data to find and reach uninsured Americans, or about the model behind this visualization, please contact firstname.lastname@example.org.