The Factors That Do and Do Not Affect Your Health Insurance Cost

Khislat (Haley) Zhuraeva
Mar 10, 2020

How insurance companies set premiums:

According to healthcare.gov, under the current health care law, insurance companies can consider five factors when setting premiums: age, location, tobacco use, individual vs. family enrollment, and plan category.

Research purpose:

In this research we will use a dataset that includes most of these factors, plus a few additional ones. The goal is to quantify the relationship between each factor and insurance charges. First, let’s review the factors.

  1. Age: Age of the primary beneficiary
  2. Sex: Gender of the primary beneficiary
  3. BMI: Body mass index
  4. Children: Number of children covered by the health insurance (number of dependents)
  5. Smoker: Yes or No
  6. Region: The beneficiary’s residential area in the US: northeast = 0, northwest = 1, southeast = 2, southwest = 3
  7. Steps: Number of steps per day
  8. Charges: Individual medical costs billed by the health insurance

Data source:

Data description:

After data exploration and cleaning, the head of the DataFrame looks like this:

Note that sex and smoker were converted to 0 and 1.

Sex: 0 = female, 1 = male

Smoker: 0 = non-smoker, 1 = smoker
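The conversion described above can be sketched without third-party libraries. This is a minimal sketch that assumes the raw file stores sex as "female"/"male", smoker as "no"/"yes" and region as lowercase names. Those raw spellings, the helper name `encode_row` and the example record are assumptions for illustration and do not come from the author's notebook.

```python
# Hypothetical raw spellings; the author's notebook may store these differently.
SEX_CODES = {"female": 0, "male": 1}
SMOKER_CODES = {"no": 0, "yes": 1}
REGION_CODES = {"northeast": 0, "northwest": 1, "southeast": 2, "southwest": 3}


def encode_row(row):
    """Return a copy of one beneficiary record with categorical fields mapped to integers."""
    encoded = dict(row)  # copy so the input record is left untouched
    encoded["sex"] = SEX_CODES[row["sex"]]
    encoded["smoker"] = SMOKER_CODES[row["smoker"]]
    encoded["region"] = REGION_CODES[row["region"]]
    return encoded


# Made-up example record, not a row from the real dataset.
raw = {"age": 30, "sex": "female", "bmi": 25.0, "children": 0,
       "smoker": "yes", "region": "southwest", "steps": 8000, "charges": 5000.0}
print(encode_row(raw))
```

Mapping through explicit dictionaries, rather than numbering categories in order of appearance, keeps the codes identical to the ones listed in the factor list above.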

Now let’s take a look at the shape of the DataFrame:

So we have 1,338 observations and 8 columns: the seven factors plus charges.

We can also check how many unique ages we have:
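In pandas, these two checks are `df.shape` and `df["age"].nunique()`. For readers following along without pandas, here is a minimal sketch of the same checks. It assumes every record is a dict with the same keys, and the toy rows are made up.

```python
def shape(rows):
    """Return (observations, columns) for a list of dict records, like pandas' df.shape."""
    return len(rows), (len(rows[0]) if rows else 0)


def unique_count(rows, column):
    """Number of distinct values in one column (a None value would count as one here)."""
    return len({row[column] for row in rows})


# Made-up toy rows for illustration only.
toy = [{"age": 18, "charges": 1700.0},
       {"age": 18, "charges": 2200.0},
       {"age": 64, "charges": 14000.0}]
print(shape(toy))                # (3, 2)
print(unique_count(toy, "age"))  # 2
```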

Data findings:

Now let’s review and visualize the relationship of each factor with charges.

Age: Older people tend to use more health care services than younger people, because the chance of developing a chronic condition increases with age.

According to healthcare.gov, “premiums can be up to three times higher for older people than for younger ones”.
The easiest way to check whether our data reflects this is to visualize it.

The bar chart below shows the number of beneficiaries by age, split by gender. The sample contains almost twice as many younger beneficiaries as older ones.

Since the dataset contains more younger beneficiaries, let’s check whether this affects the correlation between age and charges.

As expected, insurance charges grow with age, so the correlation between age and charges is positive.
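The correlation values in this post (such as 0.787 for smoking) are most likely Pearson's correlation coefficient, which is the default method of pandas' `DataFrame.corr()`. That the notebook relies on this default is an assumption. For n beneficiaries with ages x_i, charges y_i, and means x̄ and ȳ:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value always lies between -1 and 1. A positive r means charges tend to rise with age. The same formula applies to every other factor paired with charges.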

Sex:

According to healthcare.gov, gender does not affect the insurance cost. To confirm this, we can again quantify the relationship, this time between sex and charges.

The visualization above shows that insurance costs are slightly higher for men than for women, but the overall difference is small.
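Whether a gap between men and women counts as "significant" can be checked with a statistical test. One option, swapped in here and not necessarily what the author's notebook does, is a permutation test. It shuffles the sex labels many times and counts how often a random split produces a difference in mean charges at least as large as the observed one. This is a minimal sketch; the function name and its defaults are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in mean charges between two groups.

    Returns an estimated p-value: the share of random relabelings whose absolute
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_permutations + 1)
```

You would call it as `permutation_test(men_charges, women_charges)`, where both lists are hypothetical names for the charges of male and female beneficiaries. A large p-value is consistent with the "no real difference" reading; a small one would suggest the gap is unlikely to be chance.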

BMI: Another factor that might affect the insurance cost is BMI.

Healthcare.gov does not list BMI among the factors that affect the monthly premium. However, let’s create a scatter plot to see whether it affects the insurance cost in our data.

Based on the scatter plot, the higher the BMI, the higher the insurance charges: overweight and obese beneficiaries are charged more than those with a normal (healthy) BMI. So we would expect a positive correlation between these two variables.

And in fact, the correlation is positive, although it is fairly weak.
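You can check the claim that overweight and obese beneficiaries are charged more by binning BMI into categories and averaging charges within each bin. This is a minimal sketch that assumes the standard WHO adult cut-offs (18.5, 25 and 30); the original analysis does not specify any cut-offs. The function names are hypothetical.

```python
from statistics import mean


def bmi_category(bmi):
    """Map a BMI value to the WHO adult category (cut-offs 18.5, 25 and 30)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def mean_charges_by_bmi_category(rows):
    """Average charges within each BMI category; rows are dicts with 'bmi' and 'charges'."""
    groups = {}
    for row in rows:
        groups.setdefault(bmi_category(row["bmi"]), []).append(row["charges"])
    return {category: mean(values) for category, values in groups.items()}
```

Binning complements the weak overall correlation. Charges could jump mainly in the obese category instead of rising steadily with BMI, and a single r value would hide that pattern.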

Steps:

We have all heard that 10,000 steps a day is the gold standard for many people. However, does the number of steps we take per day affect the insurance cost? Let’s find out by creating another visualization.

The same relationship can also be plotted this way:

According to both plots, the more steps a person takes, the lower the insurance cost. However, the charges might not be driven by the number of steps itself but by the individual’s overall health: a healthier person requires less medical attention than an unhealthy one, and is also more likely to lead an active lifestyle.

We would therefore expect a negative correlation between these two variables, since the fewer steps a person takes, the higher the insurance costs.
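The relationship in both plots looks monotonic but not necessarily linear, so a rank-based measure is a useful cross-check on the sign. This sketch swaps in Spearman's rank correlation, which is not necessarily what the author computed. Spearman's rho is Pearson's r applied to the ranks, with tied values sharing their average rank. The toy numbers are made up.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up numbers: charges fall as steps rise, but not in a straight line.
steps = [2000, 4000, 6000, 8000, 10000]
charges = [9000, 7000, 6500, 3000, 2500]
print(round(spearman(steps, charges), 6))  # -1.0
```

In pandas, the equivalent is `df.corr(method="spearman")`. If Pearson and Spearman agree in sign, the negative relationship does not depend on assuming a straight-line trend.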

Children:

According to healthcare.gov, “Insurers can charge more for a plan that also covers dependents (children)”.

Based on the graph, charges increase with the number of dependents up to three and then decrease.

Although one might expect premiums to keep rising with every additional dependent, there are often government-mandated policies that cap costs for families. According to health.ny.gov, “for larger families the monthly fee is capped at three children”, which may help explain the shape of the graph.
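A toy fee formula can illustrate the capping rule quoted from health.ny.gov. This is a hypothetical sketch: the flat-plus-per-child structure, the function name and the dollar amounts are all invented for illustration and are not New York's actual fee schedule.

```python
def capped_family_fee(base_fee, fee_per_child, children, child_cap=3):
    """Monthly fee when only the first `child_cap` children are billed.

    All parameter values are hypothetical; this only illustrates how a cap
    changes the shape of the fee-versus-children curve.
    """
    billable_children = min(children, child_cap)
    return base_fee + fee_per_child * billable_children


# Hypothetical amounts: 300 base fee plus 100 per billable child.
print([capped_family_fee(300, 100, n) for n in range(6)])
# [300, 400, 500, 600, 600, 600]
```

On its own, a cap makes the curve level off after three children rather than decline.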

Smoker:

Healthcare.gov states that charges for smokers can be up to 50% higher than for non-smokers. Here is the visualization of the data.

Based on the graph, tobacco users are charged over 50% more than non-users. As expected, the correlation between smoker and charges is positive and very high (0.787).
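The "over 50% more" comparison is a relative difference in mean charges: (mean for smokers − mean for non-smokers) / mean for non-smokers × 100. This is a minimal sketch; the function name and the toy numbers are made up, and comparing means (rather than medians) is an assumption.

```python
from statistics import mean


def percent_higher(group, baseline):
    """How much higher the mean of `group` is than the mean of `baseline`, in percent."""
    return (mean(group) - mean(baseline)) / mean(baseline) * 100


# Made-up numbers: smokers average 15,000 and non-smokers 10,000.
print(percent_higher([14000, 16000], [9000, 11000]))  # 50.0
```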

Region:

Healthcare.gov states that the location plays a significant role in the insurance cost. “Differences in competition, state and local rules, and cost of living account for this.”

According to businessinsider.com, the states with the highest insurance premiums include Virginia, North Carolina, and Tennessee. This is consistent with the visualization above, where the southeast has higher charges than the other regions.
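You can reproduce the regional comparison by decoding the region codes and ranking regions by mean charges. This is a minimal sketch that assumes records are dicts with integer `region` codes as listed in the factor list at the top of this post. The function name is hypothetical.

```python
from statistics import mean

# Codes as listed in the factor list at the top of this post.
REGION_NAMES = {0: "northeast", 1: "northwest", 2: "southeast", 3: "southwest"}


def regions_by_mean_charges(rows):
    """Return (region name, mean charges) pairs, most expensive region first."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row["charges"])
    averages = [(REGION_NAMES[code], mean(values)) for code, values in groups.items()]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)
```

Decoding before reporting keeps the output readable: integer codes are convenient for correlation, but region has no natural order, so its correlation coefficient with charges is hard to interpret on its own.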

Wrapping up the findings:

Of the seven variables (age, sex, BMI, steps, children, smoker, and region), only one does not play a significant role in setting insurance charges: sex. Men and women are charged similar amounts.

Smoking status has the highest correlation with charges, which suggests that smoking has a bigger impact on insurance costs than any other factor. Overall, the plot below summarizes how all seven factors correlate with charges.

1 = highest correlation (lighter colors)

0 = lowest correlation (darker colors)
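The summary plot amounts to correlating every encoded factor with charges and comparing the results. This is a minimal standard-library sketch. It assumes all factors are already numeric, as after the encoding step, and that Pearson's r is the measure. In pandas, the same numbers would come from `df.corr()["charges"]`. The function names are hypothetical.

```python
def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def correlations_with_charges(rows, factors):
    """Correlate each numeric factor with charges, strongest (by absolute value) first."""
    charges = [row["charges"] for row in rows]
    result = {f: pearson([row[f] for row in rows], charges) for f in factors}
    return sorted(result.items(), key=lambda item: abs(item[1]), reverse=True)
```

Sorting by absolute value lets a strong negative relationship such as steps rank alongside strong positive ones, which a plain descending sort would push to the bottom.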

Where to find my code: https://github.com/Khislatz/DS-Unit-1-Build/blob/master/Khislat_Zhuraeva_Medical_Cost_Project.ipynb