In our prior blog, "Top 3 Challenges to Adoption of Predictive Analytics Projects in Value-based Care," we debunked the myth that the best predictive models should have hundreds of input variables. Instead, a predictive model should use a few key input variables that maximize its accuracy.
Build Models that are Specific to Your Population
One of our biggest lessons is that population health is local. Diabetic patients in Detroit, Michigan and Boca Raton, Florida do not have the same healthcare characteristics. Therefore, any population health effort needs to start with building customized predictive models that capture the social and clinical determinants of the local population under consideration. For example, VitreosHealth recently built a customized Medicaid predictive model for a California-based health plan, leveraging three years of the plan's historical data, and delivered accuracy 2.5 times better than the actuarial model results published by the Society of Actuaries in its October 2016 report, "Accuracy of Claims-based Scoring Models."
Factors to Consider Before Deciding on Input Variables
To develop high-accuracy models, a good AI-based predictive modeling process should select the right input variables, in the optimal number, with a good understanding of:
The Business and Clinical Objectives of the Organization
The key to any predictive model in value-based care is to understand the clinical parameters driving the business objectives.
For example, let us consider a common population health objective – predict the top high-risk 5% of the population that will contribute to 65-70% of next year’s costs. What are the driving factors of these high costs and which are avoidable or impactable?
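As a toy illustration of this concentration objective, the "top 5% drive 65-70% of cost" check can be sketched in a few lines of Python. The data here is synthetic (a heavy-tailed lognormal spend distribution, a common shape for healthcare costs) and the function name is hypothetical:

```python
# Toy illustration: measure how concentrated annual costs are in the
# top 5% of members. Data and names are synthetic/hypothetical.
import random

random.seed(42)

# Simulate annual member costs with a heavy-tailed lognormal distribution.
annual_costs = [random.lognormvariate(7.5, 2.1) for _ in range(10_000)]

def top_share(costs, top_fraction=0.05):
    """Fraction of total cost contributed by the top `top_fraction` of members."""
    ranked = sorted(costs, reverse=True)
    k = int(len(ranked) * top_fraction)
    return sum(ranked[:k]) / sum(ranked)

share = top_share(annual_costs)
print(f"Top 5% of members account for {share:.0%} of total cost")
```

With a sufficiently skewed cost distribution, this kind of check reproduces the familiar pattern of a small cohort driving most of the spend; the exact percentage depends on the population.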
We need to predict the high-risk list monthly to effectively drive care management and member engagement programs. This is critical because in certain populations, such as Medicare and Medicaid, there is a 7%-9% monthly churn rate on this high-risk list. That means if you start with a traditional risk stratification list based on historical costs and check the high-risk list at the end of the year, there is a 90% churn in the membership mix of the high-risk cohort. This is one of the major reasons why most cost-savings projects by care management programs don't translate into real savings as measured by the financial community and acknowledged by most CFOs. Leadership also wants to know which care management and member engagement programs to invest in for the highest return on investment within the calendar year.
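To see how a modest monthly churn compounds over a year, here is a toy calculation assuming independent month-over-month attrition. It understates total list turnover, since it ignores new members rising onto the list and the mismatch between a historical-cost ranking and the true year-end high-risk mix, but it shows why an annual snapshot misses most of the movement:

```python
# Toy illustration of how monthly churn on a high-risk list compounds.
# The rates are the 7%-9% range quoted above; the independence
# assumption is a simplification.

def retained_after(months: int, monthly_churn: float) -> float:
    """Fraction of the original high-risk cohort still on the list,
    assuming independent month-over-month churn."""
    return (1.0 - monthly_churn) ** months

for churn in (0.07, 0.09):
    remaining = retained_after(12, churn)
    print(f"{churn:.0%} monthly churn -> {1 - remaining:.0%} of the "
          f"original cohort has left the list after 12 months")
```

Even under this simplified model, well over half the original cohort has left the list by year end, which is why a static annual risk stratification quickly goes stale.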
Available Data Sets
Do we have only claims data – Medical, Pharmacy, Mental Health – or do we also have clinical encounter data from the provider’s EMR? What about lab data, socio-economic, and behavioral health data? This determines what questions we can answer at a member level and which specific gaps-in-care to focus on when designing the care management programs.
Clinical experts need to understand the drivers of the poor clinical outcomes that lead to high cost if they want their models to be actionable. They must also understand the difference between correlation and causation. Well-designed models have variables that answer both "what" and "why": why are these members risky, and what can I do about it through my care management and member engagement interventions? Currently, most organizations in the industry use national actuarial-based model risk scores to sort members from high to low based on historical utilization, leaving care managers to wonder, "Why are these members high priority now?" and "What do the risk scores mean?"
Is it Medicare, Medicaid, self-insured, or commercial populations? The controllable and non-controllable variables differ depending on the population and design of the health plan. That is the reason why VitreosHealth builds customized models for each customer population and does not use national actuarial-based models.
How Do We Decide Which Inputs to Use?
So how do you tackle this problem? There are many approaches, but the underlying principles remain the same. Let me share the VitreosHealth approach; these are the sequential steps we follow:
- Identify all the members that have bad outcomes in the target population. A bad outcome, for our example, is a per-member-per-month (PMPM) cost of $1,000 or higher.
- Use the Vitreos Health Risk Analyzer, a multi-dimensional OLAP model, to plot all these high-cost members on a grid for the most recent time point, as shown in Figure 1.
- Outcomes (PMPM greater than $1000) on X-Axis
- Member Clinical Risk on Y-Axis
- Member Behavioral Risk on Z-axis
- Once we identify the high-cost members, we use historical data to track their positions on the grid over the last 24-36 months. This yields 'Mover' cohorts, and we can then identify the common characteristics of the members in each unique cohort.
- Our data scientists, clinicians, care managers, and analysts all come together to discuss the profile of each of the ‘Mover’ cohorts to understand the nature of the drivers of these movements. These are some of the most heated discussions we see in our conference room. We debate which variables need to be input into the model, what type of supervision we need to provide in the AI-modeling process, etc. Most of these discussions start with the questions:
- So what?
- What can I do with the information?
- Is it a correlation or a causation?
- Can I sacrifice predictive accuracy for prescriptive accuracy?
- Is it avoidable or impactable?
- The outputs of these meetings provide the design framework to our data scientists before they start the AI-driven modelling process.
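The grid-and-mover steps above can be sketched as follows. The coarse high/low bands, 0-1 risk scales, and field names are hypothetical illustrations; only the $1,000 PMPM cutoff comes from the example:

```python
# Sketch of the "mover cohort" idea: bin each member onto a
# (cost, clinical risk, behavioral risk) grid each month, then group
# members by their trajectory across the grid. Thresholds and data
# are hypothetical.
from collections import defaultdict

HIGH_COST_PMPM = 1000  # the "bad outcome" threshold from the example

def grid_cell(pmpm, clinical_risk, behavioral_risk):
    """Coarse grid position: cost band x clinical band x behavioral band."""
    cost_band = "high" if pmpm >= HIGH_COST_PMPM else "low"
    clin_band = "high" if clinical_risk >= 0.5 else "low"
    behav_band = "high" if behavioral_risk >= 0.5 else "low"
    return (cost_band, clin_band, behav_band)

def mover_cohorts(history):
    """history: {member_id: [(pmpm, clin, behav), ...]}, oldest to newest.
    Returns cohorts keyed by (start_cell, end_cell) grid movement."""
    cohorts = defaultdict(list)
    for member_id, observations in history.items():
        start = grid_cell(*observations[0])
        end = grid_cell(*observations[-1])
        if start != end:  # the member moved on the grid: a "Mover"
            cohorts[(start, end)].append(member_id)
    return cohorts

# Two synthetic members tracked over three months.
history = {
    "M1": [(300, 0.2, 0.3), (700, 0.6, 0.4), (1400, 0.8, 0.4)],   # rises into high cost
    "M2": [(1200, 0.7, 0.6), (1100, 0.7, 0.6), (1150, 0.7, 0.6)],  # stays put
}
print(mover_cohorts(history))
```

In practice the grid would be much finer-grained and fed from the OLAP model, but the cohort key, a (start cell, end cell) pair, captures the essence of the 'Mover' grouping that the team then profiles and debates.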
The next phase is the AI modeling itself. Kirit Pandit, Cofounder and Head of AI Modeling at VitreosHealth, will share insights on the art and science of using AI-based predictive modeling techniques for value-based care in Part 4 of this predictive modeling blog series.