What is Dark Data?

Gartner, the industry analyst firm, defines dark data as the information organizations collect, process, and store during regular business activities, but generally fail to use for other purposes. According to IDC Health Insights, up to 90 percent of the total data generated in healthcare could be dark data. In healthcare, dark data has traditionally been collected but not used effectively, due to siloed departmental governance policies, a lack of technology investment, and a lack of compelling business use cases. All of these reasons become moot with the emergence of value-based care and the need to bring cost, quality, and outcomes data together for analysis at a very granular, member level.

There is a lot of information hidden in the medical and pharmacy claims, lab results, mental health files, billing data, and demographics of your population. Within and across each of these data sources, there is rich information that has so far not been utilized adequately or appropriately. For example, pharmacy claims not only show you which prescriptions have been filled, but also carry information on fill rates, the prescribing physician, and patient co-pays, among other things. By analyzing the patterns in this previously unused data, you could gain valuable insights into medication adherence, network leakage, and the correlation of adherence with social determinants. By cross-mapping these to diagnosis files, you could also uncover opportunities for Risk Adjustment Factor (RAF) score improvement for Medicare members.
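To make this concrete, here is a minimal sketch of one such analysis: estimating medication adherence as a proportion-of-days-covered (PDC) measure from raw pharmacy fill records. The input shape (a list of fill dates and days of supply) is an illustrative assumption, not a reference to any particular claims format.

```python
from datetime import date, timedelta

def proportion_of_days_covered(fills, period_start, period_end):
    """Share of days in the measurement period on which the member
    had medication on hand, based on pharmacy fill records.

    `fills` is a list of (fill_date, days_supply) tuples; the shape
    is an illustrative assumption, not a standard claims layout.
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if period_start <= day <= period_end:
                covered.add(day)  # a set de-duplicates overlapping fills
    period_days = (period_end - period_start).days + 1
    return len(covered) / period_days

# Three 30-day fills, with a long gap before the third refill.
fills = [(date(2023, 1, 5), 30), (date(2023, 2, 6), 30), (date(2023, 4, 20), 30)]
pdc = proportion_of_days_covered(fills, date(2023, 1, 1), date(2023, 6, 30))
print(f"PDC: {pdc:.0%}")  # members below ~80% are often flagged as non-adherent
```

A measure like this, joined with the prescriber, co-pay, and demographic fields already sitting in the same files, is what makes the adherence and social-determinant correlations possible.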

We need to start understanding the cost of quality, a concept healthcare never had to grapple with under fee-for-service payment models.

So, let us take the use case of understanding the “Cost of Quality” for the population under consideration. To understand the “Cost of Quality”, one first needs to understand the “Cost of a Clinical Encounter”, which is a huge challenge in the current healthcare system.

As Head of Product Engineering Services, I encounter this reality every day. A typical value-based customer of ours has multiple risk contracts: an MSSP ACO, a Medicare Advantage plan, and 5-6 commercial pay-for-performance contracts. We receive medical and pharmacy claims from these sources at different levels of granularity, sent to us at different times each month. In addition, we get EMR data extracts from 2-3 different EMRs, along with HRA, lab, and mental health files. We also purchase social determinant data for these members from external bureaus.

The challenges for my team are:

  1. How do we “Extract-Transform-Load” (ETL) this data within 1-2 weeks so that it is ready to be analyzed by our clinical business intelligence and data science teams?
  2. The biggest challenge in predictive and prescriptive analytics is “garbage in, garbage out”: without clean input, the output is inaccurate and unreliable.
  3. Finally, the healthcare market cannot pay a huge price just to deliver clean data as a service, considering that 70% of the provider and payer market is made up of small and mid-market organizations with limited financial resources.

Necessity is the Mother of Innovation

Over the last six years of working with customers, we developed an AI-based ETL process with the following capabilities to address these challenges:

  1. NLP capability: To read notes from memo-style fields and flowsheets and capture the relevant data from the EMRs required for our predictive modeling.
  2. Mapping algorithms: To map a clinical encounter in the EMR to the multiple claims it can generate, which may be filed at different times to different payers. This mapping requires rule-based learning algorithms, and AI is emerging as a powerful way to learn these rules.
  3. Data validation: To ensure all clinical parameters (BMI, blood pressure, A1C, etc.) are within a reasonable range. Imagine a data element filled with a distorted number, for example a BMI of 365; if such values crept into the huge data models we build for predictive analytics, the results could be disastrous. A minimal range-check sketch follows this list.
  4. Proprietary calculations: Gaps-in-care calculations based on best-evidence clinical care guidelines, with outputs checked to ensure they are within range. A simplified illustration appears after the validation sketch below.
  5. Disease-specific risk scores: To ensure the automated quality scripts calculate these scores and catch all erroneous data elements.
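To give a flavor of the validation in item 3, here is a minimal range-check sketch. The bounds are illustrative placeholders, not our production rules, which are tuned per parameter and per population:

```python
# Illustrative plausibility bounds; real rules are tuned per
# parameter and per population.
PLAUSIBLE_RANGES = {
    "bmi": (10.0, 100.0),          # a BMI of 365 fails this check
    "systolic_bp": (50.0, 260.0),  # mmHg
    "a1c": (3.0, 20.0),            # percent
}

def validate_record(record):
    """Return (field, value, reason) tuples for every implausible value."""
    problems = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is None:
            problems.append((field, value, "missing"))
        elif not low <= value <= high:
            problems.append((field, value, f"outside [{low}, {high}]"))
    return problems

print(validate_record({"bmi": 365, "systolic_bp": 120, "a1c": 6.4}))
# [('bmi', 365, 'outside [10.0, 100.0]')]
```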

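The gaps-in-care logic in item 4 is proprietary, but the shape of one well-known, guideline-style check can be sketched: flagging members with a diabetes diagnosis who have no A1C result in the measurement period. The data shapes and field names here are hypothetical.

```python
def a1c_testing_gaps(members, lab_results):
    """Flag members with a diabetes diagnosis but no A1C result.

    `members` maps member_id -> set of diagnosis codes; `lab_results`
    maps member_id -> list of completed test names. Both shapes are
    hypothetical stand-ins for real claims and lab feeds.
    """
    DIABETES_CODES = {"E10.9", "E11.9"}  # illustrative ICD-10 subset
    gaps = []
    for member_id, diagnoses in members.items():
        if diagnoses & DIABETES_CODES and "a1c" not in lab_results.get(member_id, []):
            gaps.append(member_id)
    return gaps

members = {"M1": {"E11.9"}, "M2": {"I10"}, "M3": {"E10.9"}}
labs = {"M1": ["a1c", "lipid_panel"], "M3": ["lipid_panel"]}
print(a1c_testing_gaps(members, labs))  # ['M3']
```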
As we provide “Insights-as-a-Service”, we cannot put the blame on others for the quality of the data, its timeliness, or the outputs of our predictive models. Our long-term customer partners pay us for the outcomes of our predictive and prescriptive analytical services.

Traditional data modeling techniques did not meet our challenges: they took too long, needed too many people, were prone to errors, and cost too much. This pushed us to leverage AI-based supervised learning techniques for our internal data cleansing, data mapping, and data validation processes. It has taken us years to build the deep, clinically grounded knowledge algorithms that automate these processes.
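As one illustration of what “supervised” means here, take the encounter-to-claim mapping from the list above: given historical pairs that analysts have already matched by hand, a classifier can learn to score whether a new claim belongs to an encounter. This is a minimal sketch using scikit-learn; the features (date gap, provider match, diagnosis overlap) and the tiny training set are illustrative assumptions, not our production feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row is one candidate (encounter, claim) pair, described by
# illustrative features: days between service dates, whether the
# rendering provider matches, and the share of overlapping diagnoses.
X_train = np.array([
    [0, 1, 1.0],   # same day, same provider, identical diagnoses
    [1, 1, 0.5],   # next-day claim from the same provider
    [45, 0, 0.0],  # unrelated claim filed weeks later
    [30, 1, 0.0],  # same provider, but clearly a different visit
])
y_train = np.array([1, 1, 0, 0])  # 1 = analyst confirmed the match

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Score a new candidate pair: 2-day gap, provider match, partial overlap.
match_probability = model.predict_proba([[2, 1, 0.6]])[0, 1]
print(f"match probability: {match_probability:.2f}")
```

Scoring matches probabilistically rather than with hard rules also lets low-confidence pairs be routed to a human reviewer, whose decisions become new training examples.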

To learn more about how Vitreos is using the power of AI to improve the efficiency and efficacy of our customers’ population health efforts, watch our webinar:

Leveraging AI-driven ETL to Shed Light on the Healthcare Dark Data Problem
