Scott Morrison and Isabel Stacey from Diaceutics demonstrate how they use KNIME to label healthcare data and standardize the process, so that analysts work from the same base and can aggregate and analyze data quickly and easily, ultimately enabling better testing and better treatment for patients.
Diaceutics data delivers the insights that medical experts need, with data sources including lab data, medical claims data, prescription data, and lab demographics. There are multiple data points including patient demographics, physician information, test results and reports, sample requirements, assay sensitivity, and many more. Stakeholders involved are patients, physicians, laboratories, and payers.
With so much data available and so many details within the data itself, an approach was needed to streamline the analysis of this data and empower analysts to add their medical knowledge to further enrich it.
By using KNIME and its tools, it is possible to cleanse and label the data and then feed it into a standard workflow that project-specific analysts can easily use. This saves time and improves project quality.
"Since starting at Diaceutics, KNIME has been an integral part of my everyday work." - Isabel Stacey, Senior Data Analyst, Diaceutics
KNIME workflows are easy to build and allow a straightforward way to standardize business processes. All nodes and sections can be annotated, which not only provides a self-documenting workflow, but enables a new user to understand what is happening at each stage. One of the biggest benefits of KNIME is the linked component functionality. This ensures that changes can be made to the master workflow, and all versions of that downloaded workflow will get a notification warning the user that a change has been made.
Labeling of Clinical Data Allows Data-Driven Insights
There are several reasons why patient data needs to be labeled. Primarily, healthcare data is transactional. In this raw form, it offers little from which data-driven insights can be drawn. Labeling data appropriately allows insights to be uncovered and provides a cleaner, easier dataset to work with. It also allows the creation of groupings and filters, not only for project-specific internal analysts but also for the Diaceutics DXRX platform, which clients can interact with directly. The data that needs to be labeled varies and includes time point, disease, disease stage, patient history, biomarker tested, and test method.
For some of these, the task is to standardize or group the existing data. For example, with time points it is possible to group a specific data field by year, quarter, and/or month using a simple SQL statement. However, many parts of the data require a new label, created using a combination of logic and business rules - for example, disease stage.
In terms of labeling the data, straightforward data can be hard-coded in SQL. However, for most data, control files and flexible SQL coding are used. In KNIME, linked components are used; these are files that contain all the logic for diseases: stage, biomarkers, methodologies, and business rules. A Build SQL Component builds out the SQL for all combinations specified in the control files, or for the options chosen by the business analyst or on DXRX.
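One way to picture how SQL might be built out from a control file is the sketch below. It is not the Build SQL Component itself: the rule structure, the function names `quote_literal` and `build_case_sql`, the `patients` table, and the stage codes are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical control-file rows: each maps a set of raw codes to one label.
# In practice these would be read from a maintained control file.
STAGE_RULES = [
    {"label": "Early stage", "column": "stage_code", "values": ["I", "II"]},
    {"label": "Advanced stage", "column": "stage_code", "values": ["III", "IV"]},
]


def quote_literal(value):
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_case_sql(rules, alias, default="Unlabeled"):
    """Build a CASE expression that assigns one label per control-file rule."""
    branches = []
    for rule in rules:
        values = ", ".join(quote_literal(v) for v in rule["values"])
        branches.append(
            f"WHEN {rule['column']} IN ({values}) THEN {quote_literal(rule['label'])}"
        )
    return "CASE " + " ".join(branches) + f" ELSE {quote_literal(default)} END AS {alias}"


stage_sql = build_case_sql(STAGE_RULES, "disease_stage")

# Apply the generated expression to a small, hypothetical patient table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id TEXT, stage_code TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [("p1", "I"), ("p2", "IV"), ("p3", None)],
)
labeled = conn.execute(
    f"SELECT patient_id, {stage_sql} FROM patients ORDER BY patient_id"
).fetchall()
```

Keeping the rules in data rather than in the query means a change to a business rule only touches the control file; the column names in such a file would still need to come from a trusted source, since they are interpolated directly into the SQL.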
A Standardized, Six-Step Process
With a standardized approach, there is one agreed-upon method. This is easier to use, more consistent, and saves time.
- Connect to the database where patient-level data is located.
- Pull out column names to create filter options in the menu.
- Introduce control files using linked components.
- Combine control files and columns from the original data table to create an interactive menu. Here, the user picks the options they need, the SQL code is automatically built out based on the combination the analyst has selected, and any business rules are built as variables for potential further analysis in the database query.
- Variables from step four are implemented into the query, and the analyst can choose to aggregate the data however they need.
- Data is read out and ready to view.
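The six steps above could look roughly like this outside KNIME. This is a simplified sketch under stated assumptions: the `patient_tests` table, the `CONTROL` dictionary, the sample values, and the helper functions are hypothetical, and a plain dictionary stands in for the interactive menu.

```python
import sqlite3


def connect_demo():
    """Step 1: connect to the database (an in-memory stand-in here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient_tests "
        "(patient_id TEXT, disease TEXT, biomarker TEXT, test_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO patient_tests VALUES (?, ?, ?, ?)",
        [
            ("p1", "NSCLC", "EGFR", 2021),
            ("p2", "NSCLC", "EGFR", 2022),
            ("p3", "NSCLC", "KRAS", 2022),
            ("p4", "CRC", "KRAS", 2022),
        ],
    )
    return conn


def column_names(conn, table):
    """Step 2: pull out column names to offer as filter options."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Step 3: a hypothetical control file listing the values an analyst may pick.
CONTROL = {"disease": ["NSCLC", "CRC"], "biomarker": ["EGFR", "KRAS"]}


def build_query(table, columns, control, selection, group_by):
    """Steps 4-5: validate the analyst's choices and build the query."""
    if group_by not in columns:
        raise ValueError(f"unknown grouping column: {group_by}")
    clauses, params = [], []
    for column, value in selection.items():
        if column not in columns or value not in control.get(column, []):
            raise ValueError(f"option not allowed: {column}={value}")
        clauses.append(f"{column} = ?")
        params.append(value)
    where = " AND ".join(clauses) or "1 = 1"
    sql = (
        f"SELECT {group_by}, COUNT(DISTINCT patient_id) AS n_patients "
        f"FROM {table} WHERE {where} GROUP BY {group_by} ORDER BY {group_by}"
    )
    return sql, params


conn = connect_demo()
cols = column_names(conn, "patient_tests")
sql, params = build_query(
    "patient_tests", cols, CONTROL, {"disease": "NSCLC"}, "biomarker"
)
result = conn.execute(sql, params).fetchall()  # Step 6: read out the data
```

Checking each selection against both the table's columns and the control file means only known options ever reach the generated SQL, which mirrors the idea of every analyst starting from the same agreed base.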
The Results: Better Data, Better Testing, and Better Treatment
It’s possible to label healthcare data with many different variables including disease, disease stage, tested biomarkers, method, and results. Labeled patient data and a standardized process ensure all analysts are working from the same base. Anyone working with the data has the same starting point, with the same patient cohort and methods. This means data can be analyzed and aggregated at a high level quickly and efficiently. In-depth analysis can be performed more easily (when needed) as the patient cohort is readily available. This ultimately leads to better data, better testing, and better treatment.