Spark and laboratory data | Diaceutics

Spark and laboratory data

July 25th, 2018

At Diaceutics we use a number of tools to leverage our data and generate actionable insights. We mine data to forecast market trends and to understand the changing biomarker landscape. We can track the use of new biomarker targets over time, review the dissemination of new companion diagnostics across the globe, and build an understanding of the patient journey and related events (for example, the number and timeliness of tests) linked to health outcomes. There are many methods for generating valuable insights from lab data.

A key technology we leverage is Apache Spark®, a big data processing engine that we use to analyze our proprietary laboratory test data for our pharmaceutical clients. Spark is part of our regular process for handling weekly uploads to our data warehouse, and it takes just one hour to parse the entirety of our historical data. That speed comes from parallel processing: we run Spark on Amazon Web Services (AWS), our cloud hosting provider, and scale up processing capacity as needed. When we deploy Spark, we use the right number of machines for the job, while Amazon manages the machines and software. Spark is also extensible and includes machine learning libraries that we can use to mine deeper insights from our laboratory data.

We have used Spark for both processing and analysis. One example of processing is aggregating all biomarkers tested for each patient across their entire history. For analysis, we can take the list of tests performed and calculate the probability that one test will lead to another. We can also take the list of biomarkers and cluster patients according to their treatment histories. Having one place to bulk process records is valuable: our machine learning algorithms can review many different slices of the same data, which is how we uncover deeper insights.
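A minimal sketch of the two steps described above: grouping each patient's tests into a date-ordered biomarker history, then estimating how often one test is directly followed by another. It uses plain Python rather than Spark so it runs without a cluster; in PySpark the grouping step would typically be a `groupBy` on the patient ID with an aggregate such as `collect_list`. The record layout, field names and biomarker values are hypothetical, and the first-order transition estimate (counting consecutive test pairs) is one simple choice, not necessarily the method Diaceutics uses.

```python
from collections import Counter, defaultdict

# Hypothetical test records: (patient_id, test_date, biomarker).
# Field names and values are illustrative only, not a real schema.
records = [
    ("p1", "2018-01-03", "EGFR"),
    ("p1", "2018-02-10", "ALK"),
    ("p1", "2018-03-15", "PD-L1"),
    ("p2", "2018-01-20", "EGFR"),
    ("p2", "2018-04-02", "PD-L1"),
    ("p3", "2018-02-11", "EGFR"),
    ("p3", "2018-02-28", "ALK"),
]


def biomarker_histories(rows):
    """Group tests by patient and order each patient's tests by date."""
    by_patient = defaultdict(list)
    for patient, date, marker in rows:
        by_patient[patient].append((date, marker))
    # ISO-format dates sort correctly as plain strings.
    return {p: [m for _, m in sorted(tests)] for p, tests in by_patient.items()}


def transition_probabilities(histories):
    """Estimate P(next test = b | current test = a) from consecutive test pairs."""
    pair_counts = defaultdict(Counter)
    for tests in histories.values():
        for current, nxt in zip(tests, tests[1:]):
            pair_counts[current][nxt] += 1
    # Normalize each row of counts so the probabilities for a given test sum to 1.
    return {
        a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
        for a, nexts in pair_counts.items()
    }


histories = biomarker_histories(records)
probs = transition_probabilities(histories)
print(round(probs["EGFR"]["ALK"], 2))  # 0.67
```

With these sample records, two of the three EGFR tests are followed directly by an ALK test and one by a PD-L1 test, so `probs["EGFR"]` splits roughly two-thirds to one-third.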

Spark can also be used to correct anomalous data and fill gaps in data. Data that is entered manually, or is missing altogether, can be forecast or given context. For example, the body site of a biopsy is often entered manually by a physician or pathologist, and spelling errors can prevent accurate tracking of a sample's origin. Where information is missing, Spark allows us to cross-reference the body site recorded for a given test event against the entire history of that sample, so we can infer where in the body the sample originated.
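One way to picture this correction step is a sketch that snaps free-text body sites to a small controlled vocabulary using fuzzy string matching (Python's standard `difflib`), then fills a missing site with the site most often recorded for the same sample. The vocabulary, the 0.8 similarity cutoff, the event fields and the majority-vote rule are all illustrative assumptions; the post does not describe the exact method, and at scale the same per-sample logic would run as a Spark job rather than in a single Python process.

```python
import difflib
from collections import Counter, defaultdict

# Hypothetical controlled vocabulary of biopsy sites (illustrative only).
CANONICAL_SITES = ["lung", "liver", "lymph node", "pleural fluid", "bone"]


def normalize_site(raw, cutoff=0.8):
    """Map a free-text body site to the closest canonical term, or None."""
    if not raw or not raw.strip():
        return None
    cleaned = raw.strip().lower()
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(cleaned, CANONICAL_SITES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def fill_from_sample_history(events):
    """Normalize sites, then fill gaps with the sample's most common known site."""
    known = defaultdict(Counter)  # sample_id -> Counter of normalized sites
    cleaned = []
    for event in events:
        site = normalize_site(event["site"])
        cleaned.append({**event, "site": site})  # copy, so the input is not mutated
        if site is not None:
            known[event["sample_id"]][site] += 1

    for event in cleaned:
        history = known.get(event["sample_id"])
        if event["site"] is None and history:
            event["site"] = history.most_common(1)[0][0]
    return cleaned


# Hypothetical test events; "site" is free text, or None when not recorded.
events = [
    {"sample_id": "s1", "test": "EGFR", "site": " Lung"},
    {"sample_id": "s1", "test": "ALK", "site": None},
    {"sample_id": "s2", "test": "EGFR", "site": "Liverr"},
    {"sample_id": "s2", "test": "PD-L1", "site": None},
    {"sample_id": "s3", "test": "KRAS", "site": "lymph nod"},
    {"sample_id": "s4", "test": "BRAF", "site": None},
]

for row in fill_from_sample_history(events):
    print(row["sample_id"], row["test"], row["site"])
```

A spelling too far from any canonical term (below the cutoff) is treated as missing, so it can still be filled from the sample's history instead of being kept as an untrackable value; a sample with no recorded site at all, like `s4`, stays empty.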

Spark has many possible applications, and many large companies use it regularly, including Alibaba, Amazon, Autodesk, Tencent and TripAdvisor. TripAdvisor, for example, can process every review that has been added to its site with Spark, applying natural language processing to make the content more useful.

Spark has been around since 2012, and its use in the health sciences continues to grow. At Diaceutics, we are continually looking for new ways to leverage the latest technologies to actively break down barriers to better testing and, in turn, better treatment for patients.

#ApacheSpark #Labdata #Diaceutics #MachineLearning #BigData  

