Spark and laboratory data | Diaceutics

July 25th, 2018

At Diaceutics we use a number of tools to turn our data into actionable insights. We mine data to forecast market trends and to understand the changing biomarker landscape. We can track the use of new biomarker targets over time, review the spread of new companion diagnostics across the globe, and build an understanding of the patient journey and related events (numbers of tests and timeliness, for example) linked to health outcomes. Lab data can yield valuable insights through many different methods.

A key technology we rely on is Apache Spark®. We use this big data technology to analyze our proprietary laboratory test data for our pharmaceutical clients. Spark is part of our regular processes: it handles weekly uploads to our data warehouse, and it takes just one hour to parse the entirety of our historical data. This is possible through parallel processing, with Amazon Web Services (AWS) as the cloud hosting provider so that data processing can be scaled up as needed. When we deploy Spark we use the right number of machines for the data being processed, while Amazon manages the machines and software. Furthermore, Spark is extensible, featuring machine learning libraries that can be used to mine deeper insights from our laboratory data.

We have used Spark for both processing and analysis. One example of processing is aggregating all biomarkers tested per patient over their entire history. For analysis, we can take the list of tests performed and calculate how likely it is that one test will lead to another. We can also take the list of biomarkers and cluster patients according to their treatment histories. Having one place to bulk process records is valuable, because our machine learning algorithms uncover insights by reviewing many different slices of the data.

Spark can be used to correct anomalous data and fill gaps in data. Data that is manually entered or missing can be inferred or given context. For example, the body site of a biopsy is often entered manually by a physician or pathologist, and spelling errors can prevent accurate tracking of a sample's origin. When information is missing, Spark allows us to cross-reference the body site for a given test event with the entire history of the sample, so that we can infer where in the body the sample originated.
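The body-site example can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not our actual pipeline: the list of known sites, the similarity cutoff and the "most common site in the sample's history" rule are all hypothetical choices, and the fuzzy matching uses the standard library's `difflib` rather than anything Spark-specific.

```python
import difflib
from collections import Counter

# Hypothetical vocabulary of accepted body-site names.
KNOWN_SITES = ["lung", "liver", "colon", "breast", "skin"]


def normalize_site(raw, cutoff=0.7):
    """Map a manually entered body site to the closest known name, or None."""
    if not raw:
        return None
    cleaned = raw.strip().lower()
    matches = difflib.get_close_matches(cleaned, KNOWN_SITES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def fill_missing_sites(events):
    """events: list of (sample_id, raw_site). Returns a list of (sample_id, site).

    Misspelled sites are corrected first; remaining gaps are filled with the
    most common site seen elsewhere in the same sample's history.
    """
    normalized = [(sample_id, normalize_site(raw)) for sample_id, raw in events]
    seen = {}
    for sample_id, site in normalized:
        if site is not None:
            seen.setdefault(sample_id, Counter())[site] += 1
    filled = []
    for sample_id, site in normalized:
        if site is None and sample_id in seen:
            site = seen[sample_id].most_common(1)[0][0]
        filled.append((sample_id, site))
    return filled


# Hypothetical test events: one misspelling, one gap, one sample with no history.
events = [("s1", "Lunng"), ("s1", None), ("s1", "lung"), ("s2", "livr"), ("s3", "")]
result = fill_missing_sites(events)
```

A sample with no usable history, like `s3` here, is left empty rather than guessed.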

The possibilities for using Spark are great, and many large companies use it regularly, including Alibaba, Amazon, Autodesk, Tencent and TripAdvisor. For example, TripAdvisor uses Spark to process every review that has been added to its site, applying natural language processing to the reviews to make the content more useful.

Spark has been open source since 2010 and became a top-level Apache project in 2014, and its use in the health sciences continues to grow. At Diaceutics, we are continually looking for new ways to apply the latest technologies to break down barriers and deliver better testing, and therefore better treatment, for patients.

#ApacheSpark #Labdata #Diaceutics #MachineLearning #BigData  

Refs: 

http://spark.apache.org/powered-by.html 

https://conferences.oreilly.com/strata/big-data-conference-ca-2015/public/schedule/detail/40463 

http://engineering.tripadvisor.com/using-apache-spark-for-massively-parallel-nlp/ 
