Boston skyline from Charles River, Cambridge, MA. Photo: Ozan Aygun

JurkatQC Scraper: analytics solution for real-time monitoring of streaming Mass Spectrometer data


Click here to test the app!

It was Christmas break, and Cambridge had nothing but snow. While staying at home, I scraped some messy quality metrics data that turned out to be useful for monitoring longitudinal mass spectrometer performance. Let me introduce the specialized analytics solution I developed while working at the Broad Institute.


In this problem, the data is generated by a piece of laboratory equipment called a mass spectrometer. It is used to determine which proteins are produced, and in what relative amounts, in a given biological sample, such as a tissue biopsy from a patient. Mass spectrometers are highly sophisticated and extremely useful for advancing biological research. Like many complicated tools, they are also expensive to maintain: repairs can cost thousands of dollars per year, so scientists regularly run standard samples and collect quality metrics data to monitor the performance of their instruments.

I developed this tool to efficiently scrape the quality metrics data generated by multiple instruments in real time, and to streamline the data cleaning, feature extraction, and harmonization steps. Here is the architecture of the analytics tool, to give you an idea of how it works:
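The app itself is written in R, but the core scraping-and-harmonization idea is simple enough to sketch. Below is a minimal Python illustration with a hypothetical metric-name mapping and file format; the real instruments' report columns will differ:

```python
import csv
import io

# Hypothetical mapping from instrument-specific metric names to a harmonized schema.
HARMONIZED = {
    "num_psms": "psm_count",
    "PSM_count": "psm_count",
    "MedianMS1Intensity": "median_ms1_intensity",
    "ms1_intensity_median": "median_ms1_intensity",
}

def harmonize_row(row):
    """Rename known metric columns to harmonized names; keep the rest as-is."""
    return {HARMONIZED.get(key, key): value for key, value in row.items()}

def parse_qc_report(text):
    """Parse one tab-delimited QC report into a list of harmonized records."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [harmonize_row(row) for row in reader]

# Example: two instruments reporting the same metric under different names.
report_a = "run_id\tnum_psms\n20171225_A\t41230\n"
report_b = "run_id\tPSM_count\n20171225_B\t39877\n"

records = parse_qc_report(report_a) + parse_qc_report(report_b)
# Both records now expose the metric under the single key "psm_count".
```

Once every instrument's output is mapped into one schema like this, the downstream analytics never need to know which machine produced a given record.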



The app welcomes you with some information about the data and a dashboard from which you can choose among many useful analytics tools that give you real-time information about the streaming data.



Here is a daily summary of quality metrics that I implemented using the googleVis package (an R interface to Google Charts). This interactive plot is handy when you want to rapidly switch between bar plots and line plots for a given time series:
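Behind a daily-summary chart like this sits a simple group-by over run dates. A minimal Python sketch of that aggregation, using invented PSM counts as the quality metric:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records scraped from QC reports: (date, instrument, psm_count).
runs = [
    ("2017-12-23", "Lumos1", 41230),
    ("2017-12-23", "Lumos1", 40110),
    ("2017-12-24", "Lumos1", 38950),
]

def daily_summary(runs):
    """Average a quality metric per day: the table feeding a bar/line chart."""
    by_day = defaultdict(list)
    for day, _instrument, value in runs:
        by_day[day].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}

summary = daily_summary(runs)
# summary maps each date to that day's mean PSM count.
```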



It is often necessary to look into individual points in the longitudinal data, and this plot serves exactly that purpose. I implemented it in ggplot2, but it converts nicely into an interactive graphic thanks to the ggplotly extension:



What if you just want to know which instrument is performing best today? Or what if you want to compare the instruments' current performance to their historical performance? I implemented these gauges using the googleVis package to provide some insight:
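One simple way to drive such a "current vs. historical" gauge is to express today's value as a percentile of the instrument's historical distribution. A Python sketch of that idea; the numbers are invented for illustration:

```python
from bisect import bisect_left

def percentile_rank(history, current):
    """Where today's value falls within the historical distribution (0-100)."""
    if not history:
        return None
    ranked = sorted(history)
    return 100.0 * bisect_left(ranked, current) / len(ranked)

# Hypothetical historical PSM counts for one instrument.
history = [35000, 38000, 39000, 41000, 42000]
print(percentile_rank(history, 40000))  # 60.0: better than 3 of 5 historical runs
```

Feeding this percentile into a 0-100 gauge gives an at-a-glance answer to "is the instrument having a good day by its own standards?"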



Mass spectrometers are expensive instruments: expensive to obtain, expensive to run, and expensive to repair. Since operating them costs so much, it all comes down to getting the most data out of them, which means running them as much as possible, 24 hours a day, 7 days a week. Often this is not the case, though. Data acquisition gets interrupted by unexpected events, such as a lack of high-quality samples or problems with instrument calibration. Here is a simple tool I implemented to monitor the downtime of a given instrument:
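One way to quantify downtime from the scraped data is to sum the gaps between consecutive run start times that exceed a typical run-to-run interval. A Python sketch; the two-hour expected gap is an assumption for illustration, not the app's actual threshold:

```python
from datetime import datetime, timedelta

def downtime(run_starts, expected_gap=timedelta(hours=2)):
    """Sum the idle time between consecutive runs beyond the expected gap.

    expected_gap is a hypothetical typical run-to-run interval; any gap
    longer than it counts as downtime.
    """
    starts = sorted(run_starts)
    idle = timedelta(0)
    for prev, curr in zip(starts, starts[1:]):
        gap = curr - prev
        if gap > expected_gap:
            idle += gap - expected_gap
    return idle

runs = [
    datetime(2017, 12, 25, 0, 0),
    datetime(2017, 12, 25, 2, 0),   # on schedule
    datetime(2017, 12, 25, 9, 0),   # 7 h gap -> 5 h of downtime
]
print(downtime(runs))  # 5:00:00
```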



An interesting aspect of the quality metrics data is how it varies across different days and users. From an operational perspective, this may help identify systematic performance attributes.



The app also provides an interface for monitoring instrument-versus-user analytics. It helps determine whether certain performance characteristics are associated with the practices of particular users across different systems.



Liquid chromatography (LC) instruments are coupled to mass spectrometers, and they have a profound impact on overall system performance. The app also parses the LC label out of each data file name and associates it with the corresponding data point. This enables powerful analytics for observing the relationship between individual LCs and mass spectrometers.
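Assuming a file-naming convention along the lines of date_instrument_LClabel_sample.raw (the convention in the app's actual data may differ), the LC label can be pulled out with a regular expression. A Python sketch:

```python
import re

# Hypothetical naming convention: date_instrument_LClabel_sample.raw
FILENAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<instrument>[^_]+)_(?P<lc>LC\d+)_(?P<sample>.+)\.raw$"
)

def parse_filename(name):
    """Extract date, instrument, LC label, and sample from a raw file name."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None

info = parse_filename("20171225_Lumos1_LC07_JurkatQC.raw")
# info: {"date": "20171225", "instrument": "Lumos1",
#        "lc": "LC07", "sample": "JurkatQC"}
```

With the LC label attached to every record, each quality metric can be sliced by LC as easily as by instrument or by user.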



I hope you enjoyed reading about the features of this app. If you would like to leverage it for your lab, feel free to fork the source code, available in my GitHub repository!