Data extraction

 

Moderator: Elwin VERHEIJ (TNO, Zeist)

 

Background:

The application of LC-MS, GC-MS and NMR in metabolomics produces very rich and complex raw data, often for large numbers of samples. Data extraction, i.e. the translation of raw data or signals into accurate and concise clean data, is a daunting task. The increasing popularity of high-resolution mass spectrometry (e.g. FT and Orbitrap systems), GCxGC-MS and high-throughput systems (e.g. UPLC) generates even larger amounts of complex data.

 

Several data extraction strategies exist for the various analytical techniques, e.g. binning, peak picking and deconvolution for NMR, and target processing, peak picking, deconvolution, etc. for LC-MS/GC-MS. These rely on software provided by instrument vendors or independent companies, or on homemade tools (public domain or proprietary). All data extraction methods and software packages have their pros and cons with respect to critical issues such as throughput and data accuracy/quality.
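
As an illustration of the simplest of these strategies, equidistant binning of an NMR spectrum can be sketched as follows. This is a minimal sketch, not any vendor's or tool's implementation: the function name `bin_spectrum`, the 0.04 ppm default bin width and the example values are assumptions chosen for illustration only.

```python
def bin_spectrum(ppm, intensity, bin_width=0.04):
    """Sum intensities into fixed-width chemical-shift bins (equidistant binning).

    ppm and intensity are parallel sequences; bin_width is an assumed setting.
    """
    bins = {}
    for shift, value in zip(ppm, intensity):
        # Bin index for this data point; bins are aligned at 0 ppm.
        index = int(shift // bin_width)
        bins[index] = bins.get(index, 0.0) + value
    # Return (bin start in ppm, summed intensity) pairs, sorted by chemical shift.
    return [(round(i * bin_width, 6), total) for i, total in sorted(bins.items())]


# Hypothetical four-point spectrum, for illustration only.
binned = bin_spectrum([1.01, 1.02, 1.05, 3.21], [1.0, 2.0, 4.0, 8.0])
```

Binning reduces the number of variables and tolerates small shifts in peak position, at the cost of resolution: signals that fall into the same bin can no longer be distinguished.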

 

Metabolomics collaboration is hindered by the wide variety of data extraction strategies and tools in use. N data extraction tools (each with a multitude of user-defined settings) applied to the same raw data result in at least N different clean data sets, and ultimately in at least N different statistical models.
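
The effect of user-defined settings can be shown with a toy example. The sketch below applies a deliberately simple local-maximum peak picker to the same trace with two different intensity thresholds. The function `pick_peaks`, the labels `tool_A` and `tool_B`, the thresholds and the trace values are all hypothetical; real tools use far more elaborate algorithms.

```python
def pick_peaks(signal, threshold):
    """Return indices of local maxima whose intensity exceeds the threshold."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > threshold and signal[i - 1] < signal[i] >= signal[i + 1]
    ]


# One hypothetical raw trace, processed with two hypothetical settings.
trace = [0, 2, 1, 0, 5, 9, 4, 1, 3, 1, 0]
settings = {"tool_A": 1.0, "tool_B": 4.0}
peak_lists = {name: pick_peaks(trace, thr) for name, thr in settings.items()}

for name, peaks in peak_lists.items():
    print(name, peaks)
```

With the lower threshold three peaks are reported, with the higher threshold only one, so the two "clean" data sets already differ before any statistical modelling takes place.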

 

Goal:

Improve the possibilities for collaboration in nutritional metabolomics by sharing experiences with different data extraction methods and proposing one or more standard or reference methods.

 

Approach:

To make this workshop a success, we invite experts to contribute and to present their views on data extraction as described above.

Topics discussed in the session on data extraction will include:

  • What’s around for the various analytical techniques, and what are the pros and cons?
  • Is there a need for organizing round robins / benchmarking studies?
  • If so, how do we arrange this (who will do what, selection of well-characterized raw datasets, etc.)?
  • What’s the ideal situation, how do we get there?
  • Standardization of tools?
  • Clear, standardized documentation of clean data sets: how were they obtained, what do the data represent, etc.

 

Result:

Criteria are to be defined for what constitutes a well-characterized raw dataset. Based on these criteria, data set(s) should be selected for testing the different extraction methods of potential interest.

Decision on a round robin/benchmarking study

Recent publications:
We strongly encourage all participants of this session to have a look at the selection of recent publications prepared by the Moderators and Organizing Committee. Each participant should have critical ideas on the present state of the art and will actively contribute to the elaboration of recommendations when taking part in this session.

The pdf files for the session data extraction can be found at:
pdf files_data extraction

The slides shown during this session can be found in the attachments



Attachment: OpenMS-DataExtraction-NuGO-2007-12.pdf (2743 KB)
PowerPoint Presentation: NUGO Workshop Dec 2007 Data Extraction.ppt (12709 KB)