QA/QC Metrics

A universal benchmark for “good” or “acceptable” NTA performance has not yet been established by the community. Study goals and approaches often vary, making a one-size-fits-all approach to NTA performance assessment impractical. Furthermore, knowledge of the chemicals contained within a given sample (or knowledge of samples that belong to a specific group) is often necessary for the appropriate calculation and implementation of performance metrics. To date, only a small subset of NTA studies have examined standardized samples and reported one or more of these metrics.

Despite these challenges, we believe that researchers should always conduct performance self-assessments and transparently report findings using shared terminology. Additionally, since NTA studies aim to characterize the unknown in complex samples, we believe that it is critical to define QA/QC approaches and performance metrics. Further, it is important that researchers deliberately develop study designs that incorporate QA/QC and yield the necessary data to support such performance assessments. Although universal QA/QC protocols for NTA studies do not exist, Schulze et al. (2020) provide an excellent overview of the QA/QC procedures applied to date for NTA in environmental matrices and make preliminary recommendations for QA/QC guidelines. Here, we focus on developing terminology, metrics, and approaches that support clear reporting of QA/QC and NTA performance assessments.

Our proposed approach to performance assessment for NTA studies is described in detail in a recent publication and summarized below.

We propose that NTA performance be assessed with respect to both Data Acquisition and Data Processing & Analysis (see detailed content below for each of these two subcategories). We further propose that NTA performance be communicated by describing four aspects: the quality, boundary, accuracy, and precision of the NTA methods (including both acquisition and data processing/analysis methods) and results.

For the purpose of NTA performance self-assessment, we define quality, boundary, accuracy, and precision as follows (adapted from IUPAC 2019):

Quality: The quality assurance and quality control (QA/QC) practices, benchmarks, and assessments for the non-targeted analysis.

Boundary: The chemical and analytical space covered by the non-targeted analysis.

Accuracy: The closeness of agreement between results of a non-targeted analysis (e.g., mass error, sample classification, chemical identification) and the known (true) value.

Precision: The closeness of agreement, in terms of repeatability and reproducibility, between results of a non-targeted analysis when components of the experiment are replicated and results are reproduced.

For each of these aspects, we provide (in the subcategories below) examples of performance data to assess and report with respect to Data Acquisition and Data Processing & Analysis, as well as questions to consider when evaluating the reporting about performance in NTA studies. We emphasize that although researchers can (and we hope will) use the information contained in this section to develop strategies for performance evaluation and critically examine the outcomes of such performance assessments, the focus of the proposed questions is on the quality and completeness of reporting about the performance assessment, rather than the quality of the data/study itself.


Data Acquisition QA/QC

The data acquisition QA/QC subcategory covers performance with respect to Objectives & Scope, Sample Information & Preparation, Chromatography, and Mass Spectrometry. Assessments of data acquisition performance most often rely on data from QC Spikes & Samples, but certain aspects can also be evaluated with data from real samples (i.e., samples with unknown chemical constituents).

We note that NTA researchers can and should consider aspects of data acquisition performance during study design. For example, prior to completing any work, a researcher should develop quality assurance (QA) protocols for sample handling, should consider the anticipated chemical boundary of their sample preparation and analytical methods, and should develop a data acquisition plan that will support subsequent assessment of accuracy and precision (e.g., by repeated analyses of a spiked QC sample throughout the analytical batch). Detailed information on these study design aspects and appropriate reporting can be found in the Methods sections of the website.

In Table 5.1, we present performance aspects and supporting data that are intended to support the assessment of observed data acquisition performance (i.e., after data acquisition has been completed), along with proposed questions for evaluating the quality and completeness of the reporting about these aspects. In some cases, an accurate and complete description of the study design and method conditions may be sufficient to describe data acquisition performance, such as to indicate the chemical boundary of the acquired data. However, we suggest that researchers provide data-supported confirmation of the implied chemical boundary and/or report any deviations from expected results, where relevant data are available.

Table 5.1 – Results Reporting: Data Acquisition QA/QC

Performance Aspect: Quality

Recommended Data to Assess and Report:
Adherence to QA practices and QC benchmarks for sample preparation and data acquisition, such as:
● Adherence to sample handling/processing protocols
● Adherence to randomization and replication protocols
● Assessment of carryover from samples to blanks
● Assessment of QC data relative to stated benchmarks
● Assessment of analytical batch or other time/storage-related effects
See Sample Information & Preparation and QC Spikes & Samples for additional information.

Proposed Questions for Evaluation of Performance Reporting:
● Did the authors communicate their adherence to or deviation from stated QA/QC protocols and benchmarks for sample preparation and data acquisition?
● Did the authors discuss the possible implications of any deviations from QA/QC protocols and benchmarks?

Performance Aspect: Boundary

Recommended Data to Assess and Report:
Observed chemical space of the sample preparation method, such as:
● Extraction recoveries for QC spikes
See Sample Information & Preparation and QC Spikes & Samples for additional information.

Proposed Question for Evaluation of Performance Reporting:
● Did the authors discuss possible impacts and/or report observed impacts (e.g., extraction recoveries) of the sample preparation method(s) on the recoverable chemical space?

Recommended Data to Assess and Report:
Observed capabilities of the chromatographic method, such as:
● Separation of isomeric compounds of interest
● Polarity range of detected compounds
See Chromatography for additional information.

Proposed Question for Evaluation of Performance Reporting:
● Did the authors discuss possible impacts and/or report observed impacts (e.g., poor separation of specific QC compounds) of the chromatographic method(s) on the observable chemical space?

Recommended Data to Assess and Report:
Observed capabilities of the mass spectrometry method, such as:
● Description of ionization mode impact on observed chemical space
● Observed matrix suppression or enhancement of QC Spikes & Samples
See Mass Spectrometry for additional information.

Proposed Question for Evaluation of Performance Reporting:
● Did the authors discuss possible impacts and/or report observed impacts (e.g., limits of detection) of the mass spectrometry method(s) on the observable chemical space?

Performance Aspect: Accuracy

Recommended Data to Assess and Report:
Assessment of chromatographic accuracy, such as:
● Deviation of retention time (RT) from expected RT for QC Spikes & Samples

Proposed Question for Evaluation of Performance Reporting:
● Did the authors report the expected RTs of known chemicals, and describe the deviation of observed RTs from expected RTs?

Recommended Data to Assess and Report:
Assessment of mass accuracy, such as:
● Observed mass error range for QC Spikes & Samples

Proposed Question for Evaluation of Performance Reporting:
● Did the authors list the monoisotopic masses of known chemicals, and describe deviations of the observed accurate masses from the known values?

Performance Aspect: Precision

Recommended Data to Assess and Report:
Assessment of variability (i.e., repeatability/reproducibility) across replicates, samples, or batches for the following:
● Mass error
● RT
● Peak intensity (area or height)
Note: Variability can be reported as RSD/SD/CV or as a trend, and can be evaluated for QC Spikes & Samples or for features/compounds detected in real samples (a minimal calculation sketch follows this table).

Proposed Questions for Evaluation of Performance Reporting:
● Did the authors communicate the repeatability of key measures (e.g., accurate mass, RT, peak intensity) across the sample set?
● Did the authors communicate the reproducibility of key measures (e.g., accurate mass, RT, peak intensity) across batches or relative to previous analyses?
● Did the authors discuss the possible sources of observed variability and report any corrections that were applied to address observed variability?
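To make the accuracy and precision entries in Table 5.1 concrete, the following minimal Python sketch computes a ppm mass error range and a peak-area RSD for a single QC compound. The compound (caffeine), m/z values, and peak areas are hypothetical illustration values, not benchmarks.

```python
# Minimal sketch: mass accuracy (ppm error) and peak-intensity precision (RSD)
# for a hypothetical QC spike measured across four injections.
import statistics

def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million relative to the theoretical m/z."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def rsd_percent(values: list[float]) -> float:
    """Relative standard deviation (%RSD, i.e., CV) of replicate values."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical QC compound: caffeine [M+H]+, theoretical m/z 195.0877
observed_mzs = [195.0875, 195.0880, 195.0874, 195.0879]
peak_areas = [1.02e6, 0.98e6, 1.05e6, 0.97e6]

errors = [ppm_error(mz, 195.0877) for mz in observed_mzs]
print(f"Mass error range: {min(errors):+.1f} to {max(errors):+.1f} ppm")
print(f"Peak area RSD: {rsd_percent(peak_areas):.1f}%")
```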

Data Processing and Analysis QA/QC

The data processing and analysis QA/QC subcategory covers performance with respect to Data Processing, Statistical & Chemometric Analysis, and Annotation & Identification. These assessments of performance most often rely on data from QC Spikes & Samples, but certain aspects can also be evaluated with data from real samples (i.e., samples with unknown chemical constituents).

In Table 5.2, we present performance aspects and supporting data that are intended to support the assessment of observed data processing and analysis performance, along with proposed questions for evaluation of the quality and completeness of the reporting about these aspects.

Table 5.2 – Results Reporting: Data Processing & Analysis QA/QC

Performance Aspect: Quality

Recommended Data to Assess and Report:
Results of QC checks throughout the data processing & analysis workflow, such as:
● Detection of QC features/compounds (either intentionally spiked or known to be present)
● Alignment of features across technical replicates
● Filtering of compounds known to be present in blanks (a minimal sketch follows this table)
See Data Processing for additional information.

Proposed Question for Evaluation of Performance Reporting:
● Did the authors communicate the extent to which stated QC benchmarks for data processing and analysis were met?

Performance Aspect: Boundary

Recommended Data to Assess and Report:
Description of the capabilities of the data processing and analysis methods, such as:
● Chemical space (e.g., Kow, ionizability, etc.) of the selected library/database
● Information available (e.g., MS/MS spectra) in the selected library/database
● Constraints to chemical space introduced by use of data analysis approaches such as mass defect analysis, molecular networking, etc.
● Estimated or semi-quantitative limits of detection/identification
See Data Processing for additional information.

Proposed Questions for Evaluation of Performance Reporting:
● Did the authors discuss the extent to which the data processing & analysis methods may have limited their abilities to observe/identify individual chemicals, chemical classes, or general chemical space?
● Did the authors report observed limits of detection/identification (if assessed)?

Performance Aspect: Accuracy

Recommended Data to Assess and Report:
Ability of the method to correctly classify samples or identify known chemicals. Accuracy can be described using various performance calculations associated with the confusion matrix for:
1. samples with known classification/grouping, or
2. known compounds in QC spikes/samples
Definitions and associated calculations are provided below (see: The Confusion Matrix in NTA & Corresponding Performance Calculations).

Proposed Questions for Evaluation of Performance Reporting:
● Did the authors communicate the method’s ability to correctly classify known samples or identify known chemicals, and provide performance measures (if calculated)?
● Did the authors communicate the possible implications of the observed performance with respect to the accuracy/selectivity of the classification method (samples) or the detection/identification workflow (compounds)?

Performance Aspect: Precision

Recommended Data to Assess and Report:
Repeatability/reproducibility of performance (e.g., correct identification) for QC samples or QC spikes analyzed multiple times (e.g., across replicates, samples, or batches). Precision can be described using various performance calculations associated with the confusion matrix for:
1. samples with known classification/grouping, or
2. known compounds in QC spikes/samples
Definitions and associated calculations are provided below (see: The Confusion Matrix in NTA & Corresponding Performance Calculations).

Proposed Questions for Evaluation of Performance Reporting:
● Did the authors communicate the repeatability (e.g., across replicates, samples) or reproducibility (e.g., across batches) of the method’s ability to correctly classify known samples or identify known chemicals, and provide performance measures (if calculated)?
● Did the authors discuss possible sources of observed variability and communicate any possible implications of the observed variability?
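To illustrate the blank-filtering QC check listed in the Quality row of Table 5.2, here is a minimal sketch that retains a feature only when its sample intensity exceeds the blank intensity by a fold-change threshold. The 3x threshold, feature IDs, and intensities are assumed values for illustration only.

```python
# Minimal sketch: blank filtering. A feature is kept only if its sample
# intensity exceeds the blank intensity by a fold-change threshold.
# The 3x default and all values below are illustrative, not recommendations.

def passes_blank_filter(sample_intensity: float, blank_intensity: float,
                        fold_threshold: float = 3.0) -> bool:
    """Return True if the feature survives the blank filter."""
    if blank_intensity == 0:
        return sample_intensity > 0
    return sample_intensity / blank_intensity >= fold_threshold

# Hypothetical features: (feature_id, mean sample area, mean blank area)
features = [("F001", 5.2e5, 1.0e4), ("F002", 2.1e4, 1.9e4), ("F003", 8.0e3, 0.0)]
kept = [fid for fid, sample, blank in features if passes_blank_filter(sample, blank)]
print(kept)  # ['F001', 'F003']; F002 fails the 3x threshold and is removed
```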

The Confusion Matrix in NTA & Corresponding Performance Calculations

A confusion matrix is an error matrix (often presented as a table) that is used to summarize the results of a performance evaluation when the true values are known. To support use of a confusion matrix (Figure 5.1), we first define the four confusion matrix classifications (true positive, true negative, false positive, and false negative). We then present a suite of corresponding performance calculations. Generally, these metrics can be applied to assess the performance of sample classification or of compound annotation/identification. Detailed discussion of the complexities and NTA-relevant applications of the confusion matrix and the performance calculations will be presented in a forthcoming BP4NTA manuscript (in preparation, anticipated submission 2021), and updated in this section of the website.

True Positive (TP): A sample that is correctly reported as the positive condition; a compound that is correctly reported as present in the sample; or a feature that is correctly reported as a real detection.

False Positive (FP): A sample that is incorrectly reported as the positive condition; a compound that is incorrectly reported as present in the sample; or a feature that is incorrectly reported as a real detection.

True Negative (TN): A sample that is correctly reported as the negative condition; a compound that is correctly reported as absent from a sample; or a feature that is correctly reported as an artifact/noise.

False Negative (FN): A sample that is incorrectly reported as the negative condition; a compound that is incorrectly reported as absent from a sample; or a feature that is incorrectly reported as an artifact/noise.

Figure 5.1. A general confusion matrix. TP, FP, FN, and TN each refer to the number of samples or compounds that are classified as such – the sum of TP, FP, FN, and TN is equal to the total number of samples or the total number of compounds considered in the study.
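As a minimal illustration of these four classifications, the sketch below tallies TP, FP, TN, and FN for compound identification against a sample of known composition. The candidate list, spiked compounds, and reported results are hypothetical.

```python
# Minimal sketch: tallying TP, FP, TN, FN for compound identification against
# a sample of known composition. The candidate "universe", spiked compounds,
# and workflow output below are hypothetical.

def confusion_counts(candidates: set[str], truly_present: set[str],
                     reported_present: set[str]) -> dict[str, int]:
    """Count confusion matrix cells over a defined candidate list."""
    return {
        "TP": len(reported_present & truly_present),
        "FP": len(reported_present - truly_present),
        "FN": len(truly_present - reported_present),
        "TN": len(candidates - truly_present - reported_present),
    }

candidates = {"atrazine", "caffeine", "DEET", "bisphenol A", "ibuprofen"}
truly_present = {"atrazine", "caffeine", "DEET"}          # spiked compounds
reported_present = {"atrazine", "caffeine", "ibuprofen"}  # workflow output

print(confusion_counts(candidates, truly_present, reported_present))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}; the four cells sum to all 5 candidates
```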

The performance calculations are presented in pairs or groups of related metrics. We note that the terms “accuracy” and “precision” were defined above as overall aspects of NTA performance that can be assessed and reported on via the compilation of data and questions described in the Data Acquisition and Data Processing & Analysis performance tables. However, these terms are also used to describe specific calculations related to the confusion matrix. While potentially unclear at first pass, this duplicate usage of the same terminology is in keeping with broader definitions of these terms and use of these calculations beyond NTA. Careful reporting will support disentanglement of the specific calculations for accuracy and precision from the overall accuracy or precision of an NTA method (which can be described using both calculated accuracy and calculated precision, as well as the suite of other, interrelated performance calculations that are presented below).

① Calculation Pair 1: True Positive Rate (TPR; also known as “Recall”, “Sensitivity”, or “Hit Rate”) and False Negative Rate (FNR; also known as “Miss Rate”)

  • TPR = TP/(TP+FN)
  • FNR = FN/(FN+TP)
  • FNR = 1-TPR
  • For sample classification:
    • TPR = the proportion of samples that were correctly reported as the positive condition, relative to all samples that actually are the positive condition.
    • FNR = the proportion of samples that were incorrectly reported as the negative condition, relative to all samples that actually are the positive condition.
  • For compound annotation/identification:
    • TPR = the proportion of compounds that were correctly reported as present in a sample, relative to all compounds actually present in the sample.
    • FNR = the proportion of compounds that were incorrectly reported as absent from a sample, relative to all compounds actually present in the sample.
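A minimal sketch of Calculation Pair 1, reusing the hypothetical counts from the confusion matrix sketch above (TP = 2, FN = 1):

```python
# Minimal sketch of Calculation Pair 1 using hypothetical counts.

def true_positive_rate(tp: int, fn: int) -> float:
    """TPR (recall, sensitivity, hit rate): TP / (TP + FN)."""
    return tp / (tp + fn)

def false_negative_rate(tp: int, fn: int) -> float:
    """FNR (miss rate): FN / (FN + TP), equivalently 1 - TPR."""
    return fn / (fn + tp)

tp, fn = 2, 1
print(f"TPR = {true_positive_rate(tp, fn):.2f}")  # 0.67: 2 of 3 spiked compounds found
print(f"FNR = {false_negative_rate(tp, fn):.2f}")  # 0.33
```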

② Calculation Pair 2: False Positive Rate (FPR; also known as “Fall-Out”) and True Negative Rate (TNR; also known as “Specificity” or “Selectivity”)

  • FPR = FP/(TN+FP)
  • TNR = TN/(TN+FP)
  • FPR = 1-TNR
  • For sample classification:
    • FPR = the proportion of samples that were incorrectly reported as the positive condition, relative to all samples that actually are the negative condition.
    • TNR = the proportion of samples that were correctly reported as the negative condition, relative to all samples that actually are the negative condition.
  • For compound annotation/identification:
    • FPR = the proportion of compounds that were incorrectly reported as present in the sample, relative to all compounds that are actually absent from the sample.
    • TNR = the proportion of compounds that were correctly reported as absent from the sample, relative to all compounds that are actually absent from the sample.
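A minimal sketch of Calculation Pair 2, reusing the same hypothetical counts (FP = 1, TN = 1):

```python
# Minimal sketch of Calculation Pair 2 using hypothetical counts.

def false_positive_rate(fp: int, tn: int) -> float:
    """FPR (fall-out): FP / (TN + FP), equivalently 1 - TNR."""
    return fp / (tn + fp)

def true_negative_rate(fp: int, tn: int) -> float:
    """TNR (specificity, selectivity): TN / (TN + FP)."""
    return tn / (tn + fp)

fp, tn = 1, 1
print(f"FPR = {false_positive_rate(fp, tn):.2f}")  # 0.50
print(f"TNR = {true_negative_rate(fp, tn):.2f}")   # 0.50
```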

③ Calculation Pair 3: Precision (“Positive Predictive Value”) and False Discovery Rate (FDR)

  • Precision = TP/(TP+FP)
  • FDR = FP/(FP+TP)
  • Precision = 1-FDR
  • For sample classification:
    • Precision = the proportion of samples that were correctly reported as the positive condition, relative to all samples reported as the positive condition.
    • FDR = the proportion of samples that were incorrectly reported as the positive condition, relative to all samples reported as the positive condition.
  • For compound annotation/identification:
    • Precision = the proportion of compounds correctly reported as present in the sample, relative to all compounds reported as present in the sample.
    • FDR = the proportion of compounds incorrectly reported as present in the sample, relative to all compounds reported as present in the sample.
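A minimal sketch of Calculation Pair 3, reusing the same hypothetical counts (TP = 2, FP = 1):

```python
# Minimal sketch of Calculation Pair 3 using hypothetical counts.

def precision(tp: int, fp: int) -> float:
    """Precision (positive predictive value): TP / (TP + FP)."""
    return tp / (tp + fp)

def false_discovery_rate(tp: int, fp: int) -> float:
    """FDR: FP / (FP + TP), equivalently 1 - precision."""
    return fp / (fp + tp)

tp, fp = 2, 1
print(f"Precision = {precision(tp, fp):.2f}")       # 0.67
print(f"FDR = {false_discovery_rate(tp, fp):.2f}")  # 0.33: 1 of 3 reported hits is wrong
```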

④ Calculation Group 4: F1 Score, Accuracy, and Matthews Correlation Coefficient (MCC)

  • F1 score
    • The harmonic mean of TPR (i.e., Recall) and Precision
    • F1 score = 2*TP/(2*TP + FP + FN), or equivalently F1 score = (2 * Precision * TPR)/(Precision + TPR)
    • Perfect recall and precision will yield an F1 score = 1.
  • Accuracy
    • Accuracy = (TP + TN)/(TP + TN + FP + FN)
    • For sample classification:
      • Accuracy = the proportion of samples that were correctly reported (for both positive and negative conditions), relative to all samples.
    • For compound annotation/identification:
      • Accuracy = the proportion of compounds correctly reported (both present and absent), relative to all compounds.
  • Matthews Correlation Coefficient (MCC)
    • MCC = (TP * TN − FP * FN) / [(TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)]^(1/2)
    • MCC is a measure of classification quality that treats each class as a variable and computes their correlation coefficient. MCC ranges from -1 (perfect misclassification) to 1 (perfect classification), with values near zero indicating random guessing (Depner et al. 2020; Chicco & Jurman 2020).
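A minimal sketch of Calculation Group 4, reusing the same hypothetical counts (TP = 2, FP = 1, TN = 1, FN = 1):

```python
# Minimal sketch of Calculation Group 4 using hypothetical counts.
import math

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(TP + TN) / total samples or compounds considered."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0 when a marginal sum is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

tp, fp, tn, fn = 2, 1, 1, 1
print(f"F1 = {f1_score(tp, fp, fn):.2f}")            # 0.67
print(f"Accuracy = {accuracy(tp, tn, fp, fn):.2f}")  # 0.60
print(f"MCC = {mcc(tp, tn, fp, fn):.2f}")            # 0.17: barely better than random
```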

Literature Examples

  1. Nuñez et al., 2019, J. Chem. Inf. Model.: This study is an example of assessing performance for compound annotation/identification, and used Precision, TPR (Recall), FNR, FDR, and F1 Score. Importantly, this study analyzed synthetic blinded samples from EPA’s Non-Targeted Analysis Collaborative Trial (ENTACT) study, and did not attempt to annotate all observed features, but instead focused on a well-defined list of chemicals whose presence/absence was known (before unblinding, a suspect list that included all spiked chemicals was used; after unblinding, a list of all spiked chemicals was made available). For each compound on the well-defined list, a decision of “Observed” or “Not Observed” was reached.
  2. Sobus et al., 2019, Analytical and Bioanalytical Chemistry: This study is an example of assessing performance for compound annotation/identification. The authors used TPR and FNR and attempted to assign a formula and compound ID to each observed (“real”) feature. This study also analyzed synthetic blinded samples from the ENTACT study and focused on a chemical space defined by a suspect screening list (before unblinding) and the known chemicals (after unblinding).
  3. Chao et al., 2020, Analytical and Bioanalytical Chemistry: This study is an example of assessing performance for compound annotation/identification. The authors used TPR and FPR (and related receiver operating characteristic [ROC] curves) to examine the effectiveness of candidate filtering procedures based on matching of experimental and in silico spectra (a minimal ROC sketch follows this list). Importantly, for this study, multiple candidates could be considered TNs or FPs for any given feature, which affects final performance metrics.
  4. Self et al., 2019, Journal of Food Safety: This study is an example of assessing performance for sample classification. The authors used FP and FN to demonstrate method performance for identification of decomposed seafood samples using chemical profiles generated via non-targeted analysis.
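As an illustration of the ROC-style evaluation described in example 3, the following minimal sketch sweeps a match-score threshold and records an (FPR, TPR) point at each cutoff. The scores and true/false labels are invented and do not reflect data from Chao et al. (2020).

```python
# Minimal, hypothetical sketch of an ROC-style evaluation: sweep a spectral
# match-score threshold and compute (FPR, TPR) at each distinct cutoff.

def roc_points(scored: list[tuple[float, bool]]) -> list[tuple[float, float]]:
    """Return (FPR, TPR) pairs, one per distinct score threshold.
    Each element of `scored` is (match_score, is_true_candidate)."""
    positives = sum(1 for _, truth in scored if truth)
    negatives = len(scored) - positives
    points = []
    for threshold in sorted({score for score, _ in scored}, reverse=True):
        tp = sum(1 for score, truth in scored if score >= threshold and truth)
        fp = sum(1 for score, truth in scored if score >= threshold and not truth)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical candidate match scores with known true/false status
scored = [(0.91, True), (0.84, True), (0.77, False), (0.65, True), (0.40, False)]
for fpr, tpr in roc_points(scored):
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```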

References & Other Relevant Literature

Acevedo, M. A., Corrada-Bravo, C. J., Corrada-Bravo, H., Villanueva-Rivera, L. J., & Aide, T. M. (2009). Automated classification of bird and amphibian calls using machine learning: A comparison of methods. Ecological Informatics, 4(4), 206-214. doi:10.1016/j.ecoinf.2009.06.005

Alexandrov, T., Ovchinnikova, K., Palmer, A., Kovalev, V., Tarasov, A., Stuart, L., . . . Shahidi-Latham, S. (2019). METASPACE: A Community-Populated Knowledge Base of Spatial Metabolomes in Health and Disease. bioRxiv. doi:10.1101/539478

Alygizakis, N. A., Samanipour, S., Hollender, J., Ibáñez, M., Kaserzon, S., Kokkali, V., . . . Thomas, K. V. (2018). Exploring the Potential of a Global Emerging Contaminant Early Warning Network through the Use of Retrospective Suspect Screening with High-Resolution Mass Spectrometry. Environmental Science & Technology, 52(9), 5135-5144. doi:10.1021/acs.est.8b00365

Bader, T., Schulz, W., Kümmerer, K., & Winzenbacher, R. (2016). General strategies to increase the repeatability in non-target screening by liquid chromatography-high resolution mass spectrometry. Analytica Chimica Acta, 935, 173-186. doi:10.1016/j.aca.2016.06.030

Bader, T., Schulz, W., Kümmerer, K., & Winzenbacher, R. (2017). LC-HRMS data processing strategy for reliable sample comparison exemplified by the assessment of water treatment processes. Analytical Chemistry, 89(24), 13219-13226. doi:10.1021/acs.analchem.7b03037

Baldi, P., Brunak, S., Chauvin, Y., Andersen, C. A., & Nielsen, H. (2000). Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics, 16(5), 412-424. doi:10.1093/bioinformatics/16.5.412

Broadhurst, D., Goodacre, R., Reinke, S. N., Kuligowski, J., Wilson, I. D., Lewis, M. R., & Dunn, W. B. (2018). Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics, 14(6), 72. doi:10.1007/s11306-018-1367-3

Callao, M. P., & Ruisánchez, I. (2018). An overview of multivariate qualitative methods for food fraud detection. Food Control, 86, 283-293. doi:10.1016/j.foodcont.2017.11.034

Chao, A., Al-Ghoul, H., McEachran, A. D., Balabin, I., Transue, T., Cathey, T., . . . Sobus, J. R. (2020). In silico MS/MS spectra for identifying unknowns: a critical examination using CFM-ID algorithms and ENTACT mixture samples. Analytical and Bioanalytical Chemistry, 412, 1303-1315. doi:10.1007/s00216-019-02351-7

Chicco, D., & Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1), 6. doi:10.1186/s12864-019-6413-7

Chong, E. Y., Huang, Y., Wu, H., Ghasemzadeh, N., Uppal, K., Quyyumi, A. A., . . . Yu, T. (2015). Local false discovery rate estimation using feature reliability in LC/MS metabolomics data. Scientific Reports, 5, 17221. doi:10.1038/srep17221

DeFelice, B. C., Mehta, S. S., Samra, S., Čajka, T., Wancewicz, B., Fahrmann, J. F., & Fiehn, O. (2017). Mass Spectral Feature List Optimizer (MS-FLO): A Tool To Minimize False Positive Peak Reports in Untargeted Liquid Chromatography-Mass Spectroscopy (LC-MS) Data Processing. Analytical Chemistry, 89(6), 3250-3255. doi:10.1021/acs.analchem.6b04372

Depner, C. M., Cogswell, D. T., Bisesi, P. J., Markwald, R. R., Cruickshank-Quinn, C., Quinn, K., . . . Wright, K. P. (2020). Developing preliminary blood metabolomics-based biomarkers of insufficient sleep in humans. Sleep, 43(7). doi:10.1093/sleep/zsz321

Drotleff, B., & Lämmerhofer, M. (2019). Guidelines for Selection of Internal Standard-Based Normalization Strategies in Untargeted Lipidomic Profiling by LC-HR-MS/MS. Analytical Chemistry, 91(15), 9836-9843. doi:10.1021/acs.analchem.9b01505

Dudzik, D., Barbas-Bernardos, C., García, A., & Barbas, C. (2018). Quality assurance procedures for mass spectrometry untargeted metabolomics. a review. Journal of Pharmaceutical and Biomedical Analysis, 147, 149-173. doi:10.1016/j.jpba.2017.07.044

Frousios, K., Iliopoulos, C. S., Schlitt, T., & Simpson, M. A. (2013). Predicting the functional consequences of non-synonymous DNA sequence variants–evaluation of bioinformatics tools and development of a consensus strategy. Genomics, 102(4), 223-228. doi:10.1016/j.ygeno.2013.06.005

IUPAC. Compendium of Chemical Terminology, 2nd ed. (the “Gold Book”). Compiled by A. D. McNaught and A. Wilkinson. Blackwell Scientific Publications, Oxford (1997). Online version (2019-) created by S. J. Chalk. ISBN 0-9678550-9-8. https://doi.org/10.1351/goldbook.

Kind, T., & Fiehn, O. (2007). Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics, 8, 105. doi:10.1186/1471-2105-8-105

Knolhoff, A. M., Premo, J. H., & Fisher, C. M. (2021). A Proposed Quality Control Standard Mixture and Its Uses for Evaluating Nontargeted and Suspect Screening LC/HR-MS Method Performance. Analytical Chemistry, 93(3), 1596-1603. doi:10.1021/acs.analchem.0c04036

Kunzelmann, M., Winter, M., Åberg, M., Hellenäs, K. E., & Rosén, J. (2018). Non-targeted analysis of unexpected food contaminants using LC-HRMS. Analytical and Bioanalytical Chemistry, 410(22), 5593-5602. doi:10.1007/s00216-018-1028-4

Machine Learning Crash Course. (2020). Retrieved 2/10/2020 from https://developers.google.com/machine-learning/crash-course/classification/true-false-positive-negative

Nuñez, J. R., Colby, S. M., Thomas, D. G., Tfaily, M. M., Tolic, N., Ulrich, E. M., . . . Renslow, R. S. (2019). Evaluation of In Silico Multifeature Libraries for Providing Evidence for the Presence of Small Molecules in Synthetic Blinded Samples. Journal of Chemical Information and Modeling, 59(9), 4052-4060. doi:10.1021/acs.jcim.9b00444

Nürenberg, G., Schulz, M., Kunkel, U., & Ternes, T. A. (2015). Development and validation of a generic nontarget method based on liquid chromatography – high resolution mass spectrometry analysis for the evaluation of different wastewater treatment options. Journal of Chromatography A, 1426, 77-90. doi:10.1016/j.chroma.2015.11.014

Palmer, A., Phapale, P., Chernyavsky, I., Lavigne, R., Fay, D., Tarasov, A., . . . Alexandrov, T. (2017). FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry. Nature Methods, 14(1), 57-60. doi:10.1038/nmeth.4072

Ramsundar, B., Eastman, P., Walters, P., & Pande, V. (2019). Deep Learning for the Life Sciences: O’Reilly Media, Inc.

Rostkowski, P., Haglund, P., Aalizadeh, R., Alygizakis, N., Thomaidis, N., Arandes, J. B., . . . Yang, C. (2019). The strength in numbers: comprehensive characterization of house dust using complementary mass spectrometric techniques. Analytical and Bioanalytical Chemistry, 411(10), 1957-1977. doi:10.1007/s00216-019-01615-6

Scheubert, K., Hufsky, F., Petras, D., Wang, M., Nothias, L. F., Dührkop, K., . . . Böcker, S. (2017). Significance estimation for large scale metabolomics annotations by spectral matching. Nature Communications, 8(1), 1494. doi:10.1038/s41467-017-01318-5

Schrimpe-Rutledge, A. C., Codreanu, S. G., Sherrod, S. D., & McLean, J. A. (2016). Untargeted Metabolomics Strategies-Challenges and Emerging Directions. Journal of the American Society for Mass Spectrometry, 27(12), 1897-1905. doi:10.1007/s13361-016-1469-y

Schulze, B., Jeon, Y., Kaserzon, S., Heffernan, A. L., Dewapriya, P., O’Brien, J., . . . Samanipour, S. (2020). An assessment of quality assurance/quality control efforts in high resolution mass spectrometry non-target workflows for analysis of environmental samples. TrAC Trends in Analytical Chemistry, 133, 116063. doi:10.1016/j.trac.2020.116063

Schymanski, E. L., Singer, H. P., Slobodnik, J., Ipolyi, I. M., Oswald, P., Krauss, M., . . . Hollender, J. (2015). Non-target screening with high-resolution mass spectrometry: critical review using a collaborative trial on water analysis. Analytical and Bioanalytical Chemistry, 407(21), 6237-6255. doi:10.1007/s00216-015-8681-7

Self, R. L., McLendon, M. G., & Lock, C. M. (2019). Determination of decomposition in Salmon products by mass spectrometry with sensory‐driven multivariate analysis. Journal of Food Safety, 39(5), e12676. doi:10.1111/jfs.12676

Sleighter, R. L., Chen, H., Wozniak, A. S., Willoughby, A. S., Caricasole, P., & Hatcher, P. G. (2012). Establishing a measure of reproducibility of ultrahigh-resolution mass spectra for complex mixtures of natural organic matter. Analytical Chemistry, 84(21), 9184-9191. doi:10.1021/ac3018026

Sobus, J. R., Grossman, J. N., Chao, A., Singh, R., Williams, A. J., Grulke, C. M., . . . Ulrich, E. M. (2019). Using prepared mixtures of ToxCast chemicals to evaluate non-targeted analysis (NTA) method performance. Analytical and Bioanalytical Chemistry, 411(4), 835-851. doi:10.1007/s00216-018-1526-4

Sokolova, M., & Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4), 427-437. doi:10.1016/j.ipm.2009.03.002

Sumner, L. W., Amberg, A., Barrett, D., Beale, M. H., Beger, R., Daykin, C. A., . . . Viant, M. R. (2007). Proposed minimum reporting standards for chemical analysis Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics, 3(3), 211-221. doi:10.1007/s11306-007-0082-2

Trullols, E., Ruisánchez, I., & Rius, F. X. (2004). Validation of qualitative analytical methods. TrAC Trends in Analytical Chemistry, 23(2), 137-145. doi:10.1016/s0165-9936(04)00201-8

Xu, L., Wang, X., Jiao, Y., & Liu, X. (2018). Assessment of potential false positives via orbitrap-based untargeted lipidomics from rat tissues. Talanta, 178, 287-293. doi:10.1016/j.talanta.2017.09.046

Zhu, X., Li, S., Shan, Y., Zhang, Z., Li, G., Su, D., & Liu, F. (2010). Detection of adulterants such as sweeteners materials in honey using near-infrared spectroscopy and chemometrics. Journal of Food Engineering, 101(1), 92-97. doi:10.1016/j.jfoodeng.2010.06.014