Glossary

Accuracy- Overall, the closeness of agreement between results of a non-targeted analysis (e.g., mass error, sample classification, chemical identification) and the known (true) value. For sample classification, the proportion of samples that were correctly reported (for both positive and negative conditions), relative to all samples. For compound annotation/identification, the proportion of compounds correctly reported (both present and absent), relative to all compounds. [for further discussion, see section on QA/QC Metrics]
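As a minimal sketch of the proportion-based definition above, accuracy can be computed from the four confusion matrix counts (TP, TN, FP, FN). The function name and the example counts are illustrative assumptions, not values from any study.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of correct reports (TP + TN) relative to all reports."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined when there are no results")
    return (tp + tn) / total


# Hypothetical counts: 40 TP, 50 TN, 5 FP, 5 FN out of 100 samples.
example_accuracy = accuracy(tp=40, tn=50, fp=5, fn=5)
```

Because both positive and negative conditions count toward the numerator, accuracy can look high when one condition dominates the data set, which is one reason the other metrics in this glossary are usually reported alongside it.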

Analytical sequence- Encompasses both the analytical run order (see Run order) and the use of single vs. multiple analytical batches. [for further discussion, see section on Data Acquisition]

Annotation of a feature- The attribution of one or more properties or molecular characteristics to an MS1 feature (or component thereof, such as an isotopologue, adduct, or in-source product ion), or MS/MS product ion. Specific component, feature, or product ion annotations may not provide enough evidence to confidently identify a single compound. Examples of annotations include designation of an observed m/z@RT as a specific adduct, assignment of a molecular formula to a feature or an MS/MS product ion, and assignment of a suggested substructure to an MS/MS product ion. [for further discussion, see section on Data Processing and Analysis]

Blank- A type of sample used to detect artificially introduced contamination (due to sample handling, processing, instrumental analysis, etc.) and distinguish it from true sample content. [for further discussion, see section on Study Design]

Boundary- The chemical and analytical space of the non-targeted analysis. [for further discussion, see section on QA/QC Metrics]

Chemical space- The physicochemical property space spanned by the detectable and identifiable chemicals in a given NTA study, as defined by method, analytical, and data processing choices. [for further discussion, see section on Study Design]

Compound database- A structured collection of chemical substance information (e.g., chemical identifiers, intrinsic properties, structural identifiers, and retention times) in an exchangeable format and, commonly, a visually interpretable format. Databases are often used for compound-level annotation queries as well as for data compilation, organization, and management. [for further discussion, see section on Data Processing and Analysis]

Confusion matrix- An error matrix (often presented as a table) that is used to summarize the results of a performance evaluation when the true values are known. [for further discussion, see section on QA/QC Metrics]

False Negative (FN)- A sample that is incorrectly reported as the negative condition; a compound that is incorrectly reported as absent from a sample; or a feature that is incorrectly reported as an artifact/noise. [for further discussion, see section on QA/QC Metrics]

False Positive (FP)- A sample that is incorrectly reported as the positive condition; a compound that is incorrectly reported as present in the sample; or a feature that is incorrectly reported as a real detection. [for further discussion, see section on QA/QC Metrics]

True Negative (TN)- A sample that is correctly reported as the negative condition; a compound that is correctly reported as absent from a sample; or a feature that is correctly reported as an artifact/noise. [for further discussion, see section on QA/QC Metrics]

True Positive (TP)- A sample that is correctly reported as the positive condition; a compound that is correctly reported as present in the sample; or a feature that is correctly reported as a real detection. [for further discussion, see section on QA/QC Metrics]
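The four outcomes above (FN, FP, TN, TP) can be tallied for compound reporting with simple set operations. This is a sketch under stated assumptions: the true sample content is known (e.g., a spiked mixture), and `candidates` is a hypothetical, finite list of all compounds considered, which is needed to count true negatives at all. The function name and compound names are illustrative only.

```python
def confusion_counts(reported, present, candidates):
    """Count TP, FP, FN, TN for compounds reported in one sample.

    reported   -- compounds the analysis reported as present
    present    -- compounds actually present (known truth)
    candidates -- every compound considered (assumed finite universe)
    """
    reported, present, candidates = set(reported), set(present), set(candidates)
    return {
        "TP": len(reported & present),               # correctly reported present
        "FP": len(reported - present),               # reported, but actually absent
        "FN": len(present - reported),               # present, but not reported
        "TN": len(candidates - reported - present),  # correctly reported absent
    }


# Illustrative example: three compounds spiked, three reported.
counts = confusion_counts(
    reported={"caffeine", "PFOA", "DEET"},
    present={"caffeine", "atrazine", "PFOA"},
    candidates={"caffeine", "atrazine", "PFOA", "PFOS", "DEET"},
)
# counts == {"TP": 2, "FP": 1, "FN": 1, "TN": 1}
```

Arranging these four counts in a two-by-two table gives the confusion matrix defined earlier in this glossary.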

False Negative Rate (FNR)- For sample classification, the proportion of samples that were incorrectly reported as the negative condition, relative to all samples that actually are the positive condition. For compound annotation/identification, the proportion of compounds that were incorrectly reported as absent from a sample, relative to all compounds actually present in the sample. [for further discussion, see section on QA/QC Metrics]

False Positive Rate (FPR)- For sample classification, the proportion of samples that were incorrectly reported as the positive condition, relative to all samples that actually are the negative condition. For compound annotation/identification, the proportion of compounds that were incorrectly reported as present in the sample, relative to all compounds that are actually absent from the sample. [for further discussion, see section on QA/QC Metrics]

True Negative Rate (TNR)- For sample classification, the proportion of samples that were correctly reported as the negative condition, relative to all samples that actually are the negative condition. For compound annotation/identification, the proportion of compounds that were correctly reported as absent from the sample, relative to all compounds that are actually absent from the sample. [for further discussion, see section on QA/QC Metrics]

True Positive Rate (TPR)- For sample classification, the proportion of samples that were correctly reported as the positive condition, relative to all samples that actually are the positive condition. For compound annotation/identification, the proportion of compounds that were correctly reported as present in a sample, relative to all compounds actually present in the sample. [for further discussion, see section on QA/QC Metrics]
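The four rates above share two denominators: all actual positives (TP + FN) and all actual negatives (FP + TN). A minimal sketch, with a hypothetical function name and made-up counts:

```python
def classification_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return TPR, FNR, FPR, and TNR from confusion matrix counts."""
    actual_pos = tp + fn  # everything that truly is the positive condition
    actual_neg = fp + tn  # everything that truly is the negative condition
    if actual_pos == 0 or actual_neg == 0:
        raise ValueError("rates are undefined without both conditions present")
    return {
        "TPR": tp / actual_pos,
        "FNR": fn / actual_pos,
        "FPR": fp / actual_neg,
        "TNR": tn / actual_neg,
    }


# Hypothetical counts: 10 actual positives and 10 actual negatives.
rates = classification_rates(tp=8, fp=1, tn=9, fn=2)
```

By construction, TPR + FNR = 1 and FPR + TNR = 1, so reporting one rate from each pair is sufficient.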

Data acquisition method- The instrumental method and analytical sequence, which together encompass the operator decisions used to analyze the samples, blanks, standards, and controls. It should be designed with the scope of the NTA experiment in mind. [for further discussion, see section on Data Acquisition]

Data format conversion- Conversion of raw data files to formats usable in subsequent data processing, without intentionally interpreting the raw data. For example, conversion of data files from a proprietary vendor format to an open data format such as mzML, mzXML, or netCDF. [for further discussion, see section on Data Processing and Analysis]

Data processing- Encompasses all steps that transform the raw or converted data into meaningful information, prior to statistical & chemometric analyses and/or annotation & identification efforts. The inputs to data processing are raw or converted data file(s), and the output is a list of features in each sample (with associated chromatography, MS, and MS/MS data for each feature) for further analysis. [for further discussion, see section on Data Processing and Analysis]

F1 Score- The harmonic mean of TPR (i.e., Recall) and Precision. [for further discussion, see section on QA/QC Metrics]
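As a short illustration of the harmonic mean, the F1 score can be sketched from confusion matrix counts as follows. The function name and counts are illustrative assumptions; returning 0.0 when there are no true positives is a common convention adopted here, not something this glossary prescribes.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of Precision and TPR (Recall)."""
    if tp == 0:
        return 0.0  # assumed convention: no true positives gives F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: Precision = 0.8 and Recall = 0.8, so F1 = 0.8.
example_f1 = f1_score(tp=8, fp=2, fn=2)
```

The same value can be obtained directly as 2TP / (2TP + FP + FN). True negatives do not enter the calculation.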

False Discovery Rate (FDR)- For sample classification, the proportion of samples that were incorrectly reported as the positive condition, relative to all samples reported as the positive condition. For compound annotation/identification, the proportion of compounds incorrectly reported as present in the sample, relative to all compounds reported as present in the sample. [for further discussion, see section on QA/QC Metrics]
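A minimal sketch of the FDR calculation, using a hypothetical function name and made-up counts. The denominator is everything reported as positive (TP + FP), which makes FDR the complement of Precision.

```python
def false_discovery_rate(tp: int, fp: int) -> float:
    """Proportion of positive reports that are incorrect: FP / (FP + TP)."""
    reported_pos = tp + fp
    if reported_pos == 0:
        raise ValueError("FDR is undefined when nothing is reported as positive")
    return fp / reported_pos


# Hypothetical counts: 10 compounds reported as present, 2 of them wrongly.
example_fdr = false_discovery_rate(tp=8, fp=2)
```

Note the contrast with the False Positive Rate, whose denominator is all actual negatives (FP + TN) rather than all positive reports.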

Feature- A set of mz@RT pairs that groups associated MS1 components (e.g., isotopologue, adduct, and in-source product ion m/z peaks), and is represented as a tensor of observed retention time, monoisotopic mass, and intensity (e.g., peak height or peak area). Associated MS2 product ions may also be grouped with the MS1 components of a feature during HRMS data processing, depending on the software algorithms. If no groupings exist, a feature can be a single mz@RT. The term “molecular feature” may also be used. [for further discussion, see section on Data Processing and Analysis]
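One way to picture this definition is as a small data structure: a feature carries a retention time, monoisotopic mass, and intensity, plus the grouped mz@RT components. The class and field names below are hypothetical and do not correspond to any particular software; the m/z values are illustrative, rounded values for a caffeine-like compound.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MzRt:
    """A single m/z-retention time pair (mz@RT)."""
    mz: float
    rt_min: float
    label: str = ""  # optional annotation, e.g. adduct or isotopologue


@dataclass
class Feature:
    """A grouping of associated MS1 components (hypothetical layout)."""
    rt_min: float
    monoisotopic_mass: float
    intensity: float  # e.g. peak height or peak area
    components: List[MzRt] = field(default_factory=list)


grouped_feature = Feature(
    rt_min=3.42,
    monoisotopic_mass=194.0804,
    intensity=1.2e6,
    components=[
        MzRt(195.0877, 3.42, "[M+H]+"),
        MzRt(196.0910, 3.42, "[M+H]+ 13C isotopologue"),
        MzRt(217.0696, 3.43, "[M+Na]+"),
    ],
)
```

When no groupings exist, the same structure holds a single entry in `components`.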

Identification of a feature- The attribution of a specific compound, within a stated identification scope (or at a stated confidence level), to a detected feature(s), when the annotated components or product ions provide enough evidence. [for further discussion, see section on Data Processing and Analysis]

Instrumental method- A detailed list of the conditions and parameters chosen for the acquisition of (high-resolution) mass spectrometry raw data files, including the conditions and parameters for associated chromatographic separations. [for further discussion, see section on Data Acquisition]

m/z-retention time pair (mz@RT)- A unique pairing of mass-to-charge ratio and retention time (RT) values. [for further discussion, see section on Data Processing and Analysis]

Matthews Correlation Coefficient (MCC)- A measure of classification quality that treats the true and reported classes as two binary variables and computes the correlation coefficient between them. MCC ranges from -1 (perfect misclassification) to 1 (perfect classification), with values near zero indicating performance no better than random guessing. [for further discussion, see section on QA/QC Metrics]
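For a two-class problem, MCC can be sketched from the confusion matrix counts as (TP x TN - FP x FN) divided by the square root of (TP + FP)(TP + FN)(TN + FP)(TN + FN). The function name is hypothetical, and returning 0.0 when the denominator is zero is a commonly used convention assumed here.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC for binary classification, computed from confusion matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when any marginal total is zero
    return numerator / denominator


perfect = matthews_corrcoef_from_counts(tp=5, tn=5, fp=0, fn=0)   # 1.0
inverted = matthews_corrcoef_from_counts(tp=0, tn=0, fp=5, fn=5)  # -1.0
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, so it remains informative when the positive and negative conditions are strongly imbalanced.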

Non-targeted analysis (NTA)- A theoretical concept that can be broadly defined as the characterization of the chemical composition of any given sample without the use of a priori knowledge regarding the sample’s chemical content. The resulting detections may be used to classify samples (using the entire chemical profile), and/or subsequent analyses may focus on the identification of individual chemicals. Also referred to as “non-target screening” and “untargeted screening,” among several other related terms. [for further discussion, see section on Study Design]

Precision- Overall, the closeness of agreement, in terms of repeatability and reproducibility, between results of a non-targeted analysis when components of the experiment are replicated and results are reproduced. For sample classification, the proportion of samples that were correctly reported as the positive condition, relative to all samples reported as the positive condition. For compound annotation/identification, the proportion of compounds correctly reported as present in the sample, relative to all compounds reported as present in the sample. [for further discussion, see section on QA/QC Metrics]

QC sample- Any of several sample types used for quality control, including (but not limited to) QC spike controls, spiked matrix samples, and pooled matrix samples. [for further discussion, see section on Study Design]

QC spike- A set volume or mass of a solution of analytical standards (single or multi-component; native or isotope-labeled; of known identity and purity) that is added to the sample(s) either before sample preparation or immediately prior to sample analysis. [for further discussion, see section on Study Design]

Quality- The quality assurance and quality control (QA/QC) practices, benchmarks, and assessments for the non-targeted analysis. [for further discussion, see section on QA/QC Metrics]

Run order- The arrangement of the samples, replicates, blanks, standards, and QC samples as they are sequentially analyzed by the instrument. [for further discussion, see section on Data Acquisition]

Spectral library- A repository of mass spectra (MS, MS/MS, MSn) formatted for direct spectral matching to support annotation and identification. The spectral library may include association with compound-level information (e.g., chemical identifiers such as CAS number and intrinsic properties). [for further discussion, see section on Data Processing and Analysis]

Statistical & chemometric analysis- Approaches used to aid interpretation of the reduced (but often still highly rich and complex) data that is produced by data processing, and to provide information about trends, clusters, or other relationships between samples and/or detections. [for further discussion, see section on Data Outputs]

Suspect screening- The identification of chemicals and/or chemical classes detected by an instrument, typically a mass spectrometer, by comparison to a predefined user list or library containing known chemicals of interest. [for further discussion, see section on Study Design]
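A very reduced sketch of the comparison step: an observed monoisotopic mass is matched against a predefined suspect list within a mass error tolerance. The function names, the 5 ppm tolerance, and the two-entry suspect list are illustrative assumptions. In practice, a mass match alone is an annotation, and further evidence (e.g., retention time or MS/MS spectra) is typically needed to support an identification.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6


def screen_suspects(observed_mass: float, suspects: dict, tol_ppm: float = 5.0) -> list:
    """Return names of suspects whose mass lies within tol_ppm of the observation."""
    return [
        name
        for name, mass in suspects.items()
        if abs(ppm_error(observed_mass, mass)) <= tol_ppm
    ]


# Illustrative suspect list: name -> monoisotopic mass (Da), rounded values.
suspect_list = {"caffeine": 194.0804, "atrazine": 215.0938}

hits = screen_suspects(194.0806, suspect_list)  # about 1 ppm from caffeine
```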