Online Databases & Libraries

These online resources contain lists of chemicals that may be detected using non-targeted analysis as well as information about the chemicals such as structures, molecular properties, mass spectral data, and/or toxicity information.

PubChem (https://pubchem.ncbi.nlm.nih.gov/) is an open chemistry database at the National Institutes of Health (NIH). Since its launch in 2004, PubChem has become a key chemical information resource for scientists, students, and the general public. PubChem mostly contains small molecules, but it also includes larger molecules such as nucleotides, carbohydrates, lipids, peptides, and chemically-modified macromolecules. It collects information on chemical structures, identifiers, chemical and physical properties, biological activities, patents, health, safety, and toxicity data, among other topics.
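PubChem records can also be retrieved programmatically through its PUG REST web service. The following is a minimal sketch rather than an official client: the helper names build_property_url and extract_properties are hypothetical, and the choice of properties is only an illustration of what might be useful when checking a candidate from a non-targeted analysis.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(name, properties=("MolecularFormula", "MonoisotopicMass", "InChIKey")):
    """Build a PUG REST URL that looks up properties for a compound name.

    Hypothetical helper: it only assembles the URL and performs no request.
    """
    encoded_name = quote(name, safe="")  # percent-encode spaces and slashes
    return (
        f"{PUG_REST_BASE}/compound/name/{encoded_name}"
        f"/property/{','.join(properties)}/JSON"
    )


def extract_properties(payload):
    """Return the property records from a decoded PUG REST JSON reply.

    An empty list is returned when the expected keys are missing.
    """
    return payload.get("PropertyTable", {}).get("Properties", [])
```

The URL from build_property_url("caffeine") can be fetched with any HTTP client, and the decoded JSON passed to extract_properties. Network access, error handling, and request rate limits are left out of this sketch.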

ChemSpider (http://www.chemspider.com/) is a free chemical structure database providing fast access to over 90 million structures, properties, and associated information. By integrating and linking compounds from hundreds of high-quality data sources, ChemSpider gives researchers a comprehensive view of freely available chemical data from a single online search. It is owned by the Royal Society of Chemistry.

ChEBI (https://www.ebi.ac.uk/chebi/) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The term ‘molecular entity’ refers to any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity. The molecular entities in question are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified.

CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) is a part of a suite of databases and web applications developed by the US Environmental Protection Agency’s Chemical Safety for Sustainability Research Program. These databases and apps support EPA’s computational toxicology research efforts to develop innovative methods to change how chemicals are currently evaluated for potential health risks. EPA researchers integrate advances in biology, biotechnology, chemistry, and computer science to identify important biological processes that may be disrupted by the chemicals. The combined information helps prioritize chemicals based on potential health risks.

MassBank of North America (MoNA) (https://mona.fiehnlab.ucdavis.edu/) is a metadata-centric, auto-curating repository designed for efficient storage and querying of mass spectral records. It is intended to serve as the framework for a centralized, collaborative database of metabolite mass spectra, metadata, and associated compounds. MoNA currently contains over 200,000 mass spectral records from experimental and in silico libraries as well as from user contributions.

MassBank (Europe) (https://massbank.eu/MassBank/) is a member of the MassBank Consortium (https://github.com/MassBank) alongside the founding MassBank of Japan (http://massbank.jp/) and MassBank of North America (https://massbank.us). The aim of the European MassBank is to provide an open-access, vendor-independent repository for mass spectral data that supports the screening and identification of unknown compounds, with a particular focus on environmental samples that reflects the interests of its European contributors. MassBank.EU now contains >88,000 spectra of >16,500 compounds from >40 instrument/ionisation types and >45 contributors.
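Identifying an unknown compound against a spectral library usually comes down to scoring how similar a measured spectrum is to each reference spectrum. The sketch below shows one generic approach, a cosine score on binned peaks. It is an illustration only and is not the scoring method of MassBank or any other resource listed here; the function names and the 0.01 m/z bin width are assumptions.

```python
import math
from collections import defaultdict


def bin_peaks(peaks, bin_width=0.01):
    """Sum the intensities of (m/z, intensity) peaks that share an m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_score(query, reference, bin_width=0.01):
    """Cosine similarity between two peak lists, from 0.0 (no shared bins)
    to 1.0 (identical relative intensities). Returns 0.0 for empty spectra."""
    q = bin_peaks(query, bin_width)
    r = bin_peaks(reference, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(q[k] * r[k] for k in q.keys() & r.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in r.values())
    )
    return dot / norm if norm else 0.0
```

In practice a query spectrum would be scored against every candidate record and the highest-scoring matches reviewed by hand. Fixed-width binning can split a peak that sits on a bin edge, so tolerance-based peak matching is often preferred.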

The NIST Chemistry WebBook (https://webbook.nist.gov/) provides users with easy access to chemical and physical property data for chemical species through the internet. The data provided on the site are from collections maintained by the NIST Standard Reference Data Program and outside contributors. Mass spectra are provided by the NIST Mass Spectrometry Data Center. Supplementary data, such as the source of the spectrum, instrument type, instrument parameters, and the EPA MS number, are displayed below the spectrum.

mzCloud (https://www.mzcloud.org/) is a mass spectral database that assists analysts in identifying compounds in areas such as life sciences, metabolomics, pharmaceutical research, toxicology, forensic investigations, environmental analysis, food control and various industrial applications. mzCloud™ features a freely searchable collection of high resolution/accurate mass spectra using a new third generation spectra correlation algorithm. Online access to the database is free of charge and no registration is required.

The NIST MS Data Center (https://chemdata.nist.gov/dokuwiki/doku.php?id=chemdata:start) site provides information and access to NIST mass spectral data products including EI and tandem MS libraries (small molecule and peptide), a GC retention index collection as well as certain freely available, specialized spectral libraries. Freely available data analysis tools include AMDIS (Automated Mass Spectral Deconvolution and Identification System for GC/MS), the Mass Spectrum Interpreter (connects chemical structures with mass spectra), and the Mass Spectral Digitizer Program.

METLIN (https://metlin.scripps.edu) is a data management system to assist in metabolite and chemical entity identification by providing public access to its repository of comprehensive MS/MS metabolite data. METLIN’s annotated list of molecular standards includes metabolites and other chemical entities. The METLIN database was developed and is maintained solely by the Siuzdak laboratory at The Scripps Research Institute.