Systematic Process for the Identification of “Known Unknowns” in Commercial Products by GC-MS and LC-MS
Introduction: In the last 34 years, we have developed a systematic process for the identification of “known unknowns” by GC-MS and LC-MS in commercial products. We define “known unknowns” as non-targeted species which are known in the chemical literature or mass spectrometry reference databases, but unknown to the investigator. The process is shown in the following simplified flowchart:
This process is described in detail in the February 2013 issue of LCGC, in the “MS-The Practical Art” column, in an article entitled “Identifying ‘Known Unknowns’ in Commercial Products by Mass Spectrometry.” The article is also available on the Chromatography Online website.
NIST Search of EI and CID Spectra: The first step in our process is a computer search of EI (GC-MS) or CID (LC-MS) spectra against reference databases using the NIST MS Search program. EI searches normally work better than CID searches, but the latter are still very useful. We employ purchased, in-house, and internet databases:
We use the NIST and Wiley commercial databases, but there are many other specialty databases that others might find useful.
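The idea behind a library search can be illustrated with a simplified dot-product (cosine) similarity between a query spectrum and each reference spectrum. This is a minimal sketch, not the NIST MS Search algorithm: the m/z and intensity weighting exponents, the function names, and the spectra are assumptions for illustration. Scores are scaled to 0-999 only to resemble the match-factor scale that NIST MS Search reports.

```python
import math


def match_factor(query, reference, mz_power=1.0, intensity_power=0.5):
    """Cosine (dot-product) similarity of two centroided spectra, scaled to 0-999.

    query, reference: dicts mapping nominal m/z -> intensity.
    Each peak is weighted as (m/z ** mz_power) * (intensity ** intensity_power);
    these exponents are illustrative assumptions, not NIST's parameters.
    """
    def weighted(spectrum):
        return {mz: (mz ** mz_power) * (inten ** intensity_power)
                for mz, inten in spectrum.items() if inten > 0}

    q, r = weighted(query), weighted(reference)
    dot = sum(q[mz] * r[mz] for mz in q.keys() & r.keys())  # only shared peaks contribute
    norm = math.sqrt(sum(v * v for v in q.values()) * sum(v * v for v in r.values()))
    return 0 if norm == 0 else round(999 * dot / norm)


def search_library(query, library, top_n=5):
    """Return (match factor, name) pairs for the best-matching library entries."""
    scored = [(match_factor(query, spectrum), name) for name, spectrum in library.items()]
    return sorted(scored, reverse=True)[:top_n]


# Hypothetical, abbreviated spectra for illustration only (not reference data).
library = {
    "candidate A": {43: 100, 58: 60, 71: 20},
    "candidate B": {41: 80, 55: 100, 69: 40},
}
query = {43: 95, 58: 55, 71: 25}
print(search_library(query, library))
```

Real search programs add refinements such as reverse searches and corrections for peaks missing from the library spectrum; the sketch shows only the core comparison.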
The NIST search interfaces easily with a wide variety of manufacturers’ data processing and structural drawing programs:
“Spectra-Less” Database Searching: If the NIST search is not successful, accurate mass data are used to obtain a molecular formula (MF), a monoisotopic mass, or an average molecular weight. These values are then used to search very large databases such as the CAS Registry (>70 million entries) or ChemSpider (>28 million entries) through their web interfaces. We call these “spectra-less” databases because they contain no computer-searchable mass spectral data. We originally used this approach to search the TSCA database or our Eastman Chemical plant material listing.
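Turning an accurate mass into candidate molecular formulas, or a formula into its monoisotopic mass and average molecular weight, is simple arithmetic. The minimal sketch below assumes a neutral monoisotopic mass (the ion mass with the adduct already removed), considers only C, H, N, and O, and uses arbitrary element limits and a 5 ppm window; the function names are hypothetical. Candidates must also have a non-negative, whole-number count of rings plus double bonds.

```python
import re

# Monoisotopic masses of the most abundant isotopes (u) and standard atomic weights.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
AVG = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Parse a simple formula such as 'C8H10N4O2' into {'C': 8, 'H': 10, 'N': 4, 'O': 2}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def mass(formula, table=MONO):
    """Monoisotopic mass (default) or average molecular weight (table=AVG)."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


def format_formula(c, h, n, o):
    """Write a CHNO formula, omitting zero counts and counts of 1."""
    return "".join(el if k == 1 else f"{el}{k}"
                   for el, k in (("C", c), ("H", h), ("N", n), ("O", o)) if k > 0)


def candidate_formulas(neutral_mass, ppm=5.0, max_c=40, max_n=6, max_o=10):
    """List CcHhNnOo formulas within +/- ppm of a neutral monoisotopic mass, closest first."""
    tol = neutral_mass * ppm / 1e6
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                # The hydrogen count is whatever mass remains after C, N, and O.
                rest = neutral_mass - c * MONO["C"] - n * MONO["N"] - o * MONO["O"]
                h = round(rest / MONO["H"])
                if h < 0:
                    continue
                m = c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]
                rdbe = c - h / 2 + n / 2 + 1  # rings plus double bonds
                if abs(m - neutral_mass) <= tol and rdbe >= 0 and rdbe.is_integer():
                    hits.append((format_formula(c, h, n, o), round(m, 5)))
    return sorted(hits, key=lambda hit: abs(hit[1] - neutral_mass))


print(round(mass("C8H10N4O2"), 4), round(mass("C8H10N4O2", AVG), 2))
print(candidate_formulas(194.0804))
```

Even a narrow mass window usually returns several formulas, which is why the ancillary information described next is needed to choose among them.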
The candidate structures from the CAS Registry or ChemSpider searches are prioritized either by the number of associated references or by key words. Other ancillary information, such as fragment ions in EI or CID spectra, isotopic abundances, UV spectra, types of ion adducts, CI data, and the number of exchangeable protons, is used to narrow the list to one structure. Other pages on this website include many screenshots of SciFinder and ChemSpider searches that illustrate these approaches with examples.
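The prioritize-then-filter logic can be sketched in a few lines. This is a hypothetical illustration, not our actual software: the candidate records, reference counts, and exchangeable-proton counts are invented. The M+1 estimate is a rough approximation that counts only 13C (about 1.1% per carbon) and 15N (about 0.37% per nitrogen). The observed number of exchangeable protons is assumed to come from, for example, an H/D exchange experiment.

```python
import re


def parse_formula(formula):
    """Parse a simple formula such as 'C9H10O2' into element counts."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def expected_m1_percent(formula):
    """Rough M+1 abundance (% of M), counting only 13C (~1.1%/C) and 15N (~0.37%/N)."""
    counts = parse_formula(formula)
    return 1.1 * counts.get("C", 0) + 0.37 * counts.get("N", 0)


def prioritize(candidates, observed_m1=None, m1_tol=1.0, exchangeable_h=None):
    """Drop candidates that contradict the ancillary data, then rank by reference count."""
    kept = []
    for cand in candidates:
        if observed_m1 is not None and abs(expected_m1_percent(cand["formula"]) - observed_m1) > m1_tol:
            continue  # isotope pattern inconsistent with this formula
        if exchangeable_h is not None and cand["exchangeable_h"] != exchangeable_h:
            continue  # e.g. an H/D exchange experiment rules this structure out
        kept.append(cand)
    return sorted(kept, key=lambda cand: cand["references"], reverse=True)


# Hypothetical candidates; names, reference counts, and proton counts are invented.
candidates = [
    {"name": "candidate-1", "formula": "C9H10O2", "references": 1200, "exchangeable_h": 1},
    {"name": "candidate-2", "formula": "C9H10O2", "references": 300, "exchangeable_h": 0},
    {"name": "candidate-3", "formula": "C9H10O2", "references": 4500, "exchangeable_h": 0},
]
for cand in prioritize(candidates, observed_m1=10.0, exchangeable_h=0):
    print(cand["name"], cand["references"])
```

Isomers sharing one formula pass the isotope check together, so a structural constraint such as the exchangeable-proton count is what separates them.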
Model EI and CID Spectra from NIST Structure Search: The NIST MS Search program can also rank model compounds by structural similarity, using structure searches of both our commercial and in-house databases. This is particularly valuable for finding model compounds for a proposed structure found in searches of “spectra-less” databases such as the CAS Registry and ChemSpider.
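How structure-similarity ranking works in general can be shown with the Tanimoto coefficient on sets of substructure keys. This sketch does not describe the NIST program's own similarity method, which is not covered here. The substructure keys, model names, and function names are hypothetical; real programs compute fingerprints directly from the structures.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets of substructure keys (0 to 1)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_models(proposed, library, top_n=3):
    """Return (similarity, name) pairs for library structures most similar to the proposed one."""
    scored = [(tanimoto(proposed, keys), name) for name, keys in library.items()]
    return sorted(scored, reverse=True)[:top_n]


# Hypothetical substructure keys for a proposed structure and three library entries.
proposed = {"benzene ring", "ester", "methoxy"}
library = {
    "model-1": {"benzene ring", "ester"},
    "model-2": {"benzene ring", "ester", "methoxy", "chloro"},
    "model-3": {"alkene", "ketone"},
}
for score, name in rank_models(proposed, library):
    print(f"{name}: {score:.2f}")
```

The EI or CID spectra of the top-ranked models can then be compared with the unknown's spectrum to check whether the proposed structure fragments as expected.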
As noted in the table above, there are ~800,000 structures associated with EI spectra and 100,000 structures associated with CID spectra in our computer-searchable databases. We use the NIST MS Interpreter program to automatically correlate fragment ions in the EI and CID spectra with substructures of the component.
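The basic bookkeeping behind fragment-ion correlation is matching each observed m/z to the ion masses of candidate substructure formulas. This sketch does not reproduce the NIST MS Interpreter's algorithm. It assumes accurate-mass data, singly charged cations (neutral mass minus one electron mass), a 0.005 u tolerance, and a hypothetical list of fragment formulas; the peak list is also illustrative.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
ELECTRON = 0.00054858  # electron mass (u), removed when forming a cation


def parse_formula(formula):
    """Parse a simple formula such as 'C7H5O' into element counts."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def ion_mz(formula, charge=1):
    """m/z of a positive ion with the given elemental formula and charge."""
    neutral = sum(MONO[el] * n for el, n in parse_formula(formula).items())
    return (neutral - charge * ELECTRON) / charge


def annotate(peaks, fragment_formulas, tol=0.005):
    """Map each observed m/z to the candidate fragment formulas within +/- tol (u)."""
    return {mz: [f for f in fragment_formulas if abs(ion_mz(f) - mz) <= tol] for mz in peaks}


# Hypothetical peaks and fragment formulas for a benzoyl-containing structure.
peaks = [105.0335, 77.0386, 91.0542]
fragments = ["C7H5O", "C6H5", "C4H3"]
print(annotate(peaks, fragments))
```

A peak left without an assignment, such as 91.0542 here, points to a substructure the proposed structure does not explain.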
“No Results” from Process: Some non-targeted species in a sample will be “unknown unknowns”, i.e., species not found in any reference library or “spectra-less” database. A few thoughts on their identification are discussed in another section.
Future Improvements Needed: The approach works well for the majority of our samples, which are fairly simple and contain components with molecular weights <500 daltons. However, improvements are needed for complex samples and for components with molecular weights >500 daltons.