A compound that is unknown to an investigator is, in many cases, already known in the chemical literature. We refer to these types of non-targeted species as “known unknowns.” The term originated in a very different context, in a quote by Donald Rumsfeld.
The CAS Registry/CAplus is the largest collection of known substances and associated references. It is easily accessed through SciFinder, a subscription web-based service offered by the American Chemical Society. As of December 2012, the CAS Registry contained more than 70 million compounds, with more than 10,000 added daily. These compounds are associated with more than 36 million references and patents in CAplus.
We refer to this type of database as “spectra-less” because it contains no computer-searchable electron ionization (EI) or collision-induced dissociation (CID) mass spectra. Nevertheless, very useful results can be obtained by searching it with either a molecular formula (MF) or an average molecular weight. The resulting candidates are sorted by their number of associated references, or by key words, to find the most likely identifications. These candidates are then evaluated further against their mass spectral fragmentation and other ancillary information to arrive at the identification of the “known unknown.”
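The ranking step can be illustrated with a minimal Python sketch. It assumes the search hits have already been exported into simple records. The `Candidate` class, its fields, and `rank_candidates` are hypothetical names chosen for illustration and are not part of SciFinder. The records in the usage block are made-up placeholders, not real search results.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """One hit from a formula or average-MW search (hypothetical record layout)."""
    name: str
    formula: str
    reference_count: int


def rank_candidates(candidates: list[Candidate], top_n: Optional[int] = None) -> list[Candidate]:
    """Order hits by number of associated references, most-cited first.

    sorted() is stable, so hits with equal counts keep their original order.
    top_n=None returns the full ranked list.
    """
    ranked = sorted(candidates, key=lambda c: c.reference_count, reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Placeholder records for illustration only; not real search results.
    hits = [
        Candidate("isomer A", "C8H10N4O2", 12),
        Candidate("isomer B", "C8H10N4O2", 3400),
        Candidate("isomer C", "C8H10N4O2", 45),
    ]
    for c in rank_candidates(hits, top_n=2):
        print(c.name, c.reference_count)
```

Sorting by reference count only orders the shortlist. The fragmentation and ancillary checks still decide which candidate is the actual identification.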
We presented our initial results at the ASMS meeting in Salt Lake City, Utah, in 2010 and published additional information in the Journal of the American Society for Mass Spectrometry (JASMS) in 2011. The article was featured on the cover of the February 2011 issue of JASMS:
Prepress Version of ASMS Article with Screenshots Illustrating Examples:
Journal of the American Society for Mass Spectrometry (2011), DOI: 10.1007/s13361-010-0034-3.
New June 2012: Search SciFinder by Average MW with Screenshots:
The web-based version of SciFinder can now be searched by average molecular weight; the following screenshots illustrate an example:
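The query for this kind of search is an average molecular weight. This differs from the monoisotopic mass measured for a single isotopic peak in a mass spectrum. One way to obtain such a value from a proposed formula is to sum standard atomic weights, as in the minimal Python sketch below. The weight table holds abridged conventional IUPAC values for a few common elements. The parser handles only plain formulas without parentheses. The `within_window` helper and its 0.05 tolerance are hypothetical choices for illustration and do not reflect any SciFinder setting.

```python
import re

# Conventional standard atomic weights (abridged IUPAC values) for common elements.
# Extend this table for other elements; unknown symbols raise KeyError.
AVERAGE_ATOMIC_WEIGHT = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06,
    "P": 30.974, "F": 18.998, "Cl": 35.45, "Br": 79.904,
}


def average_mw(formula: str) -> float:
    """Average molecular weight of a plain formula such as 'C8H10N4O2'.

    Handles element symbols followed by optional counts only
    (no parentheses, hydrates, or charges).
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += AVERAGE_ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


def within_window(mw: float, target: float, tol: float = 0.05) -> bool:
    """True if mw lies within +/- tol of target (hypothetical, arbitrary tolerance)."""
    return abs(mw - target) <= tol


if __name__ == "__main__":
    mw = average_mw("C8H10N4O2")  # caffeine
    print(round(mw, 2))  # 194.19
```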
Poster Session ASMS 2010:
Additional Related Information:
A similar approach based on a free web-based product, ChemSpider, was published in the Journal of the American Society for Mass Spectrometry in 2012.
NIST offers a similar approach as part of its NIST MS Search software, with a somewhat smaller collection of compounds.
Roger Schenck from CAS mentions our approach in a brief internet article.