Posted by: tvasailor | December 30, 2012

Identifying “Unknown Unknowns”

Return to Home Page

There will be many non-targeted species found in environmental and commercial samples which are “unknown unknowns”: compounds that are not found in any spectral database or even in large “spectra-less” databases such as the CAS Registry or ChemSpider.  These species are noted as yielding “No Results” in our process flowchart.

Many of these will be “transformation products” formed by oxidation, dealkylation, hydrolysis, etc. of “known unknowns” identified in the sample of interest.  These components are much more difficult to identify, but information obtained in identifying other components in the sample will often facilitate their identification.

For example, many components will be metabolites formed from other components.  Thus software designed to identify metabolites by generating possible molecular formulas from expected biotransformations will prove useful.  Furthermore, the mass defect filter technique [1-3] can be used to search for compounds related to identified components.
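As a rough illustration of the mass defect filter idea, the sketch below keeps only those ions whose mass defect (the exact mass minus the nearest integer mass) lies within a window around that of an already identified component.  The function names, the 50 mDa window, and the example m/z values are assumptions made for illustration; they are not taken from references [1-3].

```python
def mass_defect(mz):
    """Return the mass defect: exact mass minus the nearest integer mass."""
    return mz - round(mz)


def mass_defect_filter(peaks, parent_mz, window_da=0.050):
    """Keep peaks whose mass defect is within +/- window_da of the parent's."""
    target = mass_defect(parent_mz)
    return [mz for mz in peaks if abs(mass_defect(mz) - target) <= window_da]


# Hypothetical identified component and four detected ions.
parent = 372.1591
detected = [388.1540, 358.1435, 301.9123, 410.3200]
related = mass_defect_filter(detected, parent)
print(related)
```

Ions that pass the filter are only candidates for related compounds; they still need to be confirmed from their fragmentation and other data.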

In most of our commercial samples, “unknown unknowns” are routinely identified from common fragment ions and neutral losses noted for components identified in the sample.  With this data and accurate mass information, additional components can often be easily identified.  These newly identified components are then added to our in-house mass spectral/structural database for future reference.
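The neutral-loss part of this approach can be sketched in a few lines: the difference between a precursor and each fragment is compared against a short table of common losses.  The table, the 3 mDa tolerance, the function name, and the example ions are illustrative assumptions, not our in-house list.

```python
# Monoisotopic masses (Da) of a few common neutral losses.
NEUTRAL_LOSSES = {
    "H2O": 18.010565,
    "NH3": 17.026549,
    "CO": 27.994915,
    "C2H4": 28.031300,
    "CO2": 43.989829,
}


def assign_neutral_losses(precursor_mz, fragment_mzs, tol_da=0.003):
    """Map each fragment m/z to the losses that match precursor - fragment."""
    assignments = {}
    for frag in fragment_mzs:
        delta = precursor_mz - frag
        hits = [name for name, mass in NEUTRAL_LOSSES.items()
                if abs(delta - mass) <= tol_da]
        if hits:
            assignments[frag] = hits
    return assignments


# Hypothetical precursor and fragment ions.
losses = assign_neutral_losses(195.0652, [177.0546, 151.0754, 120.0444])
print(losses)
```

With accurate mass data, a tight tolerance separates losses that share a nominal mass, such as CO and C2H4 (both 28 Da nominal).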


1.  Mass defect filter technique and its applications to drug metabolite identification by high-resolution mass spectrometry, J Mass Spectrom,  2009 Jul;44(7):999-1016. doi: 10.1002/jms.1610

2.  Metabolite Identification Using Multiple Mass Defect Filters and Higher Energy Collisional Dissociation on a Hybrid Mass Spectrometer, Spectroscopy 2008.

3.  Mass Defect Filtering:  a New Tool to Expedite Screening, Dereplication, and Identification of Natural Products, XenoBiotic Laboratories poster


Posted by: tvasailor | December 28, 2012

CID (MS/MS) Libraries


We routinely use both EI and CID (MS/MS) libraries to identify unknowns using the NIST MS Search.  The libraries employed include purchased commercial libraries and our in-house created libraries.  We do both spectral searches to identify compounds and structure searches to find model compounds.  Even if the component of interest is not in the library, spectral search results give valuable insight into substructural groups present in the structure of the unknown!


We can obtain nominal mass CID (collision induced dissociation), accurate mass CID (TOF), in-source CID (quad and TOF), and tandem CID spectra (QTOF, triple quad).  Below is an accurate mass QTOF spectrum obtained on our Agilent 6500 QTOF.


The following PDF file details information on creating and using in-house CID user libraries in NIST format:

Creating and Using_User_CID_Libraries


Posted by: tvasailor | May 24, 2012

Overview “Known Unknowns”

New:  ASMS 2020 Reboot Workshop:  Library Searching Presentation


Systematic Process for the Identification of “Known Unknowns” in Commercial Products by GC-MS and LC-MS

Note: Details on other subjects are found in the “My Topics” tab under the sailboat picture above or in the sidebar links to the right.

Introduction:  In the last 34 years, we have developed a systematic process for the identification of “known unknowns” by GC-MS and LC-MS in commercial products.  We define “known unknowns” as non-targeted species which are known in the chemical literature or mass spectrometry reference databases, but unknown to the investigator.  The process is shown in the following simplified flowchart:


This process is described in detail in the February 2013 issue of LCGC, “MS-The Practical Art.”  The article is entitled “Identifying ‘Known Unknowns’ in Commercial Products by Mass Spectrometry.”  A copy with associated ads is shown below:

LCGC PDF with Advertisements

or the direct link to the Chromatography Online Website:

LCGC Article On-Line Link

which can be printed using the instructions in this link.  The article originated from work presented at Pittcon in 2012.

NIST Search of EI and CID Spectra:  The initial step in our process utilizes computer searches of EI (GC-MS) or CID (LC-MS) spectra against reference databases using the NIST MS search.  The computer EI searches normally work better than the CID ones, but the latter are still very useful.  We employ purchased, in-house, and internet databases:

We use the NIST and Wiley commercial databases, but there are many other specialty databases that others might find useful.

The NIST search interfaces easily with a wide variety of manufacturers’ data processing and structural drawing programs:


“Spectra-Less” Database Searching:  If the NIST search is not successful, then accurate mass data is used to obtain a molecular formula (MF), a monoisotopic mass, or an average molecular weight.  This data is used to search very large databases such as the CAS Registry (>70 million entries) or ChemSpider (>28 million entries) via web interfaces.  We define them as “spectra-less” databases because they contain no computer-searchable mass spectral data.  We had originally used this approach to search the TSCA database or our Eastman Chemical plant material listing.
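The step from a molecular formula to a monoisotopic mass or an average molecular weight can be sketched as below.  This is a minimal illustration, not the software we use: the element tables cover only a few elements, the parser handles only simple formulas without parentheses or charges, and the function names are hypothetical.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.000000, "H": 1.007825, "N": 14.003074,
    "O": 15.994915, "S": 31.972071, "Cl": 34.968853,
}
# Average (standard) atomic weights for the same elements.
AVERAGE = {
    "C": 12.011, "H": 1.008, "N": 14.007,
    "O": 15.999, "S": 32.06, "Cl": 35.45,
}


def parse_formula(formula):
    """Parse a simple formula such as 'C19H16O4' into element counts."""
    counts = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(count or 1)
    return counts


def formula_mass(formula, table):
    """Sum the element masses from the chosen table for the given formula."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


# Warfarin, C19H16O4, used here only as a worked example.
print(round(formula_mass("C19H16O4", MONOISOTOPIC), 4))
print(round(formula_mass("C19H16O4", AVERAGE), 2))
```

The monoisotopic value is the one to compare with an accurate mass measurement; the average molecular weight is what most catalogs and older databases list.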

The candidate structures from the CAS Registry or ChemSpider searches are prioritized by either the number of associated references or key words.  Other ancillary information, such as mass spectral fragments in EI or CID spectra; isotopic abundances; UV spectra; types of ion adducts; CI data; number of exchangeable protons; etc., is used to narrow the list to one structure.  This website has many screenshots (SciFinder1, SciFinder2, ChemSpider) that illustrate these approaches with many examples.

Model EI and CID Spectra from NIST Structure Search:  The NIST MS Search program ranks model compounds employing structural searching of both our commercial and in-house databases.  This is particularly valuable for finding model compounds for a proposed structure found in searches of “spectra-less” databases such as the CAS Registry and ChemSpider.

As noted in the table above, there are ~800,000 structures associated with the EI spectra and 100,000 structures with CID mass spectra in our computer-searchable databases.  We use the NIST MS Interpreter  program to automatically correlate fragment ions in the EI and CID spectra with the component’s substructure.

“No Results” from Process:  There will be non-targeted species in the sample which are “unknown unknowns”, those not found in any reference libraries or “spectra-less” databases.  A few thoughts on their identification are discussed in another section.

Future Improvements Needed:  The approach works well for the majority of our samples which are fairly simple and contain components with molecular weights <500 daltons.  On the other hand, improvements are needed for complex samples and components with molecular weights >500 daltons.


Posted by: tvasailor | May 23, 2012

In Silico MS Fragmentation Data


Eastman employs four different “in silico” software programs for processing MS/MS (CID) and EI spectra in our identification efforts.  The results are definitely not as useful as searching or examining experimentally acquired CID spectra.  Nevertheless, in silico software programs and their associated spectra can shorten the time it takes to manually interpret and prioritize substructural information for candidate species.  In addition, ABSciex has some interesting in silico capabilities, and a section of a recent review article discusses the topic.

The software programs we use are listed below.

1.  Use of NIST MS Interpreter for EI and MS/MS Spectra

We routinely use NIST MS Interpreter to understand EI and MS/MS (APCI, electrospray) spectra.  It is a critical part of entering compounds into both our EI and MS/MS libraries.  It quickly allows us to determine if all the major fragments in our spectrum are consistent with the proposed structure.  In addition, it can be used to correlate fragment ions in a model compound’s spectrum to its substructure in the NIST and Wiley EI and MS/MS libraries and our in-house libraries.


It is supplied free of charge with the NIST Version 2.0 Software.  I have included screenshots of the MS Interpreter and a poster presented by NIST:

ms_interpreter_screenshots (2 pages)


2.  MetFrag with SciFinder and ChemSpider

MetFrag looks very interesting.  It allows me to use an automated connection to ChemSpider and a manual connection to SciFinder to obtain candidate structures.  In silico CID spectra are generated for all the candidate structures and are then compared to my acquired accurate mass CID spectrum.  In many of my examples, the best results are obtained by sorting by the “# of Explained Peaks” instead of the “Score.”
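The idea of ranking candidates by the number of explained peaks can be illustrated with a short sketch.  This is not MetFrag’s algorithm or scoring; it simply counts how many observed peaks fall within a ppm tolerance of any predicted fragment.  The function names, the 10 ppm tolerance, and all m/z values are hypothetical.

```python
def explained_peaks(observed_mzs, predicted_mzs, tol_ppm=10.0):
    """Count observed peaks within tol_ppm of any predicted fragment m/z."""
    count = 0
    for obs in observed_mzs:
        if any(abs(obs - pred) / pred * 1e6 <= tol_ppm for pred in predicted_mzs):
            count += 1
    return count


def rank_candidates(observed_mzs, candidates, tol_ppm=10.0):
    """Return (name, n_explained) pairs, most explained peaks first."""
    scored = [(name, explained_peaks(observed_mzs, frags, tol_ppm))
              for name, frags in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical observed spectrum and in silico fragment lists.
observed = [176.0585, 148.0636, 96.0444]
candidates = {
    "candidate_A": [176.0586, 120.0500],
    "candidate_B": [176.0586, 148.0637, 96.0444],
}
ranking = rank_candidates(observed, candidates)
print(ranking)
```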

I have shown the results for the approach with both ChemSpider and SciFinder for trazodone:


Their web links are:



If you want to try the trazodone example, the following link contains the mass spec data and the SDF file needed for the MetFrag query:


3.  Identifying “Known Unknowns” Using ChemSpider and Automated MS/MS Structure Correlation

A poster session was given at the 2012 ASMS conference in Vancouver.  The work was done as a collaborative effort between Agilent, ChemSpider (RSC), and Eastman Chemical Company.  The approach is described below with a link to the poster session in PDF format:

  • Accurate mass data acquired by LC/MS in data-dependent MS/MS mode
  • Agilent MassHunter Qualitative Analysis software used to find compounds and generate molecular formulas using monoisotopic mass, isotope abundance, and isotope spacing, as well as look for matching formulas for each fragment ion and its neutral loss from the precursor
  • Prototype Agilent MassHunter Molecular Structure Correlator software uses ChemSpider interface to obtain candidate structures for a target molecular formula and associated number of references for candidate structures
  • Calculation of correlation scores to explain MS/MS fragmentation pattern for each candidate structure using a “systematic bond-breaking” approach
  • Results sorted by number of references and/or correlation score to evaluate structure candidates for identification of components


4.  MathSpec

MathSpec, Inc. is a privately held company based in the Chicago area. The founder, Dr. Daniel L. Sweeney, believes that mass spectral fragmentation data can best be interpreted by viewing small molecules as mathematical partitions of the molecular weight. Dan has been doing LC-MS and interpreting MS/MS data for over twenty years.

It seems to work well, and he is also modifying it to find similar compounds.  He added the capability to sort by the number of associated synonyms, which is similar to the concept of sorting by the number of references in other programs.


Also see his YouTube videos describing his new software, including add-ins in Excel.  Just search with Google for “MathSpec YouTube.”

Return to Home Page



I presented a paper at Pittcon 2012 describing the identification of “known unknowns.”  The session, “Accurate Mass and Novel Applications of Mass Spectrometry for Unknown Environmental Analyses Symposium,” was arranged by Michael Thurman and Imma Ferrer, University of Colorado.

My presentation summarized the approaches we employ at Eastman for the identification of unknowns, including searching mass spectral databases (EI, CID) and “spectra-less” databases such as the CAS Registry and ChemSpider.

Here is a link to the talk:


Links to related articles on ChemSpider and the CAS Registry (SciFinder/STN Express) can be found on the sidebar of my webpage.  Additional information on creating, using, and distributing personal libraries in NIST format is also found in those links.

Return to Home Page


NIST has greatly facilitated the utilization of the NIST Version 2 library search software package with a wide variety of other data processing and structural drawing packages.


The NIST software includes a data processing program:


We use AMDIS, but we also use the NIST search with a variety of other data processing programs.  I have included some links with instructions for setting up several of these programs below.

David Sparkman indicated that problems are sometimes noted in transferring spectra from a manufacturer’s data processing program to the NIST search.  This can often be fixed by deleting the AUTOIMP.MSD file found in the same directory as the installed NIST search software.  The file will then be created correctly when the transfer of a spectrum from the manufacturer’s program to the NIST search is next started.
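For those who prefer to script the fix, a minimal sketch follows.  The install directory shown is a hypothetical placeholder, since the location depends on the NIST version and how it was installed; the function name is also made up for this example.

```python
from pathlib import Path


def remove_autoimp(mssearch_dir):
    """Delete AUTOIMP.MSD from the given directory; return True if removed."""
    target = Path(mssearch_dir) / "AUTOIMP.MSD"
    if target.is_file():
        target.unlink()
        return True
    return False


if __name__ == "__main__":
    # Hypothetical install location; substitute the directory on your system.
    removed = remove_autoimp(r"C:\NIST\MSSEARCH")
    print("AUTOIMP.MSD removed" if removed else "AUTOIMP.MSD not found")
```

Close the NIST search program before running it so that the file is not in use.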

Agilent MassHunter:




I occasionally had problems getting MassLynx to work; see the attached fix:




Various Structural Drawing Packages:


Agilent LC-MSD software:


ACD Software:


Wsearch: This program exports spectra to NIST search and supports many different file formats.


Using NIST as Corporate Database/Archive:


Return to Home Page


David Sparkman and Steve Stein showed me some new exact mass search capabilities introduced with the NIST 11 Library at the ASMS conference in Denver.  They thought that these new capabilities might be useful in the identification of “known unknowns.”

In many cases, an unknown to an investigator is known in the chemical literature.  We refer to these types of compounds as “known unknowns.”  The term originated from a quote by Donald Rumsfeld:


Recent changes in the software allow one to search by accurate mass values with an associated error window.  The results can be displayed and sorted by the number of synonyms or associated databases.  Often the most likely component is the one with the highest values in these fields.  Steve Stein refers to the importance of this type of information in the identification of unknowns as “prior probability.”
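The logic of such a search can be sketched as follows: keep the entries whose mass falls inside the error window, then list those with the most synonyms first.  This is not the NIST software’s implementation; the function name, the 5 ppm window, and the entries (including their synonym counts) are made up for illustration.

```python
def search_by_mass(entries, query_mass, tol_ppm=5.0):
    """Return entries within tol_ppm of query_mass, most synonyms first."""
    window = query_mass * tol_ppm / 1e6
    hits = [e for e in entries if abs(e["mass"] - query_mass) <= window]
    return sorted(hits, key=lambda e: e["synonyms"], reverse=True)


# Hypothetical database entries.
entries = [
    {"name": "compound_B", "mass": 308.1052, "synonyms": 3},
    {"name": "compound_A", "mass": 308.1049, "synonyms": 42},
    {"name": "compound_C", "mass": 308.1300, "synonyms": 90},
]
hits = search_by_mass(entries, 308.1050)
print([h["name"] for h in hits])
```

Note that compound_C has the most synonyms but is excluded because it lies outside the mass window; the synonym count only ranks candidates that already agree with the measured mass.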

I evaluated the software using a group of compounds that I had employed previously to evaluate ChemSpider and CAS.  The results were very encouraging.  I have attached my initial results and some instructions for searching Warfarin’s exact mass.  The approach was very successful for drugs, pesticides, and natural products.  However, many polymer antioxidants and UV stabilizers were not found.  Thus, the choice of which database to search (NIST, ChemSpider, or CAS) will depend on the user’s needs.

The molecular weight distribution is somewhat lower for the NIST database than for either the CAS Registry or ChemSpider.  Thus the latter two databases will probably be more useful for higher molecular weight compounds found in LC-MS analyses.


We make heavy use of the NIST search software for searching both our in-house and commercial EI and MS/MS databases.  The current version of NIST includes 64,511 MS/MS spectra that can be very useful for identifying “known unknowns.”  Furthermore, we routinely run the same sample by LC-MS and GC-MS and utilize EI spectra in addition to our in-source MS/MS spectra.

Here is a file showing how to use the new NIST exact mass search for identifying a “known unknown” by accurate mass.  Of course, one can also search by molecular formula.


Here are the results from my evaluation with 90 compounds:


Links for our two papers describing our work with CAS and ChemSpider are shown below:



Return to Home Page


In many cases, an unknown to an investigator is known in the chemical literature.  We refer to these types of non-targeted species as “known unknowns.”  The term originated in a much different context in a quote by Donald Rumsfeld.

ChemSpider is a very valuable source of known substances.  It contains >28 million compounds which can be searched by a variety of parameters and includes valuable links to other web-based resources.


Here is a graph showing the molecular weight distribution of species:


ChemSpider is owned by the Royal Society of Chemistry (RSC) and provided as a free resource to the community.  We refer to this type of database as “spectra-less” because it contains no computer-searchable EI or CID mass spectra.  However, very useful results can be obtained searching this database by molecular formula or monoisotopic mass.

Eastman collaborated in the modification of ChemSpider’s web-based interface to facilitate the entry of monoisotopic molecular weight data and to sort the resulting search results by the number of associated references.  The use of monoisotopic mass does not require subjective limitations of the number of elements, types of elements, double bond equivalents or even the type of ion adduct in obtaining candidate structures.

Eastman had previously demonstrated the utility of using the number of associated references as a useful approach in the identification of unknowns with the CAS Registry.  The resulting candidates are further evaluated by their mass spectral fragmentation and a wide variety of other associated analytical information to arrive at the identification of the “known unknown.”

ChemSpider Prepress Journal Article with Screenshots Illustrating Examples:


Journal of The American Society for Mass Spectrometry, Volume 23, Number 1, 179-185, DOI: 10.1007/s13361-011-0265-y

I am very impressed with the MasterView software developed by Sciex, which uses ChemSpider in its workflow for the identification of unknowns.  See:

Sciex Software Using ChemSpider

MasterView Videos

Link to ASMS Poster at ASMS Conference 2011, Denver, Colorado:


Additional Related Information:

A similar approach was previously demonstrated utilizing the CAS Registry which was searched by SciFinder.


NIST offers a similar approach as part of their NIST MS Search Software employing a somewhat smaller collection of compounds.


Return to Home Page


We don’t have extensive information on the long-term precision and accuracy of our TOF instruments.  However, we have performed some basic studies to understand the limitations of the LCT and GCT.  Our Waters instruments are based on time-to-digital converters (TDCs), which affect the accuracy and the precision of the instruments.


The TDC approach has two significant limitations:

1)  When two or more ions hit the detector in one flight cycle the TDC counts them as one event.

2)  When two ions hit the detector within a certain time interval, the TDC does not count the second ion (dead time loss).

The LCT and GCT correct for this problem via software.  Our ASMS poster session addresses these limitations, and an Excel “program” models the effect.
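The first limitation can be illustrated with a simple Poisson model.  If ions arrive at random with a mean of λ ions per flight cycle in a given time bin, the TDC registers an event in a fraction 1 − exp(−λ) of the cycles, so the counts fall increasingly short of the true ion number as the signal grows.  The sketch below is only an assumption-laden illustration of that statistical effect; it is not the correction used by the instrument software or the model in our Excel file, and the function names are hypothetical.

```python
import math


def observed_counts(true_ions_per_cycle, n_cycles):
    """Expected TDC counts when several ions in one cycle register as one event."""
    return n_cycles * (1.0 - math.exp(-true_ions_per_cycle))


def corrected_counts(observed, n_cycles):
    """Invert the Poisson model to estimate the true number of ions."""
    if not 0 <= observed < n_cycles:
        raise ValueError("observed counts must be in [0, n_cycles)")
    return -n_cycles * math.log(1.0 - observed / n_cycles)


n_cycles = 10_000
for rate in (0.01, 0.1, 0.5, 1.0):
    seen = observed_counts(rate, n_cycles)
    loss_pct = 100.0 * (1.0 - seen / (rate * n_cycles))
    print(f"{rate:4.2f} ions/cycle: {seen:8.1f} counted, {loss_pct:4.1f}% lost")
```

The model covers count losses only; it does not include the dead time interval of the second limitation.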


deadtime_model (Excel)

Here is a good literature reference that discusses the topic:

Chernushevich, I. V.; Loboda, A. V.; Thomson, B. A. Special Feature Tutorial: An Introduction to Quadrupole-Time-of-Flight Mass Spectrometry. J. Mass Spectrom. 2001, 36, 849-865.

Link to Paper

Return to Home Page


Eastman has had a vision for using databases to identify compounds using mass spectral data for the last 35 years.  The approach involves partnerships with many different organizations.  We currently use internal (NIST) and external (ChemSpider, CAS Registry) database searches.


The NIST search software offers a very cost-effective means of sharing our EI and CID databases within the Eastman Worldwide Corporate Network.  Here is a diagram that outlines its basic attributes:


We use the NIST search with a variety of data processing programs including Xcalibur, WSearch, AMDIS, MassLynx, Agilent MassHunter, and Agilent MSD and with a variety of structural drawing packages.  Here is a link for detailed information on utilizing the NIST search with these other programs:

Using NIST with Other Programs

Automatic Updating of Database Nightly and System Manager Information:

We add new EI and MS/MS spectra with structures to the database daily.  The database can be searched by mass spectrum, name, MF, MW, structure, etc.  The process employs DOS batch files, various utilities from NIST, and a Windows scheduler program.  The following document gives details of the process for our automatic nightly update and a general overview of the responsibilities of the system manager:


Here is a zip file with some of the necessary utility programs:


Interfacing to Manufacturers’ Data Processing and Drawing Programs:

As I mentioned above, the program can be easily interfaced to an instrument manufacturer’s data system.  I have included below a simple description that I wrote and a more detailed one from David Sparkman, a NIST consultant.



User Manuals for Using NIST Search and AMDIS:

NIST has manuals for both AMDIS and the NIST Search.  In addition, I wrote a manual for using the NIST search software for our Eastman users:




New Additions and More Information about NIST Search and AMDIS:

Separation Science


Overview and History of NIST Library:

Steve Stein summarized the strengths and pitfalls of library searching:

Stein Paper

Here is an interesting link to the history of the library by Steve Heller:

