Our approach works well for the majority of our samples, which are fairly simple and contain components with molecular weights <500 daltons. Improvements are needed, however, for complex samples and for components with molecular weights >500 daltons. I discuss some of my thoughts and opinions below; a good review of small molecule identification was also recently published.
1. CAS Registry/SciFinder: The major improvement needed in SciFinder is the ability to search by monoisotopic mass within a specified m/z error window. This would be especially useful for higher molecular weight components, but it could be used effectively for all components, eliminating the need for subjective decisions when determining a molecular formula (which elements are searched, the numbers of elements, the numbers of double bond equivalents, etc.). The structures returned by the monoisotopic mass searches could then be passed to the manufacturers’ software and further ranked or sorted by their isotopic fidelity and mass error.
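As a rough illustration of what a monoisotopic mass search within an error window could look like, here is a minimal Python sketch. It assumes the measured m/z has already been converted to a neutral monoisotopic mass and that the tolerance is given in ppm. The record list, the function names, and the four example compounds are hypothetical stand-ins, not part of SciFinder or any manufacturer's software.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy "registry": (monoisotopic mass in Da, name), sorted by mass.
# Masses are rounded values computed from standard isotope masses.
RECORDS = [
    (180.0423, "aspirin"),
    (180.0634, "glucose"),
    (180.0647, "theobromine"),
    (194.0804, "caffeine"),
]


def ppm_window(target_mass, tol_ppm):
    """Return the (low, high) mass limits for a +/- tol_ppm window."""
    delta = target_mass * tol_ppm / 1e6
    return target_mass - delta, target_mass + delta


def search_by_monoisotopic_mass(records, target_mass, tol_ppm):
    """Return records inside the ppm window, smallest mass error first.

    Assumes `records` is sorted by mass so a binary search can be used.
    """
    low, high = ppm_window(target_mass, tol_ppm)
    masses = [mass for mass, _ in records]
    hits = records[bisect_left(masses, low):bisect_right(masses, high)]
    return sorted(hits, key=lambda rec: abs(rec[0] - target_mass))


if __name__ == "__main__":
    for tol in (5, 10):
        hits = search_by_monoisotopic_mass(RECORDS, 180.0634, tol)
        print(tol, "ppm:", [name for _, name in hits])
```

In this toy data set, tightening the window from 10 ppm to 5 ppm is enough to separate two candidates whose masses differ by about 1.3 mDa, which is the kind of filtering that a search on monoisotopic mass would make possible.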
Recently, CAS added the ability to search by average molecular weight in SciFinder, a capability previously available only in STN Express. This is useful, but the monoisotopic mass can be determined with much better precision and accuracy than the average molecular weight. At lower m/z values, too many candidates are retrieved because of the very large number of components in the CAS Registry between 300 and 500 daltons and the large measurement error.
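To make the difference between the two quantities concrete, the following sketch computes both the monoisotopic mass and the average molecular weight for a formula. The choice of glucose as the example and the function name are illustrative assumptions; the atomic values are standard rounded figures for four common elements only.

```python
# Mass of the most abundant isotope of each element (Da, rounded).
MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}
# Abundance-weighted average atomic weights (rounded).
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_mass(formula, table):
    """Sum atomic masses for a formula given as {element: atom count}."""
    return sum(table[element] * count for element, count in formula.items())


glucose = {"C": 6, "H": 12, "O": 6}  # illustrative example compound
mono = formula_mass(glucose, MONOISOTOPIC)
avg = formula_mass(glucose, AVERAGE)

print(f"monoisotopic mass:        {mono:.4f} Da")
print(f"average molecular weight: {avg:.3f} Da")
print(f"difference:               {avg - mono:.4f} Da")
```

For this compound the two values differ by roughly 0.09 Da, far more than the few-ppm error of an accurate mass measurement, so a search keyed on average molecular weight cannot take advantage of that accuracy.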
ChemSpider has an application programming interface (API) that manufacturers can use to query the ChemSpider database and return results to their interfaces for further processing and display. A similar approach would be very useful for the CAS Registry. The Registry does currently allow one to manually export a reasonable number of mol files (structural connection tables) in SDF file format. It would be very desirable to also include other fields in the SDF file, such as the number of associated references.
Other needed improvements (limitations) are listed on page 12 of a prepress article found on this website.
2. ChemSpider: There are many duplicate structures in ChemSpider for the same compound. CAS Registry does a better job of listing a unique structure for each record. Several other suggestions for ChemSpider are found on page 8 of a prepress article.
3. In-Silico Fragmentation: Many programs exist for the in-silico fragmentation of structures, including MS Interpreter (NIST), MetFrag, Molecular Structure Correlator (Agilent), MathSpec (D. Sweeney), and the Fragmentation Prediction Tool in PeakView Software (AB Sciex). Much more needs to be done in this area to improve the reliability of ranking candidate structures by comparison of in-silico spectra to acquired CID spectra. A section in a review article discusses the current state of in-silico work.
4. Automated Processing and Reporting: Manufacturers are improving their automatic processing and reporting for LC-MS and GC-MS, but we still spend an unacceptable amount of time hand-annotating chromatograms for customers. Much of this could be avoided if manufacturers would further improve their software.
It would be very powerful if both ChemSpider and CAS Registry offered application programming interfaces through which structures and their associated numbers of references could be returned to the manufacturers’ software programs. The candidates could then be ranked by individual scores considering the number of associated references, isotopic fidelity, mass accuracy, isotopic spacing, elemental composition from fragmentation data, in-silico fragmentation, commercial availability, etc. An overall score could then be assigned by weighting these individual scores to allow sorting of the candidate structures.
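The weighting idea can be sketched in a few lines of Python. This is only one possible scheme (a normalized weighted sum), and it assumes each individual score has already been scaled to the range 0 to 1. The class, the score names, the weights, and the candidate values are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    # Individual scores, each assumed to be pre-scaled to 0..1.
    scores: dict = field(default_factory=dict)


def overall_score(scores, weights):
    """Weighted average of the individual scores; missing scores count as 0."""
    total_weight = sum(weights.values())
    weighted = sum(w * scores.get(key, 0.0) for key, w in weights.items())
    return weighted / total_weight


def rank_candidates(candidates, weights):
    """Sort candidates from highest to lowest overall score."""
    return sorted(
        candidates,
        key=lambda c: overall_score(c.scores, weights),
        reverse=True,
    )


# Hypothetical weights: reference count is trusted twice as much here.
WEIGHTS = {"references": 2.0, "mass_accuracy": 1.0, "isotopic_fidelity": 1.0}

candidates = [
    Candidate("structure B", {"references": 0.1, "mass_accuracy": 0.9,
                              "isotopic_fidelity": 0.9}),
    Candidate("structure A", {"references": 0.9, "mass_accuracy": 0.8,
                              "isotopic_fidelity": 0.7}),
]

for c in rank_candidates(candidates, WEIGHTS):
    print(f"{c.name}: {overall_score(c.scores, WEIGHTS):.3f}")
```

Further criteria from the list above, such as in-silico fragmentation or commercial availability, would simply be additional keys in the score and weight dictionaries. How the weights should be chosen is an open question that this sketch does not address.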