I Don’t Know Your Name, But You Look Familiar: Using Chemical Structure Searches

Searching for information on complex organic molecules can be an onerous task. Many organic compounds have both common and systematic names. For example, caffeine is also known as guaranine, methyltheobromine, 1,3,7-trimethylxanthine, theine, or 1,3,7-trimethylpurine-2,6-dione. Researchers performing comprehensive literature searches will be interested in research involving not only a specific compound but also similar compounds or other molecules containing the same substructure. Searching on a specific systematic name may inadvertently omit stereoisomers, tautomers, molecules containing the same substructure of interest, or molecules differing by the substitution of as little as one atom. Conceiving of all appropriate synonyms for a name-based search would be a Herculean task for both an organic chemist and a science librarian.

Fortunately, a number of databases provide an alternative to name-based organic molecule searching. Searching by chemical structure allows for the quick retrieval of information and bibliographic references about particular chemical compounds, similar compounds, and relevant reaction information.

Databases

There are a number of databases and programs that allow chemical structure searching. The list below is far from comprehensive but highlights a small sample of the available choices. The first two listed are freely available on the web, while the last two are common subscription databases.

ChemSpider: Owned by the Royal Society of Chemistry, ChemSpider contains information about over 43 million chemical structures. The database contains information about, and links to, physical properties, spectra, and literature references. ChemSpider is free to use, although not all linked content is freely accessible.

PubChem: PubChem is maintained by the National Center for Biotechnology Information, part of the United States National Library of Medicine. PubChem links three databases covering chemical compounds, substances, and bioactivity results. Search results also link to the PubMed database to find literature references and published studies. PubChem is free to access.

SciFinder: SciFinder is a subscription database from the Chemical Abstracts Service of the American Chemical Society. SciFinder provides both physical property information and bibliographic references, and also has the capability to search patent information.

Web of Science: Web of Science is Thomson Reuters’ subscription citation indexing service that provides comprehensive citation searches in a number of disciplines. Structure searching may provide both reaction results and compound record results.

Types of Searches

In addition to searching for the exact structure presented, there are several other search options, depending on the individual database. Searches can retrieve isomers and tautomers of the query molecule, compounds that share its skeleton structure, compounds that incorporate the molecule as a substructure or include similar substructures, or compounds that are similar in overall structure. In a similarity search, the searcher can often set the degree of similarity required, which widens or narrows the scope of the search.
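To make the idea of a similarity threshold concrete, here is a minimal sketch in Python. It assumes each structure has been reduced to a set of structural features and scores pairs with the Tanimoto coefficient (shared features divided by total distinct features), a measure commonly used for this purpose. The feature labels below are made up for illustration, and each database computes its own fingerprints and scores, so this is not a description of how any particular database works.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared features / all distinct features."""
    if not a and not b:
        return 1.0  # two empty feature sets are treated as identical
    return len(a & b) / len(a | b)


def similarity_search(query, library, threshold):
    """Return (name, score) pairs scoring at or above the threshold, best first."""
    scored = [(name, tanimoto(query, features)) for name, features in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical, hand-made feature sets (illustration only, not real fingerprints).
XANTHINE_CORE = {"purine_core", "carbonyl_2", "carbonyl_6"}
caffeine = XANTHINE_CORE | {"N1_methyl", "N3_methyl", "N7_methyl"}
library = {
    "theobromine": XANTHINE_CORE | {"N3_methyl", "N7_methyl"},   # 3,7-dimethylxanthine
    "theophylline": XANTHINE_CORE | {"N1_methyl", "N3_methyl"},  # 1,3-dimethylxanthine
    "xanthine": set(XANTHINE_CORE),
    "benzene": {"benzene_ring"},
}

strict = similarity_search(caffeine, library, threshold=0.8)
loose = similarity_search(caffeine, library, threshold=0.5)
print([name for name, _ in strict])
print([name for name, _ in loose])
```

With the stricter threshold only the two dimethylxanthines are returned. Lowering the threshold also brings in the unsubstituted xanthine, which is how adjusting the degree of similarity increases the scope of a search.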

Loading the Structure

Most databases that support structure searching provide a number of options for loading the structure of interest. The screenshots below are all taken from ChemSpider.

Drawing: Structures can be drawn directly using the database’s interface. This option can be used for simpler molecules, non-standard molecules, and substructure searching. The drawing tool can also be used to modify more complex structures that have been imported through other means.

Structure Name: For larger or common structures, the option exists to load the structure based on the substance’s common name, chemical name, trade name, or registry number in a number of chemical identification schemes. Chemical structure strings under the SMILES and InChI standards can also be used.
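For readers who prefer to script a lookup, the sketch below builds request URLs for PubChem's PUG REST web service from either a name or a SMILES string. It is a rough illustration under stated assumptions: the helper names are hypothetical, the caffeine SMILES string is one common way of writing the structure, and the endpoint paths and the Threshold parameter follow the PUG REST documentation as I understand it, so they should be checked against the current documentation before use. The code only constructs URLs and does not contact the service.

```python
from urllib.parse import quote, urlencode

# Base address of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# One common SMILES string for caffeine (assumed here for illustration).
CAFFEINE_SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"


def cid_lookup_url(identifier, namespace="name"):
    """URL asking for compound IDs that match a name or a SMILES string.

    `namespace` is "name" or "smiles"; the identifier is percent-encoded
    so that characters such as '=' and '(' are safe inside a URL path.
    """
    return f"{PUG_REST}/compound/{namespace}/{quote(identifier, safe='')}/cids/JSON"


def similarity_url(smiles, threshold=90):
    """URL for a 2D similarity search; threshold is a percentage (assumed 0-100)."""
    query = urlencode({"Threshold": threshold})
    encoded = quote(smiles, safe="")
    return f"{PUG_REST}/compound/fastsimilarity_2d/smiles/{encoded}/cids/JSON?{query}"


by_name = cid_lookup_url("caffeine")
by_smiles = cid_lookup_url(CAFFEINE_SMILES, namespace="smiles")
similar = similarity_url(CAFFEINE_SMILES, threshold=85)
print(by_name)
print(by_smiles)
print(similar)
```

The name-based and SMILES-based URLs are two routes to the same record, which mirrors the choice offered in the search interface between typing a name and supplying a structure string.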

Importing Files: Structures can also be uploaded in various formats, including formats specifically designed for chemical structures (.MOL, .SDF) and general picture formats (.JPG, .GIF, .PNG). Other formats may be supported.

Search Results

The types of results available, as well as the filters for narrowing the results list, vary by database. As the ChemSpider results show, the basic results list includes names of the compound, physical properties, spectra, vendors, and bibliographic references. Access to the articles will depend on institutional subscriptions, as not all of the articles are open access.

Structure searching alone is not a substitute for traditional searching methods, but it is a very useful tool for finding relevant results. It greatly simplifies searching for similar molecules, because the searcher does not need to determine numerous possible synonyms or the names of similar molecules for a word-based search. Using structure searching in conjunction with other search techniques increases the reliability of searches and improves confidence that a literature search has been comprehensive enough to find all of the results of interest to the researcher.

Eric Prosser, Science Liaison Librarian, Fort Lewis College, ejprosser@fortlewis.edu

We welcome your comments and suggestions. If you have a resource that you would like to see highlighted please leave us a comment.
