Chemical Information Sources/Chemical Name and Formula Searches
- 1 Introduction
- 2 Subscription Databases
- 3 Freely Available Databases
- 4 Freely searchable; subscription required for data
- 5 Summary
Although structure searching is generally the only definitive way to search for chemical substances, searching by substance identifiers (chemical names and various identifying numbers) or molecular formula can be convenient or, in some cases, necessary for print sources and electronic sources lacking structure search capabilities. Certainly, one can type in ‘aspirin’ much faster than drawing out its structure. However, depending on the database, name searching may require an exact match right down to the punctuation and spacing. More complex chemicals may have only systematic names that tend to be quite lengthy, or the particular synonym one searches for may not be in the database being consulted. In addition, closely related compounds may be missed. A search for ‘1,2-dichloroethene’ may not pick up the cis- and trans- isomer records, but only the generic mixed isomers or non-specified substance record. Where chemical name segment/fragment searching is available, one can sometimes retrieve far too many records.
Although many resources have a molecular formula search option, molecular formulas are far from unique, retrieving anywhere from a few to thousands of compounds. Molecular formula conventions vary from database to database, especially in regard to inorganics and multicomponent substances like salts, organometallics, polymers, and complex oxides. It is also easy to make a mistake in counting up atoms or entering the molecular formula into a search box. However, one advantage of molecular formula searching is that one will pick up all the small variations in a compound such as isotopes, tautomers, mineral forms, and stereo differences.
There are a number of excellent substance identifier code systems, with the CAS Registry Number (CAS RN) and the InChI codes (IUPAC) currently being among the most prominent. Each system assigns a unique identification code to a specific structure or named substance. Although these systems are very helpful, one must keep in mind that the slightest variation in structure (isotopes, stereochemistry, ratios, etc.) will be assigned a different code number. Hence, the precision of these identification codes is both their biggest advantage and, at times, their biggest disadvantage. However, they generally are a fast and comprehensive way to retrieve all information on that specific structure or named substance such as a trademarked commercial material.
Many substances, especially in commerce, do not have known, explicit, or fully defined structures or molecular formulas. In those cases, one must rely on nomenclature and substance identifier codes such as the CAS RN.
Because of the many variations in format, search capabilities, and database conventions, it is highly recommended that users:
1) Consult the search help documentation for any resource they are searching, and
2) Test the system by searching the substance identifier or molecular formula for a simple, very common compound like formaldehyde or aspirin to make sure they are searching the system in the correct way and retrieving the appropriate record(s).
If a) the substance identifier or molecular formula search fails to retrieve any hits, b) the structure of the compound is known, and c) the resource has a structure search feature, a structure search should be performed to verify that the database does not contain any information on the desired compound.
Subscription databases provide carefully curated, professionally maintained information.
Chemical Abstracts Service (CAS) provides the most widely distributed and best known chemical databases in the world. They offer their main substance (File REGISTRY) and main literature database (File CAplus) on many different platforms, both third-party systems and ones they are directly involved in. CAS sponsored platforms are:
1) SciFinder, an easy-to-use, powerful, flat-rate platform broadly focused on chemical, materials science, and biomedical information and offered to academic, non-profit, and corporate institutions.
2) STN International, a broad-based platform with scores of databases from many vendors including CAS tied together by a full-featured, highly sophisticated search system permitting full use of Boolean logic to combine terms and scores of search sets in complex patterns, if desired.
Substance information on both platforms originates from File REGISTRY data, although in SciFinder this database is accessed through a link labeled “Explore Substances”. SciFinder is designed to be used directly by chemists and is very common in both academia and industry. Most of the rest of this section will assume access to CAS content through the widely used SciFinder interface.
Despite the power of the SciFinder interface, STN International offers advanced search features for more expert users not available in SciFinder such as searching elemental composition, material composition tables including weight percent ranges, ring system data such as number of rings, and direct access to all compound class identifiers including incompletely defined substances. These special power techniques are beyond the scope of this introductory chapter. STN International users should consult documentation on File REGISTRY such as the STN database summary sheet.
In SciFinder, under the ‘Explore Substances’ search option, ‘Substance Identifier’ search terms include CAS Registry Numbers, chemical names or name fragments, and codes. Multiple terms (up to 25) can be searched simultaneously by entering each on a separate line. CAS also provides interactive tutorials and "how to" guides.
Although CAS likely provides the largest and most carefully constructed inventory of known substances in the world, searchers requiring an exhaustive search, for purposes such as determining the novelty of a structure for a patent, are advised to consult as many additional sources as possible, including Markush structure searchable databases, patent databases, and Reaxys, which may include compounds missing from CAS, especially those reported prior to 1907 and not again in the post-1907 literature.
CAS Registry Number Searching
The Chemical Abstracts Service Registry File (CAS REGISTRY) is the largest single collection of data for identification of chemical substances.
The CAS Registry Number (CAS RN) has the format Y-XX-X, where Y is a block of two to seven digits, XX is a two-digit block, and the final X is a single check digit, for example, 494-12-2. CAS Registry is the final authority since it is the only database where replaced or corrected CAS RNs are linked to the current and correct CAS Registry record.
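The check digit makes it possible to catch many typing errors before a search is run. The following is a minimal sketch, assuming the published CAS checksum rule (which this section does not spell out): the digits before the check digit are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the check digit. The function names are illustrative and not part of any CAS tool.

```python
import re

# Assumed layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit(body_digits: str) -> int:
    """Return the expected check digit for the digits preceding it.

    The rightmost digit is weighted 1, the next 2, and so on;
    the check digit is the weighted sum modulo 10.
    """
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body_digits), start=1)
    )
    return total % 10


def is_valid_cas_rn(text: str) -> bool:
    """True if text looks like a CAS RN and its check digit is consistent."""
    match = CAS_RN_PATTERN.match(text.strip())
    if match is None:
        return False
    first, second, check = match.groups()
    return cas_check_digit(first + second) == int(check)


if __name__ == "__main__":
    for candidate in ("494-12-2", "91-56-5", "494-12-3"):
        print(candidate, is_valid_cas_rn(candidate))
```

A passing check only shows that the number is well formed. It does not show that the number has been assigned, or that it has not since been replaced by another CAS RN.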
CAS Registry coverage was originally based on substances identified from the scientific literature from 1957 to the present with some classes (fluorine- and silicon-containing compounds) going back to the early 1900s. Recently, CAS assigned registry numbers retrospectively to identified substances indexed in the CAplus File from 1907-1966, but only assigned the CAS Role PREP. A Database Counter provides information on the cumulated number of registered substances and biosequences.
CAS RNs are the unique identifier for CAS REGISTRY records which are created for: organic and inorganic substances, metals, alloys, minerals, polymers, coordination compounds (2), elements, isotopes, peptides, enzymes, bio-molecular sequences, and nuclear particles.
Brief CAS Registry records display a CAS RN, structural diagram, molecular formula, and the CA index name, with links to all References (CAplus file records), Reactions (CAS REACT), Commercial Sources (CHEMCATS), Regulatory Information (CHEMLIST), Spectra and Experimental Properties.
Clicking on the brief record CAS RN displays the full CAS REGISTRY record, which includes additional CA Index Names and synonyms, deleted registry numbers, a link to references with a tabular display of the CAS Roles for limiting retrieval (e.g. preparation, uses, etc.), Predicted (ACD/Labs) Properties and Experimental Properties (including NMR, IR & Mass Spectra) with literature references.
Inorganic compounds and alloy records may display composition tables, and biomolecule records may display protein or nucleic acid sequences.
Thus, one can use CAS Registry to locate:
• literature references related to the substance
• experimental and predicted physical property data
• commercial availability
• preparative methods
• spectra (MS, IR, NMR, UV)
• regulatory information from international sources
Some CASRNs do not have any literature references. This can happen because companies can obtain CASRNs (through CAS Client Services) before a substance appears in the literature, or because CASRNs are assigned to substances from chemical catalogs, external substance collections on the web, or chemical inventories (e.g., TSCA for the EPA). In addition, when CAS registers a compound from a journal or patent source that is a salt, two Registry Numbers are created: one for the salt and one for the parent compound. The Reference Link is only displayed for the salt, not for the parent compound.
CAS RNs (which are linked to their CA Registry File records) appear in the substance indexing for CAplus File records instead of a CA Index Name. For example, 107326-35-2 instead of:
1H-Pyrido[3,4-b]indole-1-carboxylic acid, 2-(3-butenyl)-2,3,4,9-tetrahydro-
Decisions on indexing substances are based on CA’s indexing philosophy, which focuses on new information and the main points of an article. This can lead to unexpected results. For example, in an article on 'the effect of different cations on the IR spectra of Mo(CN)8 complexes' (CAN 111:122893), each individual salt (e.g., Tripotassium octacyanomolybdate(3-) trihydrate) is indexed. However, for an article on the 'kinetics of the permanganate ion-potassium octacyanomolybdate(IV) reaction' (CAN 80:137539), only the Octacyanomolybdate(IV) ion is indexed, not the Potassium Octacyanomolybdate(IV) salt mentioned in the title.
CASRNs are commonly found in chemistry handbooks (e.g., Merck Index, CRC Handbook of Chemistry and Physics, Lange’s Handbook, Combined Chemical Dictionary, etc.), chemical supplier catalogs (e.g., Sigma-Aldrich, Strem, Lancaster, etc.), and journal articles. They are excellent search terms for specific chemical substances.
CASRNs, however, are simply accession numbers. Thus, chemical derivatives, salts, etc. are NOT linked to their parent compounds as they are in the freely searchable Combined Chemical Dictionary.
Molybdate(3-), octakis(cyano-κC)-, potassium (1:3), (DD-8-11111111) or as it is more commonly known, Potassium octacyanomolybdate(V) < K3Mo(CN)8 > is a good example of the difficulty in comprehensively searching for various salts, as CAS has separately registered the salt, both its hydrates and the anion:
Similar difficulties are encountered with isomers, since each isomer, the racemic mixture & the unspecified ‘generic’ compound will have different Registry Numbers:
Sugars are registered as both open-chain and ring structures with different CASRNs:
In 2008, CAS entered into a cooperative venture with Wikipedia to provide CAS Registry Numbers for chemical substances of widespread general interest. The result is Common Chemistry, a Web resource where approximately 7,900 substances can be searched without cost by chemical name or CAS Registry Number. Entering the CAS RN for Isatin, 91-56-5, brings up a record with the CAS Preferred Name, 1H-Indole-2,3-dione, 18 other names for Isatin, the molecular formula, a 2D structural drawing, and the link to the Wikipedia article on Isatin.
Chemical Name/Name Fragment Searching
The CAS substance dictionary (Explore Substances in SciFinder and the REGISTRY File in STN International) is the largest single source of chemical names in existence. It includes trade, common, inverted, non-inverted, laboratory code, obsolete, and official CAS Index names. Whole names and name fragments (segments) can be searched with varying degrees of specificity depending on which search platform is used. Often one must follow certain protocols for special characters that are part of names. Greek characters, for example, are spelled out in their entirety and may have a period before and after the Greek part of the name. Note that in the SciFinder system, the search will work with or without the periods around the Greek letter, but in STN command-language searching, the dots are mandatory.
As the rest of this section will make clear, searching chemical names is tricky. Zero results does not mean the compound is not in the database. The name may not be present in the database, the input may contain a simple typo, or search conventions may inadvertently not have been followed. Only a properly done structure, CAS Registry Number, or molecular formula search is conclusive.
Chemical Abstracts’ (CA) chemical nomenclature has changed over time since 1907. Thus, substances may have several CA Index Names, as well as synonyms used in the literature and in commerce. Until late 2006, major changes were made only at the beginning of each collective index period. Beginning in 2007, however, changes to CA Index Names are made as needed.
In the print CA, the Chemical Substance Index (CSI) linked the indented forms of CA Index Names for individual substances (e.g., Benzene, azido-) to their relevant abstracts. Compound class names (e.g., Aryl Azides), however, were indexed in the General Subject Index, and synonyms and trade/common names were only related through the CA Index Guide.
For some background information on CA Index Names, see: Charles H. Davis's Chemical Nomenclature Lite.
CAS Registry records may contain a variety of older CA Index Names, synonyms and codes, especially for commercial chemicals. For example, the CAS Registry record for Benzene also includes the following searchable terms: 1,3,5-Cyclohexatriene; Benzol; Benzole; Coal naphtha; Cyclohexatriene; NSC67315; Phene; Phenyl hydride; Pyrobenzol; Pyrobenzole; and Annulene.
Superscripts/Subscripts are searched as normal characters and Greek letters are spelled out:
Dichloromethane-d2 for Dichloromethane-d₂
alpha-Acetylnaphthalene for α-Acetylnaphthalene
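The two conventions above (subscripts and superscripts typed as normal characters, Greek letters spelled out) can be applied mechanically before pasting a name into a search box. The sketch below is illustrative only: the Greek-letter table is a deliberately small, hypothetical subset, and the `dotted_greek` option simply wraps the spelled-out letter in periods, as described earlier for STN command-language searching.

```python
import unicodedata

# Small illustrative subset of Greek letters; extend as needed.
GREEK_NAMES = {
    "α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta",
    "ε": "epsilon", "κ": "kappa", "λ": "lambda", "μ": "mu",
    "π": "pi", "σ": "sigma", "ω": "omega",
}


def normalize_name(name: str, dotted_greek: bool = False) -> str:
    """Rewrite a typeset chemical name using plain searchable characters.

    NFKC normalization turns subscript and superscript digits into
    ordinary digits; Greek letters are then replaced by their names.
    """
    plain = unicodedata.normalize("NFKC", name)
    pieces = []
    for char in plain:
        spelled = GREEK_NAMES.get(char)
        if spelled is None:
            pieces.append(char)
        elif dotted_greek:
            pieces.append(f".{spelled}.")
        else:
            pieces.append(spelled)
    return "".join(pieces)


if __name__ == "__main__":
    print(normalize_name("Dichloromethane-d₂"))
    print(normalize_name("α-Acetylnaphthalene"))
    print(normalize_name("α-Acetylnaphthalene", dotted_greek=True))
```

A normalized name still has to match a CA Index Name or synonym in the database, so this only removes one source of failed searches.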
When searching SciFinder, a ‘chemical name’ is first searched for an exact match with a CA Index Name or a synonym and, if not found, the search term is truncated or segmented, and then searched again. When searching with CA Index Names, all characters must be searched:
Benzene, 1,4-dibromo- not Benzene, 1,4-dibromo
Even fairly complicated names can be searched. For example:
is not a CA Index Name or listed as a synonym, although it is the direct form of the CA Index Name. Thus, it is fragmented and the search retrieves 8 CA Registry records including the record with the CA Index Name:
“1H-Pyrido[3,4-b]indole-1-carboxylic acid, 2-(3-buten-1-yl)-2,3,4,9-tetrahydro-”
Care must be taken when searching with synonyms. For example, a search for ‘Potassium Octacyanomolybdate’ retrieves 11 substances. Some examples are:
- Aluminum potassium octacyanomolybdate(IV)
- Cobalt potassium octacyanomolybdate(IV)
- Gallium potassium octacyanomolybdate(IV)
- Iron potassium octacyanomolybdate(V)
- Nickel potassium octacyanomolybdate(IV)
- Potassium octacyanomolybdate(IV)
But, it does not retrieve all related substances, since the terms ‘Potassium’ and ‘Octacyanomolybdate’ may not always be present in a synonym name.
However, since all substances related to ‘Potassium Octacyanomolybdate(V)’ will have similar CA Index Names, one technique is to display its CA Index Name: Molybdate(3-), octakis(cyano-κC)-, potassium (1:3), (DD-8-11111111)- and then search various CA Index Name fragments:
- Molybdate(3-), octakis(cyano-κC)-, potassium retrieves Potassium octacyanomolybdate(V), & its di- and tri-hydrates.
- Molybdate(3-), octakis(cyano-κC)- retrieves ~28 substances, a variety of cations & the Mo(CN)8 anion.
- Molybdate(3-), octakis(cyano-κC)-, (DD-8-11111111) retrieves ~9 substances, a variety of organic cations & the Mo(CN)8 anion
- Molybdate(3-), octakis(cyano-κC)-, (DD-8-11111111)- retrieves only the Mo(CN)8 anion
Searching for the synonym fragments:
- ‘octacyanomolybdate’ retrieves ~68 substances.
- ‘octacyanomolybdate(IV)’ retrieves ~14 substances.
- ‘octacyanomolybdate(V)’ retrieves only the anion.
Searching for synonym fragments is not reliable since synonyms are not added to all CAS Registry records. These name fragment search results generally contain a variety of inorganic salt combinations, hydrates, anions, and mixed salts with organic compounds.
Similarly, a search for ‘glucose’ only retrieves the open-chain substances: Glucose and D-Glucose, because it is a full name synonym for both substances. However, Glucose is not a synonym for β-Glucose, the ring isomer. Similarly, a search for ‘propanol’ only retrieves: Propanol [unspecified] and 1-Propanol, and not 2-Propanol whose synonym is iso-propanol.
It is also possible to search for name strings; e.g., ‘Molybdenum, compd. with nickel’.
In general, structure searching is preferred since substance identifier searching requires an exact match and may often fail to retrieve all relevant substances.
Codes include GenBank Numbers, Enzyme Commission Numbers, Colour Index Numbers, etc. CAS has a standard policy on Code numbers.
• Letters followed by numbers require a space [URB597 --> URB 597].
• Punctuation between like [numbers-numbers or letters-letters] is retained, although 1,000's commas are removed.
• Numbers followed by letters are required to be closed up.
• Punctuation between unlike [letters-numbers or numbers-letters] is removed, except as in the first rule above. Where the numbers are clearly specified locants, hyphens are retained; e.g., 2,4-D.
• SMILES and InChI strings are used in the structure editor to generate structures.
A molecular formula search will generally retrieve more than one substance because of the possibility of isomeric compounds. For example, CAS Registry lists over 1600 substances with the formula C22H24FN3O2.
The print CA provided a Formula Index that linked a chemical formula to its inverted CA Index Names and their relevant abstracts. Molecular formula searching in the print CA was based on the Hill System. The Hill System lists Carbon, if present, followed by Hydrogen and then any additional elements in alphabetical order (e.g., C22 H24 F N3 O2). In the absence of Carbon, all elements are listed in alphabetical order (e.g., Al6 Ca5 O14). This gives rise to molecular formulas quite different from normal conventions in the literature, e.g. H2O4S rather than H2SO4 for sulfuric acid.
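The Hill ordering rule is easy to apply programmatically when a formula has to be rewritten for a database that expects it. The following is a minimal sketch of the rule as stated above, taking element counts as input. The function name and the space-separated output style are illustrative choices, not a CAS specification.

```python
from typing import List, Mapping


def hill_formula(counts: Mapping[str, int]) -> str:
    """Return a Hill-order formula string from element counts.

    With carbon present: C first, then H, then the rest alphabetically.
    Without carbon: every element (including H) alphabetically.
    """
    if "C" in counts:
        order: List[str] = ["C"]
        if "H" in counts:
            order.append("H")
        order += sorted(el for el in counts if el not in ("C", "H"))
    else:
        order = sorted(counts)

    parts = []
    for element in order:
        n = counts[element]
        # A count of 1 is left implicit, as in "C22 H24 F N3 O2".
        parts.append(element if n == 1 else f"{element}{n}")
    return " ".join(parts)


if __name__ == "__main__":
    print(hill_formula({"N": 3, "O": 2, "F": 1, "H": 24, "C": 22}))
    print(hill_formula({"Ca": 5, "O": 14, "Al": 6}))
    print(hill_formula({"S": 1, "O": 4, "H": 2}))
```

The three calls reproduce the examples in the text: C22 H24 F N3 O2, Al6 Ca5 O14, and H2 O4 S for sulfuric acid.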
SciFinder, however, is designed to search for substances without regard to element order or spaces between elements. For example, H4SiO4, H4O4Si and H4 Si O4 are all acceptable search terms, as is any combination of C22H24FN3O2. At times, the system will request that spaces or upper/lower case be input to clear up ambiguities, e.g. 'COS' could be Cobalt-Sulfur or Carbon-Oxygen-Sulfur.
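Order-independent matching amounts to comparing element counts rather than strings. The sketch below illustrates the idea and is not a description of how SciFinder is implemented. It assumes correctly capitalized element symbols, which is also what resolves the 'COS' ambiguity, and it does not check that a symbol is a real element.

```python
import re
from collections import Counter

# One capital letter, an optional lowercase letter, an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Turn a simple molecular formula into a Counter of element counts."""
    compact = formula.replace(" ", "")
    counts: Counter = Counter()
    position = 0
    for match in TOKEN.finditer(compact):
        if match.start() != position:
            break  # unparseable text between tokens
        counts[match.group(1)] += int(match.group(2) or 1)
        position = match.end()
    if not compact or position != len(compact):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def same_composition(first: str, second: str) -> bool:
    """True if two formulas contain the same elements in the same amounts."""
    return parse_formula(first) == parse_formula(second)


if __name__ == "__main__":
    print(same_composition("H4SiO4", "H4 Si O4"))
    print(dict(parse_formula("COS")))
    print(dict(parse_formula("CoS")))
```

With upper/lower case supplied, 'COS' parses as carbon, oxygen, and sulfur, while 'CoS' parses as cobalt and sulfur. Dot-disconnected (multicomponent) formulas are outside the scope of this sketch.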
Understanding the concept of dot-disconnected formula for salts (e.g., C15 H24 N2 . 2 Cl H), addition compounds (C6 H6 . C6 N4), and mixtures is essential for molecular formula searching. Additional rules/conventions cover polymers, coordination compounds and an important subset of coordination compounds, organometallics. Phenyl Lithium (MF: C6 H5 Li) is an example of an organometallic.
The conventions for molecular formula assignment and searching in Chemical Abstracts databases are intricate, especially for substances such as complex salts and polymers. It is not always obvious when substances receive dot-disconnected (multicomponent) molecular formulas. Correct query formulation is also dependent on the search platform used. It is important to consult documentation specific to the platform being used as well as additional resources listed at the end of this chapter.
It is also possible to search for a molecular formula embedded in chemical names (i.e. as a Substance Identifier), but only if the search retrieves fewer than 100 records (e.g. CuSO4 ~ 15, while NaCl > 100 and gives zero results).
- 1. Salts:
Simple salts like sodium chloride are searched as: < NaCl > or < ClNa >
Inorganic salts of oxy-acids, like Calcium Sulfate or Barium Phosphate, must be searched as: < Ca . H2 O4 S > or < Ba . 2/3 H3 O4 P >. This reflects the print volume policy, where all salts of Sulfuric or Phosphoric Acid, for example, were listed together under the formula of the acid (H2O4S or H3O4P).
This policy also applies to simple organic salts like Sodium Benzoate which are searched as: < C7H6O2 . Na >, where again the MF for Benzoic Acid is a search term.
Complex organic/organometallic salts are searched by entering the MF of the cation . x(MF of the anion), where x=the number of anions.
|Tris(2,2'-bipyridine)iron(2+)bis(tetrafluoroborate)||C30 H24 Fe N6 . 2BF4|
|Tetrakis(tetrabutylammonium)octacyanomolybdate(4-)||C16 H36 N . 1/4C8MoN8|
One can see from the above two examples that:
- The organic portion is treated as a neutral molecule, including the acidic hydrogen atoms.
- The metal is viewed as a separate, unattached fragment.
- The ratio between the organic acid and the metal atom is expressed. (If unknown, the ratio is expressed as "x".)
Other examples of salt MF's:
- Unknown ratio: C6 H8 O7 . x Na
- Mixed metal salt: C6 H8 O7 . Ca . Na
- Metal salt of an alcohol: C6 H6 O2 . 1/2 Ba
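The dot-disconnected formulas above share a simple shape: components separated by a dot, each optionally preceded by a whole-number, fractional, or unknown ("x") ratio. The following is a minimal sketch for splitting such a string into its components. The function name and the choice to represent an unknown ratio as `None` are assumptions made for illustration, and the sketch does not reproduce CAS rules for deciding which substances receive multicomponent formulas.

```python
import re
from fractions import Fraction
from typing import List, Optional, Tuple

# Optional leading ratio ("2", "2/3" or "x"), then a formula starting
# with a capital letter.
COMPONENT = re.compile(r"^(x|\d+(?:/\d+)?)?\s*([A-Z].*)$")


def split_components(formula: str) -> List[Tuple[Optional[Fraction], str]]:
    """Split a dot-disconnected formula into (ratio, component) pairs.

    A missing ratio means 1; a ratio of "x" (unknown) is returned as None.
    """
    components = []
    for raw in formula.split("."):
        match = COMPONENT.match(raw.strip())
        if match is None:
            raise ValueError(f"cannot parse component: {raw!r}")
        ratio_text, body = match.groups()
        if ratio_text is None:
            ratio: Optional[Fraction] = Fraction(1)
        elif ratio_text == "x":
            ratio = None
        else:
            ratio = Fraction(ratio_text)
        components.append((ratio, " ".join(body.split())))
    return components


if __name__ == "__main__":
    for text in ("C15 H24 N2 . 2 Cl H", "Ba . 2/3 H3 O4 P", "C6 H8 O7 . x Na"):
        print(text, "->", split_components(text))
```

Separating the ratio from the component makes it straightforward to search on the parent acid formula alone (for example, H3 O4 P) when the stoichiometry of a salt is uncertain.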
- 2. Multicomponent Substances:
Multicomponent Substances include salts, hydrates, addition compounds, mixtures, alloys, minerals, and intermetallics, where each component with a known structure may have its own connection table, i.e., structure. These component structures, however, may give no indication as to how the components are bonded together.
Minerals and alloys are examples of Multicomponent Substances. A molecular formula search for Kaolinite “Al2 H4 O9 Si2” will retrieve both mineral (Kaolinite, Nacrite, Dickite) and non-mineral substances.
Alloys with a fixed number of elements are searched with dot-disconnected formula. For example, [Fe . Mn . Ni] retrieves >1000 substances, each with varying percent composition (e.g., “Nickel alloy, base, Ni 70,Fe 20,Mn 10” which is a searchable CA Index Name).
Tabular Inorganic Substances include Iron Hydroxide, which is an example of how CAS indexes substances with indeterminate or multivalent cations. For example, to search for all Iron Hydroxides:
< Fe . H O > retrieves 8 substances [e.g. Iron hydroxide (Fe5(OH)12)]
For additional information, see: http://library.caltech.edu/learning/classhandouts/scifinder.pdf
- 3. Elemental Composition Searching:
Despite the power of the SciFinder interface, certain more advanced search features are available only on platforms designed for more expert searchers such as STN International. Elemental composition searching illustrates this point. The following examples, using the STN International syntax, are derived from the molecular formula field:
- Element Symbol, indicating the presence of an element (/ELS), e.g., => S B/ELS and H/ELS
- Element Count, to specify the number of unique elements in a component or substance (/ELC or /ELC.SUB)
- Element Formula, the molecular formula of components without the numbers that depict the ratios (/ELF), e.g., => S AL CO LA O/ELF
- Periodic Group, the column and row designations for elements (/PG), e.g., => S B6/PG or => S LNTH/PG
Compound Class Identifiers
The main search query screen of SciFinder permits searches to be limited by some, but not all, of the various classes of substances, as defined by Chemical Abstracts Service. In STN International, however, all of these classes can be directly searched and used to limit a set of compounds to a specific class or classes.
|Incompletely Defined Substance||IDS|
|Manually Registered Substance||MAN|
An example of the use of the CI field in STN International searching is: => SEARCH PMS/CI (retrieves polymers)
Such searches are of use in combination with other Registry File searches in order to narrow an answer set. See the REGISTRY Database Summary Sheet for additional possibilities.
ROLES are standard CAS indexing terms assigned to every indexed substance and to controlled index terms for classes of compounds. These roles specify the type of information provided about the substance in the given literature reference. The original nine roles were analytical study, biological study, nonpreparative formation, miscellaneous, occurrence, preparation, properties, reactions, and uses. In the old print Chemical Substance Indexes, these roles were used to subdivide long listings of abstract numbers under more commonly cited substances.
In SciFinder, these roles have been expanded to 15 in number and retroactively assigned back to 1967. The Preparation role is assigned back to the beginning of the database, 1907. Whenever one requests literature references based on a set of retrieved substances, one has an option to limit to any role or set of roles. Definitions for these roles are given in SciFinder's Roles for Substances When Retrieving References Definitions.
In STN International, the 15 roles are subdivided into finely tuned searchable categories. For example, it is now possible to specify not just preparation, but specifically either smaller scale synthetic preparation or industrial manufacture. A complete list of roles and subroles (called 'super roles' and 'specific roles' in STN documentation) is given on the last page of the CAS Roles in CA/CAplus Quick Reference Card.
1. Wagner, A.B. 2011. Searching Coordination and Organometallic Compounds in SciFinder. Issues in Science & Technology Librarianship 67 (Fall 2011). [Internet]. [Cited March 17, 2012]. Available from: http://www.istl.org/11-fall/tips.html
2a. Kozlowski, A.W. 1986. Introduction to Coordination Compounds. In Searching Coordination Compounds, Chapter 2, pages 5-10. [Internet]. Chemical Abstracts Service, 1986; [cited 3/15/12]. Available from: http://www.cas.org/File Library/Training/STN/User Docs/searchcoordcomp.pdf
2b. Kozlowski, A.W. 1986. Structuring and Registration Policies for Coordination Compounds. In Searching Coordination Compounds, Chapter 2, pages 11-22. [Internet]. Chemical Abstracts Service; [cited 3/15/12]. Available from: http://www.cas.org/File Library/Training/STN/User Docs/searchcoordcomp.pdf
3. Wagner, A.B. 2011. Searching inorganic substances in SciFinder. Issues in Science & Technology Librarianship 64 (Winter 2011). [Internet]. [Cited March 17, 2012]. Available from: http://www.istl.org/11-winter/tips.html
Reaxys is a web accessible chemical compound database that combines:
- The online versions of the Beilstein and Gmelin databases, originally created by the German institutes of those same names
- Updated material supplied by various organizations under the auspices of Elsevier which now owns Reaxys
- A new English language (organic) Patent Chemistry Database.
Beilstein is based on Beilstein’s Handbuch der Organischen Chemie (1771-1980) updated by articles from over 200 organic chemistry journals since 1981. Gmelin is based on Gmelin’s Handbuch der Anorganischen Chemie (1771-1994) updated by articles from over 100 inorganic/organometallic chemistry journals since 1995. A few print Gmelin volumes were not included. The Patent Chemistry Database is limited to English-language United States (US, 1976+), World Intellectual Property Organization (WO, 1978+) and European Patent Office (EP, 1978+) patents assigned to International Patent Classification codes C07 (organic chemistry), A61K (pharmaceuticals, cosmetics & related products), and C09B (dyes). Elsevier publishes a listing of the journals and patents covered by Reaxys.
Additional patent coverage (1869-1980) comes from Beilstein and Gmelin records. Please note that many foreign patents may have US and/or UK equivalents in SciFinder. See Caltech Library’s Patents and Standards/Trademarks LibGuide for additional information.
Since Beilstein and Gmelin are chemical compound databases, structure searching for organic compounds and formula searching for inorganic compounds are preferred. Both classes of compounds, however, can be searched with chemical names or formulas.
Reaxys provides numerous search options on both their Form-based and Advanced Property query screens. Search options are displayed by clicking on the ‘[+]’ symbol to generate a hierarchical dropdown listing.
Under the simpler ‘Properties (Form-based) / Identification Data’, one can search by the Reaxys and CAS registry numbers, chemical names including synonyms, and molecular formulas (MF). Under the ‘Properties (Advanced) / Identification Data’, one can additionally search in many other ways including chemical name segments, MF ranges and fragments, element counts, number of elements or components, molecular weight, alloy composition, and fields specific to individual ligands. For each search field, clicking the ‘[…]’ box displays an internal dictionary, a search box, and selectable search terms. In place of a truncation (wildcard symbol) feature, Reaxys offers the ‘Chemical Name Segment’ field.
Molecular Formula searches can be done for exact Hill order molecular formula for single- and multi-fragment compounds. For salts, the molecular formula of the anion is separated from the cation with an asterisk. For example, the MF of the copper salt of phthalimide is C8H4NO2*Cu.
Note that many research-level science libraries have the original Gmelin and Beilstein volumes in print, which may be in storage, especially if the institution subscribes to Reaxys. Once one learns how the print volumes are organized and indexed, an efficient and effective search can be performed, although the results will not be up to date. Beilstein ceased print production after partially covering the literature up through 1979. Gmelin ceased print volumes in 1997.
Freely Available Databases
Both the number and quality of free chemical information resources available openly on the internet have greatly increased in the past decade. Here are a few of the finest sources. Other high quality sources can be found by consulting a library guide from a major academic research university, such as the University at Buffalo's Chemistry: Internet Resources guide.
ChemIDplus Lite and ChemIDplus Advanced are freely available structure and nomenclature authority files for approximately 400,000 substances (about 70% have structure data) cited in National Library of Medicine (NLM) databases. Search fields include chemical names, synonyms, CAS RNs, and molecular formulas.
There are two search options:
Results from both interfaces include File Locator Codes that are hyperlinked at the substance or web site level to biomedical databases at NLM, Internet resources, and the Superlist compilation of Federal and state regulatory agencies.
ChemIDplus Lite provides limited ‘Basic Information’ and ‘Search Navigation’, while ChemIDplus Advanced has an expanded listing that provides:
- an ‘Enlarge Structure’ link that re-displays the chemical structure, with a check-box to ‘Display 3D Model’,
- a ‘Structure’ link (under Basic Information) that re-displays the chemical structure and provides a 3D Representation and InChI and SMILES structure descriptor notations.
ChemIDplus, while listing only about 400,000 compounds, contains a significant number of searchable common and trade name synonyms.
PubChem, part of NCBI's Entrez information retrieval system, is designed to provide information on biological activities of small molecules, generally those with molecular weight less than 500 daltons. PubChem comprises three linked databases:
- PubChem Compound – The default database for searching that includes all unique structures with computed properties. These compounds are extracted from the PubChem Substance database.
- PubChem Substance – Contains descriptions of over 8 million deposited substances (i.e., chemical samples) that have been submitted to PubChem from a variety of sources. All unique, clearly identifiable compounds in these samples are extracted and linked to a PubChem Compound record.
- PubChem Bioassay – A database of bioactivity screens of chemical substances described in PubChem Substance. Included are over 180 bioassays from a variety of sources.
PubChem links its records to biological property information in PubMed and NCBI's Protein 3D Structure Resource.
With the default basic query screen, one can search PubChem Compound records by chemical names, synonyms, molecular formula, or CAS RNs. The extensive ‘Advanced Search’ provides for atom counts, chemical property ranges, stereochemistry, bioassay ranges, links to Entrez databases, and an option of searching for molecules containing specific elements. There is also the ability to limit the search to a long list of specific fields in the database. Note that the search screens have separate tabs or drop-down menus for switching to the Substance and Bioassay databases.
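A name or synonym lookup of this kind can also be scripted. The sketch below assumes PubChem's PUG REST web service, which this chapter does not otherwise cover, and builds a request URL for a compound name; the function name `compound_property_url` and the default property list are illustrative choices. The function only constructs the URL and makes no network request.

```python
from urllib.parse import quote

# Base address of the PUG REST service (an assumption; not from this chapter).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(name, properties=("MolecularFormula", "InChIKey")):
    """Build a PUG REST URL that looks up a compound by name or synonym."""
    # Percent-encode the name so commas, spaces, and slashes are preserved.
    encoded_name = quote(name.strip(), safe="")
    property_list = ",".join(properties)
    return (
        f"{PUG_REST_BASE}/compound/name/{encoded_name}"
        f"/property/{property_list}/JSON"
    )


if __name__ == "__main__":
    print(compound_property_url("aspirin"))
    print(compound_property_url("1,2-dichloroethene"))
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, should return the requested properties for the matching compound records. As with the web interface, a name that is not among a record's synonyms will not match.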
ChemSpider, now sponsored and maintained by the Royal Society of Chemistry (RSC), is a freely searchable chemical structure database, with over 26 million structures derived from hundreds of data sources, that offers three query screens: Simple Search, Structure Search and Advanced Search.
Simple Search requires ‘molecule related’ text strings, such as systematic names, synonyms, trade names, registry numbers, SMILES, InChI, or ChemSpider ID (CSID). Although it is not indicated in the examples shown under the search box, a molecular formula can also be entered and searched. The Advanced Search screen permits more complex and field-specific queries, including specifying which elements may, must, or must not be present in the compound.
Search results include names, synonyms, database identifiers, predicted properties (ACD, EPA/EPI, ChemAxon), spectra, CIFs, Wikipedia articles, patents, pharmacological links, and much more, as available. ChemSpider compounds are linked to SureChem (a patent database) and to journal articles and books from a variety of sources including RSC, PubMed, and Google Books.
Compounds are also linked back to ChemSpider, but only from the RSC article landing pages (via the Compounds tab). Clicking on the Compounds tab provides a choice of patents from SureChem (USPTO Granted & Applications, European Granted & Applications, WO/PCT, & Japanese Abstracts), RSC articles, and compound properties. Records may also have a link to reactions in ChemSpider SyntheticPages.
Freely searchable; subscription required for data
The Combined Chemical Dictionary (CCD) contains over 160,000 entries with over 540,000 compounds. The CCD, as well as the Handbook of Chemistry and Physics, is freely searchable, although a subscription is required to display the data. Non-subscribers can use the web version as an index for the respective print volumes that are held in many libraries and laboratories.
The CCD combines the contents of the following separately searchable databases:
- Dictionary of Carbohydrates
- Dictionary of Inorganic and Organometallic Compounds
- Dictionary of Natural Products
- Dictionary of Organic Compounds
- Dictionary of Drugs (formerly PharmaSource)
The CCD has both chemical name (including synonyms and CAS RNs) and molecular formula searching. It also has a ‘Molecular Formula by Element’ search that allows searching for all compounds in the database that contain a specific number of atoms of a given element (e.g., 3 As).
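The idea behind a search by element count can be illustrated in a few lines. This is only a sketch of the concept, not the CCD's implementation: the helper names `element_counts` and `has_atom_count` are hypothetical, the sample formulas are illustrative, and the parser handles only simple formulas without parentheses, charges, or hydrate dots.

```python
import re
from collections import Counter


def element_counts(formula):
    """Count atoms per element in a simple formula such as 'C2H2Cl2'."""
    # Accept only runs of element symbols, each optionally followed by a count.
    if not re.fullmatch(r"(?:[A-Z][a-z]?[0-9]*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)([0-9]*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def has_atom_count(formula, symbol, count):
    """Return True if the formula contains exactly `count` atoms of `symbol`."""
    return element_counts(formula)[symbol] == count


if __name__ == "__main__":
    sample = ["As2O3", "AsH3", "As2S3", "H3AsO4"]
    # Keep only the formulas with exactly two arsenic atoms.
    print([f for f in sample if has_atom_count(f, "As", 2)])
```

Matching on atom counts rather than on the formula string sidesteps differences in how a database orders the elements within a formula.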
Each entry provides both descriptive and numerical data on chemical, physical and biological properties; systematic and common names; literature references; structure diagrams, derivatives and isomers. The CCD is distinctive in that it provides literature references to reference sources and databases that are not covered in Reaxys or SciFinder (e.g., Aldrich spectra catalogs; Fieser & Fieser's Reagents…; Ullmann's and Kirk-Othmer encyclopedias; Extra Pharmacopoeia; Bretherick's Handbook of Reactive Chemical Hazards; RSC's Hazards in the Laboratory; Sax's Dangerous Properties …; Browning's Toxicity and Metabolism …), or that are not covered in Reaxys (e.g., Organic Syntheses, which isn’t indexed between 1980 and 2008; Encyclopedia of Reagents for Chemical Synthesis; and references to book chapters).
For additional information see:
- The Combined Chemical Dictionary Classroom Handout (Caltech Library)
- Introduction to the Combined Chemical Dictionary Online (CRC Press)
Chemical nomenclature is an area of expertise claimed by few chemists today, but there are powerful search capabilities in databases and printed reference works that make use of chemical names, both trivial and formal. On the other hand, all chemists use molecular formulas, and a system such as the Hill System for arranging molecular formulas in an index provides a useful, albeit usually not unique, retrieval mechanism. Chemical Abstracts Service (CAS) and many third parties use the Registry Number to index documents in reference works and databases. The precision of Registry Number searching is unparalleled. CAS databases and Elsevier's Reaxys are very large, robust databases competing against each other and in many ways complementing each other. Increasingly, free sources like PubChem and ChemSpider permit everyone to have access to extensive, quality information about substances.
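For readers unfamiliar with the Hill System mentioned above, its ordering rule is short enough to sketch: if the compound contains carbon, C comes first, H second, and the remaining elements follow alphabetically; if there is no carbon, all elements (including H) are listed alphabetically. The function name `hill_formula` is hypothetical, and the sketch ignores the special conventions that individual databases apply to salts, polymers, and other multicomponent substances.

```python
def hill_formula(counts):
    """Format a mapping of element symbol -> atom count in Hill order."""
    # Alphabetical order of symbols, ignoring elements with a zero count.
    symbols = sorted(symbol for symbol, n in counts.items() if n > 0)
    if "C" in symbols:
        # Carbon first, hydrogen second, everything else alphabetically.
        rest = [s for s in symbols if s not in ("C", "H")]
        ordered = ["C"] + (["H"] if "H" in symbols else []) + rest
    else:
        # No carbon: strictly alphabetical, hydrogen included.
        ordered = symbols
    # A count of 1 is left implicit, as in ordinary formula notation.
    return "".join(
        s + (str(counts[s]) if counts[s] > 1 else "") for s in ordered
    )


if __name__ == "__main__":
    print(hill_formula({"O": 4, "H": 8, "C": 9}))   # aspirin
    print(hill_formula({"S": 1, "O": 4, "H": 2}))   # sulfuric acid
```

Sulfuric acid illustrates the no-carbon case: the familiar H2SO4 is indexed under H2O4S in a Hill-ordered formula index.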