Chemical Information Sources/Subject Searches

From Wikibooks, open books for an open world

Introduction

Almost all abstracting and indexing services, not to mention many other secondary and primary works, have subject indexes. In this chapter we will look closely at the subject indexes for some of the major works already covered, as well as note the existence of specialized abstracting and indexing services devoted to a particular document type, and full-text databases of primary and other literature types. Discussion of the type of subject search that uses the name of a specific chemical compound is deferred to a later topic, although words that stand for classes of compounds are discussed here.

The searches dealt with here are subject or topic searches, as opposed to structure, identifier number, author name or other search types. Topic searches are rarely, if ever, perfectly straightforward. We must find just the right words and phrases to elicit the needed information from a given information source. The searcher needs to consider variant spellings, the use of initialisms and acronyms, synonyms, alternative expressions, and other complicating factors. Moreover, the way the search system interprets the search terms is critical. For example, does the search system interpret two adjacent words as a phrase with the words required to appear in that exact order, or does it allow them to appear in any order, perhaps separated by other words? Does it assume that the presence of any of the search terms is sufficient for a valid hit, or does it require all search terms to be present?
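
The questions above can be made concrete with a small sketch. The Python below is an illustration only: the function names (match_phrase, match_all, match_any) and the sample record are invented here, and no particular search system is claimed to work this way. It shows how the same two-word query gives different answers under three common interpretations.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_phrase(query, text):
    """True if the query words occur adjacent to each other and in the given order."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

def match_all(query, text):
    """True if every query word occurs somewhere in the text (implicit AND)."""
    return set(tokenize(query)) <= set(tokenize(text))

def match_any(query, text):
    """True if at least one query word occurs in the text (implicit OR)."""
    return bool(set(tokenize(query)) & set(tokenize(text)))

# Invented sample record title and query.
record = "Transfer of heat in microreactors"
query = "heat transfer"

print(match_phrase(query, record))  # False: the words are not adjacent in this order
print(match_all(query, record))     # True: both words are present
print(match_any(query, record))     # True: at least one word is present
```

A searcher who does not know which of these interpretations a system applies cannot tell whether an empty result set means "nothing exists" or "the phrase was matched too strictly."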

A fundamental and long-standing concept in topic searching is the difference between KEYWORD (uncontrolled vocabulary) searching, and SUBJECT (controlled vocabulary) searching. With the advent of online full-text material, the option of FULL-TEXT searching presents additional opportunities, as well as complications. Each approach has strengths and limitations, and the aim of this chapter is to highlight these, as well as to discuss the subject search capabilities of a few major resources.

Ideally, searches that employ search terms selected from the CONTROLLED VOCABULARY or a THESAURUS, or other subject term authority list, address the challenge of having to think of, and include in the search, all possible alternative expressions of a search concept, all variant spellings, acronyms, and other complicating factors listed above. Controlled vocabulary terms can

  • Connect alternative or interchangeable terms for a specific concept, saving the searcher from having to remember to include possible alternatives, acronyms, etc. (x-ray photoelectron spectroscopy for XPS; matrix-assisted laser desorption ionization for MALDI-MS)
  • Provide a single, unifying term for an abstract concept, a collection of related concepts (photolysis; heat transfer), or a class of compounds (steroids; antitumor agents)
  • Indicate that the concept expressed by the subject term is a major focus of the material, rather than a topic of minor importance (especially helpful are indicators of major emphasis, such as major MeSH headings)
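
As a rough sketch of the first point, a controlled vocabulary can be pictured as a lookup from entry terms to one preferred heading. The tiny table below is hypothetical: it echoes the examples in the list above and is not copied from any real authority list, and the function names are invented for illustration.

```python
# Hypothetical mini-thesaurus: entry term -> preferred heading.
# The entries echo the examples in the text, not a real authority list.
PREFERRED = {
    "xps": "x-ray photoelectron spectroscopy",
    "x-ray photoelectron spectroscopy": "x-ray photoelectron spectroscopy",
    "maldi-ms": "matrix-assisted laser desorption ionization",
    "matrix-assisted laser desorption ionization": "matrix-assisted laser desorption ionization",
}

def preferred_term(term):
    """Return the preferred heading for a term, or None if the term is not listed."""
    return PREFERRED.get(term.strip().lower())

def entry_terms(heading):
    """List every entry term that leads to the given preferred heading."""
    return sorted(t for t, h in PREFERRED.items() if h == heading)

print(preferred_term("XPS"))
# x-ray photoelectron spectroscopy
print(entry_terms("x-ray photoelectron spectroscopy"))
# ['x-ray photoelectron spectroscopy', 'xps']
```

The searcher looks up one term and the vocabulary supplies the rest, which is the labor that a keyword search leaves to the searcher.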

One example of a controlled vocabulary is the Library of Congress Subject Headings (LCSH) used by many academic libraries. Another is the MeSH (Medical Subject Headings) list that is used with the National Library of Medicine's MEDLINE database. Chemical Abstracts Service (CAS) uses the Index Guide and Subject Indexes, along with Chemical Substance and Author Indexes, to control search terms in Chemical Abstracts, which is no longer published in print.[1]

The distinction between uncontrolled (keyword) searching and searching using controlled vocabulary is important and is the main point of this lesson, but that distinction becomes somewhat blurred in a tool such as SciFinder, the web-based version of Chemical Abstracts. Though built on the foundation of the extensive CA thesaurus of index terms, SciFinder does not directly expose the CA Index; it brings the searcher to relevant index terms only indirectly, through the Analyze operation on a set of search results. Moreover, the SciFinder search algorithm has some built-in intelligence: it automatically searches for both singular and plural subject words and accounts for spelling variants, common acronyms, and abbreviations. The searcher simply types into the Research Topic search box the natural language expression that defines the search, without trying to insert Boolean operators.

[1] The CA Lexicon on the STN system shows the underlying structure of the CAS vocabulary control system, with its hierarchy of broader and narrower terms, linked terms, previously used terms, and related terms.

Keyword and Full-Text Searches

Keyword searches, by contrast, require that the searcher consider and explicitly include alternative expressions, acronyms, spelling variants, and so on when creating a topic search, without reference to an authoritative subject list. Keyword searches should not be confused with full-text searches, nor does a keyword search necessarily search every word of a bibliographic item record.
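
One way to picture the extra work a keyword search demands is to assemble the alternatives for each concept by hand and combine them with Boolean operators. The sketch below assumes a generic syntax (OR within a concept, AND between concepts, double quotes around phrases). Real databases differ in their operators and quoting rules, and the second concept ("polymer") is an invented example.

```python
def or_group(variants):
    """Join alternative expressions into one parenthesized OR group, quoting multi-word phrases."""
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return "(" + " OR ".join(quoted) + ")"

def build_query(concepts):
    """AND together one OR group per concept."""
    return " AND ".join(or_group(variants) for variants in concepts)

# Each inner list holds the variants the searcher thought of for one concept.
concepts = [
    ["XPS", "ESCA", "x-ray photoelectron spectroscopy"],
    ["polymer", "polymers"],
]

print(build_query(concepts))
# (XPS OR ESCA OR "x-ray photoelectron spectroscopy") AND (polymer OR polymers)
```

Any variant the searcher fails to think of is simply absent from the query, which is the central weakness of uncontrolled vocabulary searching.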

Full-text searches can be run in full-text article or content repositories, such as publishers' websites (e.g., the American Chemical Society's online journal website) or JSTOR. They allow the searcher to retrieve articles that contain the search term anywhere in the actual text of the article. Naturally, this can result in a large number of only marginally relevant search results, so successful search experiences in these databases depend on sophisticated relevancy ranking algorithms that weight results based on the frequency of occurrence of the search term, where the search term appears (e.g., in the title or abstract), and other factors. Full-text repositories typically also permit the searcher to restrict the search domain to the title or abstract of the items, more closely approximating the basic keyword search in bibliographic databases such as Web of Science.
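
The idea of weighting by frequency and by location can be sketched in a few lines. This is a toy model, not the ranking method of any publisher or vendor: the field weights, field names, and sample documents are all invented for illustration, and real systems use far more elaborate scoring.

```python
# Hypothetical field weights: a hit in the title counts more than a hit in the body.
FIELD_WEIGHTS = {"title": 5.0, "abstract": 2.0, "body": 1.0}

def score(document, term):
    """Sum the weighted number of occurrences of the term across the document's fields."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        # Simple whitespace split; the sample text avoids punctuation for that reason.
        words = document.get(field, "").lower().split()
        total += weight * words.count(term)
    return total

def rank(documents, term):
    """Return the documents that contain the term, highest score first."""
    scored = [(score(d, term), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for s, d in scored if s > 0]

# Invented sample documents.
docs = [
    {"id": "B", "title": "Atmospheric chemistry", "body": "photolysis is mentioned once in passing"},
    {"id": "A", "title": "Photolysis of ozone", "abstract": "Rates of photolysis were measured"},
    {"id": "C", "title": "Heat transfer in reactors", "body": "no mention of the search term"},
]

print([d["id"] for d in rank(docs, "photolysis")])
# ['A', 'B']
```

Document B still appears in the results because the term occurs in its full text, but it sinks below document A, where the term appears in the title and abstract.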

Unlike full-text content repositories, bibliographic databases such as Web of Science consist of records that represent and describe the article or other information content (patent, book chapter, technical report). These records can be more or less highly structured, and can include controlled vocabulary terms assigned by indexers, or other data elements (such as citation counts or author institutional affiliations). Keyword searching in bibliographic databases consists of searching across a computer-generated "keyword index" of significant words in the record, typically words from the title or abstract, or keywords supplied by the author. In one variation on the "keywords" concept, Science Citation Index (a subfile of Web of Science) has for a number of years included a feature called "KeyWords Plus." KeyWords Plus are words or phrases that appear with a high degree of frequency in the titles of an article's references, even though they may not appear in the title or abstract of the article itself, thus enhancing retrieval.
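
The general idea behind terms drawn from reference titles can be illustrated with a simple word count. The sketch below is only an approximation of the concept as described above; it is not the actual KeyWords Plus algorithm, which is not documented here. The stopword list, the threshold, and the reference titles are all invented.

```python
from collections import Counter

# Small illustrative stopword list.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}

def reference_title_keywords(reference_titles, min_titles=2):
    """Return the words that occur in at least `min_titles` of the cited titles."""
    counts = Counter()
    for title in reference_titles:
        # Count each word once per title so one long title cannot dominate.
        counts.update(set(title.lower().split()) - STOPWORDS)
    return sorted(word for word, n in counts.items() if n >= min_titles)

# Invented titles standing in for an article's reference list.
refs = [
    "Copper catalyzed azide alkyne cycloaddition",
    "Azide alkyne cycloaddition in water",
    "Synthesis of triazoles by cycloaddition",
]

print(reference_title_keywords(refs))
# ['alkyne', 'azide', 'cycloaddition']
```

An article citing these works could then be retrieved by a search on "cycloaddition" even if that word never appears in its own title or abstract.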

Controlled Vocabulary Indexes: Library of Congress Subject Headings and Classification

Library of Congress Subject Headings (LCSH) are commonly used in the library catalogs of college and research libraries, and LC breaks the broad area of chemistry into sub-areas. The subject headings are hierarchical, so that a subject search for a broad term such as Heterocyclic Compounds will suggest narrower, more specific terms to the searcher (e.g., Furans; Pyridine). Items indexed with these narrower terms will not be automatically included in the broader search, however. LC subject headings can be further modified with qualifying words or phrases (e.g., Analysis; Synthesis) or with terms describing the genre or format of the material (e.g., Periodicals; Encyclopedias; Handbooks, Manuals, etc.), so to find appropriate works one could search phrases such as:

Chemistry Inorganic Encyclopedias
OR
Chemistry Organic Handbooks

Understanding the general pattern and hierarchical arrangement of the Library of Congress Subject Headings, and how these can be searched in the library's online catalog, can make research far more productive and worthwhile.
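
The point that narrower headings are not automatically included in a broader search can be shown with a small model. The hierarchy fragment below uses the headings mentioned above, while the catalog entries and function names are placeholders invented for this sketch. It does not reflect how any particular library catalog is implemented.

```python
# Hypothetical fragment of a subject-heading hierarchy: broader -> narrower headings.
NARROWER = {
    "Heterocyclic compounds": ["Furans", "Pyridine"],
}

def expand(heading):
    """Return the heading followed by every narrower heading beneath it."""
    result = [heading]
    for child in NARROWER.get(heading, []):
        result.extend(expand(child))
    return result

def search(catalog, headings):
    """Return the titles of items that carry at least one of the given headings."""
    wanted = set(headings)
    return [title for title, subjects in catalog if wanted & set(subjects)]

# Placeholder catalog: (title, assigned subject headings).
catalog = [
    ("Book 1", ["Heterocyclic compounds"]),
    ("Book 2", ["Furans"]),
    ("Book 3", ["Pyridine"]),
]

print(search(catalog, ["Heterocyclic compounds"]))
# ['Book 1']
print(search(catalog, expand("Heterocyclic compounds")))
# ['Book 1', 'Book 2', 'Book 3']
```

The first search misses two of the three books. The searcher has to follow the suggested narrower terms, as the second search does, to find them.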

Classification Schemes
Of course, one option to find a relevant book, journal, or database owned or leased by a given library is simply to browse an appropriate section of the library's stacks, using the following table as a road map in a library that uses the Library of Congress classification system. However, many libraries are shifting more and more of their purchases towards online books which may or may not be assigned Library of Congress classes, but naturally do not appear on the physical shelves. One should consult with appropriate library staff as to how online books appear in the catalog and how to best search for them.

MAJOR DIVISIONS OF THE LIBRARY OF CONGRESS CLASSIFICATION SCHEDULE FOR CHEMISTRY

Subjects                              LC Range
Chemistry (General)                   QD 1-65
Analytical Chemistry                  QD 71-142
Inorganic Chemistry                   QD 146-197
Organic Chemistry                     QD 241-441
Physical and Theoretical Chemistry    QD 450-801
Crystallography                       QD 901-999

For an LC classification number of many chemical topics, consult this list of chemistry terms linked to LC Class Numbers.
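
As a small worked example, the ranges in the table above can be turned into a lookup that tells you which division a QD number falls in. The function name is invented, and the sketch handles only the whole-number part of a call number, which is a simplification of real LC call numbers.

```python
# Ranges copied from the table above: (low, high, subject) within class QD.
QD_RANGES = [
    (1, 65, "Chemistry (General)"),
    (71, 142, "Analytical Chemistry"),
    (146, 197, "Inorganic Chemistry"),
    (241, 441, "Organic Chemistry"),
    (450, 801, "Physical and Theoretical Chemistry"),
    (901, 999, "Crystallography"),
]

def qd_subject(number):
    """Return the subject whose QD range contains the number, or None if no range does."""
    for low, high, subject in QD_RANGES:
        if low <= number <= high:
            return subject
    return None

print(qd_subject(251))  # Organic Chemistry
print(qd_subject(75))   # Analytical Chemistry
print(qd_subject(200))  # None (falls between the listed ranges)
```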

Controlled Vocabulary Indexes: Chemical Abstracts Printed Subject Indexes

Prior to 1972, there were five- and ten-year Subject Indexes to Chemical Abstracts. Beginning with the 9th Collective Index period for 1972-76, the chemical name index entries for single chemical substances were put into a new work, the CHEMICAL SUBSTANCE INDEX. Everything else, including names for classes of substances (e.g., ethers), went into the GENERAL SUBJECT INDEX. Thus, terms referring to classes of compounds, reactions, processes, equipment, or plant and animal species should be searched in the "General Subject Index" after the proper term or phrase has been found in the "Index Guide." From 2007, CAS no longer categorizes information by collective index periods, so the new CA index names no longer have a "CI" label. It is important to check the "Index Guide" that corresponds to the period you are searching in order to be sure of finding the correct term for use in the "General Subject Index," as these terms can change with time (e.g., "Pharmaceuticals" (14CI) v. "Drugs" (13CI and previous)).

Not every preferred term or phrase is found in the "Index Guide," and if you do not find a listing there, assume that you have chosen the correct preferred term and look in the appropriate section of the "General Subject Index." Always be aware that preferred terms may change when the boundaries of the Collective Index periods are crossed.

Look at a sample record from the CA Student Edition, paying particular attention to the index terms and the use of abbreviations. As noted above, the SciFinder topic search will do some behind-the-scenes work to find appropriate terms to include in a search, so people who use that resource do not have to worry as much about controlled or uncontrolled vocabulary when they perform a research topic search. However, with some caution, as noted above, you may use synonyms in parentheses next to a related concept, for example, ESCA (XPS).

Controlled Vocabulary Indexes: Chemical Abstracts "Index Guide" and Supplementary Terms

One of the virtues of a keyword search is that the search terms can reflect the current, ever-changing vocabulary of science. As soon as a new name for a concept, technique, etc., is used in a document, it is available for searching. Controlled vocabulary lists are slower to adapt to changes in scientific terminology, but their great benefit is that they can guide you to a single preferred term for the concept. Hence, the searcher need only identify the preferred indexing term to find documents of interest.

A fascinating example from recent years can be seen in the emergence of the term "click chemistry," a term coined in the late 1990s by chemist K. Barry Sharpless. The expression can first be found in SciFinder in 1999 as a single reference in a meeting abstract, and in 2001 the concept is fully described in a published journal article.[2] The expression grew rapidly into a supplementary term (a transitional state between keyword and formal indexing term). SciFinder then shows the following trend: from being used 3 times as a supplementary term in 2002 to 155 times in 2006, and in 2007, "click chemistry" appears for the first time as a formal index term adopted by Chemical Abstracts, with 69 items receiving the index term "click chemistry" and some 202 items showing "click chemistry" as a supplementary term. In 2011, the most recent complete year, the term "click chemistry" results in 1460 references, of which 1097 are indexed with that term.

Prior to 2010, when Chemical Abstracts ceased print publication, the Index Guide was the publication that governed the six-month volume and five-year collective General Subject and Chemical Substance Indexes. The Index Guide, though no longer current, is still a useful document to browse for index term guidance. For example, looking in the "E" section of the "Index Guide" for ESCA directs you to the "P" section of the actual "General Subject Indexes":

ESCA (electron spectroscopy for chemical analysis)
     See Photoelectric emission
          x-ray
     See Photoelectron spectroscopy
          x-ray

Likewise, looking in the "X" section of the Index Guide for XPS leads to the same preferred phrases:

XPS (x-ray photoelectron spectroscopy)
     See Photoelectric emission
          x-ray
     See Photoelectron spectroscopy
          x-ray

Thus, by using the Index Guide, the searcher would discover that documents on this topic can be found in the "P" section of the "General Subject Index" to Chemical Abstracts. It is important to use the CA "Index Guide" before using the "General Subject Index" because there are no "see" references in the "General Subject Index" itself. Furthermore, each five-year collective index period has its own "Index Guide". There is a guide to Hierarchies of General Subject Headings to assist in selecting terms.
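
The two-step procedure (Index Guide first, then General Subject Index) can be sketched as a lookup. The "see" references below are taken from the two entries quoted above; the function names are invented. The fallback follows the rule given earlier in this chapter: if a term has no listing in the Index Guide, assume it is itself the preferred term. A real search would also need to pick the Index Guide for the right collective index period, which this sketch ignores.

```python
# "See" references from the two Index Guide entries quoted above.
# Each target is (main heading, subheading).
SEE = {
    "ESCA": [("Photoelectric emission", "x-ray"), ("Photoelectron spectroscopy", "x-ray")],
    "XPS": [("Photoelectric emission", "x-ray"), ("Photoelectron spectroscopy", "x-ray")],
}

def headings_to_search(term):
    """Follow 'see' references; if there are none, treat the term itself as preferred."""
    return SEE.get(term, [(term, None)])

def index_section(heading):
    """Return the alphabetical section of the General Subject Index that holds a heading."""
    return heading[0].upper()

for heading, subheading in headings_to_search("XPS"):
    print(index_section(heading), heading, subheading)
# P Photoelectric emission x-ray
# P Photoelectron spectroscopy x-ray

print(headings_to_search("Ethers"))
# [('Ethers', None)]
```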

[2] Kolb, H. C., Finn, M. G., & Sharpless, K. B. (2001). Click Chemistry: Diverse Chemical Function from a Few Good Reactions. Angewandte Chemie International Edition, 40 (11), 2004–2021. doi:10.1002/1521-3773(20010601)40:11<2004::AID-ANIE2004>3.0.CO;2-5

Refining And Analyzing Search Result Sets in SciFinder

SciFinder, like many other bibliographic databases including Web of Science, provides tools with which searchers can filter, or refine, a set of search results. The Refine operation consists of applying limit options by research topic (keyword), author or company name, publication year, document type, language, or source database. 'Refine' is basically a single-step operation.

The Analyze step in SciFinder is a more nuanced way to act upon a set of research results. Each Analyze option results in a bar chart or histogram display of terms and their distribution within the answer set, allowing for further exploration and discovery on the part of the searcher. (Note that some of these options effectively eliminate references from MEDLINE, as they are based on CA-specific data elements.) In 2010, chemistry librarians Chuck Huber and A. Ben Wagner gave the following useful guidelines on the use of some of these analysis tools on CHMINF-L (slightly edited in the "mashup" below).

  • CA Section Title has its origins in the original print Chemical Abstracts which appeared in 80 major subject sections, collected under five broad headings (see The Sections of Chemical Abstracts for more information):
Section Name                                   Section Code    Section Numbers
Biochemistry                                   BIO/CC          1-20
Organic Chemistry                              ORG/CC          21-34
Macromolecular Chemistry                       MAC/CC          35-46
Applied Chemistry & Chemical Engineering       APP/CC          47-64
Physical, Inorganic, & Analytical Chemistry    PIA/CC          65-80

These are very broad categories. Note that the definitions and exact titles of the sections changed a number of times over the years, which explains the variations you will see when you do an Analysis. This analysis automatically eliminates MEDLINE records (with no warning message), as they of course don't have CA Section Titles assigned.

  • Index Term analyzes the controlled vocabulary of both CAPLUS and MEDLINE, i.e. subject headings, but not the chemical substance indexing. It does not search the Supplementary Terms.
  • CA Concept Headings analyzes CA "main heading" controlled vocabulary/index terms used in the old General Subject Index in the print world, i.e., this excludes chemical substance indexing. These headings appear in the CONCEPT column (the header box, not the detailed text modifier info) in the SciFinder record. This analysis excludes MEDLINE records, again without a warning message. If you are searching a set that has only CA references, this analysis appears to be identical to the Index Term analysis.
  • Supplementary Terms originally contained single terms from the CA keyword phrases, which are (or were) indexing terms used to prepare quick indexes to each issue of printed CA. Keywords reflect the content of the title and the abstract, using vocabulary found in the original document. MEDLINE records are not excluded from this analysis.
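
The behavior described in this list, a ranked count of terms in which records lacking the analyzed field silently drop out, can be imitated with a few lines of Python. This is a conceptual sketch only. The record layout, the field names ("section_title", "index_terms"), and the sample answer set are invented, and nothing here reflects how SciFinder is actually implemented.

```python
from collections import Counter

def analyze(records, field):
    """Count the values of `field` across records; records lacking the field drop out."""
    counts = Counter()
    for record in records:
        counts.update(record.get(field, []))
    return counts.most_common()

# Invented answer set. The MEDLINE record has index terms but no CA section title.
answers = [
    {"source": "CAPLUS", "section_title": ["Enzymes"], "index_terms": ["Enzymes", "Kinetics"]},
    {"source": "CAPLUS", "section_title": ["Catalysis"], "index_terms": ["Catalysts"]},
    {"source": "MEDLINE", "index_terms": ["Enzymes"]},
]

by_section = analyze(answers, "section_title")
by_index_term = analyze(answers, "index_terms")

print(sum(n for _, n in by_section))  # 2: the MEDLINE record is not counted
print(by_index_term[0])               # ('Enzymes', 2): counted in both databases
```

The section analysis accounts for only two of the three records and gives no warning about the third, which is the pitfall noted above for any analysis based on CA-specific data elements.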

Here are some hints for how these can be used in SciFinder subject searches.

1) CA Section Title - assuming you do not care about MEDLINE records in the answer set, the CA Section Title limitation will help focus on a very broad category such as Enzymes or Biochemical Genetics or Mammalian Hormones. This is useful when one wants categories too broad to be defined by keywords or to eliminate noise from a disparate category. It can also be useful for large sets where an index term analysis is overwhelming. Be sure to move far enough down in the Analyze listing to pick up older variant section titles for the same section. References often are assigned to more than one section, so care is needed since it is unreasonable to expect every single reference on enzymes regardless of the context to be in the Enzyme section. When you select CA Section Titles, you are making the assumption that you are selecting references where the major emphasis of the article (not unlike major MeSH headings in MEDLINE) is related to the section. Thus, CA Section Title is useful when you want to distinguish between two concepts with the same name, but radically different areas. For example, you search "plasma" and want to separate the stuff in your blood from the stuff in stars. Perhaps you want to home in on major concepts, e.g., when you are looking for uses of a particular type of catalyst. If you narrow to the papers placed in the Catalysis section, those papers would presumably be treating the catalytic role as a major rather than minor topic.

2) Index Term - This has the advantage of keeping MEDLINE records. It is useful when you get to a point in the search where you have put in all the concepts you can think of and all the limits that you feel are safe, but still have too many references to comfortably browse. The Analysis by index terms is the perfect solution showing us what is in the set when we don't know exactly what we want. It generates ideas as to what facets of the set we want to look at. Index Term is a dependable way to identify key terminology and/or more tightly focus a search on your topic. One problem with Analyze on Index Terms is that sometimes an index term which you would like to home in on is buried in the lower levels of both the sort by rank and the alphabetical sort. By using Categorize, you can work your way down hierarchically to the set of terms you want, and within the smaller subset of terms in the final Categorize column, you can find the terms you're looking for. However, Categorize doesn't run for really large answer sets.

3) CA Concept Headings - I have little use for this option since it basically performs an index term analysis. The only use I can think of is where I have a set of both CAPlus and MEDLINE records and want to simultaneously eliminate the MEDLINE records while looking at the CA indexing.

4) Supplementary Terms - Especially if going after a very new, specific, or unusual topic, it would be an extra precaution to check Supplementary Terms to make sure that an index term analysis had not missed some important records. This is a way to do a title term search in MEDLINE, something that otherwise could only be done with the Explore References: Journal search screen. It is a good idea to first do an Analyze by Index Term and use Supplementary Terms as a double check. You would likely seldom do an ST analysis alone. Another use of Supplementary Terms is when SciFinder "over-truncates." For example, "alcoholysis" gets truncated to "alcohol" which yields a huge number of false drops. But if you analyze by supplementary terms, you can pick out the papers in which the desired term appears untruncated in that field.
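
The over-truncation workaround in hint 4 can be pictured as follows. The stem_match function is a crude stand-in that merely reproduces the effect described ("alcoholysis" behaving like "alcohol"). How SciFinder really truncates terms is not documented here, and the records and field names are invented for illustration.

```python
def stem_match(term, text, stem_length=7):
    """Crude stand-in for over-truncation: compare only the first `stem_length` letters."""
    stem = term.lower()[:stem_length]
    return any(word.startswith(stem) for word in text.lower().split())

def exact_in_supplementary(term, record):
    """True if the untruncated term is among the record's supplementary terms."""
    return term.lower() in (t.lower() for t in record["supplementary_terms"])

# Invented records standing in for an answer set.
hits = [
    {"title": "Alcoholysis of esters", "supplementary_terms": ["alcoholysis", "ester"]},
    {"title": "Alcohol oxidation catalysts", "supplementary_terms": ["alcohol", "oxidation"]},
]

truncated = [h["title"] for h in hits if stem_match("alcoholysis", h["title"])]
filtered = [h["title"] for h in hits if exact_in_supplementary("alcoholysis", h)]

print(truncated)  # ['Alcoholysis of esters', 'Alcohol oxidation catalysts']
print(filtered)   # ['Alcoholysis of esters']
```

The truncated match lets the false drop through. Selecting on the untruncated supplementary term removes it.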

Specialized Abstracting and Indexing Services for Subjects or Document Types

There are many specialized abstracting or indexing services that cover either a subset of chemistry, e.g., Analytical Abstracts, or a particular format, e.g., ProQuest's Dissertations & Theses database or Derwent World Patents Index. Many of the techniques for subject searching discussed in this chapter are applicable to those works, but acquainting yourself with the guides, database summary sheets, and other user aids for any tools you choose to search is a very good idea.

Summary

Depending on the database in question, the searcher may simply enter a topic in natural language, or may need to consult an authoritative list of subject terms before performing a subject search. More highly developed databases (in terms of data structure, controlled vocabulary, and search engine optimization) reward searchers with more accurate, comprehensive, and relevant search results. Minimally developed databases require the searcher to be diligent and creative in thinking of alternative expressions, synonyms, acronyms, and other aspects of the search to find the most relevant information. There is always a trade-off between precision (the proportion of retrieved items that are relevant) and recall (the proportion of relevant items in the database that were actually retrieved). A very narrowly defined search strategy may achieve nearly 100% precision, but find a relatively small percentage of the important relevant references in the database. Database producers and vendors have developed many techniques that allow a searcher to refine a search strategy and bring the desired information to the surface, and attention to these techniques will pay dividends in the long run.
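
The precision and recall trade-off can be made concrete with a short calculation. Here precision is taken as the fraction of retrieved records that are relevant, and recall as the fraction of relevant records that were retrieved. The record numbers are invented purely to illustrate a narrow search and a broad search against the same database.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for sets of retrieved and relevant record IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: 20 relevant records exist in the database (IDs 1 to 20).
relevant = set(range(1, 21))

# Narrow search: 4 records retrieved, all of them relevant.
narrow = {1, 2, 3, 4}

# Broad search: 16 relevant records plus 24 irrelevant ones (IDs 100 to 123).
broad = set(range(1, 17)) | set(range(100, 124))

print(precision_recall(narrow, relevant))  # (1.0, 0.2)
print(precision_recall(broad, relevant))   # (0.4, 0.8)
```

The narrow strategy reaches 100% precision but finds only a fifth of the relevant records. The broad strategy finds four fifths of them at the cost of many irrelevant hits.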

CIIM Link for further study

SIRCh Link for Subject Searches

Problem Set on this topic