Chemical Information Sources/SIRCh/Chemistry Databases on the Web

From Wikibooks, open books for an open world

SIRCh: Selected Internet Resources for Chemistry





American Mineralogist Crystal Structure Database

Includes every structure published in the American Mineralogist, The Canadian Mineralogist, European Journal of Mineralogy and Physics and Chemistry of Minerals, as well as selected datasets from other journals. The database is maintained under the care of the Mineralogical Society of America and the Mineralogical Association of Canada, and financed by the National Science Foundation.

Atomic Reference Data for Electronic Structure Calculations

Contains total energies and orbital eigenvalues for the atoms hydrogen through uranium, as computed in several standard variants of density-functional theory.

Aureus Sciences Databases (Aureus Sciences)

Aureus Sciences helps researchers transform data into knowledge to accelerate the drug discovery process. AurSTORE is a comprehensive data structuring system particularly suited for the integration of proprietary biological and chemical information generated through the client's own research program. Databases cover ADME, Kinases, Nuclear Receptors, Proteases, Ion Channels and GPCR.



BiGG (University of California at San Diego)

The BiGG database allows the exploration of hundreds of human disorders involving metabolism. It includes more than 3,300 known human biochemical reactions and allows scientists to create any cell in silico.

Binding MOAD (University of Michigan)

Binding MOAD's goal is to be the largest collection of well-resolved protein crystal structures with clearly identified, biologically relevant ligands, annotated with experimentally determined binding data (Kd, Ka, Ki, IC50) extracted from the literature. It currently has 9,836 entries, 2,950 of which have binding data.

BindingDB (University of Maryland Biotechnology Institute)

The Binding Database aims to make experimental data on the noncovalent association of molecules in solution searchable via the WWW. The initial focus is on biomolecular systems, but data on host-guest and supramolecular systems are also important and will be included in time. The database currently contains data generated by isothermal titration calorimetry (ITC) and enzyme inhibition (Enz. Inhib.) methods. BindingDB contains 15,000 small molecule ligands with 30,000 measured affinities to proteins represented in the PDB.

Biochemical Pathways Database under C@ROL (BioPath) (University of Erlangen)

BioPath is designed to help scientists understand the impact of gene regulation on biological systems for drug target identification. It provides the following features:

  • 1,175 molecules with connection tables including stereochemical information
  • 1,545 biochemical reactions stoichiometrically balanced, with marked reaction centers and atom-atom mapping numbers between educts and products
  • 1,000 unique enzymes represented by names and EC numbers
  • 202 pathways represented by names
  • Organisms covered: prokaryotes, plants, yeasts and animals, general pathways
  • Subcellular localization of pathways


BioGRID

A freely accessible database of protein and genetic interactions. It holds more than 116,000 interactions from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. Over 30,000 interactions have recently been added from 5,778 publications through exhaustive curation of the Saccharomyces cerevisiae primary literature. An internally hyperlinked web interface allows for rapid search and retrieval of interaction data. Full or user-defined datasets are freely downloadable as tab-delimited text files and PSI-MI XML. Pre-computed graphical layouts of interactions are available in a variety of file formats. User-customized graphs with embedded protein, gene and interaction attributes can be constructed with a visualization system called Osprey that is dynamically linked to the BioGRID.
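Tab-delimited interaction files of this kind are easy to process with a few lines of code. The following is a minimal sketch in Python, assuming a simplified layout: the column names (interactor_a, interactor_b, system, organism) and the sample rows are hypothetical and are not the actual export headers or data of the database.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in a tab-delimited layout. The column names and rows
# are assumptions made for illustration, not real database content.
SAMPLE = (
    "interactor_a\tinteractor_b\tsystem\torganism\n"
    "GENE1\tGENE2\tTwo-hybrid\tSaccharomyces cerevisiae\n"
    "GENE2\tGENE3\tAffinity Capture-MS\tSaccharomyces cerevisiae\n"
    "GENE4\tGENE1\tTwo-hybrid\tHomo sapiens\n"
)


def load_interactions(handle):
    """Read tab-delimited rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def interaction_degree(rows):
    """Count how many interactions each gene or protein takes part in."""
    degree = Counter()
    for row in rows:
        degree[row["interactor_a"]] += 1
        degree[row["interactor_b"]] += 1
    return degree


rows = load_interactions(io.StringIO(SAMPLE))
# Build a user-defined subset: interactions reported for one organism only.
yeast = [r for r in rows if r["organism"] == "Saccharomyces cerevisiae"]
degree = interaction_degree(rows)

print(len(rows), "interactions,", len(yeast), "from yeast")
print(degree.most_common(2))
```

For a real download, replace the in-memory sample with an opened file and adjust the column names to match the header line of that file.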

Biological Macromolecule Crystallization Database, BMCD

Contains crystal data and crystallization conditions compiled from the literature. The current version of the BMCD includes 5,247 crystal entries from macromolecules for which diffraction-quality crystals have been obtained. These include proteins, protein:protein complexes, nucleic acids, nucleic acid:nucleic acid complexes, protein:nucleic acid complexes, and viruses.

Biological Magnetic Resonance Data Bank, BioMagResBank, BMRB

BMRB is a repository for data from NMR spectroscopy on proteins, peptides and nucleic acids. It also provides many links to web resources on NMR and biomolecular structures.

BioMeta Database (CMBI)

BioMeta is a database of metabolites and metabolic reactions. Its contents are largely based on the KEGG Ligand database. Compared to the KEGG database, a large number of chemical structures have been corrected with respect to constitution and stereochemistry. The aim is to arrive at a database with correct representations of molecules and reactions. About 1,500 molecular structures have been corrected and about 55% of the unbalanced reactions have been "balanced". Currently (5/7/2007), the contents of the BioMeta database are based on version 36 of the KEGG Ligand database (October 25, 2005) but an update is underway.


The MOBY-S system defines an ontology-based messaging standard through which a client will be able to automatically discover and interact with task-appropriate biological data and analytical service providers, without requiring manual manipulation of data formats as data flows from one provider to the next.


BioPath.Explore

BioPath.Explore provides convenient electronic access to all the information stored in the BioPath database through a variety of standard search methods and retrieval techniques. Additional information, such as computed physicochemical properties and 3D molecular models, allows for a broader range of applications of BioPath.

Biopharmaceutical Products in the U.S. and European Markets

The database search features include Products/Active Ingredients, Product Class, Sales, Marketing and Approval Status, Indication/Disease, Source/host/expression, and Chemicals (non-active), among others.

Biophysical Chemistry Databases (University of Wisconsin)

The databases cover structures (proteins, nucleic acids, and viruses) and physical properties. Prediction and visualization tools are also included.

BioPrint (Cerep)

Cerep’s BioPrint provides a unique resource for supporting drug discovery. BioPrint places a new drug candidate in the context of all marketed drugs, anticipating potential in vivo liabilities, predicting off-target activities, and predicting ADME characteristics. Cerep has systematically profiled the active ingredients from over 2,500 marketed drugs, failed drugs, and reference compounds in a panel of more than 180 well-characterized in vitro assays, including a diverse selection of molecular targets (GPCRs and other receptors, ion channels and transporters, enzymes, kinases, etc.), as well as solution properties and in vitro ADMET properties.

BOND Biomolecular Object Network Database

BOND is the Biomolecular Object Network Databank - a powerful new databank that combines sequence, interaction, and related interactome data and content. BOND contains Genbank and BIND data as well as related tools and information. BOND combines various data silos into one resource. BIND is now a component database of BOND. All the great features familiar to you in BIND have been improved in BOND, with the added benefit that the features are easier to use and deliver more insightful results.

Bordwell pKa Table (Acidity in DMSO) (University of Wisconsin)

A handy tool for looking up the pKa values of some popular reagents, developed by Hans J. Reich of the University of Wisconsin, Madison.

BRENDA, The Comprehensive Enzyme Information System

BRENDA offers a variety of search options, including quick search, full-text search, advanced search, substructure search, and taxonomy tree (TaxTree) search.



Calculation of Molecular Properties and Drug-Likeness (Molinspiration)

You can calculate the properties or predict bioactivity after drawing chemical structures.

CAMEO Chemicals

A database of over 6,000 hazardous materials (with over 82,000 synonyms). Each chemical data sheet describes the material and its properties, and includes information on fire and explosive hazards, health hazards, firefighting techniques, cleanup procedures, and protective clothing.

CATH, Protein Structure Classification

The CATH database is a hierarchical domain classification of protein structures in the Protein Data Bank (PDB, Berman et al. 2003). It clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H). Only crystal structures solved to a resolution better than 4.0 angstroms are considered, together with NMR structures. All non-proteins, models, and structures with greater than 30% "C-alpha only" are excluded from CATH. This filtering of the PDB is performed using the SIFT protocol (Michie et al., 1996). Protein structures are classified using a combination of automated and manual procedures. The database can be browsed by class or searched by keyword.
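The inclusion rules described above can be expressed as a simple filter. The following Python sketch is not the SIFT protocol itself; it only encodes the three stated criteria. The Entry record, its field names, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical structure record; field names are assumptions."""
    entry_id: str
    method: str                  # "xray", "nmr", or "model"
    is_protein: bool
    resolution: Optional[float]  # in angstroms; None when not applicable
    ca_only_fraction: float      # fraction of the structure that is C-alpha only


def passes_filter(e, max_resolution=4.0, max_ca_only=0.30):
    """Apply the stated rules: proteins only, no models, at most 30%
    C-alpha only, and crystal structures better than 4.0 angstroms."""
    if not e.is_protein or e.method == "model":
        return False
    if e.ca_only_fraction > max_ca_only:
        return False
    if e.method == "nmr":
        return True  # NMR structures are considered without a resolution cutoff
    if e.method == "xray":
        # "Better than 4.0" means a numerically smaller resolution value.
        return e.resolution is not None and e.resolution < max_resolution
    return False


# Made-up entries for illustration only.
ENTRIES = [
    Entry("aaa1", "xray", True, 2.1, 0.0),
    Entry("aaa2", "xray", True, 4.5, 0.0),    # resolution too poor
    Entry("aaa3", "nmr", True, None, 0.0),
    Entry("aaa4", "xray", True, 3.0, 0.45),   # too much C-alpha only
    Entry("aaa5", "model", True, None, 0.0),  # theoretical model
    Entry("aaa6", "xray", False, 1.8, 0.0),   # not a protein
]

kept = [e.entry_id for e in ENTRIES if passes_filter(e)]
print(kept)
```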

CAZy, Carbohydrate-Active enZymes

CAZy describes the families of structurally-related catalytic and carbohydrate-binding modules (or functional domains) of enzymes that degrade, modify, or create glycosidic bonds.

CCCBDB, Computational Chemistry Comparison and Benchmark Database (US National Institute of Standards and Technology)

The CCCBDB is a collection of experimental and theoretical thermochemical properties for 615 neutral gas-phase species. The goal of the database/website is to provide a benchmark set of molecules and reactions for the evaluation of ab initio computational methods and to allow comparison between different ab initio computational methods and experiment for the prediction of thermochemical properties. Users can evaluate the accuracy of ab initio methods applied to thermochemistry by using the data at the site. Experimental and computational data are available (enthalpies of formation in kJ/mol, computed energies in hartrees, lists of vibrational frequencies, geometries) and can be used in comparisons. For example, the enthalpy of a user-specified reaction can be displayed for experiment and for different levels of theory. Properties presently in the CCCBDB include enthalpies of formation, entropies, heat capacities, geometries, vibrational frequencies, and barriers to internal rotation.
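The kind of comparison described above rests on two small calculations: a reaction enthalpy obtained from enthalpies of formation (products minus reactants), and a unit conversion from hartrees to kJ/mol. The Python sketch below illustrates both. The formation enthalpies are rounded textbook values for the gas phase at 298 K, used only for illustration and not taken from the CCCBDB; the computed reaction energy is a placeholder, not the result of any real calculation.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # 1 hartree expressed in kJ/mol


def hartree_to_kj_per_mol(energy_hartree):
    """Convert an energy from hartrees to kJ/mol."""
    return energy_hartree * HARTREE_TO_KJ_PER_MOL


def reaction_value(reactants, products, per_species):
    """Sum over products minus sum over reactants, weighted by
    stoichiometric coefficients (Hess's law)."""
    def side_total(side):
        return sum(coeff * per_species[name] for name, coeff in side.items())
    return side_total(products) - side_total(reactants)


# CH4 + 2 O2 -> CO2 + 2 H2O (all gas phase)
REACTANTS = {"CH4": 1, "O2": 2}
PRODUCTS = {"CO2": 1, "H2O": 2}

# Rounded textbook enthalpies of formation in kJ/mol (illustrative only).
DHF_EXPT = {"CH4": -74.6, "O2": 0.0, "CO2": -393.5, "H2O": -241.8}

dh_expt = reaction_value(REACTANTS, PRODUCTS, DHF_EXPT)

# PLACEHOLDER: a made-up computed reaction energy in hartrees. A real value
# would come from total energies at some level of theory, and it would need
# zero-point and thermal corrections before comparison with an enthalpy.
de_computed_hartree = -0.3
de_computed_kj = hartree_to_kj_per_mol(de_computed_hartree)

print(f"Experimental reaction enthalpy: {dh_expt:.1f} kJ/mol")
print(f"Placeholder computed energy:    {de_computed_kj:.1f} kJ/mol")
print(f"Deviation (computed - expt):    {de_computed_kj - dh_expt:.1f} kJ/mol")
```

The same reaction_value function works for computed energies: pass a dictionary of total energies per species instead of formation enthalpies, then convert the result to kJ/mol.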


CCOHS, Canadian Centre for Occupational Health and Safety

The Canadian Centre for Occupational Health and Safety maintains one of the best collections of Material Safety Data Sheets (MSDS). Searching across the databases is free, but there is a charge to view some of the records. Databases include MSDS, CHEMINFO (comprehensive health and safety information on pure chemicals), CHEMpendium (a combined database of CHEMINFO, CESARS, CHRIS, DSL/NDSL, HSDB, NJHS Fact Sheets, NIOSH Pocket Guide, Transport TDG, and Transport 49CFR), plus RTECS, the Registry of Toxic Effects of Chemical Substances.

CEBS, Chemical Effects in Biological Systems

CEBS, produced by the National Institute of Environmental Health Sciences, catalogs all of the gene products associated with responses to toxins. It aims to be a public toxicogenomics knowledgebase.

ChEBI, Chemical Entities of Biological Interest (EMBL-EBI, European Bioinformatics Institute)

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on small chemical compounds. ChEBI encompasses an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified.


ChemBank

ChemBank is a public, web-based informatics environment created by the Broad Institute's Chemical Biology Program and funded in large part by the National Cancer Institute's Initiative for Chemical Genetics (ICG). This knowledge environment includes freely available data derived from small molecules and small-molecule screens, and resources for studying the data so that biological and medical insights can be gained. ChemBank is intended to guide chemists synthesizing novel compounds or libraries, to assist biologists searching for small molecules that perturb specific biological pathways, and to catalyze the process by which drug hunters discover new and effective medicines.

ChemBioBase Suite (Jubilant Biosystems)

ChemBioBase™ is a comprehensive set of target-centric ligand databases covering all major targets of current research interest. For example, Kinase ChemBioBase™ contains over 319,000 small molecules against kinase targets.

ChemDB, the UC Irvine ChemDB

ChemDB includes ChemicalSearch, Kpredictor, Smi2Depict, and Babel:

  • ChemicalSearch: find a chemical by basic criteria like molecular weight and predicted logP, or by the more abstract notion of structural similarity.
  • Kpredictor: submit small molecule structures online and get back selected physicochemical property values predicted using kernel methods.
  • Smi2Depict: takes a list of molecule SMILES strings and generates the corresponding 2D image depictions.
  • Babel: allows direct conversion between various molecule file formats. This is helpful for translating other formats such as SDF and Mol2 into the SMILES strings that most of these interfaces expect.

ChemExper Chemical Directory

This database contains chemicals with their physical characteristics. The ChemExper Chemical Directory contains over 200,000 different chemicals, 10,000 MSDS and over 10,000 IR spectra.

ChemFinder (CambridgeSoft; requires plugin for substructure searching)

ChemFinder has been providing free chemical searching to hundreds of thousands of scientists since 1995. This free database provides chemical structures, physical properties, and hyperlinks.

Chemical Abstracts Service

Chemical Abstracts Service maintains the world's largest databases of chemically-related information and a suite of tools to access the information contained in them (SciFinder, STN on the Web, STN Easy, and STN AnaVist). The databases produced by CAS include:

  • Chemical Abstracts Plus (CAplus) is a current, comprehensive chemistry bibliographic database available from Chemical Abstracts Service (CAS). CAplus covers international journals, patents, patent families, technical disclosures, technical reports, books, conference proceedings, dissertations, electronic-only journals, and web preprints from all areas of chemistry, biochemistry, chemical engineering, and related sciences from 1907 to the present, as well as over 180,000 records for patents and journal articles dated before 1907.
  • The CAS REGISTRY File is a database containing records for substances identified by the CAS Registry System. These include substances indexed in CAplus and CA files, and special registrations, for example, registrations for regulatory lists such as TSCA and EINECS. It contains a substantial amount of empirical and calculated physical and chemical properties of chemical substances.
  • CHEMCATS (Chemical Catalogs Online) is a catalog file containing information about commercially available chemicals and their worldwide suppliers.
  • CHEMLIST, the Regulated Chemicals Listing, identifies substances from more than 100 inventories and regulatory lists from around the world. Records contain substance identity information, inventory status, source of information, and summaries of regulatory activity, reports, and other compliance information. The database began with national chemical inventories such as TSCA, EINECS, and ENCS. Today it includes international lists such as HPVs, pollutant release inventories, priority chemicals, and dangerous chemicals with transportation restrictions.
  • TOXCENTER (Toxicology Center) is a bibliographic database that covers the pharmacological, biochemical, physiological, and toxicological effects of drugs and other chemicals. It is compiled from various sources, including Medline, Pesticides Abstracts, CAplus, and others.

Chemical Acronyms Database (maintained by the Indiana University Chemistry Library)

This chemical acronyms database contains 14,822 references and can be searched in two ways: by acronym and by keyword. You can also enter a new record by filling in an online form.

Chemical Kinetics Database on the Web (data on gas phase reactions from the US National Institute of Standards and Technology)

The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.

Chemical Structure Lookup Service (CSLS): a database locator service

CSLS allows users to look up compounds in over 80 databases with more than 39 million entries (27 million unique structures). Compounds can be searched by InChI, SMILES, formula and other identifiers. Files containing structures and IDs can be uploaded for searching in CSLS.

ChemIDplus (US National Library of Medicine)

ChemIDplus is a program of the Specialized Information Services division of the NLM. It allows searches by substance identification, toxicity, physical properties, molecular weight, locator codes and chemical structure.

ChemMine (Center for Plant Cell Biology, UC Riverside)

ChemMine is an integrated compound mining service and database. Its goal is to facilitate chemical genomics screens and to disseminate the generated knowledge. The database provides access to a wide variety of bioactive, natural and screening compounds from public and commercial providers. Their structures and functional annotations can be searched by chemical properties, substructure matches, structural similarities and biological activities. In addition to a comprehensive information retrieval system, ChemMine is also a cheminformatics service for analyzing the structural and chemical properties of lead compounds. This online service is available for compounds that are represented in the database and those provided by the user. The current set of online analysis tools includes structure-based clustering of compounds, generation of chemical descriptors, and various viewing and reformatting functionalities.

ChemSpider (Royal Society of Chemistry)

ChemSpider is a free chemical structure database providing fast access to over 25 million structures, properties and associated information. By integrating and linking compounds from more than 400 data sources, ChemSpider enables researchers to discover the most comprehensive view of freely available chemical data from a single online search.

ChemSynthesis Chemical Database

A freely accessible database of chemicals that contains substances with their synthesis references and physical properties such as melting point, boiling point and density. There are currently more than 40,000 compounds and more than 45,000 synthesis references in the database.

ChemxSeer (Penn State University)

An integrated digital library and database allowing for intelligent search of documents in the chemistry domain and data obtained from chemical kinetics. The data repository component contains experimental data obtained from various sources. Its tools can process, store and link data in multiple formats, e.g., Excel, XML, Gaussian, and CHARMM. A metadata add-on can help annotate the data and link multiple datasets.

Chirbase, a molecular database for chiral Chromatography

Chirbase contains 95,000 chiral separations and 30,000 molecular structures, with 4,000 new separations added every four months. It is a powerful information system for the separation of enantiomers by chromatography.

Collaborative Drug Discovery Database

The CDD enables scientists to collaborate more effectively on developing new drug candidates for commercial and humanitarian markets. The data sets in the system include FDA orphan and approved drugs and small-molecule drug discovery data, covering diseases such as malaria, tuberculosis, African sleeping sickness, Chagas disease, and leishmaniasis.

Crystallography Open Database, COD

There are approximately 43,000 entries in the COD (as of June 16, 2006). The COD offers two search options: a free combination of text (two words or parts of words), elements (one to eight, with or without formula numbers), cell volume (minimum and maximum), and strict number of elements; or ranges of cell parameters (amin-amax, bmin-bmax, etc.).
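A search by cell parameter ranges and cell volume amounts to a range filter over unit cells. The Python sketch below shows the idea on made-up entries; the dictionary layout and the function names are hypothetical and do not reflect the COD's own software. The volume is computed with the standard formula for a general (triclinic) cell.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a general unit cell (lengths in angstroms, angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def matches(cell, ranges, vmin=None, vmax=None):
    """True if every constrained parameter lies in its [min, max] range and
    the cell volume lies within the optional volume bounds."""
    for name, (low, high) in ranges.items():
        if not low <= cell[name] <= high:
            return False
    volume = cell_volume(cell["a"], cell["b"], cell["c"],
                         cell["alpha"], cell["beta"], cell["gamma"])
    if vmin is not None and volume < vmin:
        return False
    if vmax is not None and volume > vmax:
        return False
    return True


# Made-up cells for illustration only.
ENTRIES = [
    {"id": "entry-1", "a": 5.0, "b": 5.0, "c": 5.0,
     "alpha": 90.0, "beta": 90.0, "gamma": 90.0},
    {"id": "entry-2", "a": 4.0, "b": 6.0, "c": 10.0,
     "alpha": 90.0, "beta": 90.0, "gamma": 90.0},
    {"id": "entry-3", "a": 5.1, "b": 5.1, "c": 13.0,
     "alpha": 90.0, "beta": 90.0, "gamma": 120.0},
]

# Search: a and b between 4.9 and 5.2 angstroms, volume at most 200 cubic angstroms.
query = {"a": (4.9, 5.2), "b": (4.9, 5.2)}
hits = [e["id"] for e in ENTRIES if matches(e, query, vmax=200.0)]
print(hits)
```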

Cytochrome P450 Drug Interaction table (Indiana University - Purdue University - Indianapolis)

This table is designed as a hypothesis-testing, teaching and reference tool for physicians and researchers interested in drug interactions that result from competition for, or effects on, the human cytochrome P450 system. Clinicians and health care providers may find the abbreviated clinical table, designed for practical use during prescribing, more useful.



Data & Property Calculation Web Sites (University of Illinois, Chicago Thermodynamics Research Laboratory)

You can find links to many nano, quantum, statistical mechanics and thermodynamics data, and property calculation websites.

DIP (Database of Interacting Proteins)

The DIP™ database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. The data stored within the DIP database are curated both manually by expert curators and automatically, using computational approaches that draw on knowledge about protein-protein interaction networks extracted from the most reliable, core subset of the DIP data.

DisProt Database of Protein Disorder

The Database of Protein Disorder (DisProt) is a curated database that provides information about proteins that lack fixed 3D structure in their putatively native states, either in their entirety or in part. DisProt is a collaborative effort between the Center for Computational Biology and Bioinformatics at Indiana University School of Medicine and the Center for Information Science and Technology at Temple University.

Dr. Duke's Phytochemical and Ethnobotanical Databases

The database supports plant searches, chemical searches, activity searches, and ethnobotany searches. It can also be browsed.

DrugBank (University of Alberta)

It contains chemical, pharmaceutical, medical, and molecular biological information on more than 3,000 drug targets and 4,100 approved or experimental drug products.

DrugMatrix (Iconix Pharmaceuticals)

DrugMatrix is the first comprehensive research tool in the new field of toxicogenomics. It enables researchers to select the highest quality leads and drug candidates at the earliest, most cost-effective stages of drug discovery and development and eliminate likely failures.



ECOTOX (US Environmental Protection Agency)

The ECOTOXicology database (ECOTOX) is a source for locating single-chemical toxicity data for aquatic life, terrestrial plants and wildlife. ECOTOX integrates three previously independent databases (AQUIRE, PHYTOTOX, and TERRETOX, covering aquatic life, terrestrial plants, and terrestrial wildlife, respectively) into a single system whose toxicity data are derived predominantly from the peer-reviewed literature.

eMolecules (formerly Chmoogle)

A good source for finding chemical supplier information by drawing structures or entering SMILES. Supplier contact information is provided in each record. Substructure and exact structure searches are available in the advanced search. The expert search requires logging in first. eMolecules does not cover reactions or chemical properties.

The National Cancer Institute (NCI) database consists of approximately 250,000 structures. The browser has the following capabilities:

  • Substructure search capabilities (e.g., search for H-bond acceptor/donor)
  • Expanded search criteria (log P [calc.], complexity, data availability)
  • PASS activity spectrum predictions (>500 mechanisms of action)
  • Output features like MDL Chime, 3D Java Viewer, VRML Scene, spreadsheets
  • 3-D pharmacophore search capabilities
  • Links to additional services for further processing of search results

Entrez, the Life Science Search Engine (US National Center for Biotechnology Information)

A comprehensive gateway to search life science databases.

Envirofacts Master Chemical Integrator (EMCI) (US EPA)

You can use this form to search the full text of information offered from the EMCI Chemical References Web Pages.

Environmental Fate Data Base (EFDB) (Syracuse Research Corporation)

EFDB contains files such as Datalog, Biolog, Chemfate, Biodeg, and Biodeg Summary.

EUROSPEC, International Spectroscopic Database

International Spectroscopic Data Bank (ISDB) is an electronic database and archive for spectroscopic and substance data for access by and for the scientific community.

EXTOXNET, the Extension Toxicology Network

Covers topics such as:

  • Adverse Health Effects and Risk
  • Diet and Cancer
  • Food Safety
  • Household Hazardous Waste
  • Indoor Air
  • Pesticides
  • Safe Drinking Water
  • Soil (gardening and chemicals)



Fisher Scientific

It can be used as an electronic chemical catalog. For example, you can search ACROS by text, template, and mixed search.

Flexweb; Analysis of Flexibility in Biomolecules and Networks

Flexweb is a portal to research in the flexibility of networks and associated software. It contains links to the various software applications hosted by Flexweb. Most of the software is available for download by academic groups as both executable-only and full source code distributions.



GADU VO at Open Science Grid

GADU (Genome Analysis and Database Update) VO (Virtual Organization) provides the following services:

  • Reliable, timely updates of the integrated public database, from a growing number of analysis tools and an increasing number and scale of source sequences.
  • The execution of user-specified models.
  • Web-based portal access to all services.
  • Customized sequence analysis workflows.

GC-EIMS of Partially Methylated Alditol Acetates

Electron-impact Mass Spectra of partially methylated alditol acetates.

Generated Database of Chemical Universe of Small Molecules

Contains every possible organic molecule of C, N, O, or F up to 11 atoms, taking into account chemical stability and synthetic feasibility; 26.4 million compounds.

Glossary of Common Terms and Abbreviations in Quantum Chemistry (Karl K. Irikura)

The literature of quantum chemistry, covering both molecular orbital theory and density functional theory, is cluttered with abbreviations, acronyms, and jargon. Some of the more common terminology is explained in this glossary. Literature cited is included.

Glycan Database

The database is one of the Consortium for Functional Glycomics specialty databases for glycan-binding proteins, glycan structures, and glycosyltransferases. Collectively, they provide an independent way of accessing CFG data, as well as integrating a large amount of publicly-available information. They are key to developing a bioinformatics resource that integrates the diverse types of data being generated and an important step towards an integrated systems biology approach to glycobiology.


The databases provided by this site include bibliographic search, Mass Spectroscopy search, NMR, structure and PDB search.



HIC-Up, Hetero-compound Information Centre, Uppsala

This site contains information about hetero-compounds encountered in files from the Protein Data Bank (PDB). It is updated a few times a year.

HIV/OI/TB Therapeutics Database (US NIAID Division of AIDS)

This database can be searched by compound name and author name.

Human Metabolome Database

Since 2004 the Human Metabolome Project has assembled an inventory of about 2,500 molecules (early 2007) produced by metabolic reactions in body tissues and fluids. The database is designed to contain or link three kinds of data: 1) chemical data, 2) clinical data, and 3) molecular biology/biochemistry data. It includes both water-soluble and lipid-soluble metabolites as well as metabolites that would be regarded as either abundant (> 1 µM) or relatively rare (< 1 nM). Additionally, approximately 5,500 protein (and DNA) sequences are linked to the metabolite entries.

Human Protein Reference Database

The Human Protein Reference Database represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data. HPRD has been created using an object oriented database in Zope, an open source web application server, which provides versatility in query functions and allows data to be displayed dynamically.



I-ChemiSt, Intelligent Chemical Structural Database (Mass spectroscopy and NMR spectra databases)

The Mass Spectroscopy Database contains Gas Chromatography-Electron Impact Mass Spectra (GC-EIMS) of Partially Methylated Alditol Acetates (PMAAs) of complex carbohydrate molecules. Currently, the database consists of over 600 GC-EIMS spectra. The NMR spectra database is still under construction.

iHOP, information Hyperlinked Over Proteins

The iHOP service provides summary information on more than 1,500 organisms and 80,000 genes by automatically extracting key sentences from PubMed documents.

Indiana University Molecular Structure Center

The Indiana University Molecular Structure Center is a partner site of the Reciprocal Net Site Network, a distributed database for crystallographic information run by participating crystallography labs across the world. Select samples are available to the general public without authentication. After logging in, authorized users may jump to another site's database or visit the master server.

Infotherm Thermophysical Properties Database (FIZ Chemie Berlin)

  • Type of data: experimental thermodynamic and physical properties of 27,000 mixtures and 7,200 pure substances from a total of 9,900 compounds
  • Database growth: about 1,000 tables per month
  • Content: 183,000 tables and diagrams of PVT state values, phase equilibrium data, transport and surface properties, caloric, optical and acoustic properties
  • Material types: 75% organic, 20% inorganic and 5% organometallic compounds with the focus on solvents
  • Sources: 13,100 publications from journals and reports as well as measurement protocols and data collections in printed and electronic format, from 1919 to present

IntAct (Protein Interaction Database)

IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.

IUPAC-NIST Solubility Database (Data compiled and evaluated by the International Union of Pure and Applied Chemistry)

There are over 67,500 solubility measurements, compiled from 18 volumes of the IUPAC Solubility Data Series. There are about 1800 chemical substances in the database and 5200 systems, of which 473 have been critically evaluated. The database has over 1800 references. Typical solvents and solutes include water, sea water, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds.



Joint Expert Speciation System, JESS (Tool for thermodynamic and kinetic modeling of chemical speciation in complex aqueous environments)

JESS is a powerful research tool for modelling chemical speciation in complex aqueous environments. It is designed to solve problems requiring expert knowledge of solution chemistry. It currently comprises over 250 programs, 2000 subroutines and 234,000 lines of Fortran code.




KEGG API provides programmatic access to the KEGG system, for example to search and compute biochemical pathways in cellular processes or to analyze the genes in completely sequenced genomes. Users access the KEGG API server using SOAP over HTTP. The SOAP server also publishes a WSDL description, which makes it easy to build a client library for a specific programming language. Users can therefore write their own programs to automate access to the KEGG API server and retrieval of the results.
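
As a rough illustration of what "SOAP over HTTP" means in practice, the sketch below builds a SOAP 1.1 request envelope using only the Python standard library. The operation name `list_pathways`, the parameter `org`, and the namespace `urn:example:kegg` are placeholders chosen for this example; they are not taken from the KEGG WSDL, and no request is actually sent.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation, params, service_ns):
    """Return a SOAP 1.1 envelope (as a string) calling `operation`.

    `operation`, `params` and `service_ns` are supplied by the caller;
    a real client would read them from the service's WSDL.
    """
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and namespace, for illustration only.
request_xml = build_soap_request("list_pathways", {"org": "hsa"}, "urn:example:kegg")
print(request_xml)
```

A client generated from the WSDL would produce an envelope like this automatically and POST it to the server, which is why the WSDL makes client libraries easy to build.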

KEGG Ligand Database

Database of chemical substances and reactions that are relevant to life. It is a composite database currently consisting of COMPOUND, DRUG, GLYCAN, REACTION, RPAIR, and ENZYME databases. ENZYME is derived from the Enzyme Nomenclature, but the others are internally developed and maintained. KEGG Ligand includes tools to perform similarity searches and to predict reactions and reaction pathways.

KEGG pathway

KEGG PATHWAY is a collection of manually drawn pathway maps representing our knowledge on the molecular interaction and reaction networks for:

  • Metabolism
  • Genetic Information Processing
  • Environmental Information Processing
  • Cellular Processes
  • Human Diseases
  • Drug Development

Kinase Knowledgebase (KKB) (Eidogen-Sertanty)

The Kinase Knowledgebase (KKB) is Eidogen-Sertanty's database of kinase structure-activity and chemical synthesis data. Currently the Kinase Knowledgebase covers the following data (through August 2006): Journal articles and patents: >2,600; Number of Biological Activity Data Points: >222,000.

KiNET Proteomics Database

KiNET is the first Internet-accessible subscription proteomics database of its kind. It has built-in bioinformatics searching capabilities for cell signaling research. KiNET features over 200,000 measurements of the expression and phosphorylation states of hundreds of signal transduction proteins from over 6000 Western blots performed with control and treated tissue/cell samples.



Life Science Gateway

The Life Science Gateway (LSGW) provides application services for bio-informaticians. The LSGW provides the ability for end-users to apply the large scale resources of the TeraGrid to their problems, but it also allows application service providers to leverage local resources, creating a hierarchy of service providers that can begin to address the computational and data needs of the bio-informatics community as a whole.

LIPIDAT (thermodynamic and associated information on lipid mesophase and crystal polymorphic transitions)

LIPIDAT is a relational database of thermodynamic and associated information on lipid mesophase and crystal polymorphic transitions. There are 19,959 records in LIPIDAT. The database includes lipid molecular structures. The bulk of the entries in the current version of LIPIDAT are for the glycerolipids and the sphingolipids.


This database was developed through a joint research program between the International Medical Center of Japan and the Japan Science and Technology Agency. The LIPID database can be searched by keyword, classification, branched chain, oil or fat name, number or position.

LIPID MAPS (Lipid Metabolites And Pathways Strategy)

Lipid Metabolites and Pathways Strategy, termed LIPID MAPS, has been developed to apply a global, integrated approach to the study of lipidomics.



MatWeb, a searchable database of material data sheets

The heart of MatWeb is a searchable database of material data sheets, including property information on thermoplastic and thermoset polymers such as ABS, nylon, polycarbonate, polyester, polyethylene and polypropylene; metals such as aluminum, cobalt, copper, lead, magnesium, nickel, steel, superalloys, titanium and zinc alloys; ceramics; plus semiconductors, fibers, and other engineering materials. It can be searched by text search, categorized search (material type, polymer manufacturer, polymer trade name, metal UNS number) and quantitative search.

MedChem/Biobyte QSAR Database (Pomona College)

Medical Subject Headings, MeSH (US National Library of Medicine)

MeSH is the National Library of Medicine's controlled vocabulary thesaurus. It consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity. There are 22,997 descriptors in MeSH. In addition to these headings, there are more than 151,000 headings called Supplementary Concept Records (formerly Supplementary Chemical Records) within a separate thesaurus. There are also thousands of cross-references that assist in finding the most appropriate MeSH Heading, for example, Vitamin C see Ascorbic Acid. These additional entries include 24,050 printed see references and 112,012 other entry points. The MeSH thesaurus is used by NLM for indexing articles from 4,800 of the world's leading biomedical journals for the MEDLINE/PubMed® database.

MEDPHYT (The Medicinal Plants Database) Registration is required to use this database.

MetaCyc Encyclopedia of Metabolic Pathways

MetaCyc is a database of nonredundant, experimentally elucidated metabolic pathways. MetaCyc contains 700 pathways from more than 600 different organisms. MetaCyc is curated from the scientific experimental literature. It stores pathways involved in both primary metabolism (including photosynthesis) and secondary metabolism, as well as the associated compounds, enzymes, and genes.

Metalloprotein Database and Browser

The database contains quantitative information on all the metal-containing sites available from structures in the PDB distribution. It contains geometrical and molecular information that allows the classification and search of particular combinations of site characteristics, and can answer questions such as: "How many mononuclear zinc-containing sites are five coordinate with X-ray resolution better than 1.8 Angstroms?" One can then visualize and manipulate the matching sites. The database also includes enough information to answer questions involving type and number of ligands (e.g., "at least 2 His") and includes distance cutoff criteria (e.g., "a metal-ligand distance no more than 3.0 Angstroms and no less than 2.2 Angstroms").
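
The kind of question quoted above amounts to filtering site records on several attributes at once. The following sketch shows one way such a query could be expressed; the `MetalSite` record layout and the sample entries are invented for illustration and do not reflect the database's actual schema or contents.

```python
from dataclasses import dataclass


@dataclass
class MetalSite:
    # Hypothetical record layout, not the database's real schema.
    entry: str                # made-up entry label
    metal: str                # element symbol, e.g. "ZN"
    nuclearity: int           # number of metal ions in the site
    coordination_number: int  # number of coordinating atoms
    resolution: float         # X-ray resolution in Angstroms
    ligands: tuple            # residue names of the ligands


def find_sites(sites, metal, nuclearity, coordination_number,
               max_resolution, min_his=0):
    """Return sites matching all criteria; smaller resolution is better."""
    return [
        s for s in sites
        if s.metal == metal
        and s.nuclearity == nuclearity
        and s.coordination_number == coordination_number
        and s.resolution < max_resolution
        and s.ligands.count("HIS") >= min_his
    ]


# Invented sample data.
sites = [
    MetalSite("demo1", "ZN", 1, 5, 1.5, ("HIS", "HIS", "CYS", "HOH", "HOH")),
    MetalSite("demo2", "ZN", 1, 4, 1.2, ("CYS", "CYS", "CYS", "CYS")),
    MetalSite("demo3", "ZN", 1, 5, 2.1, ("HIS", "HIS", "HIS", "ASP", "HOH")),
    MetalSite("demo4", "ZN", 2, 5, 1.6, ("HIS", "HIS", "ASP", "GLU", "HOH")),
    MetalSite("demo5", "ZN", 1, 5, 1.7, ("HIS", "ASP", "GLU", "HOH", "HOH")),
]

# "Mononuclear zinc sites, five coordinate, resolution better than 1.8 Angstroms"
matches = find_sites(sites, "ZN", 1, 5, 1.8)
print(len(matches))

# The same query restricted to sites with at least 2 His ligands.
print(len(find_sites(sites, "ZN", 1, 5, 1.8, min_his=2)))
```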

Metlin Metabolite Database

METLIN is a web-based data management system designed to assist in a broad array of metabolite research and metabolite identification by providing public access to its repository of current and comprehensive mass spectral metabolite data.

MINT Molecular Interactions Database

MINT focuses on experimentally verified protein interactions mined from the scientific literature by expert curators. The curated data can be analyzed in the context of the high throughput data and viewed graphically with the 'MINT Viewer'.


The myGrid project has developed a comprehensive, loosely coupled suite of middleware components specifically to support data-intensive in silico experiments in biology. Workflows and query specifications link together third-party and local resources using web service protocols. The software can be freely downloaded and has been used by collaborating life scientists to build discovery workflows for investigations into Williams-Beuren syndrome and Graves' disease. (cited from myGRID homepage)



National Toxicology Program (US Department of Health and Human Services)

The NTP is an interagency program whose mission is to evaluate agents of public health concern by developing and applying tools of modern toxicology and molecular biology.

National Drug Code Directory (US) (US Food and Drug Administration Center for Drug Evaluation and Research)

A current list of all drugs manufactured, prepared, propagated, compounded, or processed in the US for commercial distribution. Drug products are identified and reported using a unique, three-segment number, called the National Drug Code (NDC), which is a universal product identifier for human drugs. FDA inputs the full NDC number and the information submitted as part of the listing process into a database known as the Drug Registration and Listing System (DRLS). Several times a year, FDA extracts some of the information from the DRLS data base (currently, properly listed marketed prescription drug products and insulin) and publishes that information in the NDC Directory.
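
To make the three-segment structure concrete, here is a minimal sketch that splits a hyphenated NDC into its labeler, product, and package segments. The sample code `12345-678-90` is made up for illustration, and the function only checks that there are three numeric segments; it does not validate segment lengths or look anything up in the NDC Directory.

```python
def parse_ndc(ndc):
    """Split a hyphenated NDC string into its three segments.

    Raises ValueError if the string does not consist of exactly
    three hyphen-separated groups of digits.
    """
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a three-segment NDC: {ndc!r}")
    labeler, product, package = parts
    return {"labeler": labeler, "product": product, "package": package}


# Made-up example code, not a real listed product.
print(parse_ndc("12345-678-90"))
```

The segments are kept as strings rather than integers so that leading zeros are preserved.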

NAViGaTOR (University of Toronto)

A software package for visualizing and analyzing protein-protein interaction networks. NAViGaTOR can query OPHID / I2D - online databases of interaction data - and display networks in 2D or 3D. To improve scalability and performance, NAViGaTOR combines Java with OpenGL to provide a 2D/3D visualization system on multiple hardware platforms. NAViGaTOR also provides analytical capabilities and supports standard import and export formats such as GO and the Proteomics Standards Initiative (PSI).

NDRL/NIST Solution Kinetics Database on the Web (US National Institute of Standards and Technology)

NetPhosK 1.0 Server

Produces neural network predictions of kinase-specific eukaryotic protein phosphorylation sites. Currently NetPhosK covers the following kinases: PKA, PKC, PKG, CKII, Cdc2, CaM-II, ATM, DNA PK, Cdk5, p38 MAPK, GSK3, CKI, PKB, RSK, INSR, EGFR and Src.

NIH DTP Cancer Cell Line Activity Predictions (Indiana University)

The NIH Developmental Therapeutics Program provides a set of 60 cancer cell lines, against which approximately 42,000 compounds have been screened. The IU database returns activity predictions for all 60 cell lines for a user-specified set of compounds. Note that the data available from the NIH is real-valued and is available for three concentration parameters. The models were developed using log GI50, with a cutoff of 5.0. Predictions are obtained using a set of random forest models, one for each cell line, using 166-bit MACCS keys as the features.

NIOSH Databases and Information Resources (US National Institute for Occupational Safety and Health)

The databases are categorized by Chemical; Injury, Illness & Hazards Data and Information; Publications; Respirators and other Personal Protective Equipment; Agriculture; and Construction. The most popular databases include the International Chemical Safety Cards, NIOSH Pocket Guide to Chemical Hazards, and NIOSHTIC-2.

NIST Chemistry WebBook (US National Institute of Standards and Technology)

This site provides thermochemical, thermophysical, and ion energetics data compiled by NIST under the Standard Reference Data Program.

NIST Data Gateway

The Gateway includes links to selected free online NIST databases as well as to information on NIST databases available for purchase. You can search by specific keywords, properties and substances across a collection of NIST scientific and technical databases. You will get a list of databases most likely to contain the information you need.

NIST Physics Laboratory Physical Reference Data (US National Institute of Standards and Technology)

You can get access to database holdings by elements. You can also use links to access all information on physical constants, atomic spectroscopy data, molecular spectroscopic data, X-ray and Gamma-ray data, nuclear physical data and so on.

NIST Scientific and Technical Databases (US National Institute of Standards and Technology)

This online resource covers a variety of databases in chemistry, materials, mathematics, physics, optics, thermophysics, and thermochemistry.


NMRShiftDB is a web database for organic structures and their nuclear magnetic resonance (NMR) spectra. It allows for spectrum prediction (currently only for carbon) as well as for searching spectra, structures and other properties. Last but not least, it features peer-reviewed submission of datasets by its users.
Collections (2005-08-30): 15,944 structures and 17,925 spectra; 1,423 proton spectra; about 1,500 structures with spectra of different types.

Nucleic Acids Database (Rutgers University: a repository of three-dimensional structural information about nucleic acids)

It is a repository of three-dimensional structural information about nucleic acids. There are three types of searches you can do:

  • NDB Search: search for nucleic acid containing structures determined by either X-ray crystallography or NMR.
  • NDB Integrated Search: an alternate NDB search application which provides more flexible searching and report generation.
  • NDB Status Search: provides a report on the processing status of crystal structures.



Organic Syntheses

Organic Syntheses procedures may be accessed either via the tables of contents of individual volumes ("journal mode") or by conducting structure and keyword searches ("database mode"). It’s a good source of synthesis “recipes” for organic chemists.

OWL, Composite Protein Sequence Database (Covers SWISS-PROT, PIR, GenBank, and NRL-3D)

OWL is a non-redundant composite of four publicly available primary sources: SWISS-PROT, PIR (1-3), GenBank (translation) and NRL-3D. SWISS-PROT is the highest-priority source, all others being compared against it to eliminate identical and trivially different sequences. OWL can be accessed by accession number, database code, text, sequence, title, author, query language, and regular expression.
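
The priority-based merging described above can be sketched as follows. This is only an illustration of the general idea: it removes exact duplicate sequences and keeps the copy from the highest-priority source. How OWL decides that two sequences are "trivially different" is not described here, so that step is left out, and the entry identifiers and sequences are toy values.

```python
def build_composite(sources):
    """Merge sequence sets into a non-redundant composite.

    `sources` is a list of (source_name, {entry_id: sequence}) pairs,
    ordered from highest to lowest priority. A sequence already seen
    in a higher-priority source is skipped.
    """
    seen = set()
    composite = []
    for source_name, entries in sources:
        for entry_id, sequence in entries.items():
            key = sequence.upper()
            if key in seen:
                continue  # identical sequence already taken from a higher-priority source
            seen.add(key)
            composite.append((source_name, entry_id, sequence))
    return composite


# Toy data: identifiers and sequences are invented.
sources = [
    ("SWISS-PROT", {"sp1": "MKTAYIAK", "sp2": "MSLLTEVE"}),
    ("PIR", {"pir1": "MKTAYIAK", "pir2": "MGSSHHHH"}),
    ("GenBank", {"gb1": "msllteve", "gb2": "MADEEKLP"}),
]

for record in build_composite(sources):
    print(record)
```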



Patent Analysis Free Databases

Provides free access to the US patent databases, EU patent databases and the WO/PCT patent database and soon free INPADOC and Japanese patent searches. To access the free patent databases it is necessary to create an account and login. Premium search services such as Idea Analysis and SureChem require a paid subscription. SureChem, produced by Reel Two, contains more than 5.4 million unique structures mined from the full text of USPTO, EPO and WO patents. SureChem users can search by substructure or similarity. Data export lets customers analyze results on their desktop or incorporate them into any informatics workflow. The SureChem portal also gives users access to a range of advanced patent searching tools for both chemical and generic searches. A free 7-day trial is extended to all new users.

Pathway Interaction Database

The Pathway Interaction Database is a highly structured, curated collection of information about known biomolecular interactions and key cellular processes assembled into signaling pathways. It is a collaborative project between the US National Cancer Institute (NCI) and Nature Publishing Group (NPG).

PDB, Protein Data Bank

The Protein Data Bank (PDB) is the single worldwide depository of information about the three-dimensional structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life that are found in all organisms including bacteria, yeast, plants, flies, and mice, and in healthy as well as diseased humans. Understanding the shape of a molecule helps to understand how it works. A variety of information associated with each structure is available through the RCSB PDB including sequence details, atomic coordinates, crystallization conditions, 3-D structure neighbors computed using various methods, derived geometric data, structure factors, 3-D images and a variety of links to other resources. As of October 17, there are 39464 structures stored in PDB.

PDSP Ki Database

The database has 44,913 searchable Ki values and is growing.

Pharmabase: a database of cellular physiology & pharmacology

Database on the use of pharmacological compounds in cellular research.

The Physical Properties Database (PHYSPROP)

PHYSPROP contains chemical structures, names and physical properties for over 25,250 chemicals. Physical properties are collected from a wide variety of sources, and include experimental, extrapolated, and estimated values for melting point, boiling point, water solubility, octanol-water partition coefficient, vapor pressure, pKa, Henry's law constant, and OH rate constant in the atmosphere.

Proteins Database at National Chemical Laboratory (Pune, India)

The MOLTABLE portal was established for small molecules in a chemoinformatics context and is now being extended to biomolecules, especially protein data. The MOLTABLE portal is intended to showcase the 'ligand space' in protein chemistry.

Proton NMR Spectra of Xyloglucans

Consists of a searchable table of the proton NMR chemical shifts of xyloglucan oligoglycosyl alditols. Oligosaccharide subunits of xyloglucans were generated by endoglucanase treatment of the polymer and reduced with sodium borohydride. Borohydride treatment simplifies the purification and spectroscopic analysis of each xyloglucan oligosaccharide by transforming the interconverting anomeric forms of the reducing residues into a single, stable moiety (an alditol). The proton NMR chemical shifts of the resulting oligoglycosyl alditols are similar, but not identical, to those of the parent reducing oligosaccharides.


PubChem provides information on the biological activities of small molecules. It is a component of NIH's Molecular Libraries Roadmap Initiative. PubChem includes substance information, compound structures, and bioactivity data in three primary databases, PCSubstance, PCCompound, and PCBioAssay, respectively. PCSubstance contains more than 15 million records; PCCompound contains more than 10 million unique structures; PCBioAssay contains more than 300 bioassays. Each bioassay contains various data points.

PubChemSR (Search and Retrieve) Tool

PubChemSR (Search and Retrieve) is an MS Windows-based data search and retrieval tool for NCBI's chemical database PubChem. PubChemSR is written in MS Visual Basic .NET and implemented using Entrez E-utilities via SOAP (Simple Object Access Protocol). It has been tested only on Windows XP but should work well on any other Windows platform. PubChemSR has some special features such as text (keyword) search against any of the three PubChem databases (PubChem Compound, PubChem Substance, and PubChem BioAssay) and spelling correction for misspelled search terms.
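
For readers who want to see what a keyword search against the three PubChem databases looks like at the Entrez level, the sketch below builds an E-utilities `esearch` URL. Note that it uses the plain URL interface of E-utilities rather than the SOAP interface mentioned above, and it is not a reconstruction of how PubChemSR itself works. The helper name `esearch_url` and the short keys `compound`, `substance` and `bioassay` are choices made for this example; no request is sent.

```python
from urllib.parse import urlencode

# Base URL of the Entrez E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Entrez database names for the three PubChem databases.
PUBCHEM_DBS = {
    "compound": "pccompound",
    "substance": "pcsubstance",
    "bioassay": "pcassay",
}


def esearch_url(database, term, retmax=20):
    """Build an esearch URL for a keyword search in one PubChem database."""
    params = {"db": PUBCHEM_DBS[database], "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


print(esearch_url("compound", "aspirin"))
```

Fetching the resulting URL returns an XML list of matching record identifiers, which can then be passed to other E-utilities calls to retrieve the records.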

PubDock; Docking PubChem for Discovery (Indiana University)

PubChem Dock stores the results of large-scale docking calculations and includes the PDB structure of the targets, 3D structures of docked ligands, and the docking scores. Currently four scoring functions provided by OpenEye's FRED are evaluated, namely:

  • chemgauss3
  • shapegauss
  • oechemscore
  • plp

For each scoring function the total score as well as the component scores are saved. The database currently has docking results against six proteins (1YC4, 1R1P, 1YC3, 1YC1, 1XP6, 1QKT). It is planned to populate PubDock with docking results for families of proteins. One possible use is to screen ligands over families using a similarity approach.

PubMed Central (PMC)

PMC is the U.S. National Institutes of Health (NIH) free digital archive of biomedical and life sciences journal literature. The value of PubMed Central, in addition to its role as an archive, lies in what can be done when data from diverse sources is stored in a common format in a single repository.



Query Chem

You can combine text and chemical structure to do web searches. Query Chem is limited to 1000 "unkeyed" daily Google searches as per the rules of Google's Web API license and is powered by ChemBank, PubChem, and eMolecules.

QCLDB II Quantum Chemistry Literature DataBase

A database of papers published after 1978 that treat ab initio calculations of atomic and molecular electronic structure. The papers are collected from about thirty core journals, surveyed, and tagged by a group of young Japanese quantum chemists to indicate the content and essence of each paper.



Reactome, a curated knowledgebase of biological pathways

Reactome is a collaboration among Cold Spring Harbor Laboratory, The European Bioinformatics Institute, and The Gene Ontology Consortium to develop a curated resource of core pathways and reactions in human biology. The information in this database is authored by biological researchers with expertise in their fields, maintained by the Reactome editorial staff, and cross-referenced with the sequence databases at NCBI, Ensembl and UniProt, the UCSC Genome Browser, HapMap, KEGG (Gene and Compound), ChEBI, PubMed and GO. In addition to curated human events, inferred orthologous events in 22 non-human species including mouse, rat, chicken, zebrafish, worm, fly, yeast, two plants and E. coli are also available.



SCOP, Structural Classification of Proteins

The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the Protein Data Bank (PDB). It is available as a set of tightly linked hypertext documents which make the large database comprehensible and accessible. In addition, the hypertext pages offer a panoply of representations of proteins, including links to PDB entries, sequences, references, images and interactive display systems.

Sigma-Aldrich Chemistry Database

Sigma-Aldrich is one of the leading suppliers of chemicals in the world. The site features substructure searching and reaction searching.


SitesBase is a database of known ligand binding sites within the PDB which is navigable by PDB identifier or ligand three-letter code (e.g. NAD). Each binding site has a frequently updated register of structurally similar binding sites sharing atomic similarity detected by geometric hashing (Brakoulias and Jackson 2004). Multiple alignments, structural superpositions and links to other structural databases are also available, enabling further analysis.

SOLV-DB (Solvents data from the National Center for Manufacturing Sciences)

SOLV-DB is presented by the National Center for Manufacturing Sciences as the one-stop source for solvents data. SOLV-DB brings together a wealth of information on commercially available solvents. Use SOLV-DB to find:

  • Health and Safety considerations involved in choosing and using solvents.
  • Chemical and Physical data affecting the suitability of a particular solvent for a wide range of potential applications.
  • Regulatory responsibilities, including exposure and effluent limits, hazard classification status with respect to several key statutes, and selected reporting requirements.
  • Environmental Fate data, to indicate whether a solvent is likely to break down or persist in air or water, and what types of waste treatment techniques may apply to it.

SPECARB, Raman spectra of carbohydrates

SPECARB is an experimental database containing Raman spectra of carbohydrates. It is intended to contain solid state Raman spectra of carbohydrates from the monomers (Aldoses, Ketoses, and Glucolipids) and their derivatives to complex polysaccharides.

Spectral Database for Organic Compounds, SDBS (National Institute of Advanced Industrial Science and Technology (AIST), Japan)

SDBS is an integrated spectral database system for organic compounds, which includes six different types of spectra under a directory of the compounds. The six spectra are as follows: an electron impact mass spectrum (EI-MS), a Fourier transform infrared spectrum (FT-IR), a 1H nuclear magnetic resonance (NMR) spectrum, a 13C NMR spectrum, a laser Raman spectrum, and an electron spin resonance (ESR) spectrum. The numbers of data present at the end of September 2004 were as follows:

  • Compounds: ca 32,200 compounds updated
  • MS: ca 22,900 spectra updated
  • 1H NMR: ca 14,000 spectra updated
  • 13C NMR: ca 12,300 spectra updated
  • FT-IR: ca 49,800 spectra updated
  • Raman: ca 3,500 spectra
  • ESR: ca 2,500 spectra.

Substructure Search of SRC's Pointer File (Syracuse Research Corporation)

The file contains 18,620 chemicals accessible by substructure and exact searching powered by ChemS3, a chemistry search engine developed for both internet and intranet applications.

SUGABASE, carbohydrate NMR database

This site is an experimental WWW interface to the database SUGABASE, which is a carbohydrate-NMR database that combines CarbBank Complex Carbohydrate Structure Data (CCSD) with proton and carbon chemical shift values. It can be used to perform simple searches for carbohydrate structures and/or NMR data. The structures are rendered in CCSD format. The NMR data are displayed as chemical-shift tables.

Synthesis Protocols (Boston University Center for Chemical Methodology and Library Development)

A database for solid phase, solution phase and library synthesis procedures.



TeraGrid Bioportal

The system enables database searching, alignment and phylogeny, pattern searching, DNA/RNA analysis, and protein analysis. It collects and unifies data and applications by providing a unified look and functionality.

Therapeutic Target Database

A database that provides information about the known and explored therapeutic protein and nucleic acid targets, the targeted diseases, pathway information and the corresponding drugs/ligands directed at each of these targets. It also includes links to relevant databases that contain information about the function, sequence, 3D structure, ligand binding properties, enzyme nomenclature and related literature of each target. This database currently contains 1535 targets and 2107 drugs/ligands.

TOXNET, Toxicology Data Network (US National Library of Medicine)

TOXNET covers databases on toxicology, hazardous chemicals, environmental health, and toxic releases. The databases include HSDB, ChemIDPlus, DART, and so on.



UK PubMed Central

Based on PubMed Central (PMC), the U.S. National Institutes of Health (NIH) free digital archive of biomedical and life sciences journal literature, UK PubMed Central (UKPMC) provides a stable, permanent, and free-to-access online digital archive of full-text, peer-reviewed research publications.


UniProt is a comprehensive catalog of information on proteins. It is a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR. The UniProt Knowledgebase (UniProtKB) is the central access point for extensive curated protein information, including function, classification, and cross-reference. The UniProt Reference Clusters (UniRef) databases combine closely related sequences into a single record to speed searches. The UniProt Archive (UniParc) is a comprehensive repository, reflecting the history of all protein sequences.

US National Cancer Institute Developmental Therapeutics Program

The DTP databases include:

  • Investigational Drugs - Chemical Information: chemical and physical data on compounds that are past or present candidates in the DTP program. Chemical structures, names, biological data (activity in human tumor cell line screens), toxicity data, solubility, stability, UV, and HPLC data are included.
  • COMPARE Database and Analysis tools for cancer research: A probe or "seed" compound can be specified by using the compound's NCI accession number (the NSC number). The COMPARE algorithm then proceeds to rank an entire database in the order of the similarity of the responses of the 60 cell lines to the compounds in the database to the responses of the cell lines to the seed compound.
  • NCI Screening Data 3D Miner: The database in its first release contained 216,089 names (of 45,229 compounds) from the original DTP tables, 44,804 AIDS antiviral screening results, 41,000 anti-tumor, and 122,631 CAS numbers from the original DTP sources.
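
The ranking step described for COMPARE in the list above can be sketched as follows. This is a simplified illustration, not the DTP implementation: it assumes that similarity between response profiles is measured with the Pearson correlation coefficient, and it uses invented profiles over four cell lines instead of 60. The compound labels are placeholders, not NSC numbers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length response profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to compare
    return cov / (sd_x * sd_y)


def compare_rank(seed_profile, database):
    """Rank database compounds by similarity of their profile to the seed."""
    scores = [(name, pearson(seed_profile, profile))
              for name, profile in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Invented response values, one per cell line.
seed = [5.0, 6.0, 7.0, 8.0]
database = {
    "cmpd_A": [5.1, 6.1, 7.1, 8.1],  # same pattern as the seed
    "cmpd_B": [8.0, 7.0, 6.0, 5.0],  # opposite pattern
    "cmpd_C": [5.0, 7.0, 6.0, 8.0],  # partly similar pattern
}

for name, score in compare_rank(seed, database):
    print(name, round(score, 3))
```

Because the correlation looks at the pattern of responses across cell lines rather than their absolute size, a compound that is uniformly more or less potent than the seed can still rank at the top.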






In WebReactions you simply draw a complete reaction, i.e. the reaction as you might draw it in a lab journal. WebReactions will detect any reaction centers for you automatically and then retrieve about a dozen of the most similar reactions from the database. WebReactions does not run a reaction substructure search as conventional reaction database systems do. An introduction and tutorial are included on the website.







ZINC (commercially-available compounds for virtual screening)

ZINC contains over 4.6 million compounds in ready-to-dock, 3D formats. ZINC is provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California, San Francisco (UCSF).