Helpful Links, Resources, and Software
These tools and databases are examples of the resources available to drug discovery teams; most are publicly available, though a few are commercial. Private databases often incorporate the contents of public ones as well.
Protein, Proteomics, and Drug Discovery Resources
- NCBI: The National Center for Biotechnology Information has databases on protein structures and sequences, nucleotide sequences, dozens of journals, and more.
- Protein Data Bank (PDB): The PDB is a repository of biological macromolecular structure data.
- ExPASy Proteomics Server: Contains many useful tools for protein sequence and structure analysis.
- Drug Design by Landes Bioscience: A free-access book on NCBI with good information about drug discovery.
- Drug Discovery News: Contains many free articles on current news and trends in drug discovery.
- Biotechnology Information Directory: A list of links in genomics, proteomics, and drug discovery.
Virtual Drug and Target Libraries
- Human Metabolome Database: The HMDB is a free database with slightly under 2,500 metabolites linked to over 5,500 corresponding protein and DNA sequences.
- National Cancer Institute (NCI) Database (search for the Enhanced NCI Database Browser): This publicly available database contains over 250,000 compounds, along with detailed search features.
- ZINC: A free library from UCSF containing 4.6 million searchable compounds in multiple formats, with vendor links for ordering.
- Available Chemicals Directory (ACD): The ACD is a proprietary chemical database marketed by Elsevier MDL. It contains over 480,000 unique chemical compounds; subscription-based access is available at a number of academic institutions.
- Quantum Lead Virtual Screening Services: This pay-per-use library contains 50,000 compounds. The company also offers other software for compound analysis.
- Therapeutic Target Database (TTD): The TTD contains information about known therapeutic protein and nucleic acid targets. It also describes the targeted disease conditions and provides pathway information. Currently, the TTD includes 1535 drug targets and 2107 drugs/ligands (Chen).
Docking and Scoring Tools
The search space in a docking problem consists of all possible conformations (the relative positions of the atoms in the 3D structure of a molecule, independent of any coordinate system) and configurations, or orientations (the positions of the atoms of a molecule after a rotation and translation in the coordinate system of the protein paired with the ligand). A pose is a particular configuration of a particular conformation of a molecule in that coordinate system. With present computing resources it is infeasible to explore every possible pose exhaustively, so every docking simulation is a trade-off between accuracy and speed, and a good docking tool is expected to strike a reasonable balance between the two. The success of a docking program depends on two components: the scoring function and the search algorithm.
- A scoring function estimates the binding affinity between ligand and receptor and is used to discriminate correct (experimentally verified) docking poses from incorrect ones.
- A search algorithm explores the search space to find the best docking pose, as measured by the scoring function.
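The two components can be illustrated with a toy example. The following is a minimal Python sketch, assuming NumPy, in which a pose is a fixed conformer (an array of 3D atom coordinates) placed by a rigid-body rotation and translation. `toy_score` is a hypothetical contact/clash score, not the scoring function of any real docking program, and `random_search` is the simplest possible search algorithm: it samples random rigid-body poses and keeps the best-scoring one. Ligand flexibility is ignored.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rotation by `angle` radians about `axis` (Rodrigues' rotation formula)."""
    x, y, z = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def apply_pose(conformer, rotation, translation):
    """Give a fixed conformation a configuration: rotate about its centroid, then translate."""
    centroid = conformer.mean(axis=0)
    return (conformer - centroid) @ rotation.T + centroid + translation

def toy_score(ligand, receptor, clash=3.0, contact=5.0):
    """Hypothetical scoring function (lower is better):
    +10 for every ligand-receptor atom pair closer than `clash`,
    -1 for every pair between `clash` and `contact` (angstroms assumed)."""
    d = np.linalg.norm(ligand[:, None, :] - receptor[None, :, :], axis=-1)
    return 10.0 * np.sum(d < clash) - 1.0 * np.sum((d >= clash) & (d <= contact))

def random_search(conformer, receptor, n_trials=2000, box=8.0, seed=0):
    """Search algorithm: sample random rigid-body poses, keep the best-scoring one."""
    rng = np.random.default_rng(seed)
    best_pose = conformer.copy()
    best_score = toy_score(best_pose, receptor)
    for _ in range(n_trials):
        # Random axis-angle rotation and random translation inside a box.
        rotation = rotation_matrix(rng.normal(size=3), rng.uniform(0.0, 2.0 * np.pi))
        translation = rng.uniform(-box, box, size=3)
        pose = apply_pose(conformer, rotation, translation)
        score = toy_score(pose, receptor)
        if score < best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score
```

Because the rotation and translation move every atom together, the interatomic distances of the conformer are unchanged; only its configuration in the receptor frame changes. Practical search algorithms replace the random sampling with systematic, incremental, or stochastic strategies such as those used by the programs described below.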
There are many competitive docking tools available from various sources, ranging from expensive, feature-rich corporate suites to simple yet powerful applications developed by academic and research institutes.
DOCK was developed by the team led by Irwin Kuntz at the University of California, San Francisco (http://dock.compbio.ucsf.edu/). The most recent version, DOCK 6, is written in C++ and is functionally separated into independent components, which makes the program flexible. The Kuntz lab provides source code for all programs free of charge to academic institutions; industrial organizations are charged a licensing fee. The authors cite the following applications of DOCK:
- predict binding modes of small molecule-protein complexes
- search databases of ligands for compounds that inhibit enzyme activity
- search databases of ligands for compounds that bind a particular protein
The main DOCK executable is run from the command line in a standard Unix shell; Windows users need a Linux-like environment such as Cygwin. DOCK uses an “incremental construction” docking algorithm, which means that the ligand is first broken into ‘fragments’:
- a base ‘anchor’ is selected
- the anchor fragment is placed in the active site
- the remaining fragments are added to it incrementally
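The anchor-and-grow idea can be sketched in Python as follows. This is a hypothetical, heavily simplified illustration, not DOCK's implementation: the ligand is a tree of named fragments joined by rotatable bonds, the anchor is assumed to be the largest fragment, its placement in the active site is taken as already done (a `None` placeholder), and each added fragment is tried at a few torsion angles. `score_partial` stands in for the scoring of a partially built ligand, and only the best `beam_width` partial solutions are kept at each step to limit the combinatorial growth.

```python
from collections import deque

def anchor_and_grow(fragments, bonds, torsion_options, score_partial, beam_width=3):
    """Hypothetical incremental-construction (anchor-and-grow) sketch.

    fragments:       dict fragment name -> number of heavy atoms
    bonds:           (frag_a, frag_b) rotatable bonds; assumed to form a tree
    torsion_options: torsion angles (degrees) tried for each newly added fragment
    score_partial:   callable(dict) -> float scoring a partial ligand; lower is better
    """
    # 1. Select the anchor: here simply the largest fragment.
    anchor = max(fragments, key=fragments.get)

    neighbors = {name: [] for name in fragments}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Growth order: breadth-first outward from the anchor.
    order, seen, queue = [], {anchor}, deque([anchor])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)

    # 2. The anchor is assumed already placed in the site (None = placeholder).
    beam = [{anchor: None}]
    # 3. Add the remaining fragments one at a time, pruning to the best partial ligands.
    for frag in order:
        candidates = []
        for partial in beam:
            for angle in torsion_options:
                grown = dict(partial)
                grown[frag] = angle
                candidates.append(grown)
        candidates.sort(key=score_partial)
        beam = candidates[:beam_width]
    return beam[0], score_partial(beam[0])
```

Pruning after every added fragment is what keeps incremental construction tractable: without it, the number of partial ligands would grow multiplicatively with each rotatable bond.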
The active site of the protein is identified by the program sphgen, which also generates the sphere centers that fill the site. Scoring grids are generated by the program grid. DOCK then matches the sphere centers to ligand atoms to orient the ligand, evaluates the resulting orientations using the scoring grids, and finally minimizes the energy-based scores. A comprehensive tutorial for DOCK 6 and the docking process in general can be found here: http://www2.umdnj.edu/~kholodvl/tut/DOCK60_intro_linux.pdf
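Precomputing the scoring grid pays off because the receptor's contribution is evaluated once per grid point rather than once per ligand orientation. The sketch below, in Python with NumPy, illustrates the idea under simplifying assumptions: `receptor_term` is a hypothetical function of position standing in for the receptor's energy terms, and values between grid points are obtained by trilinear interpolation, a common choice. It does not reproduce the file formats or energy terms of the grid program.

```python
import numpy as np

def build_grid(receptor_term, origin, spacing, shape):
    """Precompute a receptor-only term at every grid point (done once per receptor)."""
    idx = np.indices(shape).reshape(3, -1).T          # every (i, j, k) in C order
    points = origin + idx * spacing
    values = np.array([receptor_term(p) for p in points])
    return values.reshape(shape)

def trilinear(grid, origin, spacing, point):
    """Interpolate the precomputed grid at an arbitrary point inside it."""
    f = (np.asarray(point, dtype=float) - origin) / spacing
    i0 = np.clip(np.floor(f).astype(int), 0, np.array(grid.shape) - 2)
    t = f - i0                                        # fractional position in the cell
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[0] if dx else 1 - t[0]) *
                     (t[1] if dy else 1 - t[1]) *
                     (t[2] if dz else 1 - t[2]))
                value += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value

def grid_score(grid, origin, spacing, ligand_atoms):
    """Score one ligand orientation by summing interpolated values over its atoms."""
    return sum(trilinear(grid, origin, spacing, a) for a in ligand_atoms)
```

Each orientation is then scored with a handful of lookups per ligand atom instead of a sum over all receptor atoms.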
Glide is a commercial product from Schrödinger (http://www.schrodinger.com/), marketed as “A complete solution for ligand-receptor docking”. Some highlights of the tool include:
- A range of speed vs accuracy options, from the HTVS (high-throughput virtual screening) mode for millions of compounds, to the SP (standard precision) mode, to the XP (extra precision) mode with advanced scoring.
- User-friendly interface: Glide automates calculations for large libraries of compounds and organizes docking results in both summary and detailed reports. The ‘Maestro’ user interface provides advanced visualization and analysis tools to examine ligand-receptor interactions.
- Cross-platform (Linux, SGI, and IBM/AIX) and parallel processing support.
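The speed/accuracy options lend themselves to a screening funnel, in which a fast mode ranks the whole library and only the top fraction is passed to slower, more accurate modes. The Python sketch below shows that workflow in the abstract; it does not call Glide, and the scoring callables and `keep_fraction` value are hypothetical placeholders.

```python
def tiered_screen(compounds, stages, keep_fraction=0.1):
    """Hypothetical screening funnel: each stage re-ranks only the previous survivors.

    stages: scoring callables ordered fastest/coarsest first; lower score = better.
    """
    survivors = list(compounds)
    for i, score in enumerate(stages):
        ranked = sorted(survivors, key=score)   # each survivor scored once per stage
        if i == len(stages) - 1:
            return ranked                       # final, most accurate ranking
        n_keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:n_keep]
    return survivors
```

For example, `tiered_screen(library, [fast_score, standard_score, precise_score])`, with three hypothetical scoring functions and the default `keep_fraction` of 0.1, applies the most expensive function to only about 1% of the library.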
Glide uses a “Stochastic Search” docking algorithm. The algorithm approximates a complete systematic search over ligand positions, orientations, and conformations in the receptor site. The energy minimization stage utilizes the Monte Carlo simulation algorithm.
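Monte Carlo sampling of this kind usually relies on the Metropolis criterion: a random change to the pose is always accepted if it lowers the energy and is accepted with probability exp(-dE/T) if it raises it, which lets the search escape shallow local minima. The generic Python sketch below applies that rule to a toy energy over a vector of pose variables; Glide's actual move set, energy function, and temperature are not shown and are not assumed here.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: accept downhill moves; accept uphill with prob exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def monte_carlo_minimize(energy, x0, step=0.5, temperature=1.0, n_steps=5000, seed=0):
    """Random-walk Metropolis over pose variables, tracking the lowest energy seen."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    for _ in range(n_steps):
        # Perturb every pose variable by a small random amount.
        trial = [xi + rng.uniform(-step, step) for xi in x]
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, temperature, rng):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
    return best_x, best_e
```

A higher `temperature` accepts more uphill moves and explores more broadly; a lower one behaves more like pure downhill minimization.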
The enhanced features and easy-to-use interface of Glide, however, come at a cost that is both financial and computational.
Yucca is a recent algorithm for small-molecule docking, developed by Vicky Choi at the Department of Computer Science, Virginia Tech. The algorithm, still under active development, is based on an efficient heuristic for local search. Yucca uses a “multiconformer” approach: the conformer generator OMEGA (Optimized Molecular Ensemble Generation Application) first generates a set of low-energy conformers, which are then rigidly docked while their configurations are coarsely sampled. Each configuration is then improved locally to a local minimum.
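A multiconformer strategy of this kind can be outlined in Python as below. This is an illustrative sketch under stated assumptions, not Yucca's algorithm: the conformers stand in for OMEGA output, each conformer is treated as rigid, its pose is reduced to a short vector of hypothetical pose variables sampled on a coarse grid, and a simple coordinate-descent routine (`local_descent`, a stand-in for Yucca's own local-search heuristic) improves the best coarse samples to a local minimum.

```python
import itertools
import math

def local_descent(score, x, step, min_step=1e-3):
    """Coordinate descent: try +/- step on each variable; halve step when stuck."""
    x = list(x)
    best = score(x)
    while step > min_step:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                s = score(trial)
                if s < best:
                    x, best, improved = trial, s, True
        if not improved:
            step /= 2
    return x, best

def multiconformer_dock(conformers, score, coarse_values, dim, n_seeds=3, step=0.5):
    """For each rigid conformer: coarse sampling of pose variables, then local refinement.

    score(conformer, pose_vector) -> float, lower is better (hypothetical).
    Returns (conformer index, refined pose vector, score).
    """
    best = (None, None, math.inf)
    for ci, conf in enumerate(conformers):
        def f(x, conf=conf):
            return score(conf, x)
        # Coarse sampling: every combination of coarse values, ranked by score.
        coarse = sorted(itertools.product(coarse_values, repeat=dim), key=f)
        for seed in coarse[:n_seeds]:
            x, s = local_descent(f, seed, step)
            if s < best[2]:
                best = (ci, x, s)
    return best
```

Keeping the conformers rigid turns a flexible-docking problem into many cheaper rigid ones, which is the trade-off a multiconformer approach makes.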
Comparative evaluation of Yucca against existing algorithms suggests that it is a competitive tool, although it is still being developed and refined. Improvements in progress include the tool’s own conformer generator, a better scoring function, flexible receptor docking, and virtual screening.
Other available software tools include:
- BAPPL 
The Binding Affinity Prediction of Protein-Ligand server is a tool which can be used to calculate the binding energy of a non-metallo protein-ligand complex.
- AutoDock
AutoDock, like DOCK, attempts to predict the orientation of a compound bound to a drug target. As a docking and scoring package, it also includes its own scoring functions. AutoDock has also been applied to the problem of protein-protein docking.
- Docking Study with HyperChem 
A newer system that claims to use novel docking algorithms.
- FlexX
FlexX is another widely used docking program, known for being faster than many other applications.
- Molecular Docking Server 
The Molecular Docking Server is a web service that calculates the binding site, geometry, and energy of small molecules interacting with proteins.
- ZDOCK
ZDOCK is free academically but requires a paid license through Accelrys for commercial use. It is a two-stage docking system: ZDOCK performs the initial FT (Fourier transform) calculations, and RDOCK refines the highest-scoring hits.
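The initial FT (Fourier transform) stage described above relies on the general FFT correlation technique introduced for docking by Katchalski-Katzir and co-workers: when both molecules are mapped onto 3D grids, the score of every translation of the ligand relative to the receptor can be computed at once with fast Fourier transforms. The Python/NumPy sketch below shows only that translational scan for a single ligand orientation, with translations treated as cyclic; the grid values are arbitrary inputs, and real programs add rotational sampling and much richer scoring terms.

```python
import numpy as np

def fft_correlation_scores(receptor_grid, ligand_grid):
    """Score every cyclic translation t of the ligand grid at once.

    C[t] = sum_x R[x] * L[x - t], computed with the correlation theorem:
    C = IFFT(FFT(R) * conj(FFT(L))) for real-valued grids.
    """
    R = np.fft.fftn(receptor_grid)
    L = np.fft.fftn(ligand_grid)
    return np.real(np.fft.ifftn(R * np.conj(L)))

def best_translation(receptor_grid, ligand_grid):
    """Return the grid translation with the highest correlation score."""
    scores = fft_correlation_scores(receptor_grid, ligand_grid)
    t = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in t), float(scores[t])
```

For an N x N x N grid, scoring all N^3 translations directly costs on the order of N^6 operations, whereas the FFT route costs on the order of N^3 log N, which is why this stage can be run exhaustively over translations before a slower refinement step.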
See the Wikipedia article on molecular docking for links to additional software.
Although many applications currently simulate intermolecular interactions, there is still much room for improvement. New applications and algorithms are being developed constantly, and striking the right balance between accuracy and speed remains one of the critical challenges. For instance, Glide, despite being among the best performers in accuracy and ease of use, pays a heavy price in computational time. Institutionally developed packages such as DOCK and AutoDock also have strengths, as they are continually maintained and updated. Most such academic software, however, is command-line based and presents a steep learning curve for most users. Other problems, such as flexible receptor docking (which is, at this point, enormously computationally expensive), remain areas of active research.
References
Choi, Vicky. “Yucca: An Efficient Algorithm for Small-Molecule Docking.” Chemistry & Biodiversity 2 (2005): 1517-1524.
Flower, Darren. “Molecular Informatics: Sharpening Drug Design’s Cutting Edge.” N.p.: Royal Society of Chemistry, 2002.
Kellenberger, E., et al. “Comparative Evaluation of Eight Docking Tools for Docking and Virtual Screening Accuracy.” Proteins 57.2 (2004): 225-242.
Kuntz, Irwin, et al. “DOCK 6.0 Users Manual.” The Official UCSF DOCK Web-site. July 2006. 12 Nov. 2006 <http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm>.
Perun, Thomas J., and C. L. Propst. “Computer Aided Drug Design.” N.p.: Marcel Dekker, Inc., 1989.
Schrödinger. “Glide 4.0 User Manual.” Biowulf. Apr. 2006. Schrödinger. 12 Nov. 2006.