Proteomics/Protein Primary Structure/Sequencing Methods
Protein sequencing is the process of determining the amino acid sequence, or primary structure, of a protein. Sequencing plays a vital role in Proteomics because the information obtained can be used to deduce a protein's function, structure, and cellular location. This in turn aids in identifying novel proteins and in understanding cellular processes. A better understanding of these processes allows, among other things, the design of drugs that target specific metabolic pathways.
Although several methods exist for sequencing proteins, the two dominant methods are Mass Spectrometry and Edman Degradation. Less frequently used methods can still serve specific roles that complement the two predominant methods, such as compensating for their shortcomings or acting as a preliminary step.
History
Pehr Edman began his work in the Northrop-Kunitz laboratory at the Princeton branch of the Rockefeller Institute for Medical Research in 1946, where he tried to find a chemical method for determining the amino acid sequence of a protein. He had early success with fluorodinitrobenzene (FDNB) and phenylisothiocyanate (PITC). During his year at Princeton, Edman carried out enough experiments to establish that reagents like FDNB and PITC could feasibly be used to determine amino acid sequence. Edman returned to Sweden in 1947, and after two more years of work he published the paper describing the first successful method for sequencing proteins one residue at a time. This groundbreaking method came to be known as the Edman Degradation.
Five years earlier, Frederick Sanger had demonstrated a method for identifying the amino acid residue at the N-terminal end of a polypeptide chain using the reagent fluorodinitrobenzene. Although it was thought that this method could at most identify the N-terminal residue, Sanger took it one step further. Using several proteolytic enzymes, partial hydrolysis, and an early version of chromatography, he cleaved the protein into overlapping fragments and pieced the residues together like a jigsaw puzzle. It was not until 1955 that Sanger presented the complete sequence of insulin, work for which he was awarded the Nobel Prize in Chemistry in 1958.
Mass Spectrometry, as a tool for analyzing individual molecules, was available many years before either Sanger or Edman began their work on protein sequencing. Since its beginnings in the late 1800s it has undergone many changes in hardware and software, and it has proven so important to the field of sequencing that several more Nobel Prizes have been awarded to those who improved the technology. Despite its importance today, its value for sequencing was not fully realized until 1966, when K. Biemann, C. Cone, B.R. Webster, and G.P. Arsenault sequenced several oligopeptides containing glycine, alanine, serine, proline, and other amino acids. Further developments came in the 1980s as the mass spectrometer became a more robust laboratory instrument. In 1986, fast atom bombardment ionization combined with Tandem Mass Spectrometry was first demonstrated for determining protein sequences; this work laid the early foundation for the Peptide Mass Fingerprinting procedure that came into use in the early 1990s. The arrival of two new ionization methods, MALDI and electrospray ionization, greatly extended the range of biomolecules that could be analyzed and paved the way for the mass spectrometer to become the dominant tool for protein sequencing. The advent of Proteomics in 1996 brought many rapid developments, including increasing computational power, the growth of the World Wide Web and of protein databases, and advances in multi-quadrupole mass spectrometer systems that enabled MALDI-MS/MS and other tandem mass spectrometry methods.
N-terminal Residue Identification
N-terminal residue identification is a technique that chemically determines which amino acid forms the N-terminus of a peptide chain. This information can help in ordering individual peptide sequences generated by other sequencing techniques that fragment the peptide chain. It is also useful because the first cycle of Edman Degradation is frequently contaminated with impurities that can make the N-terminal residue difficult to identify. The general process of N-terminal residue identification is described below:
- The free, unprotonated α-amino group at the N-terminus is labeled with a reagent that reacts selectively with it. Suitable reagents include 1-fluoro-2,4-dinitrobenzene (FDNB, Sanger's reagent), dansyl chloride, and phenylisothiocyanate (Edman's reagent).
- The labeled peptide is hydrolyzed with acid, which releases the labeled N-terminal residue along with the other, unlabeled amino acids.
- The labeled N-terminal derivative is separated from the free amino acids and identified by chromatography.
This process identifies only the N-terminal residue of the peptide. It is time-consuming and has become less useful now that more efficient sequencing techniques are available. A further complication is that some reagents used in the process can degrade amino acid residues to the point where they are unrecognizable. Of the reagents available for labeling the N-terminal residue, dansyl chloride is about 100 times more sensitive than FDNB because its derivatives are highly fluorescent and therefore detectable in minute amounts. Edman's reagent has the additional advantage of leaving the remaining residues of the peptide chain intact, as described in the next section.
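The label-then-hydrolyze logic above can be illustrated with a toy model. This Python sketch models only the bookkeeping, not the chemistry, and its function names are hypothetical: a peptide is a string of one-letter residue codes, the label is a text prefix (`DNP`, for the dinitrophenyl group that FDNB attaches), and chromatography is reduced to picking out the single labeled item. The point it shows is that the label survives hydrolysis, so it still marks the N-terminal residue after every peptide bond has been broken.

```python
from collections import Counter

# Toy model of N-terminal residue identification by labeling + hydrolysis.
# Assumptions: peptides are one-letter residue strings, and the "label" is a
# text prefix (e.g. "DNP" for the dinitrophenyl group attached by FDNB).


def label_n_terminus(peptide: str, label: str = "DNP") -> list[str]:
    """Label only the free alpha-amino group, i.e. the first residue."""
    residues = list(peptide)
    residues[0] = f"{label}-{residues[0]}"
    return residues


def acid_hydrolyze(residues: list[str]) -> Counter:
    """Break every peptide bond; the label stays on the N-terminal residue."""
    return Counter(residues)


def identify_n_terminus(hydrolysate: Counter, label: str = "DNP") -> str:
    """Stand-in for chromatography: find the single labeled derivative."""
    labeled = [item for item in hydrolysate if item.startswith(label + "-")]
    if len(labeled) != 1:
        raise ValueError("expected exactly one labeled residue")
    return labeled[0].split("-", 1)[1]


# Human insulin A chain; its N-terminal residue is glycine.
hydrolysate = acid_hydrolyze(label_n_terminus("GIVEQCCTSICSLYQLENYCN"))
print(identify_n_terminus(hydrolysate))  # prints G
```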
Edman Degradation
The Edman Degradation method is based on the principle that a single amino acid residue can be chemically modified so that it can be cleaved from the chain without disrupting the bonds between any of the other residues. The procedure needs only very small amounts of peptide, usually on the order of 10-100 picomoles. Samples must contain only one protein component and should be free of reagents that interfere with the degradation, such as glycine, glycerol, sucrose, guanidine, ethanolamine, ammonium sulfate, and other ammonium salts. The general method is described below:
- The peptide to be sequenced is first immobilized, either by adsorption onto chemically modified glass or by electroblotting onto a porous polyvinylidene fluoride (PVDF) membrane.
- Under mildly alkaline conditions, phenylisothiocyanate (PITC) reacts with the uncharged N-terminal amino group of the peptide to form a phenylthiocarbamoyl (PTC) derivative.
- Trifluoroacetic acid (TFA) then cleaves the derivatized N-terminal residue from the chain as its anilinothiazolinone derivative (ATZ-amino acid). The next amino acid is now the new N-terminus and is ready to undergo the same reactions.
- A wash removes excess buffers and reagents, and the ATZ-amino acid is selectively extracted with ethyl acetate and converted to the more stable phenylthiohydantoin (PTH) amino acid derivative.
- Identification of the PTH amino acid derivative is accomplished using chromatography or electrophoresis.
- The process can now be repeated for the remaining residues of the chain.
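The cyclic nature of the procedure can be sketched as a loop. This is a hypothetical Python toy that assumes every cycle goes to completion and tracks only the bookkeeping: each call removes exactly one residue from the N-terminus, reports it as a PTH derivative, and leaves the rest of the chain untouched. The example chain is the human insulin B chain.

```python
# Toy bookkeeping model of repeated Edman cycles (assumes 100% efficiency).


def edman_cycle(chain: str) -> tuple[str, str]:
    """One cycle: PITC coupling, TFA cleavage, ATZ -> PTH conversion.

    Returns the released PTH derivative and the shortened chain; every
    residue except the N-terminal one is left untouched.
    """
    if not chain:
        raise ValueError("no residues left to sequence")
    return f"PTH-{chain[0]}", chain[1:]


def edman_sequence(chain: str, cycles: int) -> str:
    """Repeat the cycle and read the sequence from the released PTH residues."""
    read = []
    for _ in range(min(cycles, len(chain))):
        pth, chain = edman_cycle(chain)
        read.append(pth.removeprefix("PTH-"))  # identified by HPLC in practice
    return "".join(read)


# Human insulin B chain (30 residues); read the first 10 positions.
print(edman_sequence("FVNQHLCGSHLVEALYLVCGERGFFYTPKT", cycles=10))  # prints FVNQHLCGSH
```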
Automation of the Edman Degradation procedure began in 1967, and the method remains favored for its sensitivity and speed. Automated sequencers include the Applied Biosystems Procise and earlier Applied Biosystems Protein Sequencer models.
The major drawback of the procedure remains the limited length of peptide that can be sequenced. Because the derivatization and cleavage in each cycle do not go to completion for every molecule, the procedure tends to fail once the chain exceeds about 50-60 residues (about 30 residues in practice). This can be overcome by cleaving a longer peptide chain into smaller fragments with cyanogen bromide, trypsin, chymotrypsin, or another enzyme or chemical that breaks peptide chains.
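The length limit follows from simple arithmetic. If each cycle succeeds on only a fraction y of the peptide molecules (the repetitive yield), the fraction still "in phase" after n cycles is y^n, while the lagging molecules add background. The Python sketch below computes how many cycles stay above a chosen signal threshold; the yields and the 10% threshold are illustrative assumptions, not values taken from this text.

```python
# Why long chains fail: a simple repetitive-yield model of Edman degradation.
# Assumption: every cycle succeeds on the same fraction `cycle_yield` of the
# molecules; the numbers used below are illustrative, not measured values.


def in_phase_fraction(cycle_yield: float, cycles: int) -> float:
    """Fraction of molecules that were cleaved correctly in every cycle so far."""
    return cycle_yield ** cycles


def readable_cycles(cycle_yield: float, min_signal: float) -> int:
    """Number of cycles before the in-phase signal falls below min_signal."""
    if not 0.0 < cycle_yield < 1.0:
        raise ValueError("cycle_yield must be strictly between 0 and 1")
    if not 0.0 < min_signal <= 1.0:
        raise ValueError("min_signal must be in (0, 1]")
    n = 0
    while in_phase_fraction(cycle_yield, n + 1) >= min_signal:
        n += 1
    return n


for y in (0.90, 0.95, 0.98):
    print(f"yield {y:.2f}: {in_phase_fraction(y, 30):.1%} in phase after 30 cycles, "
          f"{readable_cycles(y, 0.10)} cycles above 10% signal")
```

With a 95% per-cycle yield, for example, only about a fifth of the molecules are still in phase after 30 cycles, which is why shorter fragments are sequenced instead.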
Mass Spectrometry
Mass Spectrometry is quickly becoming the gold standard for identifying protein sequences because it is easy to automate and highly accurate. It now dominates protein sequencing because earlier problems in delivering large, intact biomolecules into the gas phase as ions were solved by John B. Fenn's electrospray ionization and Koichi Tanaka's soft laser desorption, work for which they shared the 2002 Nobel Prize in Chemistry. The two most popular methods for identifying protein sequences with Mass Spectrometry are Peptide Mass Fingerprinting and Tandem Mass Spectrometry.
Peptide Mass Fingerprinting
This method, also known as protein fingerprinting, was developed independently by several groups in 1993. It works by cleaving an unknown protein into smaller peptide fragments whose masses can then be measured accurately with a mass spectrometer. A generalized procedure is shown below:
- Protein samples are broken up into several smaller peptide fragments by proteolytic enzymes.
- The resulting fragments are extracted with acetonitrile and dried under vacuum, then dissolved in distilled water, ready for analysis.
- The peptides are introduced into the vacuum chamber of a mass spectrometer, such as an ESI-TOF or MALDI-TOF instrument.
The mass spectrometer produces a peak list (a list of molecular weights), which is compared against databases such as Swiss-Prot or GenBank to find close matches. Software translates the retrieved genomic sequences into protein sequences, which then undergo simulated cleavage by the same enzyme used to cleave the unknown protein. The masses of these simulated fragments are calculated and compared with the measured masses from the unknown protein, and the entries whose fragment masses match best are reported as candidate identifications.
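The database comparison can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not the scoring used by any real search engine, under these assumptions: trypsin cleaves after K or R unless the next residue is P, with no missed cleavages; peptides are unmodified and singly protonated ([M+H]+); and a protein's score is simply the number of observed peaks it explains within a fixed tolerance. The database entries are toy sequences.

```python
# Minimal peptide mass fingerprinting (PMF) search. Assumptions: trypsin
# cleaves after K or R unless the next residue is P, with no missed
# cleavages; peptides are singly protonated ([M+H]+) and unmodified; a
# protein's score is simply the number of observed peaks it explains.
# The database entries are hypothetical toy sequences.

MONO = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def tryptic_digest(protein: str) -> list[str]:
    """Simulated (in silico) trypsin cleavage."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Mass of the singly protonated peptide, [M+H]+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def pmf_score(peaks: list[float], protein: str, tol: float = 0.02) -> int:
    """Count observed peaks matching a theoretical peptide mass within tol (Da)."""
    theoretical = [mh_plus(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in peaks)


def search(peaks: list[float], database: dict[str, str]) -> list[tuple[str, int]]:
    """Rank database entries by PMF score, best first."""
    scores = {name: pmf_score(peaks, seq) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


database = {  # hypothetical entries
    "protein_A": "MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGEEHFK",
    "protein_B": "GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT",
}
# Simulated peak list: protein_B's tryptic peptides plus one unexplained peak.
observed = [mh_plus(p) for p in tryptic_digest(database["protein_B"])] + [1234.5]
print(search(observed, database))  # [('protein_B', 3), ('protein_A', 0)]
```

Counting matched peaks keeps the idea visible; practical tools also weigh mass accuracy, missed cleavages, modifications and the chance of random matches.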
Tandem Mass Spectrometry
Tandem Mass Spectrometry divides mass analysis into separate stages, with fragmentation occurring between them. The stages can be separated physically in space, using distinct mass analyzers such as quadrupoles (either in multiple mass spectrometers or within one instrument), or in time, within a single mass spectrometer. The generalized procedure is described below:
- Enzymatic or chemical degradation of target protein to produce peptides.
- Fractionation of peptides by high-performance liquid chromatography.
- Analysis of the resulting peptide fractions in the mass spectrometer.
Analysis by Tandem Mass Spectrometry takes place in a system of two or more quadrupoles. In a triple-quadrupole arrangement, the first quadrupole selects particular ions for further analysis. These selected ions pass to the second quadrupole, which acts as a collision cell: collisions with an inert gas induce further fragmentation, mainly at the amide (peptide) bonds. The third quadrupole then separates the resulting fragments by mass. Fragmentation mainly produces two series of ions, those containing the N-terminus (b ions) and those containing the C-terminus (y ions), and the results are displayed as a two-dimensional plot of intensity versus mass-to-charge ratio (m/z).
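The two ion series can be computed directly from a sequence. The Python sketch below is a simplified illustration, assuming singly charged ions, monoisotopic masses, and cleavage only at the amide bonds; the peptide `GFFYTPK` is an arbitrary example.

```python
# b- and y-ion ladders for a peptide, assuming singly charged ions,
# monoisotopic masses and cleavage only at the amide (peptide) bonds.

MONO = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def fragment_ladders(peptide: str) -> tuple[list[float], list[float]]:
    """Return (b ions, y ions) as m/z values for charge 1.

    b_i holds the first i residues (N-terminal fragment);
    y_i holds the last i residues plus water (C-terminal fragment).
    """
    b, y = [], []
    total = 0.0
    for aa in peptide[:-1]:
        total += MONO[aa]
        b.append(total + PROTON)
    total = 0.0
    for aa in reversed(peptide[1:]):
        total += MONO[aa]
        y.append(total + WATER + PROTON)
    return b, y


b_ions, y_ions = fragment_ladders("GFFYTPK")
for i, (b_mz, y_mz) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b_mz:9.4f}    y{i} = {y_mz:9.4f}")
```

Consecutive peaks in either series differ by exactly one residue mass, which is what lets the series be read as sequence. Each b_i and its complementary y_(n-i) also add up to the precursor [M+H]+ plus one proton.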
The spectrum produced by the mass spectrometer, containing the molecular weights of all the fragments, is called a peptide map and can serve as a means of identifying the proteins analyzed. Other approaches to identifying protein sequences include peptide sequence tags and de novo methods. Peptide sequence tags, proposed by Matthias Wilm and Matthias Mann at EMBL, work by reading a short partial sequence from the fragment ion series of a tandem mass spectrum and combining it with the masses of the regions flanking it. This combination acts as a highly specific tag for searching sequence databases, which aids not only in identifying peptides but also in stitching peptides back together into a full sequence. De novo approaches are also employed alongside database similarity searches. De novo methods do not rely on any prior knowledge of the sequence being analyzed and instead infer the peptide sequence directly from the spectrum. Examples include Hidden Markov Models, which take a statistical approach to sequence identification, and graph searches that prune the space of candidate sequences to reduce the time needed to identify a sequence.
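Both the tag-reading step and the graph-search idea can be illustrated with a small spectrum graph: treat each fragment peak as a node, connect two peaks when their mass difference equals one residue mass, and take the longest path as the tag. The Python sketch below is a simplified, hypothetical illustration of that idea, not the published sequence tag or de novo algorithms. It assumes all peaks come from one ion series and ignores charge states, modifications, and gaps spanning two residues.

```python
from typing import Optional

# Reading a sequence tag from a fragment ladder with a small spectrum graph.
# Assumptions: peaks belong to a single ion series (e.g. b ions), are singly
# charged, and consecutive members differ by one residue mass.

RESIDUES = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919,
    "L": 113.08406, "I": 113.08406,  # isobaric: L is listed first, so it wins
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728


def residue_for(delta: float, tol: float) -> Optional[str]:
    """Return the first residue whose mass matches delta within tol, if any."""
    for aa, mass in RESIDUES.items():
        if abs(delta - mass) <= tol:
            return aa
    return None


def longest_tag(peaks: list[float], tol: float = 0.01) -> str:
    """Longest path through the graph whose nodes are peaks and whose edges
    join any two peaks separated by exactly one residue mass."""
    peaks = sorted(peaks)
    best = [""] * len(peaks)  # best[j]: longest tag ending at peak j
    for j in range(len(peaks)):
        for i in range(j):
            aa = residue_for(peaks[j] - peaks[i], tol)
            if aa and len(best[i]) + 1 > len(best[j]):
                best[j] = best[i] + aa
    return max(best, key=len, default="")


# b1..b5 of the toy peptide SEIAHR, plus one unexplained noise peak.
peaks, running = [], PROTON
for aa in "SEIAHR"[:-1]:
    running += RESIDUES[aa]
    peaks.append(running)
peaks.append(300.0)
print(longest_tag(peaks))  # prints ELAH (isoleucine is reported as its isobar L)
```

The noise peak has no residue-sized edges, so the longest path skips it; in real spectra, missing peaks and isobaric residues make the search much harder.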
References
- Morgan, F.J. "AAS Biographical Memoirs - Pehr Victor Edman 1916 - 1977." 8 April 1998. University of Melbourne. Accessed 30 March 2008. <http://www.asap.unimelb.edu.au/bsparcs/aasmemoirs/edman.htm>.
- Biemann K., Cone C., Webster B.R., Arsenault G.P. Determination of the amino acid sequence in oligopeptides by computer interpretation of their high-resolution mass spectra. J. Am. Chem. Soc., 1966, 88(23), p.5598-606. PMID: 5980176.
- Henzel W.J., Watanabe C., Stults J.T. Protein Identification: The Origins of Peptide Mass Fingerprinting. Journal of the American Society for Mass Spectrometry, 2003, 14, p.931.
- Bhagavan, N. V. (2001) Medical Biochemistry (Harcourt Academic, San Diego).
- Lottspeich, Friedrich. "Edman Sequencing." Max Planck Institute of Biochemistry. Accessed 30 March 2008. <http://wwwa2.udic.org/en/rg/lottspeich/technologies/edman_sequenzing/absatz_2_edman_degradation.pdf>.
- Edwards, Adam. "Protein Sequencing by Adam Edwards." 06 December 1999. Samford University. Accessed 30 March 2008. <http://faculty.samford.edu/~gekeller/edwards.html>.
- Automated Edman degradation: the protein sequenator. Methods Enzymol. 1973, 27, 942-1010.
- Pappin DJ, Hojrup P, Bleasby AJ (1993). "Rapid identification of proteins by peptide-mass fingerprinting". Curr. Biol. 3 (6): 327-32. PMID 15335725.
- Henzel WJ, Billeci TM, Stults JT, Wong SC, Grimley C, Watanabe C (1993). "Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases". Proc. Natl. Acad. Sci. U.S.A. 90 (11): 5011-5. PMID 8506346.
- Mann M, Højrup P, Roepstorff P (1993). "Use of mass spectrometric molecular weight information to identify proteins in sequence databases". Biol. Mass Spectrom. 22 (6): 338-45. doi:10.1002/bms.1200220605. PMID 8329463.
- James P, Quadroni M, Carafoli E, Gonnet G (1993). "Protein identification by mass profile fingerprinting". Biochem. Biophys. Res. Commun. 195 (1): 58-64. doi:10.1006/bbrc.1993.2009. PMID 8363627.
- Yates JR, Speicher S, Griffin PR, Hunkapiller T (1993). "Peptide mass maps: a highly informative approach to protein identification". Anal. Biochem. 214 (2): 397-408. doi:10.1006/abio.1993.1514. PMID 8109726.
- Hunt DF, Yates JR III, Shabanowitz J, Winston S, Hauer CR (1986). Proc. Natl. Acad. Sci. U.S.A. 83: 6233–6237.
- Ashcroft, Allison. "An Introduction to Mass Spectrometry." University of Leeds. Accessed 17 April 2008. <http://www.astbury.leeds.ac.uk/facil/MStut/mstutorial.htm>.
- Mann M, Wilm M (1994). "Error-tolerant identification of peptides in sequence databases by peptide sequence tags". Anal. Chem. 66 (24): 4390-9. PMID 7847635.