Structural Biochemistry/Volume 5

From Wikibooks, open books for an open world



Proteins are polymers of monomer units called amino acids, which carry many different functional groups. More than 500 amino acids exist in nature, but the proteins of all species, from bacteria to humans, are built mainly from only 20, called the standard (proteinogenic) amino acids. These 20 major amino acids, along with hundreds of minor amino acids, sustain our lives. Proteins can interact with other proteins and biomolecules to form more complex structures, and they can have rigid or flexible structures suited to different functions. Iodinated and brominated tyrosines can also be present, but they are not counted among the 20 standard amino acids because iodinated tyrosine is found only in thyroid hormones and brominated tyrosine only in coral. The 20 main amino acids, which are found in most but not all proteins, are listed below:

Amino Acids

Amino acids are molecules that contain both a carboxylic acid group and an amine group. In an amino acid, the α-carboxyl group (pKa about 2) is more acidic than a typical carboxylic acid, because the nearby positively charged amino group stabilizes the carboxylate. 2-Amino acids, also known as alpha-amino acids, are the specific type of amino acid that makes up proteins. These amino acids have many interesting properties, which are discussed in the next sections.

Amino acids play central roles both as building blocks of proteins and as intermediates in metabolism. Proteins are linear polymers formed by linking the α-carboxyl group of one amino acid to the α-amino group of another. This type of linkage is called a peptide bond or an amide bond. The formation of a dipeptide from two amino acids is accompanied by the loss of a water molecule. Under most conditions the equilibrium of this reaction lies on the side of hydrolysis rather than synthesis, so the biosynthesis of peptide bonds requires an input of free energy. Nonetheless, peptide bonds are kinetically very stable because uncatalyzed hydrolysis is extremely slow; the lifetime of a peptide bond in aqueous solution in the absence of a catalyst approaches 1000 years.

The 20 amino acids found in proteins provide a wide range of chemical functionality. The amino acid composition and sequence of a protein are determined by the sequence of bases in the gene that encodes it, and the chemical properties of those amino acids determine the protein's biological activity. Proteins catalyze most of the reactions in living cells and control virtually all cellular processes. In addition, a protein's amino acid sequence contains the information that determines how it folds into a three-dimensional structure and how stable that structure is. Protein folding and stability have been a critically important area of research for many years and remain only partly understood, although the field is under active investigation and progress continues.

Amino Acid Subdivisions

Twenty major amino acids make up proteins. Each contains a distinctive side chain that gives rise to different properties, including size, shape, charge, hydrogen-bonding capacity, hydrophilicity or hydrophobicity, and chemical reactivity. Depending on the chemical properties of the side chain (R group), amino acids can be broadly classified as hydrophobic or hydrophilic. In an aqueous environment, hydrophobic side chains cannot form hydrogen bonds with water, so they associate with one another and reside mostly in the interior of the protein. Hydrophilic amino acids, by contrast, are polar and interact favorably with the aqueous environment; they are normally found on the exterior surface.


An amino acid is in a zwitterionic state when its carboxylic acid group is deprotonated and its amino group is protonated at the same time. Zwitterions are dipolar ions: they carry both a positive and a negative charge, here a negatively charged carboxylate end (-COO-) and an adjacent positively charged amino end (-NH3+). The carboxyl group is deprotonated first because its pKa is about 2, whereas the pKa of the amino group (-NH3+) is about 9. The net charge of an amino acid in zwitterionic form is therefore zero. [1] Molecules that can act as either an acid or a base in this way are called amphoteric. In the solid state, the amine functionality deprotonates the carboxylic acid group, giving the zwitterionic, dipolar species.

The charged state of an amino acid in aqueous solution depends largely on pH, and the different forms interconvert through acid-base equilibria. For an amino acid without an ionizable side chain:

  • In strong acid (pH < 2), the predominant form is the fully protonated cation (ammonium group plus protonated carboxylic acid), with a net charge of +1.
  • Between pH 2 and 9, the zwitterion is the major form.
  • In strongly basic solution (pH > 9), the predominant form is the fully deprotonated aminocarboxylate anion, with a net charge of -1.

This leaves a wide pH range in which the zwitterion is the main species. The pH at which the net charge is zero, where positive and negative charges balance and the concentration of the charge-neutral zwitterion is highest, is called the isoelectric pH or isoelectric point (pI). When the side chain carries an additional acidic or basic group, the pI is lowered or raised, respectively. Note that across most physiologically relevant pH ranges the zwitterion is, by far, the most abundant species.
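The pH dependence described above can be sketched numerically. The example below is a minimal sketch, assuming that each ionizable group titrates independently according to the Henderson-Hasselbalch equation and using the free-amino-acid pKa values from the table of the 20 amino acids; the function names `net_charge` and `isoelectric_point` are hypothetical, and pKa values of residues inside folded proteins can differ considerably.

```python
def net_charge(ph, pka_cooh, pka_nh3, pka_r=None, r_is_acidic=False):
    """Approximate net charge of a free amino acid at a given pH.

    Each group is treated independently with the Henderson-Hasselbalch equation.
    """
    # Fraction of alpha-carboxyl groups that are deprotonated (each contributes -1).
    negative = 1.0 / (1.0 + 10 ** (pka_cooh - ph))
    # Fraction of alpha-amino groups that are protonated (each contributes +1).
    positive = 1.0 / (1.0 + 10 ** (ph - pka_nh3))
    charge = positive - negative
    if pka_r is not None:
        if r_is_acidic:
            # Acidic side chain (e.g. Asp, Glu): -1 when deprotonated.
            charge -= 1.0 / (1.0 + 10 ** (pka_r - ph))
        else:
            # Basic side chain (e.g. Lys, Arg, His): +1 when protonated.
            charge += 1.0 / (1.0 + 10 ** (ph - pka_r))
    return charge


def isoelectric_point(*args, **kwargs):
    """Find the pH where the net charge is zero by bisection on [0, 14].

    Net charge decreases monotonically with pH, so bisection converges.
    """
    lo, hi = 0.0, 14.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if net_charge(mid, *args, **kwargs) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(round(isoelectric_point(2.4, 9.8), 2))  # glycine: 6.1
    print(round(isoelectric_point(2.0, 9.9, 3.9, r_is_acidic=True), 2))  # aspartate
    print(round(isoelectric_point(2.2, 9.1, 10.5), 2))  # lysine
```

For glycine the pI falls midway between the two pKa values; for aspartate it falls between the two carboxyl pKa values (near 2.95), and for lysine between the two amino pKa values (near 9.8), which is how an extra acidic or basic side chain lowers or raises the pI.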

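Histidine contains an imidazole ring with two nitrogen atoms: one is basic and the other is not. The non-basic nitrogen contributes its lone pair to the aromatic (delocalized) ring system, while the basic nitrogen can accept a proton. Because the side-chain pKa is about 6, histidine can act as both a proton donor and a proton acceptor near physiological pH, which is important during enzyme catalysis.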

Here is an example of L-amino acids forming zwitterions at neutral pH:

[Figure: Zwitterionic forms of L-amino acids (lysine zwitterion)]

Optical Activity

All proteins and polypeptides are chains of linked amino acids. A typical α-amino acid consists of a central carbon (the α-carbon) attached to an amino group (-NH2), a carboxylic acid group (-COOH), a hydrogen atom, and a distinctive R group. The R group, usually called the side chain, determines the properties of each amino acid, and scientists classify amino acids into categories based on the nature of the side chain. A tetrahedral carbon atom bonded to four different groups is called chiral. Chiral molecules are optically active: they rotate plane-polarized light either to the left (levorotatory) or to the right (dextrorotatory). The L and D labels used for amino acids, however, describe the absolute configuration around the α-carbon (by comparison with glyceraldehyde), not the direction in which a given amino acid rotates light. All amino acids incorporated into proteins have the L configuration. In the Cahn-Ingold-Prelog system used in organic chemistry, L-amino acids have the S configuration, with the exception of L-cysteine, which is R because its sulfur-containing side chain has a higher priority. D-amino acids (mostly R) occur in nature, but they are not incorporated into proteins by the ribosome. Several hypotheses have been proposed for the preference of living organisms for L-amino acids, but none is generally accepted. It is clear, however, that the downstream physiological machinery is geared toward recognizing and interacting with the L configuration. Note: because the α-carbon has four different groups attached, all the standard amino acids are chiral except glycine, which is achiral. In glycine the side chain is a hydrogen atom, so the α-carbon carries only three distinct substituents instead of four.

Modified Amino Acids

Within proteins, it is possible to find amino acids that do not correspond to the 20 standard types. Most of these arise by chemical modification of an amino acid that has already been incorporated; for example, a hydroxylated form of proline (hydroxyproline) occurs in collagen. Other exceptions are encoded directly in DNA and RNA: a selenium analog of cysteine (selenocysteine) occurs in glutathione peroxidase enzymes, and pyrrolysine has also been isolated and characterized. There are many more examples.

The Peptide Bond

Any discussion of amino acids is incomplete without describing how amino acids bond to one another. Amino acids are joined through a condensation reaction between the amino group of one amino acid and the carboxylic acid group of another. The enzymatically catalyzed reaction forms an amide: R1-NH2 + R2-COOH → R1-NH-C(=O)-R2 + H2O. The amide bond has a resonance form that gives it partial double-bond character, making the peptide unit planar and rigid: R1-NH-C(=O)-R2 ↔ R1-NH+=C(-O−)-R2. Amino acids can link into short chains of only two or three residues, called dipeptides and tripeptides, or into very long chains of hundreds or even thousands of residues. Every complete peptide chain has an N-terminus (free amino group) and a C-terminus (free carboxylate). The backbone torsion angles, each defined by four consecutive backbone atoms, are important to those who study protein structure. The phi (φ) angle describes rotation about the N–Cα bond and is defined by the atoms C(=O)–N–Cα–C(=O); the adjacent psi (ψ) angle describes rotation about the Cα–C bond and is defined by N–Cα–C(=O)–N. The distribution of φ and ψ values observed in known protein structures is summarized in the Ramachandran plot. A peptide bond is formed by a condensation reaction and broken by hydrolysis (addition of water).
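Because φ and ψ are ordinary torsion angles defined by four atoms, they can be computed directly from atomic coordinates. The sketch below is a minimal, dependency-free example, assuming Cartesian coordinates in ångströms (for instance, read from a structure file); the function name `dihedral` and the toy coordinates are hypothetical. For φ of residue i one would pass C(i−1), N(i), Cα(i), C(i); for ψ, N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180 to 180) for atoms p0-p1-p2-p3."""
    b0 = _sub(p0, p1)  # from p1 back to p0
    b1 = _sub(p2, p1)  # central bond p1 -> p2
    b2 = _sub(p3, p2)  # from p2 out to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates with the central bond along the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # 0.0 (cis)
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0 (trans)
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90.0
```

The sign follows the usual convention: the angle is positive when, viewed along the central bond, the near bond must rotate clockwise to eclipse the far bond.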

A tetrapeptide is a peptide of four amino acids joined by three peptide bonds.
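Because each peptide bond releases one molecule of water, a chain of n amino acids has n − 1 peptide bonds, and its average molecular weight is the sum of the free amino acid weights minus (n − 1) × 18.015 g/mol. The sketch below applies this to the molecular weights listed in the table of the 20 amino acids; the dictionary and function names are hypothetical, and the result ignores ionization and post-translational modifications.

```python
# Average molecular weights (g/mol) of the free amino acids, copied from the
# table of the 20 amino acids in this chapter, keyed by one-letter code.
FREE_AA_MW = {
    "G": 75.07, "A": 89.1, "V": 117.15, "L": 131.18, "I": 131.18,
    "M": 149.21, "S": 105.09, "C": 121.16, "T": 119.12, "P": 115.13,
    "F": 165.19, "Y": 181.19, "W": 204.25, "H": 155.16, "K": 146.188,
    "R": 174.2, "D": 133.10, "E": 147.13, "N": 132.118, "Q": 146.15,
}
WATER_MW = 18.015  # g/mol, lost once per peptide bond formed


def peptide_mass(sequence):
    """Average molecular weight of a linear peptide written in one-letter code."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    residues = sum(FREE_AA_MW[aa] for aa in seq)  # KeyError for unknown letters
    peptide_bonds = len(seq) - 1
    return residues - peptide_bonds * WATER_MW


if __name__ == "__main__":
    for seq in ("GA", "GGGG", "ACDE"):
        print(seq, round(peptide_mass(seq), 3))
```

For the tetrapeptide Gly-Gly-Gly-Gly, three waters are lost: 4 × 75.07 − 3 × 18.015 = 246.235 g/mol.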

Amino Acid Classification

  • Non-polar Amino Acids
Aliphatic : glycine, alanine, valine, isoleucine, leucine
Aromatic : phenylalanine, tryptophan
Cyclic : proline
  • Polar Amino Acids
Sulfur-Containing : cysteine, methionine (methionine is often classed as nonpolar, as in the table below)
Hydroxyl-Containing : serine, threonine
Aromatic : tyrosine
Amide : asparagine, glutamine
  • Charged Amino Acids (at physiological pH)
Acidic : aspartic acid, glutamic acid
Basic : histidine, lysine, arginine

List of the 20 Amino Acids

Amino Acid | 3-Letter Abbreviation | 1-Letter Abbreviation | Class of Amino Acid (Side Chain) | Hydrophobicity Index (100 = extremely hydrophobic, 0 = neutral, -55 = hydrophilic) | pKa of α-COOH | pKa of α-NH3+ | pKa of R Group | Molecular Weight (g/mol) | α-Helix Preference | β-Sheet Preference | Reverse-Turn Preference
Glycine | Gly | G | Aliphatic, nonpolar | Neutral (0 at pH 2; 0 at pH 7) | 2.4 | 9.8 | -- | 75.07 | 0.43 | 0.58 | 1.77
Alanine | Ala | A | Aliphatic, nonpolar | Hydrophobic (47 at pH 2; 41 at pH 7) | 2.4 | 9.9 | -- | 89.1 | 1.41 | 0.72 | 0.82
Valine | Val | V | Aliphatic, nonpolar | Very hydrophobic (79 at pH 2; 76 at pH 7) | 2.3 | 9.7 | -- | 117.15 | 0.90 | 1.87 | 0.41
Leucine | Leu | L | Aliphatic, nonpolar | Very hydrophobic (100 at pH 2; 97 at pH 7) | 2.3 | 9.7 | -- | 131.18 | 1.34 | 1.22 | 0.57
Isoleucine | Ile | I | Aliphatic, nonpolar | Very hydrophobic (100 at pH 2; 99 at pH 7) | 2.3 | 9.8 | -- | 131.18 | 1.09 | 1.67 | 0.47
Methionine | Met | M | Hydroxyl or sulfur-containing, nonpolar | Very hydrophobic (74 at pH 2; 74 at pH 7) | 2.1 | 9.3 | -- | 149.21 | 1.30 | 1.14 | 0.52
Serine | Ser | S | Hydroxyl or sulfur-containing, polar | Neutral (-7 at pH 2; -5 at pH 7) | 2.2 | 9.2 | -- | 105.09 | 0.57 | 0.96 | 1.22
Cysteine | Cys | C | Hydroxyl or sulfur-containing, polar | Hydrophobic (52 at pH 2; 49 at pH 7) | 1.9 | 10.7 | 8.4 | 121.16 | 0.66 | 2.40 | 0.54
Threonine | Thr | T | Hydroxyl or sulfur-containing, polar | Neutral (13 at pH 2; 13 at pH 7) | 2.1 | 9.1 | -- | 119.12 | 0.76 | 1.17 | 0.96
Proline | Pro | P | Cyclic | Hydrophilic (-46 at pH 2; -46 at pH 7) | 2.0 | 9.6 | -- | 115.13 | 0.34 | 0.31 | 1.32
Phenylalanine | Phe | F | Aromatic | Very hydrophobic (92 at pH 2; 100 at pH 7) | 2.2 | 9.3 | -- | 165.19 | 1.16 | 1.33 | 0.59
Tyrosine | Tyr | Y | Aromatic | Hydrophobic (49 at pH 2; 63 at pH 7) | 2.2 | 9.2 | 10.5 | 181.19 | 0.74 | 1.45 | 0.76
Tryptophan | Trp | W | Aromatic | Very hydrophobic (84 at pH 2; 97 at pH 7) | 2.5 | 9.4 | -- | 204.25 | 1.02 | 1.35 | 0.65
Histidine | His | H | Basic | Hydrophilic at pH 2 (-42); neutral at pH 7 (8) | 1.8 | 9.3 | 6.0 | 155.16 | 1.05 | 0.80 | 0.81
Lysine | Lys | K | Basic | Hydrophilic (-37 at pH 2; -23 at pH 7) | 2.2 | 9.1 | 10.5 | 146.188 | 1.23 | 0.69 | 1.07
Arginine | Arg | R | Basic | Hydrophilic (-26 at pH 2; -14 at pH 7) | 1.8 | 9.0 | 12.5 | 174.2 | 1.21 | 0.84 | 0.90
Aspartate | Asp | D | Acidic | Neutral at pH 2 (-18); hydrophilic at pH 7 (-55) | 2.0 | 9.9 | 3.9 | 133.10 | 0.99 | 0.39 | 1.24
Glutamate | Glu | E | Acidic | Neutral at pH 2 (8); hydrophilic at pH 7 (-31) | 2.1 | 9.5 | 4.1 | 147.13 | 1.59 | 0.52 | 1.01
Asparagine | Asn | N | Amide, polar | Hydrophilic (-41 at pH 2; -28 at pH 7) | 2.1 | 8.7 | -- | 132.118 | 0.76 | 0.48 | 1.34
Glutamine | Gln | Q | Amide, polar | Neutral (-18 at pH 2; -10 at pH 7) | 2.2 | 9.1 | -- | 146.15 | 1.27 | 0.98 | 0.84

Network Approach

  • The network approach helps determine the role of a specific amino acid at a known position in the protein structure. Networks simplify the behavior of complex systems by representing them as a set of elements joined by links; in a protein structure network, the elements are amino acids and the links connect amino acids that are neighbors in the structure. Because protein structure networks are highly connected, so that any two amino acids are linked through only a few intermediate amino acids, network properties can be used to estimate folding probability. Proteins with denser structure networks fold more easily, and folding probability increases as the protein structure becomes more compact.
  • The network approach can also be applied to predicting active centres in proteins. Active centres are protein segments that play key parts in the catalytic reaction carried out by the enzyme. Scientists have used long-range network topology to build a network skeleton in which they can study only the side chains that are essential for the flow of information through the whole protein. Network analysis has shown that active centres occupy a central position in protein structure networks: they usually have many neighbors, form unique links in their neighborhood, integrate communication for the entire network, do not take part in the wasteful motions of ordinary residues, and collect and coordinate most of the energy in the network.
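The ideas in both points can be illustrated with a toy residue network. The sketch below is a simplified stand-in, not the method of the studies described: it assumes each residue is represented by a single atom (for example its Cα), links two residues when they lie within a distance cutoff (8 Å is a commonly used choice, but the value is an assumption here), and uses network density as a rough measure of compactness and closeness centrality as one way to quantify how central a residue is. All function names and coordinates are hypothetical.

```python
import math
from collections import deque


def contact_network(coords, cutoff=8.0):
    """Link residues whose representative atoms lie within `cutoff` angstroms."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def density(adj):
    """Fraction of all possible residue pairs that are linked."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * edges / (n * (n - 1))


def closeness(adj, source):
    """Closeness centrality: reachable residues / sum of shortest-path lengths (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy coordinates (angstroms): one residue surrounded by four others.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (-3.8, 0.0, 0.0),
              (0.0, 3.8, 0.0), (0.0, -3.8, 0.0)]
    adj = contact_network(coords, cutoff=4.0)
    print("density:", density(adj))
    for i in adj:
        print(i, "degree:", len(adj[i]), "closeness:", round(closeness(adj, i), 3))
```

In the toy example the middle residue has the most neighbors and the highest closeness, the kind of position the text attributes to active centres; a real analysis would use coordinates from a solved structure.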

Alanine - Ala/ A


Structure Alanine, also known as 2-aminopropanoic acid (abbreviated Ala or A), is an α-amino acid with the chemical formula HOOCCH(NH2)CH3. It has a molar mass of 89.09 g/mol and a density of 1.424 g/cm3. The α-carbon of alanine carries a methyl group (-CH3), which makes alanine one of the simplest α-amino acids in molecular structure and classifies it as an aliphatic amino acid. The methyl group is non-reactive and is thus almost never directly involved in protein function. Alanine is nonpolar and hydrophobic. It is ambivalent, meaning it can be found either inside or on the outside of a protein molecule. The α-carbon of alanine is optically active, and only the L-isomer is found in proteins.

Features Alanine is a non-essential amino acid, meaning that it can be manufactured by the human body and does not need to be obtained directly through the diet. It is found in a wide variety of foods but is particularly concentrated in meats, and it occurs at high levels in its free state in plasma.

Functions Alanine is one of the primary amino acids involved in sugar and acid metabolism. It is reported to support the immune system by promoting antibody production, and it provides energy for muscle tissue, the brain, and the central nervous system. It is used in pharmaceutical preparations for injection or infusion and in dietary supplements, and as a component of Maillard reaction products it contributes to flavor compounds. In addition, it stimulates glucagon secretion.

Chemical Synthesis Alanine can be manufactured in the body from pyruvate and from branched-chain amino acids such as valine, leucine, and isoleucine. It is most commonly produced by reductive amination of pyruvate. Because transamination reactions are readily reversible and pyruvate is pervasive, alanine is easily formed and is closely linked to metabolic pathways such as glycolysis, gluconeogenesis, and the citric acid cycle. Together with lactate, alanine also takes part in generating glucose from protein via the alanine cycle. Racemic alanine can be prepared by the Strecker reaction, the condensation of acetaldehyde with ammonium chloride in the presence of potassium cyanide.

Analysis Alanine can be identified by UV spectroscopy, infrared (IR) spectroscopy, nuclear magnetic resonance (NMR) spectroscopy, and mass spectrometry.

Arginine - Arg/ R


Structure Arginine, 2-amino-5-guanidinopentanoic acid, consists of a three-carbon aliphatic straight chain whose distal end is capped by a guanidinium group. Its molar mass is 174.20 g/mol. With a pKa of 12.48, the guanidinium group is positively charged in neutral, acidic, and even most basic environments, so arginine has basic chemical properties. Because of conjugation between the double bond and the nitrogen lone pairs, the positive charge is delocalized, which allows the group to form multiple hydrogen bonds.

Features Arginine is a conditionally essential amino acid that plays an important role in nitrogen metabolism. It is a chemical precursor of nitric oxide, a vasodilator (blood vessel-widening agent). Nitric oxide is a powerful signaling molecule, also acting as a neurotransmitter, that helps blood vessels relax and improves circulation. Foods rich in arginine include red meat, fish, poultry, wheat germ, grains, nuts and seeds, and dairy products.

Functions Arginine assists in wound healing and helps in burn treatment. It is necessary for normal immune system activity because it enhances the production of T cells. Studies suggest that arginine may help treat medical conditions that improve with increased vasodilation, including chest pain, atherosclerosis (clogged arteries), heart disease or failure, erectile dysfunction, intermittent claudication/peripheral vascular disease, and vascular headaches (headache-inducing blood vessel swelling). Arginine has also been used in bodybuilding, to enhance sperm production, and to prevent tissue wasting in people with critical illnesses. Arginine hydrochloride has a high chloride content and has been used to treat metabolic alkalosis.

Biosynthesis Arginine is synthesized from citrulline and aspartate by the sequential action of the cytosolic enzymes argininosuccinate synthetase and argininosuccinate lyase. The first step is energetically costly: the synthesis of each molecule of argininosuccinate is coupled to the hydrolysis of adenosine triphosphate (ATP) to adenosine monophosphate (AMP).

In the human body, arginine is synthesized principally via the intestinal-renal axis. Epithelial cells of the small intestine produce citrulline, primarily from glutamine and glutamate, and release it into the circulation; proximal tubule cells of the kidney then extract citrulline from the circulation and convert it to arginine, which is returned to the circulation.

Arginine and Nitrogen Storage To grow, a cell needs nitrogen, which can come from ammonia, nitrate, dinitrogen, or amino acids. The PII protein is an ancient signaling protein that senses and integrates nitrogen and carbon abundance by binding 2-oxoglutarate (2-OG) and ATP/ADP. N-acetyl-L-glutamate kinase (NAGK), which catalyzes the controlling step of arginine biosynthesis, allows nitrogen to be stored as arginine, which is incorporated into arginine-rich copolymers. Because arginine is nitrogen-rich, it is ideal for nitrogen storage, and its osmotic impact is minimized when it is incorporated into polymers. Only in oxygenic phototrophs does PII bind to NAGK when nitrogen is abundant. When nitrogen is scarce, 2-oxoglutarate binds to PII together with ATP, leading to dissociation of the PII-NAGK complex.

Arginine-insensitive NAGK (as in E. coli) is a homodimer built around a central 16-stranded β-sheet to which both subunits contribute. Arginine-sensitive NAGKs, by contrast, are hexameric; recent studies have shown that these enzymes are ring-like hexamers, trimers of dimers. The ring is formed by linking three E. coli NAGK-like dimers through their N-terminal α-helices, which interlace between adjacent dimers. These helices are needed to make NAGK an arginine-operated switch with sigmoidal arginine-inhibition kinetics. PII proteins are homotrimers with a βαββαβ subunit topology, with the α-helices facing outward and the β-sheet inward. The T-loop is a large, flexible loop that contains the phosphorylation and uridylylation sites in cyanobacteria and proteobacteria. In the absence of PII, S. elongatus NAGK is poorly active, with a low Vmax and a high Km for NAG, and it is inhibited by low concentrations of arginine. A. thaliana NAGK, by contrast, is highly active, with a Km for NAG four times lower and a Vmax three times greater than those of S. elongatus NAGK. When PII binds S. elongatus NAGK, the Vmax for NAG increases up to fourfold and the Km decreases up to tenfold relative to the enzyme without PII. When PII binds A. thaliana NAGK, the Km is unaffected but the Vmax for NAG increases fivefold. The S. elongatus PII-NAGK complex consists of one NAGK hexamer sandwiched between two PII trimers. Because the PII proteins are not tightly packed on NAGK, PII contacts NAGK only through its T-loops and B-loops. In the A. thaliana PII-NAGK complex, MgATP is bound to PII and all NAGK active centers contain bound NAG and ADP.
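To see what these kinetic changes mean for activity, one can plug them into the Michaelis-Menten equation, v = Vmax[S]/(Km + [S]). The sketch below assumes simple Michaelis-Menten behavior with respect to NAG (ignoring arginine inhibition and any cooperativity), uses arbitrary baseline units for the PII-free S. elongatus enzyme, and applies the upper-bound changes quoted above (Vmax up to fourfold higher, Km up to tenfold lower); the constant and function names are hypothetical.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten initial rate v = Vmax*[S] / (Km + [S])."""
    return vmax * s / (km + s)


# Hypothetical baseline for S. elongatus NAGK without PII (arbitrary units).
VMAX_FREE, KM_FREE = 1.0, 1.0
# Upper-bound effects of PII binding quoted in the text: 4x Vmax, Km / 10.
VMAX_PII, KM_PII = 4.0 * VMAX_FREE, KM_FREE / 10.0


def fold_activation(s):
    """Ratio of rates with and without PII at NAG concentration s."""
    return mm_rate(s, VMAX_PII, KM_PII) / mm_rate(s, VMAX_FREE, KM_FREE)


if __name__ == "__main__":
    for s in (0.1, 1.0, 10.0, 100.0):  # [NAG] in multiples of the PII-free Km
        print(f"[NAG] = {s:>6} x Km: {fold_activation(s):.1f}-fold faster with PII")
```

Under these assumptions the activation is largest at low NAG concentrations (about 22-fold at one tenth of the PII-free Km) and approaches the fourfold Vmax effect as NAG becomes saturating.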

Asparagine - Asn/ N


Structure Asparagine is a polar, uncharged derivative of the acidic amino acid aspartic acid (aspartate). Its side chain carries a carboxamide group, which is neutral at physiological pH and can be hydrolyzed to a carboxylic acid, converting asparagine to aspartate. The carboxamide group can form hydrogen bonds.

Features Asparagine is abundant in asparagus, from which it takes its name. Asparagine is not an essential amino acid, so humans do not need to obtain it from the diet. It has a high propensity to form hydrogen bonds, since its amide group can accept two and donate two hydrogen bonds. It is found both on the surface of proteins and buried within them, and it is a common site for attachment of carbohydrates in glycoproteins. Food sources of asparagine include dairy products, beef, poultry, and eggs.

Functions Asparagine is closely related to aspartate, which, along with glutamate, acts as an excitatory neurotransmitter. Aspartic acid and asparagine are reported to occur at high concentrations in the hippocampus and hypothalamus, brain regions important for short-term memory and emotion, and the two amino acids are thought to play an essential role in communication between the brain and the rest of the body. Asparagine is required by the nervous system to maintain equilibrium, and it is also required for the interconversion of amino acids that takes place in the liver.

Synthesis Asparagine synthesis starts from oxaloacetate, C4H4O5. First, the double-bonded oxygen on carbon 2 is replaced by an amino group from glutamate in a transamination reaction catalyzed by a transaminase, giving aspartate. Aspartate is then converted to asparagine by replacing the side-chain carboxylate oxygen with an amide nitrogen donated by glutamine. This step, catalyzed by asparagine synthetase, converts glutamine to glutamate and ATP to AMP and pyrophosphate.

Analysis Asparagine can be identified by the following methods: UV spectroscopy, infrared (IR) spectroscopy, nuclear magnetic resonance (NMR) spectroscopy, and mass spectrometry.

Aspartic acid - Asp/ D


Structure Aspartic acid (C4H7NO4), also called 2-aminobutanedioic acid, has a molecular weight of 133.1 g/mol.
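Molecular weights such as the 133.1 g/mol quoted for aspartic acid follow directly from the molecular formula. The sketch below sums standard atomic weights (rounded values, covering only C, H, N, O, and S) over a simple formula string without parentheses or charges; the parser and names are hypothetical, not a library API.

```python
import re

# Standard atomic weights in g/mol (rounded); only the elements needed here.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def molar_mass(formula):
    """Molar mass of a simple formula such as 'C4H7NO4' (no parentheses or charges)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means one atom of that element.
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    for name, formula in [("aspartic acid", "C4H7NO4"),
                          ("cysteine", "C3H7NO2S"),
                          ("glutamine", "C5H10N2O3")]:
        print(f"{name} ({formula}): {molar_mass(formula):.2f} g/mol")
```

It gives about 133.10 g/mol for aspartic acid and about 121.15 g/mol for cysteine, matching the values quoted in this chapter to within the rounding of the atomic weights.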

Aspartic acid (aspartate in its ionized form) is an acidic, polar amino acid. At physiological pH its side-chain carboxylic acid group loses a proton to become a negatively charged carboxylate. The side-chain pKa is about 3.9, somewhat higher (less acidic) than that of the α-carboxyl group. Its pI is about 3, roughly the average of the α-carboxyl and side-chain pKa values (2.0 and 3.9 in the table above). Proteins are critical for maintaining pH balance in the body, and it is the charged amino acids that give proteins their buffering properties. Aspartic acid is similar to alanine but with one of the β hydrogens replaced by a carboxylic acid group; this group is what makes aspartate an acidic amino acid. Aspartate has an α-keto counterpart, oxaloacetate, and the two are interconvertible by a simple transamination reaction. Oxaloacetate is one of the intermediates of the Krebs cycle, the sequence of reactions by which most living cells generate energy during aerobic respiration.

Features Aspartic acid is a non-essential amino acid that can be synthesized from intermediates of central metabolism.

Functions Aspartic acid takes part in transamination, by which oxaloacetate and aspartate are interconverted. It is also involved in immune system activity by promoting immunoglobulin and antibody production. Moreover, aspartic acid protects the liver and helps detoxify ammonia.

Aspartate, the conjugate base of aspartic acid, also functions as a neurotransmitter. Like glutamate, it activates NMDA receptors in the brain, although its effect is weaker than glutamate's.

Besides acting as an excitatory neurotransmitter, aspartate is one of the proteinogenic amino acids specified by the genetic code (codons GAU and GAC).

Aspartate plays important roles as an acidic residue in enzyme active centers, as well as in maintaining the solubility and ionic character of proteins.

Synthesis Aspartic acid is synthesized from oxaloacetate by transamination. In plants and microorganisms, aspartic acid is also the starting material for several amino acids that are essential for humans: methionine, threonine, isoleucine, and lysine. This route first requires aspartic acid to be reduced to its semialdehyde, HOOCCH(NH2)CH2CHO. Asparagine can also be obtained from aspartic acid by ATP-driven transamidation: aspartic acid + glutamine → asparagine + glutamic acid.

Cysteine - Cys/ C


Structure Cysteine (C3H7NO2S, molecular mass 121.16 g/mol), also called 2-amino-3-mercaptopropanoic acid, is an amino acid whose side chain contains a sulfhydryl (thiol) group (-SH), which is more nucleophilic than a hydroxyl group. Two cysteine residues can be oxidized to form a stable disulfide bond, and the unit of two bonded cysteines is known as cystine. Disulfide bonds help stabilize the folded (tertiary) structure of many proteins. Cysteine is sometimes considered hydrophilic because the thiol group interacts well with water, although it scores as hydrophobic on the hydrophobicity index in the table above. It is a non-essential amino acid and can be biosynthesized in the human body.

Functions The nucleophilic thiol group of cysteine is easily oxidized, and because its pKa (about 8.4) is close to neutral, cysteine is highly reactive and has many functions in biology.

Cysteine can inactivate insulin in the bloodstream: an excess of cysteine reduces one of the three disulfide bonds in insulin, and as a result insulin loses its function. This ability could be used in medicine when a patient experiences a hypoglycemic attack caused by a high insulin level.

Cysteine is reported to help in iron-deficiency anemia, and it also assists in lung diseases by increasing production of red blood cells. Cysteine is a key active-site residue in many important proteins. It is the key residue in glutathione reductase, which has protective effects against UV light, radiation, and free radicals. Glyceraldehyde-3-phosphate dehydrogenase, a key enzyme in glycolysis, also uses an active-site cysteine to carry out its most critical functions.

When cysteine is taken as a supplement, it is in the form of N-acetyl-L-cysteine (NAC). The body converts NAC into cysteine and then into glutathione, a powerful antioxidant. Antioxidants fight free radicals, harmful compounds in the body that damage cell membranes and DNA. Researchers believe free radicals play a role in aging and in the development of a number of health problems, including heart disease and cancer.

NAC can also help prevent side effects caused by drug reactions and toxic chemicals, and it helps break down mucus in the body. It benefits some respiratory conditions, such as bronchitis and chronic obstructive pulmonary disease (COPD). Doctors often give NAC to people who have taken an overdose of acetaminophen (Tylenol), because it helps prevent or reduce liver and kidney damage. NAC may also help reduce angina, chest pain or discomfort that occurs when the heart muscle does not get enough blood, by opening blood vessels and improving blood flow to the heart.

Studies of NAC have given mixed results. Some have shown that NAC may relieve symptoms of chronic bronchitis and lead to fewer flare-ups, while others found no reduction in flare-ups. Other studies showed that people with COPD who took NAC together with other therapies had about 40% fewer flare-ups. Another study found that people who took NAC twice a day had fewer flu symptoms than those who took a placebo. Some research has shown that intravenous NAC may boost glutathione levels and help prevent or treat lung damage caused by acute respiratory distress syndrome (ARDS), but other results disagree: for example, giving NAC to people with ARDS reduced the severity of their condition without reducing overall deaths compared with placebo.

Biosynthesis The precursors for cysteine synthesis are serine and methionine: serine carries a hydroxyl group in its side chain and methionine a sulfur atom, and in cysteine the sulfur derived from methionine takes the place of serine's hydroxyl oxygen. Methionine is first converted into homocysteine. Homocysteine then condenses with serine to form cystathionine (C7H14N2O4S), releasing a water molecule. Finally, cleavage of cystathionine, with addition of water and release of ammonia, yields cysteine and alpha-ketobutyrate as a side product.
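The two steps from homocysteine to cysteine can be checked for atom balance. The sketch below counts atoms in each molecular formula and compares both sides of each reaction; it assumes neutral, un-ionized formulas (homocysteine C4H9NO2S, serine C3H7NO3, cystathionine C7H14N2O4S, cysteine C3H7NO2S, and alpha-ketobutyrate written as 2-oxobutanoic acid, C4H6O3), and the helper names are hypothetical.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C7H14N2O4S'."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def is_balanced(reactants, products):
    """True if both sides of a reaction contain exactly the same atoms."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right


# Step 1: homocysteine + serine -> cystathionine + water
STEP1 = (["C4H9NO2S", "C3H7NO3"], ["C7H14N2O4S", "H2O"])
# Step 2: cystathionine + water -> cysteine + alpha-ketobutyrate + ammonia
STEP2 = (["C7H14N2O4S", "H2O"], ["C3H7NO2S", "C4H6O3", "NH3"])

if __name__ == "__main__":
    print("step 1 balanced:", is_balanced(*STEP1))
    print("step 2 balanced:", is_balanced(*STEP2))
```

Both steps balance, and dropping the water from step 1 makes it unbalanced, which confirms that the condensation releases one water molecule and the cleavage consumes one.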

Glutamine - Gln/ Q



Glutamine, or 2-amino-4-carbamoylbutanoic acid, has the molecular formula C5H10N2O3 and a molecular mass of 146.16 g/mol. It is a polar, uncharged derivative of the acidic amino acid glutamic acid (glutamate). Its side chain carries a carboxamide group, which is neutral at physiological pH and can be hydrolyzed to a carboxylic acid, converting glutamine to glutamate. The carboxamide group can form hydrogen bonds.

[Figure: Glutamine Final (glutamine synthetase reaction)]

Synthesis Glutamine is a nonessential amino acid. In the body, it is synthesized from glutamate and ammonia by the enzyme glutamine synthetase (GS), in a reaction driven by ATP (see figure).

Glutamate + ATP + NH3 → Glutamine + ADP + phosphate

The incorporation of ammonia into glutamate is an amidation reaction, and the hydrolysis of ATP to ADP drives the reaction forward. ATP participates directly: it phosphorylates the side-chain carboxyl group of glutamate, forming an acyl-phosphate intermediate (see Figure: Glutamine Final). The acyl-phosphate intermediate then reacts with free ammonia to form glutamine. Glutamine synthetase plays a major role here: after the intermediate forms, a high-affinity binding site for ammonia is created in GS, which prevents hydrolysis of the intermediate. Hydrolysis would not yield glutamine and would waste a valuable molecule of ATP.
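The role of ATP in this reaction can be made concrete with standard free-energy changes. The sketch below assumes approximate textbook values that are not given in this text (about +14.2 kJ/mol for glutamate + NH4+ → glutamine + H2O, and about −30.5 kJ/mol for ATP hydrolysis to ADP and Pi, at 25 °C) and converts each standard free-energy change to an equilibrium constant with K = exp(−ΔG°'/RT); the constant and function names are hypothetical.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)
T = 298.15    # temperature, K (25 degrees C)

# Approximate textbook standard free-energy changes in kJ/mol (assumptions).
DG_AMIDATION = 14.2        # glutamate + NH4+ -> glutamine + H2O
DG_ATP_HYDROLYSIS = -30.5  # ATP + H2O -> ADP + Pi


def equilibrium_constant(delta_g):
    """K = exp(-dG / RT) for a standard free-energy change in kJ/mol."""
    return math.exp(-delta_g / (R * T))


# Coupled reaction: glutamate + NH4+ + ATP -> glutamine + ADP + Pi
# (the water produced by amidation is consumed by ATP hydrolysis).
DG_COUPLED = DG_AMIDATION + DG_ATP_HYDROLYSIS

if __name__ == "__main__":
    print(f"uncoupled: dG = {DG_AMIDATION:+.1f} kJ/mol, "
          f"K = {equilibrium_constant(DG_AMIDATION):.2e}")
    print(f"coupled:   dG = {DG_COUPLED:+.1f} kJ/mol, "
          f"K = {equilibrium_constant(DG_COUPLED):.2e}")
```

Adding the two reactions gives a net standard free-energy change of about −16.3 kJ/mol: on its own the amidation strongly favors glutamate, while the ATP-coupled reaction strongly favors glutamine.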

Functions Glutamine is a non-essential amino acid, which means that the human body can synthesize it and does not need to obtain it from exogenous sources. It is one of the most abundant amino acids in the body. Glutamine circulates in the blood and is able to cross the blood-brain barrier directly.

Glutamine has various functions in biochemistry. Its primary role is protein synthesis, but it also helps to maintain neutral pH in the liver by balancing the acid and base levels.

Like glucose, glutamine can fuel cells. It donates nitrogen for anabolic reactions and provides carbon to the citric acid cycle. It is critical in the gastrointestinal system because it provides energy to the small intestine. Notably, the intestine is the only organ in the body that uses glutamine as its primary energy source. The kidney, activated immune cells, and cancer cells also require glutamine, but not as a primary energy source.

Within a cell, glutamine is essential for cell growth and protein translation. Moreover, it serves as a nitrogen donor and assists in maintaining the gradient across the mitochondrial membrane.

Normal cells require glutamine, but cancer cells use it in much larger quantities. As discussed in the paper "Glutamine addiction: a new therapeutic target in cancer" by David R. Wise and Craig B. Thompson, cancer cells sometimes exhibit what is called “glutamine addiction”: they take up glutamine from the body in much larger amounts than cellular function requires, in fact more than the cell can metabolize. Depriving cancer cells of this excess glutamine causes them to die, and such deprivation is the basis of potential glutamine-based cancer therapy. Glutamine consumption can exceed the consumption of any other amino acid in the cell by tenfold. In cancer cells, a metabolic shift occurs so that glutamine replaces glucose as the major source of carbon for the cell.

The body can usually make enough glutamine for its regular needs, so glutamine supplements are generally unnecessary; however, extreme stress, such as heavy exercise or an injury, makes the body require more glutamine. Most glutamine is stored in muscle, followed by the lungs, where much of the glutamine is made. Certain medical conditions, including injuries, surgery, infections, and prolonged stress, can lower glutamine levels. In these cases, taking a glutamine supplement may be helpful.

Glutamine is important for removing excess ammonia, a common waste product in the body. Glutamine also helps the immune system function and is needed for normal brain function and digestion. It is important in wound healing and recovery from illness. When the body is stressed, it releases the hormone cortisol into the bloodstream, and high concentrations of cortisol lower the body’s stores of glutamine. Studies have shown that adding glutamine to enteral nutrition helps reduce the rate of death in trauma patients and critically ill people. Clinical studies have found that glutamine supplements strengthen the immune system and reduce infections, and they also help in recovery from severe burns.

Glutamine also protects the lining of the gastrointestinal tract, known as the mucosa. People who have inflammatory bowel disease (IBD) may not have enough glutamine in their bodies; however, two clinical trials found that taking glutamine supplements did not improve symptoms of Crohn’s disease. People with HIV or AIDS often experience severe weight loss, so they may take glutamine supplements along with other nutrients, including vitamins C and E, beta-carotene, selenium, and N-acetylcysteine, to promote weight gain and help the intestines absorb nutrients better. Athletes who train for endurance events may deplete their glutamine, making them more prone to catching a cold after an athletic event; studies show that taking glutamine supplements resulted in fewer infections.

Glutamine and Cancer It has been shown that some cancer cells have an addiction to glutamine in that there is an increased rate of glutamine uptake. The increase in glutamine uptake is due to glutamine playing roles other than providing nitrogen for protein (amino acid) and nucleotide biosynthesis.

The first signs of cancer cells relying on an excess of a given compound to produce energy were discovered by Otto Heinrich Warburg. Warburg noticed that most cancer cells produced their energy through glycolysis of excess glucose, with the resulting pyruvate converted into lactic acid by lactic acid fermentation. This contrasts with energy production in normal cells, in which glycolysis is instead followed by oxidation of pyruvate in the mitochondria. Warburg therefore concluded that these cancer cells must have devolved into a more primitive form of metabolism, like that seen in single-celled eukaryotes. This reliance of cancer cells on taking up excess glucose for their energy needs has been dubbed the "Warburg Effect". Glutamine was later found to mirror this effect in some tumor cells.

Glutamine has been shown to participate in signaling and in the uptake of essential amino acids. It can also act as a mitochondrial substrate that maintains the integrity of the mitochondrial membrane potential, and it plays integral roles in a variety of anaplerotic reactions.

Glutamine donates nitrogen to cancer cells. Like all cells, cancer cells must synthesize nitrogen compounds to produce nucleotides and other amino acids, and glutamine donates the nitrogen needed for these compounds. Glutamine donates its amide group and is converted into glutamic acid. Glutamic acid then transfers its amino group, via transaminases, to α-keto acids to generate nonessential amino acids. In this way, glutamine supplies the nitrogen for several amino acids, including alanine, serine, aspartate, and proline. Tyrosine is the only nonessential amino acid not produced from either glucose or glutamine.

  • Glutamine is Needed for the Uptake of Essential Amino Acids in Certain Cancer Cells and as a Molecular Signal
Glutamine is imported through the glutamine solute carrier SLC1A5 and quickly exported through the SLC7A5 amino acid transporter in exchange for extracellular essential amino acids. When the glutamine importer is impaired, the uptake of essential amino acids is also impaired, which suggests that glutamine is necessary for essential amino acid uptake. Without essential amino acids, the rapamycin-sensitive mTOR complex 1 (mTORC1) is not activated. mTORC1 plays an essential role in regulating cell growth and protein translation as well as in inhibiting macroautophagy, so its inactivation inhibits cellular growth and protein translation. Thus, in some cancer cells glutamine acts both as a signal to mTORC1 and as a means of acquiring essential amino acids.
  • Glutamine Provides Anaplerosis in Cancer Cells
Anaplerosis is the replenishing of the carbon pool in the mitochondrion. Oxaloacetic acid (OAA) is one of the mitochondrial substrates that eventually lead to the synthesis of many essential biological molecules, such as cholesterol. In glioblastoma cells, glutamine metabolism provides the bulk of the cellular OAA pool. This high rate of glutamine metabolism into OAA indicates that glutamine is a primary substrate in these cancer cells, supplying the mitochondria with the precursors they need to carry out their metabolic functions.
  • c-Myc Regulates Glutamine Metabolism in Cancer Cells
The synthesis of purines and pyrimidines uses glutamine as a source of nitrogen in five enzymatic steps. Three of these five steps are regulated by c-MYC (Myc), a DNA transcription factor. Oncogenic levels of Myc transcriptionally promote increased glutaminolysis and the metabolism of glutamine into lactic acid. The catabolism of glutamine provides cells with carbon for anaplerosis and NADPH production.
The MYC gene codes for a transcription factor protein that binds to DNA, and in cancerous cells MYC is often amplified. Myc promotes the uptake of glutamine and its conversion to glutamic acid and lactic acid. Myc overexpression increases the catabolism of glutamine, which supplies more carbon to the cell and allows it to produce more NADPH. This overexpression of Myc triggers the metabolic switch from glucose to glutamine as the cell's source of carbon.
  • Glutamine-based cancer therapy
Glutamine addiction in some cancer cells is a target for new cancer therapies. Further research is needed to determine a non-toxic dosage, that is, one that inhibits glutamine production only in cancerous cells rather than indiscriminately.
Since cancer cells are dependent on glutamine, starving them of glutamine causes them to die, which has made glutamine a target for new cancer treatments. Some treatments have attempted to deny cancer cells their source of glutamine by reducing the amount of glutamine in the body. However, because glutamine is essential for many other processes in the body, such as synaptic communication in the brain, removing glutamine from the body is not a feasible treatment and is very dangerous. Other treatment methods have attempted to reduce the cell's ability to take up glutamine by targeting Myc and other proteins responsible for transporting glutamine into the cell. Still other treatments have attempted to reprogram the mitochondria so that they no longer depend on glutamine, and another approach targets mTOR’s glutamine response. These treatments show more promise and cause less harm than removing all glutamine from the body.
These therapeutic methods target major glutamine activity in cancer cells:
  1. Glutamine uptake and mTOR activation: L-γ-glutamyl-p-nitroanilide (GPNA) inhibits SLC1A5, a target of Myc, which suppresses glutamine uptake into the cell. 2-Aminobicyclo-(2,2,1)-heptane-2-carboxylic acid (BCH) inhibits SLC7A5 and blocks mTORC1 activation, inducing autophagy.
  2. Glutamine-dependent anaplerosis and activity in mitochondria: Studies suggest that carbons derived from glutamine enter the citric acid cycle via transaminases, so amino-oxyacetic acid (AOA), a transaminase inhibitor, shows potential as a cancer therapeutic. Additionally, preventing the regeneration of mitochondrial NAD+ may block the entry of glutamine carbon into the citric acid cycle; metformin, a biguanide-class drug, inhibits this regeneration.


Glutamine is used or studied in connection with:
  1. Wound Healing
  2. Inflammatory Bowel Disease
  3. Obesity
  4. Peritonitis
  5. Athletes
  6. Cancer
  7. etc.

Glutamic acid - Glu/ E


Structure The molecular formula of glutamic acid is C5H9NO4, and its molecular mass is 147.13 g/mol. Also known as glutamate, glutamic acid is a polar amino acid whose side chain carries a carboxylic acid group, which loses a proton at physiological pH to become a negatively charged carboxylate group. This side-chain carboxylic acid has a pKa of 4.3, making it slightly less acidic than the terminal α-carboxyl group and than the side chain of aspartic acid. The side-chain pKa of glutamic acid is higher than that of aspartic acid because of the inductive effect of the additional methylene group. In some proteins, a vitamin K-dependent carboxylase adds a second carboxyl group to the side chain of certain glutamic acid residues, producing γ-carboxyglutamic acid, which forms tight binding sites for calcium ions. Glutamic acid and α-ketoglutarate, an intermediate in the Krebs cycle, are interconvertible by transamination. Glutamic acid can therefore enter the Krebs cycle for energy metabolism, and it can be converted by the enzyme glutamine synthetase into glutamine, which is one of the key players in nitrogen metabolism.
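The Henderson-Hasselbalch equation makes the charge statement above concrete: the fraction of side chains in the carboxylate form is 1/(1 + 10^(pKa - pH)). This sketch uses the side-chain pKa of 4.3 quoted above and assumes a physiological pH of 7.4, a value the text does not give; it treats the side chain in isolation, ignoring effects of the surrounding protein. The function name is illustrative.

```python
def fraction_deprotonated(pH, pKa):
    """Henderson-Hasselbalch: fraction in the base (A-) form = 1 / (1 + 10**(pKa - pH))."""
    return 1.0 / (1.0 + 10 ** (pKa - pH))

GLU_SIDE_CHAIN_PKA = 4.3  # value quoted in the text
PHYSIOLOGICAL_PH = 7.4    # assumption of this sketch

at_pka = fraction_deprotonated(GLU_SIDE_CHAIN_PKA, GLU_SIDE_CHAIN_PKA)
at_physiological = fraction_deprotonated(PHYSIOLOGICAL_PH, GLU_SIDE_CHAIN_PKA)

print(f"pH = pKa: {at_pka:.0%} carboxylate")              # pH = pKa: 50% carboxylate
print(f"pH 7.4: {at_physiological:.2%} carboxylate")      # pH 7.4: 99.92% carboxylate
```

At pH equal to the pKa the side chain is half ionized; three pH units higher, essentially every side chain carries the negative charge.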

Function Glutamic acid is highly involved in metabolism. Transamination of α-ketoglutarate, a citric acid cycle intermediate, with alanine or aspartate gives glutamate plus pyruvate or oxaloacetate, respectively. The pyruvate and oxaloacetate formed by transamination play critical roles in cellular metabolism.

Glutamic acid is a non-essential amino acid. It plays an important role in DNA synthesis and assists in wound and ulcer healing. Glutamic acid acts as an excitatory neurotransmitter and takes part in the metabolism of sugars and fats. It helps potassium move across the blood-brain barrier and is a source of fuel for the brain. Glutamic acid can also combine with ammonia to form glutamine, a process that detoxifies ammonia in the body.

Glutamic acid has been used to correct personality disorders and to treat childhood behavioral disorders. It has also been used in treating epilepsy, mental retardation, muscular dystrophy, ulcers, and hypoglycemic coma.

Other minor uses include serving as a flavor enhancer, a GABA precursor, a nutrient, and a fertilizer for plants.

Synthesis Glutamic acid can be biosynthesized by several routes. The most common is the hydrolysis of glutamine to glutamic acid by the enzyme glutaminase, with ammonia as a side product. Hydrolysis of N-acetylglutamic acid also produces glutamic acid, along with acetate. α-Ketoglutaric acid is another common precursor: glutamate dehydrogenase converts it to glutamic acid using ammonia and NADPH, and transaminases convert it by transferring an amino group from another α-amino acid. Other routes start from 1-pyrroline-5-carboxylate (with NAD+ and water) and from N-formimino-L-glutamate (with tetrahydrofolate, FH4).

Glutamic acid is easily converted into proline. First, the γ carboxyl group is reduced to the aldehyde, yielding glutamate semialdehyde. The aldehyde then reacts with the α-amino group, eliminating water as it forms the Schiff base. In a second reduction step, the Schiff base is reduced, yielding proline.

Glycine - Gly/ G


Structure Glycine's molecular formula and mass are C2H5NO2 and 75.07 g/mol. The smallest of the 20 amino acids, glycine has only a hydrogen atom as its side chain. For this reason, it can fit into tight spaces in protein structures where no other amino acid could, and glycine residues at such positions are evolutionarily conserved. Most proteins contain small amounts of glycine, but collagen is one of the exceptions, containing 35% glycine. Thus, if glycine were removed from the amino acid chain of a protein, it could either alter the function of that protein or denature it entirely. Glycine is also the only achiral amino acid, since its R group is simply an H atom. In addition, it does not favor helix formation.
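Collagen's high glycine content follows from its sequence pattern: collagen chains are built largely from Gly-X-Y triplets (a well-known feature that the text does not state), so glycine occupies every third position. The sketch uses a hypothetical collagen-like segment of Gly-Pro-Hyp repeats, written with three-letter codes because hydroxyproline (Hyp) has no standard one-letter code; the helper names are illustrative.

```python
def glycine_fraction(residues):
    """Fraction of residues that are glycine."""
    return residues.count("Gly") / len(residues)

def has_gly_every_third(residues):
    """True if positions 0, 3, 6, ... are all glycine (an ideal Gly-X-Y repeat)."""
    return all(res == "Gly" for res in residues[::3])

# Hypothetical segment: ten Gly-Pro-Hyp triplets.
segment = ["Gly", "Pro", "Hyp"] * 10

print(f"{glycine_fraction(segment):.1%}", has_gly_every_third(segment))
# 33.3% True
```

An ideal repeat gives exactly one glycine in three residues, close to the collagen figure quoted above.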

Functions Glycine is a non-essential amino acid, meaning that humans can manufacture it in their bodies. It serves an important role in maintaining the central nervous and digestive systems. Glycine helps prevent the breakdown of muscle by increasing levels of creatine, a compound that helps build muscle mass. Glycine also keeps the skin firm and flexible; without glycine, the skin can be damaged by UV rays, oxidation, and free radicals.

Glycine regulates blood sugar levels and helps provide glucose for the body.

Glycine serves as an inhibitory neurotransmitter in the central nervous system, especially in the spinal cord. When glycine binds to its receptors, it opens chloride ion channels. As chloride ions flow into the cell through these channels, the membrane becomes hyperpolarized, producing an inhibitory postsynaptic potential (IPSP).
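Whether opening chloride channels hyperpolarizes a neuron depends on the chloride equilibrium potential given by the Nernst equation, E = (RT/zF) ln([out]/[in]): chloride flows inward, and the membrane hyperpolarizes, when E_Cl is more negative than the membrane potential. The sketch below assumes typical textbook values that the text does not state (110 mM extracellular and 5 mM intracellular chloride, body temperature of 310 K, and a resting potential of -70 mV); real neurons vary, and these numbers are illustrative only.

```python
import math

R = 8.314    # gas constant, J/(mol*K)
F = 96485.0  # Faraday constant, C/mol

def nernst_mV(z, conc_out, conc_in, temp_K=310.0):
    """Equilibrium potential in mV for an ion of charge z: (RT/zF) * ln(out/in)."""
    return 1000.0 * R * temp_K / (z * F) * math.log(conc_out / conc_in)

# Assumed textbook-style values, not taken from the text above.
CL_OUT_MM = 110.0
CL_IN_MM = 5.0
RESTING_MV = -70.0

e_cl = nernst_mV(z=-1, conc_out=CL_OUT_MM, conc_in=CL_IN_MM)
# Opening Cl- channels pulls the membrane toward E_Cl; below rest means hyperpolarization.
print(f"E_Cl = {e_cl:.1f} mV; hyperpolarizing: {e_cl < RESTING_MV}")
```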

Glycine has been used to treat schizophrenia, stroke, benign prostatic hyperplasia (BPH), and some rare inherited metabolic disorders. It is also used to protect the kidneys from the harmful side effects of certain drugs used after organ transplantation, and to protect the liver from the harmful effects of alcohol. Other uses include cancer prevention and memory enhancement.

Some people apply glycine directly to the skin to treat leg ulcers and heal other wounds. The body uses glycine to make proteins. Glycine is also involved in the transmission of chemical signals in the brain, so there is interest in trying it for schizophrenia and improving memory. Some researchers think glycine may have a role in cancer prevention because it seems to interfere with the blood supply needed by certain tumors.

Biosynthesis Glycine is derived from serine, which is itself made from 3-phosphoglycerate. The conversion of serine to glycine requires the enzyme serine hydroxymethyltransferase and the cofactor pyridoxal phosphate. The process can be simplified as the following reaction: serine + tetrahydrofolate → glycine + N5,N10-methylene tetrahydrofolate + water.
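The simplified serine hydroxymethyltransferase reaction can be checked for atom balance. This sketch assumes neutral formulas for tetrahydrofolate (C19H23N7O6) and N5,N10-methylene tetrahydrofolate (C20H23N7O6), which the text does not give, and shows that the one carbon removed from serine ends up on tetrahydrofolate. Helper names are illustrative.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C19H23N7O6' (no parentheses or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def is_balanced(reactants, products):
    """True if each element occurs equally often on both sides."""
    left = sum((parse_formula(f) for f in reactants), Counter())
    right = sum((parse_formula(f) for f in products), Counter())
    return left == right

SERINE = "C3H7NO3"
GLYCINE = "C2H5NO2"
THF = "C19H23N7O6"            # tetrahydrofolate (assumed neutral formula)
METHYLENE_THF = "C20H23N7O6"  # N5,N10-methylene tetrahydrofolate (assumed)
WATER = "H2O"

balanced = is_balanced([SERINE, THF], [GLYCINE, METHYLENE_THF, WATER])

# Net atom changes: serine loses CH2O; THF gains one C (its two N-H hydrogens are
# replaced by the CH2 bridge), and the remaining atoms leave as H2O.
lost_from_serine = parse_formula(SERINE) - parse_formula(GLYCINE)
gained_by_thf = parse_formula(METHYLENE_THF) - parse_formula(THF)

print(balanced, dict(lost_from_serine), dict(gained_by_thf))
```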

In the liver, glycine can also be formed by glycine synthase, which acts on N5,N10-methylene tetrahydrofolate. In this reaction, carbon dioxide, ammonium, NADH, and a proton combine with the methylene group of N5,N10-methylene tetrahydrofolate to form glycine, regenerating tetrahydrofolate.

Glycine is degraded by three pathways. The most common is the reverse of the previous reaction, in which glycine is cleaved and one of its carbons is transferred to tetrahydrofolate as N5,N10-methylene tetrahydrofolate. Another pathway converts glycine to serine, which serine dehydratase then converts to pyruvate. The last pathway converts glycine to glyoxylate by D-amino acid oxidase; the glyoxylate can then be oxidized to oxalate.

Histidine - His/ H


Structure Histidine, C6H9N3O2, is also called 2-amino-3-(1H-imidazol-4-yl)propanoic acid. Its molecular mass is 155.15 g/mol. It is a basic, polar amino acid with an imidazole side chain, an aromatic ring that can carry a positive charge and is hydrophilic. The imidazole group has a pKa of about 6, so it can be either uncharged or positively charged near neutral pH. This amino acid is often present in the active sites of enzymes, where the imidazole group acts as a buffer (proton acceptor or donor) in chemical reactions. Histidine is a precursor of histamine, a compound released by immune system cells during an allergic reaction.

Features At a physiological pH of around 7, the Henderson-Hasselbalch equation can be used to calculate the ratio of deprotonated to protonated imidazole side chains (pKa = 6). The histidine side chain turns out to be approximately 10% protonated at neutral pH. That is not a negligible amount, and it gives the histidine residue a certain buffering capacity. The unprotonated (basic) nitrogen of the imidazole ring can also act as a nucleophile.
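The 10% figure follows directly from the Henderson-Hasselbalch equation: with pKa = 6 and pH = 7, the ratio of deprotonated to protonated imidazole is 10^(7 - 6) = 10, so the protonated fraction is 1/11, about 9%. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def fraction_protonated(pH, pKa):
    """Henderson-Hasselbalch: [HA] / ([HA] + [A]) = 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))

HIS_IMIDAZOLE_PKA = 6.0  # value quoted in the text

protonated = fraction_protonated(7.0, HIS_IMIDAZOLE_PKA)
print(f"{protonated:.1%} of imidazole side chains protonated at pH 7")
# 9.1% of imidazole side chains protonated at pH 7
```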

Functions Histidine is found in high concentrations in hemoglobin; it has been used in the treatment of anemia and helps maintain optimal blood pH. Histidine is also the precursor of histamine, which is involved in local immune responses.

Histidine is an essential amino acid, which means that the body cannot manufacture it. Histidine plays important roles in stimulating the inflammatory response of the skin and mucous membranes. It also stimulates the secretion of the digestive hormone gastrin and acts as the source and control of histamine levels. Histidine is required for growth and for the repair of tissues, as well as for the maintenance of the myelin sheaths that protect nerve cells. It is also required to manufacture both red and white blood cells. Histidine helps protect the body from damage caused by radiation and helps remove heavy metals from the body. In the stomach, histidine helps in producing gastric juices, and people with a shortage of gastric juices or suffering from indigestion may benefit from this nutrient. Histidine is thought to be beneficial to people suffering from arthritis and nerve deafness, but this is not conclusively proven. Histidine has also been used for sexual arousal, functioning, and enjoyment.

Histidinemia is an inborn error of histidine metabolism caused by a deficiency of the enzyme histidase, in which high levels of histidine are found in the blood and urine; it may manifest as speech disorders and mental retardation. There are no reported side effects of histidine, but very high levels may lead to stress and mental disorders such as anxiety, and people with schizophrenia have been found to have high levels of histidine. People suffering from schizophrenia or bipolar (manic) depression should not take a histidine supplement without the approval of their medical professional.

Metabolism Histidine can be converted into histamine by histidine decarboxylase, which removes the carboxyl group of histidine as carbon dioxide.

Food sources Foods that contain histidine include dairy, meat, poultry, fish, rice, wheat, and rye.

Isoleucine - Ile/ I


Structure Isoleucine, HOOCCH(NH2)CH(CH3)CH2CH3, is also known as 2-amino-3-methylpentanoic acid and has a molar mass of 131.17 g/mol. Isoleucine is a nonpolar, aliphatic (hydrophobic) amino acid with two chiral centers: the α-carbon and the β-carbon of its side chain. Because it contains two stereocenters, isoleucine has diastereomers. If it weren't for the selectivity of living things for one particular stereoisomer, there would be 4 possible stereoisomers because of the 2 chiral centers. However, only one version persists in living organisms: the (2S,3S) form. Its hydrophobic side chain helps stabilize water-soluble proteins through the hydrophobic effect.
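The count of four stereoisomers is 2^n for n = 2 stereocenters. The sketch below enumerates the R/S combinations at C2 (the α-carbon) and C3 and classifies each against the natural (2S,3S) form: molecules differing at every center are enantiomers, and those differing at only some centers are diastereomers. The names for the non-natural forms (D-isoleucine, L- and D-allo-isoleucine) are standard nomenclature not given in the text, and the classification rule assumes a molecule with no internal symmetry, which holds for isoleucine.

```python
from itertools import product

CENTERS = ("2", "3")  # C2 = alpha-carbon, C3 = side-chain beta-carbon

NAMES = {
    ("S", "S"): "L-isoleucine (the form found in proteins)",
    ("R", "R"): "D-isoleucine",
    ("S", "R"): "L-allo-isoleucine",
    ("R", "S"): "D-allo-isoleucine",
}

def relationship(a, b):
    """Classify two configurations of a molecule with no internal symmetry."""
    differing = sum(x != y for x, y in zip(a, b))
    if differing == 0:
        return "identical"
    if differing == len(a):
        return "enantiomers"
    return "diastereomers"

configurations = list(product("RS", repeat=len(CENTERS)))  # 2**2 = 4
natural = ("S", "S")

for config in configurations:
    label = ",".join(pos + cfg for pos, cfg in zip(CENTERS, config))
    print(f"({label}) {NAMES[config]}: {relationship(natural, config)} relative to (2S,3S)")
```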

Features Isoleucine cannot be distinguished from leucine by mass spectrometry (MS) because the two have the same molecular formula and therefore the same molecular weight. Instead, these two residues usually have to be isolated and characterized by HPLC or TLC against known standards.

Isoleucine is also degraded into succinyl-CoA and acetyl-CoA, which are consumed by the TCA cycle.
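Because leucine and isoleucine are constitutional isomers with the same formula, C6H13NO2, their masses are identical. The sketch derives atom counts from condensed structural formulas (the isoleucine one is given in the text; HOOCCH(NH2)CH2CH(CH3)2 for leucine is an assumption consistent with its name, 2-amino-4-methylpentanoic acid) and computes the monoisotopic mass, the value a mass spectrometer measures for the most abundant isotopic species. The isotope masses are standard reference values supplied by this sketch, not by the text, and the parser handles only the simple notation used here.

```python
import re
from collections import Counter

# Monoisotopic masses of the most abundant isotopes (Da); reference values assumed here.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

def parse_condensed(formula):
    """Count atoms in a condensed formula with parenthesized groups, e.g. 'CH(CH3)2'.

    A number multiplies the element or group immediately before it.
    """
    stack = [Counter()]
    last = None  # atoms of the most recent element or closed group
    for token in re.findall(r"[A-Z][a-z]?|\d+|[()]", formula):
        if token == "(":
            stack.append(Counter())
            last = None
        elif token == ")":
            group = stack.pop()
            stack[-1].update(group)
            last = group
        elif token.isdigit():
            # The element or group was already counted once; add the remaining copies.
            for element, n in last.items():
                stack[-1][element] += n * (int(token) - 1)
            last = None
        else:
            stack[-1][token] += 1
            last = Counter({token: 1})
    return stack[0]

def monoisotopic_mass(counts):
    """Sum isotope masses weighted by atom counts."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())

ISOLEUCINE = "HOOCCH(NH2)CH(CH3)CH2CH3"  # as written in the text
LEUCINE = "HOOCCH(NH2)CH2CH(CH3)2"       # assumed condensed form

ile, leu = parse_condensed(ISOLEUCINE), parse_condensed(LEUCINE)
print(ile == leu, round(monoisotopic_mass(ile), 4), round(monoisotopic_mass(leu), 4))
```

Identical atom counts mean identical masses, which is why separation methods rather than MS alone are needed to tell the two residues apart.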

Functions Isoleucine is an essential amino acid, meaning the human body cannot manufacture it. It is needed for the formation of Hemoglobin and to regulate blood sugar and energy levels. Isoleucine serves important roles in muscle strength and endurance and is a source of energy for muscle tissues.

Isoleucine promotes muscle recovery after an intense workout. It is also involved in the formation of blood clots.

A deficiency of isoleucine may cause headaches, dizziness, fatigue, depression, confusion, and irritability, symptoms that may mimic those of hypoglycemia. This nutrient has also been found to be deficient in people with mental and physical disorders, but more research is required on this. Consuming higher amounts of isoleucine is not associated with any health risks for most people, but those with kidney or liver disease should not consume high amounts of amino acids without medical advice. People who take in higher amounts of isoleucine have reported increased urination. People involved in strenuous athletic activity under extreme pressure and at high altitude may benefit from supplementation with this nutrient.

Food sources Foods containing isoleucine include almonds, cashews, chicken, eggs, fish, lentils, liver, meat, etc.

Biosynthesis Synthesis of isoleucine involves multiple steps. Isoleucine can be derived from pyruvate and α-ketobutyrate. The required enzymes are:

  1. Acetolactate synthase
  2. Acetohydroxy acid isomeroreductase
  3. Dihydroxyacid dehydratase
  4. Valine aminotransferase

Industrially, isoleucine can be synthesized from 2-bromobutane and diethylmalonate.

Leucine - Leu/ L


Structure Leucine's molecular formula and mass are C6H13NO2 and 131.17 g/mol, respectively. Leucine, also known as 2-amino-4-methylpentanoic acid, has an aliphatic R group. It is one of the three amino acids with branched hydrocarbon side chains, which are generally buried in folded proteins, making it a nonpolar (hydrophobic) amino acid. The hydrophobic effect accounts for its role in stabilizing water-soluble proteins.

Features Leucine cannot be distinguished by MS from isoleucine for the simple fact that they have the same molecular weight. Instead, these two residues would usually have to be isolated and characterized by HPLC or TLC.

Functions Leucine shares the functions of isoleucine because of their similar branched hydrocarbon side chains. Leucine facilitates skin and bone healing by modulating the release of natural pain-reducers, the enkephalins. It is also a precursor of cholesterol, and it increases muscle tissue by slowing down its degradation. Leucine is an essential amino acid: it promotes growth in infants and regulates nitrogen balance in adults. Leucine is also used as a flavor enhancer.

Deficiency and Excess Deficiency of this amino acid can result in hypoinsulinemia, depression, chronic fatigue syndrome, kwashiorkor (a form of protein malnutrition), etc. Excess leucine leads to ketosis.

Biosynthesis As an essential amino acid, leucine cannot be synthesized in the human body and must be obtained from outside sources. In organisms that can make it, such as plants and microorganisms, synthesis starts from pyruvic acid and passes through α-ketoisovalerate (the branch point with valine synthesis), α-isopropylmalate, and α-ketoisocaproate. The enzymes required are acetolactate synthase, acetohydroxy acid isomeroreductase, dihydroxyacid dehydratase, isopropylmalate synthase, isopropylmalate isomerase, and leucine aminotransferase.

Lysine - Lys/ K


Structure Lysine is an essential amino acid: it is necessary for human health, but the body cannot produce it, so you have to get it from food or supplements. Like the other amino acids, lysine is a building block of proteins. Its side chain ends in an ε-amino group that is positively charged at physiological pH. The ε-amino group has a high pKa of about 10.8, making it more basic than the terminal α-amino group. This basic amino group is highly reactive and participates in reactions at the active centers of enzymes. Although the ε-amino group is charged under physiological conditions, the hydrocarbon portion of the side chain, with its four methylene groups, is still hydrophobic.

Features Lysine is a naturally occurring essential amino acid in human body. It promotes optimal growth of infants and nitrogen equilibrium in adults.

Functions Lysine can be a treatment of Herpes Simplex and virus-associated Chronic Fatigue Syndrome as it inhibits viral growth. It facilitates the formation of collagen, which is the main component of fascia, bone, ligament, tendons, cartilage and skin. It also helps in absorption of calcium, which is critical in bone growth of infants.

Lysine is important for proper growth, and it plays an essential role in the production of carnitine, a nutrient responsible for converting fatty acids into energy and helping to lower cholesterol. Lysine helps the body absorb calcium, and it plays an important role in the formation of collagen, a substance important for bones and connective tissues including skin, tendon, and cartilage.

Herpes Simplex Virus (HSV) Consuming lysine on a regular basis may help prevent outbreaks of cold sores and genital herpes. Lysine has antiviral effects because it blocks the activity of arginine, which promotes HSV replication. However, studies have found that taking lysine at the beginning of a herpes outbreak did not reduce symptoms.

Osteoporosis Lysine helps the body absorb calcium and decreases the amount of calcium lost in urine. Because calcium is essential for strong bones, some researchers have suggested that lysine may help prevent bone loss associated with osteoporosis. Studies show that lysine together with L-arginine makes bone-building cells more active and enhances the production of collagen. However, no studies have examined whether lysine helps prevent osteoporosis in humans.

Deficiency and Excess Deficiency of lysine is seen in Herpes, Chronic Fatigue Syndrome, AIDS, Anemia, hair loss, and weight loss, etc. Having excessive lysine can result in high concentration of ammonia in the blood. Most people get enough lysine in their diet, although athletes, vegans who do not eat beans, as well as burn patients may need more. Not enough lysine can cause fatigue, nausea, dizziness, loss of appetite, agitation, bloodshot eyes, slow growth, anemia, and reproductive disorders. For vegans, legumes such as beans, peas, and lentils are the best sources of lysine.

Food Sources Foods rich in lysine include meat, cheese, fish, nuts, eggs, soybeans, spirulina, and fenugreek seed. Brewer's yeast, beans and other legumes, and dairy products also contain lysine.

Methionine - Met/ M


Structure Methionine is one of the two amino acids with a sulfur-containing side chain. Its largely aliphatic side chain includes a thioether (-S-) group. Unlike the thiol of cysteine, the sulfur in methionine is part of a thioether linkage, so it does not form covalent disulfide bonds as cysteine does. The high susceptibility of methionine's sulfur to oxidation is one of the causes of smoking-induced emphysema in human lung tissue.

Features Methionine is a naturally occurring essential amino acid, which plays a critical role in supplying free methyl groups and sulfur in metabolism. It is also one of only two amino acids coded for by a single codon.

Functions Methionine helps the breakdown of fat and reduces blood cholesterol levels. It is an antioxidant that neutralizes free radicals and removes waste in the liver. Synthesis of DNA and RNA requires methionine, and it is a precursor of several critical amino acids, hormones, and neurotransmitters in the human body. Its codon, AUG, also serves as the "start" signal for ribosomal translation of messenger RNA (mRNA); this means that every newly synthesized peptide chain begins with a methionine residue at its N-terminus, although this residue may later be removed by cleavage.

Deficiency and Excess Methionine deficiency can be seen in people exposed to chemicals and in vegetarians. Excessive methionine can result in severe liver disease.
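The single-codon claim and the AUG start codon can both be checked against the standard genetic code. The sketch builds a codon table from the widely used 64-letter compact encoding of NCBI translation table 1 (codons ordered with the first base varying slowest, over the bases U, C, A, G) and counts how many codons specify each amino acid; stop codons are marked `*`. Variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count sense codons per amino acid (stop codons excluded).
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon = sorted(aa for aa, n in codons_per_aa.items() if n == 1)

print(single_codon)        # ['M', 'W']  (methionine and tryptophan)
print(CODON_TABLE["AUG"])  # M
```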

Phenylalanine - Phe/ F


Structure The amino acid phenylalanine is a derivative of alanine in which a phenyl group takes the place of one of the hydrogens on the CH3 group. Phenylalanine is more hydrophobic than the other aromatic amino acids, tyrosine and tryptophan, which are less hydrophobic because of their hydroxyl and indole substituents. Phenylalanine is often found buried in proteins because of its hydrophobicity. Neighboring phenyl rings (on adjacent amino acids) can stabilize each other by pi stacking.

Features Individual amino acids as well as peptides are sometimes analyzed with UV light. Phenylalanine, like the other aromatic amino acids, absorbs UV light and fluoresces when UV light is applied. UV spectroscopy can therefore be a useful technique for verifying the presence of Tyr, Phe, and Trp, and it can also quantify these amino acids if a sensitive enough assay is developed.
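For quantification, UV absorbance at 280 nm is commonly converted to concentration with the Beer-Lambert law, A = εcl. The sketch assumes the commonly tabulated molar absorption coefficients of about 5500 M^-1 cm^-1 for Trp and 1490 M^-1 cm^-1 for Tyr at 280 nm (values not given in the text), ignores phenylalanine because it absorbs only weakly and mainly near 257 nm, and uses a hypothetical peptide sequence and absorbance reading.

```python
# Assumed molar absorption coefficients at 280 nm (M^-1 cm^-1); not from the text.
EPSILON_280 = {"W": 5500.0, "Y": 1490.0}

def epsilon_280(sequence):
    """Sum per-residue coefficients; other residues (including F) count as 0 here."""
    return sum(EPSILON_280.get(residue, 0.0) for residue in sequence)

def concentration_molar(absorbance, sequence, path_cm=1.0):
    """Beer-Lambert law A = epsilon * c * l, solved for c (mol/L)."""
    eps = epsilon_280(sequence)
    if eps == 0:
        raise ValueError("sequence has no Trp or Tyr, so A280 cannot be used")
    return absorbance / (eps * path_cm)

peptide = "GWYF"  # hypothetical peptide: Gly-Trp-Tyr-Phe
c = concentration_molar(0.699, peptide)
print(f"epsilon = {epsilon_280(peptide):.0f} M^-1 cm^-1, c = {c * 1e6:.1f} uM")
# epsilon = 6990 M^-1 cm^-1, c = 100.0 uM
```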

Functions Phenylalanine is a precursor of the amino acid tyrosine, which gives rise to neurotransmitters such as dopamine, norepinephrine, and epinephrine. It can be used as an anti-depressant to manage certain types of depression and can also enhance memory, thought, and mood. This amino acid also plays a role in lowering blood pressure in hypertension. The D form of phenylalanine can be used to reduce pain in arthritis, a rare instance of a D-amino acid being used in this way. Phenylalanine is a naturally occurring amino acid that promotes growth in infants and regulates nitrogen balance in adults.

Deficiency and Excess Deficiency of phenylalanine can be seen in depression, AIDS, obesity, Parkinson's disease, etc. Some people have an autosomal recessive genetic disorder called phenylketonuria (PKU). This disorder is caused by the lack of the enzyme that breaks down phenylalanine (phenylalanine hydroxylase), which leads to a large accumulation of this amino acid; in large quantities phenylalanine is toxic, particularly to the brain, so the disorder can lead to mental retardation. As a result, newborns are blood tested early for signs of PKU, and those who have it must follow a strict diet that limits the natural phenylalanine in their food.

Proline - Pro/ P


Structure Proline is one of the twenty DNA-encoded amino acids. It is unique among them because its α-amino group is secondary rather than primary, as it is in the other amino acids. The cyclic structure of the proline side chain locks its φ backbone dihedral angle at approximately -75°, giving proline exceptional conformational rigidity compared to other amino acids. Hence, proline loses less conformational entropy upon folding, which may account for its higher prevalence in the proteins of thermophilic organisms. Proline is sometimes loosely called an imino acid, although strictly speaking it is a secondary amine. Its conformationally restricted ring structure gives it a strong influence on protein architecture.

Functions Proline acts as a structural disruptor in the middle of regular secondary structure elements. However, proline is commonly found as the first residue of an alpha helix and in the edge strands of beta sheets. Proline is most commonly found in turns, which may account for the curious fact that proline is usually solvent-exposed even though its side chain is completely aliphatic. Because the amide nitrogen of proline lacks a hydrogen, proline cannot act as a hydrogen bond donor, only as a hydrogen bond acceptor. Proline is important in healing, cartilage building, flexible joints, and muscle support. It also helps reduce the sagging, wrinkling, and aging of skin that result from sun exposure. Proline helps break down proteins and create healthy cells, and it is important for skin health, for the formation of healthy connective tissue, and for the maintenance of muscle tissue.

Deficiency and Excess Proline deficiency is most often seen in people who perform prolonged exercise. Vitamin C deficiency also causes proline to be lost in the urine because of collagen breakdown. When glucose levels are low, the body of a person with proline deficiency tends to break down muscle tissue before carbohydrates. Proline is needed to maintain proper collagen formation and to stabilize muscle tissue. A lack of proline can lead to symptoms such as fatigue, weight loss, dehydration, dizziness, and nausea.

Serine - Ser/ S


Structure The R group of serine is a hydroxyl group attached to a CH2 group. The polar hydroxyl group gives serine polar, hydrophilic properties. Serine has an isoelectric point (pI) of 5.68; its pKa values are 2.21 (α-carboxyl) and 9.15 (α-amino).
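For an amino acid whose side chain does not ionize in the usual pH range, such as serine, the pI is commonly approximated as the mean of the α-carboxyl and α-amino pKa values. A minimal Python sketch of that arithmetic, assuming this two-pKa approximation (the function name is illustrative, not from any library):

```python
def isoelectric_point(pka_carboxyl: float, pka_amino: float) -> float:
    """Approximate pI of an amino acid whose side chain does not ionize.

    At the pI the zwitterion dominates and the small cationic and anionic
    populations balance; with two ionizable groups this is the mean pKa.
    """
    return (pka_carboxyl + pka_amino) / 2


# Serine, using the pKa values given above.
serine_pi = isoelectric_point(2.21, 9.15)
print(f"Serine pI = {serine_pi:.2f}")  # prints "Serine pI = 5.68"
```

Amino acids with an ionizable side chain need the two pKa values that flank the neutral species instead, so this shortcut applies only to cases like serine.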

Features Serine is a non-essential amino acid which means it can be synthesized by the human body. For instance, serine can be synthesized from glycine. Serine is a precursor of glycine and cysteine.

Biosynthesis The biosynthesis of serine begins with the oxidation of 3-phosphoglycerate (an intermediate in glycolysis) to 3-phosphohydroxypyruvate, which is then transaminated to 3-phosphoserine. This last intermediate is then hydrolyzed to serine.

Function Serine is found in phospholipids and in the active sites of trypsin and chymotrypsin. It contributes to the synthesis of pyrimidines, proteins, cysteine, and tryptophan, and it is involved in fat and fatty acid formation and in muscle synthesis. Serine can be deaminated by serine dehydratase, yielding pyruvate and ammonium. The deamination of threonine follows a similar process.

Threonine - Thr/ T


Structure Threonine is a polar, uncharged amino acid. Its side chain contains a secondary alcohol and a methyl group, so it can be characterized as hydrophilic. Like isoleucine, threonine has two chiral centers, which would allow 4 possible stereoisomers. However, living organisms select for only one of them: the (2S,3R) form.
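The count of four follows from each chiral center independently taking the R or S configuration, giving 2^n combinations for n centers. This simple count assumes there are no meso forms, which threonine cannot have because its two centers carry different substituents. A short Python sketch that enumerates the combinations; the function and constant names are illustrative:

```python
from itertools import product


def enumerate_stereoisomers(centers):
    """Return every R/S assignment for the given chiral centers (2**n of them)."""
    return [dict(zip(centers, config)) for config in product("RS", repeat=len(centers))]


# Threonine's two stereocenters: C2 (the alpha carbon) and C3 (the beta carbon).
THREONINE_CENTERS = [2, 3]
NATURAL_FORM = {2: "S", 3: "R"}  # (2S,3R), the form found in living organisms

isomers = enumerate_stereoisomers(THREONINE_CENTERS)
for isomer in isomers:
    label = ",".join(f"{position}{config}" for position, config in isomer.items())
    note = "  <- L-threonine" if isomer == NATURAL_FORM else ""
    print(f"({label}){note}")
```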

Features Threonine is an essential amino acid, which means it cannot be synthesized by the human body. Humans must ingest it in the form of threonine-containing foods.

Functions Threonine aids the formation of elastin and collagen. In the immune system, threonine aids in the formation of antibodies. It also promotes the growth and function of the thymus gland and the absorption of nutrients. In addition, threonine is the precursor of isoleucine. Threonine can be deaminated by threonine dehydratase, yielding α-ketobutyrate and ammonium. The deamination of serine follows a similar process.

Tryptophan - Trp/ W


Structure Tryptophan is an aromatic amino acid whose side chain is an indole group bonded to a methylene (CH2) group. The indole group consists of two fused aromatic rings, a five-membered ring containing a nitrogen (N-H) and a six-membered benzene ring, which share two carbons. Although the indole N-H can form hydrogen bonds, tryptophan is largely hydrophobic.

Features Individual amino acids as well as peptides are occasionally analyzed by UV light. Tryptophan, along with the few other aromatic amino acids, fluoresces when UV light is applied. UV analysis can be a useful technique for verifying the presence of Tyr, Phe, and Trp. It can also quantify those amino acids if a sensitive enough assay is developed.
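The text describes fluorescence; in practice, Trp and Tyr are most often quantified by UV absorbance near 280 nm using the Beer-Lambert law (A = ε·c·l). A minimal Python sketch under that assumption, using commonly cited molar absorption coefficients (about 5500 M^-1 cm^-1 for Trp and 1490 M^-1 cm^-1 for Tyr) and hypothetical helper names; the peptide and absorbance reading are made up for illustration:

```python
# Approximate molar absorption coefficients at 280 nm (M^-1 cm^-1).
# Commonly cited literature values; treat them as assumptions.
EPSILON_280 = {"W": 5500, "Y": 1490}
# Phe absorbs only weakly at 280 nm (its maximum is near 257 nm), and the
# small cystine contribution is ignored here, so both are left out.


def estimate_epsilon_280(sequence: str) -> int:
    """Sum per-residue coefficients for a one-letter amino acid sequence."""
    return sum(EPSILON_280.get(residue, 0) for residue in sequence.upper())


def concentration_from_a280(a280: float, sequence: str, path_cm: float = 1.0) -> float:
    """Beer-Lambert law A = epsilon * c * l, solved for c in mol/L."""
    epsilon = estimate_epsilon_280(sequence)
    if epsilon == 0:
        raise ValueError("sequence has no Trp or Tyr, so A280 cannot quantify it")
    return a280 / (epsilon * path_cm)


# Hypothetical tetrapeptide Gly-Trp-Tyr-Gly, measured A280 = 0.699 in a 1 cm cuvette.
conc = concentration_from_a280(0.699, "GWYG")
print(f"epsilon = {estimate_epsilon_280('GWYG')} M^-1 cm^-1, c = {conc:.2e} M")
```

Because Trp dominates absorbance at this wavelength, a peptide lacking both Trp and Tyr cannot be measured this way, which the function reports as an error.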

Functions Tryptophan is the precursor of serotonin and niacin, and it is incorporated into peptides, enzymes, and structural proteins. It is an essential amino acid, meaning it cannot be produced by the human body.

Deficiency and Excess Excess tryptophan has been linked with eosinophilia-myalgia syndrome (EMS). Tryptophan deficiency can contribute to pellagra, a disease caused by niacin deficiency, because tryptophan is a precursor of niacin. With vitamin supplements, this disease is no longer as prominent. Its symptoms include dementia and psychiatric symptoms resembling schizophrenia. Hartnup disease is an autosomal recessive genetic disease in which a person cannot effectively absorb this amino acid from the digestive tract. Patients experience symptoms similar to those of pellagra, but slightly less severe, and generally develop red rashes that are aggravated by UV light from the sun. Further neurological damage can occur if the disease is not treated with vitamin supplementation.

Tyrosine - Tyr/ Y


Structure Tyrosine is a relatively nonpolar aromatic amino acid that contains a hydroxyl group attached to its aromatic ring. The hydroxyl group is particularly important because tyrosine residues in proteins can be phosphorylated by protein kinases. Tyrosine is a non-essential amino acid, meaning it can be synthesized in the body, where it is made from phenylalanine.

Features Individual amino acids as well as peptides are occasionally analyzed with UV light. Tyrosine, along with the other aromatic amino acids, fluoresces when UV light is applied. UV spectroscopy can therefore be used to verify the presence of Tyr, Phe, and Trp, and it can also quantify them if a sufficiently sensitive assay is developed.

Functions Tyrosine plays several roles in the human body. As an adaptogen, it helps minimize the effects of stress. It is used in drug detoxification, for example for cocaine, coffee, and nicotine addiction, where it reduces withdrawal symptoms and abuse. It assists in treating vitiligo (a loss of skin pigmentation) and phenylketonuria, the condition in which phenylalanine is not metabolized. In addition, it has been used in the treatment of depression.

Tyrosine is also important in the production of epinephrine, norepinephrine, dopamine, melanin, and enkephalins, which have pain-relieving effects in the body. It also affects hormone function by regulating the thyroid, pituitary, and adrenal glands; for example, the thyroid hormone thyroxine is synthesized from tyrosine. Tyrosine is also reported to have protective qualities because it can dislodge molecules that may be harmful to cells.

Deficiency and Excess Tyrosine deficiency can result in low blood pressure, depression, and low body temperature. Tyrosine is the major amino acid responsible for skin, hair, and eye pigments, so a lack of tyrosine may lead to a failure to form melanin pigments, resulting in partial or full albinism. Because tyrosine is produced mainly from phenylalanine, a block in this conversion lowers tyrosine levels while phenylalanine accumulates in the body.

Valine - Val/ V


Structure Valine is a nonpolar amino acid with an aliphatic isopropyl side chain, which makes it hydrophobic; it differs from threonine only in that threonine's OH group is replaced by a CH3 group. Being hydrophobic, valine is often found in the interior of proteins. It is an essential amino acid, so it cannot be produced by the human body.

Features In animals, valine must be ingested. In plants, it is synthesized in several steps starting from pyruvic acid; the early part of the pathway is shared with leucine biosynthesis, and the final intermediate, α-ketoisovalerate, undergoes reductive amination with glutamate. Valine is found in foods such as soy flour, fish, cheese, meat, and vegetables.

Functions Valine is essential in muscle growth and development, muscle metabolism, and maintenance of nitrogen balance in the human body. It can be used as an energy source in place of glucose. It can also be used as a treatment for brain damage caused by alcohol.

Deficiency and Excess Valine deficiency affects the myelin sheaths of nerves. Maple syrup urine disease is caused by an inability to metabolize the branched-chain amino acids leucine, valine, and isoleucine.

Ionization of amino acids

The 20 standard amino acids have two acid-base groups: the α-amino and α-carboxyl groups attached to the Cα atom. Amino acids with an ionizable side chain (Asp, Glu, Arg, Lys, His, Cys, Tyr) have an additional acid-base group. At low pH (i.e., high hydrogen ion concentration), both the amino group and the carboxyl group of glycine are fully protonated, so the amino acid is in the cationic form H3N+CH2COOH. As the amino acid in solution is titrated with increasing amounts of a strong base (e.g., NaOH), it loses two protons: first from the carboxyl group, which has the lower pK value, and then from the amino group, which has the higher pK value (pK = 9.6). The pH at which Gly has no net charge is termed its isoelectric point, pI. The α-carboxyl groups of all 20 standard amino acids have pK values in the range 1.8-2.9, while their α-amino groups have pK values in the range 8.8-10.8. The side chains of the acidic amino acids Asp and Glu have pK values of 3.9 and 4.1, respectively, whereas those of the basic amino acids Arg and Lys have pK values of 12.5 and 10.8, respectively. Only the side chain of His, with a pK value of 6.0, is ionized within the physiological pH range (6-8). It should be borne in mind that when amino acids are linked together in proteins, only the side-chain groups and the terminal α-amino and α-carboxyl groups are free to ionize.
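The glycine titration described above can be sketched with the Henderson-Hasselbalch equation. This Python sketch assumes glycine behaves as two independent ionizable groups, uses pK = 9.6 for the amino group from the text, and assumes pK = 2.3 for the carboxyl group (a typical value inside the 1.8-2.9 range quoted above); the function name is hypothetical:

```python
def glycine_net_charge(ph: float, pk_carboxyl: float = 2.3, pk_amino: float = 9.6) -> float:
    """Average net charge of glycine at a given pH (Henderson-Hasselbalch).

    +1 times the fraction of amino groups still protonated (-NH3+),
    -1 times the fraction of carboxyl groups deprotonated (-COO-).
    """
    amino_protonated = 1 / (1 + 10 ** (ph - pk_amino))
    carboxyl_deprotonated = 1 / (1 + 10 ** (pk_carboxyl - ph))
    return amino_protonated - carboxyl_deprotonated


# With two ionizable groups, the pI is the mean of the two pK values.
glycine_pi = (2.3 + 9.6) / 2

for ph in (1.0, glycine_pi, 7.0, 12.0):
    print(f"pH {ph:5.2f}: net charge {glycine_net_charge(ph):+.3f}")
```

The net charge falls from nearly +1 at pH 1, through zero at the pI (about 5.95 with these assumed values), to nearly -1 at pH 12; at pH 7 glycine is almost entirely the zwitterion.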

Pyridoxal 5′-Phosphate-Mediated Decarboxylation of an α-Amino Acid

Step 1: The amino acid reacts with enzyme-bound pyridoxal 5′-phosphate (PLP). An imine linkage (C=N) forms between the amino acid and PLP, and the enzyme's lysine residue is displaced.

Step 2: When the pyridine ring is protonated on nitrogen, it becomes a stronger electron-withdrawing group, and decarboxylation is facilitated by charge neutralization.

Step 3: Proton transfer to the α carbon and abstraction of a proton from the pyridine nitrogen bring about rearomatization of the pyridine ring.

Step 4: Reaction of the PLP-bound imine with the enzyme liberates the amine and restores the enzyme-bound coenzyme.


Berg, Jeremy, Tymoczko, J., Stryer, L. (2012). Protein Composition and Structure. Biochemistry (7th ed.). W.H. Freeman and Company. ISBN 1-4292-2936-5.

Berg, Jeremy M., ed. (2002), Biochemistry (6th ed.) New York City, NY: W.H. Freeman and Company,

Hames, David, Hooper, Nigel. Biochemistry, 3rd edition. Taylor and Francis Group. New York. 2005.

Wise, David R., and Craig B. Thompson. "Glutamine addiction: a new therapeutic target in cancer." Trends in Biochemical Sciences 35(8) (2010): 427-433. Retrieved 2010-10-16.

"Chemistry of Health", US Department of Health and Human Services, NIH Publications, reprinted 2006.


The total chemical synthesis of a D-enzyme was carried out by R. C. deL. Milton, S. C. F. Milton, and S. B. H. Kent, who found that the enzyme enantiomers exhibit reciprocal chiral specificity toward peptide substrates. L-amino acids predominate in living organisms, while the D-configuration remains biologically inactive; Milton et al. examined the ability of enzymes to distinguish and react with one enantiomer over the other.


The following properties of D-HIV PR and L-HIV PR were analyzed: covalent structure, physical properties, circular dichroism spectra, and enzymatic activity. After total synthesis, the initially protected L- and D-polypeptide chains of HIV PR were deprotected and allowed to fold into their secondary and tertiary structures. Reversed-phase high-performance liquid chromatography gave identical retention times for the two polypeptides, and mass spectrometry showed that they had the same molecular weight. Together, these methods showed that D-HIV PR and L-HIV PR have the same covalent structure. Despite this, the two differ in their chiral features: circular dichroism spectra showed the expected equal but opposite optical activity of the enantiomers. In a fluorogenic assay, a hexapeptide analog of a gag cleavage site was used as a substrate to test the reactivity of the enantiomers. Both enzymes were equally active, yet exhibited reciprocal chiral specificity: the L-enzyme cleaved only the L-substrate, and the D-enzyme cleaved only the D-substrate. The enzymes were further tested with the enantiomers of an inhibitor called MVT101. As expected, the effectiveness of the inhibitor depended on the chirality of the corresponding enzyme: L-MVT101 inhibited L-HIV PR but not D-HIV PR, and D-MVT101 inhibited D-HIV PR but not L-HIV PR. Folding of the polypeptide chains into the three-dimensional structure is important for the specificity and catalytic activity of HIV-1 protease, and D-HIV PR and L-HIV PR displayed mirror-image secondary, supersecondary, tertiary, and quaternary structures.
In the primary structure, each polypeptide chain was synthesized from amino acids of a single chirality (all-L for L-HIV PR, all-D for D-HIV PR); this change in the chirality of the polypeptide backbone resulted in mirror-image secondary, supersecondary, tertiary, and quaternary structures.


The results of this experiment show that both enantiomeric forms of the enzyme are catalytically active and would be expected to be active in vivo; nevertheless, through evolution L-proteins have come to prevail in living organisms, while D-proteins are biologically inactive.


Milton, R. C. deL., S. C. F. Milton, and S. B. H. Kent. "Total Chemical Synthesis of a D-Enzyme: The Enantiomers of HIV-1 Protease Show Reciprocal Chiral Substrate Specificity." Science 256 (1992): 1445-1448. Print.

Nitrogen fixation is a process in which N₂ is reduced to NH₃, either biologically or abiotically. The nitrogen in amino acids, purines, pyrimidines, and other molecules ultimately comes from atmospheric N₂. The term nitrogen fixation can also refer to the conversion of nitrogen into forms other than ammonia, such as nitrogen dioxide. The triple bond in N₂ is very strong, with a bond energy of 940 kJ/mol. Although forming ammonia from hydrogen and nitrogen is thermodynamically favorable, the reaction is very difficult kinetically because its intermediates are unstable. It has been estimated that approximately 60 percent of the newly fixed nitrogen on Earth is produced by diazotrophic microorganisms, while lightning and ultraviolet radiation contribute another 15 percent and industrial processes the remaining 25 percent.

Nitrogen Fixation

The main avenue for entry of nitrogen into the biosphere is nitrogen fixation, in which dinitrogen (nitrogen gas) is converted into ammonia. Fixation requires a great deal of energy: the triple bond of nitrogen gas is very stable, and breaking it to generate ammonia requires a series of reduction steps with a high energy input. Biologically, the conversion of nitrogen into ammonia is carried out by bacteria and archaea; the organisms responsible are called diazotrophic microorganisms. For example, the symbiotic Rhizobium bacteria, which are diazotrophs, infect the roots of leguminous plants and form root nodules in which they fix nitrogen. Other examples include cyanobacteria, Azotobacteraceae, and Frankia. Industrial approaches to nitrogen fixation include dinitrogen complexes and ambient nitrogen reduction, but the most common process is the Haber process, invented in 1910, which uses high pressure, high temperature, and an iron or ruthenium catalyst to produce ammonia. Biological nitrogen fixation is carried out by the nitrogenase complex, which is suited to the task because it has multiple redox centers. The complex contains two proteins: a reductase, which provides electrons, and nitrogenase, which uses these electrons to reduce nitrogen to ammonia. The transfer of electrons from the reductase to nitrogenase is coupled to ATP hydrolysis by the reductase. The reaction is N2 + 8 e- + 8 H+ → 2 NH3 + H2. This is an 8-electron rather than a 6-electron process because one molecule of H2 is generated along with the two molecules of ammonia.
Nitrogen-fixing microorganisms usually obtain the 8 electrons from reduced ferredoxin, which can be generated by photosynthesis or by oxidative processes. Two ATP molecules are hydrolyzed for each electron transferred, so 16 ATP are consumed per molecule of N2 reduced. The ATP hydrolysis does not make the reduction thermodynamically favorable, since the reaction is already favorable; rather, it makes the reaction kinetically possible. Nitrogen-fixing bacteria generally separate anaerobic nitrogen fixation from aerobic metabolism by one of several mechanisms. In oceans and freshwater systems, cyanobacteria are the major nitrogen fixers. Within an ecosystem, nitrogen fixers ultimately make reduced nitrogen available for assimilation by nonfixing microbes and plants. However, because nitrogen fixation is extremely energy intensive, the rate of fixation usually fails to meet the potential demand of the other members of the ecosystem.


Berg, Tymoczko, Stryer, Biochemistry Sixth Edition

Slonczewski, Joan L. Microbiology: An Evolving Science. Second Edition.


When there are unneeded amino acids from either protein digestion or turnover, they are broken down into certain compounds. This process usually occurs in the liver.

In amino acid degradation, the amino group is removed, converting the amino acid into an α-keto acid, whose carbon skeleton is then modified so that it can enter metabolism and eventually become glucose or an intermediate of the citric acid cycle.

Amino Acid Degradation

The amino group is transferred to α-ketoglutarate, forming glutamate. The glutamate is then oxidatively deaminated to form the ammonium ion NH4+.

Aminotransferases catalyze the transfer of an α-amino group from an α-amino acid to an α-keto acid. These enzymes funnel α-amino groups from a variety of amino acids to α-ketoglutarate for conversion to NH4+.

Aspartate aminotransferase catalyzes the transfer of the amino group of aspartate to α-ketoglutarate, yielding oxaloacetate and glutamate.

Alanine aminotransferase catalyzes the transfer of the amino group of alanine to α-ketoglutarate, yielding pyruvate and glutamate.

The nitrogen transferred to α-ketoglutarate in the transamination reaction (now in glutamate) is converted into an ammonium ion by oxidative deamination. This reaction is catalyzed by glutamate dehydrogenase, which is unusual in that it can use either NAD+ or NADP+. The reaction dehydrogenates the C-N bond and then hydrolyzes the resulting Schiff base to yield α-ketoglutarate.

The equilibrium of this reaction favors glutamate, but the reaction is driven forward by the consumption of ammonia. Glutamate dehydrogenase is located in the mitochondria; this compartmentalization sequesters free ammonia, which is toxic. In vertebrates, the activity of glutamate dehydrogenase is allosterically regulated.

NH4+ is converted into urea, which is then excreted as waste.
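The flow of nitrogen described in this section, transamination onto α-ketoglutarate followed by oxidative deamination of glutamate, can be traced with a small bookkeeping sketch. It is a hypothetical Python model that tracks only compound names, not chemistry, and includes just the two aminotransferases mentioned above:

```python
# Hypothetical lookup: alpha-keto acid formed when each amino acid
# transfers its amino group to alpha-ketoglutarate.
TRANSAMINATION_PRODUCTS = {
    "aspartate": "oxaloacetate",  # aspartate aminotransferase
    "alanine": "pyruvate",        # alanine aminotransferase
}


def transaminate(amino_acid: str) -> tuple[str, str]:
    """amino acid + alpha-ketoglutarate -> alpha-keto acid + glutamate."""
    return TRANSAMINATION_PRODUCTS[amino_acid], "glutamate"


def oxidative_deamination(carrier: str) -> tuple[str, str]:
    """Glutamate dehydrogenase (NAD+ or NADP+): glutamate -> alpha-ketoglutarate + NH4+."""
    if carrier != "glutamate":
        raise ValueError("glutamate dehydrogenase acts on glutamate")
    return "alpha-ketoglutarate", "NH4+"


def trace_nitrogen(amino_acid: str):
    """Follow the amino group to NH4+ (then urea) and report the carbon skeleton."""
    keto_acid, carrier = transaminate(amino_acid)
    regenerated, ammonium = oxidative_deamination(carrier)
    nitrogen_path = [amino_acid, carrier, ammonium, "urea"]
    return nitrogen_path, keto_acid, regenerated


for aa in ("aspartate", "alanine"):
    path, skeleton, regenerated = trace_nitrogen(aa)
    print(f"{aa}: N {' -> '.join(path)}; carbon skeleton {skeleton}; {regenerated} regenerated")
```

The model makes the cyclic role of α-ketoglutarate visible: it accepts the amino group, becomes glutamate, and is regenerated when glutamate dehydrogenase releases NH4+.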


To synthesize amino acids, there must be a source of nitrogen that is in a form that can be easily used. Various microorganisms reduce inert nitrogen gas into two molecules of ammonia to provide for this source of nitrogen. On the other hand, the carbon backbone can be provided in three different ways--these include the citric acid cycle, the glycolytic pathway, and the pentose phosphate pathway.

Since all amino acids except glycine are chiral, amino acid biosynthesis must efficiently generate the correct isomers. The correct chirality is set during transamination reactions, and the biosynthetic pathways are tightly regulated through feedback and other mechanisms.

Nitrogen Fixation

To reduce atmospheric nitrogen gas (N2) to ammonia (NH3), a process called nitrogen fixation, microorganisms require ATP. Nitrogen fixation is performed by the nitrogenase complex, an enzyme with multiple redox centers. The complex is composed of a reductase and a nitrogenase: the reductase provides electrons, and the nitrogenase uses these electrons to reduce atmospheric nitrogen to ammonia in the following reaction:

N2 + 8 e- + 8 H+ <--> 2 NH3 + H2

Most microorganisms capable of nitrogen fixation carry out this reaction using reduced ferredoxin, generated through photosynthesis, as the source of electrons. Two molecules of ATP are hydrolyzed to transfer each electron, so 2 x 8 = 16 ATP are needed to generate the two molecules of ammonia. The overall reaction can then be written as:
N2 + 8 e- + 8 H+ + 16 ATP + 16 H2O <--> 2 NH3 + H2 + 16 ADP + 16 Pi
Ammonium ion (NH4+) is then assimilated into amino acids through glutamate and glutamine.
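The stoichiometry above can be checked with a short Python sketch that counts atoms and charge on each side of N2 + 8 e- + 8 H+ → 2 NH3 + H2, then multiplies the electron count by two ATP per electron. It deliberately leaves out the ATP, ADP, and Pi terms, and the helper names are illustrative:

```python
from collections import Counter


def side_totals(species):
    """Sum element counts and net charge over (coefficient, composition, charge) entries."""
    atoms = Counter()
    charge = 0
    for coefficient, composition, unit_charge in species:
        for element, count in composition.items():
            atoms[element] += coefficient * count
        charge += coefficient * unit_charge
    return atoms, charge


ELECTRONS = 8
ATP_PER_ELECTRON = 2  # two ATP hydrolyzed per electron transferred

reactants = [
    (1, {"N": 2}, 0),          # N2
    (ELECTRONS, {}, -1),       # e-
    (8, {"H": 1}, +1),         # H+
]
products = [
    (2, {"N": 1, "H": 3}, 0),  # NH3
    (1, {"H": 2}, 0),          # H2, formed alongside the ammonia
]

left_atoms, left_charge = side_totals(reactants)
right_atoms, right_charge = side_totals(products)
atp_per_n2 = ELECTRONS * ATP_PER_ELECTRON

print("atoms balanced:", left_atoms == right_atoms)
print("charge balanced:", left_charge == right_charge)
print("ATP per N2:", atp_per_n2)
```

Six of the eight electrons end up in the two NH3 molecules and the remaining two in H2, which is why the reaction needs 8 rather than 6 electrons.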


Of the 20 amino acids, humans can synthesize 11. These amino acids are referred to as nonessential amino acids. The remaining 9 amino acids are referred to as essential amino acids, and they must be provided in the diet. Synthesizing the 11 nonessential amino acids requires different intermediates, but they share one feature: their carbon skeletons come from intermediates of the glycolytic pathway, the citric acid cycle, and the pentose phosphate pathway. Also, in all of these amino acids, the same step ensures the correct chirality: in a transamination reaction, a quinonoid intermediate is protonated to form an external aldimine, and the direction from which the proton is added dictates the amino acid's chirality.

Regulation by Feedback

The rate of amino acid biosynthesis depends on the amount of enzymes present and the activity of those enzymes. However, there are other ways of regulating the biosynthesis of amino acids.

Feedback Inhibition

The first irreversible reaction in the biosynthesis of an amino acid is referred to as the committed step. Amino acid synthesis is regulated by negative feedback: the final product inhibits the enzyme that catalyzes the committed step. A variety of different feedback mechanisms regulate these synthetic pathways.
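A toy model can make the effect of end-product inhibition of the committed step concrete. The Python sketch below assumes a simple hyperbolic inhibition law, v = Vmax / (1 + [P]/Ki), first-order consumption of the product, and arbitrary parameter values of 1; none of these come from the text, and the names are hypothetical:

```python
VMAX = 1.0   # hypothetical maximal rate of the committed-step enzyme
KI = 1.0     # hypothetical inhibition constant for the end product
K_USE = 1.0  # hypothetical first-order rate at which the cell consumes the product


def committed_step_rate(product: float, feedback: bool = True) -> float:
    """Rate of the committed step; the end product inhibits it when feedback is on."""
    if not feedback:
        return VMAX
    return VMAX / (1 + product / KI)


def simulate(feedback: bool = True, steps: int = 2000, dt: float = 0.01) -> float:
    """Euler integration of d[P]/dt = rate([P]) - K_USE * [P], starting from [P] = 0."""
    product = 0.0
    for _ in range(steps):
        product += dt * (committed_step_rate(product, feedback) - K_USE * product)
    return product


with_feedback = simulate(feedback=True)
without_feedback = simulate(feedback=False)
# With all parameters equal to 1, the feedback steady state solves P**2 + P - 1 = 0.
analytic = (5 ** 0.5 - 1) / 2
print(f"with feedback: {with_feedback:.3f} (analytic {analytic:.3f})")
print(f"without feedback: {without_feedback:.3f}")
```

With feedback, the end product settles at about 0.618 (the positive root of P^2 + P - 1 = 0) instead of 1.0 without feedback, showing how the product throttles its own synthesis.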

Branched Pathways

Branched pathways are more complex in that they involve more sophisticated regulation. They can involve both positive and negative feedback. In other words, reactions have both feedback inhibition and feedback activation. An example of this is the enzyme threonine deaminase. This enzyme converts threonine to alpha-ketobutyrate, and valine activates this process, while isoleucine inhibits it.

Branched pathways may also involve enzyme multiplicity, a phenomenon in which multiple enzymes catalyze the same reaction. These enzymes may have different activities and different regulatory mechanisms. Lastly, in cumulative feedback inhibition, several end products can each inhibit a single enzyme. Even if the enzyme is saturated with one inhibitor, the other inhibitors can still reduce its activity further. An example of this is the cumulative feedback inhibition of glutamine synthetase in E. coli.
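In cumulative inhibition, each inhibitor is commonly modeled as acting independently, so the fractions of activity that each one leaves behind multiply. A Python sketch under that assumption; the four fractional-inhibition values are illustrative, not measured data, and the inhibitor labels are placeholders:

```python
from math import prod


def cumulative_residual_activity(fractional_inhibitions):
    """Residual enzyme activity when inhibitors act independently.

    Each inhibitor leaves (1 - f) of whatever activity remains, so the
    residual fractions multiply instead of the inhibitions simply adding.
    """
    return prod(1 - f for f in fractional_inhibitions)


# Illustrative partial inhibitions (assumed values) by four end products.
inhibitions = {"end product A": 0.16, "end product B": 0.14,
               "end product C": 0.13, "end product D": 0.41}

residual = cumulative_residual_activity(inhibitions.values())
additive_guess = sum(inhibitions.values())
print(f"residual activity: {residual:.2f}")
print(f"cumulative inhibition: {1 - residual:.2f} (naive sum would be {additive_guess:.2f})")
```

Because each inhibitor acts on the activity that remains, the combined inhibition (about 63 percent here) is smaller than the simple sum, yet every additional inhibitor still lowers activity.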

An enzymatic cascade is another form of regulation in branched pathways: a series of successive enzyme-catalyzed steps that follows an initiating signal. The advantages of a cascade are that it can amplify signals and greatly increase the potential for allosteric control, because each enzyme in the sequence can be regulated separately, so the process as a whole is subject to all of these regulations. This allows nitrogen to be accrued in the cell more efficiently.

So What?

Why is the biosynthesis of amino acids important? Amino acids are not only the basic building blocks of all peptides and proteins; a wide variety of other biomolecules are also derived from them. Examples include the purine and pyrimidine bases in DNA and RNA, the vasodilator histamine, and the hormones thyroxine and epinephrine, to name a few. Amino acids are also part of other compounds in the body, such as buffers, antioxidants, and enzymes. Another molecule formed from amino acids is nitric oxide (NO), which is derived from arginine and serves as a messenger in signal transduction.

Because amino acids are involved in the synthesis of so many proteins and other compounds in the body, a lack of them has consequences. Various inherited disorders can result from a lack of a certain amino acid or of a compound derived from amino acids. An example is the porphyrias, disorders that may be inherited or acquired during one's lifetime and that are due to deficiencies of heme pathway enzymes.

Source: Berg, Jeremy and Stryer, Lubert. Biochemistry: Fifth Edition. United States of America: W.H. Freeman and Company, 2002.


The formation of amyloid fibrils, protofibrils, and oligomers from β-amyloid peptides has been crucial for research on Alzheimer's disease. However, determining the structures of these aggregates has been a struggle. In the past five years, new data obtained by electron cryo-microscopy and NMR have enhanced scientists' understanding of Aβ aggregation and have opened new ways to assess the relevance of specific conformers to neurodegenerative pathologies.

Structural diversity of β-amyloid aggregates

The β-amyloid (Aβ) peptide occurs in the human brain as a proteolytic fragment of the amyloid precursor protein. It is amphiphilic, with a hydrophilic N terminus and a hydrophobic C terminus. The two most studied Aβ alloforms are Aβ(1-40) and Aβ(1-42), which contain 40 and 42 residues, respectively. More than 10 single-site sequence variants have been connected to forms of Alzheimer's disease. These alloforms are important because Aβ amyloid fibrils form the core of the amyloid plaques in the brain parenchyma, which are correlated with Alzheimer's disease. Scientists have tried to determine the structures of these fibrils, but they are difficult to isolate and purify in the laboratory. As a result, there is little reliable structural information about Aβ amyloid fibrils, which is a challenge for scientists who need this information to understand their biological properties.

Cross-β structure of Aβ amyloid fibrils

Amyloid fibrils are fibrillar polypeptide aggregates with a cross-β structure. In a cross-β structure, the β-sheet planes and the backbone hydrogen bonds connecting the β-strands run parallel to the fibril axis, while the β-strands run perpendicular to it. Further study of these structures showed that these peptides form steric zippers. A steric zipper is composed of a pair of cross-β sheets with interdigitating side chains. Steric zippers are formed by many short peptide chains, such as Aβ residues 37-42 or 35-40, and their structure resembles that of the spine of amyloid fibrils.

General topology and polymorphism of mature amyloid fibrils

TEM (transmission electron microscopy) and atomic force microscopy have shown that mature amyloid fibrils can be longer than 1 µm, whereas previously analyzed fibrils were thought to have a length of about 25 nm. Mature Aβ amyloid fibrils consist of one or more protofilaments, which form the substructures of the mature fibrils; TEM shows that these fibrils are polar and have a left-handed twist. Studying these structures reveals a notable feature, the structural polymorphism of amyloid fibrils, that is, variability in the peptide conformation of fibrils. 3D reconstructions of polymorphic amyloid fibrils have revealed that fibrils differ in:

(i) number of protofilaments
(ii) different internal protofilament substructures
(iii) relative protofilament orientation

In addition to structural polymorphism between samples (inter-sample polymorphism), single-particle analyses of Aβ fibril samples have shown substantial polymorphism within a sample (intra-sample polymorphism). For example, analysis of Aβ(1-40) fibrils formed in 50 mM sodium borate at pH 9 revealed fibril widths ranging from 13 to 29 nm, although most fibrils showed crossover distances of 100 to 200 nm. A wide range of morphologies is seen, especially when fibrils are grown in buffer systems containing sodium or potassium chloride.

Structural deformations report on the nanoscale flexibility properties of amyloid fibrils

Structural deformation is another source of heterogeneity in amyloid samples besides polymorphism. Fibrils can bend and twist, and although these deformations create potential problems for structural analysis, they can be used to understand the nanoscale mechanical properties of amyloid fibrils.

Structural methods for studying amyloid fibrils

Atomic structures of full-length Aβ fibrils have not been found because:

No fibril has yet been found to form crystals suitable for X-ray crystallography.
The fibrils are too large for NMR techniques.

However, solid-state NMR and cryo-EM may be able to determine the structure of Aβ amyloid fibrils at atomic resolution.
Solid-state NMR can determine structural constraints such as chemical shift values, bond angles, and specific interatomic distances. This allows identification of the Aβ residues that form the β-sheet structure of fibrils.
Cryo-EM can visualize the fibrils and calculate their 3D density. Because individual fibrils are observed, specific fibril morphologies can be distinguished.

Protofilament structure of mature Aβ fibrils

The protofilament substructure of an Aβ fibril has been determined by cryo-EM. The protofilaments have cross-sectional dimensions of 4 x 11 nm, with a central region of quasi twofold symmetry (4 x 5 nm) flanked by two peripheral regions. The Aβ(1-40) fibril contains two protofilaments, and the Aβ(1-42) fibril contains only one. The single protofilament in the Aβ(1-42) fibril has two equally shaped peripheral regions that are fully solvent-exposed and structurally disordered. In contrast, the two-protofilament Aβ(1-40) fibril has an arch-shaped peripheral region at the protofilament-protofilament interface, while its other peripheral region is solvent-exposed and structurally disordered.

Structural comparison of Aβ(1-40) and Aβ(1-42)

The Aβ(1-42) peptide is more pathogenic than the Aβ(1-40) peptide. For example, when expressed in Drosophila melanogaster, Aβ(1-42) is highly toxic and halves the life span of the animal, whereas Aβ(1-40) does not produce a discernible phenotype. Despite this difference, the chemical properties of the two peptides are very similar (their first 40 residues are identical), which leads to similar conformational properties. The differences include the two additional C-terminal residues of Aβ(1-42) and its higher aggregation propensity. In addition, Aβ(1-40) can affect the aggregation mechanism of Aβ(1-42) and thereby prevent the formation of mature Aβ(1-42) fibrils.
Cryo-EM of the two peptides reveals differences in their protofilament packing: Aβ(1-42) fibrils have been reported with either a single-protofilament arrangement or a two-protofilament arrangement with a hollow core. Overall, however, the protofilaments of the two fibril types are very similar. For example, they show the same mass-per-length (MPL) values, similar cross-sectional areas and shapes, and a similar division of the protofilament cross-section into one central and two peripheral regions, which indicates similar peptide folding. IR and NMR data also indicate that both fibrils have a parallel β-sheet structure.


Fändrich, Marcus, Matthias Schmidt, and Nikolaus Grigorieff. "Recent Progress in Understanding Alzheimer's β-Amyloid Structures." Trends in Biochemical Sciences 36.6 (2011): 338-345. Academic Search Complete. Web. 21 Nov. 2012.

General Information

Proteins are important organic compounds that serve as structural elements, transport channels, signal receptors and transmitters, and catalysts; they are the most versatile macromolecules found in living organisms. A protein is made up of one or more polypeptides, each built from combinations of the 20 different amino acid subunits. These polypeptides are linear polymer chains of amino acids joined by peptide bonds, which form between the carboxyl and amino groups of adjacent amino acid residues. Each amino acid has its own size, shape, and set of properties, and proteins contain 50 to 2,000 amino acids connected end-to-end in many different combinations (Structures of Life 3). Because so many different structures and shapes are possible, proteins can have many different functions and roles in the body. One notable characteristic of proteins is that only the L isomers of amino acids are used to build them; there is no established explanation for why this is so. Proteins carry many different reactive functional groups that help define their properties and functions. Proteins perform a number of important functions, ranging from acting as very rigid structural elements to transmitting information between cells. In addition, proteins interact with each other and with other macromolecules to form complex assemblies. Proteins fold into secondary, tertiary, and quaternary structures based on the intramolecular bonding between functional groups, and they can take on a variety of three-dimensional shapes depending on the amino acid sequence.

One example of a protein is collagen, a fibrous structural protein that is the most abundant protein in animals. Collagen forms a triple helix: three polypeptide chains held together by hydrogen bonds, much as the two strands of DNA's double helix are held together by hydrogen bonds. This structure of collagen was determined by X-ray crystallography. There are several important properties that enable proteins to perform a variety of crucial functions.

1. LINEAR POLYMERS: Proteins are built out of monomer units (amino acids). Based on the sequence of amino acids, proteins spontaneously fold up into three-dimensional structures.

2. CONTAIN A WIDE RANGE OF FUNCTIONAL GROUPS: Proteins contain functional groups such as alcohols, thiols, thioethers, carboxylic acids, etc. These functional groups are key to the variety of functions the protein can perform.

3. PROTEIN INTERACTION FOR COMPLEX ASSEMBLIES: Within complex assemblies, proteins act synergistically in order to achieve a specific function.

4. STRUCTURE: Proteins vary in flexibility. Rigid units of a protein can function as structural elements in the cytoskeleton of cells or in connective tissue. Protein structure is divided into four categories and is a crucial element in the specificity of protein function.

Protein structures are usually portrayed in 3D and are described at four different levels:

A picture of the primary structure of a protein.

Primary: The primary structure of a polypeptide is its amino acid sequence, from beginning to end. The primary structure of a polypeptide is determined by the gene that encodes it. Genes carry the information to make polypeptides with a defined amino acid sequence. An average polypeptide is about 300 amino acids long, and some genes encode polypeptides that are a few thousand amino acids long.

Secondary: The amino acid sequence of a polypeptide, together with the laws of chemistry and physics, causes a polypeptide to fold into a more compact structure. The polypeptide backbone can rotate around certain bonds, which is why proteins are flexible and can fold into a number of shapes. Folding can be irregular, or certain regions can adopt a repeating folding pattern. Such repeating patterns are called secondary structures. The two main types are the α-helix and the β-sheet. In an α-helix, the polypeptide backbone forms a repeating helical structure that is stabilized by hydrogen bonds. These hydrogen bonds occur at regular intervals and cause the polypeptide backbone to form a helix. In a β-sheet, regions of the polypeptide backbone lie side by side, either parallel or antiparallel to each other. When these regions form hydrogen bonds, the polypeptide backbone forms a repeating zigzag shape called a β-sheet.

One type of secondary structure, an alpha helix.
Another type of secondary structure, a beta sheet.

Tertiary: As the secondary structure becomes established due to the primary structure, a polypeptide folds and refolds upon itself to assume a complex three-dimensional shape called the protein tertiary structure. The tertiary structure is the three-dimensional shape of a single polypeptide. For some proteins, such as ribonuclease, the tertiary structure is the final structure of a functional protein. Other proteins are composed of two or more polypeptides and adopt a quaternary structure.

Quaternary: Most functional proteins are composed of two or more polypeptides that each adopt a tertiary structure and then assemble with each other. The individual polypeptides are called protein subunits. Subunits can be identical or different polypeptides. Proteins that consist of more than one polypeptide chain are said to have quaternary structure and are also known as multimeric proteins, meaning "many parts." The subunits bind in a specific arrangement through interactions such as hydrogen bonds, salt bridges, and disulfide bonds. The two major structural categories of proteins are fibrous and globular. An example of a fibrous protein is keratin, which is found in wool, hair, fur, and nails; other examples include myosin and actin in muscle and fibrinogen in blood clotting. Examples of globular proteins include insulin, hemoglobin, and most enzymes.

A picture of hemoglobin, one of the best-known examples of protein quaternary structure.

Factors that influence protein structure

Several factors determine the way that polypeptides adopt their secondary, tertiary and quaternary structures. The amino acid sequences of polypeptides are the defining features that distinguish the structure of one protein from another. As polypeptides are synthesized in a cell, they fold into secondary and tertiary structures, which assemble into quaternary structures for most proteins. As mentioned, the laws of chemistry and physics, together with amino acid sequence, govern this process. Five factors are critical for protein folding and stability:

1. Hydrogen Bonds

2. Ionic bonds and other polar interactions

3. Hydrophobic Interactions

4. Van der Waals forces

5. Disulfide bridges

Protein Recognition

Protein functions such as molecular recognition and catalysis depend on binding sites that are complementary to the molecules they bind. They also depend on specialized microenvironments that result from the protein's tertiary structure, and these microenvironments at binding sites contribute to catalysis. Binding sites have a diverse distribution of charges that allows substrates to bind.

Protein Denaturing

As temperature rises, a protein starts to denature.

When heated, proteins begin to denature. Denaturation disrupts the tertiary and secondary structures. It can make the protein inactive and may even cause the cell to stop functioning and die.

Heat denatures proteins because it causes rapid molecular vibrations that disrupt the noncovalent bonds and interactions holding the folded structure together.

Heat affects the tertiary and secondary structures. The primary structure of a protein is held together by covalent peptide bonds, and heat is not strong enough to break these bonds, so heat does not affect the primary structure.

Protein Hormones

Leptin and Insulin

Obesity is accompanied by hyperphagia (overeating) and by elevated levels of both insulin and leptin, even though leptin is expected to inhibit feeding, lower insulin levels, and suppress insulin production. One explanation is that hyperphagia may itself cause leptin resistance, which could in turn be related to insulin resistance. Leptin is a strong modulator of biochemical pathways and metabolic fluxes, and it causes a redistribution of glucose fluxes. Research suggests that increased leptin secretion early in a period of overeating may be correlated with obesity and glucose intolerance. Overfeeding decreases the rate of glucose infusion needed to maintain normal glucose levels: after 7 days of overeating, the rate of glucose uptake was decreased, showing that carbohydrate handling was drastically altered. Overfeeding also markedly reduced insulin's inhibition of glucose production. Voluntary overfeeding decreases the extent to which leptin affects food consumption. This was shown in an experiment in which leptin was injected into both overfed rats and a control group: the overfed rats did not respond to leptin, so their food intake did not decrease, whereas in the control group leptin functioned as expected and inhibited food intake. The increase in body mass caused by increased food intake may contribute to insulin resistance, as may the early increase in glucose production during hyperphagia. These results suggest that increased food consumption contributes to impairment of the leptin system and to a decreased action of insulin on carbohydrate metabolism.


Shoulders, Matthew D., and Ronald T. Raines. "Collagen Structure and Stability."
"Quaternary Protein." Elmhurst College: Elmhurst, Illinois. Web. 12 Nov. 2011.

Here is a summary of the primary structure of a protein:

I. Primary Structure:
1. It is a sequence of amino acids.
2. It is a linear polymer, formed by linking the alpha-carboxyl group of one amino acid to the alpha-amino group of another amino acid => PEPTIDE BOND (covalent bond).
3. In some proteins, the linear polypeptide chain is cross-linked by disulfide bonds.

The primary structure is a polypeptide, in which:

     + each amino acid in the peptide is a residue. 
     + there is a regularly repeating segment called the main chain or backbone, and a variable part consisting of the side chains.

Primary Structure

The primary structure of a protein is a linear polymer made of a series of amino acids. These amino acids are connected by C-N bonds, also known as peptide bonds. Each peptide bond forms with the release of one water molecule: the α-carboxyl group of one amino acid loses a hydroxyl group (-OH), and the α-amino group of the next amino acid loses a hydrogen atom. A polypeptide, or polypeptide chain, is therefore a chain of many amino acids joined by peptide bonds. Each amino acid unit in a polypeptide chain is commonly known as a residue. Each peptide unit of the backbone is planar, because the peptide bond has partial double-bond character arising from resonance between the carbonyl group and the nitrogen. The primary structure of each protein is specified by its gene. Because of this partial double-bond character, the peptide C-N bond is shorter than a typical C-N single bond, is stable, and cannot rotate freely. The restricted rotation means the peptide bond is almost always in the trans configuration, which keeps the R groups of adjacent residues apart and avoids steric clash.
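The loss of one water molecule per peptide bond can be made concrete with a small calculation. The Python sketch below is illustrative only: it assumes standard average atomic masses, includes just six amino acids, and the names `FORMULAS` and `peptide_mass` are hypothetical. It sums the masses of the free amino acids and subtracts one water for each peptide bond formed.

```python
# Average atomic masses in g/mol (standard values assumed for this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# Elemental formulas of a few free amino acids, keyed by one-letter code.
# Only a subset of the 20 amino acids is included to keep the example short.
FORMULAS = {
    "G": {"C": 2, "H": 5, "N": 1, "O": 2},          # glycine, C2H5NO2
    "A": {"C": 3, "H": 7, "N": 1, "O": 2},          # alanine, C3H7NO2
    "S": {"C": 3, "H": 7, "N": 1, "O": 3},          # serine, C3H7NO3
    "C": {"C": 3, "H": 7, "N": 1, "O": 2, "S": 1},  # cysteine, C3H7NO2S
    "V": {"C": 5, "H": 11, "N": 1, "O": 2},         # valine, C5H11NO2
    "L": {"C": 6, "H": 13, "N": 1, "O": 2},         # leucine, C6H13NO2
}

# Mass of the water molecule released when each peptide bond forms.
WATER = 2 * ATOMIC_MASS["H"] + ATOMIC_MASS["O"]


def formula_mass(formula):
    """Sum of atomic masses for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def peptide_mass(sequence):
    """Average mass of a linear peptide: free amino acids minus one water per peptide bond."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    free_amino_acids = sum(formula_mass(FORMULAS[aa]) for aa in sequence)
    peptide_bonds = len(sequence) - 1  # n residues are joined by n - 1 peptide bonds
    return free_amino_acids - peptide_bonds * WATER


print(round(peptide_mass("GA"), 2))  # glycylalanine (Gly-Ala)
```

Run as written, this prints 146.15, that is, 75.07 (glycine) plus 89.09 (alanine) minus 18.02 (one water).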

Amino acids are linked by peptide bonds to form a polypeptide chain; each amino acid unit is known as a residue. The regularly repeating part of the chain is known as the main chain or backbone, and the variable R groups are the side chains.

Forces that stabilize Protein Structure

Protein structures are governed primarily by hydrophobic effects and, to a lesser extent, by interactions between polar residues and other types of bonds. The hydrophobic effect is the major determinant of native protein structure. The aggregation of nonpolar side chains in the interior of a protein is favored by the increase in entropy of the water molecules that would otherwise form cages around the hydrophobic groups. The positions of hydrophobic side chains therefore give a good indication of which portions of a polypeptide chain are inside the protein, out of contact with the aqueous solvent. Hydrogen bonding is a central feature of protein structure but makes only a minor contribution to protein stability. Hydrogen bonds fine-tune the tertiary structure by selecting the unique structure of a protein from among a relatively small number of hydrophobically stabilized conformations. Disulfide bonds can form within and between polypeptide chains as proteins fold to their native conformation. Metal ions may also function to internally cross-link proteins.

Factors that cause denaturing


1) Temperature

2) pH

Extreme temperatures cause a polypeptide chain to unfold, leading to a change in structure and often a loss of function. If the protein is an enzyme, denaturation will cause it to lose its enzymatic activity. As the temperature of a protein solution is raised, the extra heat causes increased twisting and bending of bonds. As the protein denatures, its secondary structure is lost and the chain adopts a random coil configuration. Covalent interactions between amino acid side chains, such as disulfide bonds, may also be lost.

At high or low pH, a protein will denature because its ionizable groups lose or gain protons and therefore lose their charge or become charged, depending on the direction and size of the pH change. This eliminates many of the ionic interactions needed to maintain the folded shape of the protein, and the resulting change in structure causes a change in or loss of function.
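How the charge of an ionizable group depends on pH follows the Henderson-Hasselbalch relation, which the paragraph above relies on without stating. The Python sketch below computes the fraction of a group in its deprotonated form; the pKa of 4.0 for a side-chain carboxyl group is an assumed round number, and the function name is hypothetical.

```python
def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of an ionizable group that has lost its proton."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


# Hypothetical side-chain carboxyl group with an assumed pKa of 4.0.
# Deprotonated (COO-) carries a negative charge; protonated (COOH) is neutral.
for ph in (2.0, 4.0, 7.0):
    print(f"pH {ph}: {fraction_deprotonated(4.0, ph):.3f} of groups charged")
```

Under these assumptions only about 1% of the groups are charged at pH 2, half at pH 4, and nearly all at pH 7, so a large pH shift removes or creates the charges that ionic interactions depend on.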

Determination of Primary Structure: Amino Acid Sequencing

After the polypeptide has been purified, its composition should be established. To determine which amino acids are present and in what amounts, the entire chain is broken down by amide hydrolysis (6 N HCl, 110 °C, 24 h) to produce a mixture of free amino acids. The mixture is separated and its composition recorded by an amino acid analyzer, which produces a chromatogram with a peak for each amino acid present in the polypeptide. However, the amino acid analyzer gives only the composition of a polypeptide, not the order in which the amino acids are bound to one another.
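Why composition alone cannot reveal sequence can be shown in a few lines. The Python sketch below treats amino acid analysis as simply counting residues; the two peptides are hypothetical and written in one-letter codes.

```python
from collections import Counter


def composition(sequence):
    """Residue counts: the information left after complete hydrolysis and amino acid analysis."""
    return Counter(sequence)


# Two hypothetical peptides containing the same residues in a different order.
peptide_1 = "GAVLS"
peptide_2 = "SLVAG"

print(composition(peptide_1) == composition(peptide_2))  # True: same composition
print(peptide_1 == peptide_2)                            # False: different sequences
```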

Determination of the amino acid sequence usually starts by identifying the amino-terminal residue of the polypeptide. The procedure is known as Edman degradation, and the reagent employed is phenyl isothiocyanate.

Phenyl isothiocyanate

In Edman degradation, the terminal amino group adds to the isothiocyanate reagent to produce a thiourea derivative. On treatment with mild acid, the tagged amino acid is released as a phenylthiohydantoin, and the rest of the polypeptide is left intact. Since the phenylthiohydantoins of all the amino acids are known, the amino-terminal residue of the original polypeptide can be identified easily. Each cycle, however, identifies only the residue at the amino end, and the shortened chain must go through another cycle to reveal the next one; for polypeptides made up of hundreds of amino acids, the method is therefore generally not practical. In addition, repeated rounds of degradation build up impurities that seriously lower the yield of peptide. Even a high-yielding step is not completely quantitative, so with each round some incompletely reacted peptide mixes with the new peptide, eventually resulting in an intractable mixture.
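The two limitations above, one residue per cycle and compounding losses, can be sketched numerically. The Python code below idealizes each cycle as removing the N-terminal residue, and models the build-up of out-of-phase peptide by raising an assumed per-cycle yield to the number of cycles; the 98% yield and the function names are hypothetical.

```python
def edman_cycles(sequence, n_cycles):
    """Idealized Edman degradation: each cycle removes and identifies the N-terminal residue."""
    identified = []
    remaining = sequence
    for _ in range(min(n_cycles, len(sequence))):
        identified.append(remaining[0])  # tagged N-terminal residue, released as its PTH derivative
        remaining = remaining[1:]        # the rest of the chain survives, one residue shorter
    return identified, remaining


def fraction_in_phase(step_yield, n_cycles):
    """Fraction of chains that reacted completely in every one of n_cycles (compounded yield)."""
    return step_yield ** n_cycles


print(edman_cycles("GAVLS", 3))
# Assumed 98% yield per cycle: after 50 cycles only about a third of the chains are still in phase.
print(round(fraction_in_phase(0.98, 50), 3))
```

Even with a per-cycle yield as high as 98%, only about 36% of the chains are still in step after 50 cycles, which is why long polypeptides are not sequenced this way in one run.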


In other words, secondary structure refers to the spatial arrangement of amino acid residues that are nearby in the sequence. The alpha helix, and beta strands are elements of secondary structure.

Secondary Structure

Secondary structures of proteins are typically very regular in their conformation. They are the spatial arrangements of the primary structure. Alpha helices and beta pleated sheets are two types of regular structures. Interestingly, certain amino acids in a polypeptide prefer certain folding structures. The alpha helix appears to be the default, but because of interactions such as steric clashes, certain amino acids prefer to fold into beta pleated sheets and so on. For example, valine, isoleucine, and threonine all have branching at the beta carbon, which causes steric clashes in an alpha helix arrangement; these amino acids are therefore mostly found where their side chains fit well into the beta configuration. Glycine is the smallest amino acid and can fit into all structures, so it does not particularly favor helix formation.

The polypeptide main chain has a high capacity for hydrogen bonding: each residue has a carbonyl group, which is a good hydrogen-bond acceptor, and (except for proline) an N-H group, which is a good hydrogen-bond donor.

On a Ramachandran plot, alpha helices appear in two regions:

+ Right-handed helices: common, found in the lower-left region.

+ Left-handed helices: rare, found in the upper-left... 

Alpha Helix


The general physical properties of an alpha helix are:

  • 3.6 residues per turn
  • Translation (rise) of 1.5 Å per residue
  • Rotation of 100 degrees per residue
  • Pitch (height per turn) of 5.4 Å (1.5 Å × 3.6 residues)
  • Screw sense: usually right-handed (clockwise), because this is less sterically hindered
  • The coiled backbone forms the inside of the helix, and the side chains project outward in a helical array
  • The C=O group of residue i is hydrogen-bonded to the N-H group of residue i+4
  • The shorthand drawing of the alpha helix is a ribbon or rod
  • The alpha helix falls within quadrant 1 (left-handed helix) and quadrant 3 (right-handed helix) of the Ramachandran diagram
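The helix parameters above lend themselves to a few lines of arithmetic. The Python sketch below (function names hypothetical) computes the length, number of turns, rotation per residue, and pitch of an ideal helix from the 1.5 Å rise and 3.6 residues per turn, and lists the backbone hydrogen-bond pairs in which the C=O of residue i bonds to the N-H of residue i + 4. Real helices deviate somewhat from these ideal values, and residues near the helix ends are assumed to lack partners.

```python
RISE_PER_RESIDUE_A = 1.5  # translation along the helix axis per residue, in angstroms
RESIDUES_PER_TURN = 3.6


def helix_geometry(n_residues):
    """Dimensions of an ideal alpha helix of n_residues."""
    return {
        "length_A": n_residues * RISE_PER_RESIDUE_A,
        "turns": n_residues / RESIDUES_PER_TURN,
        "rotation_per_residue_deg": 360.0 / RESIDUES_PER_TURN,
        "pitch_A": RISE_PER_RESIDUE_A * RESIDUES_PER_TURN,
    }


def helix_hbonds(n_residues):
    """Backbone hydrogen bonds (C=O of residue i to N-H of residue i + 4), residues numbered from 1."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


print(helix_geometry(18))
print(helix_hbonds(10))
```

An 18-residue helix, for example, makes five turns and spans about 27 Å.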

Supersecondary Structure of Alpha Helix

Fibrous Proteins

I. COILED-COIL (α-keratin)

An alpha coiled coil consists of two or more alpha helices intertwined, creating a stable structure. This structure provides support to tissues and cells, contributing to the cell cytoskeleton and to muscle proteins such as myosin and tropomyosin. Alpha-keratin consists of heptad repeats (imperfect repeats of a 7-amino-acid sequence), which facilitate bonding between the two or more helices.
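Heptad repeats are conventionally labeled with positions a through g. In the usual coiled-coil picture (a convention not spelled out in the text above), positions a and d carry hydrophobic side chains that pack between the helices. The Python sketch below assigns that register and reports how hydrophobic the a and d positions are; the sequence, the hydrophobic residue set, and the function names are assumptions chosen for illustration.

```python
HEPTAD = "abcdefg"
HYDROPHOBIC = set("AVLIMF")  # assumed set of hydrophobic residues for this sketch


def heptad_register(sequence, start="a"):
    """Label each residue with its heptad position (a-g), cycling every seven residues."""
    offset = HEPTAD.index(start)
    return [(aa, HEPTAD[(offset + i) % 7]) for i, aa in enumerate(sequence)]


def core_hydrophobic_fraction(sequence, start="a"):
    """Fraction of residues at positions a and d that are hydrophobic."""
    core = [aa for aa, position in heptad_register(sequence, start) if position in ("a", "d")]
    if not core:
        raise ValueError("sequence too short to contain an a or d position")
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)


# Hypothetical coiled-coil-like sequence: three repeats of L-K-E-L-E-K-K.
seq = "LKELEKK" * 3
print(heptad_register(seq)[:7])
print(core_hydrophobic_fraction(seq))
```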


Collagen is another type of fibrous protein, consisting of three helical polypeptide chains. It is the most abundant protein in mammals, making up a large component of skin, bone, tendon, cartilage, and teeth. Wrinkles are also caused by the degradation of this protein. In collagen, every third residue in the polypeptide is glycine, because glycine is the only residue small enough to fit in the interior position of the superhelical cable. Unlike normal alpha helices, each collagen helix is stabilized by steric repulsion between the pyrrolidine rings of the proline and hydroxyproline residues rather than by intrachain hydrogen bonds. The three intertwined strands, however, are held together by hydrogen bonds.
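The glycine-at-every-third-position rule can be checked mechanically. The Python sketch below tests whether a sequence follows the (Gly-X-Y) pattern; the fragments are hypothetical, written in one-letter codes with P for proline (hydroxyproline is not distinguished here), and the function name is illustrative.

```python
def glycine_every_third(sequence, first_gly=0):
    """True if glycine (G) occupies every third position, starting at index first_gly."""
    return all(sequence[i] == "G" for i in range(first_gly, len(sequence), 3))


# Hypothetical collagen-like (Gly-X-Y)n fragments.
print(glycine_every_third("GPPGAPGPP"))  # True: Gly at every third position
print(glycine_every_third("GPPAPPGPP"))  # False: Ala breaks the pattern at the fourth residue
```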

Alpha Tertiary


Motifs are simple combinations of secondary structure elements, such as the helix-turn-helix, which consists of two helices separated by a turn. The helix-turn-helix motif is usually found in DNA-binding proteins.


Domains are compact globular units that consist of multiple motifs. A polypeptide chain can fold into two or more such compact regions connected by turns or loops. Domains are roughly spherical, which is beneficial for the protein because it conserves space. Generally, the interior of a globular protein consists of hydrophobic amino acids such as leucine, valine, methionine, and phenylalanine, whereas the exterior consists of amino acids with hydrophilic tendencies such as aspartate, glutamate, lysine, and arginine. An example of a globular protein is myoglobin, the oxygen carrier in muscle. It is an extremely compact molecule in which about 70% of the main chain is folded into alpha helices and the remaining 30% forms loops and turns.

Transmembrane and Non-Transmembrane Hydrophobic Helix

Studying the topography of transmembrane and non-transmembrane helices has helped answer many questions about membrane protein insertion. In particular, studying how topography depends on sequence and on lipids provides insights into post-translational topography changes. Studying topography has also led to the design of hydrophobic helices with biomedical applications; for example, a tumor marker called the pHLIP peptide has been designed.

Various experiments have been used to probe what stabilizes or destabilizes the transmembrane state of hydrophobic helices. For example, tryptophan and tyrosine residues destabilize the transmembrane state. Hydrophilic domains cannot cross the membrane, so they block equilibration between the transmembrane and non-transmembrane states. Charged (ionized) residues also destabilize the transmembrane state. In contrast, helix-helix interactions can stabilize the transmembrane state, and anionic lipids promote membrane binding of hydrophobic peptides and proteins.

Alpha helices, beta strands, and turns are formed by a regular pattern of hydrogen bonds between the peptide N-H and C=O groups of amino acids that are NEAR ONE ANOTHER IN THE LINEAR SEQUENCE. Such folded segments are called secondary structure.

Summary

The alpha helix consists of a single polypeptide chain in which the amino group (N-H) of each residue is hydrogen-bonded to the carbonyl group (C=O) of the residue four positions earlier. The alpha helix is a rod-like structure: the tightly coiled backbone of the chain forms the inner part of the rod, and the side chains extend outward in a helical array. The coil turns clockwise, which is known as a "right-handed" screw sense. This folding pattern, along with the beta pleated sheet, was proposed by Linus Pauling and Robert Corey about half a decade before it could actually be observed. Alpha helices are located in the lower-left or upper-right regions of the Ramachandran diagram; most are found in the lower-left, right-handed helix region. An alpha helix is especially suited for membrane-spanning proteins because all of the amino hydrogen and carbonyl oxygen atoms of the peptide backbone can form intrachain hydrogen bonds, while its aliphatic side chains are stable in the hydrophobic environment of the cell membrane.

Alanine, leucine, and glutamic acid (which exists as glutamate at physiological pH) are the most common residues in alpha helices.

The alpha-helix content of protein ranges widely, from none to almost 100%.

In general, the alpha helix is the "normal" shape of a polypeptide chain; however, features of certain amino acids disrupt alpha helix formation and instead favor beta strand formation. Amino acids with branching at the beta carbon (i.e., valine, threonine, and isoleucine) are problematic because they crowd the peptide backbone. Amino acids with hydrogen-bond-accepting or -donating groups near the beta carbon (e.g., serine, asparagine, and aspartate) can bond with backbone amide N-H and carbonyl groups, again interfering with alpha helix formation.

While individual amino acids may favor one form or another, predicting the 2° structure of even a short (<7 amino acid) peptide strand is only 60-70% accurate. Such variability suggests other factors, like tertiary interactions with amino acids further down the chain, influence the folding into its observed 3° structure.
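Accuracy figures such as 60-70% are usually quoted per residue: the fraction of residues whose predicted state matches the observed one (often called Q3 when three states, helix, strand, and coil, are used). The Python sketch below computes that fraction; the state letters H, E, and C, the function name, and the example strings are assumptions for illustration.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example: H = helix, E = strand, C = coil.
observed = "HHHHHCCEEE"
predicted = "HHHCCCEEEE"
print(q3_accuracy(predicted, observed))  # 0.7
```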

In a beta strand:
1. The backbone dihedral angles are around ψ = 120° and φ = -120°.
2. These angles give the strand its zigzag shape, with a distance of about 3.5 Å between adjacent amino acids.

Beta Pleated Sheet

In contrast to the alpha helical structure, Beta Sheets are multiple strands of polypeptides connected to each other through hydrogen bonding in a sheet-like array. Hydrogen bonding occurs between the NH and CO groups between two different strands and not within one strand, as is the case for an alpha helical structure. Due to its often rippled or pleated appearance, this secondary structure conformation has been characterized as the beta pleated sheet. The beta strands can be arranged in a parallel, anti-parallel, or mixed (parallel and anti-parallel) manner.

Anti-parallel Beta Strand

The anti-parallel configuration is the simplest. The N and C terminals of adjacent polypeptide strands are opposite to one another, meaning the N terminal of one peptide chain is aligned with the C terminal of an adjacent chain. In the anti-parallel configuration, each amino acid is bonded linearly to an amino acid in the adjacent chain.

Parallel Beta Strand

The parallel arrangement occurs when neighboring polypeptide chains run in the same direction, meaning the N and C terminals of the peptide chains align. As a result, an amino acid cannot bond directly to the complementary amino acid in an adjacent chain as in the anti-parallel configuration. Instead, the amino group of one chain is hydrogen-bonded to a carbonyl group on the adjacent chain, and the carbonyl group of the initial chain then hydrogen-bonds to an amino group two residues ahead on the adjacent chain. This arrangement distorts the hydrogen bonds, which weakens them because hydrogen bonds are strongest when they are linear. Because of this distortion, parallel beta sheets are less stable than anti-parallel beta sheets (e.g., parallel beta sheets with fewer than 5 residues are very uncommon).

The side chains of a beta strand alternate between opposite sides of the strand. The distance between adjacent amino acids in a beta strand is 3.5 Å, much longer than the 1.5 Å rise per residue in an alpha helix. Beta sheets are more flexible than alpha helices and can be flat or somewhat twisted. The average length of a beta strand in a protein is 6 amino acid residues, and the actual length ranges from 2 to 22 residues.
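The difference between the 3.5 Å and 1.5 Å per-residue distances is easiest to appreciate for a concrete segment. The Python sketch below multiplies the number of residues by the per-residue rise quoted in the text; the function name is hypothetical, and real strands and helices deviate from these idealized values.

```python
# Rise per residue along the chain axis, in angstroms (values from the text).
RISE_A = {"alpha helix": 1.5, "beta strand": 3.5}


def segment_length(n_residues, structure):
    """Approximate end-to-end length of a segment of n_residues in the given structure."""
    return n_residues * RISE_A[structure]


# A 6-residue segment, the average beta-strand length mentioned above.
for structure in RISE_A:
    print(structure, segment_length(6, structure), "A")
```

The same six residues span about 9 Å as a helix but about 21 Å as a strand.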

Ramachandran Plot: Beta strands are found in the purple region

Beta sheets are found in the upper-left quadrant of a Ramachandran plot, which corresponds to ψ angles of 0 to 180° and φ angles of -180 to 0°.
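The quadrant description can be expressed as a simple rule. The Python sketch below is deliberately coarse: it splits the plot into four quadrants rather than using the real, irregularly shaped allowed regions. The helix angles in the example (φ ≈ -57°, ψ ≈ -47°) are commonly quoted values assumed here rather than taken from the text, and the region labels are hypothetical names.

```python
def ramachandran_region(phi, psi):
    """Coarse quadrant assignment for backbone dihedral angles given in degrees (-180 to 180)."""
    if phi <= 0 and psi >= 0:
        return "beta"      # upper left: phi -180..0, psi 0..180
    if phi <= 0:
        return "alpha_R"   # lower left: right-handed helix region
    if psi >= 0:
        return "alpha_L"   # upper right: left-handed helix region
    return "other"         # lower right: sparsely populated


print(ramachandran_region(-120, 120))  # typical beta-strand angles
print(ramachandran_region(-57, -47))   # typical right-handed alpha-helix angles
```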

The schematic model of beta sheets

Visual representations in 3D models for beta sheets are traditionally denoted by a flat arrow pointing in the direction of the strand.

A loop is any part of the chain that is neither an alpha helix nor a beta strand. Loops are part of the secondary structure of a protein.

Turn and Loop

Polypeptide chains can change direction by making reverse turns and loops. Alpha helices and beta strands are connected by these turns and loops. Most proteins have a compact, globular shape owing to reversals in the direction of their polypeptide chains, which allow the polypeptide to fold back onto itself. In many reverse turns, the CO group of residue i of a polypeptide is hydrogen-bonded to the NH group of residue i+3. A turn helps to stabilize abrupt directional changes in the polypeptide chain. Loops are more elaborate chain-reversal structures that are often rigid and well defined. Loops and turns generally lie on the surfaces of proteins, so they often participate in interactions between proteins and other molecules. Unlike helices and beta strands, loops have no regular, repeating structure.

Two hypotheses have been proposed for the role of turns in protein folding. In one view, turns play a critical role in folding by bringing together interactions between regular secondary structure elements. This view is supported by mutagenesis studies indicating a critical role for particular residues in the turns of some proteins. Also, non-native isomers of X-Pro peptide bonds in turns can completely block the conformational folding of some proteins. In the opposing view, turns play a passive role in folding. This view is supported by the poor amino acid conservation observed in most turns. Also, non-native isomers of many X-Pro peptide bonds in turns have little or no effect on folding.

Beta Hairpin Turns

A motif is a combination of secondary structure elements in a specific geometric arrangement. Beta hairpin turns are one such arrangement; they are among the simplest motifs and are found in globular proteins. After the turn, the antiparallel strands can bind effectively through hydrogen bonds between backbone carbonyl oxygens and backbone N-H groups. It has been shown that 70% of beta hairpin loops are less than seven residues long, with the majority being 2 residues long. There are two types of two-residue beta hairpin turns. In the first, Type I, the turn adopts a left-handed alpha-helical conformation with a positive phi angle, which is most readily adopted by glycine, asparagine, and aspartate. Glycine has no side chain to sterically interfere with the turned amino acid sequence, and asparagine and aspartate both readily form hydrogen bonds with the carbonyl oxygen, which acts as the hydrogen-bond acceptor. The second amino acid in the Type I turn is usually glycine, because any amino acid with a side chain would cause steric hindrance. In a Type II beta hairpin turn, the first residue can only be glycine, because of steric hindrance, whereas the second residue is usually polar, such as serine or threonine.

Fibrous proteins

Fibrous proteins such as alpha-keratin consist of two right-handed alpha helices intertwined to form a left-handed superhelix called an alpha coiled coil. The two helices are usually cross-linked by weak interactions such as van der Waals forces and ionic interactions, and these side-chain interactions repeat every seven residues, forming heptad repeats. Another fibrous protein, collagen, exists as three helical polypeptide chains. These chains are relatively long (~1000 residues), and because the interior of the triple helix is crowded, glycine appears at every third residue. Each helix is stabilized by steric repulsion, while the three strands are held together by hydrogen bonds. These proteins usually serve structural roles in organisms: coiled coils such as alpha-keratin are commonly found in the cytoskeleton of a cell as well as in certain muscle proteins, and collagen is often found in teeth, skin, and tendons.

Secondary Structure Prediction

Predicting which secondary structure (alpha helix, beta sheet/strand, or turn/loop) a given stretch of polypeptide chain will adopt is not an exact science. However, the frequencies with which each amino acid occurs in each type of secondary structure have been measured experimentally, and these values allow the secondary structure of a protein to be predicted from its amino acid sequence with about 60-70% accuracy. Stretches of six or fewer residues can usually be predicted with this accuracy. Although certain amino acids tend to adopt their preferred conformation, there are exceptions, so secondary structure prediction is not always accurate. Tertiary interactions, between residues that are far apart in the sequence, also influence folding. Each amino acid's preference for one secondary structure over another is usually small, so on its own it is a weak predictor: the same amino acid can appear in an alpha helix in one protein and in a beta sheet in another. Because of this, secondary structures are now often analyzed and predicted by comparison with families of related sequences.

Various techniques for secondary structure prediction have been developed over time. With the aid of computers, prediction has become an active research topic in bioinformatics, and new approaches continue to be proposed. Work on protein structure prediction grew after Linus Pauling and Robert Corey described the periodic alpha helix and beta sheet structures of proteins in 1951. An early major method was the Chou-Fasman method, which achieved 50-60% accuracy. It assigned each amino acid residue a set of propensity values for the different secondary structures and then applied an algorithm to those values along the sequence. Shortly afterward, the GOR method, developed in the late 1970s, improved on this by using concepts from information theory, such as entropy, for secondary structure prediction. When devised, the GOR method was about 65% accurate, and it has since been improved further. There are also deductive (homology-based) techniques, in which similar sequences are found in proteins whose structures are already known, using software that searches databases of identified proteins. In contrast, ab initio methods build three-dimensional models without relying on similar sequences; they are based on hydrogen-bonding principles and localization.

Other approaches to folding prediction analyze the basic chemical tendencies of amino acid side chains to determine their secondary structure preferences. The alpha helix is taken as the default structure, so amino acids that destabilize alpha helices tend to be found in beta pleated sheets or in loops and turns. For instance, valine, threonine, and isoleucine destabilize the helix because they are branched at the beta carbon. These three residues are more often found in beta pleated sheets, where their side chains lie out of the plane of the main chain. Some residues favor neither alpha helices nor beta pleated sheets. Proline, for example, is cyclic, so its phi angle is restricted to about −60 degrees and it has no N-H group for hydrogen bonding. It therefore disrupts both alpha helices and beta pleated sheets and is found mostly in loops and turns. Glycine is a counter-intuitive example: its small size should let it fit easily into any structure, but because it is so flexible it also tends to avoid alpha helices and beta sheets. Folding also depends on chemical interactions between side chains, so interactions with neighboring residues affect these tendencies as well. The tendencies are reflected in the frequencies of secondary structure for individual amino acids.

The relative tendencies of secondary structures for particular amino acids are listed below:

alpha-helix: Glu, Ala, Leu, Met, Lys, Arg, Gln, His

beta-sheet: Val, Ile, Tyr, Cys, Trp, Phe, Thr

turns and loops: Gly, Asn, Asp, Pro, Ser
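The lists above can be turned into a crude predictor in the spirit of propensity-based methods such as Chou-Fasman. This Python sketch is a toy, not the Chou-Fasman algorithm: it ignores the published propensity values and nucleation rules, lets each residue vote for the class whose list contains it, and labels each position by the majority vote in a sliding window. The default window size of 5 and the tie-breaking order are arbitrary assumptions.

```python
# Toy secondary-structure guess built only from the preference lists above.
# H = alpha helix, E = beta sheet, T = turn/loop (one-letter residue codes).

PREFERENCE = {}
for aa in "EALMKRQH":   # Glu, Ala, Leu, Met, Lys, Arg, Gln, His
    PREFERENCE[aa] = "H"
for aa in "VIYCWFT":    # Val, Ile, Tyr, Cys, Trp, Phe, Thr
    PREFERENCE[aa] = "E"
for aa in "GNDPS":      # Gly, Asn, Asp, Pro, Ser
    PREFERENCE[aa] = "T"


def predict(seq, window=5):
    """Return one label per residue using a majority vote in a sliding window."""
    half = window // 2
    labels = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        votes = {"H": 0, "E": 0, "T": 0}
        for aa in segment:
            if aa in PREFERENCE:
                votes[PREFERENCE[aa]] += 1
        # max() keeps the first of equal maxima, so ties go H, then E, then T
        labels.append(max("HET", key=lambda k: votes[k]))
    return "".join(labels)


print(predict("AAAAAGGGGGVVVVV", window=3))  # HHHHHTTTTTEEEEE
```

Because each residue's preference is only slight, a real method weights residues by measured propensities and then extends or trims regions with extra rules; the equal votes used here are the main simplification.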

Torsion Angles

Torsion angles are also called dihedral angles. A torsion angle measures, in degrees, the rotation about a bond between two atoms. Protein folding is constrained by how freely the backbone bonds can rotate. Two torsion angles describe the backbone of each amino acid residue. Phi (φ) is the angle of rotation about the bond between the nitrogen atom and the α-carbon. Psi (ψ) is the angle of rotation about the bond between the α-carbon and the carbonyl carbon. To determine the sign of φ, look along the bond from the nitrogen atom toward the α-carbon; to determine the sign of ψ, look from the α-carbon toward the carbonyl carbon. In both cases, a clockwise rotation corresponds to a positive angle and a counterclockwise rotation to a negative angle.
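The definitions of φ and ψ can be made concrete by computing a torsion angle from atomic coordinates. The Python sketch below uses a standard vector formula (not taken from this text) for the dihedral angle of four atoms, with the sign convention that a clockwise rotation, viewed along the central bond, is positive. Here φ uses the atoms C(i-1), N, Cα, C and ψ uses N, Cα, C, N(i+1); the coordinates in the example are made up rather than taken from a real protein.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees, -180 to 180) of p0-p1-p2-p3 about the p1-p2 bond.

    Positive means clockwise rotation when viewed from p1 toward p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))           # part of b0 perpendicular to the bond
    w = _sub(b2, _scale(b1, _dot(b2, b1)))           # part of b2 perpendicular to the bond
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi: rotation about the N-CA bond."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi: rotation about the CA-C bond."""
    return dihedral(n, ca, c, n_next)


# Made-up coordinates: the central bond runs along the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))  # eclipsed: 0.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # quarter turn clockwise: 90.0
```

Using `atan2` rather than an arccosine keeps the sign of the angle, which is exactly the clockwise/counterclockwise distinction that places a residue on the correct side of a Ramachandran diagram.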


Ramachandran Diagram

The Ramachandran diagram, devised by Gopalasamudram Ramachandran, shows which combinations of torsion angles are sterically allowed and therefore helps determine whether amino acids will form alpha helices, beta strands, loops, or turns. The diagram is divided into four quadrants, with angle φ on the x axis and angle ψ on the y axis. The combination of torsion angles places each residue in a particular quadrant. Several consecutive residues in quadrant 3 (negative φ and negative ψ) form a right-handed alpha helix, while the much rarer left-handed helix falls in quadrant 1; residues that repeatedly fall in quadrant 2 (negative φ, positive ψ) form beta strands. Quadrant 4 is generally disfavored because most combinations of angles there would cause collisions between atoms, violating the principle of steric exclusion: two atoms cannot occupy the same place at the same time. Residues that land in different quadrants, with no repeats, form loops or turns.
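The quadrant rules of the Ramachandran diagram can be expressed as a small classifier. This Python sketch is a coarse illustration under assumptions of its own: quadrants are numbered as in an ordinary Cartesian graph (quadrant 1 has positive φ and ψ, numbering counterclockwise), angles of exactly 0 are assigned arbitrarily, a run of at least three consecutive residues in one quadrant counts as a helix or strand, and the rare left-handed helices of quadrant 1 are simply labelled as loops. Real assignment tools use much finer regions of the diagram and hydrogen-bonding patterns.

```python
def quadrant(phi, psi):
    """Ramachandran quadrant (1-4) for a pair of torsion angles in degrees."""
    if phi >= 0 and psi >= 0:
        return 1   # left-handed helix region (rare, mostly glycine)
    if phi < 0 and psi >= 0:
        return 2   # beta strand region
    if phi < 0 and psi < 0:
        return 3   # right-handed alpha helix region
    return 4       # largely disallowed by steric clashes


def assign(angles, min_run=3):
    """Label residues H (helix), E (strand) or - (loop/turn) from (phi, psi) pairs.

    A residue is labelled H or E only if it sits in a run of at least
    `min_run` consecutive residues in quadrant 3 or 2 respectively."""
    quads = [quadrant(phi, psi) for phi, psi in angles]
    labels = ["-"] * len(quads)
    i = 0
    while i < len(quads):
        j = i
        while j < len(quads) and quads[j] == quads[i]:
            j += 1                      # extend the run of identical quadrants
        if j - i >= min_run and quads[i] in (2, 3):
            symbol = "H" if quads[i] == 3 else "E"
            for k in range(i, j):
                labels[k] = symbol
        i = j
    return "".join(labels)


example = [(-57, -47)] * 4 + [(-120, 130)] * 3 + [(80, 10)]
print(assign(example))  # HHHHEEE-
```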

Myoglobin is one example of tertiary structure. Myoglobin, the oxygen carrier in muscle, is a single polypeptide chain of 153 amino acids. Its capacity to bind oxygen depends on the presence of heme, a non-polypeptide prosthetic group consisting of protoporphyrin IX and a central iron atom. Myoglobin is an extremely compact molecule.

Tertiary Structure

The tertiary structure of a protein is its three-dimensional structure. This structure is mostly determined by the amino acid sequence (the primary structure), although the sequence alone cannot entirely predict how the three-dimensional structure forms. The environment in which the protein is synthesized and folds also contributes to its final shape. In water-soluble proteins, the tertiary structure is stabilized largely by burying hydrophobic side chains: the interior consists of hydrophobic side chains, while the surface consists of hydrophilic amino acids that interact with the aqueous environment.

Tertiary structure is formed by interactions between the side chains of various amino acids, including disulfide bonds formed between two cysteine residues. At this stage some proteins are complete, while others combine multiple polypeptide subunits, which creates the quaternary structure.

Nucleation-condensation model: the tertiary folding process is highly structured and passes through key intermediates. When a protein starts to fold, localized regions of the protein begin folding first. The individual local folds then come together to complete the tertiary structure. The key concept is that once a correct fold is achieved, it is retained while the rest of the protein folds correctly. This makes sense because a random, trial-and-error folding process would take far longer to complete.
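A back-of-the-envelope calculation shows why a random, trial-and-error search is implausible; this argument is usually called Levinthal's paradox. The Python sketch below assumes, purely for illustration, a 100-residue chain with 3 possible conformations per residue and a search speed of 10^13 conformations per second; none of these numbers comes from the text.

```python
# Levinthal-style estimate of a purely random conformational search.
# All three inputs are illustrative assumptions, not measured values.

residues = 100              # chain length
states_per_residue = 3      # conformations available to each residue
samples_per_second = 1e13   # conformations tried per second

conformations = states_per_residue ** residues       # exact integer, 3**100
seconds = conformations / samples_per_second
years = seconds / (365.25 * 24 * 3600)

print(f"{conformations:.2e} conformations, about {years:.1e} years")
```

Even with these generous assumptions the search would take on the order of 10^27 years, vastly longer than the age of the universe (about 1.4 × 10^10 years), whereas real proteins typically fold in seconds or less.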

Tertiary structure refers to the spatial arrangement of amino acid residues that are far apart in the sequence and to the pattern of disulfide bonds. It is also the level of protein structure that most directly determines enzymatic activity.


A lobster's exoskeleton is not an example of keratin (it is made of chitin, a polysaccharide).
A dog's fur is an example of keratin.

Cysteine, an amino acid containing a thiol group, forms the disulfide bonds that help hold a tertiary structure together. When two helices come together, they may be linked by these disulfide bonds. Structures with fewer disulfide bonds are less rigid and more flexible, yet still strong and resistant to breakage, as in hair and wool. Structures with more cross-linking disulfide bonds between cysteine residues are stronger, stiffer, and harder, as in claws, nails, and horns.

Structures made of two α-helices, such as keratin, are found in living organisms. Immunoglobulins (antibodies) are an example of an all-β protein fold: each immunoglobulin domain consists of approximately 7 antiparallel β-strands arranged in 2 β-sheets. The two sheets are typically linked by a disulfide bond, so if one of these cysteines is mutated to another amino acid, the domain may fold incorrectly.


Some polypeptide chains fold into several compact regions called domains, which generally range from 30 to 400 amino acids and contain roughly 100 amino acids on average. Each domain forms its own tertiary structure, which contributes to the overall tertiary structure of the protein, and domains are often independently stable. Some domains are further stabilized by bound metal ions or disulfide bridges. Different proteins may share the same domains even if their overall tertiary structures differ.

Domains are commonly grouped into four structural classes:

  • All-α domains - domains made mainly of α-helices.
  • All-β domains - domains made mainly of β-sheets.
  • α+β domains - domains containing both α-helices and β-sheets, usually in separate regions of the domain.
  • α/β domains - domains in which α-helices and β-strands alternate along the chain in β-α-β units, each α-helix connecting two parallel β-strands.


In order for a protein to be functional (apart from its nutritional value as food), it must have an intact tertiary structure. A protein whose tertiary structure has been disrupted is said to be denatured, and it can no longer perform its original function. A primary cause of altered tertiary structure is a mutation in the gene encoding the protein, which can set off a domino effect that leads to the degradation of the tertiary structure. Such degradation can cause several diseases, one of which is cystic fibrosis (CF). Cystic fibrosis is caused by mutations in the gene encoding the cystic fibrosis transmembrane conductance regulator (CFTR), and it causes the exocrine glands to overproduce mucus. Most commonly, CF patients suffer from lung failure by their 20s or 30s. Diabetes insipidus, familial hypercholesterolemia, and osteogenesis imperfecta are also diseases that originate from defective proteins. Misfolding of the tertiary structure itself, rather than a mutation in the nucleotide sequence, can also lead to disease. Misfolded proteins can aggregate into insoluble deposits called amyloids and thereby lose their function. A common example is a hydrophobic R group folding in, rather than out, in a hydrophobic environment. The inherited form of Alzheimer's disease is one disease caused by misfolded tertiary structure. Another is "mad cow" disease, in which α-helical regions of a protein (which are soluble) convert into β-sheets (which are insoluble and form amyloid deposits). [7]


The folding of a protein depends on the amino acid sequence laid out in its primary structure and on the environment in which folding occurs. In a hydrophobic environment, the hydrophobic side chains of the protein face outward while the hydrophilic side chains face inward; the reverse holds in a hydrophilic environment. Porin is an example of a protein folded in a hydrophobic environment: its hydrophilic side chains face inward, creating a channel through which water can pass. Amino acids with nonpolar, hydrophobic side chains, such as leucine, valine, methionine, phenylalanine, and isoleucine, face outward when a protein folds in a hydrophobic environment. Likewise, in a hydrophilic environment, amino acids with polar side chains such as glutamine and asparagine face outward and the hydrophobic side chains face inward.

Determination of Tertiary Structure

The tertiary structure of a protein is commonly determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. X-ray crystallography was the first method used to determine protein structures. It is one of the best methods because the wavelength of X-rays is similar to the length of the covalent bonds found throughout proteins, which allows a clear view of a molecule's structure. The scattering of X-rays by electrons is analyzed to determine the structure. To use X-ray crystallography, the protein must be in crystal form. Some proteins crystallize readily, while others do not; for those that do not, NMR spectroscopy can be used instead. NMR spectroscopy uses the spins of nuclei that have a magnetic dipole, together with their chemical shifts, to determine the relative positions of atoms in a molecule.

Hemoglobin, the oxygen-carrying protein in blood, is an example of quaternary structure: it consists of two subunits of one type (designated alpha) and two subunits of another type (designated beta).

Quaternary Structure

Atomic structure of the 50S subunit from Haloarcula marismortui. Proteins are shown in blue and the two RNA strands in orange and yellow.[11] This shows the structure of the large subunit of a ribosome, a complex of many protein and RNA chains.

Quaternary structure refers to two or more polypeptide chains held together by intermolecular interactions to form a multi-subunit complex. The interactions that hold these folded chains together include disulfide bridges, hydrogen bonds, hydrophobic interactions, and London dispersion forces. These interactions are usually mediated by the side chains of the amino acids.

These polypeptide chains are the subunits of a protein, capable of taking part in a variety of functions such as serving as enzymatic catalysts, providing structural support in the cytoskeletons of cells, and even composing the hair on our heads.

The subunits of a protein can be identical or different. Insulin consists of two different chains (the A and B chains), while hemoglobin is a tetramer consisting of two identical alpha subunits and two identical beta subunits.

Naming Quaternary Structures

Quaternary structures are named by the number of subunits (each a folded polypeptide with its own tertiary structure) followed by the suffix -mer (Greek for "part, subunit"):

  • 1 subunit = Monomer
  • 2 subunits = Dimer
  • 3 subunits = Trimer (In small-molecule chemistry, some trimers are cyclic; for example, cyanuric acid is a cyclic trimer of cyanic acid.)
  • 4 subunits = Tetramer

The pattern continues with pent-, hex-, hept-, oct-, and so forth.


Computer-generated image of insulin hexamers highlighting the threefold symmetry, the zinc ions holding it together, and the histidine residues involved in zinc binding.
  • Insulin
    • Two chains – A chain and B chain
    • Linked by 2 disulfide bridges
  • HIV Protease
    • Dimer
    • Composed of identical subunits


  • Collagen
    • Composed of 3 helical polypeptide chains
    • Glycine appears at every third residue because there is no space in the center of the helix
    • Stabilized by steric repulsion of the pyrrolidine rings of the proline and hydroxyproline residues
    • Hydrogen bonds hold together the strands of the collagen fibers


Structure of human hemoglobin. The protein's α and β subunits are in red and blue, and the iron-containing heme groups in green. From PDB 1GZX.
  • Hemoglobin
    • Consists of 2 alpha and 2 beta subunits
    • Has a globular shape
    • Has reverse turns that contribute to the compact, globular shape of the protein
  • Aquaporin
    • Made of 6 alpha helices
    • Forms hydrophobic loops
    • Forms tetramers in the cell membrane, with each monomer acting as a water channel

Breaking Apart the Quaternary Structure

The quaternary structure of a protein can be disrupted by breaking the covalent and non-covalent forces that hold it together. Heat, urea, or guanidinium chloride denature a protein by disrupting non-covalent forces, while beta-mercaptoethanol breaks disulfide bridges by reducing them.

Protein Folding

A protein is never "half folded". At the denaturant concentration midway between the folded and unfolded states, two structures coexist, folded and unfolded, at a ratio of 1:1.

Proteins are either folded or not; there is no stable "half-folded" stage. This can be observed by slowly adding denaturant to a protein solution: the result is a sharp transition from the folded state to the unfolded state, suggesting that only these two forms exist. This sharpness reflects a cooperative transition.

For instance, if a protein is placed in a denaturant under conditions where only one part of the protein is unstable, the entire protein unfolds. This is a domino effect: destabilizing one part of the protein in turn destabilizes the rest of the structure. Under conditions corresponding to the middle of the transition between folded and unfolded, there is a 50/50 mixture of folded and unfolded protein rather than a 'half-folded' protein.
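The sharp, cooperative transition can be illustrated with a two-state model, in which each molecule is either folded (F) or unfolded (U). This Python sketch adds a common simplifying assumption not stated in the text: the free energy of unfolding falls linearly with denaturant concentration, ΔG = ΔG(water) − m[denaturant]. The values ΔG(water) = 20 kJ/mol and m = 5 kJ/mol per molar are hypothetical.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)


def fraction_unfolded(denaturant_molar, dg_water=20.0, m=5.0, temp_k=298.0):
    """Fraction of molecules unfolded in a two-state (F <-> U) model.

    dG = dg_water - m * [denaturant] is the free energy of unfolding in kJ/mol;
    K = [U]/[F] = exp(-dG / RT). dg_water and m are hypothetical values."""
    dg = dg_water - m * denaturant_molar
    k = math.exp(-dg / (R * temp_k))
    return k / (1.0 + k)


# At the midpoint (dG = 0, here 4 M) folded and unfolded molecules are 1:1.
for c in (0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0):
    print(f"{c:3.1f} M denaturant: {fraction_unfolded(c):.3f} unfolded")
```

With these numbers most of the change happens within about one molar of the midpoint, and at the midpoint itself the model gives a 1:1 mixture of fully folded and fully unfolded molecules, never a half-folded one.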

Even so, a protein must pass through intermediate conformations at the atomic level on its way from one state to the other. This area is still under development, and much research is still being done. Theories such as the nucleation-condensation model are concerned with this aspect of protein folding.

The properties of quaternary structure are: 1. Polypeptide chains can assemble into multisubunit structures. 2. Quaternary structure refers to the spatial arrangement of these subunits and the nature of their interactions.


If one takes each student in a class to be a different amino acid, each right hand to be an alpha-carboxyl group, each left hand to be an alpha-amino group, and the head to be the R group, then joining right hands to left hands makes the class form a polypeptide. The "bonds" joining the hands are peptide bonds. This can be considered the primary structure of a protein.

If each student is then "attracted" to the student four positions further along the chain, the structure folds into a secondary structure, namely the alpha helix. If the students were put into lines and were attracted to the corresponding students in another line, they would form a beta pleated sheet.

Now imagine that the heads, or R groups, vary in personality (standing in for polarity), and that like attracts like. The more compatible people then gather together; for instance, hydrophobic groups usually gather in the center, surrounded by hydrophilic groups. This makes up the tertiary structure.

Now add a different class. The people from the new class have their own tertiary structure, and they interact with the original class to form a quaternary structure.

Human attempts to manipulate protein assemblies (Quaternary Structures)

Controlling quaternary structure is attracting growing interest in academic research, because manipulating protein assemblies has many advantages. First, enzymes that are beneficial to humans can be produced, but getting them to work is the hard part. For example, nitrogenase, the enzyme that fixes nitrogen gas to yield ammonia, works only in an anaerobic environment and requires ATP as an energy source. Researchers have shown that nitrogenase is composed of two proteins: one couples ATP to supplying electrons, and the other contains the reactive center for nitrogen fixation. The two proteins assemble to work as a whole. Recently, scientists removed the ATP-coupling protein and replaced it with a ruthenium complex, which can provide electrons when exposed to light. This avoids the complicated chemistry of ATP coupling: shining light on the engineered nitrogenase is enough to drive it. Second, protein assemblies have many clinical and materials applications. Ferritins are a family of high-order protein assemblies, usually 12-mers or 24-mers. Previous research has shown that ferritin can store large amounts of iron. Many researchers are working to control the association and dissociation of ferritins, seeking applications in drug delivery, gas storage, metal harvesting, and more. Many approaches have been developed to control protein assembly; some of them are listed below.

1. Transition metal-directed assembly. Metal centers in proteins are important not only as reactive centers but also because they help stabilize the shape of the protein through coordination. Several amino acids can act as metal ligands themselves; cysteine and histidine are the most common. Researchers can also attach synthetic metal-chelating ligands to proteins at engineered cysteine residues, which greatly broadens the range of possible protein assemblies.

Structure of phenanthroline (a synthetic metal-chelating ligand).
Structure of terpyridine (a synthetic metal-chelating ligand).

Metal-ligand bonding has several useful properties. Most obviously, it is a strong interaction: stronger than a hydrogen bond but weaker than a covalent bond. The metal-ligand bond is therefore strong, yet not so strong that it cannot be reversed. Spatially, metal ions have characteristic coordination geometries, most often octahedral or tetrahedral. This property makes it convenient to arrange proteins in space.

Cartoon model of a dimer of two terpyridine-labeled proteins.
Cartoon model of a trimer of three phenanthroline-labeled proteins.
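The dimer and trimer in these models follow from simple counting. Terpyridine binds a metal ion through three nitrogen atoms and phenanthroline through two, so an octahedral metal ion with six coordination sites can hold two terpyridines or three phenanthrolines. The Python sketch below assumes each labeled protein carries a single ligand and that the ligands fill every coordination site; real assemblies can also involve water or other ligands on the metal.

```python
# Toy stoichiometry for metal-directed protein assembly.
# Assumptions: one chelating ligand per protein, and the ligands alone
# fill every coordination site of the metal ion.

DENTICITY = {
    "terpyridine": 3,     # binds a metal through three nitrogen atoms
    "phenanthroline": 2,  # binds a metal through two nitrogen atoms
}


def proteins_per_metal(ligand, coordination_number=6):
    """How many ligand-labeled proteins one metal ion can hold.

    coordination_number=6 corresponds to an octahedral geometry,
    4 to a tetrahedral one."""
    sites_per_ligand = DENTICITY[ligand]
    if coordination_number % sites_per_ligand:
        raise ValueError("ligands cannot exactly fill the coordination sphere")
    return coordination_number // sites_per_ligand


print(proteins_per_metal("terpyridine"))     # 2: octahedral metal -> dimer
print(proteins_per_metal("phenanthroline"))  # 3: octahedral metal -> trimer
```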

2. Hydrophobic interactions. In an aqueous environment, amino acids with hydrophobic side chains tend to aggregate to minimize their exposure to water. Researchers exploit this by engineering matching pairs of nonpolar amino acids onto protein surfaces to obtain protein oligomers in aqueous solution.

3. Salt bridges. Ionizable amino acid side chains have different pKa values, so at a given pH some are negatively charged and others positively charged. If one area of a protein is occupied mostly by negatively charged amino acids and an area of its partner by positively charged amino acids, the proteins can associate by electrostatic attraction. However, this technique is usually not very selective.
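Whether a side chain is charged at a given pH follows from the Henderson-Hasselbalch equation. The Python sketch below estimates the net charge of a hypothetical surface patch, given as one-letter residue codes; the pKa values are approximate textbook figures (actual values shift with the local protein environment and differ slightly between sources), and cysteine, tyrosine, and the chain termini are ignored for simplicity.

```python
# Approximate side-chain pKa values; illustrative, not exact for any protein.
ACIDIC_PKA = {"D": 3.9, "E": 4.1}
BASIC_PKA = {"H": 6.0, "K": 10.5, "R": 12.5}


def side_chain_charge(aa, ph):
    """Average charge of one side chain at the given pH (Henderson-Hasselbalch)."""
    if aa in ACIDIC_PKA:   # acid: negative once deprotonated
        return -1.0 / (1.0 + 10 ** (ACIDIC_PKA[aa] - ph))
    if aa in BASIC_PKA:    # base: positive while protonated
        return 1.0 / (1.0 + 10 ** (ph - BASIC_PKA[aa]))
    return 0.0             # treated as uncharged in this sketch


def patch_charge(residues, ph=7.0):
    """Net charge of a (hypothetical) surface patch of residues."""
    return sum(side_chain_charge(aa, ph) for aa in residues)


acidic_patch, basic_patch = "DEED", "KRKK"   # made-up patches
print(round(patch_charge(acidic_patch), 2), round(patch_charge(basic_patch), 2))
```

At neutral pH the two made-up patches carry nearly opposite charges of about 4, which is the kind of complementarity a salt-bridge design relies on; the lack of selectivity mentioned above arises because any similarly charged patch would attract just as well.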

Further techniques for directing protein assembly, such as engineered coiled coils, are being investigated, and the prospects for controlling quaternary structure are promising.


In most archaebacteria, a protein coat is the main structure that surrounds and shapes the cell. This protein armor is a paracrystalline array of “surface layer proteins.”

Half a million surface layer proteins pack next to each other to form a shell that encloses the cell. On the inner face of the shell, they bind to sugar chains on the cell surface or, in the case of archaebacteria, interact directly with the membrane. The protein coat provides protection and can also help the cell gather nutrients and attach to targets in the environment.


  1. Kern, J. et al. Structure of surface layer homology (SLH) domains from Bacillus anthracis surface array protein. J. Biol. Chem. 286, 26041-26049 (2011)

Protein folding is the process by which a polypeptide chain folds into a specific, stable, functional three-dimensional structure, that is, its functional shape or conformation.

Proteins are formed from long chains of amino acids; they exist in an array of different structures, which often dictate their functions. Proteins follow energetically favorable pathways to form stable, orderly structures; the resulting structure is known as the protein's native structure. Most proteins can perform their various functions only when they are folded. A protein's folding pathway, or mechanism, is the typical sequence of structural changes it undergoes to reach its native structure. Protein folding takes place in the highly crowded, complex molecular environment of the cell and often requires the assistance of molecular chaperones to avoid aggregation or misfolding. Proteins are composed of amino acids with various types of side chains, which may be hydrophobic, hydrophilic, or electrically charged. The characteristics of these side chains affect the shape the protein will form, because they interact differently with one another and with the surrounding environment, favoring certain conformations and structures over others. Scientists believe that the instructions for folding a protein are encoded in its sequence. Researchers can easily determine the sequence of a protein, but have not cracked the code that governs folding (Structures of Life 8).

Protein Folding Theory and Experiment

Early scientists who studied proteins and their structure speculated that proteins had templates that gave rise to their native conformations. This idea prompted a search for how proteins fold to attain their complex structures. It is now well known that under physiological conditions, proteins normally fold spontaneously into their native conformations. A protein's primary structure is therefore valuable, since it determines the protein's three-dimensional structure. Most biological structures do not need external templates to form and are thus called self-assembling.

Protein Renaturation

Protein renaturation has been known since the 1930s. However, it was not until 1957, when Christian Anfinsen performed experiments on bovine pancreatic RNase A, that protein renaturation was quantified. RNase A is a single-chain protein of 124 residues. In an 8 M urea solution containing 2-mercaptoethanol, RNase A is completely unfolded and its four disulfide bonds are cleaved by reduction. When the urea is removed by dialysis and the solution is exposed to O2 at pH 8, the protein regains enzymatic activity and is physically indistinguishable from native RNase A. This experiment demonstrated that the protein renatures spontaneously.

One requirement for the renaturation of RNase A is that its four disulfide bonds re-form correctly. If pairing were random, the probability that the first of the eight Cys residues pairs with its native partner rather than one of the other six Cys residues would be 1/7. The probability that one of the remaining six Cys residues then pairs with its native partner is 1/5, and so on. The probability of RNase A forming all four native disulfide links at random is therefore 1/7 × 1/5 × 1/3 × 1/1 = 1/105. Since renaturation is far more efficient than this, disulfide bond formation in RNase A is not a random process.
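The RNase A counting argument generalizes to any number of disulfide bonds: with 2n cysteines, the first has 2n − 1 possible partners, the next unpaired one 2n − 3, and so on. The Python sketch below assumes, as that argument does, that every pairing is equally likely, and uses exact fractions so the result matches the hand calculation.

```python
from fractions import Fraction


def random_native_pairing_probability(n_disulfides):
    """Chance that 2n cysteines pair exactly as in the native protein when
    every pairing is equally likely: 1/(2n-1) * 1/(2n-3) * ... * 1/1."""
    p = Fraction(1)
    for partners in range(2 * n_disulfides - 1, 0, -2):
        p *= Fraction(1, partners)
    return p


print(random_native_pairing_probability(4))  # RNase A, 8 Cys: 1/105
```

The denominator grows very quickly (it is the double factorial (2n − 1)!!), so for proteins with more disulfide bonds a random search becomes even less plausible.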

When RNase A is reoxidized in 8 M urea, so that the disulfide bonds re-form while the polypeptide chain is a random coil, it is only about 1 percent enzymatically active after the urea is removed. However, if a small amount of 2-mercaptoethanol is then added, disulfide interchange reactions occur and the protein returns to its native state, becoming fully active again. The native state of RNase A is therefore its thermodynamically most stable conformation under physiological conditions, unless an even more stable conformation exists that is kinetically inaccessible because it lies behind a large activation barrier.

The enzyme protein disulfide isomerase (PDI) reduces the time needed for "scrambled" RNase A to renature to about 2 minutes. PDI facilitates the disulfide interchange reactions; to be active, its two active-site Cys residues must be in the -SH form. PDI catalyzes the random breaking and re-forming of a protein's disulfide bonds, allowing the protein to reach thermodynamically favorable conformations.

Posttranslationally Modified Proteins Might Not Renature

PDI helps proteins in a "scrambled" state renature, but it has little effect on native proteins because they are already in their stable conformations. However, some posttranslationally modified proteins need their disulfide bonds to stabilize a native form that is otherwise rather unstable. One example is insulin, a 51-residue polypeptide hormone whose two chains are linked by two disulfide bonds; insulin is inactivated by PDI. This observation led scientists to find that insulin is made from proinsulin, an 84-residue single-chain polypeptide. The disulfide bonds of proinsulin must be intact before it is converted into insulin by proteolytic excision of its C chain, an internal 33-residue segment. Two findings indicate, however, that the C chain does not dictate the folding of the A and B chains, but instead holds them together so that the disulfide bonds can form. First, under the right renaturing conditions, scrambled insulin can return to its native form with a 30% yield, and this yield increases if the A and B chains are cross-linked. Second, comparison of proinsulin sequences from many species shows that the C chain accepts mutations about eight times as often as the A and B chains.

Determinants of Protein Folding

Various interactions help stabilize the structures of native proteins, and it is important to examine how these interactions are organized. In addition, only a small fraction of possible polypeptide sequences can adopt a stable conformation, which suggests that evolution has selected specific sequences for use in biological systems.

Helices and Sheets Predominate in Proteins because They Efficiently Fill Space

On average, about sixty percent of the residues in a protein are in alpha helices and beta pleated sheets. Hydrophobic interactions allow a protein to form a compact nonpolar core, but they do not by themselves restrict the polypeptide to particular conformations. Polypeptide segments in coil conformations are no less hydrogen bonded than alpha helices and beta pleated sheets, which shows that the conformations of polypeptides are not dictated by hydrogen-bonding requirements. Ken Dill has suggested that helices and sheets arise from steric constraints in compact polymers. Experiments and simulations with simple flexible chains show that the proportion of beta pleated sheets and alpha helices increases as the chains become more compact. Helices and sheets are therefore important in protein structure because they fill space efficiently in a compact folded chain. The coupling of other forces, such as hydrogen bonding, ion pairing, and van der Waals interactions, further aids the formation of alpha helices and beta sheets.

Protein Folding is Directed by Internal Residues

Chemical modification of proteins can reveal the roles that different classes of amino acid residues play in folding. In one study, the free primary amino groups of RNase A were derivatized with 8-residue poly-DL-alanine chains. Even though these poly-Ala chains are bulky and water soluble, attaching them to all 11 of the RNase's free amino groups disrupted neither the native structure of the protein nor its ability to refold. Because these amino groups lie on the protein's exterior, this result indicates that the native conformation is determined mainly by the protein's internal residues. Furthermore, mutations of surface residues are common and are less likely to change the protein's conformation than mutations of internal residues. Since internal residues are predominantly hydrophobic, this finding suggests that protein folding is driven mainly by hydrophobic forces.

Protein Structures Are Hierarchically Organized

George Rose showed that protein domains consist of subdomains, which in turn consist of sub-subdomains, and so on. Large proteins thus have domains that are continuous, compact, and physically separable. If a polypeptide segment of a native protein is pictured as a tangled string, a plane can be found that divides the string into two parts. This can be shown by coloring the first n/2 residues of an n-residue domain red and the remaining n/2 residues blue: the red and blue regions do not interpenetrate, and this remains true at every stage as each half is divided again. The following link shows an X-ray structure of HiPIP (high potential iron protein) with the first and last n/2 residues of the n-residue protein colored red and blue. The structures in the second and third rows show this n/2 splitting repeated on successively smaller segments, with the first and last halves of each segment colored red and blue and the rest of the chain colored gray. This example shows that protein structures are organized hierarchically: the polypeptide chain can be viewed as a set of subdomains that are themselves compact structures and that interact with adjacent subdomains. These interactions, largely hydrogen bonding, build up a larger, well-organized structure, and this hierarchy is important for understanding how polypeptides fold into their native structures.
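
Rose's splitting procedure can be mimicked numerically. The sketch below is a simplified, hypothetical stand-in for his visual red/blue test, not his published method: it assumes one 3-D coordinate per residue (for example, Cα positions), divides a segment into its first and last halves, and counts how many residues lie on the "wrong" side of the plane that perpendicularly bisects the line joining the two halves' centroids. A score near 0 means the halves do not interpenetrate. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean position of a set of residues."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def interpenetration(points: Sequence[Point]) -> float:
    """Fraction of residues on the wrong side of the bisecting plane.

    The plane passes through the midpoint between the centroids of the first
    and second halves and is perpendicular to the line joining them.
    """
    half = len(points) // 2
    first, second = points[:half], points[half:]
    c1, c2 = centroid(first), centroid(second)
    normal = tuple(b - a for a, b in zip(c1, c2))
    mid = tuple((a + b) / 2 for a, b in zip(c1, c2))

    def side(p: Point) -> float:
        # Signed (unnormalized) distance from the plane; only its sign is used.
        return sum((pi - mi) * ni for pi, mi, ni in zip(p, mid, normal))

    wrong = sum(1 for p in first if side(p) > 0) + sum(1 for p in second if side(p) < 0)
    return wrong / len(points)


def split_report(points: Sequence[Point], min_size: int = 4,
                 depth: int = 0) -> List[Tuple[int, int, float]]:
    """Recursively halve a segment, recording (depth, length, score) at each level."""
    if len(points) < min_size:
        return []
    half = len(points) // 2
    report = [(depth, len(points), interpenetration(points))]
    report += split_report(points[:half], min_size, depth + 1)
    report += split_report(points[half:], min_size, depth + 1)
    return report


if __name__ == "__main__":
    # Toy coordinates: a straight 16-residue chain, whose halves never interpenetrate.
    chain = [(float(i), 0.0, 0.0) for i in range(16)]
    for depth, length, score in split_report(chain):
        print(f"depth {depth}: {length} residues, interpenetration {score:.2f}")
```

Using a centroid-bisecting plane keeps the test cheap; a real analysis would search for the best separating plane and use actual protein coordinates.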

Protein Structures are Adaptable

Because the side chains in the interior of a globular protein fit together with great complementarity, their packing density approaches that of organic crystals. To test whether this high packing density is an important determinant of protein structure, Eaton Lattman and George Rose looked for preferred interactions between side chains in globular proteins. After analyzing 67 well-studied globular protein structures, they found no preferred interactions. This result indicates that packing does not direct the native fold; rather, the native fold is what makes efficient packing possible. This view is further supported by the observation that members of a protein family can adopt the same fold despite little sequence similarity and only distant evolutionary relationships.

Structural data also show that a protein's internal residues can pack together efficiently in a variety of ways. In an extensive study of T4 lysozyme, an enzyme produced by bacteriophage T4, Brian Matthews found that residue changes caused only local shifts in structure and did not change the global structure. The following link gives an X-ray view of T4 lysozyme and a brief biochemical description of the structure. Matthews compared more than 300 mutants of the 164-residue T4 lysozyme with one another. T4 lysozyme was also found to tolerate insertions of about 4 residues without major changes in its overall structure or enzymatic activity. Furthermore, enzymatic assays showed that of 2015 single-residue substitutions in T4 lysozyme, only 173 significantly reduced its enzymatic activity. These experiments show that protein structures are remarkably tolerant of changes in sequence.

The Levinthal Paradox

Levinthal's paradox is a thought experiment in the theory of protein folding. In 1969, Cyrus Levinthal noted that, because an unfolded polypeptide chain has a very large number of degrees of freedom, the molecule has an astronomical number of possible conformations. One of his papers gave an estimate of 3^300, or about 10^143.

The paradox is that if a protein folded by sequentially sampling all possible conformations, it would take an enormous amount of time to do so, even if conformations were sampled at a very rapid rate. Because proteins actually fold much faster than this, Levinthal proposed that a random conformational search does not occur and that the protein must instead fold through a series of metastable intermediate states.

Levinthal calculated that a protein randomly sampling every possible conformation on its way from the unfolded state to the native state would need an astronomical amount of time, even if it sampled 100 billion conformations per second. Since proteins fold in a relatively short time, he proposed that folding is a fixed, directed process. We now know that although protein folding is not random, there does not seem to be a single fixed folding pathway. The paradox clearly shows that proteins do not fold by trying every possible conformation; instead, they must follow an at least partly defined folding pathway made up of intermediates between the fully denatured protein and its native structure.
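
The arithmetic behind the paradox can be checked directly. This minimal sketch uses the figures given in this section (an estimate of 3^300 conformations and a sampling rate of 100 billion conformations per second) and compares the resulting search time with the age of the universe, taken here as about 13.8 billion years. Working in base-10 logarithms keeps the numbers readable.

```python
import math

N_STATES = 3              # conformations per unit in the 3^300 estimate
N_UNITS = 300             # exponent in the 3^300 estimate
RATE_PER_SECOND = 1e11    # 100 billion conformations sampled per second
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600  # about 4.4e17 seconds

# log10 of the number of conformations: about 143.1
log10_conformations = N_UNITS * math.log10(N_STATES)
# log10 of the time for an exhaustive search, in seconds: about 132.1
log10_search_seconds = log10_conformations - math.log10(RATE_PER_SECOND)
# How many ages of the universe that search would take (log10)
log10_universe_ages = log10_search_seconds - math.log10(AGE_OF_UNIVERSE_S)

print(f"conformations: ~10^{log10_conformations:.1f}")
print(f"exhaustive search: ~10^{log10_search_seconds:.1f} s")
print(f"that is ~10^{log10_universe_ages:.1f} times the age of the universe")
```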

Cumulative Selection

The way out of the Levinthal paradox is to recognize the power of cumulative selection. Richard Dawkins asked how long it would take a monkey poking randomly at a typewriter to reproduce "Methinks it is like a weasel", Hamlet's remark to Polonius. A number of keystrokes on the order of 10^40 would be required. Yet if each correct character were preserved, so that the monkey retyped only the wrong ones, only a few thousand keystrokes, on average, would be needed. The crucial difference between these scenarios is that the first is a completely random search, whereas in the second, partly correct intermediates are retained. Likewise, the essence of protein folding is the retention of partly correct intermediates, although the protein-folding problem is much more difficult than the one posed to Dawkins's monkey.
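
Dawkins's comparison is easy to simulate. The sketch below assumes a 27-symbol typewriter (26 capital letters plus a space) and a fixed random seed for reproducibility; it counts one keystroke for every character typed, with correct characters locked in after each pass. The exact count varies with the seed and with how keystrokes are counted, so no particular number is claimed here.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # 27 symbols


def cumulative_search(target: str = TARGET, alphabet: str = ALPHABET, seed: int = 0):
    """Retype only the wrong characters until the target phrase appears.

    Returns (final_text, keystrokes, passes).
    """
    rng = random.Random(seed)
    current = [rng.choice(alphabet) for _ in target]  # first full attempt
    keystrokes, passes = len(target), 1
    while "".join(current) != target:
        for i, wanted in enumerate(target):
            if current[i] != wanted:  # correct characters are preserved
                current[i] = rng.choice(alphabet)
                keystrokes += 1
        passes += 1
    return "".join(current), keystrokes, passes


if __name__ == "__main__":
    text, keystrokes, passes = cumulative_search()
    possible_phrases = len(ALPHABET) ** len(TARGET)  # scale of a blind search
    print(text)
    print(f"cumulative selection: {keystrokes} keystrokes over {passes} passes")
    print(f"blind search: ~{possible_phrases:.2e} possible phrases")
```

The number of passes is reported as well, because counting every character of every pass (rather than only the retyped ones) gives a larger total.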

Nucleation-Condensation model

To understand the protein-folding problem correctly, we must consider certain characteristics of proteins. Proteins are only marginally stable: the free-energy difference between the folded and unfolded states of a typical 100-residue protein is 42 kJ mol−1, so each residue contributes on average only 0.42 kJ mol−1 to maintaining the folded state. This is less than the thermal energy at room temperature, about 2.5 kJ mol−1. Such meager stabilization means that correct intermediates, especially those formed early in folding, can be lost. Nonetheless, the interactions that lead to cooperative folding can stabilize intermediates as structure builds up. Thus, local regions that have a significant structural preference, though not necessarily stable on their own, tend to adopt their favored structures and, as they form, interact with one another, increasing stabilization. This conceptual framework for the protein-folding problem is called the nucleation-condensation model.
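
The per-residue figure is the folding free energy divided by the chain length, and the thermal energy is RT. This short check assumes a 100-residue chain (the length consistent with 42 kJ mol−1 giving 0.42 kJ mol−1 per residue), room temperature of 298.15 K, and R = 8.314 J mol−1 K−1.

```python
R = 8.314  # gas constant, J mol^-1 K^-1


def per_residue_stabilization(delta_g_kj: float, n_residues: int) -> float:
    """Average folding free energy contributed per residue (kJ/mol)."""
    return delta_g_kj / n_residues


def thermal_energy_kj(temperature_k: float = 298.15) -> float:
    """RT in kJ/mol."""
    return R * temperature_k / 1000


per_residue = per_residue_stabilization(42.0, 100)  # 0.42 kJ/mol
rt = thermal_energy_kj()                           # about 2.48 kJ/mol
print(f"per residue: {per_residue:.2f} kJ/mol, RT: {rt:.2f} kJ/mol")
print(f"RT / per-residue stabilization = {rt / per_residue:.1f}")
```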

Role of Intramolecular Interactions in the Folding Mechanism

Protein folding produces energetically favorable structures stabilized by the clustering of hydrophobic groups, hydrogen bonding, and van der Waals forces between amino acids. Folding first forms secondary structures such as alpha helices, beta sheets, and loops. Different amino acids tend to form alpha helices, beta sheets, or beta turns depending on their polarity and rotational barriers. For example, valine, threonine, and isoleucine tend to destabilize alpha helices because of steric hindrance, so they favor beta sheets instead. The relative frequencies of amino acids in secondary structures can be grouped according to their preferences for alpha helices, beta sheets, or turns (Table 1: Relative frequencies of amino acid residues in secondary structures). These secondary structures in turn fold into tertiary structures stabilized by intramolecular hydrogen bonds. Covalent bonds may also form during folding to a tertiary structure, through disulfide bridges or metal clusters. According to Robert Pain's "Mechanisms of Protein Folding", molecules often pass through an intermediate "molten globule" state before reaching their native conformation. This state forms by a hydrophobic collapse, in which the hydrophobic side chains suddenly move to the inside of the protein and cluster together. In this state, however, the main-chain NH and CO groups are buried in a nonpolar environment even though they prefer an aqueous one, so the secondary structures must fit together very well for stabilization by hydrogen bonding and van der Waals interactions to outweigh the preference of these polar groups for water.
The strengths of hydrogen bonds in a protein vary depending on their position in the structure; H-bonds formed in the hydrophobic core contribute more to the stability of the native state than H-bonds exposed to the aqueous environment.

Water-soluble proteins fold into compact structures with nonpolar, hydrophobic cores. The interior of such a protein contains nonpolar residues (e.g., leucine, valine, methionine, and phenylalanine), while the exterior contains mainly polar and charged residues (e.g., aspartate, glutamate, lysine, and arginine). The polar and charged groups can then interact with the surrounding water while the hydrophobic groups are shielded from the aqueous surroundings. Minimizing the number of hydrophobic side chains on the surface makes the structure thermodynamically more favorable, because hydrophobic groups in an aqueous environment tend to cluster together (the hydrophobic effect). Proteins that span biological membranes, such as porin, have an inside-out distribution relative to water-soluble proteins: their outer surfaces are covered with hydrophobic residues, and their water-filled centers are lined with charged and polar residues.

Folding of Membrane Proteins

In “Folding Scene Investigation: Membrane Proteins”, Paula J. Booth and Paul Curnow address how integral membrane proteins with α-helical structures fold. Membrane protein folding has always been difficult to study because these proteins are generally large and composed of more than one subunit. They possess a high degree of conformational flexibility, which they need to perform their functions in the cell. They also have both hydrophobic surfaces, which face the membrane, and hydrophilic surfaces, which face the aqueous regions on either side of the membrane. The proteins move laterally and share the elastic properties of the lipid bilayer in which they are embedded. To study these proteins, Booth and Curnow argue that one must manipulate the lipid bilayer and combine kinetic and thermodynamic methods of investigation.

Reversible Folding and Linear Free Energy: The free energy that governs reversible protein folding is measured by reversible chemical denaturation. The α-helical proteins studied were shown to fold by a reversible, two-state process. Bacteriorhodopsin (bR), an α-helical membrane protein, unfolds reversibly when SDS, an anionic detergent that acts as a denaturant, is added to mixed lipid/detergent micelles. The two states are a partly unfolded SDS state and the folded bR state. Plotting the logarithms of the unfolding and folding rate constants against the SDS mole fraction gave straight lines, demonstrating a linear free-energy relationship. These plots showed that bR is unexpectedly stable outside its membrane: so stable that it would not unfold within a reasonable period of time without added denaturant.
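
The linear free-energy analysis can be sketched as follows. Assuming a two-state reaction, ln k_u and ln k_f are each fitted with a straight line against SDS mole fraction and extrapolated to zero SDS; the unfolding free energy is then ΔG = −RT ln(k_u/k_f). The rate data below are synthetic, generated from chosen lines purely to exercise the code; they are not Booth and Curnow's measurements.

```python
from typing import Sequence, Tuple

R = 8.314e-3  # gas constant, kJ mol^-1 K^-1


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def unfolding_free_energy_at_zero_sds(
    sds_fraction: Sequence[float],
    ln_k_unfold: Sequence[float],
    ln_k_fold: Sequence[float],
    temperature_k: float = 298.15,
) -> float:
    """Extrapolate both rate constants to zero SDS; return dG_unfold in kJ/mol."""
    ln_ku0, _ = fit_line(sds_fraction, ln_k_unfold)
    ln_kf0, _ = fit_line(sds_fraction, ln_k_fold)
    # Two-state equilibrium constant for unfolding: K = k_u / k_f
    return -R * temperature_k * (ln_ku0 - ln_kf0)


# Synthetic example (hypothetical): ln k_u = -10 + 15x, ln k_f = 2 - 5x
xs = [0.5, 0.6, 0.7, 0.8]
ln_ku = [-10 + 15 * x for x in xs]
ln_kf = [2 - 5 * x for x in xs]
dg0 = unfolding_free_energy_at_zero_sds(xs, ln_ku, ln_kf)
print(f"extrapolated dG_unfold at zero SDS: {dg0:.1f} kJ/mol")
```

A positive extrapolated value means the folded state is favored in the absence of denaturant; real data would also carry fitting uncertainties that this sketch ignores.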

Comparison with Water-Soluble Proteins: Booth and Curnow studied the three membrane proteins about which the most information is available: bR, DGK (Escherichia coli diacylglycerol kinase), and KcsA (the Streptomyces lividans potassium channel). These were compared with water-soluble proteins, which fold by two- or three-state kinetics. In the absence of denaturant, the overall free energy change of unfolding was the same for water-soluble and membrane proteins of similar size. This suggests that a protein's stability is determined by the balance of weak forces rather than by the types of forces that stabilize it. Hydrogen bonds in the membrane proteins were also found to be of similar strength to those in water-soluble proteins, rather than stronger, as had been expected.

Mechanical Strength and Unfolding Under Applied Force: Dynamic force microscopy can be used to measure the mechanical response of a particular region of a protein to an applied force. The unfolding force in this case depends on the activation barrier and does not reflect the thermodynamic stability of the protein. When unfolded by an applied force, membrane proteins (especially bR) appear to follow Hammond behavior: as the energy difference between two consecutive states in the reaction is reduced, the states become more similar in structure.

Influence of Surrounding Membrane: Membrane proteins are strongly influenced by the membranes that surround them. When lipids are incorporated into detergent micelles, increasing the stability of the lipid structure, both the protein and its folding are stabilized. Different lipid compositions can result in different stabilities or folding behavior of membrane proteins, and the size of the membrane can also affect the protein. Different types of lipids give membranes different properties. For example, PE lipids have a higher spontaneous curvature than PC lipids, so adding PE lipids to PC lipids increases the monolayer curvature of the bilayer. Increasing the curvature of the lipid bilayer in this way increases the stability of the folded protein.

Protein translocation in biological membranes

In mitochondria, proteins made on cytosolic ribosomes are taken up directly from the cytosol. Mitochondrial proteins are first completely synthesized in the cytosol as mitochondrial precursor proteins and are then imported. These precursors typically contain a specific signal sequence at their N terminus. Such signal sequences are often removed after import, but proteins destined for the outer membrane, inner membrane, or intermembrane space often carry internal signal sequences that play a major role in their translocation.

Protein translocators carry out the transport of proteins across the mitochondrial membranes. Several multi-subunit protein complexes are found in the outer and inner membranes: the TOM complex is found in the outer membrane, and two types of TIM complex, TIM23 and TIM22, are integrated into the inner membrane. These complexes act as receptors for the mitochondrial precursor proteins.

TOM imports all nucleus-encoded mitochondrial proteins. It first transports the signal sequence into the intermembrane space and inserts transmembrane proteins into the outer membrane. The SAM complex then helps beta-barrel proteins fold properly in the outer membrane. TIM23, in the inner membrane, mediates the import of soluble proteins into the matrix and facilitates the insertion of transmembrane proteins into the inner membrane. TIM22, another inner membrane complex, facilitates the insertion of inner membrane proteins such as the transporters that move ADP, ATP, and phosphate across the membrane. OXA, yet another inner membrane complex, helps insert inner membrane proteins that are synthesized within the mitochondrion itself, as well as inner membrane proteins that were first transported into the matrix.
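
The import routes described above can be summarized as a lookup table. This is only a hypothetical bookkeeping sketch of the pathways named in the text: the destination labels and the `import_route` function are invented for illustration, and real sorting depends on the signals each precursor carries.

```python
from typing import Dict, List

# Complexes a protein passes through, in order, keyed by a hypothetical destination label.
IMPORT_ROUTES: Dict[str, List[str]] = {
    "outer membrane (beta-barrel)": ["TOM", "SAM"],
    "matrix (soluble)": ["TOM", "TIM23"],
    "inner membrane (via TIM23)": ["TOM", "TIM23"],
    "inner membrane (ADP/ATP/phosphate carrier)": ["TOM", "TIM22"],
    "inner membrane (via matrix)": ["TOM", "TIM23", "OXA"],
    "inner membrane (mitochondrially encoded)": ["OXA"],
}


def import_route(destination: str) -> List[str]:
    """Return the ordered list of translocation complexes for a destination."""
    try:
        return IMPORT_ROUTES[destination]
    except KeyError:
        raise ValueError(f"unknown destination: {destination!r}") from None


if __name__ == "__main__":
    for dest, route in IMPORT_ROUTES.items():
        print(f"{dest:45s} -> {' -> '.join(route)}")
```

Every nucleus-encoded route starts at TOM; only proteins made inside the mitochondrion bypass it.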

Folding on the Ribosome

Where a protein chain begins to fold is a topic of intense study. As the nascent chain passes through the "exit tunnel" of the ribosome into the cellular environment, when does it begin to fold? Here, the idea of cotranslational folding in the ribosomal tunnel is discussed. The nascent chain is bound at its C terminus to the peptidyl transferase centre (PTC) and emerges vectorially, N terminus first. The tunnel is very narrow and imposes a certain rigidity on the nascent chain, and the conformational space available to the protein grows with each amino acid added. Cotranslational folding can greatly reduce this conformational space by allowing the protein to acquire a significant degree of native structure while still in the ribosomal tunnel. The length of the chain may also indicate which structures it adopts: shorter chains tend to favor beta sheets, while longer chains (such as those reaching 119 of 153 residues) tend to favor the alpha helix.

The ribosomal tunnel is more than 80 Å long and about 10-20 Å wide. Auxiliary molecules such as the ribosomal proteins L23, L22, and L4 interact with the nascent chain in and around the tunnel and help with folding. The tunnel is also hydrophilic, which helps the nascent chain travel through it unhindered. Although rigid, the tunnel is not a passive conduit, but whether it can actively promote protein folding is unknown. A recent cryo-EM experiment showed that there are folding zones in the tunnel: at the exit port, some 80 Å from the PTC, the nascent chain has adopted a preferred low-order conformation. This reinforces the suggestion that the chain can fold to different degrees in certain regions. Although some low-order folding can occur, the native state is adopted outside the tunnel, though not necessarily only after the nascent chain has been released. The ribosome-bound nascent chain complex (RNC) adopts partially folded structures, which in a crowded cellular environment can cause chains to self-associate. This self-association is relieved, however, by the staggered arrangement of neighboring ribosomes, whose exit tunnels are positioned to maximize the distances between RNCs.

Generating RNCs for study:

One technique for generating RNCs and taking snapshots of the chain as it emerges from the tunnel is to arrest translation. A truncated DNA template lacking a stop codon is used, so that the nascent chain remains bound to the ribosome until desired. To identify residues of the chain, they can be labeled with carbon-13 or nitrogen-15 and later detected by NMR spectroscopy. Another technique is the PURE method, a reconstituted system that contains only the minimal components required for translation. This method has been used to study the interactions of nascent chains with auxiliary molecules such as the chaperone trigger factor (TF), and it has been coupled with the quartz-crystal microbalance technique to follow synthesis by mass. RNCs can also be generated in vivo in cells grown to high density. The cells are first grown in an unlabeled medium and then transferred to an isotopically labeled medium. Translation is stalled using the SecM arrest sequence, and the RNCs are purified by affinity chromatography and detected by SDS-PAGE or immunoblotting.

Generating RNCs makes many experiments on the emerging nascent chain possible. As mentioned above, the chain emerges from the exit tunnel vectorially, which allows it to sample native-like conformations as it emerges and increases the probability that it folds to the native state. Along with vectorial folding, chaperones also help to promote favorable folding rates and correct folding.

Protein Folding in the Endoplasmic Reticulum

Protein Entering the Mammalian ER: The endoplasmic reticulum (ER) is a main checkpoint for protein maturation, ensuring that only correctly folded proteins are secreted and delivered to their sites of action. Entry into the ER begins with recognition of an N-terminal signal sequence. Specifically, this sequence is detected by the signal recognition particle (SRP), which causes the ribosome/nascent chain/SRP complex to bind to the ER membrane. The polypeptide chain then passes through a proteinaceous pore, the Sec61 translocon, into the lumen of the ER.

Processes in Conflict During Protein Folding: After entering the ER, a protein exists as an ensemble of folding intermediates. These intermediates can take three routes: they are folded properly and exported from the ER along the secretory pathway, they aggregate, or they are selected for degradation. For a protein to be secreted properly, this competition between folding, aggregation, and degradation must favor folding, so folding must occur faster than the other processes. This balance is termed proteostasis. It can be tipped in favor of folding either by using small molecules, such as cofactors, that stabilize the protein or by increasing the concentrations of folding factors. The ability to control proteostasis may allow scientists to overcome some protein folding diseases, such as cystic fibrosis.
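
The kinetic competition can be illustrated with a simple model. Assuming, for illustration only, that folding, aggregation, and degradation each act on the folding intermediate as independent first-order processes (aggregation really depends on concentration, so its rate constant here is an effective, pseudo-first-order value), the fraction taking each route is its rate constant divided by the sum of all three. The rate constants below are hypothetical.

```python
from typing import Dict


def partition(k_fold: float, k_aggregate: float, k_degrade: float) -> Dict[str, float]:
    """Fraction of intermediates committed to each competing pathway.

    For parallel first-order reactions from one species, the branching
    ratio of each pathway equals k_i / sum(k).
    """
    total = k_fold + k_aggregate + k_degrade
    return {
        "folded": k_fold / total,
        "aggregated": k_aggregate / total,
        "degraded": k_degrade / total,
    }


baseline = partition(k_fold=1.0, k_aggregate=1.0, k_degrade=2.0)
# Raising the effective folding rate (e.g. more folding factors) tips the balance.
boosted = partition(k_fold=2.0, k_aggregate=1.0, k_degrade=2.0)
print(baseline)  # folded 0.25, aggregated 0.25, degraded 0.5
print(boosted)   # folded 0.4, aggregated 0.2, degraded 0.4
```

Doubling only the folding rate constant raises the folded fraction from 25% to 40% here, which is the sense in which proteostasis can be "tipped" toward folding.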

Properly folded proteins are ready for anterograde transport and are exported from the ER with the help of cargo receptors that recognize the folded protein. Incorrectly folded proteins are not secreted and are either targeted for degradation or aggregated. Aggregated proteins can re-enter the ensemble of folding intermediates and try again to fold properly.

Protein Folding in ER

Folding Factors in the Endoplasmic Reticulum:

Biochemical research on folding pathways has provided a comprehensive list of folding factors, or chaperones, involved with protein folding in the ER. Folding factors are categorized based on whether they catalyze certain steps or if they interact with intermediates in the folding pathway. General protein folding factors are typically separated into four different groups: heat shock proteins as chaperones or cochaperones, peptidyl prolyl cis/trans isomerases (PPIases), oxidoreductases, and glycan-binding proteins.

Many folding factors are multifunctional: a single folding factor can act at different points in the folding pathway. This also leads to redundancy, because different classes of proteins carry out overlapping functions. Such functional redundancy complicates the understanding of the specific roles of individual folding factors in the maturation of client proteins. Folding factors also tend to act in concert during maturation, which further obscures their individual roles. Because these roles are unclear, even if a folding factor is shown to handle a particular reaction in one protein, it is difficult to confirm that it carries out the same function in another.

In addition to aiding the non-covalent folding and unfolding of proteins, folding factors in the ER sometimes delay interactions with the protein. This gives nascent proteins time to fold properly and allows partly folded proteins to backtrack along their folding pathway, prolonging equilibrium in a less folded state and preventing the protein from becoming trapped in a non-native state.

Folding after the Endoplasmic Reticulum: Although the ER allows only correctly assembled proteins to be secreted, there are examples of proteins that change conformation in the Golgi apparatus and beyond. Typically, newly folded proteins are sensitive and prone to unfolding while in the ER but resistant to unfolding after they exit. Outside the ER, in an environment without chaperones and other folding enzymes, proteins are compact and relatively resistant to change. This does not necessarily mean that protein folding ends, however, because some molecular chaperones, such as Hsp70 and Hsp90, continue to assist protein conformation throughout the protein's lifetime.

Folding Factors in the ER and their Functions

Techniques for Studying Protein Folding

One strategy for studying protein folding is to unfold the protein in a high concentration of a chemical denaturant such as guanidinium chloride. The solution is then diluted rapidly until the denaturant concentration is low enough for the native state to be thermodynamically stable again, and the structural changes that occur as the protein folds are observed. This sounds simple in theory, but such experiments are complex, because unfolded proteins in chemical denaturants exist in random-coil states. Moreover, analyzing the structural changes taking place in a sample is difficult, because individual molecules may have significantly different conformations until the final stages of the reaction. The analysis must also be performed within seconds, rather than over the days or weeks normally allowed for determining the structure of a single conformation of a native protein. One way around this problem is to reduce the disulfide bonds after the protein is unfolded and allow them to re-form under oxidizing conditions. The resulting species can then be identified by standard techniques such as mass spectrometry to draw conclusions about the structures present at the stages of folding in which disulfide bonds form.
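
The dilution step is simple arithmetic: the dilution factor is the ratio of the starting and target denaturant concentrations. In this sketch, 6 M guanidinium chloride is a typical unfolding concentration, while the 0.5 M target and the 10 µL sample volume are hypothetical; the concentration at which a given protein's native state becomes stable has to be determined for that protein.

```python
def dilution_factor(c_initial: float, c_final: float) -> float:
    """Fold dilution needed to bring the denaturant from c_initial to c_final."""
    if not 0 < c_final < c_initial:
        raise ValueError("need 0 < c_final < c_initial")
    return c_initial / c_final


def buffer_to_add(sample_volume: float, c_initial: float, c_final: float) -> float:
    """Volume of denaturant-free buffer to add (same units as sample_volume)."""
    return sample_volume * dilution_factor(c_initial, c_final) - sample_volume


factor = dilution_factor(6.0, 0.5)     # 12-fold
extra = buffer_to_add(10.0, 6.0, 0.5)  # 110 µL of buffer for a 10 µL sample
print(f"{factor:.0f}-fold dilution; add {extra:.0f} µL of buffer")
```

The protein is diluted by the same factor as the denaturant.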

Multiple techniques are used to monitor structural changes during refolding. In circular dichroism, for instance, far-UV measurements track the appearance of secondary structure during folding, while near-UV measurements monitor the formation of the close-packed environment of aromatic residues. NMR is also useful for characterizing conformations at the level of individual amino acid residues, and it can be used to monitor how developing structure protects amide hydrogens from exchange with solvent.

Circular Dichroism: This type of spectroscopy measures the difference in absorption of left- and right-handed circularly polarized light. Protein structures such as alpha helices and beta sheets are chiral and therefore absorb the two forms differently, so the signal reports on how folded the protein is. The technique can also follow the equilibrium unfolding of a protein by measuring the change in signal as a function of denaturant concentration or temperature. A denaturant melt yields the free energy of unfolding, while a temperature melt yields the protein's melting temperature. Circular dichroism is among the most general and basic techniques for studying protein folding.
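
A denaturant melt is commonly analyzed with a two-state model: the signal is converted to a fraction unfolded, then to ΔG = −RT ln K at each denaturant concentration, and a straight line ΔG = ΔG(H2O) − m[D] is extrapolated to zero denaturant (the linear extrapolation method). The sketch below assumes flat native and unfolded baselines and uses synthetic signals generated from ΔG(H2O) = 20 kJ/mol and m = 5 kJ mol−1 M−1; these numbers are illustrative, not measured values.

```python
import math
from typing import List, Sequence, Tuple

R = 8.314e-3  # kJ mol^-1 K^-1
T = 298.15    # K


def fraction_unfolded(signal: float, native: float, unfolded: float) -> float:
    """Two-state fraction unfolded, assuming flat native and unfolded baselines."""
    return (signal - native) / (unfolded - native)


def delta_g(f_u: float, temperature_k: float = T) -> float:
    """Unfolding free energy from the equilibrium constant K = f_u / (1 - f_u)."""
    return -R * temperature_k * math.log(f_u / (1.0 - f_u))


def linear_extrapolation(denaturant: Sequence[float],
                         dgs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of dG = dG_water - m * [D]; returns (dG_water, m)."""
    n = len(denaturant)
    mx, my = sum(denaturant) / n, sum(dgs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(denaturant, dgs))
    sxx = sum((x - mx) ** 2 for x in denaturant)
    slope = sxy / sxx
    return my - slope * mx, -slope


# Synthetic melt: native baseline -10, unfolded baseline 0 (arbitrary CD units).
NATIVE, UNFOLDED = -10.0, 0.0
conc = [3.5, 3.75, 4.0, 4.25, 4.5]  # M denaturant, inside the transition region
signals: List[float] = []
for d in conc:
    k = math.exp(-(20.0 - 5.0 * d) / (R * T))  # K_unfold at this [D]
    signals.append(NATIVE + (k / (1 + k)) * (UNFOLDED - NATIVE))

dgs = [delta_g(fraction_unfolded(s, NATIVE, UNFOLDED)) for s in signals]
dg_water, m_value = linear_extrapolation(conc, dgs)
print(f"dG(H2O) = {dg_water:.2f} kJ/mol, m = {m_value:.2f} kJ/mol/M")
```

Only points within the transition region are used, because the fraction unfolded becomes too close to 0 or 1 to give reliable ΔG values outside it.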

Dual Polarization Interferometry: This technique uses the evanescent wave of a laser beam confined to a waveguide to probe protein layers that have been adsorbed onto the surface of the waveguide. Laser light is directed into two waveguides: a sensing waveguide with an exposed surface, and a reference waveguide that provides a reference beam, with measurements made in the polarization modes of the waveguides. When the light that has passed through the two waveguides is combined, a two-dimensional interference pattern is obtained in the far field. Measuring this interferogram makes it possible to calculate the density and thickness of the adsorbed protein layer and to infer structural information about molecular interactions at very high resolution.

Mass Spectrometry: Advantages of mass spectrometry for studying protein folding include its ability to distinguish molecules containing different amounts of deuterium, which allows the heterogeneity of folding reactions to be studied. It can also characterize the conformations of folding intermediates bound to molecular chaperones without disrupting the complex. Mass spectrometry can also compare refolding properties directly, because mixtures of proteins can be studied without separation if the proteins have sufficiently different molecular weights.

High Time Resolution: These are fast time-resolved techniques where a sample of unfolded protein is triggered to fold rapidly. The resulting dynamics are then studied. Ways to accomplish this include fast mixing of solutions, photochemical methods, and laser temperature jump spectroscopy.

Computational Prediction of Protein Tertiary Structure: Unlike the experimental techniques above, this approach analyzes protein structure by simulating the folding process itself. Such programs can simulate lengthy folding processes, evaluate statistical potentials, and reproduce folding pathways.
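
As a toy illustration of computational folding, the sketch below uses the two-dimensional HP lattice model of Lau and Dill, a deliberately simplified model rather than one of the structure-prediction programs described above. Each residue is either hydrophobic (H) or polar (P), the chain occupies adjacent sites of a square lattice without overlapping, and each non-bonded H-H contact lowers the energy by 1. Exhaustive enumeration is feasible only for very short chains, which also shows why realistic methods cannot search conformations blindly. The example sequence is hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Site = Tuple[int, int]
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n: int) -> Iterator[List[Site]]:
    """Yield every n-site self-avoiding walk on the square lattice (n >= 2).

    The first bond is fixed along +x to remove rotated duplicates.
    """
    def extend(path: List[Site], occupied: set) -> Iterator[List[Site]]:
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start))


def hp_energy(sequence: str, path: List[Site]) -> int:
    """-1 for every pair of H residues that touch on the lattice but are not chain neighbors."""
    index = {site: i for i, site in enumerate(path)}
    energy = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1  # j > i + 1 counts each contact once
    return energy


def fold(sequence: str) -> Tuple[int, Optional[List[Site]], int]:
    """Exhaustively search all conformations; return (best energy, best path, count)."""
    best_energy, best_path, count = 1, None, 0
    for path in self_avoiding_walks(len(sequence)):
        count += 1
        e = hp_energy(sequence, path)
        if e < best_energy:
            best_energy, best_path = e, path
    return best_energy, best_path, count


if __name__ == "__main__":
    energy, path, n_conf = fold("HPHPPHHPH")  # hypothetical 9-residue sequence
    print(f"searched {n_conf} conformations; lowest energy {energy}")
    print(path)
```

The number of conformations grows roughly exponentially with chain length, so this brute-force search quickly becomes impractical, echoing the Levinthal argument.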

Protein Misfolding

Protein misfolding refers to the failure of a protein to fold efficiently into its tightly packed native conformation, or its failure to maintain that conformation because of reduced stability caused by an environmental change or a mutation. Failure of protein folding is well established as a general phenomenon at elevated temperatures and under other stressful conditions. The two most common fates of misfolded proteins are degradation and aggregation. When a polypeptide emerges from the ribosome, it may fold to the native state, be degraded by proteolysis, or form aggregates with other molecules. Because proteins are in constant dynamic equilibrium, unfolding can occur in the cellular environment even after folding is complete. Unfolded proteins usually refold to their native states, but if control processes fail, misfolding leads to cellular malfunction and, consequently, disease. Diseases associated with misfolding cover a wide array of pathological conditions. In cystic fibrosis, for example, mutations in the gene encoding the cystic fibrosis transmembrane conductance regulator (CFTR) result in a conformer whose export is blocked by the cell's quality-control mechanisms. About 50% of cancers are associated with mutations of the p53 protein that lead to loss of cell-cycle control and thus to tumor growth. Failure of proteins to stay folded can result in aggregation, a common feature of a group of genetic, sporadic, and infectious conditions known as amyloidoses. Aggregation usually produces disordered species that can be degraded within the organism, but it can also produce highly insoluble fibrils that accumulate in tissue. About twenty known diseases result from the formation of amyloid material, including Alzheimer's disease, type II diabetes, and Parkinson's disease.
Amyloid fibrils are ordered protein aggregates with an extensive beta-sheet structure held together by intermolecular hydrogen bonds, and they have a similar overall appearance regardless of the protein from which they are derived. Amyloid fibrils form as a result of prolonged exposure to at least partially denaturing conditions.

An abnormal amount of plaques and tangles can kill surrounding neurons.

Alzheimer's: This neurological degeneration is caused by the accumulation of plaques and tangles in the nerve cells of the brain.[2] Plaques, composed almost entirely of a single protein, are aggregates of the protein beta-amyloid in the spaces between nerve cells, and tangles are aggregates of the protein tau inside nerve cells. Tangles occur in many neurodegenerative diseases, whereas neuritic plaques are more specific to Alzheimer's. Although scientists are unsure what role plaques and tangles play in the development of Alzheimer's, one theory is that these accumulated proteins impede the ability of nerve cells to communicate with each other and make it difficult for them to survive. Studies have shown that plaques and tangles form naturally as people age, but more are observed in people with Alzheimer's. The reasons for this increase are still unknown.

Creutzfeldt-Jakob Disease (related to Mad Cow Disease): This disease is caused by abnormal proteins called prions, which destroy brain tissue and leave hole-like lesions in the brain. Prions (proteinaceous infectious particles) were discovered to be proteins with an altered conformation. Scientists hypothesize that these infectious agents bind to other copies of the same protein and induce a change in their conformation as well, propagating new infectious proteins.[3] Prions are highly resistant to heat, ultraviolet light, and radiation, which makes them difficult to eliminate. Creutzfeldt-Jakob disease has an incubation period of years, followed by rapid progression of depression, difficulty walking, dementia, and death. Currently there is no effective treatment for prion diseases, and all are fatal.[4]

Parkinson's Disease: Mutations in the gene that encodes alpha-synuclein are the cause of some rare familial forms of Parkinson's disease. Three point mutations have been identified so far: A53T, A30P and E46K. Duplication and triplication of the gene may cause other familial forms of the disease. The primary symptoms of Parkinson's disease result from decreased stimulation of the motor cortex by the basal ganglia. This is normally caused by insufficient formation and action of dopamine, which is produced by the dopaminergic neurons of the brain. People with the disease lose brain cells (dopaminergic neurons die). This loss may be caused by abnormal accumulation of alpha-synuclein bound to ubiquitin in the damaged cells, which prevents the alpha-synuclein-ubiquitin complex from being directed to the proteasome. New research suggests that alpha-synuclein may cause the loss of dopaminergic neurons by disrupting protein transport between the endoplasmic reticulum and the Golgi apparatus.

Cystic Fibrosis: Francis Collins and colleagues first identified the hereditary mutation in 1989. The defect lies in the cystic fibrosis transmembrane conductance regulator (CFTR), a protein that regulates chloride ion transport across the cell membrane and thereby helps control salt levels and prevent bacterial growth; disease results when this function is disturbed.[5] In the most common mutation a single amino acid is deleted. The resulting defective protein prevents bacteria in the lungs from being killed, causing chronic lung infections that eventually lead to early death.[6] Scientists have used nuclear magnetic resonance (NMR) spectroscopy to study cystic fibrosis and its effects.

Normal and sickle-shaped red blood cells.

Sickle Cell Anemia: Sickle cell anemia is defined by sickle-shaped red blood cells that cling to the walls of narrow blood vessels and obstruct blood flow. The shortage of red blood cells in the bloodstream, together with the reduced oxygen-carrying capacity of the blood, causes serious medical problems. The disease appears only when two defective copies of the hemoglobin gene are inherited. The sickle shape forms when the mutant hemoglobin gives up its oxygen: the hemoglobin molecules form rod-like structures that make the red blood cells stiff. Symptoms include fatigue, shortness of breath, pain in any joint or organ lasting for varying amounts of time, eye problems that can lead to blindness, and yellowing of the skin and eyes due to the rapid breakdown of red blood cells. Fortunately, sickle cell anemia can be detected by a simple blood test using hemoglobin electrophoresis. Although there is no cure, blood transfusions, oral antibiotics, and hydroxyurea are treatments that reduce the pain it causes.[7]
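
Because the disease appears only when two defective copies are inherited (recessive inheritance), the risk for a child of two carrier parents can be worked out with a Punnett square. The sketch below is a minimal illustration. The allele labels "A" (normal) and "S" (sickle) and the function name are hypothetical choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Punnett square: each parent passes on one of its two alleles with equal probability."""
    # Sort alleles so "SA" and "AS" count as the same genotype
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())  # 4 combinations for two diploid parents
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

# Two carriers (sickle cell trait): each has one normal "A" and one sickle "S" allele
probs = offspring_genotypes("AS", "AS")
print(probs)
# The disease requires two "S" alleles
print("probability affected:", probs.get("SS", Fraction(0)))
```

For two carriers this gives one chance in four of an affected (SS) child and one in two of another carrier.
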

Huntington's Disease: A trinucleotide repeat disorder, Huntington's disease results from an expanded run of glutamine repeats in the huntingtin protein. The codon C-A-G encodes glutamine. Roughly 40 or more CAG copies result in Huntington's disease, whereas the normal number is between 10 and 35. After translation, small fragments of mutant huntingtin protein (mHTT) containing the polyglutamine expansion misfold and form inclusion bodies, which are toxic to brain cells. The precise effect of this altered huntingtin protein is not known, except that it impairs nerve cell function.[8] This incurable disease affects muscle coordination and some cognitive functions.
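
The repeat-length thresholds above can be applied to a DNA sequence by counting the longest uninterrupted run of CAG codons. The sketch below is minimal and makes several assumptions. It uses only the ranges stated in the text and leaves counts of 36 to 39 unclassified. It counts pure CAG runs, so a CAA codon (which also encodes glutamine) ends the run. It does not check the reading frame. The function names and toy sequences are hypothetical and are not real HTT sequence.

```python
import re

def longest_cag_run(dna: str) -> int:
    """Length, in codons, of the longest uninterrupted run of CAG repeats."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_repeat(n_cag: int) -> str:
    # Thresholds as stated in the text; 36-39 is not covered by those ranges.
    if n_cag >= 40:
        return "in the Huntington's disease range"
    if 10 <= n_cag <= 35:
        return "normal range"
    return "outside the ranges given in the text"

# Hypothetical toy sequences
normal = "ATG" + "CAG" * 20 + "CAACAGCCG"
expanded = "ATG" + "CAG" * 42 + "CAACAGCCG"
for seq in (normal, expanded):
    n = longest_cag_run(seq)
    print(n, classify_repeat(n))
```
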

Cataract in a human eye

Cataracts: Eye lenses are made up of proteins called crystallins, which have a jelly-like texture in the lens cytoplasm. Cataracts, currently the leading cause of blindness in the world, occur when crystallin molecules form aggregates that scatter visible light and make the lens of the eye cloudy. UV light and oxidizing agents are thought to contribute to cataracts because they can chemically modify crystallins. In children, deletion or mutation of αB-crystallin has been observed to promote cataract formation. The likelihood of developing cataracts increases exponentially with age. Pain, Roger H. (2000). Mechanisms of Protein Folding. Oxford University Press. pp. 420–421. ISBN 019963788. Retrieved 2009-10-18.

Amyloid Fibrils

Protein misfolding caused by impaired folding efficiency has two consequences. It reduces the number of protein molecules available to carry out their normal role, and it can lead to the formation of amyloid fibrils: aggregated protein structures with a cross-β structure that can give rise to numerous biological effects. Protein aggregation can result from several processes that occur after translation. These include an increased likelihood of degradation by the quality-control system of the endoplasmic reticulum (ER), improper protein trafficking, and conversion of specific peptides and proteins from their soluble functional states into highly organized fibrillar aggregates.


X-ray Crystallography

For X-ray crystallography, three-dimensional crystals of amyloid-forming peptides were grown, and the structure of the peptides and the way the molecules pack together were examined. In one particular fragment, the crystal was found to contain pairs of parallel β-sheets in which each peptide contributes a single β-strand. The β-strands are stacked to form parallel β-sheets. The side chains Asn2, Gln4 and Asn6 interact with each other so that water is kept out of the region between the two β-sheets, while the remaining side chains on the outside are hydrated and lie farther from the next β-sheet.

Solid State Nuclear Magnetic Resonance (SSNMR)

Solid-state nuclear magnetic resonance (SSNMR) was combined with other methods: computational energy minimization, electron paramagnetic resonance, site-directed fluorescence labeling, hydrogen-deuterium exchange, mass spectrometry, limited proteolysis and proline-scanning mutagenesis. Together these suggested that the structure of an amyloid fibril consists of four β-sheets separated by approximately 10 Å.

Using NMR with computational energy minimization, the 40-residue form of the amyloid β peptide at pH 7.4 and 24 °C was found to contribute one pair of β-strands, connected by a loop, to the core of the fibril. The amyloid β peptides are stacked on one another in a parallel fashion.

Experiments using site-directed spin labeling coupled to electron paramagnetic resonance (SDSL-EPR) showed that the molecules are highly structured in the fibrils and arranged in parallel. SDSL-EPR, together with hydrogen-deuterium exchange, mass spectrometry, limited proteolysis and proline-scanning mutagenesis, suggests that the N-terminal region is highly flexible and exposed to solvent, while the rest of the structure is rigid.

Experiments combining SSNMR with fluorescence labeling and hydrogen-deuterium exchange showed that the C-terminal region forms the core of the fibril. Each molecule contributes four β-strands: strands one and three form one β-sheet, and strands two and four form another β-sheet about 10 Å away.


Further SSNMR experiments approaching atomic resolution produced very narrow resonance lines in the spectra. This shows that the molecules within the fibrils are structurally uniform, with the peptides adopting extended β-strands within the fibrils.


The structures determined by X-ray crystallography and SSNMR were similar to structures previously proposed from cryo-electron microscopy (EM) of fibrils formed from insulin. EM, which produces electron density maps, revealed untwisted β-sheets in the structure. The similarity of the structures found in these experiments suggests that many amyloid fibrils share features such as side-chain packing, alignment of β-strands, and separation of the β-sheets.[9] (Annu. Rev. Biochem. 2006, 75:333–366. Retrieved 24 Oct 2011.)


The view that the ability to form amyloid structures is a generic property of polypeptide chains comes from the finding that a growing number of proteins with no connection to disease can form amyloid fibrils. It has also been found that, in living organisms, some proteins form amyloid as part of their normal function rather than as a disease-related feature.

Studies of protein mutations show that several factors affect amyloid fibril formation and that different chains form amyloid fibrils at different rates. For different polypeptide molecules, the rate of aggregation can be affected by:
- hydrophobicity and hydrophilicity
- changes in charge
- degree of exposure to solvent
- the number of aromatic side chains
- surface area
- dipole moment

For unstructured, non-homologous protein sequences, the aggregation rate has been found to depend on the protein concentration, the pH and ionic strength of the solution, and the amino acid sequence.

Increasing or decreasing the hydrophobicity of the side chains can change the tendency of a protein to aggregate.

Charge can promote aggregation through interaction of the polypeptide chain with other macromolecules around it. In addition, a low propensity to form α-helices together with a high propensity to form β-sheets favors amyloid formation.

The degree to which the protein sequence is exposed to solvent affects amyloid formation. Regions exposed to solvent seem to promote aggregation. Some regions with a high tendency to aggregate were nonetheless not involved in aggregation, and these tended to be at least partially buried. Other regions that were exposed to solvent but not involved in aggregation had a low tendency to form amyloid fibrils.

It has even been proposed that protein sequences have evolved to lower their tendency to aggregate by avoiding clusters of hydrophobic residues and long alternating patterns of hydrophobic and hydrophilic residues. [9]
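
The two sequence features mentioned above can be scanned for directly: runs of consecutive hydrophobic residues, and stretches where hydrophobic and hydrophilic residues strictly alternate. The sketch below is minimal. Its binary classification of residues (the set `HYDROPHOBIC`) is a simplifying assumption chosen for illustration, not a published scale. The function names and example sequences are hypothetical.

```python
# Hypothetical binary classification for illustration; published analyses use
# graded hydrophobicity scales instead.
HYDROPHOBIC = set("AVILMFWY")

def hp_pattern(seq: str) -> str:
    """Map each residue to 'h' (hydrophobic) or 'p' (polar/other)."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq.upper())

def longest_hydrophobic_run(seq: str) -> int:
    """Longest cluster of consecutive hydrophobic residues."""
    best = current = 0
    for c in hp_pattern(seq):
        current = current + 1 if c == "h" else 0
        best = max(best, current)
    return best

def longest_alternating_run(seq: str) -> int:
    """Longest stretch in which hydrophobic and polar residues strictly alternate."""
    p = hp_pattern(seq)
    if not p:
        return 0
    best = current = 1
    for prev, c in zip(p, p[1:]):
        current = current + 1 if c != prev else 1
        best = max(best, current)
    return best

# Hypothetical examples: one alternating pattern, one hydrophobic cluster
for seq in ("KVTVKVEVS", "GSLLIVFAGS"):
    print(seq, longest_hydrophobic_run(seq), longest_alternating_run(seq))
```
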

The Effects of Sequence on the Formation of Amyloid Proteins

Amyloid formation arises mostly from properties of the polypeptide chain that are common to all peptides and proteins, but the sequence can affect the relative stabilities of the conformational states of the molecules. As a result, polypeptide chains with different sequences form amyloid fibrils at different rates. Sequence differences affect the aggregation behavior of the protein rather than the stability of the protein fold. Various physicochemical factors affect the formation of amyloid structure by unfolded polypeptide chains.

Hydrophobicity of the side chains affects the aggregation of unfolded polypeptide chains. Mutations in the region of the aggregation site can change the propensity of a sequence to aggregate by increasing or decreasing hydrophobicity at the site of the mutation. Over time, sequences appear to have evolved to avoid clumps of hydrophobic residues by breaking up hydrophobic stretches.

Charge also affects amyloid aggregation. A high net charge can hinder self-association of the protein. Mutations that decrease the net charge can have the opposite effect on aggregate formation from mutations that increase it. It has also been found that polypeptide chains can be driven to aggregate by interactions with highly charged macromolecules, which shows the importance of charge in protein aggregation.

Secondary structure propensities also affect amyloid aggregation. Studies show that a low propensity to form α-helices and a high propensity to form β-sheets favor amyloid formation. However, β-sheet-promoting sequences are not particularly favored by nature: patterns of alternating hydrophilic and hydrophobic residues are relatively rare.

The characteristics of the amino acid sequence affect both the amyloid fibril structure and the rate of aggregation. Various mutations have been reported to change the aggregation rates of many polypeptide chains, including changes in the number of aromatic side chains, the amount of exposed surface area, and the dipole moment.

Unfolded regions play vital roles in promoting the aggregation of partially folded proteins. Regions that were flexible or exposed to solvent were prone to aggregation. Some regions not involved in aggregation were partially buried rather than exposed, even though they had a high propensity to aggregate. Exposed regions that were not involved in aggregation had a low propensity to form amyloid fibrils. Fibrils tend to assemble through association of unfolded polypeptide segments rather than by docking of structural elements.

Overall, unfolded proteins have been found to have lower hydrophobicity and higher net charge than folded proteins. Residues that disfavor β-sheet secondary structure seem to inhibit amyloid aggregation. Protein concentration, pH and ionic strength, together with the amino acid sequence, were found to affect the rate of aggregation.
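
The comparison above rests on two sequence-level quantities: average hydrophobicity and net charge. Both can be computed from a sequence alone. The sketch below makes two substitutions, which are assumptions for illustration. It uses the Kyte-Doolittle hydropathy scale as its hydrophobicity measure. It also uses a crude charge estimate near pH 7 that counts Lys/Arg as +1 and Asp/Glu as −1 and ignores His and the chain termini. The function names and example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982), swapped in for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle value over all residues (higher = more hydrophobic)."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)

def net_charge_per_residue(seq: str) -> float:
    """Rough net charge near pH 7: Lys/Arg +1, Asp/Glu -1; His and termini ignored."""
    seq = seq.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return (positive - negative) / len(seq)

# Hypothetical example sequences, not real proteins
examples = {"hydrophobic-rich": "MLIVAFGLVW", "charged-rich": "EKSEEKPKDE"}
for name, seq in examples.items():
    print(f"{name}: mean hydropathy {mean_hydropathy(seq):+.2f}, "
          f"|net charge| per residue {abs(net_charge_per_residue(seq)):.2f}")
```

On this view, a sequence with low mean hydropathy and a large absolute net charge per residue is one that would tend to stay unfolded.
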


Environmental Effects

It is understood that the primary structure (the amino acid sequence) of a protein predisposes the protein for a specific three dimensional structure and how it will fold from the unfolded form to the native state. The concentration of salts, the temperature, the nature of the primary solvent, macromolecular crowding, and the presence of chaperones are all factors that affect the mechanism of folding and the ratio of unfolded proteins to those in the native state. More than anything, these environmental factors affect the likelihood of any single protein reaching the correct final structure.

Isolated proteins placed in proper environments (specific solvent, solute concentrations, pH, temperature, etc.) tend to “self-fold” into the correct native conformation. Altering any of these environmental characteristics can disrupt the structure or interfere with the folding mechanism. A pH outside the “normal” range of a given protein can change the ionization of specific amino acids and interfere with ionic and dipole-dipole intramolecular interactions that would otherwise stabilize the structure. Excess heat (as in cooking) can break the hydrogen bonds that are essential to the secondary structure of proteins.
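
The effect of pH on the ionization of amino acid side chains follows from the Henderson-Hasselbalch equation. From pH = pKa + log10([A⁻]/[HA]), the fraction of a group in its protonated form is 1/(1 + 10^(pH − pKa)). The sketch below uses approximate textbook side-chain pKa values for free amino acids. These are an assumption: in a folded protein the local environment can shift them considerably. The function name is hypothetical. A protonated Asp side chain is neutral, while protonated His and Lys side chains carry a positive charge.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a titratable group in its protonated form."""
    return 1.0 / (1.0 + 10 ** (ph - pka))

# Approximate side-chain pKa values for free amino acids (textbook values)
SIDE_CHAIN_PKA = {"Asp": 3.9, "His": 6.0, "Lys": 10.5}

for ph in (7.4, 5.0):
    summary = ", ".join(
        f"{aa}: {fraction_protonated(pka, ph):.2f} protonated"
        for aa, pka in SIDE_CHAIN_PKA.items()
    )
    print(f"pH {ph}: {summary}")
```

Histidine is the clearest case. Near pH 7.4 its side chain is mostly neutral, but at pH 5 it is mostly protonated and positively charged, which can create or break electrostatic interactions in the folded structure.
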

Extreme environments or the presence of chemical denaturants (such as reducing agents that break disulfide bonds) can cause proteins to denature and lose their secondary and tertiary structure, forming a “random coil.” Under certain conditions, fully denatured proteins can return to their native state. Intentional denaturation is used in various methods for analyzing biomolecules.

The complex environments within cells often necessitate chaperones and other biomolecules for proteins to properly form the native state.

Protein is an essential part of every living thing, and the development of the human body depends on it. Yet proteins still hold many mysteries we have not solved, and protein folding is one of them. Folding is a necessary step for proteins: they must fold to carry out their biological activity, and every protein goes through folding to reach a stable conformation. Sometimes, however, this process goes wrong, and scientists call this problem protein misfolding. Misfolded proteins cause many serious diseases in humans and animals, such as Alzheimer's disease and mad cow disease. For this reason, research on protein folding and misfolding has become very important. In studying proteins, folding, misfolding and their effects, scientists have had many successes, and the mystery of protein folding is gradually being unraveled. In the article Unraveling the Mystery of Protein Folding, the scientist W. A. (Bill) Thomasson records many important findings about proteins. He discusses Alzheimer's disease and mad cow disease, some effects of protein misfolding, and the scientific progress made on them. Dr. Thomasson begins his article with a general introduction to protein folding and misfolding. First of all, proteins consist of sequences of amino acids, and scientists have identified 20 amino acids that appear in proteins. Protein secondary structure takes two basic shapes, the α-helix and the β-sheet. “Most of proteins probably go through several intermediate states on their way to a stable conformation” (Campbell and Reece, 79). Proteins need to fold to carry out their activity. Scientists describe three folding outcomes: a protein can be folded, partially folded, or misfolded.
During folding, the “proteins called chaperones are associated with the target protein; however, once folding is complete (or even before) the chaperone will leave its current protein molecule and go on to support the folding of another” (Thomasson). The author records Anfinsen's very important conclusion about protein misfolding: in his view, misfolding occurs when the folding process goes wrong. Research on protein misfolding has focused on temperature-sensitive mutations. Scientists studied bacteriophage P22 at different temperatures to reveal the effects of these mutations, and concluded that the mutant proteins are less stable than the normal ones. In other words, in the bacteriophage tailspike, misfolded proteins are less stable than correctly folded ones and have difficulty reaching the properly folded state. Protein misfolding can lead to many serious diseases. Aggregation can accompany misfolding, and many scientists believe that aggregation in the brain causes Alzheimer's disease and mad cow disease. One effect of protein misfolding on human life is Alzheimer's disease, a disease of the elderly. According to research, the disease arises when the amyloid precursor protein misfolds. This protein is processed into a soluble peptide, Aβ. Scientists do not yet know exactly what causes the disease, but a major factor implicated in the misfolding is apolipoprotein E (apoE), a protein in the bloodstream. ApoE exists in three forms: apoE2, apoE3 and apoE4. The effect of each form on Aβ has not yet been determined, but scientists believe that apoE can bind to Aβ. During misfolding, β-amyloid forms the “neuritic plaque in the Alzheimer's patient”.
The disease mainly affects older people because, in amyloid formation, a nucleus forms very slowly. The mutant protein is unstable and causes the disease. The role of apoE is still unclear: some scientists have found that one form of the protein promotes the disease while another form slows its development. Research on Alzheimer's disease continues, with the aims of confirming the effects of apoE on Aβ and finding a successful treatment. Another effect of protein misfolding is mad cow disease. This disease is very dangerous because it can be transmitted from animals to humans. It is caused by misfolded prions, and the misfolding of prions is self-replicating. Prions are protein particles that contain no DNA or RNA. When a prion protein misfolds, it replicates itself by causing normal copies of the protein to misfold as well. In this special situation, the protein can act as its own chaperone. Because of this self-replication, the misfolded prion multiplies very quickly as normal proteins are converted. The disease shows that protein misfolding can spread without any genetic change, as experiments on sheep have demonstrated. Dr. Thomasson continues his article with more information about misfolding and the ways scientists have approached the mystery. He describes the protein p53 and its mutations, which can cause cancer and are another type of protein misfolding. The point Dr. Thomasson wants to make is his idea of a drug that could make misfolding-prone proteins more stable and minimize protein misfolding. The idea seems promising, but its results remain as much a mystery as protein folding itself. Research on protein folding is very important to our lives.
Misfolding is one of the main causes of many dangerous diseases, yet we still have no successful treatment. The study of protein folding is becoming more and more successful in helping humans fight the diseases caused by misfolding. These diseases remain a problem for humanity that needs to be solved.

Molecular Chaperones

Molecular chaperones are known mainly for assisting the folding of proteins, but they are not involved only in the initial stages of a protein's life. They are also involved in producing, maintaining, and recycling the structure and subunits of proteins. Chaperones are present in the cytosol and also in cellular compartments such as the membrane-bounded mitochondria and endoplasmic reticulum. How necessary chaperones are to proper protein folding varies. Many prokaryotes have few chaperones and little redundancy among chaperone types, whereas eukaryotes have large chaperone families that include some redundancy. It is hypothesized that some chaperones are essential to proper protein folding. An example is the prokaryote, which has fewer variants of each chaperone family available. Other chaperones play a less essential role, as in eukaryotes, where more variants within a chaperone family exist and gradients of efficiency or affinity are produced. A chaperone may be redundant or less efficient under one set of conditions, but chaperone effectiveness also depends on the environment. pH, space, temperature, protein aggregation and other external factors may turn a chaperone that was once ineffective into a more essential one. These environmental factors show why it is important to reproduce in vivo cellular conditions in order to understand when chaperones are required. They also illustrate the difficulty of analyzing and comparing chaperone function in vivo versus in vitro. Simulating in vivo conditions, the environment within the cell, matters not only because of physical factors such as pH or temperature but also because of when the chaperone begins to act on the polypeptide. Some chaperones sit near the ribosome and attach to the polypeptide immediately to prevent misfolding.
Other chaperones allow the polypeptide to begin folding by itself and attach later. Thus the role of each chaperone depends on its proximity to the polypeptide and on the time and place at which it assists folding. Recent research suggests that chaperones within the nucleolus not only catalyze protein folding but also carry out other functions important for maintaining a healthy cell. These nucleolar chaperones are called Nucleolar Multitasking Proteins (NoMPs). Heat shock proteins, for example, not only help other proteins fold but also act during times of stress to regulate protein homeostasis. Furthermore, there is evidence that chaperones work together in networks to oversee responses to challenges such as toxins, starvation or infection.

The nucleolar chaperone network is divided into branches with specific functions. The network is dynamic: the concentration or location of its components can vary with changes in the physiology and environment of the cell. Heat shock proteins (HSPs), which are classified by molecular weight, are integral components of the chaperone network. HSP70s and HSP90s maintain proteostasis by ensuring that proteins are properly folded and by preventing proteotoxicity, the impairment of cell function caused by misfolded proteins. HSP70s help fold newly synthesized proteins, while HSP90s act later in the folding process. The nucleolar network also contains chaperones involved in ribosome biogenesis, the synthesis of ribosomes in the cell. Proteins of the HSP70 and DNAJ families, which help process pre-rRNA, are regularly found in the protein complexes that process pre-rRNA in Saccharomyces cerevisiae (a species of yeast). Other HSPs are also important in ribosome biogenesis, including HSP90, which works with TAH1 and PIH1 to assemble small nucleolar ribonucleoproteins. The nucleolar chaperone network provides the organization and assistance needed to complete the biological tasks necessary for cell survival, and many problems can arise if it does not function properly. For instance, cancer cells with increased levels of rRNA synthesis also have increased ribosome biogenesis. Scientists are researching the compound CX-3543, which can prevent nucleolin from binding to rDNA and impede rRNA synthesis, leading to cell death. It may therefore be possible to use drugs designed to target specific branches of the nucleolar chaperone network in malfunctioning cells. Other chaperone networks participate specifically in de novo protein folding, meaning they help fold newly made proteins, or in refolding proteins that have been damaged.
One chaperone network that exists in tumor cell mitochondria contains HSP90 and TRAP1, which protect the mitochondria and prevent cell death, allowing the cancer cells to continue to spread uncontrollably.[10]

Example: Molecular Chaperone (HSP 70)

HSP70 is a member of the heat shock protein family, as is HSP90. It works together with HSP90 to support protein homeostasis and binds to newly synthesized proteins early in the folding process. It has three major domains: the N-terminal ATPase domain, the substrate-binding domain, and the C-terminal domain. The N-terminal ATPase domain binds and hydrolyzes ATP. The substrate-binding domain has an affinity for stretches of neutral, hydrophobic amino acid residues up to seven residues long. The C-terminal domain acts as a lid for the substrate-binding domain; the lid is open when HSP70 is bound to ATP and closed when it is bound to ADP. DnaK is the bacterial HSP70, and it helps folding by clamping down on a peptide. [11]
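
The binding preference described above can be used to flag candidate HSP70 binding segments in a sequence. The sketch scans for runs of neutral hydrophobic residues and splits long runs into windows of at most seven residues, the stated size limit of the substrate-binding domain. Several details are assumptions made for illustration, not HSP70's actual recognition rule: the residue set `NEUTRAL_HYDROPHOBIC`, the minimum length `min_len`, and the function name. Real HSP70 binding motifs are more nuanced.

```python
import re

NEUTRAL_HYDROPHOBIC = "AVILMFW"  # hypothetical residue set for illustration
MAX_SITE_LEN = 7                 # substrate-binding domain covers up to ~7 residues (from the text)
_RUN = re.compile(f"[{NEUTRAL_HYDROPHOBIC}]+")

def candidate_sites(seq: str, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (0-based start, segment) for runs of neutral hydrophobic residues.

    Runs longer than MAX_SITE_LEN are split into consecutive windows of at most
    MAX_SITE_LEN residues; windows shorter than min_len are dropped.
    """
    sites = []
    for match in _RUN.finditer(seq.upper()):
        start, run = match.start(), match.group()
        for offset in range(0, len(run), MAX_SITE_LEN):
            chunk = run[offset:offset + MAX_SITE_LEN]
            if len(chunk) >= min_len:
                sites.append((start + offset, chunk))
    return sites

print(candidate_sites("KSLLVAIKDE"))        # hypothetical sequence
print(candidate_sites("D" + "L" * 10 + "K"))
```
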

Example: GroEL and GroES

GroEL and GroES (60 kDa and 10 kDa, respectively) are both bacterial chaperones. GroEL consists of two stacked rings, each with a hollow central cavity, and GroES forms a single ring that caps the cavity like a lid. The substrate protein fits in this hollow center, and conformational changes within the chamber can then change the shape and folding of the protein. [12]

Example: Molecular Chaperone (HSP 90)

HSP90 is a protein in the heat shock protein family. Unlike other chaperones, however, HSP90 plays only a limited role in protein folding. Instead, HSP90 is important to study because many cancer cells have taken it over and use it to survive in hostile surroundings. Structural studies of HSP90 aimed at developing HSP90 inhibitors could therefore offer a way to stop cancer cells from spreading. Many studies have also tested whether the HSP90 chaperone cycle is driven by ATP binding and hydrolysis or by some other factor. Research by Southworth and Agard provided evidence that HSP90 can change conformation without nucleotide binding. Instead, the stabilization of a conformational equilibrium determines whether HSP90 is in an open, closed, or compact state. The three conformations of HSP90 were identified by X-ray crystallography and single-particle electron microscopy. Comparing the three-state conformational changes of yeast HSP90, human HSP90 and bacterial HSP90 (HtpG) showed that these conformational changes differ between species. Overall, HSP90 is a chaperone more involved in maintaining homeostasis within the cell than in protein folding. Because it plays such an essential role in helping cancer cells survive, HSP90 has growing potential as a target for future drug development.
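
The idea that HSP90 shifts among pre-existing open, closed and compact states can be illustrated with equilibrium (Boltzmann) populations. All three states are populated without any bound nucleotide. Anything that lowers the free energy of one state shifts the populations toward it, without needing to drive the conformational change itself. The free-energy values below are invented for illustration, as is the size of the stabilization from a bound ligand. They are not measured HSP90 parameters.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)
T = 298.15    # temperature, K

def populations(free_energies: dict[str, float]) -> dict[str, float]:
    """Equilibrium (Boltzmann) populations from relative free energies in kJ/mol."""
    weights = {state: math.exp(-g / (R * T)) for state, g in free_energies.items()}
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}

# Hypothetical relative free energies of the three states without nucleotide
apo = {"open": 0.0, "closed": 4.0, "compact": 6.0}
# Hypothetical: a bound nucleotide or co-chaperone stabilizes the closed state by 8 kJ/mol
bound = {**apo, "closed": apo["closed"] - 8.0}

for label, energies in (("apo", apo), ("bound", bound)):
    pops = populations(energies)
    print(label, {state: round(p, 3) for state, p in pops.items()})
```
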

Example: Molecular Chaperone (TF)

Trigger factor (TF) is the first chaperone to interact with the nascent chain as it exits the ribosome tunnel. In the absence of a nascent chain, TF cycles on and off the ribosome. Once a nascent chain is present, TF binds to it, forming a protective cavity around it. To carry out its function, TF scans the nascent chain for exposed hydrophobic segments, and it can also re-associate with the chain. Folding is more efficient in the presence of TF, but at the expense of speed: TF can stay with the chain for more than 30 seconds. Release of the chain is triggered when the hydrophobic segments become buried as folding progresses toward the native state.

Example: Molecular Chaperone (YidC, Alb3, Oxa1)

YidC, Alb3, and Oxa1 are proteins that facilitate the insertion of proteins into membranes. YidC is a protein that has only two polypeptide chains, and the formation of its structure is supported by particular phospholipids. YidC proteins are found in both Gram-negative and Gram-positive bacteria. Oxa1 is found in the inner membrane of mitochondria, and Alb3 in the thylakoid membrane inside chloroplasts. Experiments showed that YidC actively contributes to the insertion of the Pf3 coat protein and makes direct contact with its hydrophobic segment. Although Oxa1 is found only in mitochondria, it can also facilitate the insertion of nuclear-encoded membrane proteins. The roles of YidC and Alb3 appear to be interchangeable, because Alb3 can replace YidC in E. coli. Moreover, YidC, Oxa1, and Alb3 all support the insertion of Sec-independent proteins. Oxa1 supports only the insertion of Sec-independent proteins because yeast mitochondria lack Sec proteins.


Nucleotide-binding domain and leucine-rich repeat (NLR) proteins provide a pathogen-sensing mechanism in both plants and animals. They can be activated, directly or indirectly, by pathogen-derived molecules through mechanisms that remain elusive. Research shows that molecular chaperones such as HSP90, SGT1, and RAR1 are the main stabilizing components of NLR proteins. HSP90 can regulate the function of its client proteins, including NLR proteins, in three ways: keeping their steady-state levels above a functional threshold, enabling stimulus-dependent activation, and raising their capacity to evolve.

Plants contain many NLR genes that are polymorphic in the LRR domain, which allows them to recognize highly diversified pathogen effectors. The stability of the NLR sensor is the mechanism that determines pathogen recognition. The HSP90 system benefits plants because it binds metastable NLR proteins and stabilizes them in a signaling-competent state. This allows mutations that would otherwise be detrimental to be masked.

Molecular Chaperone Mechanism for Substrate Binding in Protein Folding

Chaperones are known to work together to help proteins fold and to prevent misfolding. However, the mechanism by which they assist folding has not been fully understood. Recent studies on Hsp40 and Hsp70 have provided more insight into how chaperones interact with their substrates. The Hsp40 family has many members with different J-domains, and different J-domains stimulate Hsp70 ATPase activity in different ways when Hsp40 binds to Hsp70. During folding, an unfolded polypeptide first binds to an Hsp40 co-chaperone. The J-domain of Hsp40 then binds to the nucleotide-binding domain (NBD) of Hsp70. Hydrolysis of ATP to ADP in the Hsp70 NBD causes a conformational change in the Hsp70 substrate-binding domain. This gives Hsp70 a higher affinity for the polypeptide substrate, and the substrate is transferred from Hsp40 to Hsp70. When ADP is exchanged for ATP, the polypeptide substrate is released from Hsp70. Studies have shown that nucleotide exchange factors alter a lobe of the Hsp70 ATPase domain in a way that decreases Hsp70's affinity for ADP. Once the polypeptide is released from Hsp70, it can fold to its native state. If it misfolds, the chaperones can bind it again and refold it. If a polypeptide bound to Hsp70 is recognized by the E3 ubiquitin ligase CHIP, it is degraded.[13]
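The sequence of events above can be written as a simple state machine. This is a hypothetical, deterministic sketch. The step names are labels invented for this example. The number of rounds a substrate needs is an input, not something the model predicts. CHIP recognition is a yes/no flag rather than a competing rate.

```python
from enum import Enum, auto


class Step(Enum):
    BOUND_TO_HSP40 = auto()   # unfolded polypeptide held by an Hsp40 co-chaperone
    J_DOMAIN_DOCKED = auto()  # Hsp40 J-domain binds the Hsp70 nucleotide-binding domain
    HELD_BY_HSP70 = auto()    # ATP -> ADP hydrolysis: high-affinity Hsp70 takes the substrate
    RELEASED = auto()         # NEF-assisted ADP -> ATP exchange releases the substrate
    NATIVE = auto()           # released chain reaches its native state
    DEGRADED = auto()         # CHIP recognizes the Hsp70-bound chain


def chaperone_cycle(rounds_needed: int, chip_recognizes: bool = False) -> list[Step]:
    """Trace a substrate that folds correctly after `rounds_needed` release events.

    Every earlier release is treated as a misfolded chain that re-enters the cycle.
    """
    if rounds_needed < 1:
        raise ValueError("at least one round is required")
    trace: list[Step] = []
    for _ in range(rounds_needed):
        trace += [Step.BOUND_TO_HSP40, Step.J_DOMAIN_DOCKED, Step.HELD_BY_HSP70]
        if chip_recognizes:
            trace.append(Step.DEGRADED)  # degraded instead of released
            return trace
        trace.append(Step.RELEASED)
    trace.append(Step.NATIVE)
    return trace


print([step.name for step in chaperone_cycle(2)])
```

Modelling the cycle as an explicit trace keeps each step of the prose visible, at the cost of ignoring rates and competition between pathways.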

Small Heat Shock Proteins & α-crystallins as Molecular Chaperones

Small heat shock proteins (sHSPs) and the related α-crystallins (αCs) are virtually ubiquitous proteins. They are strongly induced by a variety of stresses but also function constitutively in multiple cell types in many organisms. Extensive research has demonstrated that most sHSPs and αCs can act as ATP-independent molecular chaperones. They bind denaturing proteins and keep them from aggregating irreversibly, which protects cells from damage. Many inherited diseases result from defects in sHSPs/αCs, and these proteins accumulate in neurodegenerative disorders and other diseases linked to aberrant protein folding. sHSP/αC proteins range in size from ~12 to 42 kDa and share a C-terminally located domain of ~90 amino acids, known as the αC domain. sHSP-substrate complexes can be observed by size-exclusion chromatography. They are large and heterogeneous, and their size distribution depends on the ratio of sHSP/αC to substrate. It also depends on the rate of substrate aggregation, which is affected by concentration and temperature. Substrate binding is generally facilitated by an increase in available hydrophobic surface on the sHSP/αC, which seems to occur without significant loss of defined secondary and tertiary structure. There is no single, specific substrate-binding surface on sHSPs/αCs. Instead, many sites appear to contribute to substrate interactions, and binding probably differs between substrates, depending on which surfaces are exposed when a substrate unfolds. However, some sHSPs/αCs recognize almost any unfolding protein, which suggests that they can act on any labile or damaged cellular component.

The Energy Landscape for Protein Folding

If proteins folded by a random, unguided search, reaching the native conformation would take far longer than it actually does. The current theory of how protein folding occurs naturally and efficiently involves a "funnel." Rather than a single step-by-step route to the correct 3-D structure, there are many paths that become progressively narrower from top to bottom. The funnel proceeds downward from energetically unfavorable conformations at the top to the energetically favorable, properly folded structure at the bottom.

The experiment that sparked the idea that proteins rely on energetics and thermodynamics to reach their native fold was conducted by Christian Anfinsen in 1961. He showed that denatured ribonuclease could spontaneously refold into its proper structure without the help of other molecules. Further theoretical evidence that protein folding is not random comes from Levinthal's paradox. It states that a random search would take roughly 10^81 years for a protein 100 amino acids long to reach the proper conformation, whereas in reality folding takes anywhere from a millisecond to a day.
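The size of Levinthal's estimate depends heavily on two assumptions: how many conformations each residue can adopt, and how quickly conformations are sampled. The worked example below uses illustrative assumptions of 3 conformations per residue and one conformation tried every 10^-13 s. It works in log10 space to keep the numbers manageable. With these assumptions it gives about 10^27 years rather than 10^81. Allowing 10 conformations per residue raises the figure to about 10^79.5 years. Under any of these assumptions, an exhaustive random search is impossible on biological timescales.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_log10_years(n_residues: int, states_per_residue: float = 3.0,
                          seconds_per_conformation: float = 1e-13) -> float:
    """log10 of the years needed to visit every conformation once, one at a time."""
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations + math.log10(seconds_per_conformation)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


print(round(levinthal_log10_years(100), 1))                         # about 27
print(round(levinthal_log10_years(100, states_per_residue=10), 1))  # about 79.5
```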

[Figure: Energy landscape]

Funnel models such as the Gō-type model depict funnels with hills and bumps, and the protein follows the path of least resistance as it moves down the energy funnel. These bumps are termed "points of frustration." Funnels with the fewest frustration points are believed to fold into their native forms faster, because fewer energy barriers stand in the way. These models are simplified and do not account for misfolding, but they nonetheless describe the folding of many proteins accurately.

Another computational approach uses empirical force fields. Some of these efforts harness the idle time of hundreds of thousands of computers to simulate the folding of proteins shorter than 50 amino acids with surprising accuracy. However, such models sometimes favor unlikely structures or produce folding patterns that are rarely or never observed. For example, some simulations and algorithms tend to get stuck in local minima and fail to reach the global minimum, which corresponds to the correctly folded protein. Simple models such as Gō-type models predict not only the folded protein but also the transition states that determine the folding rate.
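Frustration points and trapping in local minima can be illustrated on a toy one-dimensional funnel. In this hypothetical landscape, energy falls linearly toward the native state at position 0, and every fourth position carries a bump of adjustable height. A greedy search that only ever moves downhill stands in for an algorithm that cannot cross barriers. This is a cartoon of the landscape, not a force field or a Gō model.

```python
def energy(i: int, roughness: float) -> float:
    """Funnel energy: falls linearly toward the native state at i = 0.

    Positions with i % 4 == 2 carry a bump of height `roughness`
    (a stand-in for a point of frustration).
    """
    bump = roughness if i % 4 == 2 else 0.0
    return i + bump


def neighbours(i: int, n: int) -> list[int]:
    return [j for j in (i - 1, i + 1) if 0 <= j <= n]


def local_minima(n: int, roughness: float) -> list[int]:
    """Non-native positions strictly lower than all their neighbours (traps)."""
    return [i for i in range(1, n + 1)
            if all(energy(i, roughness) < energy(j, roughness) for j in neighbours(i, n))]


def greedy_descent(start: int, n: int, roughness: float) -> int:
    """Step to the lowest neighbour until no neighbour is lower; return the end point."""
    i = start
    while True:
        best = min(neighbours(i, n), key=lambda j: energy(j, roughness))
        if energy(best, roughness) >= energy(i, roughness):
            return i  # trapped in a local minimum, or at the native state
        i = best


for r in (0.0, 2.0):
    print(f"roughness={r}: traps={local_minima(20, r)}, "
          f"greedy search from 20 ends at {greedy_descent(20, 20, r)}")
```

With no roughness the greedy search reaches the native state at 0. With bumps of height 2, the landscape has five traps (positions 3, 7, 11, 15 and 19), and the search stops at the first one it meets, position 19.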

These models are only beginning to reveal the dynamics of the intermediate stages of protein folding, which remains an area of active investigation. The kinetics of protein folding are less well understood. So is the way a protein moves from the initial amino acid chain to the final product. The energy landscape model also has trouble accounting for external factors such as molecular crowding and aggregation. One such external interaction, called "domain swapping," involves swapping monomers from one protein to another, which can promote the correct folding of both proteins.

Recent studies have combined human and computer power to predict protein conformations correctly. One such website, overseen by the University of Washington's Computer Science department, turns the folding problem into a video game. People around the world can then solve protein folding problems like puzzles. Users are given partially folded proteins, usually ones stuck in a locally favorable conformation that appears optimal to a computer. They are asked to reconfigure each protein into a shape that looks more stable. Combining a computer's speed with humans' ability to manipulate objects in space shows promise for solving protein folding problems more efficiently.

Co-operativity and Protein Folding Rates

The cooperative nature of protein folding is one of its most remarkable aspects. The traditional view holds that folding involves complex and heterogeneous mechanisms. In contrast, the cooperative two-state folding kinetics shown by many proteins are relatively simple. Because of this simplicity, recent efforts to understand what determines co-operativity and the diversity of protein folding rates have applied two-state folding kinetics.

Co-operativity in protein folding usually refers to the mechanism by which the presence of one structured region makes additional ordering more favorable. As mentioned above, the cooperative two-state folding kinetics of small globular proteins are relatively simple and have become a focus of study for many scientists. Single-molecule experiments sensitive enough to estimate transition times have revealed two-state co-operativity.
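In a two-state scheme U ⇌ N with folding rate constant kf and unfolding rate constant ku, the fraction folded obeys df/dt = kf(1 - f) - ku·f. It therefore relaxes single-exponentially with observed rate kf + ku toward the equilibrium value kf/(kf + ku). The sketch below evaluates that solution. The rate constants are hypothetical values chosen for illustration.

```python
import math


def fraction_folded(t: float, kf: float, ku: float, f0: float = 0.0) -> float:
    """Two-state U <-> N kinetics: fraction folded at time t (s).

    kf, ku: folding and unfolding rate constants (1/s); f0: fraction folded at t = 0.
    """
    k_obs = kf + ku    # observed relaxation rate
    f_eq = kf / k_obs  # equilibrium fraction folded
    return f_eq + (f0 - f_eq) * math.exp(-k_obs * t)


# Hypothetical rate constants for a fast two-state folder.
kf, ku = 100.0, 1.0
half_time = math.log(2) / (kf + ku)
for t in (0.0, half_time, 0.1):
    print(f"t = {t:.4f} s  fraction folded = {fraction_folded(t, kf, ku):.3f}")
```

Only two parameters are needed to describe the whole time course. That simplicity is what makes two-state folders convenient for comparing folding rates across proteins.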

The general trends revealed by two-state folding proteins can be summarized in two points. First, more topologically complex proteins tend to fold more slowly than proteins with simpler, local topology. Second, larger proteins tend to fold more slowly than smaller ones. Here, the size of a protein is defined by its chain length.

Protein folding kinetics are controlled by a free energy barrier, which is determined by the gain of energy and the loss of entropy in the transition state. To describe this pattern, scientists invoke the principle of minimum frustration from energy landscape theory. The principle holds that native-like structures have lower free energy than random configurations during folding. Native-like structures therefore encourage fast folding and act as a driving force toward the native state, which is the functional form or tertiary structure of the protein. This principle can be represented by the funnel energy landscape.
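The barrier described above combines an energy term and an entropy term: ΔG‡ = ΔH‡ - TΔS‡. A negative ΔS‡ (entropy lost on reaching the transition state) raises the barrier, and the rate falls exponentially with its height. This minimal sketch assumes a simple Arrhenius-like form, k = k0·exp(-ΔG‡/RT). The prefactor k0 = 10^6 s^-1 and all transition-state values are placeholder assumptions, not measured quantities.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)


def barrier_free_energy(dh_ts: float, ds_ts: float, temp_k: float = 298.0) -> float:
    """Activation free energy dG = dH - T*dS in kJ/mol (dS in kJ/(mol*K))."""
    return dh_ts - temp_k * ds_ts


def folding_rate(dh_ts: float, ds_ts: float, temp_k: float = 298.0,
                 prefactor: float = 1e6) -> float:
    """Rate (1/s) = prefactor * exp(-dG/RT); the prefactor is an assumed attempt frequency."""
    return prefactor * math.exp(-barrier_free_energy(dh_ts, ds_ts, temp_k) / (R * temp_k))


# Hypothetical transition states: favorable energy, unfavorable entropy loss.
print(folding_rate(-20.0, -0.15))  # moderate entropy loss
print(folding_rate(-20.0, -0.20))  # larger entropy loss -> higher barrier, slower folding
```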

Funnel Energy Landscape

The funnel energy landscape depicts the energy landscape of a folding protein as a rough funnel. The roughness comes from non-native contacts formed during folding. The landscape is inherently many-dimensional, so the funnel is a projection onto a two-dimensional graph. The depth of the funnel represents the energy of a conformational state, and its width represents a measure of entropy. The bottleneck of the funnel represents the transition-state configurations of the folding protein, and the bottom represents the native state. As the protein moves toward its native state, it loses entropy and reaches lower energy. The funnel energy landscape gives scientists a convenient way to visualize the thermodynamics and kinetics of protein folding.

φ (phi) value

Another concept used in studying protein folding kinetics is the φ (phi) value, an approximate measure of how much native structure is present in the transition-state configuration. Comparing predicted and measured φ values is one way to test models of protein folding kinetics.
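A φ value is usually obtained from a mutation. It is the ratio of how much the mutation destabilizes the transition state to how much it destabilizes the native state, both measured relative to the unfolded state. With two-state rate constants, both changes can be written in units of RT, so RT cancels in the ratio. This sketch assumes two-state behaviour and wild-type and mutant rates measured under the same conditions. The rate constants are hypothetical.

```python
import math


def phi_value(kf_wt: float, ku_wt: float, kf_mut: float, ku_mut: float) -> float:
    """phi = ddG(TS - U) / ddG(N - U) for one mutation, from two-state rate constants."""
    ddg_ts = math.log(kf_wt / kf_mut)                       # TS destabilization, units of RT
    ddg_eq = math.log((kf_wt / ku_wt) / (kf_mut / ku_mut))  # N destabilization, units of RT
    if abs(ddg_eq) < 1e-3:
        raise ValueError("mutation barely changes stability; phi is ill-defined")
    return ddg_ts / ddg_eq


# Hypothetical mutants of a protein with kf = 100 /s and ku = 1 /s.
print(phi_value(100.0, 1.0, 10.0, 1.0))    # only folding slows: phi = 1 (native-like TS)
print(phi_value(100.0, 1.0, 100.0, 10.0))  # only unfolding speeds up: phi = 0
```

The guard against a near-zero denominator matters in practice. A mutation that barely changes stability gives an unreliable φ.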

General observations

The first trend can be understood from an entropic point of view. More topologically complex proteins, that is, proteins with long-range contacts, are expected to pay a higher entropic cost for folding than proteins with short-range contacts. The second trend was recently confirmed by experiments on the influence of protein size on folding rates. These experiments found that a simple model based only on chain length could roughly predict a protein's folding rate and stability.
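One widely used way to quantify the "long-range contacts" idea is relative contact order. It is the average sequence separation |i - j| of the residue pairs in contact in the native structure, divided by the chain length. This minimal sketch takes a contact list as given rather than computing contacts from coordinates. Both 20-residue contact maps are hypothetical.

```python
def relative_contact_order(contacts: list[tuple[int, int]], length: int) -> float:
    """Mean sequence separation of native contacts divided by chain length."""
    if not contacts:
        raise ValueError("at least one contact is required")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))


# Hypothetical 20-residue chains with the same number of contacts.
local_contacts = [(1, 4), (5, 8), (9, 12), (13, 16)]        # helix-like, i to i+3
long_range_contacts = [(1, 20), (3, 18), (5, 16), (7, 14)]  # hairpin-like, far apart in sequence
print(relative_contact_order(local_contacts, 20))       # 0.15
print(relative_contact_order(long_range_contacts, 20))  # 0.65
```

Under the first trend, the chain with the higher value would be expected to fold more slowly.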


Coarse-grained topology-based models (Gō models) are widely used to study the co-operativity and kinetics of protein folding, because the topology of the native protein is thought to determine the folding mechanism. A typical Gō model simplifies the protein so that only one type of interaction, between residues in contact in the native structure, stabilizes the folding chain. Early models often examined non-additive forces acting during protein folding, such as side-chain ordering and hydrophobic effects. More recently, a greater variety of Gō models has been used to study protein folding kinetics.
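The core simplification of a Gō-type model can be written in a few lines. Only contacts present in the native structure contribute energy, and non-native contacts are ignored. This sketch also computes Q, the fraction of native contacts formed, a common progress coordinate for folding. The contact maps and the unit contact energy epsilon are hypothetical, and real Gō models add chain connectivity, excluded volume, and dynamics.

```python
def _normalize(pairs) -> set[tuple[int, int]]:
    """Store each contact as (smaller index, larger index)."""
    return {tuple(sorted(p)) for p in pairs}


def go_energy(formed, native, epsilon: float = 1.0) -> float:
    """Go-type energy: -epsilon per native contact formed; non-native contacts score 0."""
    return -epsilon * len(_normalize(formed) & _normalize(native))


def fraction_native(formed, native) -> float:
    """Q: fraction of native contacts present in a conformation."""
    native_set = _normalize(native)
    return len(_normalize(formed) & native_set) / len(native_set)


# Hypothetical native contact map and one partially folded conformation.
native = [(1, 5), (2, 8), (3, 10)]
conformation = [(5, 1), (2, 8), (4, 9)]  # (4, 9) is non-native and is ignored
print(go_energy(conformation, native))    # -2.0
print(fraction_native(conformation, native))  # about 0.67
```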

  • The Gō model of Eastwood and Wolynes, which has nonpairwise-additive interactions between native contacts, demonstrates that short-range multi-body interactions can increase the free energy barrier and make the transition-state configuration more localized.
  • The lattice Gō model, on the other hand, demonstrates that coupling local and core-burial interactions promotes co-operativity and increases the correlation with contact order.
  • Gō models with pairwise-additive interactions, particularly those that examine how varying the strength of three-body interactions affects φ values, show that three-body interactions increase the energy barrier and improve agreement with measured φ values.
  • Solvent-mediated interactions have also been introduced into Gō models. When the interactions between contacts are replaced by a solvent-separated minimum and a desolvation barrier, the kinetics and co-operativity of protein folding increase with the height of the desolvation barrier. The advantage of the solvent-mediated Gō model is that it distinguishes short-range from long-range contacts. It can therefore differentiate proteins with simple topologies from those with more complex topologies. Studies of solvent-mediated Gō models often use the chevron plot, which represents protein folding kinetic data at varying concentrations of a denaturant that disrupts the native structure of the protein.
  • The variational Gō model improves co-operativity by excluding volume forces between residues that are in close contact in the native state. This model achieves (a) stronger co-operativity for long-range contacts, (b) a broader range of calculated rates, and (c) improved calculated φ values. There are also Gō models that focus entirely on the funnel aspect of the folding energy landscape and ignore the effects of non-native contacts.
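A chevron plot shows ln(k_obs) against denaturant concentration for a two-state folder, where k_obs = kf + ku. Denaturant slows folding and speeds unfolding, so the plot forms a V whose tip lies where kf = ku. This sketch assumes each rate constant varies exponentially with denaturant concentration, with kinetic m-values expressed directly in ln-units per molar. All parameter values are hypothetical.

```python
import math


def ln_k_obs(denaturant_m: float, kf0: float, ku0: float, mf: float, mu: float) -> float:
    """ln of the observed two-state relaxation rate at a denaturant concentration (M).

    kf0, ku0: folding/unfolding rate constants extrapolated to 0 M denaturant (1/s).
    mf, mu: kinetic m-values in ln-units per M (a simplifying assumption).
    """
    kf = kf0 * math.exp(-mf * denaturant_m)  # denaturant slows folding
    ku = ku0 * math.exp(mu * denaturant_m)   # and speeds unfolding
    return math.log(kf + ku)


def chevron_midpoint(kf0: float, ku0: float, mf: float, mu: float) -> float:
    """Denaturant concentration at which kf = ku (the tip of the chevron)."""
    return math.log(kf0 / ku0) / (mf + mu)


# Hypothetical parameters for a two-state folder.
params = dict(kf0=1000.0, ku0=0.01, mf=2.0, mu=1.0)
for d in (0.0, 2.0, 4.0, 6.0, 8.0):
    print(f"[D] = {d:.1f} M  ln k_obs = {ln_k_obs(d, **params):.2f}")
print("midpoint (M):", round(chevron_midpoint(**params), 2))
```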

Other models, such as the capillarity model, assume that the volume of folding nuclei scales with the number of monomers. Such models show that increased co-operativity tends to slow down kinetics and smooth the energy landscape.


Topological models with non-additive forces are becoming a more popular and reliable way to understand the co-operativity of protein folding rates. Refinement of these models promises a more explicit and thorough understanding of what determines protein folding rates and mechanisms. Gō models in which long-range contacts become more cooperative and φ values more accurate need further improvement and more attention in the study of protein folding kinetics and mechanism.

Relationship between Protein Sequence, Structure, and Function

There have been several protein prediction methods developed in the past 20 years. A universal method has not been developed that applies to all proteins because each method has its advantages and disadvantages. The difficulty of developing such a method is due to our incomplete understanding of the highly intricate relationship between protein sequence, structure, and function.

The link between a protein's amino acid sequence and its structure was demonstrated by Anfinsen, who showed that a denatured (unfolded) protein could spontaneously regain its native tertiary structure. This principle also helps in assigning function to protein structure. For example, a protein researcher could predict that hydrophobic substrates bind to hydrophobic regions of the protein, and likewise for charged regions. The problem with this approach is that it does not take into account factors such as atypical environmental conditions.

It was thought that similar sequences imply related structures, but this holds true only for a handful of proteins. Researchers found that similarities in protein folds are not always related to similarities in sequence. These findings led to the 'Paracelsus Challenge,' proposed in 1995. Its goal was to design two proteins that were more than 50% identical in sequence but had completely different folds. The challenge was met in 1997 with two protein sequences that shared 88% sequence identity (GA88 and GB88). Recent studies show that as few as three mutations are enough to induce different folding patterns. Although the outcomes of the Paracelsus Challenge are very interesting, such cases rarely occur in nature.
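Percent sequence identity, the quantity the Paracelsus Challenge is stated in, is simple to compute once two sequences have been aligned. This sketch assumes the alignment has already been made and skips positions where either sequence has a gap. Other tools use different denominators, so reported identities can differ. The fragments are hypothetical, not the GA88 and GB88 sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned positions to compare")
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


# Hypothetical aligned fragments differing at 1 of 10 positions.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
```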

Functional convergence makes it difficult to assign a specific function to a structure. Different structures can adopt similar functions, while some can adopt very different functions. Nevertheless, there is a significant correlation between certain folds and specific functions. There are two major variables in function prediction: (1) the location of the binding site and (2) the range of functions at that site. Metals, ions, cofactors, and other proteins that contribute to function must be taken into consideration as well. These factors can cause problems when a structure is determined by crystallography. The PROCOGNATE resource and the PIDA database offer a solution to this problem.

A widely used way of defining protein function is the Gene Ontology, which consists of three graph structures in which functional terms and the relationships between them are defined. Limitations of the Gene Ontology arise for proteins that are non-positional and for proteins with no defined relationship to the ligand in their crystal structure. Other developments that attempt to bridge this gap include the Protein Feature Ontology (PFO [29]) and the Distributed Annotation System (DAS [30]).
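The Gene Ontology graphs are directed acyclic graphs. A term can have more than one parent, and annotating a protein with a specific term implies annotation with all of that term's ancestors (GO documentation calls this the true path rule). This sketch shows that propagation. The miniature ontology is hypothetical: it is loosely modeled on molecular-function terms and its edges are not copied from the real Gene Ontology.

```python
# Hypothetical miniature ontology (child term -> parent terms); NOT the real GO edges.
PARENTS: dict[str, set[str]] = {
    "ATP binding": {"nucleotide binding", "anion binding"},  # two parents: a DAG, not a tree
    "nucleotide binding": {"small molecule binding"},
    "anion binding": {"binding"},
    "small molecule binding": {"binding"},
    "protein binding": {"binding"},
    "binding": {"molecular function"},
    "molecular function": set(),
}


def ancestors(term: str) -> set[str]:
    """All terms reachable by following parent edges (depth-first)."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations: set[str]) -> set[str]:
    """An annotation to a term implies annotation to every ancestor of that term."""
    implied = set(annotations)
    for term in annotations:
        implied |= ancestors(term)
    return implied


print(sorted(propagate({"ATP binding"})))
```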

Two approaches are used to determine a functional site: (1) with no knowledge of where the site is or what it binds, or (2) with prior knowledge of the interaction partner. The most widely used methods involve bioinformatics, such as the SOIPPA method. Sequence conservation is a very important contributor to assigning function to a protein, but it is difficult to determine whether residues are conserved for structural or functional reasons. Other methods use energy-based approaches. A recent development is the ProFunc server, which combines methods such as InterProScan and BLAST searches.
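A common way to turn "sequence conservation" into a number is the Shannon entropy of each column of a multiple sequence alignment. A value of 0 bits means every sequence has the same residue there, and higher values mean more variability. This sketch assumes a gap-free alignment. The four-sequence alignment is hypothetical. As noted above, a low-entropy column does not reveal whether the residue is conserved for structural or functional reasons.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy in bits of one alignment column (0 = fully conserved)."""
    counts = Counter(column)
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Entropy of every column in a gap-free multiple sequence alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


# Hypothetical four-sequence alignment of a three-residue motif.
alignment = ["AGA", "AGC", "ACD", "ACE"]
print(conservation_profile(alignment))  # [0.0, 1.0, 2.0]
```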

Predicting binding sites, an immensely complex task in itself, is only the first step of the puzzle. The next step is to determine the protein's overall biochemical function, and even more challenging is determining its biological role. Analyzing protein function became another order of magnitude more complex when researchers found that function does not always depend only on the final folded product. A protein can be functional in its partially denatured state and in its fully denatured state. There is clearly still a lot to learn about the relationship between the sequence, structure, and function of proteins.

Domain Swapping, Folding and Misfolding

Domain swapping may be important in the folding or misfolding of proteins. It occurs when two or more identical protein chains exchange part of their structure with each other, and it can be thought of as a mechanism for interchanging monomers and oligomers. In oligomeric swapping, a monomer unit from one protein is exchanged with the identical unit from a different protein. Domain swapping has been observed in more than 40 different proteins, and it is important for the function of some of them. In p13suc1, for example, swapping and aggregation appear to be correlated, which suggests that they share a common mechanism. p13suc1 is required for cyclin-dependent kinase (Cdk) function during cell-cycle progression. It has two states: a monomer and a swapped dimer. The swapped element is a β-strand rather than an independently folded domain. Studies of p13suc1 found that strand β4 plays a critical role through its contact with β2, because the two pair with each other early in the folding process. For p13suc1, therefore, the interchanged regions have been shown to be responsible for both folding and misfolding. There seems to be a competition between folding and misfolding, because polypeptide chains can either fold into native structures or misfold into amyloid fibrils. Even more crucial for protein folding is the presence of a folding nucleus, the part of the protein chain that is structured in the transition state. A correlation has been found between the residues that form folding nuclei and amyloidogenic regions. This suggests that fibril formation and protein folding may share key residues.
By modeling protein folding and examining the exchangeable regions in the oligomeric form, researchers can see how these regions are responsible for folding and misfolding. This may bring researchers one step closer to solving the protein folding problem and understanding how proteins get their folding instructions.

Death-fold Superfamily[14]

There are four subfamilies in the death-fold superfamily: Death Domains (DDs), Death Effector Domains (DEDs), Caspase Recruitment Domains (CARDs), and Pyrin Domains (PYDs). These subfamilies are involved in the assembly of multimeric complexes that may be implicated in inflammation and cell death.

Structure and Function of a Death-Fold Domain

There are currently 102 known proteins with death-fold superfamily domains: 39 DDs, 8 DEDs, 33 CARDs, and 22 PYDs. These domains engage in homotypic interactions. Although they differ by up to 90% in sequence, they all share the characteristic death fold. This fold consists of a "globular structure where 6 amphipathic alpha-helices are arranged in an anti-parallel alpha-helix bundle with Greek key topology" (Peter Vandenabeele et al, 2012). The subfamilies differ in the length and orientation of their alpha-helices and in the distribution of hydrophobic and charged residues along their surfaces.

Death-fold domains are believed to mediate the assembly of large oligomeric signaling complexes, in which caspase and kinase activities are increased. Until recently, little was known about the structural conformation of protein assemblies containing death-fold domains.

Three distinct Interaction Types

  • Type I interaction: residues from helices 1 and 4 (Patch Ia) of one death-fold domain interact with residues from helices 2 and 3 (Patch Ib) of another death-fold domain.
  • Type II interaction: residues from helix 4 and the loop between helices 4 and 5 (Patch IIa) of one death-fold domain interact with residues of the loop between helices 5 and 6 (Patch IIb) of another death-fold domain.
  • Type III interaction: residues from helix 3 (Patch IIIa) of one death-fold domain interact with residues located on the loops between helices 1 and 2 and between helices 3 and 4 (Patch IIIb) of another death-fold domain.

Previous theory suggested that the three interaction types were conserved throughout the death-fold superfamily, but it now appears that interactions of the same type differ between death-fold domains.

Crystal Analysis of Death-Fold Domains

Only three DD complexes have had their crystal structure analyzed. They are PIDDosome, MyDDosome, and the Fas/FADD-DISC. The analyses of these structures have shown that DDs can engage in up to six interactions.

Death-Domains and Medicine

Death domains have been shown to facilitate the assembly of multimeric complexes that lead to inflammation and cell death. Understanding these structures could yield therapeutic benefit by preventing or triggering the formation of these oligomeric complexes. Diseases that may be affected by these interactions include neurodegenerative and inflammatory disorders, as well as many others characterized by inflammation or excessive cell death.

Disordered Proteins

While folding is typically a major contributor to protein function, some proteins do not fold into a specific structure, yet still possess a function. Instead of a specific structure, these proteins often shift between different forms and/or have disordered regions that do not hold to a particular shape.

Just as a protein's fold is determined by its amino acid sequence, non-folding proteins fail to fold because of their sequence. These proteins tend to contain much less of certain amino acids than folding proteins, and much more of others. Specifically, non-folding proteins contain fewer of the amino acids that form the hydrophobic cores of folding proteins and more of the amino acids typically found on the surface. Formation of a hydrophobic core is one of the first steps in most protein folding, and once formed, the core tends to provide the driving force toward a stable final structure. Without the amino acids needed to form a core, proteins are not driven toward a specific structure.
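The compositional argument above can be illustrated by counting how many residues in a sequence belong to a set of core-forming hydrophobic amino acids. This is only a composition count, not a disorder predictor. The residue set (V, I, L, M, F, W, C) is an illustrative assumption, and both sequences are hypothetical.

```python
# Illustrative assumption: residues that typically build hydrophobic cores.
CORE_FORMING = set("VILMFWC")


def core_forming_fraction(seq: str) -> float:
    """Fraction of residues in seq that belong to the assumed core-forming set."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(aa in CORE_FORMING for aa in seq.upper()) / len(seq)


# Hypothetical sequences: a hydrophobic-rich stretch vs. a charged, polar-rich one.
folded_like = "MKVLIAFGLVCWLAIV"
disordered_like = "SEPKKESGSPEQKDSEGKP"
for name, seq in (("folded-like", folded_like), ("disordered-like", disordered_like)):
    print(name, round(core_forming_fraction(seq), 2))  # 0.75 and 0.0
```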

Concept

Several molecular chaperones that are fully folded and inactive under non-stress conditions are known as conditionally disordered proteins. These chaperones become partially disordered when exposed to specific stress conditions. This disorder is important because it enables them to protect cells against stressors. The study of these disordered chaperones has led to a better understanding of the functional role of protein disorder in molecular recognition. X-ray crystallography is a useful technique for visualizing protein structures. Yet only about 25% of crystal structures represent more than 95% of the entire molecule. All the others lack electron density for more than 5% of their sequence, because those regions adopt multiple conformations. Protein structural states lie on a continuous spectrum from very flexible to static, with disordered proteins at one extreme. Disorder can affect only part of a protein or the whole polypeptide chain, so investigating only some parts of a protein may not capture its overall flexibility. The term "conditionally disordered" means that a protein may be disordered under some conditions but not under others. Intrinsic disorder is very common in proteins. For example, between 30% and 50% of eukaryotic proteins are estimated to contain stretches of more than 30 amino acids that lack defined secondary structure in vitro. Many completely unstructured proteins have also been predicted to exist. Despite the many computational methods that have been used, it is still very challenging to verify the folding status of proteins inside cells. Many proteins that appear partially or fully folded may in fact be unstructured in cells, but how many is still uncertain.
However, the presence of appropriate binding partners is thought to bring disordered proteins into their folded state. This means that the fraction of proteins that are intrinsically disordered in vitro might be lower in the cell, where stabilizing interactions could reduce the extent of disorder. Through chemical shift, residual dipolar coupling, and paramagnetic relaxation enhancement measurements, NMR provides detailed information on the extent of protein disorder.


Disordered proteins can exist in two states: one with a high degree of flexibility, and one in which the protein is more ordered. To understand the cause-and-effect relationships between disorder and function, it is therefore essential to study both states. Many disordered proteins fold once they find a partner to bind, such as DNA, another protein, or a membrane. Order-to-disorder-to-order transitions can also occur. Proteins that bind multiple partners are very good examples of conditional disorder. Binding surfaces that are disordered before binding can adopt distinct conformations with different partners more readily than surfaces that are already well organized. The 'conformational selection hypothesis' suggests that different partners bind and stabilize different members of a conformational ensemble. In contrast, the 'folding upon binding' model proposes that proteins fold into different conformations when they bind different partners.


Predictions done on whole proteomes suggest that the frequency of disordered proteins in eukaryotes is much larger than in prokaryotes, with the frequencies in the two groups of prokaryotes, archaea and eubacteria, being similar. In mammals, about half of all proteins are predicted to have large unordered regions, with about a quarter being fully disordered.


Disordered proteins are prevalent in signaling and regulation, especially in interactions with biomolecules such as nucleic acids and other proteins. Molecular recognition and protein assembly and modification frequently involve proteins with disordered regions. The ability of these proteins to interact with multiple molecular partners means that they are also common in protein-protein networks, either as hub proteins or as proteins interacting with hub proteins.


Disordered proteins are implicated in a number of human diseases. In particular, the amyloid diseases, which involve the accumulation of misfolded proteins, seem to be associated with disordered proteins, probably because their variable regions make them more likely to have a structure that favors their accumulation. This category includes many neurodegenerative diseases, such as Alzheimer's and Parkinson's.

The Role of Computers in Determining Structure and Function of Proteins

The structure, or folding, of a protein, and by extension its function, can be analyzed and compared through its primary structure, or amino acid sequence, using computer algorithms. Computers make it much easier to compare amino acid sequences of unknown fold with similar sequences whose folds are known. The Protein Basic Local Alignment Search Tool, or protein BLAST, is a free, publicly available search tool for quickly comparing amino acid sequences against an online database. Its output includes the percentage of matching amino acids and the known properties of the matched sequences. Furthermore, amino acid sequences are encoded by DNA sequences, with three bases coding for one amino acid. The protein under scrutiny can therefore also be analyzed at the DNA level using DNA BLAST. Integrating public databases of amino acid and DNA sequences with computer algorithms has accelerated the genome and proteome fields by allowing scientists around the world to share and analyze sequences.
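BLAST gets its speed from a seeding step: it looks for short words shared by the query and database sequences before attempting any alignment. This minimal sketch finds only exact shared words of length k, using a dictionary index of the subject sequence. Real BLAST also scores similar, non-identical words, extends seeds into alignments, and assesses statistical significance, none of which is shown here. The word length k = 3 and both sequences are illustrative assumptions.

```python
def kmer_seeds(query: str, subject: str, k: int = 3) -> list[tuple[int, int, str]]:
    """Exact shared k-letter words as (query position, subject position, word), 0-based."""
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)  # word -> positions in subject
    hits = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        for j in index.get(word, []):
            hits.append((i, j, word))
    return hits


# Hypothetical short sequences sharing one three-residue word.
print(kmer_seeds("MKVLA", "GMKVQ"))  # [(0, 1, 'MKV')]
```

Indexing the subject once makes each query lookup a dictionary access. That is the basic reason word seeding is much faster than aligning every pair of positions.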


The Role of Computers

The scientists credited with creating the BLAST program are Webb Miller, David J. Lipman, Warren Gish, Eugene Myers, and Stephen Altschul from the NIH.

Molecular Chaperones

Pain, Roger H. Mechanisms of Protein Folding. 2nd ed. pp. 364-85.

The Energy Landscape for Protein Folding

Cho, Samuel S. "Energy Landscapes for Protein Folding, Binding, and Aggregation: Simple Funnels and Beyond." UCSD Dissertation (2007).

Cheung, Margaret S. "Energy Landscape Aspects of Protein Folding Dynamics Relevant to Molecular Functions." UCSD Dissertation (2003).

Yang, Sichun. "Extending the Theoretical Framework of Protein Folding Dynamics." UCSD Dissertation (2006).

Intramolecular Interactions

Pain, Roger H. "Mechanisms of Protein Folding" 2nd ed.

Berg. "Biochemistry." 6th ed.

Co-translational protein folding

In silico modeling studies have helped identify several characteristics of the co-translational folding pathway. First, in vivo protein folding was found to be a vectorial process, meaning that it proceeds in one direction as the chain is made. Second, co-translational vectorial folding of the growing polypeptide from its N-terminal end to its C-terminal end results in the sequential structuring of distinct regions of the polypeptide as they emerge from the ribosomal tunnel. Third, attachment of the growing polypeptide chain to the ribosome during protein synthesis reduces the conformational space and the degrees of freedom of the chain. This limits the number of possible intermediates and reduces the number of possible folding pathways. Fourth, co-translational protein folding begins early in polypeptide chain synthesis on the ribosome, with some structural elements forming inside the ribosomal tunnel. Fifth, folding catalysts and molecular chaperones interact with the growing chain as soon as it emerges from the tunnel. This accelerates the slow steps in protein folding and helps prevent misfolding.


  1. Berg, Jeremy, Tymoczko, J., Stryer, L. (2012). Protein Composition and Structure. Biochemistry (7th ed.). W.H. Freeman and Company. ISBN 1-4292-2936-5
  2. "Alzheimer's Disease". Ohio State University Medical Center. 2009. Retrieved 2009-10-09. 
  3. Lindquist, Susan (1999). "What is a Prion?". Retrieved 2009-10-09. 
  4. "Mad Cow Disease and Variant Creutzfeldt-Jakob Disease". eMedicine Health. Retrieved 2009-10-09. 
  5. Thomasson, W.A. "Unraveling the Mystery of Protein Folding". Retrieved 2009-10-18. 
  6. "Folding Away Cystic Fibrosis". Retrieved 2009-10-18. 
  7. "Genetic Disease Profile: Sickle Cell Anemia". Retrieved 2009-10-18. 
  8. "The Basics of Huntington's Disease". Retrieved 2009-10-18. 
  10. Piotr Banski, Mohamed Kodiha and Ursula Stochaj (2010). "Chaperones and multitasking proteins in the nucleolus: networking together for survival?". Retrieved 2010-10-16. 
  11. Joan L. Slonczewski, John W. Foster. "Microbiology: An Evolving Science."
  12. Joan L. Slonczewski, John W. Foster. "Microbiology: An Evolving Science."
  13. Summers, Daniel W., and Peter M. Douglas (2009). "Polypeptide Transfer from Hsp40 to Hsp70 Molecular Chaperones". Retrieved 2010-10-24. 
  14. Kersse K, Verspurten J, Vanden Berghe T, Vandenabeele P. The death-fold superfamily of homotypic interaction motifs. Trends in Biochemical Sciences. 2011;36(10):541–52. Accessed October 29, 2012.

12. Small heat shock proteins and α-crystallins: dynamic proteins with flexible functions. Basha E, O'Neill H, Vierling E. Trends Biochem Sci. 2012 Mar;37(3):106-17. Epub 2011 Dec 14

Bardwell JC, Jakob U. Conditional disorder in chaperone action. Trends Biochem Sci. 2012 Sep 24. pii: S0968-0004(12)00127-2. doi: 10.1016/j.tibs.2012.08.006. PMID: 23018052.

Alberts, Johnson, Lewis, Raff, Roberts, Walter. "Molecular Biology of the Cell." 5th ed. pp. 716-717.

Braakman, Ineke, and Neil J. Bulleid. "Protein Folding and Modification in the Mammalian Endoplasmic Reticulum." Annual Review of Biochemistry 80 (2011): 71-99. Web. 29 Oct. 2011.

Cabrita LD, Dobson CM, Christodoulou J. Protein folding on the ribosome. Current Opinion in Structural Biology 2010, doi:10.1016/

A Keith Dunker, Israel Silman, Vladimir N Uversky, Joel L Sussman. "Function and structure of inherently disordered proteins." Curr Opin Struct Biol. 2008 Dec;18(6):756-64.

Booth Paula J, Curnow Paul. Folding Scene Investigation: Membrane Proteins. Current Opinion in Structural Biology 2009, doi:10.1016/

von Heijne, Gunnar. "Membrane Protein Folding and Insertion." Annual Review of Biochemistry 80 (2011): 157-60. 26 Oct. 2011.

Kuhn, Andreas, Rosemary Stuart, Ralph Henry, and Ross E. Dalbey. "The Alb3/Oxa1/YidC protein family: membrane-localized chaperones facilitating membrane protein insertion?" TRENDS in Cell Biology 13 (2003): 510-16. 26 Oct. 2011.

Table 1: Berg, Jeremy. Relative Frequencies of Amino Acid Residues in Secondary Structures. 2012. Biochemistry. New York. Print.

Voet, Donald, Judith G. Voet. Biochemistry 3rd ed. New Jersey: John Wiley & Sons, Inc, 2004. Print.

Original hard-sphere, reduced-radius, and relaxed-tau φ,ψ regions from Ramachandran, with -180 to +180 axes
Backbone dihedral angles φ and ψ (and ω)

A Ramachandran plot, also known as a Ramachandran diagram or a [φ,ψ] plot, was originally developed in 1963 by Gopalasamudram Ramachandran, an Indian physicist. The plot visualizes the dihedral angle ψ against φ for each amino acid residue in a protein structure. Ramachandran recognized that many combinations of these angles in a polypeptide chain are forbidden because of steric collisions between atoms. His two-dimensional plot shows the allowed and disfavored values of φ and ψ: three-quarters of the possible combinations are excluded simply by local steric clashes. Steric exclusion, the fact that two atoms cannot occupy the same place at the same time, is the powerful organizing principle behind the Ramachandran plot.

Torsion Angles

The two torsion angles of the polypeptide chain, also called Ramachandran angles, describe the rotations of the polypeptide backbone around the bonds between N-Cα (called Phi, φ) and Cα-C (called Psi, ψ). The Ramachandran plot provides an easy way to view the distribution of torsion angles of a protein structure. It also provides an overview of allowed and disallowed regions of torsion angle values, serving as an important factor in the assessment of the quality of protein three-dimensional structures.
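Given Cartesian coordinates for the backbone atoms (for example, read from a PDB file), φ and ψ can be computed as signed dihedral angles. The sketch below uses the standard atan2 formulation of the dihedral angle; the function names are hypothetical, and atoms are assumed to be plain (x, y, z) tuples rather than objects from any particular structure library. φ for residue i uses C(i-1), N(i), Cα(i), C(i); ψ uses N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180 to +180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi: rotation about the N-CA bond, from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi: rotation about the CA-C bond, from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

Plotting ψ against φ for every residue produces the Ramachandran plot. The atan2 form is used instead of an arccosine of a dot product because it returns the sign of the angle and stays numerically stable near 0° and 180°.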

Torsion angles are among the most important local structural parameters that control protein folding: if we could predict the Ramachandran angles for every residue of a particular protein, we would be able to predict its 3D structure. These angles provide the flexibility required for folding of the polypeptide backbone, because the third torsion angle within the backbone (called omega, ω) is essentially fixed near 180 degrees. This is due to the partial double-bond character of the peptide bond, which restricts rotation around the C-N bond and places two successive alpha-carbons, and the C, O, N and H atoms between them, in one plane. Thus, rotation of the main chain (backbone) of a protein can be described as the rotation of these rigid peptide planes relative to each other.

Regions in Ramachandran Plot

The Ramachandran plot helps in determining the secondary structures of proteins, because each type of secondary structure occupies a characteristic region of the plot.

  • Quadrant I shows a region where some conformations are allowed. This is where rare left-handed alpha helices lie.
  • Quadrant II shows the biggest region in the graph. This region has the most favorable conformations of atoms. It shows the sterically allowed conformations for beta strands.
  • Quadrant III shows the next biggest region in the graph. This is where right-handed alpha helices lie.
  • Quadrant IV has almost no outlined region. This conformation (φ around 0 to 180 degrees, ψ around -180 to 0 degrees) is disfavored due to steric clash.
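The quadrant reading above can be written as a small lookup, assuming φ on the horizontal axis and ψ on the vertical axis, both in degrees. This is only a coarse sketch that mirrors the four quadrants as described; real structure-validation tools use empirically derived allowed-region contours, not quadrant boundaries, and the function name is hypothetical.

```python
def ramachandran_quadrant(phi: float, psi: float) -> str:
    """Coarse, quadrant-level reading of a (phi, psi) pair given in degrees."""
    if not (-180.0 <= phi <= 180.0 and -180.0 <= psi <= 180.0):
        raise ValueError("angles must lie between -180 and 180 degrees")
    if phi >= 0 and psi >= 0:
        return "I: left-handed helix region (rare)"
    if phi < 0 and psi >= 0:
        return "II: beta-strand region"
    if phi < 0 and psi < 0:
        return "III: right-handed alpha-helix region"
    return "IV: largely disfavored (steric clash)"


print(ramachandran_quadrant(-60, -45))   # typical alpha-helix angles
print(ramachandran_quadrant(-120, 130))  # typical beta-strand angles
```

Angles near (-60°, -45°) fall in quadrant III and angles near (-120°, +130°) in quadrant II, matching where right-handed helices and β strands lie on the plot.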


The exception to the principle of clustering around the α-helix and β-strand regions is glycine. Because glycine's side chain is a single hydrogen atom, it gives the polypeptide chain high flexibility and can adopt torsion angles that are normally not allowed for other amino acid residues. That is why glycine is often found in loop regions, where the polypeptide chain makes a sharp turn. This is also the reason for the high conservation of glycine residues in protein families, since the presence of turns at certain positions is characteristic of a particular protein fold.

Another residue with special properties in terms of its torsion angles is proline. In contrast to glycine, proline fixes its torsion angles at values very close to those of an extended conformation of the polypeptide (as in a beta-sheet), because its side chain loops back and bonds to the backbone nitrogen, restricting rotation about the N-Cα bond. Proline is often found at the ends of helices and functions as a helix breaker, partly because its backbone nitrogen lacks the hydrogen needed to donate a helical hydrogen bond.



The “protein folding problem” consists of three closely related puzzles:

  1. What is the folding code?
  2. What is the folding mechanism?
  3. Can we predict the native structure of a protein from its amino acid sequence?

Protein Folding Problem

The Protein Folding Problem is the obstacle that scientists confront when they try to predict the 3D structure of proteins from their amino acid sequences. Although a given sequence of amino acids almost always folds into a 3D structure with particular functions, it is not yet possible to predict the exact folding pattern with high precision. Explaining how proteins fold so extremely quickly has also become a challenge. Understanding any biochemical reaction requires isolating and determining the structures of its reactants, intermediates and products. In protein folding, this is complicated because most interactions in proteins are weak, non-covalent interactions, which lead to rapid interconversion between states. Intermediates are therefore hard to isolate and largely inaccessible to X-ray crystallography. Nevertheless, several advances have been made in characterizing reactants and intermediates. Because of the complexity of protein folding, the problem is usually divided into three major parts: the folding code, structure prediction, and folding speed and mechanism.

The Three Folding Problems

The Folding Code

In the late 1980s, scientists discovered that the amino acid sequence encodes the particular way in which a protein folds. The starting point of protein folding is the denatured (unfolded) state of the protein, in which only the primary structure (the sequence of amino acids) is defined. Even a small amount of residual structure in the denatured state can trigger nucleation and propagation along protein folding pathways. Characterizing these denatured states under physiological conditions is very difficult, because the protein must be shifted toward its denatured state without adding denaturants [2, Travaglini-Allocatelli et al.].

Single-molecule approaches have advanced the study of denatured states. Researchers used single-molecule experiments to examine the coil-to-globule transition of proteins and showed that the denatured state expanded steadily as the concentration of denaturant was increased. Conversely, at low denaturant concentrations, the polypeptide chain collapsed in a sequence-dependent manner [2, Travaglini-Allocatelli et al.].

There have also been advances in the study of folding intermediates. For example, the engrailed homeodomain (En-HD) was engineered to be denatured under physiological conditions, and Nuclear Magnetic Resonance (NMR) showed that this denatured state resembles a folding intermediate. An additional study found that a specific section of En-HD, the helix-turn-helix (HTH) motif, behaves as an independent folding domain. In the full protein, the HTH motif represents a folding intermediate in the En-HD folding pathway [2, Travaglini-Allocatelli et al.].

Although protein folding is still an enigma, scientists have used this structural information to design new materials, such as drugs, reagents and inhibitors, that benefit society.

Structure Prediction

Nowadays, researchers can predict the structure of a protein by inputting its amino acid sequence into a computer. Advanced modeling software produces a predicted structure, but such structures are not exact and always contain some degree of error. Nevertheless, prediction can speed up the discovery of new medications, since the digital structure can be manipulated.

Secondary structure prediction

Secondary structure prediction is a set of techniques that aim to predict the secondary structures of proteins and RNA molecules based only on their primary structure, that is, their amino acid or nucleotide sequence. For proteins, a prediction consists of assigning regions of the amino acid sequence as alpha helices, beta strands, or turns. The success of a prediction is determined by comparing it with the output of the DSSP algorithm applied to the crystal structure of the protein. DSSP is the standard method for assigning secondary structure to the amino acids of a protein, given the atomic-resolution coordinates of the protein. For nucleic acids, success may be judged from the hydrogen-bonding pattern. Specialized algorithms have been developed to detect specific, well-defined patterns such as transmembrane helices and coiled coils in proteins, or microRNA structures in RNA.
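The comparison against DSSP is often summarized as Q3, the percentage of residues whose three-state label (helix H, strand E, coil C) matches the reference. The sketch below assumes one common reduction of DSSP's eight states to three (H, G, I to helix; E, B to strand; everything else to coil); other reduction schemes are also used, and the function names are hypothetical.

```python
# Assumed 8-to-3 state reduction; other conventions exist.
DSSP_TO_THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp_states: str) -> str:
    """Map each DSSP symbol to H, E or C (anything unlisted becomes coil)."""
    return "".join(DSSP_TO_THREE_STATE.get(s, "C") for s in dssp_states)


def q3(predicted: str, dssp_states: str) -> float:
    """Percentage of residues whose predicted H/E/C label matches DSSP."""
    observed = reduce_dssp(dssp_states)
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# The predictor missed the helix (C instead of H at four positions).
print(q3("CCCCCCEEEC", "-HHHHTEEE-"))  # 60.0
```

Because coil residues are often the most common class, Q3 can look respectable even for a predictor that finds few helices or strands, so it is usually reported alongside per-class measures.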

Tertiary structure prediction

Experimental methods such as NMR spectroscopy and X-ray diffraction analysis are widely used to determine tertiary protein structures. However, the rate at which protein structures can be determined experimentally is much lower than the rate at which new genes are identified by the various genome projects.

Ab initio protein modelling methods build 3-D protein models based on physical principles rather than on previously solved structures. Many possible procedures exist that either attempt to mimic protein folding or apply a stochastic method to search the possible solutions (for example, by global optimization of a suitable energy function). These procedures require massive computational resources, and have thus only been carried out for tiny proteins. Predicting the structure of larger proteins will require better algorithms and larger computational resources, such as those provided by powerful supercomputers. Although these computational barriers are massive, the potential benefits of structure prediction make ab initio modelling an active research topic.
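The combination of an energy function with a search over conformations can be illustrated with the two-dimensional HP lattice model. This is a deliberately simplified toy model, not a real ab initio method: each residue is either hydrophobic (H) or polar (P), the chain occupies sites on a square grid without overlapping itself, and every non-bonded H-H contact lowers the energy by one unit. For a very short chain every conformation can be enumerated, as sketched below; the exponential growth of that enumeration with chain length is why practical methods turn to stochastic search. The function names are hypothetical.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))  # square-lattice steps


def hp_energy(seq, coords):
    """-1 for every non-bonded pair of H residues on adjacent lattice sites."""
    index = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            # j > i + 1 counts each contact once and skips chain neighbours
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def fold_exhaustively(seq):
    """Return (min_energy, coords) over all self-avoiding lattice conformations."""
    if len(seq) < 2:
        return 0, [(0, 0)] * len(seq)
    best_energy, best_coords = None, None

    def grow(coords, occupied):
        nonlocal best_energy, best_coords
        if len(coords) == len(seq):
            e = hp_energy(seq, coords)
            if best_energy is None or e < best_energy:
                best_energy, best_coords = e, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            site = (x + dx, y + dy)
            if site not in occupied:  # steric exclusion: one residue per site
                coords.append(site)
                occupied.add(site)
                grow(coords, occupied)
                occupied.remove(site)
                coords.pop()

    # Fixing the first bond removes rotated copies of the same conformation.
    start = [(0, 0), (1, 0)]
    grow(start, set(start))
    return best_energy, best_coords


energy, coords = fold_exhaustively("HPHPPHHPH")
print(energy, coords)
```

Each added residue multiplies the number of conformations by up to three, so even this toy search becomes impractical beyond a few dozen residues.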

Side-chain geometry prediction describes a computational approach that can make predictions for a series of coiled-coil dimers. This method comprises a dual strategy that augments extensive conformational sampling with molecular mechanics minimization.

Quaternary structure

In the case of complexes of two or more proteins, where the structures of the proteins are known or can be predicted with high accuracy, protein–protein docking methods can be used to predict the structure of the complex.

Annexin II

Folding Speed and Mechanism

In 1968, Cyrus Levinthal pointed out that proteins fold precisely and quickly, in as little as microseconds, even though a random search through all possible conformations would take an unrealistically long time. This apparent contradiction is known as Levinthal's paradox. Today, researchers use methods such as mutational phi-value analysis, which reveals how much native structure individual residues have formed in the folding transition state (these phi-values are unrelated to the backbone torsion angle φ), and hydrogen exchange methods, which allow structural folding events to be observed. However, the dynamics and mechanism of protein folding still require additional research.

The dynamics and kinetics of unfolded polypeptide chains have been addressed by recent studies of loop formation by Kiefhaber and coworkers. They used different model systems, each representing a different type of loop: end to end, end to interior, or interior to interior. Their experiments showed that end-to-interior and interior-to-interior loops formed more slowly than end-to-end loops. This suggests that the motion of one part of the unfolded polypeptide chain is coupled to other parts of the chain. These kinetic experiments also revealed that protein folding processes take place on different time scales, and thus that there is a hierarchy in loop formation [2, Travaglini-Allocatelli et al.].

Although additional research is necessary to understand the mechanisms of protein folding, two classical mechanisms have been used to describe the folding of single-domain proteins. The first is the Diffusion-Collision Model. Proteins that follow this mechanism fold in a stepwise manner in which secondary structure elements form first; these elements then collide, combine and strengthen. For example, there is evidence that the En-HD mentioned above follows the diffusion-collision model. The second mechanism is the Nucleation-Condensation Model. Proteins following this mechanism fold from an unstructured denatured state with simultaneous formation of secondary and tertiary structure. For example, hTRF1, a protein homologous to En-HD, has been shown to follow this model. However, many proteins exhibit characteristics of both the diffusion-collision and the nucleation-condensation models [2, Travaglini-Allocatelli et al.].
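The paradox comes down to arithmetic. The sketch below uses illustrative assumptions that are not taken from this text: a 100-residue chain, three possible backbone conformations per residue, and 10^13 conformations sampled per second. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_years(n_residues: int, states_per_residue: int = 3,
                           samples_per_second: float = 1e13) -> float:
    """Years needed to visit every backbone conformation one at a time."""
    conformations = states_per_residue ** n_residues  # 3**100 is about 5e47
    return conformations / samples_per_second / SECONDS_PER_YEAR


print(f"{levinthal_search_years(100):.1e} years")  # 1.6e+27 years
```

Even with these generous sampling rates, an exhaustive search would take vastly longer than the age of the universe (roughly 10^10 years), so real folding cannot be a random search; it must be guided toward the native state.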

The starting point of protein folding: the denatured state

Residual structure in the denatured state can trigger nucleation and propagation, which may carry through the folding pathway. Characterizing denatured states under physiological conditions is a hard task, because the population of the native state must be disfavored without adding denaturants. At high denaturant concentrations, chemically denatured states may behave like random-coil polymers. Sherman and Haran used single-molecule experiments to analyze the coil-to-globule transition of protein L and showed that the dimensions of its denatured state increase as the denaturant concentration increases. Eaton and co-workers also compared the size and dynamics of the denatured states of two proteins of similar length (64 and 66 amino acids).

Mechanisms of protein folding

Two different mechanisms have been used to describe the folding of single-domain proteins. Some proteins, such as barnase, have been described as folding in a stepwise manner, with rapid formation of distinct nuclei followed by their collision and consolidation. Other proteins, such as chymotrypsin inhibitor 2, are examples of the nucleation-condensation model. The folding pathway of a small α/β protein domain has been shown to differ from both pure nucleation-condensation and pure diffusion-collision, while still displaying characteristics of both models.

Folding stability and function

The inherent stability of individual protein segments is a key factor in determining the folding mechanism of a given protein. A cell's life often depends on the ability of its constituent proteins to fold into the 3D structures that are crucial for their function. The amount of folded, functional protein in a cell depends on several factors, such as the rates of protein biosynthesis and degradation.

A related question is whether the stability and folding of fully folded proteins can be related to their activity. Allostery may be the bridge where protein folding meets function. Allosteric effects involve communication between ligand-binding sites, which is critical to many physiological processes. Because allostery is a thermodynamic process, it should be considered not only in terms of changes in conformation but also in terms of changes in the dynamics about the mean conformation.

Therefore more research is necessary to fully comprehend the mechanism of protein folding and find a solution to the protein folding problem.


  1. Ken A Dill, S Banu Ozkan, Thomas R Weikl, John D Chodera and Vincent A Voelz. The protein folding problem: when will it be solved? Current Opinion in Structural Biology 2007.
  2. Carlo Travaglini-Allocatelli, Ylva Ivarsson, Per Jemth and Stefano Gianni. Folding and stability of globular proteins and implications for function. Current Opinion in Structural Biology 2009, 19:3-7.
  3. Mount DM (2004). Bioinformatics: Sequence and Genome Analysis. 2nd ed. Cold Spring Harbor Laboratory Press. ISBN 0879697121
  4. Zhang Y (2008). "Progress and challenges in protein structure prediction". Curr Opin Struct Biol 18 (3): 342–8. doi:10.1016/ PMC 2680823. PMID 18436442

Although much work has been done on protein folding in vitro, little research has significantly advanced our understanding of protein folding in vivo. The in vivo process matters because protein folding in the cell is presumably guided by molecular machinery, rather than each protein folding independently into its lowest-energy conformation. Chaperone proteins are known to help proteins reach their native state, but it also seems that something must assist the development of secondary and tertiary structure as a new protein is being created. In an article in Current Opinion in Structural Biology entitled "Protein Folding on the Ribosome," Lisa D. Cabrita, Christopher M. Dobson, and John Christodoulou reviewed recent discoveries about how the nascent chain of a newly synthesized protein emerges.

Folding on Ribosome

Where the protein chain begins to fold is a topic of intense study. As the nascent chain passes through the "exit tunnel" of the ribosome and into the cellular environment, when does it begin to fold? The idea of co-translational folding in the ribosomal tunnel is discussed here. The nascent chain is bound to the peptidyl transferase centre (PTC) at its C terminus and emerges in a vectorial manner. The tunnel is very narrow and enforces a certain rigidity on the nascent chain. With the addition of each amino acid, the conformational space available to the protein increases, and co-translational folding can greatly reduce this space by allowing the protein to acquire a significant amount of native structure while still in the ribosomal tunnel. The length of the nascent chain also influences the structure it adopts: shorter chains tend to favor beta sheets, while longer chains (such as those reaching 119 of 153 residues) tend to favor the alpha helix.

The ribosomal tunnel is more than 80 ångströms (Å) long and around 10-20 Å wide. Associated with the tunnel are auxiliary molecules, such as the ribosomal proteins L23, L22, and L4, that interact with the nascent chain and help it fold. The tunnel also has a hydrophilic character, which helps the nascent chain travel through it without being hindered. Although rigid, the tunnel is not a passive conduit, but whether it can actively promote protein folding is unknown. A recent cryo-EM experiment has shown that there are folding zones in the tunnel. At the exit port (some 80 Å from the PTC), the nascent chain had adopted a preferred, low-order conformation. This reinforces the suggestion that the chain can reach some degree of folding in certain regions. Although some low-order folding can occur inside the tunnel, the native state is adopted outside it, though not necessarily only after the nascent chain has been released. The ribosome-bound nascent chain (RNC) adopts partially folded structures, and in a crowded cellular environment this can cause nascent chains to self-associate. This self-association, however, is reduced because ribosomes translating the same mRNA are arranged in a staggered fashion that maximizes the distances between the RNCs.

The current understanding of protein folding has come from in vitro studies of protein renaturation in a variety of environments, as well as from in silico computer simulations. These studies can only partly capture the in vivo process of protein formation, in which folding begins while the nascent polypeptide chain is still being synthesized by the ribosome. The start of protein folding is therefore coupled with the continuing synthesis of the polypeptide chain.

Currently, protein folding is viewed as a process driven by interactions between the amino acids of the protein, which can follow certain paths to reach the lowest-energy state, the native state. However, a protein may start folding along a path that leads to a low-energy conformation that is not the native state. The protein has no way of escaping this conformation without a significant input of energy. Such a non-native state is one way a protein can become misfolded, which can lead to aggregation. Another factor that influences the likelihood of reaching the native state is protein size: larger proteins have more possible ways to fold, which decreases the likelihood of forming the most energetically favorable state. Proteins use co-translational folding to reduce the extent of conformational space available to them, and molecular chaperones further assist proteins in achieving their native conformational state.

Generation of RNC for studies

One technique for generating RNCs and taking snapshots of the chain as it emerges from the tunnel is to arrest translation. A truncated DNA without a termination (stop) sequence is used, which allows the nascent chain to remain bound to the ribosome until desired. To identify residues of the chain, it can be labeled with carbon-13 or nitrogen-15 and later detected by NMR spectroscopy. Another technique is the PURE system, which contains the minimal components required for translation. This method has been used to study the interactions of nascent chains with auxiliary molecules such as the chaperone trigger factor (TF), and it has been coupled with the quartz-crystal microbalance technique to follow synthesis by mass. RNCs can also be generated in vivo at high cell density: cells are first grown in an unlabeled medium and then transferred to a labeled medium, and translation is stalled using the SecM arrest sequence. The RNC is purified by affinity chromatography and detected by SDS-PAGE or immunoblotting.

By generating RNCs, many experiments can be done to learn more about the emerging nascent chain. As mentioned above, the chain emerges from the exit tunnel in a vectorial manner. This allows the chain to sample native-like structure progressively and increases the probability of folding to the native state. Along with this vectorial folding, chaperones also help promote favorable folding rates and correct folding.

Ribosome Structure and Co-translational Protein Folding

In E. coli, the 70S ribosomal particle is composed of more than 50 proteins and three RNA molecules. The most interesting structural feature of the 70S particle with regard to protein folding is the ribosomal exit tunnel, a channel that links the PTC (peptidyl transferase centre) with the cellular environment. The tunnel is about 80 ångströms long and 10-20 ångströms wide. It is lined mainly by a large RNA molecule and by the ribosomal proteins L4 and L22, while L23 serves as a docking point for other molecules that assist in the folding process. Recent cryo-EM studies have shown that L4 and L22 can interfere with protein synthesis, in addition to their other interactions with the nascent chain. In addition, arginine residues have been observed to stall translation by changing electrostatic potentials. Although the exit tunnel is presumed to have a more or less rigid structure, it seems to support nascent-chain folding to some degree. This is evidenced by the fact that the tunnel accommodates about 30-40 residues on average, considerably more than would fit if the polypeptide chain were fully extended. The degree to which a nascent chain folds seems to vary with the kind of protein being synthesized; certain nascent chains, such as transmembrane protein sequences, may already form an alpha-helical structure inside the tunnel. Studying nascent chains emerging from the ribosomal exit tunnel has proven to be a significant challenge for current methods of structural and cellular biology. One approach presented in the article is to take "snapshots" of the elongation process. To do this, translation must be arrested artificially, which involves engineering DNA strands that lack a stop codon.
Another issue is focusing on the particular residues of interest on the nascent chain within the sea of other residues from the ribosome.
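A quick estimate shows why 30-40 residues is more than an extended chain would need. The sketch assumes textbook rises per residue along the chain axis of about 3.5 Å for a β strand and 1.5 Å for an α-helix, and treats the tunnel as a straight 80 Å channel; both are simplifications, and the function name is hypothetical.

```python
TUNNEL_LENGTH_ANGSTROM = 80.0  # length quoted in the text

# Assumed textbook rise per residue along the chain axis, in angstroms.
RISE_PER_RESIDUE = {"extended (beta strand)": 3.5, "alpha helix": 1.5}


def residues_spanning(length: float, rise_per_residue: float) -> float:
    """Number of residues needed to span `length` at a given rise per residue."""
    return length / rise_per_residue


for conformation, rise in RISE_PER_RESIDUE.items():
    n = residues_spanning(TUNNEL_LENGTH_ANGSTROM, rise)
    print(f"{conformation}: about {n:.0f} residues")
# extended (beta strand): about 23 residues
# alpha helix: about 53 residues
```

An extended chain would span the tunnel with roughly 23 residues, so an occupancy of 30-40 residues implies that at least part of the chain is more compact than fully extended, though well short of a continuous helix.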

Understanding Co-translational Folding by Biochemical and Biophysical Studies

One example highlighted in the article is the use of SDS-PAGE on ribosome-bound nascent chains (RNCs) of influenza haemagglutinin, which showed that they can form disulfide bonds and undergo glycosylation. Also, using monoclonal antibodies, researchers discovered that there is variability in how the nascent chain emerges from the tunnel. These examples, among others, demonstrate that nascent chains can acquire not only structure but also activity while still attached to the ribosome. The speed of folding for nascent chains seems to be related to the number of translational pauses and rare codons present, because a discontinuous translation rate slows down the folding process. However, slower rates seem to produce more efficient folding, since the nascent chain has more time to develop its native structure. X-ray crystallography has contributed little to the understanding of co-translational folding, because the dynamic nature of the folding process is very difficult to capture in a crystal.

Auxiliary Factors in Co-translational folding

As the nascent chain starts emerging from the tunnel, it has the opportunity to interact with molecules that assist in the folding process. These include molecular chaperones, peptide deformylase, and the signal recognition particle (SRP). The first molecule to assist the nascent chain in folding is the 48 kDa trigger factor (TF), which docks on L23. In the absence of a nascent chain, TF repeatedly binds to and dissociates from L23; in the presence of a nascent chain, its affinity for L23 increases. TF undergoes a conformational change in which a protective cavity is formed for the nascent chain. TF allows enough of the polypeptide chain to emerge that a significant degree of folding can be achieved. It does this by binding to hydrophobic segments of the chain, even after it has been released from L23. Once hydrophobic regions of the chain are no longer exposed, TF seems to unbind, allowing further helper molecules to assist in protein folding. TF seems to increase folding efficiency, but at the expense of slower folding. Protein translocation is then carried out by the SRP, which shuttles the nascent chain to a heterotrimeric integral membrane protein complex. This allows further processing and folding.

Ribosome subunit in prokaryote cells and eukaryote cells

The ribosomes catalyze peptide bond formation, in a process called peptidyl transfer catalysis, and synthesize polypeptides by reading the genetic code of the mRNA. The ribosome is composed of a large and a small subunit both in prokaryote and eukaryote cells. Prokaryotes have 70S ribosomes, each consisting of a small (30S) and a large (50S) subunit. Eukaryotes have 80S ribosomes, each consisting of a small (40S) and large (60S) subunit. Due to the differences in their structures, the bacterial 70S ribosomes are vulnerable to these antibiotics while the eukaryotic 80S ribosomes are not. Within the cellular structure, mitochondria have ribosomes similar to the bacterial ones; however, mitochondria within eukaryote cells are not affected by these antibiotics because they are surrounded by membrane around its organelle. The initiation of the translation process in bacteria was found to locate on 30s subunit. This process requires the increase of both the incubation temperature and ionic strength in order to assemble into the correct tertiary structure contained with its amino acid sequence. The research experiments done by Dr. Masayasu’s research on the synthesis of ribosomes and ribosomal components in E-coli, also found that the correct assembly of the ribosomal particles is locating in the structures of their own molecular component and not by other nonribosomal factors.

The ribosome is the essential contributor to protein synthesis. It is assembled on the translation initiation region (TIR) of the mRNA during the initiation phase of translation. The mRNA is decoded as it moves through the small ribosomal subunit, while the growing polypeptide chain is assembled in the large subunit. The newly synthesized protein is released once a stop codon is reached. In the final ribosome recycling phase, the ribosomal subunits dissociate and the mRNA is released. The main events of translation are broadly similar in prokaryotic and eukaryotic cells, but the detailed mechanism of each phase differs considerably. Bacterial translation involves relatively few factors, in contrast to the more complex process in eukaryotes.

Peptidyl Transfer Catalysis by the Ribosome

During protein elongation, the peptidyl transferase center (PTC) of the ribosome acts as a catalyst to cleave the

See also

Structural Biochemistry/Proteins/Protein Folding


Leung, Edward Ki Yun, et al. (2011). "The Mechanism of Peptidyl Transfer Catalysis by the Ribosome." Annual Review of Biochemistry 80(1): 527-555.

The basic process of forming membrane proteins into complexes

Assembly of bacterial inner membrane proteins

Many membrane proteins form complexes of multiple subunits, which include both integral and peripheral subunits. The Sec translocase and the YidC insertase insert bacterial membrane proteins into the inner membrane. This process is assisted by YidC and by the phospholipid phosphatidylethanolamine. Glycine zippers and other motifs promote the transmembrane helix-helix interactions that form the alpha-helical bundles of membrane proteins. During or after membrane insertion, the subunits of oligomeric membrane proteins must find each other to build homo-oligomeric and hetero-oligomeric membrane complexes. Although chaperones can serve as assembly factors for oligomers, many protein oligomers appear to fold and oligomerize spontaneously. Experiments have shown that the subunits of many hetero-oligomers assemble along a sequential, ordered pathway to create the membrane protein complex. If an inserted protein folds improperly or a membrane complex is assembled incorrectly, quality control mechanisms can inactivate the proteins.

Membrane Proteins


Membrane proteins perform a wide variety of functions in the cell, from metabolite exchange to cell signaling and nerve conduction. They also function as ATPases, electron carriers, ion channels, transporters, sheddases, and photosynthetic reaction centers. They are abundant in both eukaryotic and prokaryotic cells and make up about 20 to 30 percent of all proteins.

Many integral inner membrane proteins are alpha-helical bundles whose membrane-spanning regions are alpha helices. Recent structural studies have shown that membrane proteins contain not only straight membrane-spanning helices but also strongly curved helices that only partially span the membrane. Alpha-helical membrane proteins can exist as monomers or as multimeric complexes.

To function properly, membrane proteins must be targeted to their destination membrane in the cell and then inserted and folded into the appropriate structure. Membrane targeting is more complicated in eukaryotic cells than in eubacteria: eukaryotic cells must target proteins to at least 10 different membranes, whereas eubacteria need to target only 1 membrane (gram-positive bacteria) or 2 membranes (gram-negative bacteria). After targeting, membrane protein integration and topogenesis are directed by the coordinated action of topogenic sequences and translocases. During this process, the transmembrane segments and extramembranous loops are folded.

The assembly of bacterial inner membrane proteins into the membrane is very complex. The mechanisms that control protein targeting and insertion into the membrane, the folding of alpha-helical bundles, and the assembly of oligomeric membrane protein complexes are explored in more depth below.

Recognition and Targeting

Targeting of nascent chains to the membrane begins during protein synthesis, very early, even before the polypeptide emerges from the ribosomal exit tunnel. Nascent chains can already signal from within the ribosome, which is important for recruiting the signal recognition particle (SRP). The bacterial SRP is made up of a protein component, Ffh, and a 4.5S RNA. The SRP binds a hydrophobic segment of a membrane protein as it emerges from the ribosome and targets it to the membrane surface. The SRP-binding region is most commonly the first transmembrane (TM) segment, but it can also lie further downstream or be distinct from the TM segments. Structural studies have shown that a groove in the M domain of SRP binds the apolar segment.
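SRP recognition depends on a stretch of hydrophobic residues in the nascent chain. As a rough illustration (not a model of how SRP itself works), the Python sketch below flags hydrophobic segments using a sliding-window average on the Kyte-Doolittle hydropathy scale, a classic way of locating candidate TM segments. The 19-residue window and the 1.6 threshold are values commonly used with this scale for TM prediction; the example sequence and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Mean hydropathy of every full window; list index = window start (0-based)."""
    seq = seq.upper()
    return [
        sum(KD[aa] for aa in seq[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


def candidate_segments(seq, window=19, threshold=1.6):
    """Merge overlapping above-threshold windows into (start, end) ranges (end exclusive)."""
    segments = []
    for start, score in enumerate(hydropathy_windows(seq, window)):
        if score < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Overlaps the previous hydrophobic stretch: extend it
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Hypothetical chain: polar N-terminal tail, 19-residue hydrophobic core, polar C-terminal tail
example = "MSKDERQNTK" + "LIVALLAVLIGLFAVLLIA" + "KDERNQSTEK"
print(candidate_segments(example))
```

A window average is used rather than single-residue values because a TM helix needs roughly 20 consecutive mostly apolar residues to span the bilayer.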

When the SRP-ribosome-nascent chain complex is targeted to the SRP receptor FtsY at the membrane, an SRP/FtsY complex is formed. Disassembly of this complex and release of the targeted protein require GTP hydrolysis. SRP (Ffh) and FtsY both start out bound to GTP and then associate through the interaction of their NG domains. Ffh and FtsY each contain two homologous domains and one distinct domain. The structure of the Ffh/FtsY NG-domain complex revealed a shared, composite active site in the Ffh/FtsY heterodimer that contains two bound nucleotides. After GTP hydrolysis, the membrane protein-nascent chain complex is delivered to the SecYEG translocation channel, and SRP and FtsY separate from each other, which allows SRP to be recycled for another round of targeting. Transfer of the nascent chain to the translocation channel is assisted by the interaction of FtsY with SecY.

Insertion of the membrane proteins

Translocases and insertases are required to insert newly synthesized proteins into membranes. In bacteria, the SecYEG translocase and the YidC insertase have been characterized, and both show translocation and insertion activity in reconstituted systems. Both are also essential for bacterial viability.

Sec Translocase Complex

The Sec translocase catalyzes bacterial membrane protein insertion. It is made up of the membrane-embedded SecYEG and SecDF-YajC complexes, together with the peripheral membrane component SecA. SecYEG provides the protein-conducting channel, which is necessary for translocation and makes membrane protein insertion more efficient. SecA, also known as the motor ATPase, is crucial for translocating preproteins across the membrane and for translocating certain hydrophilic regions of membrane proteins. SecA uses ATP hydrolysis to push the inserting polypeptide chain through the Sec channel in steps of 20 to 30 residues at a time.
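The 20 to 30 residue step size gives a quick back-of-the-envelope estimate of how many SecA cycles a hydrophilic loop might need. This sketch assumes, as a simplification, that each ATP hydrolysis cycle advances the chain by exactly one step of 20 to 30 residues; the function name and the loop length are hypothetical.

```python
import math


def seca_cycle_range(loop_length, min_step=20, max_step=30):
    """Return (fewest, most) SecA cycles needed to move `loop_length` residues,
    assuming each cycle advances the chain by min_step to max_step residues."""
    if loop_length <= 0:
        return (0, 0)
    fewest = math.ceil(loop_length / max_step)  # every step is as large as possible
    most = math.ceil(loop_length / min_step)    # every step is as small as possible
    return (fewest, most)


# Hypothetical 100-residue periplasmic loop
print(seca_cycle_range(100))  # (4, 5)
```

Rounding up with `math.ceil` reflects that a partial step still costs a full cycle under this assumption.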

A major advance in the field of protein export was the determination of the structure of the SecY complex from the archaeon Methanococcus jannaschii. This complex consists of SecY, SecE, and Secβ (SecYEβ). Secβ has no sequence homology to the eubacterial SecG, but it is homologous to the eukaryotic Sec61β. The SecY channel has an hourglass shape, with a narrow hydrophobic constriction about 3 to 5 Å across at the center of the channel. This constriction within SecYEβ separates the hydrophilic cavities that open toward the periplasmic and cytoplasmic sides of the membrane. The constriction is formed by a hydrophobic pore ring consisting of four isoleucine residues, one valine, and one leucine. The aliphatic side chains of these residues point toward each other, creating a hydrophobic collar through which the hydrophilic region of the polypeptide chain is transported during translocation across the membrane.

In the crystal structure, the SecY channel is in a closed state, with the pore ring sealed by a plug helix on the luminal side. When the channel opens upon signal-peptide binding to the TM2-TM7 region of SecY, the plug is displaced about 20 Å out of the channel, toward the SecE helix.

Another important feature of the SecY channel is the lateral gate, which allows the TM segments of inserting membrane proteins to leave the channel sideways and partition into the lipid phase. The lateral gate lies at the interface between TM2 and TM7 of SecY (Sec61α), at the front of the Sec channel. TM2 and TM7 of Sec61α were previously thought to form the signal peptide-binding site, because the signal peptide of a preprotein can be cross-linked to these TM segments during posttranslational translocation. The lateral gate opens when a polypeptide chain is translocated. This opening is important: locking the lateral gate by disulfide cross-linking blocks SecA-mediated preprotein translocation in Escherichia coli.

It is important to understand how SecA works with the SecY channel to translocate hydrophilic domains of membrane proteins across the membrane. The 4.5 Å structure of the SecA/SecYEG complex from Thermotoga maritima helps explain this process. In the structure, one copy of SecA is bound to one copy of the SecY channel. SecA lies flat on the SecY channel, roughly parallel to the membrane surface. Notably, a two-helix finger domain of SecA is positioned at the opening of the SecYEG channel, where it may serve to push substrates into the channel.

YidC Insertase

The YidC insertase is important because it inserts small proteins into the membrane. YidC was found to influence membrane protein insertion: when cellular levels of YidC are depleted, the insertion of Sec-independent proteins is slowed and inhibited. These proteins had previously been thought to insert into the membrane spontaneously.

Experiments indicate that YidC takes part in the insertion of Sec-independent substrates. Photo-cross-linking studies using a cell-free system showed that membrane proteins arrested at different stages of insertion interact with YidC. Lipid vesicles containing YidC are sufficient to insert the Sec-independent Pf3 coat protein and subunit c of ATP synthase. Binding of the Pf3 coat protein to YidC was found to cause a significant conformational change in YidC.

Assembly of Multispanning Membrane Proteins

Many important membrane proteins span the lipid bilayer multiple times. Successive TM segments alternate between N-to-C and C-to-N orientations of their alpha helices. The TM segments are connected by cytoplasmic and periplasmic loops. These loops are primarily hydrophilic and vary in size and charge. Short loops simply connect two helices, whereas larger loops can fold into separate domains. These domains contribute to how the protein behaves and functions.
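Because successive TM segments alternate in orientation, the side of every loop follows from the location of the N terminus and the number of TM segments. The sketch below assumes that every TM segment crosses the membrane exactly once (it ignores the partially spanning helices mentioned earlier); the function names and the four-helix example are hypothetical.

```python
OPPOSITE = {"cytoplasm": "periplasm", "periplasm": "cytoplasm"}


def segment_sides(n_tm, n_terminus="cytoplasm"):
    """Compartment of the N-terminal tail, each loop, and the C-terminal tail
    (n_tm + 1 entries), assuming each TM segment crosses the membrane once."""
    sides = [n_terminus]
    for _ in range(n_tm):
        sides.append(OPPOSITE[sides[-1]])  # each crossing flips the side
    return sides


def tm_orientations(n_tm, n_terminus="cytoplasm"):
    """Direction in which each TM segment crosses the membrane, read N to C."""
    sides = segment_sides(n_tm, n_terminus)
    return [f"{sides[i]} -> {sides[i + 1]}" for i in range(n_tm)]


# A hypothetical four-helix protein with its N terminus in the cytoplasm
for i, direction in enumerate(tm_orientations(4), start=1):
    print(f"TM{i}: {direction}")
```

One consequence of the alternation is that a protein with an even number of TM segments has both termini on the same side of the membrane.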



  1. Dalbey, Ross E., Peng Wang, and Andreas Kuhn (2011). PubMed, pp. 3-6.

Proteins rely on several mechanisms in order to function in the biological world. Their ability to fold into their functional states after synthesis is one of the most fascinating mechanisms ever studied by researchers.

Basis of Protein Folding

In a living cell, protein folding occurs in a highly complex environment and is assisted by a variety of auxiliary proteins. Some of these proteins exist solely to protect incompletely folded polypeptide chains from malfunctioning or from interactions other than folding. They are especially important in preventing aggregation, while folding catalysts speed up steps that can slow protein folding, such as isomerization and the formation of disulphide bonds. Auxiliary proteins are not always required, however. Evidence shows that the information needed for folding is contained within the protein sequence itself: many proteins can be folded in vitro, without auxiliary proteins, and still function in the same way as proteins folded in the cell, provided the in vitro conditions are suitable.

Protein Folding Mechanisms

A large number of studies on the mechanism of protein folding have been performed recently, and many of them have been successful. Different types of approaches, both experimental and theoretical, have provided the basis for understanding why and how proteins fold.

One of the most useful ways of describing protein folding is as a "stochastic process". A stochastic process is a random process: instead of following a single fixed route, the system can take many possible pathways toward the final result, each with some probability. This contrasts with a deterministic process, in which the same starting conditions always lead to a single outcome. A stochastic process may start from one initial state but end in several different plausible outcomes, some more probable than others.
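The difference between a deterministic and a stochastic process can be illustrated with a toy simulation. The pathway names and probabilities below are invented for illustration only and are not measured folding data. The deterministic function always returns the same outcome, while the stochastic one samples a pathway independently for each simulated molecule.

```python
import random
from collections import Counter

# Hypothetical pathways and probabilities, for illustration only
PATHWAYS = {"direct": 0.6, "via intermediate": 0.3, "misfolded trap": 0.1}


def deterministic_fold():
    """Deterministic model: the same start always gives the same single outcome."""
    return "direct"


def stochastic_fold(rng):
    """Stochastic model: each molecule samples one pathway according to its probability."""
    return rng.choices(list(PATHWAYS), weights=list(PATHWAYS.values()), k=1)[0]


def simulate(n_molecules, seed=0):
    """Count how many simulated molecules took each pathway."""
    rng = random.Random(seed)
    return Counter(stochastic_fold(rng) for _ in range(n_molecules))


counts = simulate(10_000)
for pathway, n in counts.most_common():
    print(f"{pathway}: {n}")
```

With a fixed seed the counts are reproducible, and for large numbers of molecules the fractions approach the assumed probabilities.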

Some researchers nonetheless believe that the native interactions within proteins are more reliable and stable than interactions introduced by newly tested techniques. Studies have shown that proteins can remain correctly folded even in the very complex environment of the cell. However, when a protein folds incorrectly or fails to stay folded in a living cell, various diseases can occur.

One group of such diseases is the amyloidoses. Common diseases of this type include Alzheimer's disease and the spongiform encephalopathies. These diseases arise when proteins that fail to fold correctly aggregate. Interestingly, the ability to form such aggregates appears to be a general property of polypeptide chains, not just a feature of proteins that fold poorly. Because amyloid aggregates are not normally found in biology, this raises the question of which mechanisms have evolved over time to prevent their formation. Studying protein folding is therefore crucial, both for preventing such diseases and for understanding the structure and function of proteins in all living cells.

Issues and Possible Results of New Protein Folding Mechanisms

Although the protein folding community has made many groundbreaking discoveries, several issues remain. Manipulating how a protein folds also raises the question of why a naturally occurring mechanism should be altered at all. Because folding involves a very large number of possible conformational changes, a folding experiment is likely to reflect a stochastic process with several pathways and outcomes. In addition, the structures present at the end of the folding process are highly heterogeneous, so changing the protein sequence can alter the results in ways that are hard to interpret. According to Christopher Dobson, a researcher at the Oxford Centre for Molecular Sciences, University of Oxford, "there are two main approaches to try and overcome this issue".

The first approach uses biophysical techniques that can monitor the properties of the polypeptide chain as folding takes place. Because folding is rapid, several complementary methods are needed to follow different properties of the chain. For example, far-ultraviolet circular dichroism can monitor the development of secondary structure, and fluorescence spectroscopy can monitor the formation of tertiary structure.

The second approach uses protein engineering to study the mechanism of protein folding. Protein engineering is particularly useful because it can map out the transition state of folding. Individual amino acids in the sequence are mutated, and the effect of each mutation on folding and unfolding is measured. Such studies show that native-like structure forms around a small number of key residues. This supports a mechanism called "nucleation-condensation", in which most of the protein structure forms rapidly once this folding nucleus has formed.
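The protein engineering approach is usually quantified by phi-value analysis (this sketch assumes that is the method meant). For each mutation, Φ compares how much the mutation destabilizes the transition state with how much it destabilizes the native state, both relative to the denatured state. A Φ near 1 suggests that the residue's interactions are already native-like in the transition state, as expected for part of a folding nucleus, while a Φ near 0 suggests they have not yet formed. The rate constants and stabilities below are hypothetical, and the sketch ignores complications such as mutations that change the denatured state.

```python
import math

R = 8.314e-3  # gas constant, kJ mol^-1 K^-1


def ddg_transition_state(kf_wt, kf_mut, temp=298.15):
    """Destabilization (kJ/mol) of the transition state relative to the denatured
    state, from wild-type and mutant folding rate constants."""
    return R * temp * math.log(kf_wt / kf_mut)


def phi_value(kf_wt, kf_mut, ddg_native, temp=298.15):
    """Phi = ddG(TS - D) / ddG(N - D); ddg_native is the mutation's loss of
    stability in kJ/mol (must be non-zero)."""
    return ddg_transition_state(kf_wt, kf_mut, temp) / ddg_native


# Hypothetical mutant: folds 10x more slowly and is destabilized by 5.7 kJ/mol
print(round(phi_value(kf_wt=100.0, kf_mut=10.0, ddg_native=5.7), 2))  # 1.0
```

In this hypothetical case the mutation slows folding by as much as it destabilizes the native state, so Φ is about 1 and the residue would be counted as part of the nucleus.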


Dobson, Christopher M. (2001). Biochem. Soc. Symp. 68, 1-26. Last accessed: 1 Dec. 2011.


A fibrous protein is a protein with an elongated shape. Fibrous proteins provide structural support for cells and tissues. Two fibrous proteins, α-keratin and collagen, contain special types of helices. These proteins form long fibers that serve a structural role in the human body. Fibrous proteins are distinguished from globular proteins by their filamentous, elongated form. Fibrous proteins also have low solubility in water, whereas globular proteins are generally highly soluble. Most fibrous proteins play structural roles in animal cells and tissues, holding things together. Fibrous proteins have amino acid sequences that favour a particular kind of secondary structure which, in turn, confers particular mechanical properties on the proteins.


Collagen is a triple helix formed by three extended polypeptide chains that wrap around one another. Many rodlike collagen molecules are cross-linked together in the extracellular space to form collagen fibrils that have the tensile strength of steel. The striping on the collagen fibril is caused by the regular, repeating arrangement of the collagen molecules within the fibril.

Elastin polypeptide chains are cross-linked together to form rubberlike, elastic fibers. Each elastin molecule uncoils into a more extended conformation when the fiber is stretched and will recoil spontaneously as soon as the stretching force is relaxed.

Feature | Alpha helix | Beta pleated sheet | Triple helix
Hydrogen bonding | Peptide -C=O···HN-; intrachain, between residues n and n+4; parallel to the helix axis | Peptide -C=O···HN-; interchain; perpendicular to the chain axis | Peptide -C=O···HN- and -C=O···HO- (hydroxyl from the side chain of Hyp); interchain
Residues | Many types; small or uncharged residues such as Ala, Leu, and Phe most common; Pro never found | Mostly Gly, Ala, and Ser | Many types; Gly every third residue; Pro and Hyp common
Covalent cross-linking | Interchain disulfide cross-links | None | Interchain lysine-derived cross-links
Chain direction and aggregation | Four parallel right-handed alpha helices form a left-handed supercoil | Antiparallel chains | Three parallel left-handed helices form a right-handed supercoil
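The "Gly every third residue" pattern of the triple helix can be checked directly on a sequence. The sketch below finds the register (offset 0, 1, or 2) in which every third residue is glycine and reports the proline fraction. Hydroxyproline (Hyp) is formed from proline after translation, so a gene-derived sequence contains only Pro at those positions; the example fragment and the function names are hypothetical.

```python
def gly_register(seq):
    """Return the offset (0, 1 or 2) at which every third residue is Gly, else None."""
    seq = seq.upper()
    for offset in range(3):
        if all(seq[i] == "G" for i in range(offset, len(seq), 3)):
            return offset
    return None


def proline_fraction(seq):
    """Fraction of residues that are Pro (Hyp is made from Pro after translation)."""
    seq = seq.upper()
    return seq.count("P") / len(seq) if seq else 0.0


# Hypothetical collagen-like stretch written as Gly-X-Y repeats
fragment = "GPPGPAGPPGAP"
print(gly_register(fragment), proline_fraction(fragment))  # 0 0.5
```

Glycine is required at every third position because that residue lies at the crowded center of the triple helix, where only its single hydrogen side chain fits.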

The unfolded protein response (UPR) is a cellular stress response associated with the endoplasmic reticulum (ER). It has been studied in mammals, but it is also found in yeast and worms.

When ER conditions are disrupted (for example, by alterations in redox state or calcium levels, or by failure to post-translationally modify secretory proteins), or when the chaperones that assist protein folding are overwhelmed (both situations are considered ER stress), the cell launches signals that try to counter these changes and restore a favorable folding environment. When the UPR cannot resolve this stress, the cell undergoes apoptosis.


The environment of the ER lumen favors the production of secretory and membrane proteins, yet a substantial fraction of these proteins is rapidly degraded, probably because of improper folding. This creates a risk that misfolded proteins will build up in the cell. The risk is greater when the ER environment is disturbed, because such changes reduce the overall capacity to make properly folded proteins, so more misfolded proteins accumulate.

The UPR monitors and responds to changes in the ER protein-folding environment. It senses the protein-folding capacity of the ER and triggers cellular responses that maintain this capacity and prevent a buildup of unwanted protein products. In mammals, the response consists of a transient inhibition of protein synthesis to limit the production of new proteins, followed by transcriptional induction of chaperone genes to support protein folding and activation of the ER-associated degradation system. If these measures fail, the UPR directs the cell toward a destructive pathway. The UPR has three main signaling branches, mediated by IRE1, PERK, and ATF6.

UPR Signaling

IRE1 Pathway

IRE1 is a type I transmembrane protein with serine/threonine kinase activity that acts as a stress sensor. Once IRE1 is activated, the endoribonuclease activity in its carboxyl terminus catalyzes splicing of HAC1 mRNA. HAC1 encodes a transcription factor responsible for inducing the expression of ER stress response genes.

In yeast, IRE1 contains nuclear localization sequences in its carboxyl terminus, which can interact with components of the nuclear pore complex and target IRE1 to the inner nuclear membrane. As a result, the C-terminal domain faces the inside of the nucleus and has access to nuclear mRNA. The HAC1 protein translated from the spliced mRNA then enters the nucleus and binds a promoter element to induce the expression of the genes required for the stress response.

In mammals, the IRE1 pathway resembles that of yeast, except that two IRE1 genes, IRE1α and IRE1β, have been cloned, and mammalian IRE1 lacks the nuclear localization sequences found in yeast IRE1. IRE1 has also been shown to mediate cleavage of additional mRNAs targeted to the endoplasmic reticulum, as well as cleavage of 28S ribosomal RNA. This suggests that IRE1 contributes to translation attenuation by degrading these mRNA transcripts and/or ribosomal RNA.

PERK Pathway

Under ER stress, the first response is a transient, global attenuation of translation, which is mediated by PERK. PERK is a type I ER-resident transmembrane protein that detects stress through its lumenal domain. This domain is bound by the chaperone Grp78 (also known as BiP), but when unfolded proteins start to build up during ER stress, Grp78 dissociates, and PERK dimerizes and autophosphorylates. Once activated, PERK phosphorylates serine 51 of eukaryotic initiation factor 2α (eIF2α). Phosphorylated eIF2α cannot efficiently initiate translation, which inhibits global protein synthesis. Conversely, phosphorylated eIF2α promotes translation of ATF4 mRNA, and ATF4 upregulates ER stress genes. Translational recovery is mediated by the stress-induced phosphatase subunit encoded by the growth arrest and DNA damage-inducible gene GADD34.
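The PERK branch can be summarized as a short chain of cause and effect. The toy model below encodes that chain with a single hypothetical threshold (unfolded protein load versus spare Grp78 capacity) and treats GADD34 activity as simply reversing eIF2α phosphorylation. The real response is graded and regulated at many more points, and the function and parameter names are invented for illustration.

```python
def perk_branch(unfolded_load, grp78_spare_capacity, gadd34_active=False):
    """Toy, all-or-none sketch of the PERK branch of the UPR."""
    # Grp78 leaves PERK's lumenal domain once unfolded proteins exceed its spare capacity
    grp78_released = unfolded_load > grp78_spare_capacity
    # Released PERK dimerizes, autophosphorylates, and phosphorylates eIF2alpha (Ser51)
    perk_active = grp78_released
    # GADD34-mediated dephosphorylation is modeled as undoing eIF2alpha phosphorylation
    eif2a_phosphorylated = perk_active and not gadd34_active
    return {
        "perk_active": perk_active,
        "eif2a_phosphorylated": eif2a_phosphorylated,
        "global_translation": "reduced" if eif2a_phosphorylated else "normal",
        "atf4_translation": "increased" if eif2a_phosphorylated else "basal",
    }


print(perk_branch(unfolded_load=10, grp78_spare_capacity=5))
print(perk_branch(unfolded_load=10, grp78_spare_capacity=5, gadd34_active=True))
```

The two calls contrast the stressed state (global translation reduced, ATF4 translation increased) with the recovery state after GADD34 acts.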

ATF6 Pathway

ATF6 exists in two isoforms, ATF6α and ATF6β, which have fairly similar tissue distributions. Activation of the ATF6 pathway involves a mechanism called regulated intramembrane proteolysis (RIP), in which the protein moves from the ER to the Golgi for proteolytic processing. ATF6 senses stress through dissociation of Grp78 from its lumenal domain, similar to the IRE1 and PERK pathways. Release of Grp78 unmasks two Golgi localization signals, allowing ATF6 to enter COPII vesicles and move to the Golgi compartment. Disulfide bonds in the ATF6 lumenal domain are also believed to keep ATF6 inactive. During ER stress these disulfide bonds are reduced, which increases the ability of ATF6 to exit the ER.


The three UPR pathways not only help correct improperly folded proteins but can also trigger a cell's apoptosis if the UPR fails to restore folding capacity.


  1. Physiology Online
  2. Nature


Advances in sequencing and microarray technology allow us to better understand pre-mRNA splicing patterns in different cells. For example, cellular splicing changes in response to stimuli such as DNA damage, neuronal depolarization, or metabolic changes. In the last few years, more studies have examined the mechanisms that link cellular stimuli to downstream control of alternative splicing. These mechanisms include degradation of splicing factors, altered nuclear translocation, and regulated synthesis of splicing factors.

What is alternative splicing and how does it work?

Splicing overview

Alternative splicing is a process that occurs during gene expression and allows multiple proteins (protein isoforms) to be produced from a single gene. It arises because exons can be included in or excluded from the messenger RNA in different ways. Portions of exons can also be included or excluded, and introns can be retained. For example, if a pre-mRNA has four exons (A, B, C, and D), these four exons can be spliced and translated in a number of different combinations: one mRNA may contain exons A, B, and C, while another contains exons A, C, and D. Each combination can give rise to a different protein isoform.
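The four-exon example can be made concrete by listing every order-preserving selection of exons A to D. This is only an upper bound under simplifying assumptions: exons are never reordered, a chosen minimum number of exons is kept, and partial exons and retained introns are ignored. Real cells produce only a subset of these isoforms, and the function name is hypothetical.

```python
from itertools import combinations


def exon_combinations(exons, min_exons=1):
    """Return every selection of exons that keeps their original 5'-to-3' order."""
    isoforms = []
    for k in range(min_exons, len(exons) + 1):
        # itertools.combinations preserves input order, so "CBA" can never appear
        for combo in combinations(exons, k):
            isoforms.append("".join(combo))
    return isoforms


print(exon_combinations("ABCD", min_exons=3))  # ['ABC', 'ABD', 'ACD', 'BCD', 'ABCD']
```

With no minimum, four exons already allow 15 different selections, which shows how quickly the number of possible isoforms grows with the number of exons.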

The pattern of splicing and the production of alternatively spliced messenger RNA are controlled by the binding of regulatory proteins (trans-acting proteins) to cis-acting sites found on the pre-mRNA. These regulatory proteins include splicing activators (proteins that promote the use of certain splice sites) and splicing repressors (proteins that reduce the use of certain sites). Common splicing repressors include heterogeneous nuclear ribonucleoproteins (hnRNPs) and polypyrimidine tract binding protein (PTB). Proteins translated from alternatively spliced messenger RNAs differ in their amino acid sequences, which results in altered protein function. This is one reason why the human genome can encode a wide diversity of proteins. Alternative splicing is common in eukaryotes; most multi-exon genes in humans are alternatively spliced. Unfortunately, abnormal splicing also underlies many genetic diseases and disorders.

A complex


The splicing of messenger RNA is catalyzed by a macromolecular complex known as the spliceosome. The sites of cleavage and ligation are determined by the binding of spliceosome subunits to sequence elements in the pre-mRNA, including the branch site (A) and the 5' and 3' splice sites. Interactions between these sequence elements and the small nuclear ribonucleoproteins (snRNPs) of the spliceosome form the spliceosomal A complex, which helps determine which introns are removed and which exons are kept and joined together. Once the introns are cleaved and removed, the exons are joined together by a phosphodiester bond.

Regulatory Proteins

As noted above, splicing is regulated by repressor proteins and activator proteins, which are also known as trans-acting proteins. Equally important are the silencers and enhancers found on the pre-mRNAs, also known as cis-acting sites. Together, these regulatory elements create a splicing code that determines alternative splicing. The cis-acting sites are discussed here.

Splicing repression

Splicing silencers are regulatory sites in pre-mRNAs to which splicing repressor proteins bind. When a repressor binds to a silencer site, it reduces the chance that a nearby site will be chosen as a splice junction. Silencer sites can be found in introns or in exons: in introns they are called intronic splicing silencers, and in exons they are called exonic splicing silencers. These sites have many different sequences, which allows different kinds of proteins to bind.

Splicing activation

Splicing enhancers, on the other hand, are regulatory sites to which splicing activator proteins bind. When an activator protein binds to an enhancer site, it increases the chance that a nearby site will be chosen as a splice junction. Like splicing silencers, enhancer sites can be found in introns and exons: in introns they are called intronic splicing enhancers, and in exons they are called exonic splicing enhancers. Unlike silencers, however, enhancer sites are usually bound by activator proteins of the SR protein family. These proteins are rich in arginine and serine.

How is alternative splicing regulated by specific signals? Alternative splicing has recently been shown to occur in nearly all human genes. Most typically, a specific exon is included in some cell types or growth conditions and excluded in others. In each case, the splicing pattern is generally determined by the binding of regulatory proteins to cis-acting auxiliary sequences, and this binding in turn controls where the spliceosome binds and/or how it acts at neighboring splice sites (Combinatorial Regulation of Alternative Splicing). Importantly, any of these differential patterns can alter the open reading frame of the resulting mRNA or the presence of cis-regulatory elements that control mRNA stability or translation. Shaping the proteome of any given cell therefore requires precise control of alternative splicing, and changes in splicing patterns can significantly alter how cellular functions respond to changing environmental conditions.

Representation of intron and exons within a simple gene containing a single intron.

Combinatorial Regulation of Alternative Splicing

The spliceosome is a macromolecular complex that catalyzes the removal of introns and the joining of exons. The precise sites of cleavage and ligation are determined by the binding of various spliceosome subunits to sequence elements at the intron-exon boundaries of a pre-mRNA. These elements are the 5' splice site, the branch point sequence, a pyrimidine-rich tract, and the 3' splice site. In mammals, however, the splice sites are poorly conserved, so on their own they are typically not sufficient to bind the spliceosome with high affinity. Proteins bound to sequences outside the splice sites, within the exon or intron, can affect the efficiency of spliceosome binding. Exonic or intronic splicing enhancers help promote spliceosomal recognition of an exon, while splicing silencers inhibit recognition of the exon. Exon inclusion is promoted when members of the ubiquitously expressed SRSF protein family bind to enhancer elements (green ovals), while exon usage is repressed when members of the hnRNP protein family bind to silencer elements (red ovals). FOX, CELF, neuro-oncological ventral antigen (NOVA), and muscleblind-like (MBNL) proteins are other splicing regulators with more tissue-restricted expression; they can act as both enhancers and repressors of splicing through mechanisms that are still largely undefined. As a result, the binding of a single regulatory protein, or subtle changes in the balance of regulator expression, can frequently alter the ratio of mRNA isoforms.

Post-Translational Modification of Splicing Proteins

Splicing regulatory proteins are modified in many cases by phosphorylation, acetylation, methylation, sumoylation and hydroxylation. The best-characterized modification is the phosphorylation of the extensive Arg-Ser dipeptide repeats found within SR proteins. hnRNP proteins and other non-SR splicing factors also undergo extensive post-translational modification.

Alternative Splicing and its Signals

An example of regulated degradation of an RNA-binding protein modulating alternative splicing.

Recently, technical tools such as deep sequencing and sensitive microarrays have revealed much more about alternative splicing events. Almost all human genes undergo some form of alternative splicing, which includes differential inclusion or exclusion of a specific exon, exclusion of part of an exon, and retention of introns. These differential patterns can change the reading frame of the processed mRNA or alter cis-regulatory elements that control mRNA translation or stability. For that reason, the regulation of alternative splicing is crucial in shaping the proteome of cells, and alterations in splicing patterns can change cellular functions in response to environmental changes. Observations of developing heart tissue, of neurons before and after depolarization, and of cells before and after apoptosis have shown that alternative splicing events play a large role in the functional outcome of signaling and developmental processes.

Because alternative splicing is generally determined by the binding of regulatory proteins to auxiliary sequences, which control where the spliceosome binds and how it acts at neighboring splice sites, cells can use it to respond to signals such as DNA damage and T cell activation. One example from the DNA damage response is the alternative splicing of the E3 ubiquitin ligase murine double minute-2 (MDM2). MDM2 controls levels of p53, a tumor suppressor protein, by targeting it for proteasomal degradation. Once DNA damage is detected, Mdm2 exons are skipped, reducing MDM2 function and thus allowing p53 to accumulate. This induced regulation of MDM2 provides an example of splicing that is coupled with transcription in response to DNA damage. In this case, cells show a "tight control of alternative splicing that helps regulate protein expression due to changing conditions in the cell."[3]

Altering protein-protein interactions is another way in which signals can regulate alternative splicing. One example is T cell activation, which, like DNA damage, changes splicing (in this case of the CD45 gene), but does so by altering how a regulatory protein interacts with other proteins. In resting T cells, PSF, an RNA-binding protein, is phosphorylated by the enzyme GSK3, and the phosphorylated PSF forms a complex with TRAP150. As a result, PSF cannot bind to CD45 RNA, so it cannot promote exon exclusion and does not participate in splicing. In activated T cells, however, antigen binding to the T cell receptor triggers an inhibitory phosphorylation of GSK3, so GSK3 activity drops to little or none. Without GSK3 activity, PSF is not bound to TRAP150 and is free to bind the RNA. This is a major example of how splicing is controlled by signal-induced changes in protein-protein interactions.

RNA-binding Proteins Regulate Splicing

Altering the expression level of a regulatory protein is the simplest way to affect alternative splicing. Because many factors act together on a given transcript, a small change in the expression of one splicing factor can shift the balance that determines exon inclusion or exclusion. Signaling pathways are well known to control transcriptional activators such as nuclear factor-kappa B and nuclear factor of activated T cells. Signaling can therefore induce transcription of genes encoding SR proteins or other splicing regulators, which in turn changes the splicing of genes that respond to these factors. For example, stimulation of T cells has been proposed to alter the splicing of the gene encoding the tyrosine phosphatase CD45. Furthermore, the proteins PTB-associated splicing factor and hnRNP L-like promote skipping of CD45 exons 4 and 6. Interestingly, inducible changes in protein expression do not result only from transcription. For the splicing regulatory protein CELF1, increased protein levels are due to increased phosphorylation, which stabilizes CELF1 and leads to up-regulated steady-state levels. In myotonic dystrophy (DM) cells, this increased phosphorylation is caused by protein kinase C activity. CELF1 levels are also regulated in other ways; for example, they are controlled by miRNAs during heart development. Together, these mechanisms highlight that regulating the expression of regulatory proteins is important for maintaining the splicing patterns required for proper cellular function. [3]

Localization of RNA-binding proteins

In addition to controlling protein expression and stability, signals can alter alternative splicing by changing the localization of regulatory proteins. Many regulatory proteins, such as the SR and hnRNP proteins mentioned above, shuttle between the nucleus and the cytoplasm. As a result, signaling pathways that change the relative distribution of these proteins between the nucleus and the cytoplasm can lead to differences in splicing. Two proteins whose distribution is regulated in this way are the SR protein kinase SRPK1 and hnRNP A1. SRPK1 is normally found in the cytoplasm because of its interactions with heat shock proteins. However, when the cell undergoes osmotic shock, SRPK1 moves to the nucleus and phosphorylates SR proteins. This phosphorylation changes the interactions between the SR proteins and their targets and produces different splicing patterns. Osmotic shock has the opposite effect on the localization of hnRNP A1: hnRNP A1 is normally found mainly in the nucleus, but osmotic shock causes it to accumulate in the cytoplasm, and phosphorylation of hnRNP A1 prevents it from re-entering the nucleus.

Feedback Loops in Alternative Splicing

An example of a feedback loop in alternative splicing.

Like whole organisms, cells maintain homeostasis. To do so, they must turn off induced splicing signals once conditions return to normal, for example after antigen clearance, DNA repair, or neuronal repolarization. One way to reset gene expression is to deactivate the signal by removing the initial receptors or signaling factors themselves. Indeed, signaling components such as receptors, phosphatases and kinases can undergo autoinhibitory, signal-induced alternative splicing. For instance, in response to T cell activation, alternative splicing of CD45 reduces the sensitivity of the cell to antigen stimulation signals. In another example, genes encoding kinases involved in T cell signaling, such as the FYN proto-oncogene, extracellular signal-regulated kinase-1, and protein tyrosine kinase 2 beta, all undergo alternative splicing upon T cell activation, which reduces their expression or changes their localization.

Inducing expression of an opposing regulatory factor can also help reset induced splicing. Chronic depolarization of neurons is an example: it increases skipping of exons controlled by CaRREs (CaMK IV-responsive RNA elements), but some of these CaRRE-repressed exons are included again during prolonged depolarization. This pattern is related to the CaMK-induced alternative splicing of FOX1, which encodes an RNA-binding protein that regulates the splicing of genes involved in synaptic activity. Many genes controlled by CaRREs also contain a FOX1 binding site, which can affect exon inclusion in a way that antagonizes the CaRRE sequence. Because most studies examine only a few genes, further studies are needed to gain a fuller picture of the alternative splicing that occurs downstream of a given pathway. [3]

What is next for alternative splicing?

Despite the signal-induced mechanisms described above, the overall picture of how signaling pathways regulate alternative splicing is far from complete, and the study of these pathways is still very much in progress. The mechanisms described here usually involve the alternative splicing of only a few genes, so more progress is needed to understand how an entire pathway shapes alternative splicing.


1. Black, D.L. (2003). "Mechanisms of alternative pre-messenger RNA splicing". Annual Review of Biochemistry 72 (1): 291–336.

2. Clark, D. (2005). Molecular Biology. Amsterdam: Elsevier Academic Press.

3. Heyd, F.; Lynch, K.W. (2011). "DEGRADE, MOVE, REGROUP: signaling control of splicing proteins". Trends in Biochemical Sciences 36 (8): 397–404.

4. Matlin, A.J.; Clark, F.; Smith, C.W.J. (May 2005). "Understanding alternative splicing: towards a cellular code". Nature Reviews Molecular Cell Biology 6 (5): 386–398.

5. Nilsen, T.W.; Graveley, B.R. (2010). "Expansion of the eukaryotic proteome by alternative splicing". Nature 463: 457–463.

6. Pan, Q.; Shai, O.; Lee, L.J.; Frey, B.J.; Blencowe, B.J. (Dec 2008). "Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing". Nature Genetics 40 (12): 1413–1415.

7. Images: Wikimedia Commons.

8. Heyd, F.; Lynch, K.W. (Aug 2011). Trends in Biochemical Sciences 36 (8): 397–404. Epub 17 May 2011. PMID: 21596569.

9. Barash, Y.; et al. (2010). "Deciphering the splicing code". Nature 465 (7294): 53–59.

10. Wang, Z.; Burge, C.B. (2008). "Splicing regulation: from a parts list of regulatory elements to an integrated splicing code".


To understand structure-function relationships, it is crucial to study individual amino acid residues and their molecular interactions within protein structures. Studies have shown that residue interaction networks (RINs) derived from 3D protein structures provide more insight into the structural and functional roles of interacting residues. The software tools RINerator and RINalyzer can be used to generate these networks and visualize them in 2D.

Protein structure visualization and residue networks

3D protein structures have become accessible through X-ray crystallography and NMR spectroscopy. Although 3D visualization is very important for observing protein structures, 2D representations of RINs have become increasingly popular.
RINs simplify the visual complexity of 3D protein structures and allow scientists to focus on individual residues and their interactions at the molecular level. A RIN is derived from the 3D coordinates of a protein model: each node represents an amino acid residue, and each edge represents an interaction between two residues. RINs can be used to study residue interactions in many application scenarios, for example protein dynamics.
Recently, RINs have been applied to study protein-ligand interactions and to observe the structural and functional effects of residue changes in the context of drug treatment or disease.

Visual analysis of RINs

RINalyzer is a software tool that provides versatile structural analysis tools for RINs and lets users view the structure in either 2D or 3D. Residue nodes of interest are automatically highlighted in RINalyzer.
The Cytoscape plugin structureViz supports the structural analysis of protein-protein interactions.

Network approaches to protein structure analysis

One software feature is the ability to compare residue interactions between two proteins, revealing their similarities and differences. Binding-site similarities can also be observed.

Generation of RINs

The RINerator module generates RINs from a 3D protein structure. It produces a more realistic network by sampling contacts on the van der Waals surface of each atom. In this way, different types of residue interactions can be distinguished, and the strength of each interaction can be estimated.
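The basic idea of turning 3D coordinates into a residue network can be illustrated with a short Python sketch. This is not RINerator's method: RINerator samples van der Waals surface contacts, whereas this sketch swaps in a simpler rule that links two residues whenever any pair of their atoms lies within a distance cutoff. The input format (one `(residue_label, x, y, z)` tuple per atom) and the 4.0 Å default cutoff are illustrative assumptions.

```python
import math
from itertools import combinations

# Hypothetical input: one (residue_label, x, y, z) tuple per atom, in angstroms.
Atom = tuple[str, float, float, float]


def build_rin(atoms: list[Atom], cutoff: float = 4.0) -> dict[str, set[str]]:
    """Build a residue interaction network as an adjacency map.

    Nodes are residue labels; an edge joins two residues when any pair of
    their atoms is within `cutoff` angstroms. The distance rule is an
    assumed simplification of the surface-contact sampling RINerator uses.
    """
    network: dict[str, set[str]] = {label: set() for label, *_ in atoms}
    for (res_a, *pos_a), (res_b, *pos_b) in combinations(atoms, 2):
        if res_a == res_b:
            continue  # skip contacts between atoms of the same residue
        if math.dist(pos_a, pos_b) <= cutoff:
            network[res_a].add(res_b)
            network[res_b].add(res_a)
    return network


if __name__ == "__main__":
    toy_atoms = [
        ("ALA1", 0.0, 0.0, 0.0),
        ("GLY2", 3.0, 0.0, 0.0),
        ("SER3", 10.0, 0.0, 0.0),
        ("SER3", 10.5, 0.0, 0.0),
    ]
    for residue, neighbours in build_rin(toy_atoms).items():
        print(residue, sorted(neighbours))
```

The number of neighbours each residue has in the returned map (its degree) is one simple way to spot highly connected residues; interaction types and strengths, which RINerator distinguishes, are not modelled here.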


Doncheva, Nadezhda T., et al. "Analyzing and visualizing residue networks of protein structures." Trends in Biochemical Sciences 36.4 (2011): 179-182. Academic Search Complete. Web. 5 December 2012.


Protein binding sites are the regions where proteins interact with each other. Such a region usually corresponds to a specific part of the protein's three-dimensional structure. Once binding sites have been identified, protein function can be studied and protein-protein complexes can be modeled with docking algorithms.

The Protein Data Bank (PDB) stores the structures of proteins and protein complexes. Biochemists try to determine the structures of specific proteins experimentally, but many protein complexes are very hard to crystallize, so their structures are difficult to obtain. These experimental difficulties led to the development of computational protein-protein docking.

Binding-site prediction and protein-protein docking

Protein-protein docking is a computational approach for predicting the three-dimensional structure of protein complexes. Its success depends mostly on prior knowledge of the protein-protein binding sites. To predict the structure, the computational approach must account for differences in binding sites among the interfaces of a set of proteins. Often, some regions are used as interfaces by many proteins and become hotspots, whereas other regions vary.

Because docking requires precise knowledge of binding sites, an algorithm was developed that predicts protein binding sites by comparing the surface structure and properties of the protein of interest with those of other proteins. This algorithm is implemented in ProBiS, a program for detecting protein binding sites. The idea behind the algorithm is that conserved parts of a protein surface are usually involved in interactions with other proteins or ligands. To find the conserved parts of the surface, the algorithm searches for local surface regions that are similar between the protein of interest and other proteins.

To demonstrate this method, two unbound interacting proteins were chosen: FKBP12 (an immunophilin) and TBR-1 (the type I TGF-β receptor), with PDB codes 1d6o and 1ias. Proteins structurally similar to FKBP12 include 1ix5, 1jvw, 1pbk, 1q6h, 1r9h, 1u79, 2awg, 2d9f, 2if4, 2ofn, 2pbc, 2uz5, and 3b7x; proteins similar to TBR-1 include 1ckj, 1kob, 1m17, 1o6k, 1o9u, 1u59, 1wak, 1yhv, 1yvj, 2b7a, 2bfy, 2csn, 2f4j, 2ivt, 2izs, 2j0l, 2jbo, 2pzy, 2qkw, 2qlu, 2qr7, 2v7o, and 3bkb.

ProBiS was then used to predict the binding sites. Each query protein is compared with the polypeptide chains of the similar proteins, with the goal of finding similar surface regions while minimizing dissimilarities. As the picture on the right shows, all the conserved regions are mapped onto one another.

AutoDock 4.0 was then used to dock FKBP12 to TBR-1. Because the program works with whole protein structures, it is computationally demanding and needs precise input structures. AutoDock uses a force field that gives a stronger attraction between atoms in the predicted binding sites, and docking success was assessed by comparing the predicted binding-site residues with the corresponding residues in the actual complex. The strength of this added attraction affects the docking results: as the chart shows, a five times stronger force field gave the highest number of best-docked structures.

The 5x force field yielded 9 different structures when the predicted binding-site residues were compared with the actual ones. It also gave the most favorable clustering, since it produced the most best-docked structures. These results suggest that binding-site prediction can help docking algorithms reproduce the structure of a protein complex.


Scientific Paper. Binding-sites Prediction Assisting Protein-protein Docking. Acta Chim. Slov. 2011, 58, 396–401

Proposed New Protein Structure Classification

Three scientists in the field of structural biochemistry from the University of California, San Diego (Ruben E. Valas, Song Yang, and Philip E. Bourne) have proposed a new method of protein classification. The proposal has two motivations. First, so many macromolecular structures have been solved, with many more yet to be determined, that the large amount of available structural information is difficult to assimilate. Second, current classification schemes seem insufficient to reveal the network of structural lineages that evolution has produced; the authors therefore employ a reductionist approach to better interpret the evolutionary basis of protein structure and the lineages among diverse protein structures.

Two methods of protein classification are commonly used today:

Bottom-up Approach

The bottom-up approach uses algorithms to compare proteins based on geometry: how well they can be superimposed, measured by the root-mean-square deviation (RMSD), the length of the alignment, the number of gaps, and a statistical significance score. The end result is a comparison of protein domains that carries very little biological meaning.
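The RMSD used in such comparisons is RMSD = sqrt((1/N) × Σ |a_i − b_i|²), summed over the N pairs of equivalent atoms a_i and b_i. The Python sketch below computes it, assuming the two structures have already been aligned residue by residue and optimally superimposed (for example with the Kabsch algorithm, which is not shown). Real structure-comparison programs also report alignment length, gaps and a significance score, which this sketch omits.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """Root-mean-square deviation between paired, pre-superimposed coordinates.

    a[i] and b[i] must be equivalent atoms (e.g. C-alpha atoms of aligned
    residues); no alignment or superposition is performed here.
    """
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    shifted = [(x, y, z + 1.0) for x, y, z in model]  # every atom moved 1 angstrom
    print(rmsd(model, shifted))
```

Because every atom in the example is displaced by the same 1 Å, the RMSD is 1 Å; an uneven displacement would be dominated by the largest deviations, since distances are squared before averaging.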

Because many different methods are available, there is usually more than one result for each amino acid sequence analyzed. One drawback of the bottom-up approach is that, since amino acid sequences alone do not reveal much about the biological function of a protein, it is impossible to decide which result is the most biologically important. The benefit of the bottom-up approach is that it is a useful form of reductionism that gives a representative comparison of different protein domains.

Top-down Approach

Top-down approaches, exemplified by CATH and SCOP, are considered today's gold standards. These methods primarily use homologous sequence comparisons to establish relationships among protein domains and thereby provide a biological context. The authors argue that this technique can be taken one step further, based on the premise that structural relationships arise as a consequence of evolutionary links. Furthermore, they propose that gene duplication, convergence versus divergence, and co-evolution in a functional context should be incorporated into future protein classification.

The protein domain: a good unit of structural classification?

Both the bottom-up and top-down approaches rely on protein domains as the units of comparison. Domains are complicated units. Some domains have similar sequences and are evolutionarily related; some are only vaguely related, with similar structures but different sequences; and some share similar topologies, but not closely enough to establish an evolutionary connection. The basic problem is that a domain can be either an evolutionary or a non-evolutionary unit. Many proteins contain multiple domains, which further increases the complexity.

The presence of folds, which are treated as discrete components in most top-down classifications, further complicates matters. Folds are not a direct result of evolution, but they do provide insight into evolutionary processes. Folds sometimes change during evolution; for example, an alpha fold can change into a beta fold through a change in secondary structure. It is also possible to create two peptides with similar sequences but different folds, leading to completely different functions, and some chameleon sequences can adopt multiple different folds. Because of this structural diversity, folds are not suitable units of classification. In essence, whether or not two proteins share the same fold is largely a matter of semantics, whereas determining which one evolved from the other actually gives insight into the relationship between them. The reason this evolutionary approach has not been widely used is simply that it is more difficult than clustering similar structures.

Examples of Evolutionary Selection

Valas et al. illustrate the prevalence of evolutionary selection with two examples. In the first, Basu et al. found 215 strongly promiscuous domains in a genomic analysis of 28 different eukaryotes. Basu et al. define strongly promiscuous domains as those that occur in diverse domain architectures, where an architecture is represented as a linear combination of domains. "Domain architectures arise through domain shuffling, domain duplication, and domain insertion and deletion leading to new functions." The degree of domain promiscuity depends on how often a domain occurs with different domain partners. In the second example, Vogel et al. found an over-representation of two- and three-domain combinations, which they termed "supradomains" (macrodomains). These combinations have remained stable as units throughout protein evolution. Over 1,400 of these macrodomains have been found, and their natural association appears to be evolutionarily advantageous.
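Since promiscuity depends on how often a domain appears with different partners, a crude version of that count can be sketched in Python. Each architecture is treated as a linear tuple of domain names, and each domain's distinct immediate neighbours are collected. This is an illustrative simplification, not the statistic used by Basu et al.; restricting partners to adjacent domains is an assumption, and the domain names in the example are placeholders.

```python
def distinct_partners(architectures: list[tuple[str, ...]]) -> dict[str, int]:
    """Count each domain's distinct immediate neighbours across architectures.

    Each architecture is a linear, N- to C-terminal tuple of domain names.
    Counting only adjacent domains as partners is an illustrative assumption.
    """
    partners: dict[str, set[str]] = {}
    for arch in architectures:
        for domain in arch:
            partners.setdefault(domain, set())  # include single-domain proteins
        for left, right in zip(arch, arch[1:]):
            if left != right:  # a tandem repeat does not add a new partner
                partners[left].add(right)
                partners[right].add(left)
    return {domain: len(found) for domain, found in partners.items()}


if __name__ == "__main__":
    toy_architectures = [("SH3", "SH2", "Kinase"), ("SH2", "PTB"), ("Ig", "Ig", "FN3")]
    ranking = sorted(distinct_partners(toy_architectures).items(), key=lambda kv: -kv[1])
    print(ranking)
```

For example, `distinct_partners([("SH3", "SH2", "Kinase"), ("SH2", "PTB")])` gives SH2 a count of 3 and each of the other domains a count of 1, so SH2 would rank as the most promiscuous in this toy set.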

Pluralistic Approach to Protein Classification

The protein domain has been the only unit used to evaluate the evolution of protein structure. Although evolutionary analysis of the domain alone has been successful, other factors appear to be needed to fill in the missing pieces of the evolutionary network. The authors therefore propose a pluralistic approach to protein structure classification that incorporates not just domains but also subdomains, macrodomains, and both convergent and divergent evolution. Regarding subdomains, the authors identify several subdomain units that could be important for connecting the evolutionary network of proteins.

Many tools can be used to compare proteins at the subdomain level. One database, Fragnostic, facilitates analysis of fragments from different proteins that share structural and/or sequence similarity. The edges linking similar fragments are ambiguous: they are not labeled as arising from divergent or convergent evolution, but, combined with other information, the fragments can be tested for structural evolution.

Closed loops are another subdomain unit. Most protein structures are made up of closed loops spanning 25-30 residues. Domain Hierarchy and closed Loops (DHcL) uses van der Waals energies to identify domains and closed loops in protein structures. Researchers have found that fragments corresponding to closed loops were more likely to form large, interconnected clusters. This description might provide a more detailed view of protein function, and similar closed loops in different structures can be evidence that those structures once shared a common ancestor.

Another subdomain unit is the functional site. Many different proteins bind the same ligand, which suggests that they may share a common ancestor that bound that ligand: the proteins diverged in structure during evolution, but the functional site remained. SMAP can find such functional sites that show both sequence and structural conservation, a hallmark of divergent evolution. On the other hand, different proteins can also converge on binding the same ligand. The PROCOGNATE database uses information from the PDB to catalog which proteins bind which ligands. Combining these methods could account for both divergent and convergent evolution.

Besides subdomains, macrodomains can also aid classification. Divergent evolution is evident in some protein–protein interaction sites (a macrodomain feature): although the proteins diverge over time, the domain interface stays the same. Many protein–protein interfaces in the PDB are very similar even though they occur in vastly different proteins.

In essence, a domain-based scheme would be less informative, because it could only determine that proteins evolved from a common ancestor, whereas an analysis that includes both subdomains and macrodomains could provide an evolutionary hypothesis. One problem facing the pluralistic approach is convergent evolution: two proteins with completely different evolutionary lineages can arrive at very similar structures, which makes it difficult to connect the protein evolutionary network.

The authors argue that to trace proteins back to the last universal common ancestor (LUCA), it is necessary to look beyond the amino acid sequence, as has traditionally been done, and incorporate other structural aspects to assemble the evolutionary puzzle.

Protein studies involve various steps of sample preparation. The main protocols of protein studies are as follows:

  1. Protein Synthesis
  2. Purification
  3. Evaluation of purified protein
  4. Determination of Amino Acid Sequence
  5. Calculation of Protein's Mass
  6. Determination of Protein's 3-D Structure

There are many methods for studying proteins, including their shape and structure. For instance, X-ray crystallography gives scientists the structure of a protein. This information is used extensively to determine the characteristics of the protein, how it functions, and under which circumstances. Other methods include amino acid sequencing, fluorescence microscopy, mass spectrometry, and NMR.

Carbohydrate-Binding Proteins

Carbohydrate-binding proteins (CBPs) are important mediators of many types of cellular events through interactions between carbohydrates and proteins. There are three main families of CBPs:

  1. the C-type lectin family (including the Selectins)
  2. the Siglec family
  3. the galectin family

C-type Lectins (including Selectins)

C-type lectins and selectins are present in humans and in murine species (rats and mice). Roles of these CBPs include:

  • promoting primary immune response
  • mediating leukocyte trafficking to sites of inflammation
  • mediating lymphocyte recirculation
  • mediating platelet binding to neutrophils

Clinical Use of Lectins

Pure forms of lectins are used for blood typing; specifically, lectins are used to identify certain glycolipids and glycoproteins on an individual's red blood cells. In brain research, PHA-L, a lectin from the kidney bean, is used to trace the paths of efferent axons by anterograde labeling.



Siglecs

Siglecs occur mostly in humans, but some are also found in murine species. Some of the primary roles of these CBPs are:

  • regulator in B cell activation
  • maintenance of myelin
  • inhibitor of axonal growth


Galectins

Galectins are found in humans, mice, and rats. They are abundant in many organs and tissues, including muscle, heart, lung, liver, lymph nodes, thymus, colon, stomach epithelial cells, the gastrointestinal tract, erythrocytes, skin, brain, kidney, and lens, and also in Hodgkin's lymphoma. Roles include:

  • acting as a marker for cell recognition
  • binding specificity



Two methods of counting protein molecules have been widely used: stepwise photobleaching and ratio comparison to fluorescent standards.

Fluorescence occurs when a fluorophore emits light after absorbing light. GFP can fluoresce without enzymatic modification or a cofactor, so expressing a single gene is enough to produce detectable emission in any organism. Counting the number of protein molecules in live cells allows researchers to determine the stoichiometry of functional protein complexes and to build models of cellular structures. Because genome-wide studies may miss low-abundance proteins or local protein concentrations, single-molecule techniques, if successful, could solve this problem.

Stepwise photobleaching

Stepwise photobleaching is a fluorescence microscopy method for counting protein molecules that “relies on the irreversible and stochastic loss of fluorescence from repeated exposure of fluorescent proteins (FPs) to a light source.” The sample is continuously exposed to low-intensity excitation light so that it is “slowly bleached until its emission intensity reaches background level.” The number of fluorescent molecules in the structure determines the appropriate light intensity and exposure time. Missed bleaching events, in which two molecules bleach within the same interval, must be minimized, because such an event appears as a step roughly twice the size of the other steps. The method is useful only for small numbers of molecules, as the probability of missed events increases exponentially with the number of molecules in a structure. “Das et al. estimated that a maximum of 15 bleaching steps can be directly detected without mathematical extrapolation, although they detected no more than seven steps in their experiments.” With mathematical aids, the maximum number of molecules that can be counted by photobleaching can be increased to approximately 30.

A background correction is needed to eliminate fluorescence from diffusing proteins and to calibrate the starting intensity. During photobleaching, regions of interest (ROIs) should be chosen so that multiple structures are not confused. Because the raw data are noisy, they must also be filtered to reveal the discrete drops. For example, the Chung-Kennedy filter is the most commonly used filter for quantification of the bacterial replisome. “It calculates the mean and standard deviation in two consecutive sets within the data from one photobleaching ROI, and reports the mean of the set with the lower standard deviation.” The number of data points averaged in each set should be large enough to reduce the noise but small enough that few steps are missed.
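The two-set filter described in the quotation can be sketched in Python. This is a simplified variant: the original Chung-Kennedy filter combines several forward and backward predictors, whereas this version compares one backward and one forward window per point, as the quoted description suggests. The `window` size and the naive `count_steps` threshold are hypothetical parameters that would need tuning against real traces, and a missed (double) bleaching event would still be counted as a single, larger step.

```python
from statistics import mean, pstdev


def chung_kennedy(trace: list[float], window: int) -> list[float]:
    """Simplified two-set Chung-Kennedy filter for one photobleaching trace.

    For each point i, the backward set is the `window` points before i and the
    forward set is the `window` points starting at i. The output at i is the
    mean of whichever set has the lower standard deviation. Sets are truncated
    at the ends of the trace, and sets with fewer than two points are ignored.
    """
    if window < 2:
        raise ValueError("window must contain at least two points")
    filtered: list[float] = []
    for i in range(len(trace)):
        backward = trace[max(0, i - window):i]
        forward = trace[i:i + window]
        candidates = [s for s in (backward, forward) if len(s) >= 2]
        if not candidates:
            filtered.append(trace[i])  # too little data to filter this point
            continue
        filtered.append(mean(min(candidates, key=pstdev)))
    return filtered


def count_steps(filtered: list[float], min_drop: float) -> int:
    """Naively count bleaching steps as drops of at least `min_drop` between neighbours."""
    return sum(1 for a, b in zip(filtered, filtered[1:]) if a - b >= min_drop)


if __name__ == "__main__":
    # Hypothetical noise-free trace with two bleaching steps.
    trace = [30.0] * 6 + [20.0] * 6 + [10.0] * 6
    smoothed = chung_kennedy(trace, window=3)
    print(count_steps(smoothed, min_drop=5.0))
```

Choosing the set with the lower standard deviation is what keeps the steps sharp: near a drop, the window that straddles the drop has a high spread and is rejected, so the filter averages only within one plateau instead of blurring across it.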

Quantification by ratio comparison to fluorescent standards

This method measures the ratio of the fluorescence intensity of a protein sample to that of a standard. It uses a series of images of cells that express either the protein of interest or the standard, each made fluorescent by fusion with an FP. “If the standard can be distinguished from the protein of interest, it is desirable to include cells that express the standard and experimental fusion proteins on the same slide to ensure comparable illumination. If the standard is not distinguishable, images can be taken consecutively or another marker can be imaged separately to distinguish the control cells distant to eliminate Forster resonance energy transfer.” An advantage of this method is that relatively large numbers of protein molecules can be counted.

Corrections are needed for accurate measurements. For example, uneven illumination in the microscope system must be corrected if the whole field is used. If sample molecules are at different depths relative to the coverslip, the effect of depth on intensity should be calibrated using fluorescent beads. Different exposure times can be used to control the signal-to-noise ratio and avoid saturation, but excitation intensity should be kept constant to avoid nonlinear changes in photon counts due to blinking molecules. “The background should be taken from a concentric area unless there are overlapping neighbouring signals or an inhomogenous cytoplasmic intensity.” A trustworthy standard is essential for this method. When proteins of different sizes, or proteins of the same structure with very different intensities, are compared, the summed intensity of multiple z-sections should be used. Additional verification, using methods such as genomic DNA sequencing, should be performed to ensure the accuracy of the measured number. Using the ratio method at one or many time points, the number of molecules of each protein and their relative stoichiometries can be obtained.
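The core arithmetic of the ratio method can be sketched as follows: after background subtraction, the number of molecules in the sample is the known number in the standard scaled by the ratio of the two signals, N_sample = N_standard × (I_sample − B_sample) / (I_standard − B_standard). The Python below assumes both signals come from the same FP imaged under identical conditions and that the corrections listed above (illumination, depth) have already been applied; the function names and example numbers are hypothetical.

```python
def count_molecules(
    sample_intensity: float,
    sample_background: float,
    standard_intensity: float,
    standard_background: float,
    standard_count: float,
) -> float:
    """Estimate molecule number from the background-corrected intensity ratio."""
    sample_signal = sample_intensity - sample_background
    standard_signal = standard_intensity - standard_background
    if standard_signal <= 0:
        raise ValueError("standard signal must be above background")
    return standard_count * sample_signal / standard_signal


def relative_stoichiometry(counts: dict[str, float]) -> dict[str, float]:
    """Express each protein's count relative to the least abundant one."""
    smallest = min(counts.values())
    if smallest <= 0:
        raise ValueError("all counts must be positive")
    return {name: n / smallest for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical numbers: a standard known to contain 120 molecules.
    n_a = count_molecules(700.0, 100.0, 1300.0, 100.0, 120)
    n_b = count_molecules(400.0, 100.0, 1300.0, 100.0, 120)
    print(n_a, n_b, relative_stoichiometry({"A": n_a, "B": n_b}))
```

With a standard of 120 molecules giving a background-corrected signal of 1200, a sample giving 600 is estimated at 60 molecules and one giving 300 at 30 molecules, a 2:1 stoichiometry.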

Important considerations in counting proteins

Genetically encoded FPs should be used so that each protein molecule carries exactly one FP (a 1:1 stoichiometry). Even so, the counts can be affected by the maturation efficiency of the FP or by the proportion of FPs that remain unfolded.

Properties of FPs

Researchers should use the best available FPs to maximize the signal-to-noise ratio, especially for less abundant proteins. The folding and maturation efficiency, brightness, and photostability of the FPs should be considered before constructing fusion proteins. Research on YFP in the budding yeast Saccharomyces cerevisiae and on folded YFP in E. coli suggests that YFP maturation and folding efficiency are not major issues for counting proteins, particularly proteins with low turnover rates.

Functionality of fluorescent fusion proteins

Yeast is advantageous because a fluorescent fusion protein can replace the native protein through homologous recombination, which allows the functionality of the fusion protein to be tested. The function of some fusion proteins can be improved by inserting a flexible linker between the FP and the protein of interest. When endogenous genes cannot be replaced with a tagged version, alternative methods of protein counting are needed. For example, the local abundance of actin in actin patches can be quantified by correcting the fluorescence measurements with data from immunoblotting. However, this approach is valid only if tagged and untagged actin are incorporated into actin patches with similar efficiency. Engel et al. used stepwise photobleaching in a mutant background to count exogenous tagged proteins in the flagella of the green alga Chlamydomonas reinhardtii. Because the endogenous proteins do not localize to the flagella in this background, no assumption about the ratio of tagged to untagged protein is needed. The recent development of 'genome editing' techniques has allowed endogenous genes to be tagged in any model organism in which "the zinc finger nuclease or transcription activator-like effector nuclease genes can be introduced."

In vivo versus in vitro standards and quenching

The environment in which proteins are counted matters. Early studies used in vitro standards, for which the effect of the cellular environment on fluorescence intensity was unknown, so immunoblotting or internal standards were needed to calibrate fluorescence intensity inside the cell. Recent experiments suggest that the fluorescence of YFP/GFP in vitro is comparable to that in bacteria or yeast. Fluorescence quenching can also occur if FPs are packed into very tight structures, so the effect of quenching on protein counts should be examined separately for each structure of interest. Fluorescence lifetime imaging, with specialized equipment and analysis, can be used to measure quenching caused by environmental changes.

Validation of protein quantification by complementary approaches

Cellular concentrations should be validated by flow cytometry, or by fluorescence correlation spectroscopy for higher resolution. It is also important to confirm that protein concentrations measured by fluorescence microscopy agree with quantitative immunoblotting. In any protein counting experiment, suitable fluorescent protein genes, suitable standards, and controls for environmental changes or possible quenching ensure that the data are interpreted appropriately; the results can then be confirmed with complementary experiments.

Future of counting proteins using fluorescence microscopy

Super-resolution microscopy techniques can produce high-resolution images of intracellular structures by pinpointing the exact locations of individual fluorescent molecules. For these techniques, the main priorities are simplifying the analysis of high-density images of FPs and minimizing errors caused by blinking or by failure to photobleach. Single-molecule techniques are now used more commonly because stochastic events cannot be observed in population-averaged behaviour. Their advantage is that molecules can be counted directly rather than from ensemble images, and even different protein complexes within a single diffraction-limited area can be distinguished. Super-resolution imaging could therefore allow larger numbers of proteins to be quantified.


Counting protein molecules in a cell is essential for building structural models and understanding protein function. In vitro, protein numbers help determine reaction rates and give a better understanding of multiprotein complexes. The two methods introduced here are stepwise photobleaching and ratio comparison to a standard. They can be used in any laboratory with a fluorescence microscope to quantify a particular protein. Like every method, each has its own advantages and disadvantages, and the properties of FPs are highly significant for both. Other methods, such as electron microscopy, can help validate protein numbers. Fluorescence microscopy has helped determine exact numbers of proteins and also their binding ranges.

Source: Coffman VC, Wu JQ. Trends Biochem Sci. 2012 Sep 1.

Proteostasis is the term referring to "protein homeostasis," in which a system of biological pathways leads to proper protein function. This system, called the proteostasis network, is responsible for successful protein transport, proper protein folding, and the elimination of misfolded proteins. Improper protein function can be caused by genetic diseases and by environmental stress. More knowledge of the proteostasis network is still needed, but researchers have studied some of its pathways to create pharmaceutical agents that treat such protein abnormalities. Agents that modify a network pathway in a specific manner are called proteostasis regulators. For example, the antibiotic geldanamycin acts as an inhibitor of the chaperone protein HSP90. HSP90 takes part in the network's protein-folding pathways, and its success in assisting protein folding supports cell proliferation. Because cancer cells are more sensitive to HSP90 inhibitors, using geldanamycin as a regulator to inhibit HSP90 leads to cancer cell death. Research on the effects of HSP90 inhibitors is ongoing, with the aim of developing a therapeutic treatment for cancer. Although many pathways are involved in protein regulation, detailed study of these pathways could lead to successful treatments that restore proteostasis.

Some diseases that can be caused by defects in protein homeostasis are Parkinson's disease, Alzheimer's disease, and cystic fibrosis. These diseases can occur when the proteostasis network becomes less able to cope with misfolding-prone proteins, aging, or environmental stress.

The proteostasis network and its components are also controlled by integrated signaling pathways. These pathways can maximize the capacity of the network to ensure consistent and correct protein function. Examples include signaling pathways that regulate protein synthesis, aggregation, and the degradative pathways of proteostasis.

Managing Proteostasis

For the proteostasis network to function correctly and stably, many interactions help monitor and facilitate successful protein folding.

1. The proteostasis network is made up of ribosomes, chaperones, aggregases, and disaggregases that control protein folding. Special pathways, such as the ubiquitin-proteasome system, endoplasmic reticulum-associated degradation, proteases, and autophagic pathways, handle the degradation of proteins.

2. Signaling pathways, such as those associated with mitochondria and with aging, the heat shock response, and the unfolded protein response, affect protein folding within the proteostasis network. This is perhaps the most direct influence that can alter the folding and stability of proteins.

3. Outside influences, including metabolites, physiological stress, genetics, and epigenetics, affect the overall activity of the proteostasis network. These influences can also alter protein folding, but the effects of some, such as metabolites and physiological stress, can be countered with pharmacological chaperones and proteostasis regulators.

The interior of the cell is crowded and divided into many compartments, and this lack of space promotes aggregation. Aggregation is linked to toxicity and must be kept in balance, most importantly when the cell faces chemical, physical, or metabolic stress.

The state and function of the proteostasis network directly influence a protein's functional performance, and most proteins need help within the cell in order to fold.

Pharmacologic Chaperones and Proteostasis Regulators

Proteostasis, or "protein homeostasis," must maintain a stable level of activity to function correctly within a cell. The proteostasis boundary refers to the folding energetics a protein must have to achieve some level of function in a given proteostasis network. This boundary can be managed by both pharmacologic chaperones and proteostasis regulators: proteostasis regulators can expand the boundary to envelop a destabilized protein (represented as a point called a node), while pharmacologic chaperones can move the node from outside the boundary to inside it, stabilizing the protein. If the proteostasis boundary is not regulated, loss-of-function misfolding diseases can result, some of which are life-threatening.

Pharmacologic chaperones (PCs) act by binding to a destabilized node outside the boundary and stabilizing it. Once bound, the PC moves the now-stabilized protein inside the proteostasis boundary, which increases function within the network and maintains a stable level of activity. This stability translates into fewer misfolding diseases. PCs can correct a misfolding disease in three ways:

1. The destabilized node can be thermodynamically stabilized

2. The folding rate of the node can be increased by stabilizing the transition state of folding

3. The misfolding rate can be decreased by stabilizing the native state
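The first mechanism, thermodynamic stabilization, can be illustrated with a textbook two-state folding model; this is a general idealization, not a model taken from the source. If a pharmacologic chaperone binds only the native state with dissociation constant K_d, the apparent stability rises by RT ln(1 + [PC]/K_d), which shifts molecules toward the native state. The stability value, concentration, and K_d below are hypothetical.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)


def fraction_native(dg_unfold, temp_k=310.0):
    """Two-state model: fraction of molecules in the native state.

    dg_unfold = G(unfolded) - G(native) in kJ/mol; positive means stable.
    """
    k_eq = math.exp(dg_unfold / (R * temp_k))  # equilibrium ratio [N]/[U]
    return k_eq / (1.0 + k_eq)


def stabilized_dg(dg_unfold, pc_conc, kd, temp_k=310.0):
    """Apparent stability when a ligand (here, a PC) binds only the native state.

    pc_conc and kd must be in the same units (e.g. micromolar).
    """
    return dg_unfold + R * temp_k * math.log(1.0 + pc_conc / kd)


mutant_dg = -2.0  # hypothetical destabilized mutant, kJ/mol
with_pc = stabilized_dg(mutant_dg, pc_conc=10.0, kd=1.0)
print(round(fraction_native(mutant_dg), 2), round(fraction_native(with_pc), 2))  # 0.32 0.84
```

In this idealized picture the same binding also illustrates the third mechanism, since a more stable native state is less likely to unfold and misfold.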

Proteostasis regulators (PRs), on the other hand, expand the proteostasis boundary for a number of destabilized nodes, as long as the nodes all share the same proteostasis network. By expanding the boundary, PRs favor protein folding by adjusting the composition, concentration, and capacity of the proteostasis network. Besides promoting a stable environment in which proteins fold correctly, PRs can also prepare the network to handle metabolic stress and aging. Expanding the boundary increases the protective capacity of the network, which helps it withstand future stress.

The folding machinery of the proteostasis network affects the overall energy of a protein: folding enzymes and chaperones shift the distribution of a protein's energy states so that aggregation decreases and folding improves. Chaperones and folding enzymes bind to folding intermediates and to the transition state. Binding to the transition state helps stabilize the protein, so that misfolding and aggregation decrease.

Chaperones promote folding and also protect the cell by increasing correct folding and decreasing aggregation and misfolding. Chaperones are generally large molecules that bind to exposed hydrophobic regions of proteins in aggregation-prone states. Different compartments contain different, specific chaperones.

In all, pharmacologic chaperones and proteostasis regulators both help the proteostasis network prevent numerous loss-of-function misfolding diseases. The choice between them depends on whether the goal is to bring a single destabilized protein inside the boundary (pharmacologic chaperones) or to bring in a collection of similar destabilized proteins by expanding the boundary (proteostasis regulators).

Models for the Proteostasis Network

FoldEX and FoldFX are both models representing proteostasis boundaries. FoldEX shows when a protein will be exported from the endoplasmic reticulum, whereas FoldFX shows when a protein will be functional, that is, where proteostasis is working. FoldFX stands for Folding for the Function of Protein X. Both models have three dimensions: the folding rate, the misfolding rate, and the stability of the protein.

The FoldEX model is important because it establishes a threshold for protein export. This boundary is defined by the protein's folding rate, misfolding rate, and stability. A protein is exported if its folding energetics meet the threshold.

In a healthy cell, proteins usually lie well within the boundaries of the FoldFX model, and all enzymes function. However, when a disease affects protein folding, or when proteostasis is not working well, some proteins can fall outside the boundaries, meaning that they do not function properly.

In a conservative mutation, the substituted amino acid is similar to the one it replaces, so the substitution has little impact on the kinetics or thermodynamics of folding and little effect on function. A less conservative missense mutation, or the deletion of an amino acid, can affect the thermodynamics and kinetics of protein folding even when the change in the genetic sequence does not directly alter the protein's functional site.

However, there are ways to correct this. One is the application of pharmacologic chaperones (PCs). PCs specifically target proteins that fall outside the proteostasis boundary and push them within it, allowing them to fold properly and function. They do so by increasing the folding rate, decreasing the misfolding rate, or stabilizing the structure of the protein. Another is the use of proteostasis regulators (PRs), which can expand or contract the proteostasis boundary, allowing more or fewer proteins to fold correctly.
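The published FoldEX and FoldFX models are not reproduced here. As a loose, hypothetical illustration of how the three axes (folding rate, misfolding rate, stability) can jointly define a boundary, the toy sketch below assumes that the functional fraction of a protein is the kinetic share that folds rather than misfolds multiplied by the two-state fraction that stays native, and that a node lies inside the boundary when this fraction reaches a chosen threshold. In this toy picture a PR corresponds to lowering the threshold for every node in the network, and a PC to changing one node's rates or stability. All numbers and names are invented for illustration.

```python
import math
from dataclasses import dataclass

R = 8.314e-3  # gas constant, kJ/(mol*K)


@dataclass
class Node:
    """A protein placed on the three axes named in the text (hypothetical units)."""
    k_fold: float      # folding rate constant, 1/s
    k_misfold: float   # misfolding rate constant, 1/s
    dg_unfold: float   # stability G(U) - G(N), kJ/mol


def functional_fraction(node, temp_k=310.0):
    # Kinetic partitioning: share of molecules that fold before they misfold.
    partition = node.k_fold / (node.k_fold + node.k_misfold)
    # Two-state stability: share of folded molecules that remain native.
    k_eq = math.exp(node.dg_unfold / (R * temp_k))
    return partition * k_eq / (1.0 + k_eq)


def inside_boundary(node, threshold):
    """Toy rule: the node counts as functional if enough of it is native."""
    return functional_fraction(node) >= threshold


wild_type = Node(k_fold=10.0, k_misfold=1.0, dg_unfold=10.0)
mutant = Node(k_fold=1.0, k_misfold=1.0, dg_unfold=2.0)
print(inside_boundary(wild_type, 0.5), inside_boundary(mutant, 0.5))  # True False
# PR-like change: the threshold is lowered for every node in the network.
print(inside_boundary(mutant, 0.3))  # True
# PC-like change: only this node folds faster relative to misfolding and is more stable.
print(inside_boundary(Node(k_fold=1.0, k_misfold=0.2, dg_unfold=8.0), 0.5))  # True
```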


Source: Powers ET, Morimoto RI, Dillin A, Kelly JW, Balch WE. Biological and Chemical Approaches to Diseases of Proteostasis Deficiency. Annual Review of Biochemistry. 2009.

The most popular method for synthesizing peptides more than 50 amino acids long is automated solid-phase peptide synthesis. R. Bruce Merrifield first developed this method, and the solid-phase approach has also been adapted to synthesize DNA and RNA. The process begins by anchoring the carboxyl-terminal amino acid of the desired sequence to polystyrene beads; the peptide is then synthesized from the C-terminal end to the N-terminal end, the reverse of the N-to-C direction of biological synthesis. The t-Boc protecting group of the anchored amino acid is removed by a wash with trifluoroacetic acid (CF3COOH) in methylene chloride (CH2Cl2), which does not break the peptide's other covalent bonds. Next, an amino acid whose N-terminus is protected with t-Boc (introduced using di-tert-butyl dicarbonate) and whose C-terminus is activated with DCC (dicyclohexylcarbodiimide) is added to the reaction column. After the peptide bond forms, the excess reagents and dicyclohexylurea are washed away with an appropriate solvent. The peptide is elongated by adding each subsequent amino acid in the same manner. At the end of the synthesis, the peptide is released from the polystyrene beads with hydrofluoric acid (HF), which cleaves the ester bond to the bead without breaking the peptide bonds. Protecting groups on reactive side chains, such as those of lysine or histidine, are also removed at this time. The major advantage of this method, besides automation, lies in purification: because the impurities are not bound to the reaction column, they can be washed away without losing the synthesized product. In the laboratory, this technique is used to synthesize drugs such as insulin.



Transcription starts in the nucleus. It resembles DNA replication, in which the DNA is "unzipped" by helicase to expose a single nucleotide strand that serves as a template.

Transcription in three steps: producing an RNA message from DNA

(A) Binding and Initiation

A transcription unit includes a promoter containing the TATA box, as well as enhancer regions that regulate transcription. The TATA-binding protein (TBP) binds to the TATA box, and other transcription factors, such as TFIIA and TFIIB, bind at the promoter as well. RNA polymerase cannot bind to the DNA directly; it binds only after transcription factors have bound. Transcription begins when RNA polymerase, bound at the promoter, separates the DNA into two strands at the initiation (start) site, a step that requires energy from ATP. Initiation therefore sets the location on the DNA strand where transcription begins.

(B) Elongation

RNA polymerase moves along the DNA downstream of the promoter, performing two elongation steps:

1) It unwinds the DNA double helix about 10 bases at a time (adjacent base pairs are 3.4 Å apart).

2) It adds nucleotides to the 3’ end of the growing RNA.

As the RNA polymerase moves along, the mRNA grows one base at a time, each base complementary to the DNA template. Transcription proceeds at about 60 nucleotides per second. Adenine (A) in the DNA template pairs with uracil (U) in the RNA, thymine (T) pairs with adenine (A), and guanine (G) pairs with cytosine (C).
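The pairing rules above can be sketched as a lookup table. This sketch assumes the DNA template strand is written 3′→5′, so the RNA comes out written 5′→3′ in the same left-to-right order; it models only base pairing, not promoters, the polymerase itself, or RNA processing.

```python
# Template DNA base -> RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build an RNA strand (5'->3') from a DNA template written 3'->5'.

    RNA polymerase reads the template 3'->5' and adds each new nucleotide
    to the 3' end of the growing RNA, so the output grows left to right.
    """
    rna = []
    for base in template_3_to_5.upper():
        rna.append(DNA_TO_RNA[base])  # raises KeyError for non-ACGT input
    return "".join(rna)


print(transcribe("TACAAACCGATT"))  # AUGUUUGGCUAA
```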

(C) Termination

Transcription proceeds until the RNA polymerase reaches a termination site. No more RNA nucleotides are added, and the mRNA is released. The mRNA then moves out of the nucleus into the cytoplasm, where it is used in protein synthesis.


During translation, the codons of the mRNA are translated into a polypeptide chain of amino acids in three steps.

The three steps of translation:

Initiation: The small ribosomal subunit attaches to the mRNA, and an initiator tRNA pairs its anticodon (a nucleotide triplet in the tRNA) with the start codon (AUG), a three-nucleotide codon in the mRNA. The large ribosomal subunit then binds to the small subunit. The assembled ribosome has an A site, where each incoming tRNA enters, and a P site, which holds the tRNA carrying the growing chain; the initiator tRNA occupies the P site. Each tRNA carries an amino acid; as shown in the figure below, the amino acid is attached at the top of the tRNA.

Elongation: With the initiator tRNA in the P site, the A site is open for a second tRNA, matching the next codon, to enter along with its amino acid. Once the second tRNA is bound in the A site, the two amino acids are joined by a peptide bond. The ribosome then moves along the mRNA in the 5′ to 3′ direction, shifting the second tRNA to the P site so that a third tRNA can enter the A site. In this way the ribosome links the amino acids into a chain, and the process continues until a stop codon is reached.


Termination: When a stop codon (UAA, UAG, or UGA) is reached, a protein called a release factor binds to the termination codon in the A site. The ribosome then adds a water molecule to the end of the polypeptide chain, releasing it. Finally, the ribosome dissociates into its component parts.
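The codon-by-codon logic of initiation, elongation, and termination can be sketched with the standard genetic code. The sketch assumes the mRNA is written 5′→3′, that translation starts at the first AUG, and that it stops at the first in-frame stop codon; it models only codon reading, not ribosome or tRNA mechanics. The 64-letter string is the standard code listed in U, C, A, G order, with `*` marking stop codons.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code for codons UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate from the first AUG until a stop codon (UAA, UAG or UGA)."""
    start = mrna.find("AUG")  # initiation at the start codon
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # elongation, one codon at a time
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # termination: a release factor reads the stop codon
            break
        protein.append(aa)  # one-letter code of the next residue in the chain
    return "".join(protein)


print(translate("GGAUGUUUGGCUAAGC"))  # MFG
```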


Solid-phase synthesis gives good yield and high purity. All reactions are carried out in a single vessel, eliminating losses caused by repeated transfer of products. The method is well suited to synthesizing long peptide chains (50 residues and above).

Synthetic Peptides

Peptides can be made synthetically by linking the amino group of one amino acid to the carboxyl group of another, which is an example of a condensation reaction. A condensation reaction is a reaction in which two molecules combine to form one molecule, releasing water.
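Because each peptide bond formed in a condensation reaction releases one water molecule, the mass of a linear peptide is the sum of its free amino acid masses minus one water per bond. The sketch below uses average masses and, for brevity, a hypothetical table covering only glycine, alanine, and valine.

```python
# Average masses (g/mol) of a few free amino acids, and of water.
AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "V": 117.15}
WATER = 18.015


def peptide_mass(sequence):
    """Average mass of a linear peptide: each peptide bond releases one water."""
    residues = [AMINO_ACID_MASS[aa] for aa in sequence]
    return sum(residues) - WATER * (len(residues) - 1)


print(round(peptide_mass("AV"), 1))  # 188.2 (Ala-Val)
```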

Peptide synthesis can be made specific, meaning that desired products can be formed. To make unique products and prevent side reactions, protecting groups such as tert-butyloxycarbonyl (t-Boc) are used. t-Boc is used in the first step of forming simple peptides: to block the alpha-amino group, the protecting group reacts with it to form a t-butyloxycarbonyl amino acid. Blocking the amino group is followed by activating the carboxyl group of the same amino acid with dicyclohexylcarbodiimide (DCC).

With the amino group and carboxyl group of the first amino acid modified, a second amino acid can now be linked to it. The second amino acid has a free (unblocked) amino group, which links to the activated carboxyl group of the first, forming a rigid peptide bond and releasing dicyclohexylurea. The carboxyl group of the new dipeptide is then activated with DCC, ready to react with a third amino acid that has a free amino group. Again, a new peptide bond forms and dicyclohexylurea is released. This process can be repeated until the desired peptide is synthesized. To end the synthesis, dilute acid is added, which removes the t-Boc group and leaves the peptide undisturbed.

Dicyclohexylcarbodiimide (DCC)

The solid-phase method is used to make synthetic peptides containing more than 50 amino acids. It begins by binding the carboxyl group of the C-terminal (last) amino acid to polystyrene beads. The t-Boc group of the anchored amino acid is removed, and the next amino acid, with a t-Boc-protected amino group and a DCC-activated carboxyl group, is added. The peptide bond forms, and the peptide on the beads is filtered and washed so that it is pure before synthesis continues. The following amino acids are linked by the same process until the desired peptide is synthesized. Finally, the finished peptide is removed from the beads with hydrofluoric acid (HF).

Peptide ligation is used to synthesize peptides of more than 100 amino acids. The long peptide is formed from two or more smaller peptides that carry no protecting groups. Native chemical ligation is the most powerful and widely used form of peptide ligation. It joins a peptide bearing a thioester on its C-terminal carboxyl group to another peptide bearing a cysteine at its N-terminus. The thioester of the first peptide reacts with the N-terminal cysteine of the second to form a thioester-linked intermediate, which then rearranges (S→N acyl shift) to form a peptide bond. Small unprotected peptides are linked in this way to build the long peptide.
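At the sequence level, native chemical ligation imposes one requirement: the second fragment must begin with cysteine, so a long target can only be split just before a Cys residue. The sketch below tracks sequences only (one-letter codes, written N→C); the chemistry appears as comments, and the function names and target sequence are hypothetical.

```python
def ligation_sites(target):
    """Indices where target could be split for native chemical ligation.

    The C-terminal fragment must start with cysteine, so each split falls
    just before a Cys ('C'); index 0 is not a split point.
    """
    return [i for i, aa in enumerate(target) if aa == "C" and i > 0]


def native_chemical_ligation(thioester_peptide, cys_peptide):
    """Join two unprotected fragments (one-letter codes, N->C)."""
    if not cys_peptide.startswith("C"):
        raise ValueError("second fragment needs an N-terminal cysteine")
    # 1) The Cys thiol attacks the C-terminal thioester of the first fragment,
    #    giving a thioester-linked intermediate.
    # 2) An S->N acyl shift turns that link into a native peptide bond,
    #    so at the sequence level the fragments are simply joined end to end.
    return thioester_peptide + cys_peptide


target = "MKTACYLLC"
site = ligation_sites(target)[0]  # first possible split, before a Cys
print(native_chemical_ligation(target[:site], target[site:]) == target)  # True
```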


Synthetic peptides are made for many purposes. They can act as antigens, stimulating the immune system to produce antibodies that target that peptide; these antibodies can then be used to isolate a protein. Synthetic peptides can also be used to isolate hormone receptors.

Synthetic peptides can also be used as drugs. One example is a synthetic analog of vasopressin, 1-desamino-8-D-arginine vasopressin. This synthetic peptide is used to treat patients with diabetes insipidus, who lack the peptide hormone vasopressin and therefore excrete excessive amounts of urine. Using the analog as a substitute for natural vasopressin treats these patients.


Lastly, synthetic peptides can be used to gain a greater understanding of the 3D structure of proteins. They are especially helpful for this because they can include many amino acids that are not found in normal proteins; they are not limited to the 20 standard amino acids. This results in a much greater variety of structures.

Solid-Phase Peptide Synthesis

Polypeptide synthesis can be automated using Merrifield solid-phase peptide synthesis, which uses a solid polystyrene support to anchor the growing peptide chain. Polystyrene is a polymer whose subunits are derived from ethenylbenzene (styrene).


The beads of polystyrene are insoluble and rigid when they are dry; however, they swell in certain organic solvents, dichloromethane for example. Therefore, reagents are able to move in and out of the polymer matrix easily. The phenyl groups on polystyrene are functionalized by electrophilic aromatic substitution.

Electrophilic chloromethylation of polystyrene

Using a dipeptide as an example, the solid-phase synthesis of peptide on chloromethylated polystyrene proceeds as follows.

1. Attach protected amino acid

2. Deprotect amino terminal

3. Coupling to the second protected amino acid

4. Deprotect amino terminal

5. Disconnect dipeptide from polystyrene
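The cycle in the list above can be written out for any target sequence. This sketch only generates the ordered list of operations; it assumes the t-Boc/DCC chemistry described in this section, adds the washing step described earlier after each coupling, and takes the target in one-letter code written N→C. The function name is hypothetical.

```python
def spps_steps(target):
    """Merrifield-style steps for a peptide given N->C in one-letter code.

    Synthesis runs C->N: the C-terminal residue goes onto the bead first.
    """
    steps = [f"attach Boc-{target[-1]} to chloromethylated polystyrene"]
    for aa in reversed(target[:-1]):
        steps.append("deprotect amino terminus (TFA)")
        steps.append(f"couple Boc-{aa} (DCC-activated carboxyl)")
        steps.append("wash away excess reagents and dicyclohexylurea")
    steps.append("deprotect amino terminus (TFA)")
    steps.append("cleave peptide from polystyrene (HF)")
    return steps


for number, step in enumerate(spps_steps("AV"), start=1):  # Ala-Val dipeptide
    print(number, step)
```

For the Ala-Val dipeptide this reproduces the five listed steps, with one wash after the coupling; longer targets simply repeat the deprotect, couple, and wash cycle once per added residue.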

Purpose of dicyclohexylcarbodiimide (DCC)

Dicyclohexylcarbodiimide (DCC) is used in peptide synthesis to activate the carboxyl group by increasing its electrophilicity, which makes the C-terminus a more favorable attachment site for the next amino acid. The negatively charged carboxylate oxygen acts as a nucleophile and attacks the central carbon of DCC, forming a reactive intermediate. When the peptide bond forms, this intermediate is converted into dicyclohexylurea, a stable byproduct that is relatively unreactive throughout the rest of the synthesis. Because DCC activation can sometimes racemize the activated amino acid if not monitored carefully, triazole additives such as 1-hydroxybenzotriazole (HOBt) are sometimes used alongside it to preserve the stereochemistry of the peptide.

Solid-phase synthesis of a peptide

Advantage of solid-phase synthesis

The advantage of solid-phase synthesis is that the products are easy to isolate: all the intermediates are immobilized on polystyrene, so they can be purified simply by filtration and washing. Repeating the deprotection-coupling cycle allows larger peptides to be synthesized, and a machine designed by Merrifield can carry out this series of manipulations automatically.
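Repeating the cycle many times is also why the yield of each step matters so much: yields multiply across cycles. A minimal sketch, assuming every deprotection-coupling cycle has the same yield and ignoring losses at cleavage:

```python
def overall_yield(step_yield, n_residues):
    """Fraction of full-length chains after (n_residues - 1) identical cycles."""
    return step_yield ** (n_residues - 1)


for step in (0.99, 0.999):
    print(step, round(overall_yield(step, 50), 3))
```

For a 50-residue peptide, a 99% yield per cycle leaves only about 61% full-length chains, while 99.9% per cycle leaves about 95%. Because the chains stay on the beads, excess reagents can be used to push each coupling close to completion and then simply washed away.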

Protecting Groups

Peptide bonds can form between the carboxyl and amino groups on the main chains of amino acids, but they can also form with reactive side chains, giving undesired peptides. To synthesize the desired peptide, protecting groups are used to prevent the formation of undesired products. They also prevent polymerization of the excess amino acids used in the reaction. Protecting groups also help keep the stereochemistry of certain amino acids unchanged, since amino acids that are not properly protected may racemize.

t-butyloxycarbonyl (t-Boc) protecting group

It is used to protect the N-terminal amino groups as well as the side chains of lysine, arginine, asparagine, and glutamine. Di-tert-butyl dicarbonate reacts with the NH2 group of an amino acid to form a t-Boc amino acid. The t-Boc group can be removed under acidic conditions, typically with a strong acid such as trifluoroacetic acid (TFA, CF3COOH). Boc-amino acids are also commercially available, since they can be synthesized easily in large quantities, so people who synthesize peptides do not have to make them on their own. Solid-phase synthesis is effective because it keeps the growing chain in a simple primary-structure configuration rather than complicating it with secondary or tertiary interactions.

Boc-group, synthesized and removed
Mechanism of how T-boc is added to the amino acid
Mechanism of how T-boc is removed from the amino acid using HCl
Trifluoroacetic acid used to remove t-Boc group

Solution-Phase Peptide Synthesis (Using Benzyloxycarbonyl (Z) as protecting group)

Benzyloxycarbonyl is used to protect the N-terminal amino groups as well as the side chains of lysine, arginine, asparagine, and glutamine. The synthesis starts at the N-terminus and ends at C-terminus. For example, here are steps to synthesize a simple peptide such as Ala-Val:

First Step: Benzyl chloroformate reacts with the N-terminus of alanine, forming benzyloxycarbonyl alanine (alanine with its N-terminus protected by the Z group). Typically, a base such as triethylamine is added for this reaction.

Second Step: The protected alanine is treated with ethyl chloroformate, which activates its carboxyl group by forming an anhydride. The activated carboxyl group is susceptible to nucleophilic attack by the N-terminal amino group of valine.

Third Step: Valine is added to the protected, activated alanine. A peptide bond forms, connecting alanine and valine and giving the product Z-Ala-Val. Note that the N-terminus is still protected after this step.

Final Step: The Z group is removed by hydrogenolysis under mild conditions, with a metal such as Pd acting as the catalyst. (See the figure for the detailed reactions in each step.)

Synthesis of Ala-Valine, using solution-phase synthesis

To synthesize a larger peptide, the second and third steps are repeated: the C-terminus is activated, and then the next amino acid is coupled. This synthesis is fast and gives a good percentage yield, but it is practical only for short chains, because the yields fall as the chain grows. Solid-phase synthesis is therefore preferred for large peptides.

Z-group protecting group

9-Fluorenylmethyloxycarbonyl (Fmoc) protecting group

It is used to protect the N-terminal amino groups as well as the side chains of lysine, arginine, asparagine, and glutamine. Fmoc is removed under basic conditions, typically with piperidine in dimethylformamide (DMF).

Fmoc protecting group
Piperidine. Used to remove Fmoc group
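Because t-Boc, Z, and Fmoc are removed under different conditions (acid, hydrogenolysis, and base, respectively), one group can be taken off while the others stay in place. The sketch below encodes only the removal conditions named in this section; real lability is less clear-cut (for example, Z is also cleaved by strong acids such as HF), so treat the table as a simplification.

```python
# Simplified: only the removal conditions named in this section are listed.
REMOVED_BY = {
    "t-Boc": {"TFA"},        # acid
    "Z": {"H2/Pd"},          # hydrogenolysis
    "Fmoc": {"piperidine"},  # base, typically in DMF
}


def survivors(groups, condition):
    """Protecting groups that stay on when the peptide is exposed to condition."""
    return [g for g in groups if condition not in REMOVED_BY[g]]


groups = ["t-Boc", "Z", "Fmoc"]
for condition in ("TFA", "piperidine", "H2/Pd"):
    print(condition, "leaves", survivors(groups, condition))
```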

t-butyl and benzyl protecting groups

They are used to protect the C-terminal carboxyl groups as well as the side chains of serine, threonine, tyrosine, glutamate, and aspartate. t-Butanol or benzyl alcohol reacts with the hydroxyl or carboxyl groups of amino acids to form t-butyl or benzyl amino acids. t-Butyl groups can be removed with strong acid, and benzyl groups with strong acid or catalytic hydrogenation.

Nonribosomal peptide synthesis is an alternative pathway that produces polypeptides without the traditional translation mechanism. The peptides are made by enzyme complexes called synthetases, and the resulting peptides are generally short, 2-50 residues.[1] Nonribosomal peptide synthesis produces several pharmacologically important compounds, including antibiotics and immunosuppressants, and is found in many bacteria and fungi. Nonribosomal peptide synthesis (NRPS) draws on a large monomer pool, including all the standard amino acids, several unnatural amino acids, and aryl acid substrates, to produce small-molecule metabolites through a series of loading and condensation steps. Peptides produced by NRPS have features that distinguish them from ordinary proteins. First, they can contain both standard and non-standard amino acids.[1] Second, the amino acids are linked not only by peptide bonds but also by non-conventional links that can form a non-linear peptide backbone. NRPS is a key mechanism for the biosynthesis of bioactive metabolites in bacteria and fungi. Nonribosomal peptide synthetase genes, which are generally part of multigene clusters, encode NRP synthetases, which in turn biosynthesize the peptide products.[1] An NRP synthetase is generally composed of one or more modules and can terminate in a thioesterase domain that releases the newly synthesized peptide from the enzyme.[1] Unlike ribosomal peptide synthesis, NRPS does not involve the translation of mRNA.
Because synthesis is not templated by mRNA, NRPS gives rise to an extremely diverse range of possible products. NRPS is especially relevant because many of the secondary metabolites it produces are of medical importance, including numerous antibiotics, antibiotic precursors, and immunosuppressant drugs. NRPS resembles polyketide synthesis and fatty acid synthesis, but instead of carrying intermediates on an acyl carrier protein, NRPS multienzymes use a peptidyl carrier protein (PCP). A conserved serine on an alpha helix of the PCP is modified by attachment of a 4'-phosphopantetheine prosthetic group, which converts the PCP to its active holo form. The thiol group at the end of this prosthetic group can then form thioester bonds with the amino acid substrates and the growing peptide. Synthesis begins with the loading of an amino acid, activated as an aminoacyl-adenylate, onto the PCP; cycles of adenylation and condensation follow until the thioesterase domain releases the completed polypeptide chain.[2]

type I NRPS production of the antibiotic tyrocidine

Domains found in NRPS

  • F: Formylation (optional)
  • A: Adenylation (required in a module)
  • PCP: Thiolation and peptidyl carrier protein with attached 4'-phosphopantetheine (required in a module)
  • C: Condensation forming the amide bond (required in a module)
  • Cy: Cyclization into thiazolines or oxazolines (optional)
  • Ox: Oxidation of thiazolines or oxazolines to thiazoles or oxazoles (optional)
  • Red: Reduction of thiazolines or oxazolines to thiazolidines or oxazolidines (optional)
  • E: Epimerization into D-amino acids (optional)
  • NMT: N-methylation (optional)
  • TE: Termination by a thioesterase (found only once in an NRPS)
  • R: Reduction to terminal aldehyde or alcohol (optional)
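The roles of the domains listed above can be illustrated with a toy model of an NRPS assembly line. This is a minimal sketch under simplifying assumptions, not a bioinformatics tool: it assumes the modules act in order and each adds one residue chosen by its A domain, that an E domain converts that residue to its D form, that an NMT domain N-methylates it, and that the chain is released by the TE domain of the last module. The first (initiation) module is given no C domain here, since there is no upstream chain for it to condense with. The `Module` class, the `assemble` function, and the example substrates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    """One NRPS module (hypothetical toy representation)."""
    substrate: str                                  # residue selected by the A domain
    domains: List[str] = field(default_factory=list)


def assemble(modules: List[Module]) -> str:
    """Return the peptide made by a linear series of modules.

    Assumptions: every module carries A and PCP domains, and a TE domain
    appears exactly once, in the last module, where it releases the chain.
    """
    te_positions = [i for i, m in enumerate(modules) if "TE" in m.domains]
    if te_positions != [len(modules) - 1]:
        raise ValueError("TE must occur exactly once, in the final module")

    chain: List[str] = []
    for m in modules:
        if "A" not in m.domains or "PCP" not in m.domains:
            raise ValueError(f"module for {m.substrate} lacks an A or PCP domain")
        residue = m.substrate               # A domain activates it; PCP carries it
        if "E" in m.domains:
            residue = "D-" + residue        # E domain epimerizes to the D form
        if "NMT" in m.domains:
            residue = "N-Me-" + residue     # NMT domain N-methylates the residue
        chain.append(residue)               # C domain of the next module extends the chain
    return "-".join(chain)                  # TE domain releases the finished peptide


# Hypothetical three-module synthetase
line = [
    Module("Phe", ["A", "PCP", "E"]),
    Module("Pro", ["C", "A", "PCP"]),
    Module("Leu", ["C", "A", "PCP", "TE"]),
]
print(assemble(line))  # D-Phe-Pro-Leu
```

Because the product is read directly off the order of the modules, swapping or adding modules in this toy model changes the peptide accordingly.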

After the peptide chain is synthesized, it can be modified by halogenation, hydroxylation, acylation, or glycosylation, typically by enzymes encoded in the same operon or gene cluster as the synthetase. Because NRPS resembles polyketide synthase (PKS) and fatty acid synthase (FAS) systems, components of these pathways are often combined with one another to form natural products. --A08954805, 22:32, 15 November 2011 (UTC)


  1. Campbell.
  2. [5]
Bacterial Gene to Protein

Overview of Bacterial Gene to Protein

Double-stranded DNA has two strands, a sense strand and a template strand. The sense strand has the same sequence as the mRNA that will be transcribed, except that the thymines (T) in the DNA are replaced by uracils (U) in the mRNA. RNA polymerase makes an mRNA transcript complementary to the template strand of DNA.
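The relationship between the sense strand, the template strand, and the mRNA can be checked with a short sketch. It assumes every sequence is written 5' to 3' and contains only A, C, G, and T; the function names are illustrative. Because the mRNA is built complementary and antiparallel to the template, it is the reverse complement of the template with U in place of T, which is the same as the sense strand with T replaced by U.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_sense(sense_5to3: str) -> str:
    """Return the template strand (written 5'->3') for a sense strand."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(sense_5to3))


def transcribe(template_5to3: str) -> str:
    """Return the mRNA (5'->3') made from a template strand written 5'->3'."""
    # RNA polymerase reads the template 3'->5', so walk it in reverse
    # and pair each DNA base with its RNA partner.
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(template_5to3))


sense = "ATGGCCTAA"
template = template_from_sense(sense)
mrna = transcribe(template)
print(template)  # TTAGGCCAT
print(mrna)      # AUGGCCUAA
assert mrna == sense.replace("T", "U")
```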


Transcription

  1. Initiation: In E. coli, RNA polymerase carrying the sigma-70 factor moves along the DNA, looking for the -35 and -10 regions of a sigma-70 promoter. Once it finds the promoter, RNA polymerase binds to it, loosely at first and then more tightly once the DNA starts to unwind. RNA polymerase then adds the first ribonucleoside triphosphate (rNTP), usually a purine, which is complementary to the nucleotide at the +1 position of the DNA template. [1]
  2. Termination: The transcription termination site is located downstream of the translation stop codon. In bacteria, there are two types of termination:
  • Rho-dependent: A Rho factor binds to a cytosine-rich region of the RNA (the Rho utilization, or rut, site) located after the part of the gene that codes for protein. Rho then wraps the downstream RNA (the RNA between where Rho binds and the RNA polymerase) around itself and slowly pulls itself toward the RNA polymerase, which has paused at a transcription terminator pause site. When Rho reaches the RNA polymerase, termination occurs, and the mRNA transcript and RNA polymerase are released from the DNA template. [1]
  • Rho-independent: A region of the mRNA transcript that is rich in guanine and cytosine forms an RNA stem-loop that holds onto the RNA polymerase and causes it to pause. During this pause, the base pairs between the poly-U stretch at the 3' end of the mRNA and the poly-A stretch of the DNA template are weak and therefore easy to melt. Transcription stops when they melt, and the mRNA transcript and RNA polymerase are released. [1]


Translation

  1. Initiation: In bacteria, initiation factors (IFs) take part in the initiation of translation. IF3 brings the mRNA and the 30S ribosomal subunit together, and the ribosome binding site on the mRNA pairs with a complementary sequence on the 16S rRNA. IF1 binds to the A site of the 30S subunit and blocks it. IF2, bound to GTP, then brings the initiator fMet-tRNA (N-formylmethionyl-tRNA) to the start codon in the P site of the 30S subunit. Once the initiator tRNA is attached, IF3 is released and the 50S subunit joins the 30S subunit. This triggers hydrolysis of the GTP and the release of IF2 and IF1. The ribosome then continues with translation. [1]
  2. Termination: When a stop codon (UAA, UAG, or UGA) appears in the A site of the ribosome, no tRNA binds; instead, a protein release factor, either RF1 or RF2, enters the A site. The peptidyl transferase center then cleaves the bond between the finished protein and the tRNA in the P site. Once the protein is released from the ribosome, RF3 causes the release factor to leave the ribosome. Next, the ribosome recycling factor (RRF) and EF-G bind at the A site, and GTP hydrolysis separates the 30S and 50S ribosomal subunits. IF3 then binds to the 30S subunit to remove any remaining tRNA or mRNA. The result is a newly synthesized bacterial protein and free ribosomal subunits that can be used for further rounds of translation. [1]
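The termination step can be illustrated by reading an mRNA codon by codon until a stop codon would enter the A site. This sketch simplifies initiation by starting at the first AUG (real bacterial ribosomes are positioned by the ribosome binding site, not simply the first AUG), and the function name is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def read_until_stop(mrna: str):
    """Return (codons read, stop codon), starting at the first AUG.

    Returns (codons, None) if no in-frame stop codon is reached,
    and ([], None) if the mRNA has no AUG.
    """
    start = mrna.find("AUG")
    if start == -1:
        return [], None
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            return codons, codon   # a release factor, not a tRNA, binds here
        codons.append(codon)
    return codons, None


print(read_until_stop("GGAUGGCUUAAGC"))  # (['AUG', 'GCU'], 'UAA')
```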


  1. Slonczewski, Joan L.; Foster, John W. Microbiology: An Evolving Science, 2nd ed. W. W. Norton & Company, 2009.

General Information

Protein purification is the process of separating proteins so that they can be analyzed individually. It is the second step in studying a protein, the first being the development of an assay: a procedure that measures the protein's activity (for an enzyme, its enzymatic activity), thus confirming the presence of the protein or proteins of interest. Popular assays include Western blotting and ELISA (enzyme-linked immunosorbent assay). Before purification, the cells are disrupted and their contents homogenized. After the cells have been opened, the proteins can be separated from one another and from the organelles by several methods. Protein mixtures are normally separated several times in succession, each time based on a different property, such as:

  • Solubility
  • Size
  • Molecular Weight
  • Charge
  • Binding affinity

The intended use of a protein determines how far it must be purified. Sometimes a moderately purified sample suffices; other situations require a much higher degree of purity, especially when the goal is to study the properties and behavior of the protein of interest. By exploiting differences in solubility, size, molecular weight, charge, and binding affinity, the scientist aims to reach the necessary level of purity while obtaining enough protein for further research and application. This means using as few steps as possible to keep the yield high, because each purification step incurs some loss of product. Protein purification therefore involves balancing two factors: yield and purification level. Purification projects generally fall into two categories: analytical (for study and research) and preparative (for producing commercial products).
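The trade-off between yield and purification level is usually tracked in a purification table. The sketch below assumes each step is described by its total protein (mg) and total activity (units, as measured by the assay). Specific activity is activity per mg of protein, yield is the percentage of the starting activity retained, and the purification level (fold) is the specific activity relative to the first step. The step names and numbers are hypothetical.

```python
from typing import List, Tuple


def purification_table(steps: List[Tuple[str, float, float]]):
    """Return (name, specific activity, % yield, fold purification) per step.

    steps: (name, total_protein_mg, total_activity_units), in the order performed.
    """
    _, protein0, activity0 = steps[0]
    specific0 = activity0 / protein0
    rows = []
    for name, protein, activity in steps:
        specific = activity / protein               # units per mg of protein
        yield_pct = 100.0 * activity / activity0    # share of activity retained
        fold = specific / specific0                 # enrichment of the target
        rows.append((name, specific, yield_pct, fold))
    return rows


# Hypothetical data for three steps
steps = [
    ("Crude extract", 10000.0, 100000.0),
    ("Salting out", 2500.0, 80000.0),
    ("Ion exchange", 400.0, 60000.0),
]
for name, sa, y, fold in purification_table(steps):
    print(f"{name:15s} {sa:8.1f} U/mg {y:6.1f}% yield {fold:5.1f}-fold")
```

In this made-up example the purification level rises from 1 to 3.2 to 15-fold while the yield falls from 100% to 80% to 60%, which is the trade-off described above.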

There are many methods of protein purification, including:

  • Differential centrifugation: components are separated on the basis of mass and density by centrifugal force, which allows proteins in different cell compartments to be separated.
  • Salting out: different proteins precipitate at different salt concentrations; as the salt concentration increases, more proteins precipitate.
  • Gel-filtration chromatography: large molecules flow more rapidly to the bottom of the column because they cannot enter the pores of the beads.
  • Ion-exchange chromatography: proteins are separated according to their charge. Positively charged proteins bind to negatively charged beads, while negatively charged proteins are not retained and flow through faster.
  • Affinity chromatography: many proteins have a high affinity for specific chemical groups.
  • Hydrophobic interaction chromatography: proteins are separated according to their hydrophobicity.
  • Gel electrophoresis: proteins are separated by electrophoresis, and the gel enhances the separation; small proteins move more rapidly through the gel.
  • Isoelectric focusing: proteins are separated according to their isoelectric points (pI).
  • Two-dimensional electrophoresis: proteins are separated horizontally by pI and vertically by mass.
  • Dialysis: proteins are separated from small molecules with a semipermeable membrane; because proteins are generally larger than the membrane pores, they are retained while small molecules pass through.

Purpose: the protein of interest is present in cells together with many other proteins; purification removes the other proteins to isolate the one you want.

General Information

Differential centrifugation is a method used to separate the different components of a cell on the basis of mass and density. First, the cell membrane is ruptured with a homogenizer to release the cell's components; the resulting mixture is called the homogenate. The homogenate is centrifuged to obtain a pellet containing the densest organelles: the densest components pellet at lower centrifuge speeds, while less dense components remain in the liquid supernatant above the pellet. The supernatant can then be centrifuged again at higher speeds to pellet the less dense organelles. Performing centrifugation stepwise in this way, increasing the speed each time, separates the components by mass. The relatively dense nuclei are most likely to pellet in the first step, followed by the mitochondria, then smaller organelles; the final supernatant is the cytoplasm, which contains the soluble proteins.[1]
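Centrifuge speeds are quoted either in revolutions per minute (rpm) or as relative centrifugal force (RCF, in multiples of g, as in "100,000 x g"). Since RCF = r ω² / g with ω = 2π × rpm / 60, the two can be converted when the distance r from the axis of rotation is known. A minimal sketch; the radii in the example are hypothetical and depend on the rotor.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at radius_cm for a rotor spinning at rpm."""
    omega = 2.0 * math.pi * rpm / 60.0                      # angular velocity, rad/s
    return omega ** 2 * (radius_cm / 100.0) / STANDARD_GRAVITY


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF at radius_cm."""
    omega = math.sqrt(rcf * STANDARD_GRAVITY / (radius_cm / 100.0))
    return omega * 60.0 / (2.0 * math.pi)


print(round(rcf_from_rpm(10000, 10.0)))  # 11182
print(round(rpm_from_rcf(100000, 8.0)))  # rpm needed for 100,000 x g at 8 cm
```

Because RCF grows with the square of the speed, doubling the rpm quadruples the force, which is why each step of a differential centrifugation needs a substantially higher speed.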

Centrifugation of blood: the components are separated according to their density.

Equilibrium sedimentation uses a density gradient to separate particles based on their individual densities (mass/volume). A key feature of this type of sedimentation is that it is independent of the shape of the molecule. It can be used to further purify fractions obtained by differential centrifugation. A solution is prepared with the densest portion of the gradient at the bottom. The particles to be separated are then added to the gradient and centrifuged, and each particle moves until it reaches a region of comparable density. Such a density gradient may be continuous or prepared in steps. For instance, when using sucrose to prepare density gradients, one can carefully float a solution of 40% sucrose onto a layer of 45% sucrose and add further, less dense layers above. The homogenate, prepared in a dilute buffer and centrifuged briefly to remove tissue and unbroken cells, is then layered on top. After centrifugation, typically for an hour at about 100,000 x g, bands of cellular components can be observed at the boundaries between layers of different density. By carefully adjusting the layer densities to match the cell type, specific cellular components can be enriched.

Sedimentation equilibrium is useful because no pellet is formed. The centrifugal force moves the protein away from the axis of rotation, but not strongly enough to compress it into a pellet. Instead, a concentration gradient of the protein forms, and diffusion acts to counter this gradient. After sufficient time, sedimentation and diffusion are exactly balanced, and the concentration profile no longer changes.

Sedimentation equilibrium is also useful for studying interactions between proteins. In particular, it is used to determine the native state of a protein in solution, including whether it is a monomer, dimer, trimer, tetramer, and so on. A monomer is a protein made up of one subunit, a dimer has two subunits, a trimer three, and so on. This type of experiment therefore shows whether a protein forms oligomers (assemblies of two or more polypeptide chains, often identical ones). Sedimentation equilibrium can also determine dissociation constants (Kd) for protein-protein and protein-ligand interactions, typically in the range of about 1 nM to 1 mM. Finally, it can be used to determine the stoichiometry of protein complexes, such as a ligand and its receptor or an antigen-antibody pair.
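For a single, ideal, non-interacting species, the concentration at sedimentation equilibrium rises exponentially with distance from the axis of rotation: c(r) = c(r0) exp[M(1 - v̄ρ)ω²(r² - r0²) / (2RT)], where M is the molar mass, v̄ the partial specific volume, ρ the solvent density, and ω the angular velocity. A plot of ln c against r² is therefore a straight line whose slope gives M, and dividing M by the mass of one subunit indicates the oligomeric state. The sketch below generates an ideal profile and fits it; the values used (v̄ = 0.73 mL/g, ρ = 1.00 g/mL, 15,000 rpm, the radii, and a 25 kDa subunit) are assumptions for illustration, and real analyses must handle noise and non-ideal behavior.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def buoyant_factor(vbar_ml_per_g: float, rho_g_per_ml: float) -> float:
    """(1 - vbar * rho), dimensionless."""
    return 1.0 - vbar_ml_per_g * rho_g_per_ml


def equilibrium_profile(radii_cm, c_ref, r_ref_cm, molar_mass_g, vbar, rho, rpm, temp_k):
    """Ideal single-species concentration profile at sedimentation equilibrium."""
    omega = 2.0 * math.pi * rpm / 60.0
    m_kg = molar_mass_g / 1000.0                   # SI units: kg/mol
    r_ref = r_ref_cm / 100.0
    profile = []
    for r_cm in radii_cm:
        r = r_cm / 100.0
        exponent = (m_kg * buoyant_factor(vbar, rho) * omega ** 2
                    * (r ** 2 - r_ref ** 2) / (2.0 * R * temp_k))
        profile.append(c_ref * math.exp(exponent))
    return profile


def fit_molar_mass(radii_cm, conc, vbar, rho, rpm, temp_k):
    """Least-squares slope of ln c versus r^2, converted to molar mass (g/mol)."""
    xs = [(r / 100.0) ** 2 for r in radii_cm]
    ys = [math.log(c) for c in conc]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    omega = 2.0 * math.pi * rpm / 60.0
    m_kg = 2.0 * R * temp_k * slope / (buoyant_factor(vbar, rho) * omega ** 2)
    return m_kg * 1000.0


radii = [6.90 + 0.01 * i for i in range(31)]       # 6.90 to 7.20 cm
data = equilibrium_profile(radii, 0.1, 6.90, 50000.0, 0.73, 1.00, 15000, 293.15)
mass = fit_molar_mass(radii, data, 0.73, 1.00, 15000, 293.15)
print(round(mass))              # 50000 for these noise-free data
print(round(mass / 25000.0))    # 2, consistent with a dimer of 25 kDa subunits
```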


  1. Berg, Jeremy (2006). Biochemistry (6th ed.). W. H. Freeman. ISBN 0716787245.

Durdik, Jeaninne. "Sedimentation Equilibrium". Alliance Protein Laboratories. Retrieved 2009-10-10.


The process of "salting out" is a purification method based on protein solubility. It relies on the principle that most proteins are less soluble in solutions of high salt concentration, because the added salt ions compete with the protein for water molecules. As the protein's hydration shell is depleted, protein molecules interact with one another, aggregate, and precipitate. The exact concentration at which precipitation occurs varies from protein to protein, which allows different proteins to be separated, since they precipitate at different points as the salt concentration increases. Salting out can also concentrate dilute protein solutions: once the protein precipitates, the remaining liquid can be removed. However, the precipitated protein contains salt, which usually must be removed before the protein is pure.

"Salting in" refers to the observation that at low salt concentrations the solubility of a protein increases, because the ions shield charged groups on the protein and weaken attractions between protein molecules. By contrast, "salting out" requires a high salt concentration: because the salt is far more soluble than the protein, it dissolves and takes up much of the available water, and the proteins aggregate and precipitate. There are two ways of salting out. In one method, proteins are exposed to high concentrations of salt; in the other, they are exposed to a series of solutions of lower salt concentration.

Proteins differ in the sequence and composition of their amino acids, so their solubility in water depends on how hydrophobic or hydrophilic their surfaces are. Proteins with more hydrophobic surfaces precipitate more readily. Added ions bind water molecules and weaken the interactions between water and the protein, which reduces solubility as the proteins bind to each other and begin to aggregate. Generally, larger proteins require less salt to precipitate than smaller proteins do.

In the method that uses lower salt concentrations, the proteins are precipitated early in the process. They are then extracted from the precipitate with cold ammonium sulfate solutions of decreasing concentration. Each extracted protein is recovered by recrystallization as the cold solution is warmed to room temperature. A major advantage of this process is that it rarely fails, and depending on the protein, the recovery ranges from 30 to 90%.

Ammonium sulfate is commonly used to precipitate proteins selectively because it is very soluble in water, allowing concentrations of about 4 M. At such concentrations it does not cause harmful effects such as irreversible denaturation, because NH₄⁺ and SO₄²⁻ both lie at the favorable, non-denaturing end of the Hofmeister series. Ammonium sulfate can precipitate a protein quantitatively from a mixture, which makes this method very useful for purifying soluble proteins from cell extracts.[4]
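In practice, ammonium sulfate is added in steps expressed as percent saturation. A widely quoted empirical formula for roughly 20 °C gives the grams of solid ammonium sulfate to add per liter of solution to raise the saturation from S1 to S2 percent: g = 533(S2 - S1) / (100 - 0.3 S2). The constants shift somewhat with temperature, so treat this as a sketch and check published tables for your conditions; the function name and the cut points in the example are hypothetical.

```python
def ammonium_sulfate_grams(s1_pct: float, s2_pct: float, volume_l: float = 1.0,
                           k: float = 533.0, v: float = 0.3) -> float:
    """Grams of solid (NH4)2SO4 to raise saturation from s1_pct to s2_pct.

    Empirical formula g/L = k (S2 - S1) / (100 - v * S2); the default constants
    are the commonly quoted values for about 20 degrees C (an assumption).
    """
    if not 0.0 <= s1_pct < s2_pct <= 100.0:
        raise ValueError("need 0 <= s1 < s2 <= 100 (% saturation)")
    return volume_l * k * (s2_pct - s1_pct) / (100.0 - v * s2_pct)


# A two-cut fractionation: 0 -> 40 % precipitates proteins that come out early,
# then 40 -> 70 % precipitates the fraction of interest (hypothetical cuts).
print(round(ammonium_sulfate_grams(0, 40), 1))    # 242.3
print(round(ammonium_sulfate_grams(40, 70), 1))   # 202.4
```

The denominator grows smaller as S2 rises, so each additional percent of saturation costs more salt near the top of the range.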

Although salting out is an efficient method of protein separation, it requires that the solubility of the protein be known or determined beforehand. Proteins differ in their amino acid chains and in their solubility, and when the salt concentration is changed to make the protein insoluble, different ions can either increase or decrease its solubility. Hence, the ions used to alter the salt concentration must be chosen carefully. A protein is typically least soluble near its isoelectric point, pI, where it has minimal net charge. Salting out results in fractionation: one portion of precipitated protein is collected at one salt concentration and another portion at a different concentration, because different proteins in the mixture have different solubilities.
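Why each protein's solubility behavior must be known can be made concrete with the empirical Cohn equation, which describes solubility in the salting-out region as log S = β - Ks·Cs, where S is the protein's solubility, Cs the salt concentration, and β and Ks constants specific to each protein (and to the salt, pH, and temperature). A protein begins to precipitate once its solubility falls below its actual concentration, so each protein in a mixture has its own threshold salt concentration. The β, Ks, and concentration values below are hypothetical.

```python
import math


def precipitation_threshold(beta: float, ks: float, protein_conc: float) -> float:
    """Salt concentration at which a protein begins to precipitate.

    Cohn equation: log10(S) = beta - ks * Cs. Precipitation starts when the
    solubility S falls to the protein's concentration (same units as S).
    """
    return (beta - math.log10(protein_conc)) / ks


# Hypothetical proteins: (name, beta, Ks in 1/M, concentration in mg/mL)
mixture = [
    ("Protein X", 3.0, 1.0, 1.0),
    ("Protein Y", 4.0, 2.0, 1.0),
    ("Protein Z", 5.0, 1.5, 10.0),
]
for name, beta, ks, conc in sorted(mixture, key=lambda p: precipitation_threshold(*p[1:])):
    print(f"{name}: precipitates above {precipitation_threshold(beta, ks, conc):.2f} M salt")
# Protein Y: 2.00 M, then Protein Z: 2.67 M, then Protein X: 3.00 M
```

Collecting the precipitate between two of these thresholds is the fractionation described above.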

Proteins with different pI values can also be separated by salting out at different pH values and salt concentrations. Because proteins are least soluble near their isoelectric point (pI), a protein can be made to precipitate near its pI by increasing the salt concentration. This works because the increasing ionic concentration of the solvent displaces the hydration shell surrounding the protein. As the hydration shell is replaced by other ions, the water network that keeps the protein in solution is destabilized, and hydrophobic groups come together, so the proteins aggregate and precipitate (they are "crashed out"). This technique can separate proteins that initially have similar precipitation points: by modifying the pH of the solution, one can increase or decrease the solubility of one protein while leaving the target protein largely unaffected. The salt ions can later be removed by dialysis.

Dialysis Process

Hofmeister Series

The relative effectiveness of different ions was established by Franz Hofmeister in 1888. The first ions in the anion and cation series are the most effective at precipitating a protein (these are called "kosmotropes": ions that interact well with water, forming hydrogen bonds and dehydrating proteins), and the ions at the end are the least effective ("chaotropes": ions that free up water by breaking hydrogen bonds between water molecules, increasing protein solubility).

Cations: N(CH₃)₄⁺ > NH₄⁺ > K⁺ > Li⁺ > Mg²⁺ > Ca²⁺ > Al³⁺ > guanidinium

Anions: SO₄²⁻ > HPO₄²⁻ > CH₃COO⁻ > citrate > tartrate > F⁻ > Cl⁻ > Br⁻ > I⁻ > NO₃⁻ > ClO₄⁻ > SCN⁻

The ions at the start of each series strengthen hydrophobic interactions by decreasing the solubility of nonpolar groups, thus salting proteins out. The ions at the end of each series, however, tend to denature proteins because their strong ionic interactions disrupt hydrogen bonding. So although proteins can be salted out with ions from the start of the series, such as ammonium sulfate, ions from the end of the series can instead cause salting in, increasing protein solubility.


Dialysis is a protein purification process that separates proteins from small molecules, such as salt, by using a semipermeable membrane. The membrane contains pores through which small molecules can escape, so protein molecules with dimensions significantly greater than the pore diameter are retained inside the dialysis bag, while small molecules and salt diffuse out through the membrane into the dialysate outside the bag. This technique is useful for removing salt ions and other small molecules, but it cannot separate proteins from one another. The removal of salt and other impurities can be improved by taking advantage of equilibrium. In an aqueous environment, salt flows through the dialysis membrane until its concentration outside the bag equals its concentration inside the bag; at that point equilibrium is reached and there is no net flow of salt across the membrane. If the buffer is then replaced with fresh buffer, the remaining salt flows out of the bag until the concentration in the new buffer again equals the concentration inside the bag. Replacing the buffer repeatedly therefore increases the purity of the protein inside the bag, because each time the salt must flow out to reach a new equilibrium. The same principle applies to any other impurity that can pass through the membrane.
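The benefit of repeatedly replacing the buffer can be estimated with a simple mass balance. Assuming the sample volume stays constant, each fresh buffer is salt-free, and equilibrium is reached before every change, the salt spreads evenly over the sample plus buffer volume, so the concentration inside the bag is multiplied by V_sample / (V_sample + V_buffer) at each exchange. The volumes and starting concentration below are hypothetical.

```python
def salt_after_exchanges(c0_mm: float, sample_ml: float, buffer_ml: float, exchanges: int):
    """Salt concentration (mM) inside the bag after each equilibrated buffer change."""
    factor = sample_ml / (sample_ml + buffer_ml)   # fraction of salt left in the bag
    conc, history = c0_mm, []
    for _ in range(exchanges):
        conc *= factor                             # salt redistributes over both volumes
        history.append(conc)
    return history


# 10 mL of protein in 1 M (1000 mM) salt, dialyzed three times against 1 L of buffer
for i, c in enumerate(salt_after_exchanges(1000.0, 10.0, 1000.0, 3), start=1):
    print(f"after change {i}: {c:.4f} mM")
```

With these volumes each change lowers the salt about a hundredfold, so three changes of 1 L remove far more salt than a single change of 3 L would.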

Dialysis in the human body

In patients with compromised kidney function, dialysis is often used to remove undesirable solutes from the blood. For example, the concentrations of calcium, potassium, and urea in the dialysate are kept low, so that these solutes diffuse from the blood across the semipermeable membrane. This requires the dialysate to be constantly replaced or cleaned; otherwise the concentrations on the two sides would equilibrate, removal would stop, and unwanted solutes would accumulate in the blood. Dialysis can also introduce solutes into the blood. For example, the dialysate contains a high concentration of bicarbonate ions, which diffuse across the membrane into the blood to prevent metabolic acidosis.


1. Berg, Jeremy M. (2007). Biochemistry (6th ed.). New York: W. H. Freeman. pp. 68-69, 78.

2. Voet, Voet, and Pratt (2004). Fundamentals of Biochemistry.

3. Atlas of Diseases of the Kidney, Volume 5. Principles of Dialysis: Diffusion, Convection, and Dialysis Machines.

4. Whitford (2005). "Chapter 9: Protein expression, purification and characterization". Proteins: Structure and Function. John Wiley & Sons, Ltd.

Capillary Electrophoresis

Capillary electrophoresis is a family of techniques that use narrow-bore capillaries to perform high-efficiency separations of both large and small molecules. A high-voltage power supply drives the solution through the capillary from the anode to the cathode. As the molecules pass the detector, an integrator records the signal over time, showing how the molecules in the original solution were separated. There are five modes of capillary electrophoresis: capillary zone electrophoresis, isoelectric focusing, capillary gel electrophoresis, isotachophoresis, and micellar electrokinetic capillary chromatography.

Capillary Zone Electrophoresis

Capillary zone electrophoresis is a separation mechanism based on differences in the charge-to-mass ratio of the molecules. A homogeneous buffer solution and a constant field strength are fundamental to the process. It can be used to separate both large molecules (such as DNA) and small molecules (such as drugs). Capillary zone electrophoresis is the simplest form of capillary electrophoresis.

Capillary Zone Electrophoresis

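Isoelectric Focusing

In isoelectric focusing, the sample is run through a pH gradient in which the pH is low at the anode and high at the cathode. The gradient is formed by a mixture of ampholytes: when a voltage is applied, the ampholytes separate in the capillary, and each analyte migrates until it reaches the point where the pH equals its pI, where it has no net charge and stops moving.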

Capillary Gel Electrophoresis

Capillary gel electrophoresis is conducted in an anticonvective medium, often polyacrylamide or agarose gel. The medium acts as a molecular sieve, so molecules are separated by size.


Isotachophoresis

In isotachophoresis, there is no electroosmotic flow, and a heterogeneous buffer system is used. The capillary is filled with a leading electrolyte whose ionic mobility is higher than that of any sample component and a terminating electrolyte whose ionic mobility is lower than that of any sample component. As a result, the sample components separate into zones between the leading and terminating electrolytes.

Micellar Electrokinetic Capillary Chromatography

In micellar electrokinetic capillary chromatography (MECC or MEKC), micelle-forming surfactant solutions give rise to separations that resemble reversed-phase liquid chromatography. Analytes are separated according to how they partition between the micelles and the surrounding buffer, which depends on hydrophobic and electrostatic interactions.

Electroosmotic Flow

Whereas HPLC uses hydrodynamic (pressure-driven) flow, capillary electrophoresis relies on electroosmotic flow (EOF). The rate of electroosmotic flow is influenced by pH, voltage, temperature, and buffer concentration.


pH

At neutral to alkaline pH, the electroosmotic flow is strong enough relative to electrophoretic migration that all species are swept toward the negative electrode (cathode). At high pH, the electroosmotic flow is large and a peptide is negatively charged; although the peptide's electrophoretic migration is toward the positive electrode (anode), the EOF is stronger and the peptide migrates toward the cathode. At low pH, the peptide is positively charged and the EOF is very small, so both the peptide's electrophoretic migration and the EOF carry it toward the cathode. In general, most solutes migrate toward the cathode regardless of charge when the buffer pH is above 7.0. The pH is often chosen to be at least two units above or below the pKa of the analyte to ensure essentially complete ionization.
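The direction and speed of migration can be worked out by adding an analyte's electrophoretic mobility to the electroosmotic mobility, taking movement toward the cathode as positive. The apparent velocity is (μ_ep + μ_eo) × E, with field strength E = V / L_total, and the migration time is the distance to the detector divided by that velocity. The sketch below uses hypothetical mobilities for a cation, a neutral species, and an anion under a strong EOF, as at high pH, in a hypothetical 50 cm capillary with the detector at 40 cm and 25 kV applied.

```python
from typing import Optional


def migration_time_s(mu_ep: float, mu_eo: float, voltage_v: float,
                     total_len_m: float, detector_len_m: float) -> Optional[float]:
    """Time (s) to reach the detector; mobilities in m^2/(V*s), + = toward cathode.

    Returns None if the net motion is toward the anode (away from the detector).
    """
    field = voltage_v / total_len_m          # field strength, V/m
    velocity = (mu_ep + mu_eo) * field       # apparent velocity, m/s
    if velocity <= 0:
        return None
    return detector_len_m / velocity


MU_EO = 5.0e-8  # hypothetical EOF mobility at high pH
for name, mu_ep in [("cation", 2.0e-8), ("neutral", 0.0), ("anion", -2.0e-8)]:
    t = migration_time_s(mu_ep, MU_EO, 25000, 0.50, 0.40)
    print(f"{name}: {t:.0f} s")
```

All three species reach the detector at the cathode end, cations first and anions last; only an anion whose electrophoretic mobility outweighs the EOF would move away from the detector.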


Voltage

Higher voltages provide greater efficiency by decreasing the separation time.


Temperature

At higher temperatures, the viscosity of the solution is lower, so the electroosmotic flow increases. However, the pH of some buffers is sensitive to temperature.

Buffer Concentration

Reducing the buffer concentration lowers peak efficiency because it decreases the focusing effect.


Wätzig, H., Degenhardt, M. and Kunkel, A. (1998). Strategies for capillary electrophoresis: Method development and validation for pharmaceutical and biological applications. Electrophoresis, 19: 2695–2752. doi: 10.1002/elps.1150191603


High Pressure Liquid Chromatography (also known as High Performance Liquid Chromatography, or simply HPLC) is an enhanced form of column chromatography that is commonly used in biochemistry to separate and purify mixtures. Instead of the solvent dripping through the column under gravity, as in other methods of column chromatography, the solvent is pushed through at high pressure.

The column packing materials used in HPLC are much more finely divided, so there are more opportunities for interaction and greater resolving (separating) power. Because such finely divided packing resists flow, high pressure must be applied to the column to obtain acceptable flow rates. The result is high resolution and very fast separation.


HPLC was developed and improved with new column technologies in the mid-1970s, replacing older column chromatographic techniques that were inadequate for quantifying and purifying similar compounds. Pressure liquid chromatography proved much less time-consuming than the old methods: in classical column chromatography, where flow is driven by gravity, a separation can take hours or even days, whereas HPLC can produce results in as little as five to thirty minutes.

By the 1980s, HPLC was used frequently for compound purification. Computers and other improved technology added to the convenience of HPLC. Improvements in the types of columns, and consequently in the reproducibility of HPLC, led to the development of micro-columns, affinity columns, and fast HPLC.

The past decade has seen great advances in the development of micro-columns, now commonly used for HPLC, and other specialized columns. The diameter of a typical HPLC column is about 3-5 mm, whereas micro-columns, or capillary columns, usually have diameters from 3 µm to 200 µm, which is considerably smaller. Fast HPLC uses columns that are shorter than typical columns and packed with smaller particles.

Today one can choose among several types of columns for the purification of mixtures, as well as a variety of detectors to use with HPLC, in order to get the best possible analysis of the compound.


A small volume of the sample is injected into the HPLC system, where a mobile phase carries it through the stationary phase. In HPLC the mobile phase is a liquid, and the stationary phase is immobile and immiscible with it. The stationary phase slows the flow of the sample components to different extents because of their physical or chemical properties (size, net charge, or other differences, depending on the type of HPLC). Because the stationary phase affects the desired compound and the impurities differently, the components of the sample come out of the column at different times. The time at which a component comes out of the column is called its retention time. Ideally, each component in the sample has a unique retention time, so that no two components elute at the same time and obscure each other. If the solvent composition cannot be adjusted to separate the components effectively, a different type of chromatography may be better suited. Unlike other column chromatography techniques, HPLC uses pumps to push the sample through more finely packed columns, which speeds up analysis and makes practical component and column combinations that would take too long to elute on their own.

Mobile Phase[edit]

The mobile phase is a solvent or mixture of solvents that carries the sample through the stationary phase. As it moves through the stationary phase, molecular interactions between the sample components and the column material determine the retention time of each component. Components that interact more strongly with the mobile phase than with the column "prefer" the mobile phase and elute sooner, with shorter retention times, while components that interact more strongly with the stationary phase than with the solvent "prefer" the column and elute later, with longer retention times. This is how HPLC separates the components and aids in purifying the compound. Mobile phases can be run in different ways to optimize retention time, separation, and peak clarity: isocratic, gradient, and polytypic elution.


Isocratic Elution

Isocratic elution uses a constant mobile phase composition. For example, a reversed-phase HPLC (RP-HPLC) run might use a mobile phase of 50% acetonitrile and 50% water that remains unchanged throughout the analysis. A solvent system is chosen and used for the entire run: the sample is injected into the flowing mobile phase, which enters the HPLC at a constant flow rate and passes through the chosen column. This method is generally used when the sample is simple enough that all of its components elute at different times with sufficient clarity and without impractically long retention times.


Gradient Elution

Most samples are not so easy to work with. In these cases, a gradient elution method is set up. The composition of the mobile phase changes as the run proceeds: the run begins with mostly the "weaker" solvent, and the "stronger" solvent is most concentrated at the end. For example, a reversed-phase HPLC run might begin with mostly mobile phase A (95% water and 5% acetonitrile) and gradually increase the proportion of mobile phase B (100% acetonitrile) until, at the end of the run, most of the mobile phase flowing through the column is B. In reversed-phase HPLC, the mobile phase usually begins with the more polar solvent combination, and the concentration of the less polar solvent increases as the run proceeds. As a result, the less polar molecules (relative to the mobile and stationary phases being used) eventually elute because of the rising concentration of less polar solvent, and the necessary run time is shortened. An isocratic mobile phase can have a polarity too close to that of the nonpolar stationary phase, so that components elute together immediately and their peaks overlap, or a polarity too different from it, so that nonpolar components take too long to elute. This is why gradient elution is often used: the proportion of less polar to more polar solvent can be changed during the run to obtain optimal peak separation.
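A linear gradient like the one described above can be written as a time table of percent mobile phase B. The sketch assumes a hypothetical program that ramps from 0% to 100% B over 20 minutes and then holds for 5 minutes, with A = 95% water / 5% acetonitrile and B = 100% acetonitrile as in the example, and it computes the acetonitrile content of the mixture at any time (ignoring the delay volume of a real instrument).

```python
from bisect import bisect_right

# Hypothetical gradient program: (time in min, % mobile phase B)
PROGRAM = [(0.0, 0.0), (20.0, 100.0), (25.0, 100.0)]
ACN_IN_A, ACN_IN_B = 5.0, 100.0  # % acetonitrile in mobile phases A and B


def percent_b(t_min: float, program=PROGRAM) -> float:
    """Linearly interpolate %B between the program's time points."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return program[0][1]
    if t_min >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t_min) - 1
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_acetonitrile(t_min: float) -> float:
    """Acetonitrile content of the A/B mixture at time t_min."""
    b = percent_b(t_min) / 100.0
    return ACN_IN_A * (1.0 - b) + ACN_IN_B * b


for t in (0, 5, 10, 20, 25):
    print(t, round(percent_acetonitrile(t), 2))
```

Changing the slope or adding segments to the program table is how the gradient is tuned to separate closely eluting peaks.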


Polytypic elution, also known as mixed-mode chromatography, uses a special column that can switch modes of analysis depending on the solvent. The same column can perform size exclusion, ion exchange, or affinity chromatography depending on the type of solvent that flows through it.

Retention Time

Retention times depend on how each component of the sample interacts with the mobile phase and the stationary phase. A well-designed HPLC run therefore relies on choosing the correct type of column for the desired analysis and the right mobile phase composition for the analyte and the column.

Column Efficiency

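Column efficiency describes how well a column separates the components of a sample. It depends largely on how well the stationary phase is packed and how evenly molecules move along it. Efficiency is expressed as the number of theoretical plates, N. There are a couple of ways to measure it, but they all use the same formula:

$$N = a\left(\frac{t_R}{W}\right)^2$$

N = number of theoretical plates
a = constant that depends on the height on the peak at which its width is measured (16 for the baseline width, 5.54 for the width at half height)
t_R = retention time
W = width of the peak, in the same units as t_R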
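The plate count N = a(t_R / W)^2 is simple to compute. The sketch below is a minimal illustration with hypothetical peak values. The constants a = 16 (baseline width from tangent lines) and a = 5.54 (width at half height) are the standard values for those two width conventions; they are supplied here and are not taken from this text.

```python
def theoretical_plates(t_r, width, a=16.0):
    """Plate count N = a * (t_r / width) ** 2.

    t_r and width must be in the same units (e.g. minutes).
    a = 16 when width is the baseline width from tangent lines;
    a = 5.54 when width is measured at half the peak height.
    """
    if width <= 0:
        raise ValueError("peak width must be positive")
    return a * (t_r / width) ** 2


# Hypothetical peak: retention time 10.0 min, baseline width 0.50 min
print(theoretical_plates(10.0, 0.50))          # 16 * 20**2 = 6400.0
# Same hypothetical peak, width measured at half height (0.25 min)
print(theoretical_plates(10.0, 0.25, a=5.54))  # 5.54 * 40**2, about 8864
```

A larger N means narrower peaks relative to their retention times, which is what a well-packed column delivers.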

Applications of HPLC

Normal phase chromatography

Normal phase chromatography, or NP-HPLC, was the first kind of HPLC developed. It uses a polar stationary phase and a non-polar mobile phase to separate analytes by polarity. Polar analytes adsorb to the polar stationary phase. Their adsorption strength, and therefore their elution time, depends on how polar they are and on their steric factors. Because steric effects influence elution time, structural isomers can be distinguished and separated, since each isomer has different steric properties. Retention time can be increased by adding a more non-polar solvent to the mobile phase. It can be decreased by adding polar substances to the non-polar mobile phase. These substances can even occupy the stationary phase surface and prevent polar analytes from binding to it.
In the past, this method fell out of favor because water or protic organic solvents in the mobile phase changed the hydration state of the stationary-phase surface, making retention times poorly reproducible. This problem was addressed by a variant of NP-HPLC called hydrophilic interaction liquid chromatography (HILIC), which uses a variety of stationary phases and gives more reproducible retention times.

Reversed phase chromatography

Reversed phase chromatography, as the name suggests, is the opposite of normal phase chromatography: it uses a non-polar stationary phase and a polar mobile phase. Consequently, non-polar analytes bind to the non-polar stationary phase, and their elution time depends on how non-polar they are. Elution time can still be increased by adding a more polar solvent to the mobile phase, or decreased by adding a less polar solvent. Unlike NP-HPLC, however, retention depends on hydrophobic interactions.

Several factors influence hydrophobic interactions. One is surface area. An analyte with a larger hydrophobic surface area has a longer retention time, because more of its surface can contact the non-polar stationary phase. However, an analyte that is too large cannot enter the pores of the stationary phase, so it has little or no interaction with it. The strength of the interaction also comes from the tendency of water to reduce the "cavity" it must form around the analyte. The energy released in this process depends on the surface tension of the eluent, which in this case is water.

Another factor that affects hydrophobic interactions is pH. Ideally, both the analyte and the stationary phase should be uncharged. Chemists therefore use buffering agents, such as sodium phosphate, to control the pH. This suppresses both the charge on the analyte and the charge on any exposed support material (usually silica) in the stationary phase.

Reversed phase columns are more robust than bare silica columns, but they still have weaknesses. Aqueous bases should not be used with columns packed with alkyl-derivatized silica particles, because the base will destroy the underlying silica. Likewise, an aqueous acid should not be left in contact with the column for too long, to prevent corrosion.

Gel filtration

Gel-filtration chromatography separates proteins by size. A gel in a buffer solution is packed into a column. The gel consists of porous, bead-like particles made of a carbohydrate polymer. The pore size is chosen so that only proteins below a certain size can diffuse into the beads. Molecules small enough to enter the pores are slowed down because they spend time inside the beads, in the stationary phase of the column. Larger molecules, on the other hand, move through the column faster because they cannot enter the internal volume of the beads.

The most important advantage of gel-filtration chromatography is that it separates proteins in their native, non-denatured state, leaving the sample in a form suitable for further analysis. Another advantage is the high resolution obtained when pressure is applied to the column to achieve adequate flow. Resolution improves further at slower flow rates. A flow rate of approximately 5 mL/cm²/h is recommended for protein fractionation on most gels.
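A flow of 5 mL/cm²/h is expressed per square centimeter of column cross-section, which is equivalent to a linear velocity of 5 cm/h. The volumetric flow to set on a pump therefore depends on the column's inner diameter. The sketch below is a minimal conversion; the 1.6 cm inner diameter in the example is a hypothetical choice.

```python
import math


def volumetric_flow_ml_per_h(flow_ml_per_cm2_h, inner_diameter_cm):
    """Convert a flow per unit cross-sectional area (mL/cm^2/h) to mL/h."""
    area_cm2 = math.pi * (inner_diameter_cm / 2.0) ** 2
    return flow_ml_per_cm2_h * area_cm2


flow = volumetric_flow_ml_per_h(5.0, 1.6)  # hypothetical 1.6 cm i.d. column
print(f"{flow:.1f} mL/h  ({flow / 60:.3f} mL/min)")
```

Doubling the column diameter quadruples the cross-sectional area, so it also quadruples the volumetric flow needed for the same linear flow rate.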

Reference: Aguilar, Marie-Isabel. HPLC of Peptides and Proteins: Methods and Protocols. Vol. 251. Humana Press.

Ion exchange

Ion-exchange chromatography separates proteins by charge. It can resolve proteins that differ by only a single charged group. The method depends on ionic bonds forming between charged groups on the proteins and an oppositely charged ion-exchange gel packed in a column. Proteins with no net charge are removed by washing. Proteins that form ionic bonds with the gel are recovered by elution with a buffer of higher ionic strength or different pH. The more opposite charges the protein and the gel medium carry, the stronger their attraction and the longer the retention time.

There are two types of ion exchangers. An anion exchanger has positively charged groups fixed in the gel medium, which bind negatively charged groups on the protein. A cation exchanger has negatively charged groups fixed in the gel medium, which bind positively charged groups on the protein.

The pH of the solution also determines how the protein interacts with the charged groups of the gel medium. When the pH equals the isoelectric point (pI) of the protein, the point at which its net charge is zero, the protein does not bind to either type of exchanger. When the pH is below the isoelectric point, the protein's net charge is positive and it binds to a cation exchanger. When the pH is above the isoelectric point, its net charge is negative and it binds to an anion exchanger. By controlling the pH of the solution, we can therefore control how the protein is separated.
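The pH-versus-pI rule in this paragraph reduces to a simple comparison. The sketch below is a minimal illustration; the function name and example pI values are hypothetical. The rule looks only at net charge and ignores how charges are distributed over the protein surface.

```python
def exchanger_for(ph, pi):
    """Predict which exchanger a protein binds from buffer pH and its pI.

    pH < pI: net positive charge -> cation exchanger
    pH > pI: net negative charge -> anion exchanger
    pH == pI: net charge zero    -> neither
    """
    if ph < pi:
        return "cation exchanger"
    if ph > pi:
        return "anion exchanger"
    return "neither"


# Hypothetical proteins run in a pH 7.0 buffer
for name, pi in [("protein X", 5.0), ("protein Y", 9.0)]:
    print(f"{name} (pI {pi}) at pH 7.0 -> {exchanger_for(7.0, pi)}")
```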

Reference: Aguilar, Marie-Isabel. HPLC of Peptides and Proteins: Methods and Protocols. Vol. 251. Humana Press.

Affinity chromatography

Affinity chromatography is a method of separating biochemical mixtures based on a highly specific biological interaction. It is used to purify a molecule from a mixture and concentrate it in a buffer solution. It is also used to identify which biological compounds bind to another molecule, such as a drug. It was developed in 1968 by Pedro Cuatrecasas and Meir Wilchek.

The process traps the target protein (or other molecule) on a solid medium while the rest of the mixture is washed away. A column is filled with beads that carry covalently attached glucose residues, chosen to match the target protein. As the protein mixture is poured into the column, the proteins travel down through the beads. The target protein binds noncovalently to the glucose residues because of its affinity for glucose. The other proteins run down the column and are separated. Buffer is then added to the column to wash out any unbound protein. Finally, a concentrated glucose solution is added. The free glucose releases the target protein from the column-attached glucose residues, leaving the protein purified from the mixture.

Adsorption Chromatography

Adsorption chromatography (adsorption being the accumulation of solutes on the surface of a solid or liquid) separates a mixture of solutes by polarity. It relies on the fact that a polar solute binds more tightly to a polar stationary phase than a less polar solute does. An insoluble, polar material such as silica gel (a derivative of silicic acid, Si(OH)4) is packed into a glass column as the stationary phase. The sample is applied to the column in the mobile phase, which can be a liquid or a gas. Each solute binds to the stationary phase to a different extent according to its polarity: polar solutes bind tightly, less polar ones bind more loosely, and nonpolar ones pass right through the column. The bound solutes are then eluted with solvents of progressively higher polarity, so they come off in order of increasing polarity. Nonpolar solutes pass straight through, less polar ones elute first, and very polar solutes elute last.

Reference: Nelson, David L.; Cox, Michael M. Principles of Biochemistry, 4th ed. W. H. Freeman and Company, New York.

Additional References

  • Snyder, Lloyd R.; Kirkland, Joseph Jack; Glajch, Joseph L. Practical HPLC Method Development, 2nd ed. New York.
  • Dong, M. W. Handbook of Pharmaceutical Analysis by HPLC. Elsevier.
A Gel Filtration column

Gel-filtration chromatography, also known as size-exclusion chromatography, molecular-exclusion chromatography, or molecular-sieve chromatography, is one of the simplest and mildest techniques for separating molecules by size (hydrodynamic volume). Each polypeptide can be purified from polypeptides of different sizes by passing the mixture through a gel-filtration medium packed into a column. Unlike in ion-exchange or affinity chromatography, the molecules passing through the column do not bind to the chromatography medium. A major advantage of gel-filtration chromatography is that the medium can be varied to suit the properties of a sample for further purification.

When an organic solvent is used as the mobile phase, the technique is usually called gel permeation chromatography. In this form it is used to analyze the molar mass distribution of organic-soluble polymers. The buffer or organic solvent used as the mobile phase is chosen according to the chemical and physical properties of the specific sample. The stationary phase of the column is simply the carbohydrate polymer beads, and molecules travel through it at different speeds depending on their size. Gel-filtration chromatography was invented by Grant Henry Lathe and Colin Ruthven, who were working at Queen Charlotte's Hospital in London, United Kingdom.

Gel-filtration chromatography can be applied in two different ways: group separation and high-resolution fractionation of biomolecules. Group separation divides the compounds in a sample into groups by size range. It is used to remove high- or low-molecular-weight contaminants from a sample. High-resolution fractionation is more precise. It can be used to isolate one or more components of a sample, separate monomers from aggregates, determine molecular weight, or analyze molecular weight distribution. Gel-filtration chromatography is well suited to biomolecules that are sensitive to changes in pH, metal ion concentration, or co-factors.

Within the size range that a particular bead pore size can separate, there is a linear relationship between two quantities. One is the relative elution volume of a substance (i.e., the volume of the fractions in which the molecule is found). The other is the logarithm of its molecular mass. This assumes the molecules have similar shapes. If a given gel filtration column is calibrated with several proteins of known molecular mass, the mass of an unknown protein can be estimated from its elution position.
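Within the column's fractionation range, elution volume is approximately linear in log(molecular mass), so calibration amounts to fitting a straight line. The sketch below is a minimal illustration using ordinary least squares. The standard masses and elution volumes are made-up numbers, not data for any real column.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(standards):
    """standards: list of (molecular_mass_Da, elution_volume_mL) pairs."""
    log_m = [math.log10(m) for m, _ in standards]
    ve = [v for _, v in standards]
    return fit_line(log_m, ve)  # Ve = slope * log10(M) + intercept


def estimate_mass(ve_unknown, slope, intercept):
    """Invert the calibration line to estimate a molecular mass in Da."""
    return 10 ** ((ve_unknown - intercept) / slope)


# Hypothetical standards: (mass in Da, elution volume in mL)
standards = [(12_000, 16.0), (29_000, 14.1), (66_000, 12.3), (150_000, 10.5)]
slope, intercept = calibrate(standards)
print(f"slope = {slope:.2f} mL per log10 unit, intercept = {intercept:.2f} mL")
print(f"unknown eluting at 13.0 mL: about {estimate_mass(13.0, slope, intercept):,.0f} Da")
```

The slope comes out negative because larger molecules elute in smaller volumes. An unknown that elutes outside the range of the standards should not be extrapolated.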


An analogy can help explain why gel filtration works. It is conceptual only, not a literal representation of what happens in molecular exclusion chromatography. Picture several whiffle balls (or sponges, or Swiss cheese, or whatever cratered object works for you) suspended in a glass tank of water. Now imagine a bucket holding a mixture of sand, small marbles, and golf balls, and dump it into the tank. As you watch, first the golf balls reach the floor of the tank, then the marbles, and finally a layer of sand settles. Why? Essentially all of the sand goes into the holes of the whiffle balls. It tends to fall from the interior of one whiffle ball to the interior of another, which significantly slows its passage to the bottom of the tank. The marbles are only slightly smaller than the holes, so on the way down they sometimes fall into the holes and sometimes bounce off. The whiffle balls slow their progress too, but to a lesser extent. The golf balls are far too big to fit into the holes, so they pass straight between the whiffle balls by the fastest and most direct route. Key: sand = small molecules; marbles = medium molecules; golf balls = large molecules; whiffle balls = porous beads; tank of water = column and aqueous solution.

General Procedure

The gel medium packed into the column is a porous matrix of spherical beads with stable physical and chemical properties, such as non-reactivity and lack of adsorption. The beads are insoluble and are normally made from highly hydrated polymers such as dextran, agarose, or polyacrylamide. Commercial media include Sephadex, Sepharose, and Bio-Gel. These commercial beads are about 100 micrometers in diameter and are used to separate proteins by size. Silica or cross-linked polystyrene can also be used as bead material when the column must run at higher pressures. The pores and the spaces between the particles are filled with a liquid buffer that fills the entire column. The liquid filling the pore space is called the stationary phase, and the liquid in the space between the particles is called the mobile phase.

Once the sample has been applied to the top of the column, it passes down the column with the mobile phase. Small molecules can enter the beads, but large ones cannot. Small molecules are therefore distributed in the solution both inside and between the beads, whereas large molecules are confined to the solution between the beads. Because less volume is accessible to the larger molecules, they move faster down the column and emerge first at the bottom. Think of it like a faucet: if the water has a smaller volume of space to travel through, it comes out faster and with greater force. The small molecules spend time inside the beads, so they move more slowly. In theory, molecules of the same size should elute simultaneously. An elution diagram, or chromatogram, can be constructed to verify complete separation. Before an unknown sample is separated, solutions of known biomolecules can be run to build a calibration curve. This curve can later be used as a reference for identifying unknown molecules.


Gel-filtration chromatography is commonly used to analyze synthetic and biological polymers such as nucleic acids, proteins, and polysaccharides. One drawback is that the stationary phase may interact with a molecule in an undesirable way and affect its retention time. A major drawback is the method's relatively low resolution. An alternative is discontinuous (disc) electrophoresis, which uses gels of different pH. Proteins form sharp bands as they pass from one gel to the next, producing high-resolution results.[1] This technique requires three gels: the sample gel, the stacking gel, and the running gel. The stacking gel lies between the sample gel and the running gel, and the proteins pass through it before entering the running gel. This compresses the proteins into narrow bands and increases the resolution.[2]

Gel-filtration chromatography should not be confused with gel electrophoresis. In gel electrophoresis, an electric field drives molecules through a gel toward the anode or cathode according to their electric charge. Moreover, in gel-filtration chromatography large molecules come off the column first, whereas in gel electrophoresis small molecules migrate fastest through the gel.


  1. "Discontinuous Electrophoresis." The University of Adelaide, Australia, Department of Chemistry.
  2. "EXPERIMENTAL TECHNIQUES, ELECTROPHORESIS." Department of Biochemistry and Molecular Biophysics. 2006.

Viadiu, Hector. Biochemistry 114A Lecture. "Protein Techniques." 10/15/12.

Purpose: To separate a specific protein from a mixture on the basis of its charge.

General information

An ion exchange column.

Ion exchange chromatography (IEC) is a purification method that separates proteins by charge, which depends on the composition of the mobile phase (the solution in which the mixture is dissolved). Adjusting the pH or the ionic concentration of the mobile phase is what makes the separation possible. For example, a protein with a net positive charge at pH 7 will bind to a column of beads that carry negatively charged carboxyl groups, whereas a negatively charged protein will not. The bound protein is then eluted by increasing the concentration of sodium chloride. How a protein moves depends on the density of its net charge, so proteins with a low density of net positive charge emerge first.

Proteins bind to ion exchangers through electrostatic forces between charges on the protein surface and clusters of charged groups on the exchanger. The column is packed with a resin (usually cellulose or agarose) to which charged groups are bonded. This allows, for example, positively charged proteins to bind to negatively charged beads while negatively charged proteins flow through the column. Ion exchange chromatography therefore comes in two forms: cation exchange chromatography and anion exchange chromatography. To become attached, a protein must displace the counterions on the exchanger. In other words, the net charge on the protein has the same sign as that of the counterions it displaces, hence the name "ion exchange." Protein molecules in solution are also neutralized by counterions, so the overall reaction is electrically neutral. The mixture to be purified is called the sample, and the components that are separated are called the analytes. The sample is added to the top of the column, and a buffered solution is used to elute it.

Anion-Exchange Chromatography

Anion-exchange chromatography uses positively charged beads. It is used to purify acidic molecules, which often carry a negative charge on their carboxyl groups. Anion-exchange chromatography retains biomolecules mainly through the interaction of amine groups on the ion-exchange resin with aspartic or glutamic acid side chains, which have a pKa of about 4.4. The mobile phase is buffered at pH > 4.4, because below this pH the acidic side chains start to become protonated and retention declines.
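The pH limit follows from the Henderson-Hasselbalch equation. For an acidic side chain, the fraction in the negatively charged form is 1 / (1 + 10^(pKa - pH)). The sketch below is a minimal illustration that uses the pKa of about 4.4 quoted above; actual Asp and Glu values differ somewhat and shift with their local environment.

```python
def fraction_deprotonated(ph, pka=4.4):
    """Fraction of an acidic group (e.g. an Asp/Glu side chain) in the
    charged, deprotonated form, from the Henderson-Hasselbalch equation."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


for ph in (3.4, 4.4, 5.4, 7.0):
    print(f"pH {ph}: {fraction_deprotonated(ph):.2f} of acidic side chains charged")
```

At pH 4.4 only half of the acidic side chains carry a charge. One unit below that, roughly nine in ten are neutral, which is why retention falls off.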

Above pH 4.4, retention depends mainly on the number of anionic side chains in the protein. Proteins with the same number of anionic side chains can often be separated by adjusting the mobile phase pH between 7 and 10, where histidine is not protonated and lysine starts to deprotonate.

Subtle changes occur in proteins in this pH range that affect their interaction with the resin and allow fine-tuning of the anion-exchange separation. A mobile phase above pH 10 is not usually recommended because of possible protein degradation, such as deamidation, at higher pH.


Ion-exchange chromatography.

In cation exchange chromatography, a sample containing a protein that bears a net positive charge at a certain pH is added to a column. In anion exchange chromatography, the sample contains a protein that bears a net negative charge at a certain pH. Recall that the net charge is the sum of the partial charges of each amino acid's R group at a given pH. The columns contain a resin of cellulose (or agarose) beads with a functional group covalently bonded to them: a carboxylate group for cation exchange and a diethylaminoethyl group for anion exchange. A buffer solution, also called the mobile phase, has its pH set between the pI (or pKa) of the protein and the pKa of the groups on the beads. The buffer then carries the sample through the column. Molecules with no charge, or with the same charge as the beads, pass through, while molecules with the opposite charge bind to the beads. Like a magnet, they stick and stay there. To elute the bound proteins, the column is flushed with a salt solution, usually excess NaCl. In cation exchange chromatography, Na+ ions compete with the bound protein for the negative functional groups. In anion exchange chromatography, Cl- ions compete for the positive groups. In anion exchange, another way to elute is with a low-pH buffer. More acidic conditions make the protein's net charge less negative (more positive). Once the protein bears a net positive charge, the same sign as the resin, the resin repels it (like charges repel), and it comes out of the column pure. Knowing the isoelectric point (pI) of the protein sample is helpful in ion-exchange chromatography. Recall that the pI is the pH at which a compound's net charge is zero. If the pH is below the pI, the compound carries a net positive charge; for example, a compound with a pI of 10 is positively charged at pH 7.
Conversely, if the pH of the solution is higher than the pI, the protein carries a net negative charge and behaves as an anion. Thus, depending on the pI of the protein, buffers at specific pH values can be chosen to purify it. This also implies that proteins whose pIs differ significantly are the easiest to separate by ion exchange.
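The idea that net charge is the sum of partial charges can be turned into a short calculation. The pI can then be found as the pH where that sum crosses zero. The sketch below is a minimal illustration under stated assumptions. The pKa values are approximate, illustrative figures; real values vary between sources and shift inside a folded protein. The peptide composition and all function names are hypothetical.

```python
# Approximate pKa values (illustrative assumption; real values vary)
PKA_POSITIVE = {"N-term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C-term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.5}


def net_charge(counts, ph):
    """Sum of partial charges of ionizable groups at a given pH.

    counts maps a group name (a key of the tables above) to how many
    copies the protein has, e.g. {"N-term": 1, "C-term": 1, "K": 3}.
    """
    charge = 0.0
    for group, n in counts.items():
        if group in PKA_POSITIVE:
            # basic group: fraction protonated (+1) = 1 / (1 + 10**(pH - pKa))
            charge += n / (1.0 + 10 ** (ph - PKA_POSITIVE[group]))
        elif group in PKA_NEGATIVE:
            # acidic group: fraction deprotonated (-1) = 1 / (1 + 10**(pKa - pH))
            charge -= n / (1.0 + 10 ** (PKA_NEGATIVE[group] - ph))
        else:
            raise KeyError(f"unknown ionizable group: {group}")
    return charge


def estimate_pi(counts, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection for the pH at which the net charge crosses zero (the pI).

    Net charge falls steadily as pH rises, so bisection converges.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(counts, mid) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


peptide = {"N-term": 1, "C-term": 1, "K": 4, "R": 1, "D": 1, "E": 1}  # hypothetical
pi_est = estimate_pi(peptide)
print(f"estimated pI: about {pi_est:.2f}")
for ph in (5.0, 7.0, 9.0):
    print(f"pH {ph}: net charge {net_charge(peptide, ph):+.2f}")
```

With five basic side chains against two acidic ones, this hypothetical peptide has a basic pI. At pH 7 it would bind a cation exchanger.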

If the sample contains impurities with a charge similar to that of the protein being isolated, a pH-gradient buffer solution is needed. Unless two proteins have exactly the same amino acid composition, they are unlikely to have exactly the same charge at the same pH. Raising (or lowering) the pH deprotonates (or protonates) more groups, which makes each molecule's charge slightly more negative (or more positive). This changes the ionic interaction between each molecule and the resin, causing some of the molecules to elute from the column. As the pH changes, different molecules take on different charge densities (degrees of negative charge: -1, -2, -3, and so on). At a given pH, each protein therefore binds to the resin with a different strength, and those with a lower charge density elute first.
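The elution rule in this paragraph, that weaker-charged proteins leave first, can be illustrated with a sort. The sketch below is a minimal illustration on an anion exchanger. It simplifies by using each protein's net charge at the running pH as a stand-in for its charge density. The protein names and charges are hypothetical.

```python
def anion_exchange_order(net_charges):
    """Predict flow-through and elution order on an anion exchanger.

    net_charges maps protein name -> net charge at the running pH (used
    here as a simplified stand-in for charge density). Proteins with net
    charge >= 0 do not bind and flow through. Bound proteins are listed
    from least to most negative, the order in which they would be expected
    to leave the column as elution conditions become stronger.
    """
    flow_through = [name for name, q in net_charges.items() if q >= 0]
    bound = [(q, name) for name, q in net_charges.items() if q < 0]
    bound.sort(reverse=True)  # least negative (weakest binding) first
    return flow_through, [name for _, name in bound]


# Hypothetical net charges at the running pH
charges = {"A": -3, "B": -1, "C": +2, "D": -2}
print(anion_exchange_order(charges))
# (['C'], ['B', 'D', 'A'])
```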

As another example, consider an air sample that has been collected on an air filter and put through filter extraction. Water is added to the filter, the extract is purified by passing it through another filter, and the resulting water extract becomes the sample. Samples are then prepared for the IC (ion chromatograph) by mixing a measured amount of sample with a measured amount of water. A series of standard solutions and water is first run through the IC to calibrate the instrument. Depending on which type of ion chromatography is being performed, the standards contain the cations or anions to be detected in the samples. Once all the samples have been run, an ion chromatogram (see image) is produced for each standard and sample solution, showing how the analytes were separated. Each analyte travels through the column at a different rate because of its interaction with the charged resin. The chromatogram shows both the time each analyte takes to pass through the column and the amount present. Each analyte passes through the column at a consistent time in every sample, so each peak can be assigned to a specific analyte.
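Calibrating with standards usually means fitting a line of peak area against known concentration, then reading unknowns off that line and correcting for any dilution. The sketch below is a minimal illustration that assumes a linear detector response over the standard range. Every number in it is hypothetical, including the concentrations, peak areas, and dilution factor.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def concentration_from_area(area, slope, intercept, dilution_factor=1.0):
    """Invert area = slope * conc + intercept, then undo any dilution."""
    return (area - intercept) / slope * dilution_factor


# Hypothetical standards for one anion: concentration (mg/L) vs peak area
std_conc = [0.0, 0.5, 1.0, 2.0, 5.0]
std_area = [0.02, 1.01, 2.03, 3.98, 10.05]
slope, intercept = linear_fit(std_conc, std_area)

# Hypothetical sample: 1 part extract + 4 parts water -> dilution factor 5
sample_area = 1.50
conc = concentration_from_area(sample_area, slope, intercept, dilution_factor=5.0)
print(f"{conc:.2f} mg/L in the undiluted extract")
```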


An affinity column.

Affinity chromatography is a widely applicable technique for purifying proteins. It takes advantage of the high affinity of many proteins for specific chemical groups. Affinity chromatography was developed by Pedro Cuatrecasas and Meir Wilchek in 1968.

This process is generally used to isolate a protein of interest from a pool of proteins. A column is filled with beads that carry covalently attached glucose residues, and these residues are chosen to match the target protein. As the protein mixture is poured into the column, the proteins travel down through the beads. The target protein recognizes the glucose residues and binds to them noncovalently because of its affinity for glucose. The rest of the proteins run down the column and are separated. Buffer must then be added to the column to wash out all unbound protein. Lastly, a concentrated glucose solution is added to release the target protein from the column-attached glucose residues.

The starting material is an undefined, heterogeneous mixture of molecules in solution. The desired molecules have a defined property that can be exploited during affinity purification. The procedure is set up so that the target molecule becomes trapped on a stationary medium. The rest of the mixture lacks this binding ability, so it is not trapped. The solid medium can then be separated from the mixture and washed several times. The target molecule is then released in a process known as elution, either with a high concentration of a specific chemical or by altering the conditions to weaken binding. It is also important to carry out the procedure at an appropriate pH. Otherwise, the affinity may be reduced or the conformation of the proteins changed, preventing the target protein from binding to the residues as expected.

Affinity chromatography is a powerful means of isolating transcription factors, proteins that regulate gene expression by binding to specific DNA sequences. A protein mixture is percolated through a column containing specific DNA sequences attached to a matrix. Proteins with a high affinity for the sequence will bind and be retained. In this instance, the transcription factor is released by washing with a solution containing a high concentration of salt.

In general, affinity chromatography can be used effectively to isolate a protein that recognizes group X by:

• covalently attaching X or a derivative of it to a column;

• adding a mixture of proteins to this column, which is then washed with buffer to remove unbound proteins;

• eluting the desired protein by adding a high concentration of a soluble form of X or by altering the conditions to decrease binding affinity.

Affinity chromatography is most effective when the interaction between the protein and the molecule used as bait is highly specific.


Affinity chromatography is mainly used in biochemistry to

• Purify certain proteins from a mixture

• Reduce the amount of a certain protein molecule in a mixture of multiple proteins

• Determine the affinity of substances for biological compounds, in this case proteins.

Diethylaminoethyl group, used to bind negatively charged groups
Carboxymethyl group, used to bind positively charged groups

Combinatorial Chemistry

Affinity chromatography can also be used in combinatorial chemistry (in vitro evolution), which imitates the process of evolution by creating large sets of molecules and selecting for a specific function. You start from a diverse population of molecules, select those with a particular property, and reproduce them. For instance, starting with a randomized pool of RNA segments and an ATP affinity column, you would apply the RNA pool to the top of the column. ATP-binding molecules are retained, and all the segments that did not bind ATP are washed from the column. To elute the bound RNA molecules, you then apply ATP to the top of the column, which isolates the selected ATP-binding RNA molecules. The selection can be adjusted by using different salt concentrations, with higher salt concentrations being more selective.
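Repeated rounds of binding, washing, elution, and amplification enrich the binders round after round. The following toy model is an assumption and does not come from the text. Suppose that in each round a binder is retained with probability p_bind, a non-binder with probability p_background, and amplification copies retained molecules without bias. The binder fraction f then becomes f * p_bind / (f * p_bind + (1 - f) * p_background). The probabilities and starting fraction below are hypothetical.

```python
def enrich(f0, p_bind, p_background, rounds):
    """Binder fraction after each selection round (toy model).

    Assumes binders are retained with probability p_bind, non-binders with
    p_background, and that amplification preserves the retained ratio.
    Returns [f0, f1, ..., f_rounds].
    """
    fractions = [f0]
    f = f0
    for _ in range(rounds):
        kept_binders = f * p_bind
        kept_others = (1.0 - f) * p_background
        f = kept_binders / (kept_binders + kept_others)
        fractions.append(f)
    return fractions


# Hypothetical: 1 binder per million, 50% binder vs 0.1% background retention
for i, f in enumerate(enrich(1e-6, 0.5, 0.001, rounds=4)):
    print(f"round {i}: binder fraction = {f:.3g}")
```

In this model, more stringent washing (for example, higher salt) would show up as a lower p_background, which makes each round more selective.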

Immunoaffinity Chromatography

One example of immunoaffinity chromatography is the purification of antibodies from blood. Antibodies can be affinity-purified from blood serum. If the serum contains antibodies against a particular antigen, that antigen can be used as the affinity ligand to capture them. For example, an organism immunized against a GST-fusion protein produces antibodies against the fusion protein and possibly against the GST tag as well. First, the serum is allowed to bind to a GST affinity matrix, which removes antibodies against GST. The serum is then separated from this solid support and allowed to bind to a GST-fusion protein matrix, which traps the antibodies that recognize the antigen on the solid support. The desired antibodies are eluted with a low-pH buffer (about pH 3), and the eluate is usually collected into phosphate buffer to neutralize the low pH.

Immobilized metal ion affinity chromatography (IMAC)

IMAC is based on coordinate covalent bonds between amino acids and metal ions. The idea is to retain on the column those proteins that have an affinity for the metal ions immobilized inside it. Iron, gallium, or zinc can be used to purify phosphorylated proteins or peptides. Common metals for binding histidine are copper, cobalt, and nickel. Many naturally occurring proteins have no affinity for metal ions, so recombinant DNA technology is used to give them one, for example by adding a polyhistidine tag.

Interaction materials

These are typical biochemical interactions in nature that have been used extensively in affinity chromatography:

• Enzymes bind to substrate analogues, inhibitors, and cofactors

• Lectins bind to polysaccharides, glycoproteins, cell surface receptors, and cells

• Antibodies bind to antigens, viruses, and cells

• Nucleic acids bind to complementary base sequences, histones, nucleic acid polymerases, and nucleic acid binding proteins

• Hormones and vitamins bind to receptors and carrier proteins

• Glutathione binds to glutathione S-transferase (GST) and GST fusion proteins

• Metal ions bind to poly(His) fusion proteins and to native proteins with histidine, cysteine, and tryptophan residues on their surfaces

First technique

Affinity chromatography is commonly performed as column chromatography. First, the binding properties of the protein must be studied. Then, a solid medium modified with the binding material (the ligand) is packed into a chromatography column. The initial mixture containing the desired protein is applied to the column to allow binding to occur. A wash buffer is then passed through the column to remove unbound proteins. Finally, an elution buffer releases the bound protein from the column, and the protein is collected.

Affinity chromatography technique

Elution Methods

There is no elution method that applies to all affinity media. When substances are very tightly bound to the affinity medium, it may be useful to stop the flow after applying the eluent, usually for 10 minutes to 2 hours, before continuing the elution. This extra time helps to improve the recovery of bound protein.

The forces that hold the ligand and the bound substance together include electrostatic interactions, hydrophobic interactions, and hydrogen bonding. Agents that weaken these interactions can be expected to act as efficient eluting agents. The flow rate that gives the most efficient elution may vary with the specific interaction.

pH elution method

This is one of the most common techniques used to remove bound protein from the ligand. A change in pH alters the charged groups on the ligand and/or the bound protein. This change may directly affect the binding sites and reduce their affinity. Alternatively, a change in pH can reduce affinity indirectly by altering the conformation of the protein. A sudden decrease in pH is one of the most common ways to elute bound proteins. The chemical stability of the ligand and the target protein limits how far the pH can be changed. The column should always be returned to neutral pH immediately after elution to avoid irreversible denaturation of proteins.

Ionic strength changing method

Changing the ionic strength of the buffer alters the specific interaction between the ligand and the target protein. This is a mild elution method that uses a buffer of increased ionic strength, usually containing sodium chloride. The salt is applied either as a linear gradient or in steps.
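The two ways of applying salt can be made concrete by listing the NaCl concentration in each collected fraction. This is a minimal sketch. The 0 to 1000 mM range, the number of fractions, and the step levels are illustrative assumptions, not recommended conditions.

```python
def linear_gradient(start_mM, end_mM, n_fractions):
    """NaCl concentration (mM) in each fraction of a linear gradient."""
    if n_fractions < 2:
        raise ValueError("need at least two fractions")
    step = (end_mM - start_mM) / (n_fractions - 1)
    return [start_mM + i * step for i in range(n_fractions)]

def step_gradient(levels_mM, fractions_per_step):
    """NaCl concentration (mM) in each fraction of a stepwise gradient."""
    return [c for c in levels_mM for _ in range(fractions_per_step)]

print(linear_gradient(0, 1000, 5))          # [0.0, 250.0, 500.0, 750.0, 1000.0]
print(step_gradient([100, 500, 1000], 2))   # [100, 100, 500, 500, 1000, 1000]
```

A linear gradient spreads elution over many fractions and can resolve proteins with similar affinities. Steps give sharper, more concentrated peaks once the required salt level is known.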

Competitive agents elution

Selective eluents are often used to separate substances on a specific medium, or when the ligand/target protein interaction has high affinity. The eluting agent competes either for binding to the target protein or for binding to the ligand, much as competitive inhibitors do in nature. Substances may be eluted either by a concentration gradient of a single eluent or by pulse elution. In this method, the competing agent is typically used at a concentration similar to that of the coupled ligand. However, if the free competing compound binds more weakly than the ligand to the target protein, a higher concentration of competing agent is needed for efficient elution.

As an example of competitive affinity chromatography, consider the R1a protein, which binds to a cAMP resin. R1a can be released from the resin with an elution buffer containing cGMP. The free cGMP competes with the immobilized cAMP for binding to R1a. Because the elution buffer contains a high concentration of cGMP, R1a binds the free cGMP rather than the resin, and the released R1a protein is eluted.
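Why a weaker competitor must be used at higher concentration can be seen from a simple single-site competition model. This is a minimal sketch. It assumes equilibrium, and it treats the immobilized ligand as if it had an effective solution concentration. It also assumes the ligand and competitor are in large excess over the target. The dissociation constants (Kd) and concentrations are hypothetical.

```python
def fraction_on_resin(ligand_uM, kd_ligand_uM, competitor_uM, kd_competitor_uM):
    """Fraction of target protein bound to the immobilized ligand at equilibrium.

    Single-site competition: the target binds either the immobilized ligand
    or the free competitor, never both. Ligand and competitor are assumed to
    be in large excess, so their free concentrations equal their totals.
    """
    a = ligand_uM / kd_ligand_uM          # occupancy term for the resin ligand
    b = competitor_uM / kd_competitor_uM  # occupancy term for the free competitor
    return a / (1.0 + a + b)

def competitor_needed(target_fraction, ligand_uM, kd_ligand_uM, kd_competitor_uM):
    """Competitor concentration (uM) that lowers the resin-bound fraction to target_fraction."""
    a = ligand_uM / kd_ligand_uM
    b = a / target_fraction - 1.0 - a     # rearranged from a / (1 + a + b) = target_fraction
    return max(b, 0.0) * kd_competitor_uM

# Hypothetical values: effective ligand 100 uM with Kd 1 uM; aim for 10% left on the resin.
for kd_c in (1.0, 10.0):  # a competitor as strong as the ligand, then one 10x weaker
    c = competitor_needed(0.10, 100.0, 1.0, kd_c)
    print(f"competitor Kd {kd_c} uM -> about {c:.0f} uM needed")
```

In this model, a competitor that binds ten times more weakly needs roughly ten times the concentration to release the same fraction of target. This is the situation with cGMP and R1a described above.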


Reduced polarity of eluent

Conditions that lower the polarity of the eluent can promote elution without inactivating the proteins. Dioxane and ethylene glycol are typical eluents of this type.

Chaotropic eluents

If other elution methods fail, denaturing (chaotropic) buffers, which alter the structure of proteins, can be used to separate the target protein from the ligand. Typical chaotropic agents are guanidine hydrochloride and urea. Although this method may give the highest recovery, chaotropes should be avoided whenever possible because they are likely to denature the eluted protein.

Guanidinium chloride

Histidine tag

Affinity chromatography can be performed using a number of different protein tags. One of the most common tags used in the laboratory is poly-histidine. Because it is short, it is unlikely to alter the conformation of the tagged protein. Histidine tagging is favorable because it is highly specific, allowing a high degree of purification.

The gene encoding a specific protein is first modified to include the tag, so that a string of histidine residues is added to the amino or carboxyl terminus of the expressed protein. The tagged proteins are then passed through a column of beads containing covalently attached, immobilized nickel(II) ions (Ni2+). The His-tag binds tightly to the immobilized metal ions because the side chain of histidine, imidazole, has a high affinity for metal ions (in this case, nickel(II)). As a result, the desired protein binds tightly to the beads while other proteins flow through the column. Untagged proteins that contain some histidine residues also generally flow through, because they lack the run of about six adjacent histidine residues carried by the tagged protein. The protein can then be eluted from the column by adding imidazole or other chemicals that bind to the metal ions and displace the protein. The presence of the desired protein can be verified by an enzyme-linked immunosorbent assay (ELISA).



Nickel resin regeneration

In recombinant DNA work, a histidine tag on the desired protein and a nickel resin are commonly used to purify the protein by affinity chromatography. The histidine tag has a strong affinity for the nickel resin, which stays in the column, so the tagged protein is retained. Undesired proteins lack the engineered histidine sequence and therefore cannot bind to the nickel resin; they flow through the column. During elution, a relatively high concentration of imidazole buffer is added, and the imidazole competes with the desired protein for binding to the nickel resin. In practice, nickel resin is rather expensive, so regenerating it is essential. Regeneration involves several steps. First, any protein left on the used nickel resin is denatured and washed away with guanidinium chloride and a corresponding buffer. The resin is then washed with Milli-Q water and increasing concentrations of ethanol. An essential step in regeneration is recharging the nickel. The nickel is first removed with EDTA, a hexadentate chelator that binds the nickel ion and strips it from the resin; without nickel, the resin turns white. The resin is then recharged with a concentrated nickel salt solution, restoring its slightly green color.


Glutathione S-transferase (GST) tags

GST has an affinity for glutathione, which is available in immobilized form as glutathione agarose. An excess of free glutathione is used to displace the tagged protein during elution. Together with histidine tags, the purification of GST-tagged recombinant proteins is the most common use of affinity chromatography.


GSTs are enzymes involved in cellular defense against electrophilic compounds. GST binds glutathione with high affinity and specificity. The strength and selectivity of this interaction allow GST-tagged proteins to be purified on glutathione-based resins. These resins selectively bind GST-tagged proteins, so the protein of interest can be separated from the mixture with high efficiency.

GST is a relatively small protein (the commonly used GST tag is about 26 kDa). Because of this, GST-fusion purification can be performed quickly, which limits degradation by proteases and minimizes sample loss. GST loses its ability to bind the glutathione resin when it is denatured. Therefore, strong denaturants such as guanidine hydrochloride and urea cannot be added to the buffers.

Lectin Affinity Chromatography

Lectins, such as concanavalin A (originally extracted from the jack bean Canavalia ensiformis), bind specifically to certain sugar structures. Lectin affinity chromatography relies on this sugar-binding specificity. For example, concanavalin A itself can be purified by passing a crude extract through a column of beads containing covalently attached glucose residues. Because it has affinity for glucose, concanavalin A binds to the column. A concentrated glucose solution is then added to release the bound concanavalin A from the column.

Advantages and disadvantages


Advantages:

• Affinity chromatography is highly effective because the immobilized ligand binds selectively to the target protein, giving a purified product with a high recovery yield.

• In many cases, it can be a one-step process.

• The technique can be used for substances present at low concentration.

• Separation is rapid and avoids contamination.

• Unlike gel filtration chromatography and ion-exchange chromatography, which isolate groups of proteins with similar characteristics, affinity chromatography can isolate one specific protein at a time.

Disadvantages:

• The interaction between the protein of interest and the ligand has to be determined carefully. The process can require expensive materials and considerable time, and only a small amount of protein can be processed at once.


Biochemistry, Berg, 6th edition, ISBN 0-7167-8724-5


General Information

Hydrophobic Interaction Chromatography (HIC), also called hydrophobic chromatography, is a separation method that uses salt gradients (e.g., ammonium sulfate) to modulate hydrophobic interactions between proteins and the ligands on the solid-phase support resin [1]. This type of chromatography exploits the hydrophobic properties of proteins rather than their charges, which are exploited in ion-exchange chromatography. Proteins are loaded at high salt concentration and eluted by lowering the salt concentration. The more hydrophobic a protein is, the more strongly it clings to the column, so the least hydrophobic proteins emerge from the column first. The salt is important because it increases hydrophobic interactions and stabilizes proteins [2]. During elution, factors besides hydrophobicity still affect how proteins separate, such as ionic interactions, pH, temperature, salt concentration, amount of solvent, and buffer conditions. These attributes also point to the similarities between HIC, reverse-phase chromatography, and affinity chromatography [3]. HIC is advantageous because its conditions can be tailored to specific proteins and applied to different aspects of protein purification. Small changes in conditions adapt the method to many other purification and study purposes, especially in cell membrane studies.
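The elution rule (least hydrophobic first) can be illustrated by ranking proteins by a hydrophobicity score. This is a rough sketch only. It uses the Kyte-Doolittle hydropathy scale averaged over the whole sequence (the GRAVY score) as a stand-in for hydrophobicity. Real HIC retention depends mainly on hydrophobic patches on the folded protein's surface, so GRAVY is a crude proxy. The protein names and sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    seq = sequence.upper()
    return sum(KD[aa] for aa in seq) / len(seq)

def predicted_hic_order(proteins):
    """Protein names sorted from least to most hydrophobic (GRAVY), i.e. the
    expected elution order as the salt concentration is lowered."""
    return sorted(proteins, key=lambda name: gravy(proteins[name]))

# Hypothetical short sequences, for illustration only.
proteins = {"p1": "MKDEERKS", "p2": "MLIVAFGL", "p3": "MSTAGKLV"}
print(predicted_hic_order(proteins))
```

Here the charged, polar sequence p1 is predicted to elute first and the leucine/isoleucine-rich p2 last.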


See also: Wikibooks, Proteomics: Hydrophobic Interaction Chromatography


  1. Tosoh Bioscience. "FAQ's HPLC Columns - HIC". Tosoh Bioscience LLC. Retrieved 2009-10-17. 
  2. Khalsa, Guruatma. "Chromatography". Arizona State University. Retrieved 2009-10-17. 
  3. Er-El, Zvi; Shaltiel, Shmuel. "Hydrophobic Chromatography: Use for Purification of Glycogen Synthetase". Proceedings of the National Academy of Sciences of the United States of America. Retrieved 2009-10-17. 
Column Chromatography

Column chromatography is another method used to separate proteins or other molecules from each other. It relies on the same physical principles as TLC (thin-layer chromatography) and is essentially an upside-down version of it. In TLC, capillary action moves the solvent up the plate, whereas in column chromatography gravity drives the eluent down the column. A glass column is packed with a solid phase, and the sample to be separated is applied to the top of the column. The solid phase separates the compounds in the sample into different zones. Silica gel (SiO2) and alumina (Al2O3) are common adsorbents. The eluent carries the soluble compounds with it. When the column is packed with silica gel, the first band to come out of the column is the component that interacts least with the silica gel, that is, the least polar one. The polarity of the eluent can be increased progressively from a nonpolar solvent to a polar solvent. As the nonpolar components are collected first, the bands left in the column are more polar, and a more polar solvent carries them out more efficiently. After column chromatography has separated the mixture according to polarity, thin-layer chromatography should be run on the collected fractions to check the separation. Fractions whose spots have "climbed" the same distance are combined. The combined compounds are then analyzed, for example by measuring their melting points and comparing them with literature values. The amount of each compound recovered, compared with the amount of sample used, gives the proportion of each compound in the original sample.
If the compound is colored, its bands are easy to track. If it is colorless, the TLC plates can be stained with CAM (ceric ammonium molybdate) or iodine, or viewed under UV light, to locate it. After the bands are collected, the solvent can be removed on a rotary evaporator (rotovap) to obtain a clean compound. If the solvent is volatile, it can instead be evaporated in the fume hood or overnight. The samples can also be heated in a sand bath.

Once the column is packed with the stationary adsorbent (such as silica gel), there are generally two ways to load the sample: wet loading and dry loading. Whether the mixture dissolves in a nonpolar solvent or only in a more polar one determines which method is used.

Dry column chromatography separation
  • In the wet loading method, the adsorbent is suspended in solvent and the slurry is transferred into the column together with the eluent. This method is most commonly used when the mixture to be separated is soluble in the least polar solvent or in a nonpolar solvent. If an excessively polar solvent is used, it stays inside the column and increases the local polarity, which can spoil the separation.
  • In the dry loading method, the mixture is first dissolved in a minimal amount of solvent and mixed with the adsorbent. Once the solvent has evaporated, the dry adsorbent carrying the compound is added to the column. The column is then flushed with the mobile phase; solvents of various polarities can be used, but they should be added in order of increasing polarity. The column must not be allowed to run dry after the mobile phase is added. This method is most commonly used when the mixture is soluble only in solvents more polar than the eluent of choice.

The compounds are separated within the column and collected as they come out. Each separated sample can then be tested for purity and other properties. After the sample is applied to the top of the column, solvent is passed through it. As the sample moves over the solid phase in the column, the different compounds begin to separate from each other into zones. Each compound repeatedly adsorbs to the solid phase and then desorbs into the liquid solvent passing over it. This continuous exchange keeps occurring as the compound moves down the column. Different molecules in the sample have different affinities for the solid phase and the liquid phase. These differences in affinity make the molecules travel at different speeds and separate from the other compounds. (Note: the preceding description refers to column chromatography as used in organic chemistry. With a polar stationary phase such as silica gel, it is called normal-phase chromatography; reverse-phase chromatography, common in biochemistry, uses a nonpolar stationary phase and a polar mobile phase.)

The factors that determine the distance a compound travels are 1) the interaction between the solvent and the adsorbent, 2) the interaction between the solute and the adsorbent, 3) the polarity of the solute, solvent, and adsorbent, and 4) the molecular weight of the solute. To characterize the distance a compound travels, one can calculate the retardation factor (Rf). The Rf value is obtained by spotting the fractions collected from the column on a TLC plate and developing it. Rf is the ratio of the distance traveled by the solute to the distance traveled by the solvent, so it ranges from 0 to 1. If the measured Rf value is higher than desired, a less polar solvent should be used when running the column.
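The Rf calculation can be written as a short function. This is a minimal sketch. It assumes both distances are measured from the origin line where the sample was spotted. The measurements in the example are hypothetical.

```python
def rf_value(spot_distance_cm: float, solvent_front_cm: float) -> float:
    """Retardation factor: distance moved by the spot / distance moved by the solvent front.

    Both distances are measured from the origin (the pencil line where the
    sample was spotted).
    """
    if solvent_front_cm <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= spot_distance_cm <= solvent_front_cm:
        raise ValueError("a spot cannot travel farther than the solvent front")
    return spot_distance_cm / solvent_front_cm

# Hypothetical measurements: solvent front 8.0 cm from the origin, spots at 2.0 and 6.0 cm.
for d in (2.0, 6.0):
    print(f"spot at {d} cm -> Rf = {rf_value(d, 8.0):.2f}")
```

The range check enforces that Rf stays between 0 and 1, since no spot can move past the solvent front.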


Bell Jr., Charles E.; Taber, Douglas F.; Clark, Allen K. Organic Chemistry Laboratory with Qualitative Analysis, third edition. Harcourt College Publishers.

Planar chromatography is a chromatography technique in which the stationary phase is on a flat plate and the mobile phase moves through the stationary phase by capillary action. It is used to separate mixtures. There are two types of planar chromatography: A. thin-layer chromatography (TLC) and B. paper chromatography.

Basic Concept of Paper Chromatography

Though this is a different kind of chromatography, it still separates a mixture of substances into its individual components. The size and concentration of a component help determine how fast it moves. In paper chromatography, the stationary phase is a liquid (water) held uniformly by the solid cellulose fibers of the paper, while the mobile phase is a liquid that serves as the solvent. When the paper is dipped in a container of solvent, compounds can travel at most as far as the solvent does. Different compounds travel at different rates and separate into distinct spots, which are colored when the components are dyes. The solvent can be either polar or nonpolar, and this affects the solubility of the components of the mixture. Polar components are attracted to the water molecules bound to the cellulose (paper) and not to a nonpolar solvent. As a result, they do not climb far up the paper with a nonpolar solvent and remain near the origin of the chromatogram. These components spend more time in the stationary phase than in the mobile phase, so they move up the paper slowly. The same reasoning applies, with the polarities reversed, to nonpolar components in a polar solvent. The mobile phase can be one of various organic solvents or a mixture of solvents. The compounds can be stained with iodine to make it easier to see where they have traveled. [1]

The developed paper is called a paper chromatogram. Usually, the paper is divided into individual lanes so that several samples can be run on one sheet. This also allows the experimenter to compare the lanes by how far each compound has traveled. [1]

The paper is placed in a container with a shallow layer of a suitable solvent or mixture of solvents in it. Sometimes the paper is just coiled into a loose cylinder and fastened with paper clips top and bottom. Then the cylinder stands in the bottom of the container. The container is covered to make sure that the atmosphere in the beaker is saturated with solvent vapor. Saturating the atmosphere in the beaker with vapor stops the solvent from evaporating as it rises up the paper. As the solvent slowly travels up the paper, the different components of the ink mixtures travel at different rates and the mixtures are separated into different colored spots.

The distance a compound travels relative to the solvent is called its Rf value: Rf = distance traveled by compound / distance traveled by solvent. Thus, the higher the Rf value, the farther the compound has traveled up the paper. The main benefit of the Rf value is that compounds with similar Rf values, measured under the same conditions, can be compared to suggest that they are the same compound [1]
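Comparing an unknown spot with standards amounts to checking which standards have an Rf within some tolerance of the unknown. This is a minimal sketch. The ±0.03 tolerance, the dye names, and their Rf values are hypothetical. It assumes all spots were run on the same paper with the same solvent, since Rf values are only comparable under identical conditions.

```python
def match_standards(unknown_rf, standards, tolerance=0.03):
    """Return the names of standards whose Rf is within `tolerance` of the unknown spot.

    A match is only suggestive: different compounds can share an Rf value.
    """
    return [name for name, rf in standards.items() if abs(rf - unknown_rf) <= tolerance]

# Hypothetical standards run on the same paper with the same solvent.
standards = {"dye A": 0.42, "dye B": 0.67, "dye C": 0.80}
print(match_standards(0.66, standards))  # ['dye B']
```

An empty result means the unknown matches none of the standards under these conditions.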

General Scheme



Thin Layer Chromatography (TLC)

Thin layer chromatography (TLC) is an extremely valuable technique in the organic chemistry lab. It is used to separate mixtures, to check the purity of a compound, or to monitor the progress of a reaction. The polarity of the solute, the polarity of the solvent, and the polarity of the adsorbent are crucial factors that determine how fast a compound moves along a TLC plate. The technique separates the compounds in a mixture based on these differences in mobility. TLC can also be used to identify a compound by comparing it with a known compound.

In TLC, a mixture is separated using a liquid solvent (the mobile phase) and a glass plate coated with silica gel (the stationary phase). In principle, many organic substances (cellulose, polyamide, polyethylene, etc.) or inorganic substances (silica gel, aluminum oxide, etc.) can be used in TLC, as long as they can be finely divided and form uniform layers. The surface of the plate carries a very thin layer of silica, which is the stationary phase. To develop the plate, a small amount of solvent, just enough to cover the bottom, is added to a wide-mouth container (e.g., a beaker or developing jar). The prepared TLC plate is placed in the container, which is then sealed. The solvent moves up the plate by capillary action; the plate is then removed and the Rf values are determined.

TLC is usually done on a glass, plastic, or aluminum plate coated with silica gel, aluminum oxide, or cellulose; this coating is the stationary phase. The sample is applied near the bottom of the plate, and the plate is placed in a solvent, the mobile phase. Capillary action draws the solvent up the plate, carrying the sample with it. How fast a sample moves up the plate depends on how tightly it binds to the stationary phase, which is determined by polarity. The retardation factor (Rf, also called the retention factor) of a solute is the ratio of the distance traveled by the compound to the distance traveled by the solvent in the same amount of time. For this reason, Rf values range from a minimum of 0.0 to a maximum of 1.0. However, the Rf value of a given compound varies widely with the adsorbent and solvent used. It can also vary greatly with the moisture content of the adsorbent. Rf values are compared for analysis and are calculated as follows:

Rf = (Distance that compound has traveled)/ (distance that the solvent has traveled)

A light pencil line is drawn approximately 7 mm from the bottom of the plate, and a small drop of a solution of the dye mixture is placed on the line to mark its original position. The line must be drawn in pencil: if it were drawn in ink, dyes from the ink would move up the TLC plate along with the dye mixture and the results would not be accurate. For more accurate results, spot the TLC plate with the dye mixture a few times, building up material without widening the spot. A spot with a diameter of about 1 mm gives good results. When spotting several samples, do not place them too close together. As the mixtures rise up the plate, neighboring spots may run into each other, making the Rf values difficult to calculate.

When the spots are dry, the TLC plate is placed in a beaker with the solvent level below the pencil line. The beaker is covered so that the atmosphere inside is saturated with solvent vapor. Lining the beaker with filter paper soaked in solvent helps to achieve this saturation. Saturating the atmosphere in the beaker with solvent vapor stops the solvent from evaporating as it rises up the plate.

As the solvent slowly travels up the plate, the different components of the dye mixture travel at different rates and the mixture is separated into different colored spots. The solvent is allowed to rise until it is approximately 1-1.5 cm from the top of the plate. This gives the maximum separation of the dye components for this particular combination of solvent and stationary phase.

Once the dye components have separated as far as they can for this solvent and stationary phase, the TLC plate is removed from the beaker and allowed to dry. Immediately after removing the plate, mark the solvent front with a pencil before the solvent begins to evaporate; the solvent front is the line the solvent reached on the plate. Then let the solvent evaporate from the plate. The separated compounds are circled to mark their positions on the plate. In some cases, the compounds that have traveled up the plate are not visible to the naked eye. The plate can then be dipped briefly in a visualizing solution containing reagents that react with the separated compounds to form colored products upon heating. Another way to visualize colorless organic compounds on a TLC plate is to expose them to iodine (I2) vapor, which many compounds absorb. The plate is placed in an iodine chamber prepared by putting a small amount of iodine crystals in a tightly capped jar. After about 10 minutes in the chamber, the colorless spots gradually turn dark brown. Because these colored spots usually fade within a short time, they are outlined with a pencil immediately after the plate is taken out of the iodine chamber.

In addition to the iodine chamber, a fluorescent indicator in the plate coating can help determine how far the separated compounds have traveled. A short-wave ultraviolet lamp is used to illuminate the adsorbent side of the plate in a darkened room or area. Many compounds decrease the intensity of the fluorescence, so under UV light the separated compounds appear as dark spots on the fluorescent TLC plate. It is often easier to visualize the darkened spots with 365-nm light. The dark spots are outlined with a pencil while the plate is under the UV lamp. This gives a permanent record of where the analyzed compounds traveled.

Some examples of interpretation of TLC plates under UV light:

1. TLC gives useful qualitative results. For example, to compare the components of an unknown mixture with standard compounds A and B, TLC can be run with all three samples. If the dark spots for the unknown under UV light align with those of compounds A and B, the unknown most likely contains both A and B.

Example 1

2. If there is only one dark spot for the unknown and it is unclear whether it is at the same level as the spot for compound A, the two compounds can be co-spotted on the TLC plate for a quick check. Co-spotting means spotting compound A on one area of the TLC plate and then spotting the unknown on top of it in the same area. If only one dark spot appears under UV light in the co-spot lane, the unknown is likely A.

Example 2

The co-spot result for example 1 should contain only 2 spots where one spot represents compound A + one component in the unknown, and the other spot represents another component in the unknown mixture. Extra spots may indicate that one of the components in the unknown does not match with the standards.

3. How can one tell whether the reaction between A and B actually occurs to give a new product C? TLC can be used to check. Compounds A and B are spotted separately on a TLC plate. A sample of the reaction mixture of A + B (C1) is then spotted, and a new sample is spotted after each time period (C2, C3, and so on).

Two spots in C1 that align with A and B suggest just a mixture of A and B, not a new product. C2 and C3 still have one spot aligned with reactant A. C4 has two spots that match neither reactant, and C5 has only one dark spot. There are two possible interpretations:

1) C2 to C4 are intermediates to the new product in C5,

2) the desired product is actually C4 and it degrades to just having one component on the plate.

Example 3
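Reading a reaction time course amounts to asking, for each lane, which spots still match a reactant and which are new. This is a minimal sketch. It assumes each lane's spots have been recorded as Rf values and that spots "align" when their Rf values differ by no more than a hypothetical tolerance of 0.03. The Rf values for A, B, and lanes C1-C5 are invented to mirror the C1-C5 description above.

```python
REACTANTS = {"A": 0.30, "B": 0.55}  # hypothetical Rf values of the starting materials

def reactants_in_lane(spots, reactants=REACTANTS, tolerance=0.03):
    """Names of reactants whose Rf matches some spot in this lane."""
    return sorted(name for name, ref in reactants.items()
                  if any(abs(ref - rf) <= tolerance for rf in spots))

def new_spots(spots, reactants=REACTANTS, tolerance=0.03):
    """Spots that match none of the reactants (candidate intermediates or products)."""
    return [rf for rf in spots
            if all(abs(ref - rf) > tolerance for ref in reactants.values())]

# Hypothetical spot positions for each time point.
lanes = {
    "C1": [0.30, 0.55],
    "C2": [0.30, 0.70],
    "C3": [0.30, 0.70],
    "C4": [0.62, 0.75],
    "C5": [0.75],
}
for lane, spots in lanes.items():
    print(lane, "reactants:", reactants_in_lane(spots), "new:", new_spots(spots))
```

This only flags which spots are new. Deciding between the two interpretations above still requires chemical judgment or further analysis.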
  • Tips in a lab:

1) A capillary tube is used to transfer solution onto the TLC plate. Smaller origin spots give smaller, better-separated dark spots under UV light, which makes the calculation of Rf easier and more accurate.

2) The container with TLC plate and solvent should always be on a flat surface in order to get a "straight lane" for the run.

Effect of Solvent in TLC plate

Increasing the polarity of the solvent leads to greater separation

TLC plates are commonly coated with silica gel, which is polar. This is why nonpolar compounds tend to travel far up TLC plates.

As shown in the diagram, the solvent initially consisted of a 7:3 ratio of hexane to hexyl acetate. This means that most of the solvent in contact with the TLC plate is nonpolar. Because the solvent is not very polar, there is little competition between the spotted samples and the solvent for the TLC plate. The polar parts of the sample therefore readily interact with the silica gel, and the sample does not travel far. Because nothing hinders the sample from interacting with the silica gel, it binds right away and its progress is "bogged down." Think of a dog walking down a path: if the dog stops to sniff at every tree, it covers less distance than if it had kept walking without being distracted. The same is true of these samples: if they constantly interact with the silica gel as they move, they do not travel as far.

When the ratio is reversed so that there is more hexyl acetate (the more polar component), the solvent competes with the sample for the TLC plate. The sample tends to interact with the silica gel, but so does the now more polar solvent, so the sample interacts less with the plate. Because the solvent occupies the silica gel, the sample does not get as much of a chance to "stop and sniff," and it travels farther. This explains the greater separation.

The third TLC plate in the diagram is a bit tricky. One might think that petroleum ether is somewhat polar because its name contains "ether." Petroleum ether is actually a nonpolar mixture of hydrocarbons, so it behaves like a nonpolar solvent and does not improve the separation.

Gas Chromatography Diagram.

Gas chromatography (GC) is a common type of chromatography used to analyze or separate the volatile components of a mixture. It can be used to test the purity of a substance or to separate the different components of a mixture. A syringe needle containing a small amount of sample is inserted into the hot injector port of the gas chromatograph. The injector is set to a temperature higher than the boiling points of the components so that they vaporize into the gas phase inside the injector. The carrier gas (normally helium) then pushes the gaseous components into the GC column, where they separate by partitioning between the mobile phase (the carrier gas) and the stationary phase (a high-boiling liquid). A metal identification tag on the column states what is inside it, its maximum temperature, and its length and diameter. A heating element raises the temperature of the column. Components that partition differently between the mobile and stationary phases reach the detector at different times, ideally at well-separated intervals. The area of each peak is proportional to the number of molecules that generated the signal.
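Because peak area tracks the amount of material reaching the detector, a rough percent composition can be estimated by "area normalization." The sketch below assumes every component gives the same detector response per molecule (real detectors often need response factors), and the peak areas are hypothetical:

```python
def area_percent(peak_areas: dict[str, float]) -> dict[str, float]:
    """Return each peak's share of the total integrated area, in percent.

    Assumes equal detector response for all components, so the area fraction
    approximates the fraction of molecules (a simplification).
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical integrated peak areas (arbitrary units) from one chromatogram.
print(area_percent({"component_A": 300.0, "component_B": 100.0}))
# {'component_A': 75.0, 'component_B': 25.0}
```

An unexpected extra peak would show up here as an additional entry taking a share of the total, which is how impurities lower the apparent purity.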

Although gas chromatography has many uses, it also has limitations. It is useful only for analyzing small amounts of compounds whose vapor pressures are high enough for them to pass through a GC column, and, like TLC, gas-liquid chromatography does not identify compounds unless known standards are available. Coupling GC with a mass spectrometer combines the excellent separation capability of GC with the superior identification power of mass spectrometry. GC can also be combined with IR spectroscopy, which can help show whether a reaction has gone to completion: if the IR spectrum shows the functional groups expected for the product, this is evidence that the reaction is complete. GC analysis can show the same thing, since peaks that do not match the standards may be due to an incomplete reaction or to impurities in the sample.

The basic parts of a GC machine are as follows:

  • Source of high-pressure pure carrier gas
  • Flow controller
  • Heated injection port
  • Column and column oven
  • Detector
  • Recording device

A small hypodermic syringe is used to inject the sample through a sealed rubber septum or gasket into the stream of carrier gas in the heated injection port. The sample vaporizes immediately, and the carrier gas sweeps it into the column. The column is enclosed in an oven whose temperature can be regulated. After the column separates the sample's components, they pass into a detector, where they produce electronic signals that are amplified and recorded.

The following steps are used to run a gas chromatograph:

1. Rinse the syringe with acetone by filling it completely and expelling the acetone onto a waste paper towel.

~Errors during gas chromatography can result from improper rinsing of the syringe. The syringe should be rinsed twice with acetone and once or twice with the sample. If it is not rinsed properly, unknown peaks can appear and distort the analysis of the sample. This error is easily avoided. About 1 microliter of sample is needed.

2. Pull some sample into the syringe. Air bubbles should be removed by quickly moving the plunger up and down while in the sample.

3. Turn on the chart recorder, set the chart speed (in cm/min), use the zero control to set the baseline 1 cm from the bottom of the chart paper, and start the chart.

4. Inject the sample into either column A or column B, pushing the needle all the way into the injector until it can no longer be seen, then pull the syringe out of the port.

5. Mark the injection time on the chart. ~The sample should be injected at exactly the moment the 'start' button is pressed. Otherwise, note how long after injection the recording started; if injection and the start of recording do not coincide, the calculated retention times will be off.

6. Clean the syringe immediately. Rinse it with acetone after every sample and before any other sample is injected.

7. Record the current (in milliamperes) and the temperature (in degrees Celsius).
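On a strip-chart recorder, a peak's retention time is its distance along the chart divided by the chart speed, plus any delay between injection and the start of recording. A minimal sketch of that bookkeeping; the function name, the `start_delay_min` parameter, and the numbers are hypothetical:

```python
def retention_time_min(distance_cm: float, chart_speed_cm_per_min: float,
                       start_delay_min: float = 0.0) -> float:
    """Retention time (min) of a peak read from a strip chart.

    distance_cm: distance on the chart from where recording started to the peak maximum.
    chart_speed_cm_per_min: the chart speed set before the run.
    start_delay_min: how long after injection recording started (0 if the sample
        was injected exactly when 'start' was pressed).
    """
    if chart_speed_cm_per_min <= 0:
        raise ValueError("chart speed must be positive")
    return distance_cm / chart_speed_cm_per_min + start_delay_min


print(retention_time_min(6.0, 2.0))                       # 3.0
print(retention_time_min(6.0, 2.0, start_delay_min=0.5))  # 3.5
```

Forgetting the delay term shifts every retention time by the same amount, which is why a mismatch between injection and the start of recording throws off the calculations.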

Notes on Injection:

1. The injection site, the silver disk, is very hot.

2. The needle will pass through a rubber septum, so there will be some resistance. Some machines have a metal plate near the septum; if the resistance feels like metal, pull the needle out and try again. When done correctly, the needle is inserted completely into the injection port.

3. Quick injection is needed for good results.

4. Take out the needle immediately after injection.

Liquid Chromatography

Liquid chromatography is a separation technique in which the mobile phase is a liquid. It can be carried out either in a column or on a plane. Today, liquid chromatography is most often performed as high-performance liquid chromatography (HPLC).

In high-performance liquid chromatography, the sample is carried by the mobile phase, a liquid at high pressure, through a column containing the stationary phase, which may be packed with irregularly or spherically shaped particles or may consist of a porous monolithic layer.

Isoelectric Focusing

Isoelectric focusing is a separation technique that separates proteins and peptides according to their isoelectric point (pI), the pH at which a molecule's net charge is zero; the pI reflects how acidic or basic its residues are. A gel with a pH gradient is used as the medium. The pH gradient is made by adding polyampholytes, which are multiply charged polymers with different pI values, to the gel. The sample is then loaded onto the gel and a voltage is applied. Each protein moves along the gel until it reaches its isoelectric point, that is, the position in the gel at which the pH equals the pI of the protein. A protein band that forms at a given pH can then be removed and analyzed further. This process can separate proteins whose net charges differ by as little as one unit.

Isoelectric point (pI): The pH at which the net charge on the protein is zero. For a protein with many basic amino acids, the pI will be high, while for an acidic protein the pI will be lower.

Isoelectric focusing is a type of zone electrophoresis, usually performed in a gel, that takes advantage of the fact that a molecule's charge changes with the pH of its surroundings. A protein in a region where the pH is below its pI is positively charged and so migrates toward the cathode. As it migrates, its charge decreases until it reaches the pH region that corresponds to its pI. There it has no net charge, so migration stops. As a result, the proteins become focused into sharp stationary bands, each positioned at the point in the pH gradient corresponding to its pI. The technique is capable of extremely high resolution: proteins differing by a single charge can be fractionated into separate bands.

Molecules to be focused are distributed over a medium that has a pH gradient (usually created by aliphatic ampholytes). An electric current is passed through the medium, creating a positive anode end and a negative cathode end. Negatively charged molecules migrate through the pH gradient toward the positive end, while positively charged molecules move toward the negative end. As a molecule moves toward the pole opposite its charge, it passes through the changing pH gradient until it reaches the point where the pH equals its isoelectric point. There the molecule no longer has a net electric charge (because of protonation or deprotonation of its functional groups), so it proceeds no further in the gel. The gradient is established before the molecules of interest are added, by first subjecting a solution of small molecules, such as polyampholytes with varying pI values, to electrophoresis.

Isoelectric Focusing

The method is applied in the study of proteins, which separate according to their relative content of acidic and basic residues, as represented by the pI. Proteins are introduced into an immobilized pH gradient gel made of polyacrylamide, starch, or agarose in which a pH gradient has been established. Isoelectric focusing can resolve proteins that differ in pI by as little as 0.01. It is the first step in two-dimensional gel electrophoresis, in which proteins are first separated by pI and then by molecular weight through SDS-PAGE.

How to determine pI of amino acids

The pI of an amino acid can be determined once its pKa values are known from titration with NaOH. For example, glycine, the smallest amino acid, has two pKa values, 2.34 and 9.60.[14]

Assume we start with 1 mol of glycine. First, add strong acid so that all of the glycine is in its fully protonated form (+H3N-CH2-COOH, net charge +1). Then gradually add NaOH until the pH rises to 2.34. At this point 0.5 mol of NaOH has been added, so 0.5 mol of glycine is still fully protonated and 0.5 mol has been converted to the second form, the zwitterion (+H3N-CH2-COO-). After 1 mol of NaOH has been added, essentially only the zwitterion remains. Because the zwitterion has zero net charge, the pH at this point is the isoelectric point (pI), which equals 5.97. Continue adding NaOH until the pH equals 9.60. At this point a total of 1.5 mol of NaOH has been added, and 0.5 mol of the fully deprotonated form (H2N-CH2-COO-, net charge -1) is present in the solution. Looking back at the pI, we see that it lies exactly halfway between the two pKa values:

$$\mathrm{pI} = \frac{\mathrm{p}K_{a1} + \mathrm{p}K_{a2}}{2} = \frac{2.34 + 9.60}{2} = 5.97$$

Then we can write in the general form:

$$\mathrm{pI} = \frac{\mathrm{p}K_{a1} + \mathrm{p}K_{a2}}{2}$$
Remark: To determine the pI of an amino acid that has more than two pKa values, use the two pKa values that bracket the pH range in which the zwitterion is present in the solution.[15] This titration approach is time-consuming, does not work well, and is unsuitable for large molecules, so it is not a popular technique.
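The rule of averaging the two pKa values that bracket the zwitterion can be checked numerically. A minimal sketch, assuming ideal, independent ionizable groups described by the Henderson-Hasselbalch equation: it computes the average net charge at a given pH and then finds the pH of zero charge by bisection. The glycine pKa values come from the text; the aspartate values (1.88, 3.65, 9.60) are commonly tabulated textbook values used here only for illustration.

```python
def net_charge(pH: float, acidic_pkas: list[float], basic_pkas: list[float]) -> float:
    """Average net charge from the Henderson-Hasselbalch equation.

    Acidic groups (e.g. -COOH) contribute -1 when deprotonated;
    basic groups (e.g. -NH3+) contribute +1 when protonated.
    """
    negative = sum(-1.0 / (1.0 + 10 ** (pka - pH)) for pka in acidic_pkas)
    positive = sum(1.0 / (1.0 + 10 ** (pH - pka)) for pka in basic_pkas)
    return negative + positive


def isoelectric_point(acidic_pkas: list[float], basic_pkas: list[float],
                      lo: float = 0.0, hi: float = 14.0, tol: float = 1e-6) -> float:
    """Find the pH at which the net charge is zero, by bisection.

    Net charge falls steadily as pH rises, so there is a single crossing in [lo, hi].
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(mid, acidic_pkas, basic_pkas) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point([2.34], [9.60]), 2))  # 5.97, glycine
print(isoelectric_point([1.88, 3.65], [9.60]))      # close to (1.88 + 3.65) / 2
```

For aspartate the numerical result lands essentially on the average of the two carboxyl pKa values, because the amino group is almost fully protonated at that pH; this is the reason the "bracketing pair" rule works.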


In dialysis, a semipermeable membrane is used to separate small molecules from proteins on the basis of size. A dialysis bag is made of a semipermeable membrane (cellulose) with small pores. The bag is filled with a concentrated protein solution and placed in a buffer solution, the dialysate. Molecules small enough to pass through the pores diffuse out of the bag into the dialysate, moving from high to low concentration. When the concentrations inside the bag and in the buffer become equal, there is no further net movement. The bag is then transferred to fresh buffer, so the concentration of small molecules is again higher inside the bag than outside, and more of them diffuse out. This process is repeated several times (usually overnight) to remove all or most of the unwanted small molecules. Dialysis is sometimes also used to change buffers. In general, dialysis is not a means of separating proteins from one another but a method for removing small molecules such as salts. At equilibrium, larger molecules that cannot pass through the membrane remain inside the bag, while most of the small molecules have diffused out.
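Why repeated buffer changes work so well can be seen with a simple mass balance. A minimal sketch, assuming each exchange reaches full equilibrium, volumes stay constant, and the small solute does not bind the protein or membrane: at equilibrium the solute spreads over the bag plus buffer volume, so each exchange leaves a fraction V_bag / (V_bag + V_buffer) behind. The volumes are hypothetical.

```python
def fraction_remaining(sample_ml: float, buffer_ml: float, exchanges: int) -> float:
    """Fraction of a freely permeable small solute still inside the dialysis bag.

    Each equilibrated exchange leaves sample / (sample + buffer) of the solute
    in the bag; the fresh buffer then starts the next round from zero.
    """
    per_exchange = sample_ml / (sample_ml + buffer_ml)
    return per_exchange ** exchanges


# Hypothetical example: 10 mL sample dialyzed against 1 L of buffer, changed 3 times.
print(fraction_remaining(10.0, 1000.0, 3))  # about 9.7e-07, roughly one part per million
```

Under these assumptions, a single exchange into the same total volume of buffer would leave far more salt behind than several smaller exchanges, which is why the bag is moved into fresh buffer repeatedly.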


Daily Application

Dialysis is used every day in hospitals. It mimics one of the functions of the kidneys: during kidney failure, it is used to filter toxins and waste products out of the blood. In kidney failure, nitrogen-containing waste products (such as urea and creatinine) build up in the body, a condition called azotemia, which can be detected in the blood. Patients resort to dialysis when the accumulation of waste products in the blood causes metabolic acidosis and illness. Two tests are performed, one on a blood sample and one on a full day's urine sample. Two chemicals in the blood are measured: blood urea nitrogen and creatinine. If both are high, the kidneys are not clearing waste products efficiently. In the dialysate, solutes such as potassium and calcium are carefully set at concentrations similar to those in healthy blood. Another solute, sodium bicarbonate, is included at an elevated concentration to act as a pH buffer and neutralize some of the metabolic acidosis in the blood.



General Information

Protein purification is the process of separating proteins for individual analysis. It is the second step in studying a protein, the first being the development of an assay, a procedure that measures enzyme activity and thus confirms the presence of the protein or proteins of interest. Popular assays include Western blotting and ELISA (enzyme-linked immunosorbent assay). Before purification, cell disruption is used to homogenize the cell's contents. Once the cells have been opened, the proteins can be purified from one another and from the other cell components by several different methods. Protein mixtures are normally separated multiple times, each time on the basis of a different property, such as:

  • Solubility
  • Size
  • Molecular Weight
  • Charge
  • Binding affinity

The intended use of a protein determines how extensively it must be purified. Sometimes a moderately purified sample suffices for its intended application; other situations require a higher degree of purification, especially when the aim is to study the properties and behavior of the specific protein of interest. By exploiting differences in solubility, size, molecular weight, charge, and binding affinity, the scientist aims to reach the necessary level of purity while keeping enough protein for further research and application. This means using as few steps as possible, because each purification step loses some product. Two competing factors must therefore be balanced in protein purification: yield and purification level. Protein purification projects fall into two main categories: analytical (for study and research) and preparative (for producing commercial products).

There are many methods of protein purification, including:

  • Differential centrifugation: proteins are separated by mass or density using centrifugal force. Centrifugation makes it possible to separate proteins from different cell compartments.
  • Salting out: different proteins precipitate at different salt concentrations; as the salt concentration increases, more proteins precipitate.
  • Gel-filtration chromatography: large molecules flow more rapidly to the bottom of the column.
  • Ion-exchange chromatography: proteins are separated according to their charge. For example, positively charged proteins bind to negatively charged beads, while negatively charged proteins flow through faster.
  • Affinity chromatography: many proteins have a high affinity for specific chemical groups.
  • Hydrophobic interaction chromatography: proteins separate according to their degree of hydrophobicity.
  • Gel electrophoresis: electrophoresis separates proteins while the gel enhances the separation; small proteins move more rapidly through the gel.
  • Isoelectric focusing: different proteins have different pI (isoelectric point) values.
  • Two-dimensional electrophoresis: proteins are separated horizontally by pI and vertically by mass.
  • Dialysis: proteins are separated through a semipermeable membrane. Because proteins are generally larger than the pores of the membrane, they do not pass through and are separated from small molecules.

After each purification step, the number of different proteins in the solution is expected to decrease, while the specific activity is expected to increase. Both changes are desirable because experiments done with a pure protein sample give more quantifiable results. One way to check the purity of a sample is gel electrophoresis, such as SDS-PAGE or native PAGE.

Purification can also be quantitatively evaluated by measuring total protein, total activity, specific activity, yield and purification level. Total protein is the quantity of protein present in a fraction and can be determined by measuring the protein concentration of a part of each fraction and multiplying by the fraction's total volume. Total activity is measured by the enzymatic activity in the volume of fraction used in the assay multiplied by the fraction's total volume. Specific activity is the total activity divided by total protein. The yield is the amount of activity retained after each purification step. The purification level is the increase of purity which can be measured after each purification step by dividing its specific activity by the specific activity of the initial extract.
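These five quantities are usually collected into a purification table. A minimal sketch, assuming that protein concentration and activity per mL have been measured on an aliquot of each fraction; the `PurificationStep` class, its field names, and the numbers are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PurificationStep:
    name: str
    volume_ml: float              # total volume of the fraction
    protein_mg_per_ml: float      # protein concentration measured on an aliquot
    activity_units_per_ml: float  # enzyme activity per mL measured on an aliquot


def purification_table(steps: list[PurificationStep]) -> list[dict]:
    """Total protein, total activity, specific activity, yield and purification level.

    The first step is treated as the initial (crude) extract.
    """
    crude = steps[0]
    crude_activity = crude.activity_units_per_ml * crude.volume_ml
    crude_specific = crude_activity / (crude.protein_mg_per_ml * crude.volume_ml)
    rows = []
    for s in steps:
        total_protein = s.protein_mg_per_ml * s.volume_ml       # mg
        total_activity = s.activity_units_per_ml * s.volume_ml  # units
        specific = total_activity / total_protein               # units per mg
        rows.append({
            "step": s.name,
            "total_protein_mg": total_protein,
            "total_activity_units": total_activity,
            "specific_activity": specific,
            "yield_percent": 100.0 * total_activity / crude_activity,
            "purification_level": specific / crude_specific,
        })
    return rows


# Hypothetical numbers for a crude extract and one chromatography step.
steps = [
    PurificationStep("crude extract", 100.0, 15.0, 30.0),
    PurificationStep("ion exchange", 20.0, 5.0, 90.0),
]
for row in purification_table(steps):
    print(row["step"], row["specific_activity"], row["yield_percent"], row["purification_level"])
# crude extract 2.0 100.0 1.0
# ion exchange 18.0 60.0 9.0
```

In this made-up example the chromatography step raises purity ninefold while keeping 60% of the activity, the kind of trade-off between purification level and yield discussed below.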

A good purification takes into account both purification level and yield. A high degree of purification with a poor yield leaves little protein to work with; on the other hand, a low degree of purification with a high yield leaves the protein contaminated for the experiment.

Identifying Proteins

After purification is complete, how will you prove that you have successfully isolated the correct protein? Several techniques can be used to identify whether or not the isolated protein is the desired one, including immunological reactions.

Overview Millions of antibodies are produced by the body, with each one tailored to recognize specific protein structures. The "Y" shaped antibody recognizes protein structures through its binding site, which is able to attach to antigens with the perfect fit by forming intermolecular bonds. After being exposed to a pathogen, organisms can churn out several different antibodies that will recognize this same pathogen for every subsequent exposure. These polyclonal antibodies attach to different areas on the same pathogen to counteract mutations that change a pathogen's surface proteins and render a specific antibody recognition site obsolete.

Monoclonal Antibodies Though useful from an organism's standpoint, polyclonal antibodies are messy and inefficient in the lab because the body does not produce them in consistent ratios. Different antibody samples contain different relative amounts of several antibodies, each of which binds differently to the protein product. So how can a researcher make a model organism produce only one type of antibody against a particular protein? The solution was discovered by César Milstein and Georges Köhler, who fused antibody-producing cells with immortal cancer cells (myeloma cells), creating hybrid cells that can keep producing identical antibody molecules indefinitely. The hybrid cells that produce the desired antibody could then be selected and grown in mass culture, or within the model organism itself as tumors.

Enzyme-linked Immunosorbent Assay (ELISA) There are two types of ELISA, "Indirect" and "Sandwich." Both use a specific antibody to recognize the desired protein, and this first antibody must be specially produced for each different protein. After unbound antibodies or proteins are washed away, a second antibody carrying an enzyme is added; the enzyme produces a visible signal confirming that the isolated protein is present. This second antibody is generic and can be used regardless of the specific protein.

Indirect ELISA: 1) A container is coated with protein. 2) The first antibody, specific to the protein, binds to the protein. 3) The container is washed; if the desired protein is not present, the antibodies will not bind and are washed away. 4) The second antibody, linked to an enzyme, is added and binds to the first antibody. 5) The enzyme catalyzes a chemical reaction that produces a visually identifiable change in the solution (a color change or fluorescence), indicating that the first antibody is present, which in turn indicates that the desired protein is present. SEE FIGURE 1.

Figure 1. Basic Indirect ELISA steps.

Sandwich ELISA: 1) A container is coated with the monoclonal antibody. 2) The protein sample is added; only the desired protein binds to the antibody. 3) The container is washed, so only the antibody and any bound desired protein remain. 4) A second antibody linked to an enzyme is added and attaches to the protein. 5) The enzyme produces a chemical change that allows visual confirmation that the protein is present. Because the second antibody attaches directly to the protein, the rate of the visual change can be used to determine the amount of protein present. SEE FIGURE 2

Figure 2. Basic Sandwich ELISA steps.

Western Blotting 1) After the desired protein is separated from other proteins or molecular impurities by gel electrophoresis, the resulting protein bands are transferred from the gel to a thin polymer sheet, which makes the proteins more accessible to reactions. 2) The monoclonal antibody is added. Only the desired protein reacts with the antibody, so only one band has antibodies attached. 3) The polymer sheet is washed to remove unbound antibodies. 4) A second antibody linked to an enzyme attaches to the first. 5) A chemical reaction produces a visual change in the band containing the desired protein. Alternatively, photographic film can be laid over the sheet to record the band that carries the attached antibodies. SEE FIGURE 3.

Figure 3. Basic Western Blotting steps


Cellulose acetate electrophoresis utilizes native protein charge to separate proteins based on their isoelectric point.

How it Works

A protein sample is spotted on the marked center of a cellulose acetate strip, the strip is placed in barbital buffer of the desired pH, and a voltage is applied across the strip. Proteins whose pI is below the pH of the buffer are negatively charged and migrate toward the anode, while proteins whose pI is above the buffer pH are positively charged and migrate toward the cathode.
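The rule relating pI, buffer pH, and direction of migration can be written as a tiny decision function. A minimal sketch; the function name is hypothetical, and the example pI values and the buffer pH of 8.6 are illustrative assumptions:

```python
def migration_direction(pI: float, buffer_pH: float) -> str:
    """Predict which electrode a protein moves toward at a given buffer pH.

    Buffer pH above the pI: the protein is net negative and moves to the anode (+).
    Buffer pH below the pI: the protein is net positive and moves to the cathode (-).
    At pH == pI the protein has no net charge and does not migrate.
    """
    if buffer_pH > pI:
        return "anode"
    if buffer_pH < pI:
        return "cathode"
    return "no net migration"


print(migration_direction(5.0, 8.6))   # anode
print(migration_direction(10.0, 8.6))  # cathode
```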


Cellulose acetate electrophoresis can be useful for identifying multimeric proteins formed from different isoforms, since each ratio of isoforms has a different charge due to the isoforms' different amino acid sequences.

Quantifying Proteins

Knowing the quantity of a protein after each separation step is useful for checking the progress of a purification and evaluating the efficiency of each technique. Quantifying proteins also helps us understand how an organism functions as a whole. Several chromatography techniques rely on quantifying proteins by mass, with additional observables such as charge providing further differentiation.

Specific Activity

Because specific activity is the ratio of a protein's enzymatic activity to the total amount of protein, it can be followed throughout a purification. Specific activity is calculated as:

$$\text{specific activity} = \frac{\text{total activity}}{\text{total protein}}$$

Therefore, as the total amount of protein decreases with each step while the activity of the protein of interest is retained, the specific activity should rise. Generally, an assay gives the rate of reaction, in units such as micromoles per second. Dividing this rate by the amount of protein in the enzyme preparation used in the assay yields the specific activity.

Ideally, a purification is complete when the specific activity stays constant from one step to the next. Specific activity can be monitored and used to quantify a purification together with total protein, total activity, yield, and purification level.


The concentration of a protein can be measured by immunological techniques such as ELISA or Western Blotting (the former being able to measure the quantity of protein present because of the direct proportionalities of reagents to proteins).

Activity can be measured using fluorescent techniques.

To determine how much activity is retained after each successive purification step relative to the crude extract, the yield can be calculated as:

$$\text{yield} = \frac{\text{total activity of the fraction}}{\text{total activity of the initial extract}}$$

To convert this to a percentage, multiply the yield by 100. Note that the initial extract is, by definition, taken as 100% yield.

Purification Level

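By obtaining a value for the purification level, we are able to assess how much purity has increased. The purification level can be calculated as:

$$\text{purification level} = \frac{\text{specific activity of the fraction}}{\text{specific activity of the initial extract}}$$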

        • Important note: a purification scheme is successful only when BOTH the purification level and the percent yield are taken into account. Experimentation can become fairly complex if there is a high yield with very little purification, because that indicates a large number of contaminating proteins that are not of interest. On the other hand, if the purification level is high while the percent yield is low, it is fair to conclude that there is not enough protein available to carry out the experiment.

Total Number of Proteins

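The total amount of protein in a fraction (for example, one collected by chromatography or dialysis) is determined by:

$$\text{total protein} = \text{protein concentration} \times \text{total volume of the fraction}$$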

Total Enzymatic Activity

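The total activity of the recovered volume is determined by:

$$\text{total activity} = \text{activity per unit volume} \times \text{total volume of the fraction}$$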


In addition to electrophoresis and immunological assays, ammonium sulfate, (NH4)2SO4, can also be used in a purification. Because ammonium sulfate is non-denaturing and very water soluble (it ranks high in the Hofmeister series), it is used to precipitate proteins effectively: at high concentration, the ammonium and sulfate ions tie up most of the water in their hydration shells, leaving the proteins to aggregate and precipitate out.

The mass of a protein can be measured using the sedimentation-equilibrium technique. This method requires slow centrifugation of a sample in order to establish a balance between sedimentation and diffusion. Unlike SDS-Polyacrylamide Gel Electrophoresis, which gives merely an estimate of the mass of dissociated and denatured polypeptide chains, sedimentation-equilibrium provides accurate mass measurements without requiring denaturation, thereby allowing the native structure of multimeric proteins to be left intact. Furthermore, the number of copies of each polypeptide chain that are present in a multimeric protein can be determined based on the mass of the dissociated chains and the mass of the entire multimeric protein, as measured by SDS-polyacrylamide gel electrophoresis and sedimentation equilibrium, respectively.
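The subunit count described above is a simple ratio. A minimal sketch, assuming a homomultimer built from one kind of chain; the function name and masses are hypothetical:

```python
def subunit_copies(native_mass_kda: float, chain_mass_kda: float) -> int:
    """Estimate how many copies of one polypeptide chain make up a multimer.

    native_mass_kda: mass of the intact protein (e.g. from sedimentation equilibrium).
    chain_mass_kda: mass of the dissociated chain (e.g. from SDS-PAGE).
    """
    # SDS-PAGE masses are estimates, so round the ratio to the nearest whole number.
    return round(native_mass_kda / chain_mass_kda)


print(subunit_copies(240.0, 60.0))  # 4
```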

Mass spectrometry is another accurate analytical technique for determining protein mass. In this technique, molecules are ionized and passed through a vacuum to the detector. In a time-of-flight (TOF) analyzer, the flight time through the electric field region is proportional to the square root of the mass-to-charge ratio (m/z). Thus, among ions of the same charge, the smallest protein in a mixture has the shortest TOF and the largest protein has the longest. This technique allows molecules to be identified and analyzed on the basis of their mass. However, it does not provide much information about the structure or conformation of a protein.
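In an idealized linear TOF analyzer, an ion of mass m and charge z accelerated through a potential U gains kinetic energy zeU = ½mv², then drifts a length L at constant speed, so t = L·sqrt(m / (2zeU)). A minimal sketch of that relationship; the drift length and accelerating voltage are hypothetical instrument settings, and real instruments also have source and reflectron effects that this ignores:

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON_KG = 1.66053906660e-27        # kg per dalton


def flight_time_s(mass_da: float, charge: int,
                  drift_length_m: float = 1.0,
                  accel_voltage_v: float = 20_000.0) -> float:
    """Idealized drift time: t = L * sqrt(m / (2 * z * e * U)).

    Ignores time spent in the acceleration region and any initial energy spread.
    """
    m = mass_da * DALTON_KG
    q = charge * ELEMENTARY_CHARGE
    return drift_length_m * math.sqrt(m / (2 * q * accel_voltage_v))


t_small = flight_time_s(10_000, 1)
t_large = flight_time_s(40_000, 1)
print(t_large / t_small)  # four times the mass gives about twice the flight time
```

Because only m/z enters the formula, a 20,000 Da ion carrying two charges arrives at the same time as a 10,000 Da ion carrying one.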


  1. [16] Whitford (2005). "Chapter 9: Protein expression, purification and characterization". Proteins: Structure and Function. John Wiley & Sons, Ltd.

  2. Berg et al. (2007). Biochemistry, 6th ed. Freeman.

General information

Gel electrophoresis is a technique used to verify that a purification scheme was effective by showing how many different proteins remain in a mixture. It is based on the fact that a molecule with a net charge will move in an electric field. The velocity of migration can be quantified as:

$$v = \frac{Ez}{f}$$

where E is the magnitude of the electric field, z is the net charge of the protein, and f is the frictional coefficient.

For a spherical molecule, the frictional coefficient is given by Stokes' law:

$$f = 6\pi\eta r$$

where η is the viscosity of the medium and r is the radius of the molecule.

As the equation implies, the velocity of a molecule traveling through the gel matrix depends on its size, shape, and charge; the smaller the molecule, the faster it travels. Gels can be made at a variety of percentages, such as 6%, 8%, 10%, 12%, and 15%. Higher percentages are used primarily for smaller molecules and lower percentages for larger ones. In principle, larger molecules can still be run on higher-percentage gels, but these gels may take a long time to run. Charge also affects how fast and how far a sample travels through the gel. Applying a higher voltage moves the samples farther and faster, but higher voltages must be used with caution because the heat they generate may melt the gel.
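The relationship v = Ez/f with f = 6πηr can be evaluated directly to see how size affects speed. A minimal sketch, assuming a rigid sphere in free solution (the sieving effect of a real gel is not modeled); the field strength, viscosity, radii, and charge are hypothetical values in SI units:

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs per elementary charge


def frictional_coefficient(viscosity_pa_s: float, radius_m: float) -> float:
    """Stokes' law for a sphere: f = 6 * pi * eta * r (units: kg/s)."""
    return 6 * math.pi * viscosity_pa_s * radius_m


def electrophoretic_velocity(field_v_per_m: float, net_charge: int,
                             viscosity_pa_s: float, radius_m: float) -> float:
    """v = E * z / f, with z given in elementary charges and converted to coulombs."""
    f = frictional_coefficient(viscosity_pa_s, radius_m)
    return field_v_per_m * net_charge * ELEMENTARY_CHARGE / f


# Hypothetical values: 10 V/cm field, water-like viscosity, net charge of 10.
small = electrophoretic_velocity(1000.0, 10, 1.0e-3, 2e-9)
large = electrophoretic_velocity(1000.0, 10, 1.0e-3, 4e-9)
print(small / large)  # doubling the radius halves the velocity
```

The sign of the result follows the sign of the net charge, so a negatively charged molecule gets a negative velocity, that is, motion opposite to the field direction.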

SDS-polyacrylamide gel electrophoresis (SDS-PAGE) is a powerful tool for checking the purity of a sample because it can detect minute amounts of protein. Different proteins appear as different bands on the gel after it has been stained with Coomassie blue (which detects about 0.1 µg, roughly 2 pmol, of protein) or with silver stain (which detects about 0.02 µg of protein).

Native Gel Electrophoresis

Native gel electrophoresis involves running samples in their native state. Because the proteins are not coated with SDS, their own charge becomes a factor in addition to size: more highly charged molecules migrate faster and farther than less charged molecules of comparable mass, and larger molecules migrate more slowly and less far than molecules of comparable charge. Native gel electrophoresis most often uses two types of gel, agarose and polyacrylamide. Agarose is a polysaccharide obtained from agar, which comes from the cell walls of red algae and consists of agarose and agaropectin; because of its larger pores, agarose gels are better suited to protein samples larger than 200 kilodaltons. Polyacrylamide (poly-2-propenamide) is a readily cross-linked polymer of the neurotoxin acrylamide. Its pores are finer, and while agarose is used in most cases, polyacrylamide is the gel of choice for samples of smaller mass.

The Use of SDS (sodium dodecyl sulfate)

SDS Page. The Molecular Marker is located in the left lane.

Electrophoresis involves the movement of charged particles, such as nucleic acids or peptides, through a medium under the force of an electric field. Electrophoresis can exploit differences in molecular size or charge to separate similar molecules, and the separation can be refined by changing the applied voltage or the density of the stationary medium. SDS-PAGE is a technique used to separate proteins on the basis of size alone. Sodium dodecyl sulfate (SDS) is a detergent that binds to proteins at a ratio of roughly one SDS anion for every two amino acid residues. Because SDS is strongly negatively charged, it gives the protein a large net negative charge that is proportional to the protein's mass, since the amount of SDS bound depends on the number of residues. This negative charge is much larger than the protein's original charge, so different proteins end up with a similar charge-to-mass ratio. SDS also denatures the proteins, disrupting their noncovalent interactions so that they adopt similar shapes. With charge-to-mass ratio and shape roughly uniform, the sieving action of the gel (its resistance to particles according to their size) separates the proteins by molecular weight. The relative mobility of the proteins is approximately linear in the logarithm of their mass, so the mass of a protein can be estimated from its mobility, and proteins that differ in mass by about 2% can be distinguished. Thus, the largest molecules, which have more SDS bound to them, migrate more slowly through the gel than smaller molecules with less SDS bound.
This migration order is the reverse of that in size-exclusion (gel-filtration) chromatography, in which larger molecules elute first, while smaller ones, which can enter the pores of the beads, elute later.
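The log-linear relationship between mobility and mass is what allows a gel to be calibrated with marker proteins. The sketch below, which uses hypothetical marker masses and relative mobilities (Rf, the distance a band migrated divided by the distance of the dye front), fits log10(mass) against Rf by ordinary least squares and reads off the mass of an unknown band. Real gels are only approximately linear over a limited mass range, so the markers should bracket the unknown.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_mass_kda(markers, rf_unknown):
    """Estimate a band's mass from (mass_kDa, Rf) marker pairs.

    Assumes log10(mass) is linear in relative mobility over the marker range.
    """
    rfs = [rf for _, rf in markers]
    log_masses = [math.log10(mass) for mass, _ in markers]
    slope, intercept = fit_line(rfs, log_masses)
    return 10 ** (slope * rf_unknown + intercept)

# Hypothetical marker lane: (mass in kDa, relative mobility Rf).
markers = [(100.0, 0.20), (50.0, 0.40), (25.0, 0.60), (12.5, 0.80)]
print(round(estimate_mass_kda(markers, 0.50), 1))  # prints 35.4
```

Fitting the logarithm of mass, rather than mass itself, is what turns the calibration into a straight line.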

Certain solvents, such as PEG, glycerol, ethanol, and isopropanol, decrease the hydrodynamic radius of proteins by reducing the amount of free water available to form hydration spheres around them. These polar solvents hydrogen bond with water, decreasing the disorder around the proteins and, as a result, reducing the size of the hydration sphere. In such cases, proteins elute later, as if they were smaller.

After the run is complete, the proteins are stained with a dye, forming bands that show how far each protein migrated. With each additional purification step, the gel shows fewer bands and, eventually, a single darker band, reflecting the increasing proportion of the protein being isolated.

Two-Dimensional Gel Electrophoresis

Two-dimensional gel electrophoresis (2DGE) combines isoelectric focusing and SDS-PAGE to achieve higher resolution and sensitivity in the separation of proteins. The first dimension of this powerful technique is isoelectric focusing (IEF), and the second dimension is SDS-polyacrylamide gel electrophoresis (SDS-PAGE). In the first dimension, proteins are separated according to their isoelectric point (pI). The IEF gel, with the focused proteins spread along it, is then placed horizontally on top of an SDS-polyacrylamide slab. Electrophoresis is applied again, this time vertically, perpendicular to the first dimension, so that the proteins migrate down into the slab and separate according to molecular size. Heavier proteins move shorter distances, and lighter proteins move farther, producing a two-dimensional pattern of spots.
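A toy model can show why the second dimension adds resolving power. The sketch below places each protein on a unit-square gel, with x set by its pI along an assumed linear pH 3 to 10 gradient and y set by the logarithm of its mass. The protein names, pI values, masses, gel ranges, and the 0.02 separation threshold are all hypothetical. Two proteins of equal mass would co-migrate as one band in SDS-PAGE alone, but land at different x positions on the 2D gel.

```python
import math
from dataclasses import dataclass

@dataclass
class Protein:
    name: str
    pi: float        # isoelectric point (first dimension)
    mass_kda: float  # molecular mass (second dimension)

def spot_position(p, ph_low=3.0, ph_high=10.0, top_kda=200.0, bottom_kda=10.0):
    """Return (x, y) on a unit-square gel.

    x: fraction of the way along a linear pH gradient (assumed).
    y: migration distance, assumed linear in log10(mass).
    """
    x = (p.pi - ph_low) / (ph_high - ph_low)
    y = (math.log10(top_kda) - math.log10(p.mass_kda)) / (
        math.log10(top_kda) - math.log10(bottom_kda))
    return round(x, 3), round(y, 3)

def resolved(a, b, min_sep=0.02):
    """Call two spots resolved if they differ by min_sep in either dimension."""
    (xa, ya), (xb, yb) = spot_position(a), spot_position(b)
    return abs(xa - xb) >= min_sep or abs(ya - yb) >= min_sep

# Hypothetical proteins A and B: same mass, different pI.
a = Protein("A", pi=5.0, mass_kda=50.0)
b = Protein("B", pi=7.0, mass_kda=50.0)
print(spot_position(a), spot_position(b), resolved(a, b))
```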

While two-dimensional gel electrophoresis offers higher resolution, it has its own limitations. 2DGE is time-consuming and labor-intensive, requiring manual gel polymerization, staining, and many hours of separation. It is also difficult to reproduce, because heating of the gel can cause warping and diffusion of the molecules in the gel.

Gel Electrophoresis in DNA Fingerprinting

DNA Fingerprint. Each sample has a different pattern of bands indicating these samples are from three different individuals.

DNA fingerprinting is a technique used to distinguish organisms by differences in their DNA. It is often used by forensic labs to identify criminals by comparing a suspect’s DNA with DNA found at a crime scene. DNA from the suspect is run through gel electrophoresis alongside a sample of the DNA found at the scene. If the two samples produce identical band patterns, this is strong evidence that the DNA at the scene came from the suspect, since, apart from identical twins, no two people have identical DNA patterns.

To perform a fingerprint, a sample containing DNA must be obtained from each organism under evaluation. Examples of DNA sources include blood, urine, saliva, skin, and hair. Before the samples can be analyzed, they must be prepared, which includes using restriction enzymes to cut the DNA into smaller pieces. Restriction enzymes cut DNA strands at specific nucleotide sequences called restriction sites, which are typically 4 to 8 nucleotides long. Because the number and positions of these sites, and therefore the lengths of the fragments produced, vary from person to person, digesting an organism’s DNA with restriction enzymes yields a set of fragments specific to that individual. In addition, a dye that fluoresces under UV light is added to the gel, which makes the DNA bands much more visible when the sample is analyzed.

Regions of DNA in which short sequences are repeated many times are called microsatellites. The lengths of these microsatellites vary greatly from person to person, so the restriction fragments that contain them also differ in length, which makes them prime targets for fingerprinting. After the DNA samples have been treated with restriction enzymes, they are ready to be analyzed. The samples are loaded into the wells of a gel slab, and an electric current is applied. Smaller DNA fragments move through the gel faster and end up closer to the bottom, while larger fragments remain closer to the top. If two DNA samples are run at the same time, the positions of their bands can be compared. Identical band patterns mean that the restriction enzymes cut each sample’s DNA at the same locations, indicating that the two samples have matching sequences in the regions examined and are therefore very likely to come from the same individual.
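The logic of a restriction digest followed by electrophoresis can be sketched in a few lines. The example below assumes EcoRI, whose recognition site is GAATTC and which cuts between the G and the first A, and uses short made-up sequences rather than real human DNA. It finds every site in a linear sequence, computes the fragment lengths, and lists them in the order they would appear from the top of the gel (largest, slowest) to the bottom (smallest, fastest). The two samples differ only in the number of CA repeats between the sites, mimicking a microsatellite length difference.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting a linear DNA sequence.

    site: recognition sequence (EcoRI's GAATTC by default).
    cut_offset: position within the site where the strand is cut (G^AATTC -> 1).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:]) if end > begin]

def band_pattern(seq):
    """Order fragments as bands from the top of the gel (largest) to the bottom."""
    return sorted(digest(seq), reverse=True)

# Made-up sequences: sample 2 has three extra CA repeat units between the sites.
sample1 = "AAAA" + "GAATTC" + "CA" * 5 + "GAATTC" + "TTTT"
sample2 = "AAAA" + "GAATTC" + "CA" * 8 + "GAATTC" + "TTTT"
print(band_pattern(sample1), band_pattern(sample2))
```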

DNA fingerprinting is also useful for determining whether two people are related. Although no two people share the same DNA pattern, sections of microsatellites are passed down from parent to child. Not every section is passed on, but offspring do not carry any pattern that neither parent possessed. A paternity or maternity test can therefore be performed by comparing the DNA fingerprints of the individuals in question: if large groups of patterns recur across the samples’ fingerprints, the individuals are likely related. The image shows three different DNA fingerprints, indicated by their three different band patterns. Although these fingerprints come from different people, sample 2 shares bands with both samples 1 and 3, which indicates that the person represented by sample 2 is likely the child of the people represented by samples 1 and 3.

Maternal and paternal DNA fingerprinting tests are used to determine the probability of two people being related. These tests do not give definitive answers, and are not foolproof.
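The inheritance rule described above, that a child should not carry a band absent from both parents, can be expressed as a simple consistency check. The sketch below treats each fingerprint as a set of band sizes (hypothetical values in base pairs) and reports any of the child's bands that neither alleged parent has. It is a deliberate simplification: real tests compare alleles locus by locus, allow for mutation and measurement error, and report a probability rather than a yes/no answer, in line with the caution that these tests are not foolproof.

```python
def unexplained_bands(child, mother, father):
    """Return the child's bands that appear in neither parent's fingerprint."""
    return sorted(set(child) - (set(mother) | set(father)))

def consistent_with_parents(child, mother, father):
    """True if every band in the child can be traced to at least one parent."""
    return not unexplained_bands(child, mother, father)

# Hypothetical band sizes (bp) for three samples.
sample1 = {310, 450, 620, 880}
sample2 = {310, 500, 620, 910}   # candidate child
sample3 = {270, 500, 700, 910}
print(consistent_with_parents(sample2, sample1, sample3))
print(unexplained_bands({310, 999}, sample1, sample3))
```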

Visualization of Proteins in Gels

Because most proteins are not visible on gels to the naked eye, a method must be used to visualize them after electrophoresis. The most commonly used protein stain is the dye Coomassie brilliant blue. After electrophoresis, the gel containing the separated proteins is immersed in an acidic alcoholic solution of the dye. This denatures the proteins, fixes them in the gel so that they do not wash out, and allows the dye to bind to them. After the excess dye is washed away, the proteins are visible as discrete blue bands. As little as 0.1-1.0 µg of a protein in a gel can be visualized with Coomassie brilliant blue. A more sensitive general protein stain involves soaking the gel in a silver salt solution, although this technique is rather more difficult to apply. If the protein sample is radioactive, the proteins can be visualized indirectly by overlaying the gel with a sheet of X-ray film. Over time (hours to weeks, depending on the radioactivity of the sample proteins), the emitted radiation darkens the film. When the film is developed, the resulting autoradiograph has darkened areas corresponding to the positions of the radiolabeled proteins.

Another way of visualizing the protein of interest is to use an antibody against it in an immunoblot (Western blot). For this technique, the proteins must be transferred out of the gel onto a sheet of nitrocellulose or nylon membrane. This is accomplished by overlaying the gel with the nitrocellulose and transferring the proteins onto it, typically by applying an electric current; the nitrocellulose then carries an exact image of the pattern that was in the gel. The excess binding sites on the nitrocellulose are then blocked with a nonspecific protein solution, such as milk powder, before the nitrocellulose is placed in a solution containing the antibody that recognizes the protein of interest (the primary antibody).
After removing excess unbound antibody, the primary antibody that is now specifically bound to the protein of interest is detected with either a radiolabeled, fluorescent or enzyme-coupled secondary antibody. Finally, the secondary antibody is detected either by placing the nitrocellulose against a sheet of X-ray film (if a radiolabeled secondary antibody has been used), by using a fluorescence detector or by adding to the nitrocellulose a solution of a substrate that is converted into a colored insoluble product by the enzyme that is coupled to the secondary antibody.


Hames, David, and Nigel Hooper. Biochemistry. 3rd ed. New York: Taylor and Francis Group, 2005.

SDS-Polyacrylamide Gel Electrophoresis

An SDS gel being visualized under UV
A gel apparatus

SDS-polyacrylamide gel electrophoresis (SDS-PAGE) is a technique that separates proteins according to their electrophoretic mobility, which under these conditions is a function of polypeptide chain length or protein mass. Polyacrylamide gel electrophoresis can also be used to separate DNA and RNA molecules.

SDS stands for sodium dodecyl sulfate. "SDS is an anionic detergent that disrupts non-covalent interactions in native proteins." SDS creates denaturing conditions so that proteins can be separated by molecular weight, and it confers on each protein a negative charge proportional to its mass. Once the proteins are denatured by SDS, they can be separated by mass alone; without SDS, other molecular properties such as charge and shape would interfere with the separation (a strongly negative protein, for example, could move faster down the gel than a smaller one). In addition, a loading buffer is added to the sample; it typically contains glycerol, which makes the sample dense enough to sink into the well, and a tracking dye that shows how far the run has progressed.

SDS-PAGE gives an estimate of the mass of dissociated polypeptides because SDS anions bind to the polypeptide main chain at a ratio of about one SDS anion for every two amino acid residues. Unlike the sedimentation-equilibrium technique, which measures the mass of native proteins, SDS-PAGE requires the proteins to be denatured for mass determination.
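The one-SDS-per-two-residues ratio explains why the charge-to-mass ratio ends up nearly constant. The sketch below estimates bound SDS and the resulting charge per kDa for proteins of different sizes. It assumes an average residue mass of about 110 Da (a common rule of thumb, not a figure from the text), counts one negative charge per bound SDS anion, and adds a hypothetical intrinsic net charge for each protein to show how little that charge matters once SDS is bound.

```python
AVG_RESIDUE_DA = 110.0  # assumed average residue mass (rule of thumb)

def bound_sds(mass_kda):
    """Approximate number of SDS anions bound: one per two residues."""
    residues = mass_kda * 1000.0 / AVG_RESIDUE_DA
    return residues / 2.0

def charge_to_mass(mass_kda, intrinsic_charge=0.0):
    """Net charge per kDa of an SDS-coated polypeptide.

    Each bound SDS anion contributes -1; intrinsic_charge is the protein's
    own net charge (hypothetical values below).
    """
    return (intrinsic_charge - bound_sds(mass_kda)) / mass_kda

# Hypothetical proteins with quite different intrinsic charges.
for mass, own in ((20.0, +5.0), (60.0, -8.0), (120.0, +15.0)):
    print(f"{mass:>6.0f} kDa: {bound_sds(mass):6.1f} SDS, "
          f"{charge_to_mass(mass, own):.2f} charge/kDa")
```

Although the intrinsic charges differ in sign and size, the charge per kDa stays within a few percent of the SDS-only value, which is why mobility ends up depending mainly on size.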

This technique is used to test the purity of a protein of interest and the percentage of that protein in the sample solution. It is rapid, sensitive, and capable of high resolution: as little as 0.1 micrograms of a protein gives a distinct band when stained with Coomassie blue, and proteins that differ in mass by about 2% can still be separated.

SDS-PAGE can also be combined with isoelectric focusing to obtain very high-resolution separations. Proteins are first separated according to their net charge by isoelectric focusing, and the focused proteins are then subjected to SDS-PAGE in the perpendicular direction, which separates them by mass.


Detergents are widely used to disrupt the hydrophobic interactions that hold the lipid bilayer together. They are the most common agents used to solubilize transmembrane proteins.

Detergents are small amphiphilic molecules that are more soluble in water than lipids are. Their hydrophilic (polar) heads can be charged, as in SDS, or uncharged, as in the nonionic detergents octylglucoside and Triton. Detergents are monomeric at low concentration but form micelles at high concentration, above the critical micelle concentration (CMC). Individual detergent molecules continually move in and out of micelles, which keeps the concentration of free monomer constant. Detergent behavior is highly condition-specific, depending on pH, salt concentration, and temperature, which makes detergents complicated to study.
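A common simplification, the pseudo-phase separation model, treats the CMC as a sharp threshold: below it all detergent is monomeric, and above it the monomer concentration stays at the CMC while any excess goes into micelles. The sketch below applies that model. The 8 mM value used for SDS is an approximate figure for pure water at room temperature, assumed for illustration; as noted above, the real CMC shifts with salt, pH, and temperature.

```python
def partition_detergent(total_mm, cmc_mm):
    """Split total detergent (mM) into free monomer and micellar detergent.

    Pseudo-phase separation model: monomer concentration rises with total
    concentration until it reaches the CMC, then stays fixed there.
    """
    monomer = min(total_mm, cmc_mm)
    micellar = max(0.0, total_mm - cmc_mm)
    return monomer, micellar

SDS_CMC_MM = 8.0  # approximate CMC of SDS in pure water (assumed; condition-dependent)

for total in (2.0, 8.0, 20.0):
    monomer, micellar = partition_detergent(total, SDS_CMC_MM)
    print(f"total {total:4.1f} mM -> monomer {monomer:.1f} mM, micelles {micellar:.1f} mM")
```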

Detergents help break up the lipid bilayer by acting as a substitute for it. When detergents are mixed with membranes, the hydrophobic ends of the detergent molecules bind to the hydrophobic regions of the membrane proteins, displacing the lipid molecules. Because the other end of each detergent molecule is polar, this binding brings the membrane proteins into solution as detergent-protein complexes; in this sense, the detergent acts as a capsule or substitute for the lipid membrane. If the detergent concentration is lowered, the proteins do not remain soluble, and if more phospholipid is added, the membrane proteins become incorporated into liposomes.

SDS, a strong ionic detergent, can solubilize even the most hydrophobic membrane proteins by binding to their hydrophobic cores. This denatures the proteins and is the basis of SDS-polyacrylamide gel electrophoresis. Studying protein function might seem pointless once a protein is denatured, but studies have shown that some proteins can be renatured once the detergent is removed. Detergents are also used commercially to remove stains, including protein stains, from clothes: by making the proteins soluble, they allow dirt and proteins to be washed out.

BN-Polyacrylamide Gel Electrophoresis

Similar to SDS-polyacrylamide gel electrophoresis, blue native-polyacrylamide gel electrophoresis is another useful method of protein purification that has allowed scientists to analyze membrane protein complexes in mitochondria, chloroplasts, microsomes, and bacteria.[1]


Berg, Jeremy M., John L. Tymoczko, and Lubert Stryer. Biochemistry. 6th ed.

Alberts, Johnson, Lewis, Raff, Roberts, and Walter. Molecular Biology of the Cell. 5th ed.


Zonal Centrifugation (Sedimentation Coefficient)

Another method of determining protein size is zonal centrifugation. Also known as band or gradient centrifugation, this technique relies on the sedimentation coefficient, which quantifies the rate at which a particle moves through a liquid medium in a centrifugal field:

$$s = \frac{m(1 - \bar{\nu}\rho)}{f}$$

where s is the sedimentation coefficient, m is the mass of the particle, $\bar{\nu}$ is its partial specific volume, $\rho$ is the density of the medium, and f is the frictional coefficient. Sedimentation coefficients are expressed in Svedberg units (S), where 1 S = $10^{-13}$ s. A smaller S value generally means that a molecule moves more slowly in a centrifugal field than one with a higher S value.

Some important conclusions that can be drawn from this equation include:

  1. Because the sedimentation velocity of a particle depends on its mass, particles with higher mass sediment faster than particles with less mass, other factors being equal.
  2. Shape also affects the rate of sedimentation, because it affects viscous drag. A compact particle has a smaller frictional coefficient than an elongated particle of the same mass, so compact particles sediment faster than elongated particles of the same mass.
  3. The sedimentation velocity depends on the density of the solution ($\rho$). Particles with $\bar{\nu}\rho < 1$ sink, particles with $\bar{\nu}\rho > 1$ float, and particles with $\bar{\nu}\rho = 1$ do not move.[3]
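The three conclusions above can be checked numerically. The sketch below evaluates the sedimentation coefficient, mass times (1 minus partial specific volume times medium density) divided by the frictional coefficient, in SI units and converts the result to Svedbergs. For the frictional coefficient it assumes a smooth, unhydrated sphere obeying Stokes' law, f0 = 6πηr, and models an elongated particle by multiplying f0 by a hypothetical frictional ratio. The partial specific volume of 0.73 mL/g is a typical value for proteins, and the water viscosity and density are approximate; all are assumptions for illustration.

```python
import math

AVOGADRO = 6.02214076e23      # 1/mol
SVEDBERG = 1e-13              # seconds per Svedberg unit
WATER_VISCOSITY = 1.0e-3      # Pa*s, approximate value near 20 C
WATER_DENSITY = 1000.0        # kg/m^3, approximate

def sedimentation_coefficient(mass_kg, vbar_m3_per_kg, rho_kg_per_m3, f_kg_per_s):
    """s = m(1 - vbar*rho)/f, returned in seconds."""
    return mass_kg * (1.0 - vbar_m3_per_kg * rho_kg_per_m3) / f_kg_per_s

def sphere_friction(mass_kg, vbar_m3_per_kg, eta=WATER_VISCOSITY):
    """Stokes' law f0 = 6*pi*eta*r for a compact sphere of volume m*vbar."""
    radius = (3.0 * mass_kg * vbar_m3_per_kg / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 6.0 * math.pi * eta * radius

def s_in_svedbergs(molar_mass_g_per_mol, vbar_ml_per_g=0.73,
                   rho=WATER_DENSITY, frictional_ratio=1.0):
    """Sedimentation coefficient in S; frictional_ratio is f/f0 (1 = sphere)."""
    mass_kg = molar_mass_g_per_mol / AVOGADRO / 1000.0
    vbar = vbar_ml_per_g * 1e-3   # mL/g -> m^3/kg
    f = sphere_friction(mass_kg, vbar) * frictional_ratio
    return sedimentation_coefficient(mass_kg, vbar, rho, f) / SVEDBERG

compact = s_in_svedbergs(100_000)                          # 100 kDa sphere
elongated = s_in_svedbergs(100_000, frictional_ratio=1.5)  # same mass, hypothetical f/f0
heavier = s_in_svedbergs(200_000)                          # twice the mass
floating = s_in_svedbergs(100_000, rho=1500.0)             # vbar*rho > 1
print(round(compact, 1), round(elongated, 1), round(heavier, 1), floating < 0)
```

Because the sphere's radius grows as the cube root of its mass in this model, s for compact particles scales as m^(2/3), so doubling the mass raises s by a factor of about 1.59 rather than 2. Real proteins are hydrated and rarely perfect spheres, so measured values are typically somewhat lower than this idealized estimate.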

To use this technique, a density gradient, usually of sucrose, is first formed in a centrifuge tube, with the highest density at the bottom. The gradient prevents convective flow. A sample of proteins is layered on top of the gradient and centrifuged. The proteins separate into bands according to their sedimentation coefficients, and the bands can then be collected by piercing a hole in the bottom of the tube. [3]

Zonal Centrifugation

The diagram below illustrates a simplified version of this technique with DNA as a sample instead.

Zonal Centrifugation

The benefit of using this technique is that it is very accurate and can be done without denaturing the protein.


1. Berg, Jeremy M., John L. Tymoczko, and Lubert Stryer. Biochemistry. 6th ed. New York: W. H. Freeman and Company, 2006.

2. "Centrifugation: Buoyant Density Centrifugation." Cell Fractionation. N.p., n.d. Web. 16 Nov. 2012.

3. Berg, Jeremy M., John L. Tymoczko, and Lubert Stryer. "Zonal Centrifugation." Biochemistry. 7th ed. W. H. Freeman and Company, 2012.


The Bradford assay relies on the binding of the dye Coomassie Brilliant Blue to proteins, mainly at their basic amino acid residues, which shifts the dye's absorbance maximum to 595 nm.

How It Works

The amount of protein in a sample can be determined by constructing a standard curve of absorbance plotted against known masses of protein. The absorbance of the sample is then interpolated on the standard curve to find the mass of protein in the sample.
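A minimal sketch of the standard-curve calculation, assuming the readings fall in the range where absorbance at 595 nm is approximately linear in protein amount (the Bradford response tends to curve at higher amounts, so concentrated samples may need dilution). The standard masses and absorbances below are made-up, blank-corrected illustrative numbers, not measured data, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def protein_from_absorbance(standards_ug, standards_a595, sample_a595):
    """Interpolate a sample's protein mass (ug) from a linear standard curve."""
    lo, hi = min(standards_a595), max(standards_a595)
    if not lo <= sample_a595 <= hi:
        # Extrapolating beyond the standards is unreliable.
        raise ValueError("absorbance outside the standard curve; dilute and re-measure")
    slope, intercept = fit_line(standards_ug, standards_a595)
    return (sample_a595 - intercept) / slope

# Hypothetical standards, e.g. a reference protein such as bovine serum albumin.
standards_ug = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
standards_a595 = [0.00, 0.10, 0.20, 0.30, 0.40, 0.50]
print(round(protein_from_absorbance(standards_ug, standards_a595, 0.25), 2))
```

Refusing to extrapolate reflects the point above: the curve is only trustworthy between the standards that define it.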

Advantages and Disadvantages

The Bradford assay is advantageous because it offers high precision and fidelity. It is also compatible with most reagents, although not with detergents or surfactants.

Disadvantages of the Bradford assay include that it is slow to perform, that it depends on a standard curve, and that it destroys the protein sample used.

Quantum Dots

What are Quantum Dots?

Quantum dots are microscopic semiconductor crystals made of clusters of cadmium selenide, cadmium sulfide, indium arsenide, or indium phosphide that emit colored light when exposed to ultraviolet light. They are typically 2 to 10 nanometers in diameter. Their small size allows them to emit photons of visible light when excited, producing colors that people can see. They are used to visualize and track individual molecules and their movements inside cells. They are also known as “artificial atoms” because their behavior is analogous to that of single atoms. Quantum dots work on the principle of quantum confinement, which states that when an object is confined to a small space, it can occupy only certain discrete energy levels. This is analogous to how electrons in an atom can occupy only discrete energy configurations known as orbitals. In a quantum dot, electrons are restricted to discrete energy levels determined by which wave functions "fit" inside the dot. When an excited electron drops from a higher energy level to a lower one, it emits a photon, just as an electron does in an atomic transition.

Energy diagram

This property makes quantum dots especially useful for tagging molecules or proteins of interest, and they have several uses outside biology as well, including memory chips, quantum computation, quantum cryptography, and room-temperature quantum-dot lasers. The basic concepts underlying these artificial atoms include, but are not limited to, the magic numbers in the ground-state angular momentum, the spin singlet-triplet transition, the generalized Kohn theorem and its implications, shell structure, single-electron charging, and the diamond diagram. Quantum dots are often preferred over the traditional organic compounds used to stain cells and make them fluoresce, because they are brighter and more versatile.

One-Electron Systems

The problem of a single, ideally two-dimensional electron with zero confinement potential in an external magnetic field was studied by Landau, and the resulting quantized energy levels are known as Landau levels. At low magnetic fields, when the magnetic length is larger than or comparable to the size of the confinement potential, the Landau levels hybridize with the levels that arise from spatial confinement. As the magnetic field increases and the magnetic length becomes much smaller than the radius of the confinement potential, free-electron (Landau-level) behavior dominates over spatial confinement. A gradual transition from spatial to magnetic quantization can therefore be observed, governed by the size of the quantum dot relative to the magnetic length.
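The crossover described above can be estimated by comparing the magnetic length, l_B = sqrt(ħ/(eB)), with the dot size. The sketch below computes l_B for several field strengths and labels which quantization dominates for a dot of assumed radius. The 20 nm radius is a hypothetical value chosen for illustration, and reducing "much smaller than" to a simple greater-or-less comparison is a crude simplification of what is really a gradual transition.

```python
import math

HBAR = 1.054571817e-34               # reduced Planck constant, J*s
ELEMENTARY_CHARGE = 1.602176634e-19  # C

def magnetic_length_nm(b_tesla):
    """Magnetic length l_B = sqrt(hbar / (e * B)) in nanometres."""
    return math.sqrt(HBAR / (ELEMENTARY_CHARGE * b_tesla)) * 1e9

def dominant_quantization(b_tesla, dot_radius_nm):
    """Crude regime label: compare l_B with the confinement radius."""
    if magnetic_length_nm(b_tesla) > dot_radius_nm:
        return "spatial confinement"
    return "magnetic (Landau) quantization"

DOT_RADIUS_NM = 20.0  # hypothetical dot radius

for b in (0.5, 1.0, 5.0, 10.0):
    print(f"B = {b:4.1f} T: l_B = {magnetic_length_nm(b):5.1f} nm -> "
          f"{dominant_quantization(b, DOT_RADIUS_NM)}")
```

Since l_B falls as 1/sqrt(B), quadrupling the field halves the magnetic length, which is why a strong field eventually overwhelms spatial confinement in any dot.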

Basic Properties Found by Experiments

The electronic properties of quantum dots have been studied using single-electron capacitance spectroscopy, gated resonant tunneling devices, conventional capacitance studies of dot arrays, transport spectroscopy, far-infrared (FIR) magneto-spectroscopy, and Raman spectroscopy. An oscillatory structure in the measured capacitance was attributed to the discrete energy levels of a quantum dot. In a perpendicular magnetic field, a Zeeman bifurcation of the dot's energy levels was also observed; this splitting is believed to arise from the interplay between competing spatial and magnetic quantization.

Capacitance spectroscopy has been widely used to study the density of states of low-dimensional electron systems. The measured capacitance (or its first derivative with respect to gate voltage) reveals structures related to the zero-dimensional quantum levels. Fractionally quantized states, similar to those of the fractional quantum Hall effect in a two-dimensional electron system, have also been observed in this way.

Single-Electron Capacitance Spectroscopy

The electronic ground state in a parabolic confinement potential was observed in an experiment by Ashoori. The method used, known as single-electron capacitance spectroscopy, allows direct measurement of the energy levels of a one-electron dot as a function of the magnetic field. The capacitance was measured between an electrode on top of the quantum dot (QD), called the gate, and a conducting layer beneath the dot, separated from it by a thin tunnel barrier. As the dc gate voltage on the top electrode is varied, the Fermi level in the bottom electrode can come to coincide with the Fermi energy of the dot, and electrons then tunnel through the thin barrier. Because the gate is close to the dot, this charge modulation in the QD induces a capacitance signal on the gate. The capacitance as a function of gate voltage showed a series of nearly uniformly spaced peaks, whose separation decreased as the electron number increased. Each peak corresponds to the addition of a single electron to the QD. The remarkable aspect of the experiment is that it probed the addition spectrum starting with the very first electron in the dot.

Optical Transitions

In this experiment, the quantum dot structures were created either by etching or by field-effect confinement. The samples were prepared from modulation-doped AlGaAs/GaAs heterostructures. For the etched quantum dots, an array of photoresist dots was created by holographic double exposure, and rectangular grooves 200 nm deep were then etched all the way into the active GaAs layer. Field-effect confined quantum dots were prepared starting from a modulation-doped GaAs heterojunction: electrons were laterally confined by a gate voltage applied to a NiCr gate, and a strong negative gate voltage depleted the carriers, leaving isolated electron islands (quantum dots). Quantum dots can also be grown from seed crystals. Just as sugar crystals are grown to make rock candy, quantum dots can be grown layer by layer until the desired size is reached, in a process known as self-assembly.

Quantum-Dot Light-Emitting Device (LED)

Previously, the ligands attached to quantum dots caused functional problems. Scientists have instead turned these ligands to their advantage: they are now used to fill the spaces between the quantum dots, creating a structure with spaces into which the quantum dots fit. This makes possible a single-layered quantum-dot light-emitting device, in which current passes directly through the quantum dots rather than between them. Scientists are currently pushing for quantum-dot LEDs to be used in computer and television displays.


Quantum-dot technology uses microscopic semiconductor crystals to label proteins and genes of interest. The crystals are less than a millionth of an inch in diameter and emit bright colors when exposed to UV light. Dots of different sizes fluoresce in different colors: large dots emit red light, while small dots emit blue light. Size affects the color of the fluorescence because of quantum confinement. As a quantum dot gets smaller, the electron is forced into a tighter space, so its quantized energy levels are spaced farther apart, increasing the energy difference between the excited and relaxed electron energy levels. This behavior is illustrated by the textbook quantum mechanics problem of the infinite potential well. The choice of quantum dot material also affects the emission spectrum. A semiconductor with a larger bandgap (the energy difference between the highest occupied and lowest unoccupied energy levels) releases higher-energy photons (blue shifting). Quantum dots also tend to be made from direct-bandgap materials such as GaAs, which gives more efficient energy transitions and less energy wasted as heat.
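The size dependence follows from the infinite-potential-well energies, E_n = n²h²/(8mL²), where L is the well width. The sketch below computes the photon emitted when an electron drops from the second level to the first, for two well widths. It is a toy model: it assumes a one-dimensional well and the free-electron mass, and ignores the bandgap and the effective masses that actually set a real dot's emission color, so its absolute wavelengths are not those of real quantum dots. Only the trend matters: the gap grows as 1/L², so tripling the width makes the emitted wavelength nine times longer.

```python
PLANCK = 6.62607015e-34           # J*s
ELECTRON_MASS = 9.1093837015e-31  # kg (free electron; real dots use an effective mass)
LIGHT_SPEED = 2.99792458e8        # m/s

def level_energy(n, width_m, mass_kg=ELECTRON_MASS):
    """Energy of level n in a 1D infinite potential well: n^2 h^2 / (8 m L^2)."""
    return n ** 2 * PLANCK ** 2 / (8.0 * mass_kg * width_m ** 2)

def transition_wavelength_m(width_m, upper=2, lower=1):
    """Wavelength of the photon emitted when the electron drops upper -> lower."""
    delta_e = level_energy(upper, width_m) - level_energy(lower, width_m)
    return PLANCK * LIGHT_SPEED / delta_e

small, large = 2e-9, 6e-9  # well widths in metres (illustrative)
print(transition_wavelength_m(small) < transition_wavelength_m(large))
print(transition_wavelength_m(large) / transition_wavelength_m(small))
```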

Quantum dots are more useful than conventional fluorescent markers because they come in a greater variety of colors, and the light they emit is brighter and more versatile. Another advantage is that, unlike fluorophores and chromophores, they do not photobleach, so repeated use does not diminish their ability to function. Because quantum dots are made from inorganic materials, they can easily be functionalized with molecules and do not degrade easily, which may pose an environmental risk. They can be used to visualize individual molecules or every molecule of a given type. Quantum dots show promise in allowing scientists to quickly analyze thousands of genes and proteins from patients with diseases such as cancer, so that treatments can be customized to each patient’s own molecular profile. Quantum dots could also improve the speed, accuracy, and affordability of various diagnostic tests, whether for HIV or common allergies, and could be used to deliver a specific dose of a drug to a particular type of cell. Compared with other fluorescent markers, they are smaller and more specific, and they allow further insight into the structure and inner workings of a cell. Large-scale use of quantum dots, however, may be limited by the unknown hazards of using nanomaterials in living organisms.


"Inside the Cell"

Lipopeptide detergents for membrane protein studies

  • A cell contains two main groups of proteins: cytoplasmic proteins and membrane-bound proteins. Cytoplasmic proteins float freely in the cell and share a characteristic structure: the polar, hydrophilic amino acids tend to be on the outside of the protein, and the nonpolar, hydrophobic amino acids are buried inside (Fig 1).
    Protein in membrane
    This arrangement keeps the proteins stable in the cytoplasm, which is mostly water. Because the positions of their amino acids follow this simple pattern, cytoplasmic proteins are much easier to study by X-ray crystallography or NMR spectroscopy. Membrane-bound proteins have a distinctive structure because of where they sit. The cell is bounded by a phospholipid bilayer, which is hydrophilic on its outer surfaces and hydrophobic on the inside, and a membrane protein must mirror this structure or it will not be stable in the membrane. Because of this more complex arrangement, membrane-bound proteins are harder to purify and to study by NMR spectroscopy.
  • To study a membrane protein in a stable state by NMR, it has to be placed in an environment that mimics the phospholipid bilayer. Scientists noticed that detergents have a similar structure and form micelles, which are hydrophobic inside and hydrophilic outside (Fig. 2). One problem with detergents is that they move around, which creates a lot of noise and makes it hard to obtain a good NMR spectrum. Another problem is that micelles do not completely replicate the membrane: the detergent molecules do not lie parallel to the protein's nonpolar region but perpendicular to it, which can distort the protein's shape and function. To solve these problems, scientists created the lipopeptide detergent, or LPD for short. An LPD is a chain of 25 amino acids that forms an alpha helix, with two alkyl chains about eight to twelve carbons long attached at the 2nd and 24th amino acids (Fig. 3).
    Fig. 2: Protein in detergent
    Fig. 3: Lipopeptide detergent
  • One advantage of LPDs is that, unlike micelles, they behave more like a membrane because they align parallel to the protein (Fig. 4). Another advantage is that they are rigid and do not move around, which reduces the noise in NMR spectra. Because LPDs are rigid and span the entire hydrophobic region of the protein, only a few of them are needed to keep a membrane protein stable. The main drawback of LPDs is their cost, so they are used as a last resort when conventional detergents fail. When LPDs are used, detergents are still needed at first to surround the protein. The LPD is then added; because it is structurally favored, it replaces the detergent micelle and forms the “membrane”, and the detergent is then removed by centrifugation. After a few rounds of this process, one can assume that the proteins are surrounded only by LPDs.
    Fig. 4: LPD and protein
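As a rough check on the claim that an LPD spans the hydrophobic region, the helix geometry can be worked out from the standard rise of an alpha helix (1.5 angstroms per residue along its axis). This is a minimal sketch assuming an ideal, straight helix; the 30-angstrom hydrophobic core thickness is a typical textbook value assumed here, not a figure from the text.

```python
# Rise of an ideal alpha helix along its axis, in angstroms per residue
ALPHA_HELIX_RISE_ANGSTROM = 1.5

# Assumed typical thickness of a bilayer's hydrophobic core (not from the text)
TYPICAL_HYDROPHOBIC_CORE_ANGSTROM = 30.0


def helix_length(n_residues):
    """Approximate end-to-end length of an ideal alpha helix."""
    return n_residues * ALPHA_HELIX_RISE_ANGSTROM


def anchor_separation(position_a, position_b):
    """Approximate axial distance between two residues on the same ideal helix."""
    return abs(position_b - position_a) * ALPHA_HELIX_RISE_ANGSTROM


lpd_length = helix_length(25)            # 25-residue LPD peptide
tail_spacing = anchor_separation(2, 24)  # alkyl chains attached at residues 2 and 24
print(f"helix length ~{lpd_length} A, alkyl anchors ~{tail_spacing} A apart")
print("anchors span the assumed core:", tail_spacing >= TYPICAL_HYDROPHOBIC_CORE_ANGSTROM)
```

With these assumptions the two alkyl anchors sit about 33 angstroms apart, which is comparable to the thickness of a membrane's hydrophobic interior.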


Mitochondria are known as the powerhouse of the eukaryotic cell because they produce ATP, which the cell uses as energy. However, mitochondria also fulfill other roles within the cell, such as participating in metabolic pathways, apoptosis, cellular differentiation, and control of the cell cycle. To support these multiple functions, mitochondria are enclosed by a double membrane that functions as a high-traffic zone, controlling which molecules enter the mitochondrion and which leave it. For example, low-energy metabolites such as ADP have to go in, while high-energy metabolites such as ATP have to go out. This exchange across the outer membrane is controlled by an integral membrane protein known as the voltage-dependent anion channel (VDAC), also referred to as mitochondrial porin.

VDAC has been studied ever since its discovery in 1975. Many structural studies were carried out, but the topology (the spatial arrangement) of its beta-strands could not be determined. In 2008, however, three long-term efforts produced atomic-resolution three-dimensional structures of the isoform VDAC-1, each by a different method: one by NMR spectroscopy alone, one by X-ray crystallography alone, and one by a combination of NMR spectroscopy and X-ray crystallography. These three structures are compared below, followed by a discussion of the importance of solution NMR in determining the structure of VDAC-1.

Structure of VDAC-1 and Comparison of Three Structures

Picture of Integral Membrane Protein

The structure of VDAC-1 is unusual in that it contains a very large beta-barrel. All three structures agree on the number of strands in this barrel and on their spatial arrangement. The amino acid sequence of VDAC is conserved from yeast to humans, so the overall folding pattern is considered to be the same in all eukaryotes. Of the three VDAC-1 structures, one is from mouse and the other two are from human. The mouse and human forms of VDAC-1 are nearly identical, differing in only four amino acids, so their three-dimensional folded structures are very similar. To further confirm the beta-barrel structure, VDAC-1 was denatured and then refolded into micelles of the detergent LDAO. The refolded protein was then transferred to a different environment, bicelles containing the lipid DMPC. The same beta-barrel structure was observed again, and it was concluded that the beta-barrel structure of VDAC-1 does not depend on the membrane-mimicking environment it is placed in.

The beta-barrel of VDAC-1 is unusual for two reasons: it is the only beta-barrel structure known from a eukaryotic membrane protein, and it is the only known beta-barrel membrane protein with an odd number of strands. The other beta-barrel proteins are arranged as antiparallel beta-sheets, and for such a sheet to close into a barrel the first and last strands must also be antiparallel, which requires an even number of strands to complete the hydrogen-bonding pattern. In VDAC-1, with its odd number of strands, the first and last strands instead pair in parallel. Why a barrel with an odd number of strands is stable is unknown, because the folding mechanism of this protein is not fully understood. A beta-barrel is described by two numbers: the number of strands, n, and the shear number, S. To find the shear number, one follows a helical trace across the barrel surface that connects pairs of alpha-carbon atoms on adjacent strands whose side chains point to the same side of the sheet. After going once around the barrel, the trace arrives back at the first strand a certain number of residues away from its starting point; this offset is the shear number. In beta-barrels the shear number is always even, which keeps the alternating inside/outside pattern of side chains consistent around the barrel so that, in membrane barrels, the hydrophobic residues face outward. The shear number of a beta-barrel usually lies between n and n+4.
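The two rules above (S even, n ≤ S ≤ n+4) can be checked mechanically, and an idealized geometric model of beta-barrels (often attributed to Murzin, Lesk and Chothia) relates n and S to the tilt of the strands and the radius of the barrel: tan(alpha) = S·a/(n·b) and R = b/(2·sin(pi/n)·cos(alpha)), with a ≈ 3.3 angstroms between residues along a strand and b ≈ 4.4 angstroms between strands. This is a sketch of that idealized model, not a method from the cited structures; S = 20 for VDAC-1 is the value usually reported and is treated here as an assumption.

```python
import math

# Idealized backbone spacings used in the classic barrel model
RESIDUE_RISE = 3.3    # angstroms between adjacent residues along a strand (a)
STRAND_SPACING = 4.4  # angstroms between adjacent strands (b)


def obeys_shear_rules(n, s):
    """The two rules stated in the text: S is even and n <= S <= n + 4."""
    return s % 2 == 0 and n <= s <= n + 4


def barrel_geometry(n, s):
    """Return (strand tilt in degrees, barrel radius in angstroms).

    tan(alpha) = S*a / (n*b);  R = b / (2*sin(pi/n)*cos(alpha))
    """
    tilt = math.atan((s * RESIDUE_RISE) / (n * STRAND_SPACING))
    radius = STRAND_SPACING / (2 * math.sin(math.pi / n) * math.cos(tilt))
    return math.degrees(tilt), radius


# n = 19 strands; S = 20 is an assumed value for VDAC-1
n_strands, shear = 19, 20
tilt_deg, radius_a = barrel_geometry(n_strands, shear)
print(f"rules satisfied: {obeys_shear_rules(n_strands, shear)}")
print(f"strand tilt ~{tilt_deg:.1f} deg, radius ~{radius_a:.1f} A")
```

Under this model a 19-stranded barrel with S = 20 has strands tilted by roughly 38 degrees and a backbone radius of about 17 angstroms; real barrels deviate from the ideal geometry.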

Another point of comparison among the three VDAC-1 structures is the N-terminal segment, which is not part of the beta-barrel. Residues 1-23 were compared among the three structures, but NMR identified only residues 6-10 as alpha-helical. In the X-ray structure of mouse VDAC-1, three aliphatic residues were noted: leucine 10, valine 143, and leucine 150. The crystal structure showed that valine 143 and leucine 150 are the only hydrophobic side chains that point from the barrel wall into the barrel interior. Residues 11-20 appear similar in the mouse and human structures, but their conformation differs between the two. Both structures were analyzed at cryogenic temperatures through the use of NMR. NMR can reveal such conformational differences because, as the conformation changes, residues of the protein come into contact with different neighboring residues. As a result, conformational changes can show up as multiple resonance lines, reduced signal intensity, or line broadening in the NMR spectra.

Solution NMR in Determining Structure of VDAC-1 and Other Integral Membrane Proteins

In determining the beta-barrel structure of VDAC-1, researchers have stated that combining NMR and X-ray crystallography data was not enough to fully determine the structure, so solution NMR techniques were used instead. At the time of Hiller's 2009 review, nine structures of integral membrane proteins had been solved by solution NMR. Two techniques are important in determining the structures of membrane proteins such as VDAC-1 by solution NMR: protein refolding and deuteration of the detergent micelle.

Refolding a protein from a denatured state has a very low success rate for most membrane proteins, but when refolding succeeds it brings many benefits that make the structure much easier to study. First, refolding can give a very high yield of folded protein; in the case of VDAC-1, an average of 40 mg of VDAC-1 was obtained from 1 liter of E. coli cell culture. Second, refolding purifies the membrane protein to an extremely high degree. This is important for VDAC-1 because impurities would compromise the accuracy of the X-ray crystallography and NMR data. Third, refolding is highly reproducible, which goes together with the high purity. Fourth, the high yield and reproducibility make efficient perdeuteration and selective isotope labeling practical. Finally, because the perdeuterated protein passes through a denatured state in ordinary (protonated) water during refolding, all of its amide groups, including those that would otherwise be buried in the folded protein, are readily reprotonated. The result is a protein that is deuterated elsewhere, which sharpens the NMR signals, but carries an H rather than a D on every backbone amide, so that all amide signals can be observed. This makes the structure of proteins such as VDAC-1 much easier to determine.

Deuterated detergent is the second technique that helps in determining the structures of large membrane proteins such as the VDAC-1 beta-barrel by solution NMR. Other NMR studies have shown that samples in protonated (non-deuterated) detergent give very broad resonance lines because of strong dipole-dipole interactions between protons, which reduces spectral sensitivity substantially. Examining membrane proteins such as VDAC in deuterated detergent gives much cleaner NMR spectra. For example, in 15N-resolved NOESY spectra, replacing the deuterated detergent with a protonated one decreased sensitivity by 10-30%. The loss is especially clear in the signals of the methyl groups of the aliphatic residues isoleucine, leucine, and valine: in protonated detergent the NOESY spectra did not clearly resolve these groups, whereas in deuterated detergent they were clearly identified. These results show that solution NMR in a deuterated detergent is a powerful method for determining the structures of integral membrane proteins such as VDAC-1.


Hiller, S. (2009). The role of solution NMR in the structure determinations of VDAC-1 and other membrane proteins. Current Opinion in Structural Biology, pp. 396-401.

Analytical Ultracentrifugation

In principle, analytical ultracentrifugation is similar to differential centrifugation: both techniques use centrifugal acceleration to separate the components of a sample based on differences in mass and shape, and both require a rotor that can spin samples fast enough to generate forces up to hundreds of thousands of times greater than the force of gravity. What distinguishes analytical ultracentrifugation is that it measures the concentration of the sample during centrifugation, using optical detection systems built into the instrument.

Analytical centrifuges can perform at least two different types of hydrodynamic analysis: (1) sedimentation velocity; and (2) sedimentation equilibrium. These two techniques, along with some of their advantages and disadvantages, are discussed below.

An ultracentrifuge.

1) Sedimentation Velocity

This test is sensitive to both the mass and the shape of the molecules. To perform it, a uniform sample is loaded into the sample cells and spun at high speed, typically 40,000 to 60,000 rpm. Because components of different mass and shape sediment at different rates, they separate into layers, forming boundaries in solution. A boundary is essentially a concentration gradient that forms as the particles move. Although the velocity of individual particles cannot be measured, a series of scans (for example, absorbance or refractive index detection) is taken while the sample spins to record the movement of the particle boundaries over time.
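The force a sample experiences depends on both rotor speed and distance from the rotor axis: the centrifugal acceleration is omega²·r, usually quoted as a multiple of g. The sketch below converts rpm to that relative centrifugal force; the 7.0 cm radius is an assumed, illustrative sample position, not a value from the text.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def relative_centrifugal_force(rpm, radius_cm):
    """Centrifugal acceleration omega^2 * r expressed as a multiple of g."""
    omega = 2 * math.pi * rpm / 60  # rotor speed in rad/s
    return omega ** 2 * (radius_cm / 100) / STANDARD_GRAVITY


# 7.0 cm is an assumed distance of the sample from the rotor axis
for rpm in (40_000, 60_000):
    print(f"{rpm} rpm -> {relative_centrifugal_force(rpm, 7.0):,.0f} x g")
```

At this radius the typical 40,000-60,000 rpm range corresponds to roughly 1.25 × 10^5 to 2.8 × 10^5 times gravity; because the force scales with the square of the speed, doubling the rpm quadruples it.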

More specifically, the rate of movement of the boundary can be used to calculate the sedimentation coefficient (s). The sedimentation coefficient can be affected by at least the following factors:

• Molecular weight: heavier particles tend to sediment faster;

• Density: denser particles tend to sediment faster;

• Molecular shape: unfolded proteins or more elongated shapes experience more friction from the solvent, so they tend to sediment more slowly;

• Solute concentration: a higher solute concentration tends to lower the rate of sedimentation;

• Solvent concentration/viscosity: a more concentrated or more viscous solvent increases friction and leads to a lower sedimentation coefficient; and

• Charge of the protein and how it interacts with the polarity of the solvent: for example, a charged particle will travel more quickly through a polar solvent.
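The sedimentation coefficient is defined as s = v/(omega²·r), which is equivalent to d(ln r_b)/dt = s·omega², where r_b is the boundary position. So a straight-line fit of ln r_b against time, divided by omega², gives s. The sketch below applies that fit to hypothetical, noise-free boundary positions for an assumed 4.5 S species; it ignores diffusion and the other complications discussed below, which dedicated analysis software handles.

```python
import math

SVEDBERG_SECONDS = 1e-13  # 1 Svedberg (S) = 1e-13 s


def angular_velocity(rpm):
    """Convert rotor speed in rpm to angular velocity in rad/s."""
    return 2 * math.pi * rpm / 60


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def sedimentation_coefficient(times_s, boundary_radii_cm, rpm):
    """Estimate s (in Svedberg units) from boundary positions recorded over time.

    Uses d(ln r_b)/dt = s * omega^2; radius units cancel in the log slope.
    """
    log_radii = [math.log(r) for r in boundary_radii_cm]
    slope_per_s = linear_slope(times_s, log_radii)
    return slope_per_s / angular_velocity(rpm) ** 2 / SVEDBERG_SECONDS


# Hypothetical scans: a 4.5 S species starting at 6.0 cm, 50,000 rpm
RPM = 50_000
true_s = 4.5 * SVEDBERG_SECONDS
times = [600 * i for i in range(10)]  # one scan every 10 minutes
radii = [6.0 * math.exp(true_s * angular_velocity(RPM) ** 2 * t) for t in times]

estimated_s = sedimentation_coefficient(times, radii, RPM)
print(f"s = {estimated_s:.2f} S")
```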

In addition to the rate at which the boundary moves (that is, the sedimentation coefficient), the shape of the boundary itself also provides information about the sample. The diffusion coefficient (D) can be determined by measuring how the boundary spreads. A homogeneous sample will often produce a sharper boundary, whereas a heterogeneous sample can produce multiple boundaries or a very broad one. These are only rules of thumb, however, because some samples produce misleading results. For example, a single boundary does not prove that a sample is homogeneous: two molecules with similar sedimentation coefficients can produce what appears to be a single boundary. Likewise, multiple boundaries do not necessarily mean that a sample is heterogeneous, because a homogeneous sample can exist in several stable aggregation states that produce multiple boundaries, depending on how rapidly the states interconvert.
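Once s and D have both been measured, they can be combined through the Svedberg equation, M = s·R·T / (D·(1 − v̄·ρ)), where v̄ is the partial specific volume of the molecule and ρ the solvent density. The sketch below is a minimal illustration of that relation; all input values are hypothetical, and v̄ = 0.73 mL/g is an assumed typical value for proteins rather than a figure from the text.

```python
GAS_CONSTANT = 8.314462618  # J/(mol*K)


def svedberg_molar_mass(s_svedberg, d_cm2_per_s, vbar_ml_per_g, rho_g_per_ml, temp_k):
    """Molar mass in g/mol from M = s*R*T / (D*(1 - vbar*rho))."""
    s_seconds = s_svedberg * 1e-13               # Svedberg units -> seconds
    d_m2_per_s = d_cm2_per_s * 1e-4              # cm^2/s -> m^2/s
    buoyancy = 1 - vbar_ml_per_g * rho_g_per_ml  # dimensionless buoyancy factor
    kg_per_mol = s_seconds * GAS_CONSTANT * temp_k / (d_m2_per_s * buoyancy)
    return kg_per_mol * 1000                     # kg/mol -> g/mol


# Hypothetical inputs: s = 4.5 S, D = 6.0e-7 cm^2/s, vbar = 0.73 mL/g,
# water-like solvent density of 1.0 g/mL, 20 degrees C
molar_mass = svedberg_molar_mass(4.5, 6.0e-7, 0.73, 1.0, 293.15)
print(f"M = {molar_mass:,.0f} g/mol")
```

These inputs give a molar mass of roughly 68,000 g/mol. The relation also shows why both measurements matter: a faster-sedimenting molecule is heavier only if its diffusion coefficient (and hence its friction) is taken into account.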

Analysis of the boundary can be further complicated by a phenomenon known as self-sharpening. Because sedimentation slows as solute concentration increases, molecules at the "front" end of the boundary, which are in a more concentrated region, are slowed, whereas molecules at the "back" end of the boundary, which are in a more dilute region, move more quickly and catch up. This causes an artificial narrowing of the boundary.

Sedimentation velocity is useful for a variety of analyses, including: (1) determining whether a sample is homogeneous; (2) determining whether a protein is a monomer, dimer or other multimer in its native state; (3) determining the overall shape of a protein (for example, whether it is spherical or more extended); and (4) quantifying the distribution of sizes in a sample that contains proteins of a range of sizes. A critical advantage of sedimentation velocity is that it can be performed in a relatively short time (often as little as 3-5 hours), whereas sedimentation equilibrium can take days. Another important advantage is that it can be used to analyze samples over a broader range of pH, ionic strength, and temperature conditions. One disadvantage is that interacting systems (such as proteins that reversibly self-associate, discussed above) can produce data that are difficult to interpret if those systems change during the course of the run.

2) Sedimentation Equilibrium

This type of analysis is sensitive only to the mass of a particle (not its shape), and it is performed at lower rotor speeds than sedimentation velocity. As the sample spins, centrifugal acceleration drives the components toward the bottom of the cell while diffusion simultaneously provides an opposing force. Eventually these forces balance, and the components reach an equilibrium distribution. A series of scans (for example, absorbance or refractive index detection) monitors the sample until this equilibrium is reached, and the resulting concentration profile provides the molar mass of the sedimenting component.
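For a single ideal species at equilibrium, the concentration profile is exponential in r²: ln c = constant + M·(1 − v̄·ρ)·omega²·r² / (2·R·T). The slope of ln c against r² therefore gives the molar mass directly, with no dependence on shape. The sketch below fits that slope to a synthetic profile for a hypothetical 50,000 g/mol protein; the rotor speed, v̄, density, and radii are illustrative assumptions, and real data (non-ideality, multiple species) need more elaborate fitting.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def angular_velocity(rpm):
    """Convert rotor speed in rpm to angular velocity in rad/s."""
    return 2 * math.pi * rpm / 60


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def equilibrium_molar_mass(radii_cm, concentrations, rpm, vbar, rho, temp_k):
    """Molar mass (g/mol) of one ideal species from an equilibrium profile."""
    r_squared = [(r / 100) ** 2 for r in radii_cm]  # m^2
    ln_c = [math.log(c) for c in concentrations]    # only ratios matter
    slope = linear_slope(r_squared, ln_c)           # 1/m^2
    buoyancy = 1 - vbar * rho
    kg_per_mol = 2 * GAS_CONSTANT * temp_k * slope / (buoyancy * angular_velocity(rpm) ** 2)
    return kg_per_mol * 1000


# Synthetic ideal profile: hypothetical 50 kg/mol protein at 15,000 rpm, 20 degrees C
RPM, VBAR, RHO, TEMP = 15_000, 0.73, 1.0, 293.15
true_kg_per_mol = 50.0
exponent = true_kg_per_mol * (1 - VBAR * RHO) * angular_velocity(RPM) ** 2 / (2 * GAS_CONSTANT * TEMP)
radii = [6.9 + 0.03 * i for i in range(11)]  # cm, across the cell
ref = (radii[0] / 100) ** 2
conc = [0.1 * math.exp(exponent * ((r / 100) ** 2 - ref)) for r in radii]

estimated_mass = equilibrium_molar_mass(radii, conc, RPM, VBAR, RHO, TEMP)
print(f"M = {estimated_mass:,.0f} g/mol")
```

The synthetic concentration rises steeply toward the bottom of the cell, and the fitted slope recovers the 50,000 g/mol used to generate it.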

Sedimentation equilibrium is still regarded by many as the best method for determining the molecular weights of macromolecules in a sample. Although it is run at lower speeds than sedimentation velocity, higher speeds are needed when analyzing lower-molecular-weight samples. Sedimentation can also be used to separate heterogeneous samples by molecular weight: higher-molecular-weight particles move further toward the bottom of the cell, whereas lower-molecular-weight particles collect nearer the top.

In combination, these tests can provide quite accurate information on sample purity and molecular weight. Analytical ultracentrifugation is particularly useful for determining the molecular weights of large macromolecules that cannot be sequenced, such as polysaccharides. Additionally, sedimentation equilibrium can provide information on the attractive forces between components of a sample without disturbing the solution, which makes the method very reliable and accurate. Although analytical ultracentrifugation techniques can be used on their own, they are also combined with other analytical techniques to reach clearer and more complete conclusions. For example, they are often used alongside cheaper techniques such as gel electrophoresis and chromatography, as well as with mass spectrometry, X-ray crystallography, and multidimensional nuclear magnetic resonance (NMR).

Sedimentation Velocity Patterns:

Ultracentrifugation of ATCase gives two different sedimentation patterns (plots of protein concentration versus migration distance). Native ATCase gives a single peak, because its 6 catalytic and 6 regulatory subunits are bound together. When the enzyme is treated with p-hydroxymercuribenzoate, it dissociates into two kinds of subunits, regulatory dimers (r2) and catalytic trimers (c3), which give two separate peaks. These experiments helped show that the interaction of the subunits in the native enzyme produces its regulatory and catalytic properties.

The origin of ultracentrifugation

Ultracentrifugation is a powerful technique for studying the structure of proteins because it can be used both preparatively and analytically, and it is therefore in common use in biology, biochemistry and polymer science. In 1923, Theodor Svedberg invented the analytical ultracentrifuge, and three years later he won the Nobel Prize in Chemistry for his research using the ultracentrifuge to separate colloids and proteins. In 1946, Pickels designed the first preparative ultracentrifuge model, which could reach 40,000 rpm.

Analytical ultracentrifuge

Preparative ultracentrifuge

Crosslinking is one method used to study interactions involving proteins; when applied to proteins it is often called bioconjugation. Crosslinking involves covalently attaching a protein to another macromolecule (often another protein) or to a solid support via a small crosslinker. A crosslinker, or crosslinking agent, is a molecule with at least two reactive ends that connect polymer chains. Crosslinkers are usually reactive toward functional groups common on proteins, such as carboxyls, amines, and sulfhydryls.

Types of Crosslinkers

Homobifunctional Crosslinkers

Homobifunctional crosslinkers are molecules that have the same reactive group on each end. They can give a good overview of all the interactions between molecules present in a solution or cell, but they can also cause unwanted crosslinks. Because the two reactive ends are identical, they may crosslink a protein to an identical protein when interactions between different proteins are desired. Homobifunctional crosslinkers also often create intramolecular crosslinks.[4]

Heterobifunctional Crosslinkers

Heterobifunctional crosslinkers are molecules that have different reactive groups on each end. They can be more selective in the crosslinks they form, because the reactivity of each group can be chosen so that a specific protein binds to only one end. A two-step process can also be used to minimize undesired crosslinks. First, the crosslinker is added to a solution containing one particular protein and allowed to react. The protein with the crosslinker attached is then purified and added to a solution containing a second protein, which forms a crosslink with the other reactive group on the crosslinker. The resulting conjugate can then be analyzed by various techniques to determine whether the proteins were linked, how many were linked, or other desired information.[5]

Crosslinker Reactive Groups

There are a number of different reactive groups used in crosslinkers that are targeted towards different functional groups on proteins including carboxyls, amines, sulfhydryls, and hydroxyls. Crosslinkers are generally selected based on their reactivity, length, and solubility. Crosslinkers can also be spontaneously reactive upon addition to a sample or be activated at a specific time, generally through photo-reactive groups.

Although a crosslinker can be chosen to target only a certain type of functional group, most proteins contain several residues with each type of group. If multiple target sites are available for binding, the crosslinker will lose specificity and multiple crosslinked products will be formed. However, a crosslinker will only be able to bind if the target functional group is on the surface of the protein. Thus, protein folding will often block access to a number of possible reaction sites and allow for greater specificity in crosslinking.
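A quick way to see why a group-specific crosslinker can still give multiple products is to count the residues in a sequence that carry each target group. The sketch below gives an upper bound only: as the paragraph above notes, residues buried by folding are not accessible, and it also counts every cysteine even though some may be in disulfide bonds. The residue-to-group mapping is a simplification, and the sequence is hypothetical.

```python
from collections import Counter

# One-letter codes of residues whose side chains carry each target group
SIDE_CHAIN_TARGETS = {
    "amines": "K",       # lysine
    "carboxyls": "DE",   # aspartate, glutamate
    "sulfhydryls": "C",  # cysteine (counts all Cys, even those in disulfides)
    "hydroxyls": "STY",  # serine, threonine, tyrosine
}


def count_potential_sites(sequence):
    """Upper-bound count of possible attachment sites per functional group.

    Ignores whether a residue is buried by folding or tied up in a disulfide,
    so the number of accessible sites is usually smaller.
    """
    residue_counts = Counter(sequence.upper())
    sites = {
        group: sum(residue_counts[aa] for aa in residues)
        for group, residues in SIDE_CHAIN_TARGETS.items()
    }
    sites["amines"] += 1     # free alpha-amine at the N-terminus
    sites["carboxyls"] += 1  # free alpha-carboxyl at the C-terminus
    return sites


# Hypothetical short sequence, for illustration only
print(count_potential_sites("MKCDEKSTY"))
```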

N-Hydroxysuccinimide Esters (NHS Esters)

NHS esters react with primary amines to form stable amide bonds, which makes them useful for linking to the N-terminus or to lysine residues of a protein. The reaction is generally carried out under slightly alkaline conditions (pH 7.2-8.5). However, the desired reaction competes with hydrolysis of the NHS ester, and the rate of hydrolysis increases with pH, so the pH of the buffer must be closely controlled.

NHS ester.png
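Part of the reason amine-reactive chemistry is run under slightly alkaline conditions is that only the unprotonated amine (-NH2) is nucleophilic enough to attack the ester, and the Henderson-Hasselbalch equation gives that fraction as 1/(1 + 10^(pKa − pH)). The sketch below uses approximate textbook pKa values (about 8 for an N-terminal alpha-amine and about 10.5 for a lysine side chain), which are assumptions here; real values shift with the local protein environment, and the competing hydrolysis reaction is not modeled.

```python
def fraction_unprotonated(ph, pka):
    """Fraction of an amine in its reactive free-base (-NH2) form."""
    return 1 / (1 + 10 ** (pka - ph))


# Approximate textbook pKa values; actual values vary with the environment
APPROX_PKA = {
    "N-terminal alpha-amine": 8.0,
    "lysine epsilon-amine": 10.5,
}

for ph in (7.2, 8.5):  # ends of the pH window quoted in the text
    for group, pka in APPROX_PKA.items():
        print(f"pH {ph}: {group}: {fraction_unprotonated(ph, pka):.4f} unprotonated")
```

Across this window only a small fraction of lysine amines is unprotonated at any moment, while the lower-pKa N-terminal amine is substantially deprotonated; raising the pH increases both fractions but also speeds up hydrolysis of the ester.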


Imidoesters

Imidoesters are reactive groups that form amidines with primary amines. Like NHS esters, imidoesters are useful for linking to the N-terminus of a protein or to a lysine residue. The reactivity of imidoesters increases with pH, and the reaction is generally carried out between pH 8 and 10. However, imidoesters become labile at higher pH and are thus not as stable as NHS esters. Because they can penetrate the cell membrane, imidoesters are useful for linking membrane proteins and for probing lipid-protein interactions.

Imidoester reaction.png


Carbodiimides

Carbodiimides are not traditional crosslinkers, because the crosslinker itself does not become part of the protein-protein complex. Instead, carbodiimides covalently link two proteins directly by forming an amide bond between a carboxylic acid group of one protein and an amine group of another. Because of this mechanism, carbodiimides are by nature zero-length crosslinkers (they do not become part of the product) as well as heterobifunctional ones. EDC (1-ethyl-3-(3-dimethylaminopropyl)carbodiimide) is the best-known carbodiimide crosslinker.[6]

EDC reaction.png
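Because a zero-length crosslink is simply an amide bond, the expected mass of the product follows from the same bookkeeping as peptide bond formation: each new amide bond corresponds to the net loss of one water molecule, and nothing from the crosslinker remains. This is useful when checking a crosslinked product by mass spectrometry. The sketch below is a minimal illustration; the function name and the example masses are hypothetical.

```python
WATER_AVERAGE_MASS = 18.015        # Da, average mass of H2O
WATER_MONOISOTOPIC_MASS = 18.0106  # Da, monoisotopic mass of H2O


def zero_length_conjugate_mass(mass_a, mass_b, n_links=1, monoisotopic=False):
    """Expected mass of two molecules joined directly by n_links amide bonds.

    Each amide bond formed between a carboxyl and an amine releases one water,
    and a zero-length crosslinker adds no atoms of its own to the product.
    """
    water = WATER_MONOISOTOPIC_MASS if monoisotopic else WATER_AVERAGE_MASS
    return mass_a + mass_b - n_links * water


# Hypothetical partners of 1,000 Da and 2,000 Da joined by a single amide bond
print(zero_length_conjugate_mass(1000.0, 2000.0))
```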


Maleimides

Maleimides react with sulfhydryls at physiological pH to produce a stable thioether linkage, usually at a cysteine residue. Because a protein often has far fewer free sulfhydryl groups than amine groups, maleimides tend to be more specific than reactive groups that target amines. Also, since sulfhydryls are often involved in disulfide bonds, connection to a crosslinker often does not disturb the structure of the protein.

Maleimide reaction.png


Haloacetyls

Like maleimides, haloacetyls react with sulfhydryl groups at physiological pH to give a thioether linkage. The most common haloacetyls are iodoacetyls and bromoacetyls, which react by nucleophilic substitution of the halide by the sulfur of the sulfhydryl.


Pyridyl Disulfides

Pyridyl disulfides react with sulfhydryls to form disulfide bonds. They are active over a wide pH range, but pH 4-5 is optimal. Because a disulfide bond is formed, the linkage can be cleaved with normal disulfide reducing agents.

Pyridyl Disulfides.png


Diazirines

Diazirines are an example of a photo-reactive group. Whereas most other reactive groups react spontaneously once added to a sample, a photo-reactive group remains inert until it is activated by exposure to ultraviolet light. Diazirine analogs of methionine and leucine are typically incorporated into a protein and then photo-activated in the sample solution or cell. When activated, the diazirine reacts with any protein within a few angstroms of the photo-reactive analog. This allows protein-protein interactions to be captured and studied in live cells.[7]

Despite these structural modifications, cells that incorporate the diazirine analogs of methionine and leucine remain viable, although their growth is slowed slightly. This allows photo-reactive proteins to be created without toxicity. The amino acid analogs and the prote