# Structural Biochemistry/Volume 5


## Proteins

Proteins are polymers built from monomer units called amino acids, which carry many different functional groups. More than 500 amino acids exist in nature, but the proteins in all species, from bacteria to humans, consist mainly of only 20, called the standard (proteinogenic) amino acids. These 20 major amino acids, along with hundreds of minor ones, sustain our lives. Proteins can interact with other proteins and biomolecules to form more complex structures, and they can be rigid or flexible depending on their function. Iodinated and brominated tyrosine also occur in proteins, but they are not counted among the 20 standard amino acids because iodinated tyrosine is found only in thyroid hormones and brominated tyrosine only in coral. The 20 main amino acids that are found in most but not all proteins are listed below:

## Amino Acids

Amino acids are molecules that contain both a carboxylic acid group and an amine group. In an amino acid, the carboxyl group is more acidic than that of a simple carboxylic acid. 2-Amino acids, also known as alpha-amino acids, are the specific type of amino acid that makes up proteins. These amino acids have many interesting properties, which are discussed in the next sections.

Amino acids play central roles both as building blocks of proteins and as intermediates in metabolism. Proteins are linear polymers formed by linking the α-carboxyl group of one amino acid to the α-amino group of another amino acid. This type of linkage is called a peptide bond or an amide bond. The formation of a dipeptide from two amino acids is accompanied by the loss of a water molecule. The equilibrium of this reaction lies on the side of hydrolysis rather than synthesis under most conditions. Hence, the biosynthesis of peptide bonds requires an input of free energy. Nonetheless, peptide bonds are quite stable kinetically because the rate of hydrolysis is extremely slow; the lifetime of a peptide bond in aqueous solution in the absence of a catalyst approaches 1000 years.

The 20 amino acids found within proteins provide a vast array of chemical versatility. The precise amino acid content of a specific protein, and the sequence of those amino acids, is determined by the sequence of the bases in the gene that encodes that protein. The chemical properties of the amino acids of a protein determine its biological activity. Proteins not only catalyze all (or most) of the reactions in living cells, they also control virtually all cellular processes. In addition, proteins contain within their amino acid sequences the information needed to determine how the protein will fold into a three-dimensional structure, and how stable the resulting structure will be. Protein folding and stability has been a critically important area of research for years and remains one of the great unsolved problems. It is, however, being actively investigated, and progress continues to be made.

### Amino Acid Subdivisions

There are twenty major amino acids which make up proteins. Each of them contains a unique side chain which gives rise to different properties. These properties include size, shape, charge, capacity for hydrogen bonding, hydrophilicity or hydrophobicity (hydrophobic interactions), and chemical reactivity. Amino acids can be broadly classified as hydrophobic or hydrophilic, depending on the chemical properties of the R group side chain. In an aqueous environment, the hydrophobic side chains are unable to participate in hydrogen bonding with water. They associate with one another and reside mostly inside the protein. Hydrophilic amino acids, on the other hand, tend to interact with the aqueous environment because of their polarity. These amino acids are normally found on the exterior surface.

### Zwitterion

An amino acid is in a zwitterionic state when the carboxylic acid group is deprotonated and the amino group is protonated at the same time. Zwitterions are dipolar ions, meaning that the molecule carries both a positive and a negative charge. In this state the carboxyl end is negatively charged (-COO-) and the adjacent amino end is positively charged (-NH3+). The carboxyl group is deprotonated first because its pKa is about 2, whereas the pKa of the ammonium group (-NH3+) is about 9. The net charge of the amino acid in zwitterionic form is zero. [1] Molecules that can act as both an acid and a base in this fashion are called amphoteric. In the solid state, the amine functionality deprotonates the carboxylic acid group, giving rise to the zwitterionic, dipolar entity.

The charged state of an amino acid in aqueous solution depends largely on the pH. Between about pH 2 and pH 9, the major form of all amino acids is the zwitterion. In strong acid (pH < 2), the predominant form is the fully protonated cationic ammonium with the corresponding protonated form of the carboxylic acid. This species has a net charge of +1. In strongly basic solutions (pH > 9), the predominant form is the fully deprotonated aminocarboxylate anion. This species has a net charge of -1. These forms interconvert by acid-base equilibria, which leaves a wide pH range in which the zwitterion is the main contributing species. The pH at which the extent of protonation equals that of deprotonation is called the isoelectric pH or the isoelectric point (pI). At this pH, the positive charge balances the negative charge and the concentration of the charge-neutral zwitterionic form is at its highest. When the side chain of the amino acid bears an additional acidic or basic function, the pI is decreased or increased, respectively. Note that at most physiologically relevant pH values, the zwitterion is by far the most abundant species.
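The relationship between pH, pKa and net charge described above can be made concrete with the Henderson-Hasselbalch equation. The following is a minimal sketch, assuming an amino acid with no ionizable side chain; the function names are illustrative, and the glycine pKa values (2.4 and 9.8) are taken from the table of the 20 amino acids in this chapter.

```python
def fraction_protonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of a group that is in its protonated form."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))


def net_charge(pH, pKa_carboxyl, pKa_amino):
    """Average net charge of an amino acid without an ionizable side chain."""
    # The protonated amino group (-NH3+) contributes +1.
    positive = fraction_protonated(pH, pKa_amino)
    # The deprotonated carboxyl group (-COO-) contributes -1.
    negative = 1.0 - fraction_protonated(pH, pKa_carboxyl)
    return positive - negative


def isoelectric_point(pKa_carboxyl, pKa_amino):
    """pI of an amino acid without an ionizable side chain: the mean of the two pKa values."""
    return (pKa_carboxyl + pKa_amino) / 2.0


# Glycine, using the pKa values listed in this chapter (2.4 and 9.8).
for pH in (1.0, isoelectric_point(2.4, 9.8), 12.0):
    print(f"pH {pH:4.1f}: net charge {net_charge(pH, 2.4, 9.8):+.2f}")
```

The printed charges move from close to +1 in strong acid, through zero at the isoelectric point, to close to -1 in strong base, which matches the three forms described in the text. Amino acids with acidic or basic side chains need a third term for the side chain and are not covered by this sketch.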

Histidine contains an imidazole ring with 2 nitrogen atoms: one is basic and the other is not. The lone pair of the non-basic nitrogen is delocalized into the aromatic ring, while the basic nitrogen can accept a proton, which is important during enzyme catalysis.

Here is an example of L-amino acids forming zwitterions at neutral pH:

[Figure: Lysine zwitterion; zwitterionic forms of L-amino acids]

### Optical Activity

All proteins or polypeptides are a series of linked amino acids. A typical α-amino acid consists of a central carbon (the alpha carbon) attached to an amino group (-NH2), a carboxylic acid group (-COOH), a hydrogen atom, and a distinctive R group. The R group, usually referred to as a side chain, determines the properties of each amino acid, and scientists classify amino acids into different categories based on the nature of the side chain.

A tetrahedral carbon atom with four distinct groups is called chiral. A chiral molecule rotates plane-polarized light either to the left (levorotatory) or to the right (dextrorotatory), which gives it its optical fingerprint. The L and D labels used for amino acids, however, describe the configuration around the α-carbon relative to glyceraldehyde rather than the direction of rotation. All amino acids within polypeptides have the L configuration. For most of them, L corresponds to the absolute configuration S, the system used to designate stereochemistry in organic chemistry; cysteine is the exception, because its sulfur-containing side chain changes the priority order and makes L-cysteine R. Although D-amino acids exist naturally, they are not found in proteins. Thus far, there is no generally accepted explanation for the preference for L-amino acids in living organisms. It is clear, however, that all of the physiological machinery downstream of the amino acids is geared towards recognizing and interacting with the specific L configuration.

Note: Because the central carbon has four distinct groups attached, all of the amino acids are chiral except for glycine, which is achiral. The central carbon atom in glycine carries only 3 unique substituents instead of 4 (R side chain = H).

### Modified Amino Acids

Within proteins, it is possible to find amino acids which do not correspond to the 20 standard types. Most of these come about by chemical modification of an already incorporated amino acid. For example, a hydroxylated form of proline exists within the protein collagen. A selenium analog of cysteine, selenocysteine, is known to occur in glutathione peroxidase enzymes. Pyrrolysine has also been isolated and characterized. Selenocysteine and pyrrolysine are incorporated during translation as directed by the DNA and RNA sequence, and there are many more examples of non-standard amino acids.

### The Peptide Bond

Any discussion of amino acids is incomplete without mentioning how amino acids bond to one another. Amino acids link through a condensation reaction between the amine group of one amino acid and the carboxylic acid group of another. The enzymatically catalyzed reaction forms an amide: [R1-NH2 + R2-COOH ==> R1-NH-C(=O)-R2 + H2O]. The amide bond has a resonance form that gives it partial double-bond character, making it planar and rigid: [R1-N-C(=O)-R2 <==> R1-N+=C(-O-)-R2]. Amino acids can link into small units of only 2 or 3 amino acids, called dipeptides and tripeptides, but can also connect into very large chains of hundreds or even thousands of amino acids. Each complete peptide chain has an N terminus (amino) and a C terminus (carboxylate). The four-atom torsion angles around the peptide bond are important to those who study proteins. In particular, the phi (φ) torsion angle is defined by the atoms C(=O)-N-Cα-C(=O), and the adjacent psi (ψ) torsion angle by the atoms N-Cα-C(=O)-N. The natural distribution of known phi and psi angles is summarized in the Ramachandran plot. A peptide bond is formed by condensation and broken by hydrolysis (addition of water).

A tetrapeptide is a peptide that has four amino acids joined by three peptide bonds.
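Because each peptide bond is formed with the loss of one water molecule, the mass of a linear peptide is the sum of the masses of its free amino acids minus one water per bond. The following is a minimal sketch of that bookkeeping; the function name is illustrative, the four molecular weights are the values listed for glycine, alanine, serine and valine in the table of the 20 amino acids in this chapter, and 18.015 g/mol is used for water.

```python
WATER = 18.015  # g/mol, one molecule is released per peptide bond formed

# Molecular weights of the free amino acids (g/mol), keyed by 1-letter code.
# Only a small subset is included to keep the example short.
FREE_AMINO_ACID_MW = {"G": 75.07, "A": 89.1, "S": 105.09, "V": 117.15}


def peptide_mass(sequence):
    """Approximate molar mass of a linear peptide given as 1-letter codes."""
    if not sequence:
        return 0.0
    total = sum(FREE_AMINO_ACID_MW[residue] for residue in sequence)
    peptide_bonds = len(sequence) - 1  # n residues are joined by n - 1 bonds
    return total - peptide_bonds * WATER


print(round(peptide_mass("GA"), 3))    # dipeptide: one water lost
print(round(peptide_mass("GAGA"), 3))  # tetrapeptide: three waters lost
```

A single amino acid has no peptide bond, so its mass is returned unchanged; a cyclic peptide would lose one additional water and is not handled here.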

### Amino Acid Classification

• Non-polar Amino Acids
  ◦ Aliphatic: glycine, alanine, valine, isoleucine, leucine
  ◦ Aromatic: phenylalanine, tryptophan
  ◦ Cyclic: proline
• Polar Amino Acids
  ◦ Sulfur-Containing: cysteine, methionine
  ◦ Hydroxyl-Containing: serine, threonine
  ◦ Aromatic: tyrosine
  ◦ Acidic Amide: asparagine, glutamine
• Charged Amino Acids (at physiological pH)
  ◦ Acidic: aspartic acid, glutamic acid
  ◦ Basic: histidine, lysine, arginine

### List of the 20 Amino Acids

The hydrophobicity index runs from 100 (extremely hydrophobic) through 0 (neutral) to -55 (hydrophilic).

| Amino Acid | 3-Letter Abbreviation | 1-Letter Abbreviation | Class of Amino Acid (Side Chain) | Hydrophobicity Index | pKa of COOH group | pKa of NH3+ group | pKa of R group | Molecular Weight [g/mol] | Alpha helix | Beta sheet | Reverse turn |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Glycine | Gly | G | Aliphatic, nonpolar | Neutral (0 at pH = 2; 0 at pH = 7) | 2.4 | 9.8 | -- | 75.07 | 0.43 | 0.58 | 1.77 |
| Alanine | Ala | A | Aliphatic, nonpolar | Hydrophobic (47 at pH = 2; 41 at pH = 7) | 2.4 | 9.9 | -- | 89.1 | 1.41 | 0.72 | 0.82 |
| Valine | Val | V | Aliphatic, nonpolar | Very Hydrophobic (79 at pH = 2; 76 at pH = 7) | 2.3 | 9.7 | -- | 117.15 | 0.90 | 1.87 | 0.41 |
| Leucine | Leu | L | Aliphatic, nonpolar | Very Hydrophobic (100 at pH = 2; 97 at pH = 7) | 2.3 | 9.7 | -- | 131.18 | 1.34 | 1.22 | 0.57 |
| Isoleucine | Ile | I | Aliphatic, nonpolar | Very Hydrophobic (100 at pH = 2; 99 at pH = 7) | 2.3 | 9.8 | -- | 131.18 | 1.09 | 1.67 | 0.47 |
| Methionine | Met | M | Hydroxyl or Sulfur-Containing, nonpolar | Very Hydrophobic (74 at pH = 2; 74 at pH = 7) | 2.1 | 9.3 | -- | 149.21 | 1.30 | 1.14 | 0.52 |
| Serine | Ser | S | Hydroxyl or Sulfur-Containing, polar | Neutral (-7 at pH = 2; -5 at pH = 7) | 2.2 | 9.2 | -- | 105.09 | 0.57 | 0.96 | 1.22 |
| Cysteine | Cys | C | Hydroxyl or Sulfur-Containing, polar | Hydrophobic (52 at pH = 2; 49 at pH = 7) | 1.9 | 10.7 | 8.4 | 121.16 | 0.66 | 2.40 | 0.54 |
| Threonine | Thr | T | Hydroxyl or Sulfur-Containing, polar | Neutral (13 at pH = 2; 13 at pH = 7) | 2.1 | 9.1 | -- | 119.12 | 0.76 | 1.17 | 0.96 |
| Proline | Pro | P | Cyclic | Hydrophilic (-46 at pH = 2; -46 at pH = 7) | 2.0 | 9.6 | -- | 115.13 | 0.34 | 0.31 | 1.32 |
| Phenylalanine | Phe | F | Aromatic | Very Hydrophobic (92 at pH = 2; 100 at pH = 7) | 2.2 | 9.3 | -- | 165.19 | 1.16 | 1.33 | 0.59 |
| Tyrosine | Tyr | Y | Aromatic | Hydrophobic (49 at pH = 2; 63 at pH = 7) | 2.2 | 9.2 | 10.5 | 181.19 | 0.74 | 1.45 | 0.76 |
| Tryptophan | Trp | W | Aromatic | Very Hydrophobic (84 at pH = 2; 97 at pH = 7) | 2.5 | 9.4 | -- | 204.25 | 1.02 | 1.35 | 0.65 |
| Histidine | His | H | Basic | Hydrophilic at pH = 2 (-42); Neutral at pH = 7 (8) | 1.8 | 9.3 | 6.0 | 155.16 | 1.05 | 0.80 | 0.81 |
| Lysine | Lys | K | Basic | Hydrophilic (-37 at pH = 2; -23 at pH = 7) | 2.2 | 9.1 | 10.5 | 146.188 | 1.23 | 0.69 | 1.07 |
| Arginine | Arg | R | Basic | Hydrophilic (-26 at pH = 2; -14 at pH = 7) | 1.8 | 9.0 | 12.5 | 174.2 | 1.21 | 0.84 | 0.90 |
| Aspartate | Asp | D | Acidic | Neutral at pH = 2 (-18); Hydrophilic at pH = 7 (-55) | 2.0 | 9.9 | 3.9 | 133.10 | 0.99 | 0.39 | 1.24 |
| Glutamate | Glu | E | Acidic | Neutral at pH = 2 (8); Hydrophilic at pH = 7 (-31) | 2.1 | 9.5 | 4.1 | 147.13 | 1.59 | 0.52 | 1.01 |
| Asparagine | Asn | N | Amide, polar | Hydrophilic (-41 at pH = 2; -28 at pH = 7) | 2.1 | 8.7 | -- | 132.118 | 0.76 | 0.48 | 1.34 |
| Glutamine | Gln | Q | Amide, polar | Neutral (-18 at pH = 2; -10 at pH = 7) | 2.2 | 9.1 | -- | 146.15 | 1.27 | 0.98 | 0.84 |
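The last three columns of the table (alpha helix, beta sheet, reverse turn) can be read as relative tendencies of each residue to occur in that type of secondary structure. The sketch below assumes that a value above 1 means the residue favors the structure and a value below 1 means it disfavors it, and simply averages the table values over a short peptide. It is a rough illustration with hypothetical function names, not a secondary structure prediction method.

```python
# (alpha helix, beta sheet, reverse turn) values copied from the table above,
# keyed by 1-letter abbreviation.
PROPENSITY = {
    "G": (0.43, 0.58, 1.77), "A": (1.41, 0.72, 0.82), "V": (0.90, 1.87, 0.41),
    "L": (1.34, 1.22, 0.57), "I": (1.09, 1.67, 0.47), "M": (1.30, 1.14, 0.52),
    "S": (0.57, 0.96, 1.22), "C": (0.66, 2.40, 0.54), "T": (0.76, 1.17, 0.96),
    "P": (0.34, 0.31, 1.32), "F": (1.16, 1.33, 0.59), "Y": (0.74, 1.45, 0.76),
    "W": (1.02, 1.35, 0.65), "H": (1.05, 0.80, 0.81), "K": (1.23, 0.69, 1.07),
    "R": (1.21, 0.84, 0.90), "D": (0.99, 0.39, 1.24), "E": (1.59, 0.52, 1.01),
    "N": (0.76, 0.48, 1.34), "Q": (1.27, 0.98, 0.84),
}
STRUCTURES = ("alpha helix", "beta sheet", "reverse turn")


def mean_propensity(sequence):
    """Average each of the three table values over all residues of the peptide."""
    n = len(sequence)
    sums = [0.0, 0.0, 0.0]
    for residue in sequence:
        for k, value in enumerate(PROPENSITY[residue]):
            sums[k] += value
    return {name: sums[k] / n for k, name in enumerate(STRUCTURES)}


def favored_structure(sequence):
    """Name of the structure with the highest average value for this peptide."""
    means = mean_propensity(sequence)
    return max(means, key=means.get)


for peptide in ("AEAL", "VIVC", "GPNG"):
    print(peptide, favored_structure(peptide))
```

With these table values, a peptide rich in alanine, glutamate and leucine averages highest for the helix column, one rich in valine, isoleucine and cysteine for the sheet column, and one rich in glycine, proline and asparagine for the turn column.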

### Network Approach

• The network approach helps determine the role of a specific amino acid at a known position in the protein structure. Networks simplify the behavior of a complex system by splitting the system into a series of links. Here, links represent neighboring positions of amino acids in a protein molecule. Because amino acids are linked in this way, and each one is connected to only a few others in the protein structure network, the network can be used to estimate folding probability. Proteins with denser protein structure networks fold more easily, and the folding probability increases as the protein structure becomes more compact.
• The network approach can also be applied to the prediction of active centres in proteins. Active centres are protein segments that play key parts in the catalytic reaction of an enzyme. Scientists have used long-range network topology to create a network skeleton from which they can study only those side chains that are essential to the flow of information through the whole protein. Network analysis has shown that active centres occupy a central position in protein structure networks, usually have many neighbors, give unique linkages in their neighborhood, integrate communication for the entire network, do not take part in the wasteful actions of ordinary residues, and collect and coordinate most of the energy in the network.
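The idea of treating residues as nodes and neighboring positions as links can be sketched in a few lines. The example below assumes that two residues are linked when their positions lie within a chosen distance cutoff; the 7.0 Å cutoff, the function names and the five coordinates are all hypothetical and serve only to illustrate how the number of neighbors of each residue is obtained.

```python
import math
from itertools import combinations


def contact_network(coords, cutoff=7.0):
    """Map each residue index to the set of residues within `cutoff` of it."""
    neighbours = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            neighbours[i].add(j)
            neighbours[j].add(i)
    return neighbours


def degree(neighbours):
    """Number of links (neighbors) of each residue in the network."""
    return {i: len(linked) for i, linked in neighbours.items()}


# Hypothetical residue positions in angstroms (not taken from a real structure).
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (20.0, 20.0, 20.0),  # far from all the others, so it has no links
]

network = contact_network(coords)
print(degree(network))
```

Residues with many neighbors in such a network are the kind of well-connected positions that the text associates with active centres; the last residue in the toy data has no neighbors at all. Real analyses would start from experimentally determined coordinates and use more refined measures than the simple neighbor count shown here.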

## Alanine - Ala/ A

### Structure

Alanine, also known as 2-aminopropanoic acid (abbreviated as Ala or A), is an α-amino acid with the chemical formula HOOCCH(NH2)CH3. It has a molar mass of 89.09 g/mol and a density of 1.424 g/cm3. The α-carbon atom of alanine is bound to a methyl group (-CH3), which makes it one of the simplest α-amino acids with respect to molecular structure and classifies it as an aliphatic amino acid. The methyl group of alanine is non-reactive and is thus almost never directly involved in protein function. Alanine is a nonpolar, hydrophobic molecule. It is ambivalent, meaning it can be found inside or on the outside of a protein molecule. The α-carbon of alanine is optically active; in proteins, only the L-isomer is found.

### Features

Alanine is a non-essential amino acid, meaning that it can be manufactured by the human body and does not need to be obtained directly through the diet. It is found in a wide variety of foods but is particularly concentrated in meats, and it occurs at high levels in its free state in plasma.

### Functions

Alanine is a primary amino acid in sugar and acid metabolism. It supports the immune system through antibody production and provides energy for muscle tissue, the brain, and the central nervous system. It is used in pharmaceutical preparations for injection or infusion. It is also used in dietary supplements and as a flavor compound in Maillard reaction products. In addition, it stimulates glucagon secretion.

### Chemical Synthesis

Alanine can be manufactured in the body from pyruvate and branched-chain amino acids such as valine, leucine, and isoleucine. It is most commonly produced by reductive amination of pyruvate. Because transamination reactions are readily reversible and pyruvate is pervasive, alanine is easily formed and thus has close links to metabolic pathways such as glycolysis, gluconeogenesis, and the citric acid cycle. It also arises together with lactate and generates glucose from protein via the alanine cycle. Racemic alanine can be prepared by the Strecker reaction, the condensation of acetaldehyde with ammonium chloride in the presence of potassium cyanide.

### Analysis

Alanine can be identified by UV spectrometry, infrared (IR) spectroscopy, nuclear magnetic resonance (NMR) spectroscopy, and mass spectrometry.

## Arginine - Arg/ R

### Structure

Arginine, or 2-amino-5-guanidinopentanoic acid, consists of a three-carbon aliphatic straight chain whose end is capped by a guanidinium group. Its molar mass is 174.2 g/mol. With a pKa of 12.48, the guanidinium group is positively charged in neutral, acidic and even most basic environments. Therefore, arginine has basic chemical properties. Because of the conjugation between the double bond and the nitrogen lone pairs, the positive charge is delocalized, which enables the formation of multiple hydrogen bonds.

### Features

Arginine is a conditionally essential amino acid that plays an important role in nitrogen metabolism. It is a chemical precursor to nitric oxide, a blood vessel-widening agent (a vasodilator). Nitric oxide is a powerful signaling molecule that helps blood vessels relax and also improves circulation. Foods that are rich in arginine include red meat, fish, poultry, wheat germ, grains, nuts and seeds, and dairy products.

### Functions

Arginine assists in wound healing and helps in burn treatment. It is necessary for normal immune system activity because it enhances the production of T-cells. Studies show that arginine may help treat medical conditions that improve with increased vasodilation. Some conditions that are treated with arginine are chest pain, atherosclerosis (clogged arteries), heart disease or failure, erectile dysfunction, intermittent claudication/peripheral vascular disease, and vascular headaches (headache-inducing blood vessel swelling). Arginine also helps with bodybuilding, enhancing sperm production, and preventing tissue wasting in people with critical illnesses. Arginine hydrochloride has a high chloride content and has been used to treat metabolic alkalosis.

### Biosynthesis

Arginine is synthesized from citrulline by the cytosolic enzymes argininosuccinate synthetase and argininosuccinate lyase. This is an energetically costly reaction: the synthesis of each molecule of argininosuccinate is coupled to the hydrolysis of adenosine triphosphate (ATP) to adenosine monophosphate (AMP).

Synthesis of arginine in the human body occurs principally via the intestinal-renal axis. Epithelial cells of the small intestine produce citrulline, primarily from glutamine and glutamate. The proximal tubule cells of the kidney then extract citrulline from the circulation and convert it to arginine, which is returned to the circulation.

### Arginine and Nitrogen Storage

In order for a cell to grow, it needs nitrogen, which can come from ammonia, nitrates, dinitrogen or amino acids. The PII protein is an ancient signaling protein that senses and integrates nitrogen and carbon abundance by binding 2-oxoglutarate (2-OG) and ATP/ADP. N-acetyl-L-glutamate kinase (NAGK) stores nitrogen as arginine, which is incorporated into arginine-rich copolymers. Because arginine is nitrogen-rich, it is ideal for nitrogen storage. The osmotic impact of arginine is minimized when it is incorporated into proteins. Only in oxygenic phototrophs does the PII protein bind to NAGK, and it does so when nitrogen is abundant. When nitrogen is scarce, 2-oxoglutarate binds to the PII protein together with ATP, leading to the dissociation of the PII-NAGK complex.

Arginine-insensitive NAGK is a homodimer containing a backbone of 16-stranded β-sheets in both subunits. Arginine-sensitive NAGKs, however, are hexameric, and recent studies have shown that these enzymes are ring-like hexameric trimers of dimers. The ring is formed by three E. coli NAGK-like dimers linked through the N-terminal α-helix. In arginine-sensitive NAGK, the arginine sites are connected by interlaced N-helices. The helices are needed to make NAGK an arginine-operated switch that shows sigmoidal arginine inhibition kinetics. The PII protein is a homotrimer with a βαββαβ subunit topology, with the α-helices facing outward and the β-sheet inward. The T-loop is a large and flexible loop that contains the phosphorylation and uridylylation sites in cyanobacteria and proteobacteria.

When the PII protein is absent, S. elongatus NAGK is relatively inactive, with a low Vmax and a high Km for NAG, and requires only a low concentration of arginine for inhibition. In contrast, A. thaliana NAGK is highly active, with a Km four times lower and a Vmax three times greater for NAG than S. elongatus NAGK. When PII binds S. elongatus NAGK, the Vmax for NAG increases up to fourfold and the Km decreases up to tenfold. When PII binds A. thaliana NAGK, the Km is not affected, but the Vmax for NAG increases fivefold. All of these changes are relative to the values with the PII protein absent. The S. elongatus PII-NAGK complex has one NAGK hexamer sandwiched between two PII trimers. Because the PII proteins are not packed tightly on NAGK, PII interacts with NAGK only through its T-loops and B-loops. The A. thaliana PII-NAGK complex has MgATP bound to the PII protein, with all the NAGK active centers containing bound NAG and ADP.
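The effect of the fold-changes in Vmax and Km can be illustrated with the Michaelis-Menten rate law, v = Vmax[S] / (Km + [S]). The sketch below assumes simple Michaelis-Menten behavior and uses normalized, hypothetical baseline values (Vmax = 1 and Km = 1 for the enzyme without PII); only the fold-changes, a fourfold higher Vmax and a tenfold lower Km for S. elongatus NAGK bound to PII, come from the text, where they are given as upper limits.

```python
def michaelis_menten(substrate, vmax, km):
    """Reaction rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


# Hypothetical normalized baseline for the enzyme without PII.
VMAX_FREE, KM_FREE = 1.0, 1.0
# Fold-changes on PII binding taken from the text (upper limits).
VMAX_BOUND, KM_BOUND = 4.0 * VMAX_FREE, KM_FREE / 10.0


def activation(substrate):
    """Ratio of the rate with PII bound to the rate without PII."""
    bound = michaelis_menten(substrate, VMAX_BOUND, KM_BOUND)
    free = michaelis_menten(substrate, VMAX_FREE, KM_FREE)
    return bound / free


# NAG concentrations expressed in units of the baseline Km.
for nag in (0.1, 1.0, 100.0):
    print(f"[NAG] = {nag:6.1f} x Km: rate increases {activation(nag):.1f}-fold")
```

Under these assumptions the gain from PII binding is largest at low substrate concentration, where both the higher Vmax and the lower Km contribute, and it approaches the fourfold change in Vmax alone once the enzyme is saturated.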

## Asparagine - Asn/ N

### Structure

Asparagine is a polar, uncharged derivative of the acidic amino acid aspartic acid (aspartate). Its side chain carries a carboxamide group, which is neutral at physiological pH and can be converted to a carboxylic acid by hydrolysis to form aspartate. The carboxamide group can form hydrogen bonds.

### Features

Asparagine is found in abundance in asparagus, from which it takes its name. Asparagine is not an essential amino acid, meaning that humans do not need to ingest it to obtain the necessary amounts. Asparagine has a high propensity to hydrogen bond, since the amide group can accept two and donate two hydrogen bonds. It is found on the surface of proteins as well as buried within them. It is a common site for the attachment of carbohydrates in glycoproteins. Food sources that contain asparagine are dairy, beef, poultry, and eggs.

### Functions

Asparagine, along with glutamate, is an important neurotransmitter. Aspartic acid and asparagine are present at high concentrations in the hippocampus and hypothalamus of the brain, regions that are important in short-term memory and emotions, so the two amino acids serve an essential role between the brain and the rest of the body. Asparagine is required by the nervous system to maintain equilibrium and is also required for the transformation of amino acids from one form to another, which takes place in the liver.

### Synthesis

Synthesis of asparagine requires oxaloacetate, C4H4O5. The double-bonded oxygen attached to carbon-2 is replaced by an ammonium group from glutamate in a process called transamination, catalyzed by a transaminase. The newly formed compound, aspartate, is converted to asparagine by replacing a negatively charged oxygen end with an amino group. Asparagine synthetase carries out this step while converting glutamine to glutamate, and ATP into AMP and pyrophosphate.

### Analysis

Asparagine can be identified by the following methods: UV spectrometry, infrared (IR) spectroscopy, nuclear magnetic resonance (NMR) spectroscopy, and mass spectrometry.

## Aspartic acid - Asp/ D

### Structure

Aspartic acid (C4H7NO4) is also named 2-aminobutanedioic acid. Its molecular weight is 133.1 g/mol.

Also known as aspartate, aspartic acid is an acidic, polar amino acid. Its side chain carries a carboxylic acid group, which loses a proton at physiological pH to become a negatively charged carboxylate group. This side-chain carboxylic acid group has a pKa of about 3.9, which makes it slightly less acidic than the terminal α-carboxyl group. Its pI is about 3, the average of its two carboxyl pKa values. Proteins are critical to maintaining the pH balance in the body, and it is the charged amino acids that are involved in the buffering properties of proteins. Aspartic acid is similar to alanine but with one of the β hydrogens replaced by a carboxylic acid group. This carboxylic acid group is what makes aspartate an acidic amino acid. Aspartate has an α-keto homolog called oxaloacetate. Aspartate and oxaloacetate are interconvertible by a simple transamination reaction. Oxaloacetate is one of the intermediates of the Krebs cycle, the sequence of reactions by which most living cells generate energy during aerobic respiration.

### Features

Aspartic acid is a non-essential amino acid that can be obtained from central metabolic pathways.

### Functions

Aspartic acid is involved in transamination, in which oxaloacetate and aspartate are interconvertible. It is also involved in immune system activity by promoting immunoglobulin and antibody production. Moreover, aspartic acid protects the liver and helps in the detoxification of ammonia.

Aspartate, the conjugate base of aspartic acid, also functions as a neurotransmitter. Along with a few other amino acids, its primary role is to activate NMDA receptors in the brain; however, its effect is not as significant as that of glutamate.

Other than its role as an excitatory neurotransmitter, aspartate is a proteinogenic amino acid, that is, one of the amino acids encoded by DNA and incorporated into proteins.

Aspartate plays important roles as an acid in enzyme active centers, as well as in maintaining the solubility and ionic character of proteins.

### Synthesis

Aspartic acid is synthesized from oxaloacetate via transamination. Aspartic acid can also be used as an initial reactant in the synthesis of other amino acids, including the essential amino acids methionine, threonine, isoleucine, and lysine. For this, aspartic acid first needs to be reduced to its semialdehyde form, HOOCCH(NH2)CH2CHO. Asparagine can also be obtained from aspartic acid via transamidation: aspartic acid + glutamine -> asparagine + glutamic acid

## Cysteine - Cys/ C

### Structure

Cysteine, C3H7NO2S, with a molecular mass of 121.16 g/mol, is an amino acid that contains a sulfhydryl or thiol group (-SH), which is more nucleophilic than a hydroxyl group. Its alternate name is 2-amino-3-mercaptopropanoic acid. Two cysteine residues can be oxidized to form a stable disulfide bond. Disulfide bonds help to stabilize the folded structure of a protein. The unit of two bonded cysteines is known as cystine. Cysteine is considered to be a hydrophilic amino acid based on the fact that the thiol group interacts well with water. It is also a non-essential amino acid and can be biosynthesized in the human body.

### Functions

The nucleophilic thiol group in cysteine is easily oxidized; thus, cysteine, with its near-neutral pKa, is highly reactive and has various functions in biology.

Cysteine is capable of inactivating insulin in the bloodstream. An excessive amount of cysteine reduces one of the three disulfide bonds in the insulin structure. As a result, insulin loses its function. This capability can be used in medicine and pharmaceutics when a patient experiences a hypoglycemia attack due to a high level of insulin.

Cysteine promotes iron production in iron deficiency anemia. It also assists in lung diseases by increasing the production of red blood cells. Cysteine is a key active-site residue in many important proteins. It is the key residue in glutathione reductases, which have protective effects against UV light, radiation, and free radicals. Additionally, glyceraldehyde-3-phosphate dehydrogenase, a key enzyme in glycolysis, uses cysteine to achieve its most critical functions.

When cysteine is taken as a supplement, it is in the form of N-acetyl-L-cysteine (NAC). The body converts this into cysteine and then into glutathione, a powerful antioxidant. Antioxidants fight free radicals, which are harmful compounds in the body that damage cell membranes and DNA. Researchers believe that free radicals play a role in aging as well as in the development of a number of health problems, including heart disease and cancer. NAC can also help prevent side effects caused by drug reactions and toxic chemicals, and it helps break down mucus in the body.

NAC is also of benefit in treating some respiratory conditions, such as bronchitis and chronic obstructive pulmonary disease (COPD). Doctors often give NAC to people who have taken an overdose of acetaminophen (Tylenol), where it helps to prevent or reduce liver and kidney damage. NAC also helps reduce angina, which is chest pain or discomfort that occurs when the heart muscle does not get enough blood; taking NAC opens the blood vessels and improves blood flow to the heart.

Studies have also shown that NAC may help relieve symptoms of chronic bronchitis, leading to fewer flare-ups. Not all studies gave these results, and some did not find any reduction in flare-ups. Other studies showed that people with COPD who took NAC lowered the number of flare-ups by about 40% when it was used with other therapies. Another study showed that people who took NAC two times a day had fewer flu symptoms than those who took a placebo. Some research has shown that intravenous NAC may boost levels of glutathione and help prevent or treat lung damage caused by acute respiratory distress syndrome (ARDS). Other results did not agree with these findings. For example, giving NAC to people with ARDS helped reduce the severity of their condition but did not reduce the number of overall deaths compared to placebo.

### Biosynthesis

The precursors for the synthesis of cysteine are serine and methionine. Serine has a hydroxyl group and methionine has a sulfur atom as their respective substituents. Methionine is initially converted into homocysteine. Homocysteine then combines with serine to form cystathionine (C7H14N2O4S), with a water molecule leaving. Finally, addition of water and departure of ammonia from cystathionine yield cysteine, with alpha-ketobutyrate as a side-product.

## Glutamine - Gln/ Q

### Structure

Glutamine, or 2-amino-4-carbamoylbutanoic acid, has a molecular formula of C5H10N2O3 and a molecular mass of 146.16 g/mol. It is a polar, uncharged derivative of the acidic amino acid glutamic acid (glutamate). Its side chain carries a carboxamide group, which is neutral at physiological pH and can be converted to a carboxylic acid by hydrolysis to form glutamate. The carboxamide group can form hydrogen bonds.

[Figure: Glutamine Final]

### Synthesis

Glutamine is a nonessential amino acid. In the body, glutamine is synthesized from glutamate by the enzyme glutamine synthetase (GS) through the addition of ATP and ammonia. (See Figure).

Glutamate + ATP + NH3 → Glutamine + ADP + phosphate

The incorporation of ammonia into glutamate is an amidation reaction, and the hydrolysis of ATP to ADP drives the reaction forward. ATP is directly involved in the reaction because it phosphorylates the carboxyl group on the side chain of glutamate and forms an acyl-phosphate intermediate (See Figure: Glutamine Final). The acyl-phosphate intermediate reacts with free ammonia and forms glutamine. Glutamine synthetase (GS) plays a major role because a high-affinity binding site for ammonia is formed in GS after the formation of the intermediate, which prevents hydrolysis of the intermediate. Hydrolysis of the intermediate would not yield glutamine and would thus waste a valuable molecule of ATP.

Functions Glutamine is a non-essential amino acid, which means that it will naturally occur in the human body and does not need to be gathered from exogenous sources. It is one of the most abundant amino acid manufactures in the body. Glutamine circulates in the blood and is able to cross the blood-brain barrier directly.

Glutamine has various functions in biochemistry. Its primary role is protein synthesis, but it also helps to maintain neutral pH in the liver by balancing the acid and base levels.

Like glucose, glutamine is capable of fueling cells. It donates nitrogen to cells via anabolic reactions and provides carbons to the citric acid cycle. It is critical in the gastrointestinal system because it provides energy to the small intestine. Notably, the intestine is the only organ in the body that uses glutamine as a primary energy source. The kidney, activated immune cells, and cancer cells also require glutamine, but not as a primary energy source.

Within a cell, glutamine is essential for cell growth and protein translation. Moreover, it serves as a nitrogen donor and assists in maintaining the gradient across the mitochondrial membrane.

Normal cells require glutamine. Cancer cells, on the other hand, use glutamine in quantities much higher than normal cells. As discussed in the paper "Glutamine addiction: a new therapeutic target in cancer" by David R. Wise and Craig B. Thompson, cancer cells sometimes exhibit what is called “glutamine addiction”. In this addiction, cancer cells take up glutamine from the body in much larger amounts than is necessary for cellular function. In fact, cancer cells take in more glutamine than the cell can metabolize. Depriving cancer cells of this excess glutamine causes them to die. Such deprivation is the key to potential glutamine-based cancer therapy. Glutamine consumption can exceed the consumption of any other amino acid in the cell by tenfold. In cancer cells, a metabolic shift occurs so that glutamine replaces glucose as the major source of carbon for the cell.

The body can make enough glutamine for its regular needs, but extreme stress, such as heavy exercise or an injury, makes the body require more glutamine. Most glutamine is stored in the muscles, followed by the lungs, where much of the glutamine is made. Usually the body can make enough glutamine, so it is not necessary to take glutamine supplements. Certain medical conditions, however, including injuries, surgery, infections, and prolonged stress, can lower glutamine levels. In these cases, taking a glutamine supplement may be helpful.

Glutamine is important for removing excess ammonia, which is a common waste product in the body. Glutamine also helps the immune system function and is needed for normal brain function and digestion. It is important in wound healing and recovery from illness. When the body is stressed, it releases the hormone cortisol into the bloodstream, and a high concentration of cortisol lowers the body’s stores of glutamine. Studies have shown that adding glutamine to enteral nutrition helps reduce the rate of death in trauma and critically ill patients. Clinical studies have found that glutamine supplements strengthen the immune system and reduce infections. Glutamine supplements also help in the recovery from severe burns. Another role of glutamine is to protect the lining of the gastrointestinal tract, known as the mucosa. People who have inflammatory bowel disease (IBD) may not have enough glutamine in their bodies; however, two clinical trials found that taking glutamine supplements did not improve symptoms of Crohn’s disease. People with HIV or AIDS often experience severe weight loss, so some take glutamine supplements along with other nutrients, including vitamins C and E, beta-carotene, selenium, and N-acetylcysteine, to increase weight gain and help the intestines better absorb nutrients. Athletes who train for endurance events may reduce the amount of glutamine in their bodies, making them more prone to catching a cold after an athletic event. Studies show that taking glutamine supplements resulted in fewer infections.

Glutamine and Cancer It has been shown that some cancer cells have an addiction to glutamine in that there is an increased rate of glutamine uptake. The increase in glutamine uptake is due to glutamine playing roles other than providing nitrogen for protein (amino acid) and nucleotide biosynthesis.

The first signs that cancer cells rely on an excess of a given compound to produce energy were discovered by Otto Heinrich Warburg. Warburg noticed that most cancer cells produce their energy through glycolysis of excess glucose, whose product is in turn converted into lactic acid by lactic acid fermentation. Such a process is in contrast with energy production in normal cells, in which glycolysis still occurs but is followed by oxidation of pyruvate in mitochondria. Warburg therefore concluded that these cancer cells must have devolved into a more primitive form of metabolism, as seen in single-celled eukaryotes. This uptake of excess glucose by cancer cells for their energy needs has been dubbed the "Warburg Effect". Glutamine was later found to mirror this effect in some tumor cells.

Glutamine has been shown to participate in signaling and in the uptake of essential amino acids. For instance, it can act as a mitochondrial substrate that maintains the integrity of the mitochondrial membrane potential. It also plays integral roles in a variety of anaplerotic reactions.

Glutamine donates nitrogen to cancer cells. Like all cells, cancer cells must synthesize nitrogen compounds to produce nucleotides and other amino acids, and glutamine donates the nitrogen that is necessary for the production of these compounds. Glutamine donates its amide group and is converted into glutamic acid. Glutamic acid then transfers its amine group, via transaminases, to α-ketoacids, which are used to generate the nonessential amino acids. This breakdown provides the nitrogen for several amino acids, including alanine, serine, aspartate, and proline. Tyrosine is the only nonessential amino acid not produced from either glucose or glutamine.

• Glutamine is Needed for the Uptake of Essential Amino Acids in Certain Cancer Cells and as a Molecular Signal
Glutamine is imported through the glutamine solute carrier SLC1A5 and quickly exported through the SLC7A5 amino acid transporter in exchange for extracellular essential amino acids. When the glutamine importer is impaired, the uptake of essential amino acids is also impaired, which suggests that glutamine is necessary for essential amino acid uptake. Without essential amino acids, the rapamycin-sensitive mTOR complex 1 (mTORC1) is not activated. mTORC1 plays an essential role in regulating cell growth and protein translation as well as in inhibiting macroautophagy. As such, inactivation of mTORC1 inhibits cellular growth and protein translation. Thus, glutamine acts as a signal to mTORC1 and as a means of obtaining essential amino acids in some cancer cells.
• Glutamine Provides Anaplerosis in Cancer Cells
Anaplerosis is a term used to describe the replenishing of the carbon pool in the mitochondrion. Oxaloacetic acid (OAA) is one of the substrates in mitochondria that eventually lead to the synthesis of many essential biological molecules, such as cholesterol. In glioblastoma cells, glutamine metabolism provides the bulk of the cellular OAA pool. Thus, the increased rate of glutamine metabolism into OAA marks glutamine as a primary substrate in cancer cells, one that provides the mitochondria with the precursors they need to carry out their metabolic functions.
• c-Myc Regulates Glutamine Metabolism in Cancer Cells
The synthesis of purines and pyrimidines uses glutamine as a source of nitrogen in five enzymatic steps. Three of the five steps are regulated by c-MYC (Myc), a DNA transcription factor. Oncogenic levels of Myc promote increased glutaminolysis at the transcription level and the metabolism of glutamine into lactic acid. The catabolism of glutamine provides cells with carbons for anaplerosis and NADPH production.
Myc is a gene that codes for a transcription factor, a protein that binds to DNA. In a cancerous cell, Myc is amplified. Myc drives the uptake of glutamine and its conversion to glutamic acid and lactic acid. Myc overexpression leads to increased catabolism of glutamine, which leads to a larger amount of carbon in the cell, which in turn allows the cell to produce more NADPH. This overexpression of Myc triggers the metabolic switch from glucose to glutamine as the source of carbon for the cell.
• Glutamine-based cancer therapy
Glutamine addiction in some cancer cells is a target for new cancer therapies. Further research is needed to determine a non-toxic dosage; that is, a dosage that does not inhibit glutamine production indiscriminately and does so only in cancerous cells.
Since cancer cells are dependent on glutamine, starving these cells of glutamine will cause them to die. Thus, glutamine has become a target for new cancer treatments. Some treatments have attempted to deny cancer cells their source of glutamine by reducing the amount of glutamine in the body. However, as glutamine is essential for many other processes in the body, such as synaptic communication in the brain, removing glutamine from the body is not a feasible treatment and is very dangerous. Other treatment methods have attempted to reduce the ability of the cell to take up glutamine by targeting Myc and other proteins that are responsible for transporting glutamine into the cell. Still others have attempted to reprogram the mitochondria so that they no longer depend on glutamine. Another treatment involves targeting mTOR’s glutamine response. These treatments show more promise and less harm than removing all glutamine from the body.
These therapeutic methods target major glutamine activity in cancer cells:
1. Glutamine uptake and mTOR activation: L-γ-glutamyl-p-nitroanilide (GPNA) inhibits SLC1A5, a target of Myc. Such inhibition suppresses glutamine uptake in the cell. 2-aminobicyclo-(2,2,1)heptanecarboxylic acid (BCH) inhibits SLC7A5 and blocks mTOR activation, inducing autophagy.
2. Glutamine-dependent anaplerosis and activity in mitochondria: Studies suggest that carbons derived from glutamine enter the citric acid cycle via transaminases. Therefore, amino-oxyacetic acid (AOA), a transaminase inhibitor, shows potential as a cancer therapeutic. Additionally, blocking the regeneration of mitochondrial NAD+ may prevent the entry of glutamine into the citric acid cycle. Metformin, a biguanide class drug, inhibits this mechanism.

Usage

1. Wound Healing
2. Inflammatory Bowel Disease
3. HIV/AIDS
4. Obesity
5. Peritonitis
6. Athletes
7. Cancer
8. etc.

## Glutamic acid - Glu/ E

Structure The molecular formula of glutamic acid is C5H9NO4, and its molecular mass is 147.13 g/mol. Also known as glutamate, glutamic acid is a polar amino acid with a side-chain carboxylic acid group, which loses a proton at physiological pH to become a negatively charged carboxylate group. This side-chain carboxylic acid has a pKa of 4.3, so it is slightly less acidic than the terminal α-carboxyl group and than the side chain of aspartic acid. The higher pKa relative to aspartic acid is attributed to the inductive effect of the additional methylene group. In some proteins, a vitamin K-dependent carboxylase converts some glutamic acid residues into γ-carboxyglutamic acid, whose dicarboxylic side chain forms tight binding sites for calcium ions. Glutamic acid and α-ketoglutarate, an intermediate in the Krebs cycle, are interconvertible by transamination. Glutamic acid can therefore enter the Krebs cycle for energy metabolism, and it can be converted by the enzyme glutamine synthetase into glutamine, which is one of the key players in nitrogen metabolism.

Function Glutamic acid is highly involved in metabolism. Transamination of alpha-ketoglutarate, a citric acid cycle intermediate, with alanine or aspartate gives glutamate together with pyruvate or oxaloacetate, respectively. The pyruvate and oxaloacetate formed by transamination play critical roles in cellular metabolism.

Glutamic acid is a non-essential amino acid. It plays an important role in DNA synthesis and assists in wound and ulcer healing. Glutamic acid acts as an excitatory neurotransmitter and takes part in the metabolism of sugars and fats. It helps potassium move across the blood-brain barrier and is a source of fuel for the brain. Glutamic acid can take up an additional amino group to form glutamine, and this process detoxifies ammonia in the body.

Glutamic acid has been used in correcting personality disorders and treating childhood behavioral disorders. It has also been used in treating epilepsy, mental retardation, muscular dystrophy, ulcers, and hypoglycemic coma.

Other minor uses include serving as a flavor enhancer, a GABA precursor, a nutrient, and a fertilizer for plants.

Synthesis Glutamic acid can be biosynthesized by several routes. The most common is the conversion of glutamine to glutamic acid by the addition of water, catalyzed by the enzyme glutaminase; the side product is ammonia. Addition of water to N-acetylglutamic acid also produces glutamic acid, along with acetate. α-Ketoglutaric acid is another common precursor: addition of NADPH and ammonia, catalyzed by glutamate dehydrogenase, or transfer of an amino group from another α-amino acid, catalyzed by a transaminase, produces glutamic acid. Other routes start from 1-pyrroline-5-carboxylate (with NAD+ and water) and from N-formimino-L-glutamate (with tetrahydrofolate, FH4).

Glutamic acid is easily converted into proline. First, the γ carboxyl group is reduced to the aldehyde, yielding glutamate semialdehyde. The aldehyde then reacts with the α-amino group, eliminating water as it forms the Schiff base. In a second reduction step, the Schiff base is reduced, yielding proline.

## Glycine - Gly/ G

Structure Glycine's molecular formula and mass are C2H5NO2 and 75.07 g/mol. As the smallest of the 20 amino acids, glycine has only a hydrogen atom as its substituent. For this reason, it can fit into tight spaces in a protein where no other amino acid could, and glycine residues at such positions are evolutionarily conserved: if such a glycine were replaced or removed from the chain, the change would either alter the function of the protein or denature it entirely. Most proteins contain only small amounts of glycine; collagen is an exception, containing about 35% glycine. Glycine is also the only achiral amino acid, since its R group is simply an H atom. It does not favor helix formation.

Functions Glycine is a non-essential amino acid, meaning that humans can manufacture it in their bodies. It plays an important role in maintaining the central nervous and digestive systems. Glycine helps prevent the breakdown of muscle by increasing levels of creatine, a compound that helps build muscle mass. Glycine also keeps the skin firm and flexible. Without glycine, the skin can be damaged by UV rays, oxidation, and free radicals.

Glycine regulates blood sugar levels and helps provide glucose for the body.

Glycine serves as an inhibitory neurotransmitter in the central nervous system, especially in the spinal cord. When glycine binds to receptors, it activates chloride ion channels to open. As chloride ions enter the channels, the membrane becomes hyperpolarized, causing an inhibitory postsynaptic potential (IPSP).

Glycine has been used for treating schizophrenia, stroke, benign prostatic hyperplasia (BPH), and some rare inherited metabolic disorders. It is also used to protect the kidneys from the harmful side effects of certain drugs used after organ transplantation, as well as the liver from the harmful effects of alcohol. Other uses include cancer prevention and memory enhancement.

Some people apply glycine directly to the skin to treat leg ulcers and heal other wounds. The body uses glycine to make proteins. Glycine is also involved in the transmission of chemical signals in the brain, so there is interest in trying it for schizophrenia and improving memory. Some researchers think glycine may have a role in cancer prevention because it seems to interfere with the blood supply needed by certain tumors.

Biosynthesis Glycine is derived from serine, which is itself made from 3-phosphoglycerate. The conversion of serine requires a specific enzyme, serine hydroxymethyltransferase, and the cofactor pyridoxal phosphate. The process can be simplified as the following reaction: serine + tetrahydrofolate → glycine + N5,N10-methylene tetrahydrofolate + water.

A second reaction takes place in the liver, where the enzyme glycine synthase acts on N5,N10-methylene tetrahydrofolate. In this reaction, carbon dioxide, ammonium, NADH, and a proton combine with N5,N10-methylene tetrahydrofolate to give glycine, regenerating tetrahydrofolate.

Glycine is degraded by three pathways. The most common pathway is the reverse of the previous reaction: glycine is cleaved and its one-carbon unit is transferred to tetrahydrofolate. In another pathway, glycine is first converted to serine, which is then converted into pyruvate by serine dehydratase. The last pathway converts glycine to glyoxylate by D-amino acid oxidase; the glyoxylate is then oxidized to oxalate.

## Histidine - His/ H

Structure Histidine, C6H9N3O2, is also called 2-amino-3-(1H-imidazol-4-yl)propanoic acid. Its molecular mass is 155.15 g/mol. It is a basic, polar amino acid with an imidazole group, an aromatic ring that can carry a positive charge and is hydrophilic. The imidazole group has a pKa value of about 6, so it can be either uncharged or positively charged at neutral pH. This amino acid is often present in the active sites of enzymes, where the imidazole group acts as a buffer (proton acceptor or donor) for chemical reactions. Histidine is a precursor of histamine, a compound released by immune system cells during an allergic reaction.

Features At a physiological pH of around 7, the Henderson-Hasselbalch equation gives the ratio of deprotonated to protonated imidazole side chain (pKa = 6). The histidine side chain turns out to be approximately 10% protonated at neutral pH. That is not a negligible amount, and it gives the histidine residue a certain amount of buffering capacity. The basic nitrogen also allows the imidazole ring to act as a nucleophile.
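The roughly 10% figure can be reproduced with a short calculation. The sketch below assumes ideal behavior and a single ionizable group with the pKa of 6 quoted above; the function name `fraction_protonated` is chosen for this example only.

```python
def fraction_protonated(pH, pKa):
    """Fraction of an ionizable group present in its protonated (acid) form.

    Uses the Henderson-Hasselbalch relation: pH = pKa + log10([base]/[acid]).
    """
    ratio_base_to_acid = 10 ** (pH - pKa)
    return 1.0 / (1.0 + ratio_base_to_acid)

# Histidine imidazole side chain, pKa = 6 (value taken from the text)
print(round(fraction_protonated(7.0, 6.0), 3))  # 0.091
print(fraction_protonated(6.0, 6.0))            # 0.5
```

At pH 7 the ratio of deprotonated to protonated side chain is 10:1, so about 1 in 11 side chains (roughly 9%) carries the proton. At pH equal to the pKa, half of them do, which is where the buffering capacity is greatest.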

Functions Histidine is found in high concentrations in hemoglobin. As a result, it aids in treatment of anemia and maintaining optimal blood pH. Also, histidine is the precursor of histamine, which is involved in local immune responses.

Histidine is an essential amino acid, which means that the body cannot manufacture it. Histidine plays important roles in stimulating the inflammatory response of skin and mucous membranes. It also stimulates the secretion of the digestive hormone gastrin and acts as the source and control for histamine levels. Histidine is required for growth and for the repair of tissues, as well as for the maintenance of the myelin sheaths that protect nerve cells. Histidine is also required to manufacture both red and white blood cells. It helps protect the body from damage caused by radiation and helps remove heavy metals from the body.

Histidine also acts in the stomach, where it helps produce gastric juices, so people with a shortage of gastric juices or suffering from indigestion may benefit from this nutrient. It is thought that histidine may be beneficial to people suffering from arthritis and nerve deafness, but this is not conclusively proven. Histidine is also involved in sexual arousal, functioning, and enjoyment.

Histidinemia is an inborn error of histidine metabolism caused by a deficiency of the enzyme histidase; high levels of histidine are found in the blood and urine, and the condition may manifest in speech disorders and mental retardation. There are no reported side effects of histidine, but excessively high levels may lead to stress and mental disorders such as anxiety, and people with schizophrenia have been found to have high levels of histidine. People suffering from schizophrenia or bipolar (manic) depression should not take a histidine supplement without the approval of their medical professional.

Metabolism Histidine can be converted into histamine by histidine decarboxylase. In this reaction, the carboxyl group is lost from histidine as carbon dioxide.

Food sources Foods that contain histidine include dairy, meat, poultry, fish, rice, wheat, and rye.

## Isoleucine - Ile/ I

Structure Isoleucine, HOOCCH(NH2)CH(CH3)CH2CH3, is also known as 2-amino-3-methylpentanoic acid and has a molar mass of 131.17 g/mol. Isoleucine is a nonpolar, aliphatic (hydrophobic) amino acid that has two chiral centers: the α-carbon atom and the adjacent carbon of the R group. Because it contains two stereocenters, isoleucine can exist as diastereomers. The 2 chiral centers allow 4 possible stereoisomers, but living things are selective, and only one version persists in living organisms: the 2S, 3S version. The hydrophobic side chain helps stabilize water-soluble proteins through the hydrophobic effect.
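The count of four stereoisomers follows from two stereocenters with two possible configurations each (2 × 2). As a small illustration, the sketch below simply enumerates the R/S labels for carbons 2 and 3; the helper name `configurations` is made up for this example, and the code does no real stereochemical analysis.

```python
from itertools import product

def configurations(centers):
    """All R/S assignments for the given stereocenter positions."""
    return [
        ",".join(f"{position}{label}" for position, label in zip(centers, labels))
        for labels in product("RS", repeat=len(centers))
    ]

isomers = configurations([2, 3])
print(isomers)       # ['2R,3R', '2R,3S', '2S,3R', '2S,3S']
print(len(isomers))  # 4
```

Of these four labels, only 2S,3S corresponds to the isoleucine found in living organisms.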

Features Isoleucine cannot be distinguished from leucine by molecular mass in mass spectrometry (MS), because the two have the same molecular weight. Instead, these two residues usually have to be isolated and characterized by HPLC or TLC against known standards.
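The identical masses follow from the fact that leucine and isoleucine are isomers sharing the molecular formula C6H13NO2. The sketch below sums rounded standard atomic weights for that formula; the parser is a deliberately minimal assumption (no parentheses, no charges), and `ATOMIC_MASS` and `molar_mass` are names invented for this illustration.

```python
import re

# Rounded standard atomic weights in g/mol
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

def molar_mass(formula):
    """Molar mass of a simple formula such as 'C6H13NO2' (no parentheses)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom of that element.
        mass += ATOMIC_MASS[element] * (int(count) if count else 1)
    return mass

leucine = molar_mass("C6H13NO2")
isoleucine = molar_mass("C6H13NO2")  # same formula, different connectivity

print(round(leucine, 1))      # 131.2
print(leucine == isoleucine)  # True
```

Because the formula is the same, any measurement that reports only the mass gives the same answer for both residues, which is why a separation technique with reference standards is needed.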

Isoleucine is also degraded into succinyl-CoA and acetyl-CoA, which are consumed by the TCA cycle.

Functions Isoleucine is an essential amino acid, meaning the human body cannot manufacture it. It is needed for the formation of Hemoglobin and to regulate blood sugar and energy levels. Isoleucine serves important roles in muscle strength and endurance and is a source of energy for muscle tissues.

Isoleucine promotes muscle recovery after an intense workout. It is also involved in the formation of blood clots.

A deficiency of isoleucine may result in headaches, dizziness, fatigue, depression, confusion, and irritability. These symptoms may mimic those of hypoglycemia. This nutrient has also been found to be deficient in people with mental and physical disorders, but more research is required on this. Consuming higher amounts of isoleucine is not associated with any health risks for most people, but those with kidney or liver disease should not consume high intakes of amino acids without medical advice. People who take in higher amounts of isoleucine report elevated urination. People involved in strenuous athletic activity under extreme pressure and at high altitude may benefit from supplementation of this nutrient.

Food sources Foods containing isoleucine include almonds, cashews, chicken, eggs, fish, lentils, liver, and meat.

Biosynthesis Synthesis of isoleucine involves multiple steps. Isoleucine can be derived from pyruvate and alpha-ketobutyrate. The catalytic enzymes required are the following:

1. Acetolactate synthase
2. Acetohydroxy acid isomeroreductase
3. Dihydroxyacid dehydratase
4. Valine aminotransferase

Industrially, isoleucine can be synthesized from 2-bromobutane and diethylmalonate.

## Leucine - Leu/ L

Structure Leucine's molecular formula and mass are C6H13NO2 and 131.17 g/mol, respectively. Leucine, also known as 2-amino-4-methylpentanoic acid, has an aliphatic R group. It is one of the three amino acids with branched hydrocarbon side chains (generally buried in folded proteins), which make it a nonpolar, hydrophobic amino acid. The hydrophobic effect accounts for its stabilization of water-soluble proteins.

Features Leucine cannot be distinguished by MS from isoleucine for the simple fact that they have the same molecular weight. Instead, these two residues would usually have to be isolated and characterized by HPLC or TLC.

Functions Leucine shares the functions of isoleucine, owing to their similar branched hydrocarbon side chains. Leucine facilitates skin healing and bone healing by modulating the release of enkephalins, which are natural pain reducers. It is also a precursor of cholesterol and promotes the build-up of muscle tissue by slowing its degradation. Leucine is an essential amino acid. It is essential in promoting growth in infants and regulating nitrogen concentration in adults. Leucine is also commonly used as a flavor enhancer.

Deficiency and Excess Deficiency of this amino acid can result in hypoinsulinemia, depression, chronic fatigue syndrome, kwashiorkor (or starvation), etc. An excess of leucine leads to ketosis.

Biosynthesis As an essential amino acid, leucine cannot be synthesized in the human body and must be obtained from outside sources. In organisms that do make it, such as plants and microorganisms, the pathway starts from pyruvic acid and proceeds through ketoisovalerate (the precursor it shares with valine), isopropylmalate, and ketoisocaproate. The enzymes required are acetolactate synthase, acetohydroxy acid isomeroreductase, dihydroxyacid dehydratase, isopropylmalate synthase and isomerase, and leucine aminotransferase.

## Lysine - Lys/ K

Structure Lysine is an essential amino acid. This means that it is necessary for human health but the body cannot produce it, so it must be obtained from food or supplements. Like other amino acids, lysine is a building block of protein. Lysine has a side chain that ends in a positively charged amino group. This ε-amino group has a high pKa value of about 10.8, which makes it more basic than the terminal α-amino group. The basic amino group is highly reactive and participates in reactions at the active centers of enzymes. Although the terminal ε-amino group is charged under physiological conditions, the hydrocarbon part of the side chain, with its four methylene groups, is still hydrophobic.

Features Lysine is a naturally occurring essential amino acid in human body. It promotes optimal growth of infants and nitrogen equilibrium in adults.

Functions Lysine can be a treatment of Herpes Simplex and virus-associated Chronic Fatigue Syndrome as it inhibits viral growth. It facilitates the formation of collagen, which is the main component of fascia, bone, ligament, tendons, cartilage and skin. It also helps in absorption of calcium, which is critical in bone growth of infants.

Lysine is important for proper growth, and it plays an essential role in the production of carnitine, a nutrient responsible for converting fatty acids into energy and helping to lower cholesterol. Lysine helps the body absorb calcium, and it plays an important role in the formation of collagen, a substance important for bones and connective tissues including skin, tendon, and cartilage.

Herpes Simplex Virus (HSV) Consuming lysine on a regular basis may help prevent outbreaks of cold sores and genital herpes. Lysine has antiviral effects because it blocks the activity of arginine, which promotes HSV replication. However, studies found that taking lysine at the beginning of a herpes outbreak did not reduce symptoms.

Osteoporosis Lysine helps the body absorb calcium and thus decreases the amount of calcium that is lost in urine. Calcium is essential for strong bones, so some researchers have suggested that lysine may help prevent the bone loss associated with osteoporosis. Studies show that lysine together with L-arginine makes bone-building cells more active and enhances production of collagen. However, no studies have examined whether lysine helps prevent osteoporosis in humans.

Deficiency and Excess Deficiency of lysine is seen in herpes, chronic fatigue syndrome, AIDS, anemia, hair loss, weight loss, etc. Excessive lysine can result in a high concentration of ammonia in the blood. Most people get enough lysine in their diet, although athletes, vegans who do not eat beans, and burn patients may need more. Too little lysine can cause fatigue, nausea, dizziness, loss of appetite, agitation, bloodshot eyes, slow growth, anemia, and reproductive disorders. For vegans, legumes such as beans, peas, and lentils are the best sources of lysine.

Food Sources Foods rich in lysine include meat, cheese, fish, nuts, eggs, soybeans, spirulina, and fenugreek seed. Brewer's yeast, beans and other legumes, and dairy products also contain lysine.

## Methionine - Met/ M

Structure Methionine is one of the two amino acids whose side chains contain sulfur. Its side chain is largely aliphatic and includes a thioether (-S-) group. Unlike the thiol sulfur of cysteine, the thioether sulfur of methionine does not form disulfide bonds. The strong tendency of the methionine sulfur to be oxidized is one of the causes of smoking-induced emphysema in human lung tissue.

Features Methionine is a naturally occurring essential amino acid, which plays a critical role in supplying free methyl groups and sulfur in metabolism. It is also one of only two amino acids coded for by a single codon; the other is tryptophan.
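The single-codon claim can be checked by counting codons per amino acid in the standard genetic code. The sketch below builds the 64-codon table from a compact string in the conventional U, C, A, G ordering, with one-letter amino acid codes and `*` for stop codons; the variable names are arbitrary choices for this illustration, and it covers only the standard code, not the variant codes used by some organelles and organisms.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for all 64 codons, ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons specify each amino acid, then keep those with one.
codons_per_amino_acid = Counter(CODON_TABLE.values())
single_codon = sorted(
    aa for aa, n in codons_per_amino_acid.items() if n == 1 and aa != "*"
)

print(CODON_TABLE["AUG"])  # M
print(single_codon)        # ['M', 'W']
```

Only methionine (M, codon AUG) and tryptophan (W, codon UGG) appear once; every other amino acid is specified by two to six codons.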

Functions Methionine helps the breakdown of fat and reduces blood cholesterol levels. It is an antioxidant that neutralizes free radicals and removes waste in the liver. Synthesis of DNA and RNA requires the presence of methionine. It is also a precursor of several critical amino acids, hormones, and neurotransmitters in the human body. Its AUG codon also serves as the "start" signal for ribosomal translation of messenger RNA (mRNA); this means that every peptide chain begins with a methionine residue at its N-terminus. This residue may, however, be removed later by cleavage.

Deficiency and Excess Methionine deficiency can be seen after chemical exposure and in vegetarians. Excessive methionine can result in severe liver disease.

## Phenylalanine - Phe/ F

Structure The amino acid phenylalanine is a derivative of alanine in which a phenyl group takes the place of one of the hydrogens on the CH3 group. Phenylalanine is more hydrophobic than the other aromatic amino acids, tyrosine and tryptophan, which are made less hydrophobic by their hydroxyl and indole substituents. Phenylalanine is often found buried inside proteins because of its hydrophobicity. Neighboring phenyl rings (on adjacent amino acids) can stabilize each other by pi stacking.

Features Individual amino acids as well as peptides are occasionally analyzed by UV light. Phenylalanine, along with the few other aromatic amino acids, fluoresces when UV light is applied. UV light can be a useful technique for verifying the presence of Tyr, Phe, and Trp. It can also quantify those amino acids if a sensitive enough assay is developed.

Functions Phenylalanine is a precursor of the amino acid tyrosine, which gives rise to neurotransmitters, such as dopamine, norepinephrine and epinephrine. It can be used to manage certain types of depression as a powerful anti-depressant and can also enhance memory, thought, and mood. This amino acid also plays a role in decreasing blood pressure in hypertension. The D form of phenylalanine can be used to reduce pain in arthritis which is a rare instance among amino acids. Phenylalanine is a naturally occurring amino acid that promotes growth in infants and regulates nitrogen concentration in adults.

Deficiency and Excess Phenylalanine deficiency can be seen in depression, AIDS, obesity, Parkinson's disease, etc. Some people have an autosomal recessive genetic disorder called phenylketonuria, or PKU. This disorder is due to the lack of an enzyme that breaks down phenylalanine, which leads to a large accumulation of this amino acid; in large quantities, phenylalanine is toxic, particularly to the brain, so the disorder can lead to mental retardation. As a result, babies are blood-tested early for signs of PKU, and those who have it must follow a strict diet that reduces the amount of natural phenylalanine in their food.

## Proline - Pro/ P

Structure Proline is one of the twenty DNA-encoded amino acids. It is unique among the 20 protein-forming amino acids because its α-amino group is secondary rather than primary, as it is in the other amino acids. The distinctive cyclic structure of the proline side chain locks its φ backbone dihedral angle at approximately -75°, giving proline exceptional conformational rigidity compared to other amino acids. Hence, proline loses less conformational entropy upon folding, which may account for its higher prevalence in proteins. Strictly speaking, proline can also be referred to as an imino acid. Because its ring makes it more conformationally restricted than the other amino acids, it greatly influences protein architecture.

Functions Proline behaves as a structural disruptor in the middle of regular secondary structure elements. However, proline is commonly found as the first residue of an alpha helix and in the edge strands of beta sheets. Proline is most commonly found in turns, which may account for the curious fact that proline is usually solvent-exposed although it has a completely aliphatic side chain. Because proline lacks a hydrogen on its amide nitrogen, it cannot act as a hydrogen bond donor, only as a hydrogen bond acceptor. Proline is important in healing, cartilage building, and in flexible joints and muscle support. It also helps reduce the sagging, wrinkling, and aging of skin resulting from exposure to the sun. Proline also aids in breaking down proteins and helps create healthy cells. It is essential to skin health, to the creation of healthy connective tissue, and to the maintenance of muscular tissue.

Deficiency and Excess Proline deficiency is generally seen in people who perform prolonged exercise. Vitamin C deficiency will also cause proline to be lost in the urine because of collagen breakdown. In general, the body of a person with proline deficiency tends to metabolize muscle cells first, instead of carbohydrates, if glucose levels are low. Proline is needed to maintain proper collagen formation and to stabilize muscular tissue. A lack of proline can lead to symptoms such as fatigue, weight loss, dehydration, dizziness, and nausea.

## Serine - Ser/ S

Structure This amino acid's R group is a hydroxyl group attached to a CH2 group. The hydroxyl group is polar, giving serine polar/hydrophilic properties. It has an isoelectric point (pI) of 5.68 and pKa values of 2.21 and 9.15.

Features Serine is a non-essential amino acid which means it can be synthesized by the human body. For instance, serine can be synthesized from glycine. Serine is a precursor of glycine and cysteine.

Biosynthesis The biosynthesis of serine begins with the oxidation of 3-phosphoglycerate (an intermediate in glycolysis) to 3-phosphohydroxypyruvate, which is then transaminated to 3-phosphoserine. This last intermediate is then hydrolyzed to serine.

Function Serine is found in phospholipids and in the active sites of trypsin and chymotrypsin. It is used in the synthesis of pyrimidines, proteins, cysteine and tryptophan. It is also involved in fat and fatty acid formation and in muscle synthesis. Serine can be deaminated by the enzyme serine dehydratase, yielding pyruvate and ammonium. The deamination of threonine follows a similar process.

## Threonine - Thr/ T

Structure Threonine is a polar, uncharged amino acid. Its side chain contains a secondary alcohol and a methyl group; hence it can be characterized as a hydrophilic amino acid. Threonine incorporates two chiral centers, just like isoleucine. If it weren't for the selectivity of living things for one particular stereoisomer, there would be 4 possible stereoisomers because of the 2 chiral centers. However, only one version persists in living organisms: the 2S, 3R version.
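The count of four possible stereoisomers follows from two independent chiral centers, each of which can be R or S. As a minimal sketch (the function name `stereoisomers` and the label format are illustrative choices, and the enumeration assumes the centers are independent with no meso forms, which holds for threonine), the combinations can be listed as:

```python
from itertools import product

def stereoisomers(n_centers, first_position=2):
    """List R/S labels for n independent chiral centers.

    Assumes no meso forms, so the count is simply 2**n_centers.
    Positions are numbered from `first_position` (carbon 2 for an
    alpha-amino acid such as threonine).
    """
    positions = range(first_position, first_position + n_centers)
    return [
        "(" + ",".join(f"{pos}{label}" for pos, label in zip(positions, combo)) + ")"
        for combo in product("RS", repeat=n_centers)
    ]

threonine = stereoisomers(2)
print(len(threonine), threonine)
```

Of the labels produced, only `(2S,3R)` corresponds to the form found in living organisms, as stated above.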

Features Threonine is an essential amino acid, which means it cannot be synthesized by the human body. Humans must ingest it in the form of threonine-containing foods.

Functions Threonine aids the formation of elastin and collagen. In the immune system, threonine aids in the formation of antibodies. It also promotes the growth and function of the thymus gland and the absorption of nutrients. In addition, threonine is the precursor to isoleucine. Threonine can be deaminated by the enzyme threonine dehydratase, yielding α-ketobutyrate and ammonium. The deamination of serine follows a similar process.

## Tryptophan - Trp/ W

Structure Tryptophan is an aromatic amino acid whose side chain is an indole group bonded to a methylene group. The indole consists of two fused aromatic rings, one of which contains an N-H group. One of the rings is 5-membered while the other is 6-membered, and 2 carbons are shared by both rings. The side chain is largely hydrophobic, although the indole N-H can donate a hydrogen bond.

Features Individual amino acids as well as peptides are occasionally analyzed by UV light. Tryptophan, along with the few other aromatic amino acids, fluoresces when UV light is applied. UV analysis can be a useful technique for verifying the presence of Tyr, Phe, and Trp. It can also quantify those amino acids if a sensitive enough assay is developed.

Functions Tryptophan is the precursor for various proteins, serotonin and niacin. It also promotes the formation of peptides and proteins. It is an essential amino acid, meaning it cannot be produced by the human body. It is usually present in peptides, enzymes, and structural proteins.

Deficiency and Excess Excess tryptophan has been linked with eosinophilia-myalgia syndrome (EMS). A deficiency of tryptophan can contribute to pellagra, a disease of niacin deficiency, because niacin is synthesized from tryptophan. With vitamin supplements, however, this disease is no longer as prominent. Symptoms of the disease include dementia and schizophrenia. Hartnup disease is an autosomal recessive genetic disease in which a person cannot effectively absorb this amino acid in the digestive tract. The disease produces symptoms similar to those of pellagra, although slightly less severe. Patients suffering from the disease generally show red rashes that are further aggravated by UV light from the sun. Intellectual disability can also occur if the disease is not treated correctly with vitamin supplementation.

## Tyrosine - Tyr/ Y

Structure Tyrosine is a nonpolar aromatic amino acid that contains a hydroxyl group attached to an aromatic ring. The hydroxyl group is particularly important because tyrosine residues in proteins can be phosphorylated at this position. Tyrosine is a non-essential amino acid, meaning it can be synthesized in the body. It is synthesized from phenylalanine.

Features Individual amino acids as well as peptides are occasionally analyzed by UV light. Tyrosine, along with the few other aromatic amino acids, fluoresces when UV light is applied. UV light can be a useful technique for verifying the presence of Tyr, Phe, and Trp. It can also quantify those amino acids if a sensitive enough assay is developed.

Functions Tyrosine plays crucial roles in the human body. It helps the body deal with stress by acting as an adaptogen that minimizes the effects of the stress syndrome, and it is used in drug detoxification, such as for cocaine, caffeine and nicotine addictions, where it reduces withdrawal symptoms and abuse. It assists in treating vitiligo (a loss of skin pigmentation) and phenylketonuria, the condition in which phenylalanine is not metabolized. In addition, it is effective for depression treatment.

Tyrosine is also important in the production of epinephrine, norepinephrine, dopamine, melanin, and enkephalins, which have pain-relieving effects in the body. It also affects the function of hormones by regulating the thyroid, pituitary and adrenal glands. For example, the thyroid hormone thyroxine is synthesized from tyrosine. Tyrosine is known to dislodge molecules that may be harmful to cells, and therefore it has protective qualities.

Deficiency and Excess Deficiency of tyrosine can result in low blood pressure, depression, and low body temperature. Tyrosine is a major amino acid responsible for skin, hair, and eye pigments. A loss of tyrosine in the body may lead to failure to form melanin pigments, resulting in partial or full albinism. Because tyrosine is produced mainly from phenylalanine, a failure of this conversion decreases the amount of tyrosine and increases the amount of phenylalanine in the organism's body.

## Valine - Val/ V

Structure Valine is an amino acid with an aliphatic, isopropyl side chain and is therefore a hydrophobic amino acid. Valine differs from threonine in that the OH group of threonine is replaced by a CH3 group. This is a nonpolar amino acid. It is an essential amino acid; therefore it cannot be produced by the human body. Being hydrophobic, this amino acid is often found in the interior of proteins.

Features In animals, valine must be ingested. In plants, it is synthesized in several steps starting from pyruvic acid; the initial part of the pathway also leads to leucine, and the intermediate α-ketoisovalerate then undergoes reductive amination with glutamate. Valine is found in the following foods: soy flour, fish, cheese, meat and vegetables.

Functions Valine is essential in muscle growth and development, muscle metabolism, and maintenance of nitrogen balance in the human body. It can be used as an energy source in place of glucose. It can also be used as a treatment for brain damage caused by alcohol.

Deficiency and Excess Deficiency of valine affects the myelin sheaths of nerves. Maple syrup urine disease is caused by the inability to metabolize leucine, valine and isoleucine.

## Ionization of amino acids

The 20 standard amino acids have two acid-base groups: the alpha-amino and the alpha-carboxyl groups attached to the Cα atom. Those amino acids with an ionizable side-chain (Asp, Glu, Arg, Lys, His, Cys, Tyr) have an additional acid-base group. At low pH (i.e. high hydrogen ion concentration) both the amino group and the carboxyl group are fully protonated, so that the amino acid (glycine, in this example) is in the cationic form H3N+CH2COOH. As the amino acid in solution is titrated with increasing amounts of a strong base (e.g. NaOH), it loses two protons, first from the carboxyl group, which has the lower pK value (pK = 2.3), and then from the amino group, which has the higher pK value (pK = 9.6). The pH at which Gly has no net charge is termed its isoelectric point, pI. The α-carboxyl groups of all the 20 standard amino acids have pK values in the range 1.8-2.9, while their α-amino groups have pK values in the range 8.8-10.8. The side-chains of the acidic amino acids Asp and Glu have pK values of 3.9 and 4.1, respectively, whereas those of the basic amino acids Arg and Lys have pK values of 12.5 and 10.8, respectively. Only the side-chain of His, with a pK value of 6.0, is ionized within the physiological pH range (6-8). It should be borne in mind that when the amino acids are linked together in proteins, only the side-chain groups and the terminal α-amino and α-carboxyl groups are free to ionize.
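The titration behaviour described above can be illustrated numerically with the Henderson-Hasselbalch relation. The following is a minimal sketch, not a full titration model: it treats glycine as having two independent ionizable groups, uses the pK of 9.6 quoted in the text for the amino group, and assumes a typical textbook value of 2.3 for the carboxyl group. The function names are illustrative.

```python
def fraction_protonated(pH, pK):
    """Henderson-Hasselbalch: fraction of a group in its protonated form."""
    return 1.0 / (1.0 + 10 ** (pH - pK))

PK_CARBOXYL = 2.3  # assumed typical value for glycine's alpha-carboxyl group
PK_AMINO = 9.6     # value quoted in the text for the alpha-amino group

def glycine_net_charge(pH):
    """Average net charge of glycine at a given pH."""
    # The protonated amino group (NH3+) contributes +1.
    positive = fraction_protonated(pH, PK_AMINO)
    # The deprotonated carboxyl group (COO-) contributes -1.
    negative = 1.0 - fraction_protonated(pH, PK_CARBOXYL)
    return positive - negative

def isoelectric_point(pk_low, pk_high):
    """pI of an amino acid with no ionizable side chain: mean of the two pK values."""
    return (pk_low + pk_high) / 2.0

pI = isoelectric_point(PK_CARBOXYL, PK_AMINO)
for pH in (1.0, pI, 12.0):
    print(f"pH {pH:5.2f}: net charge {glycine_net_charge(pH):+.3f}")
```

The net charge is close to +1 at low pH, passes through zero at the pI, and approaches -1 at high pH. Applying the same average to the serine pKa values quoted earlier (2.21 and 9.15) reproduces the stated value of 5.68. The simple average only applies to amino acids without an ionizable side chain.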

## Pyridoxal 5′-Phosphate-Mediated Decarboxylation of an α-Amino Acid

Step 1: The amino acid reacts with enzyme-bound pyridoxal 5′-phosphate (PLP). An imine linkage (C=N) between the amino acid and PLP forms, and the enzyme is displaced.

Step 2: When the pyridine ring is protonated on nitrogen, it becomes a stronger electron-withdrawing group, and decarboxylation is facilitated by charge neutralization.

Step 3: Proton transfer to the α carbon and abstraction of a proton from the pyridine nitrogen brings about rearomatization of the pyridine ring.

Step 4: Reaction of the PLP-bound imine with the enzyme liberates the amine and restores the enzyme-bound coenzyme.

## References

Berg, Jeremy; Tymoczko, J.; Stryer, L. (2012). Protein Composition and Structure. Biochemistry (7th Edition). W.H. Freeman and Company. ISBN 1-4292-2936-5

Berg, Jeremy M., ed. (2002), Biochemistry (6th ed.) New York City, NY: W.H. Freeman and Company.

Hames, David; Hooper, Nigel. Biochemistry, 3rd edition. Taylor and Francis Group. New York. 2005.

Wise, David R.; Thompson, Craig B. “Glutamine addiction: a new therapeutic target in cancer.” Trends in Biochemical Sciences 35 (2010) 427-433. Retrieved 2010-10-16.

"Chemistry of Health", US Department of Health and Human Services, NIH Publications, Reprinted 2006

## Purpose

The total chemical synthesis of a D-enzyme was carried out by R. C. deL. Milton, S. C. F. Milton, and S. B. H. Kent, who found that enzyme enantiomers exhibit reciprocal chiral specificity on peptide substrates. The L-configuration of amino acids predominates in living organisms, while the D-configuration remains biologically inactive; Milton et al. examined the ability of enzymes to distinguish and react with one specific enantiomer over the other.

## Methods

The following properties of D-HIV PR and L-HIV PR were analyzed: covalent structure, physical properties, circular dichroism spectra, and enzymatic activity. After total synthesis, the newly synthesized L- and D-sequences of HIV PR, which were initially protected, were deprotected to allow folding into their secondary and tertiary structures. Reversed-phase high-performance liquid chromatography gave identical retention times for the two polypeptide chains, and mass spectrometry showed that both had the same molecular weight. Together, these results indicated that D-HIV PR and L-HIV PR have the same covalent structure.

Despite sharing the same covalent structure, D-HIV PR and L-HIV PR differ in their chiral features: circular dichroism spectra showed the expected equal but opposite optical activity of the enantiomers. A fluorogenic assay with a hexapeptide analog of a GAG cleavage site as the substrate was used to test the reactivity of the enantiomers. Both enzymes were equally active, yet exhibited reciprocal chiral specificity: the L-enzyme degraded only the L-substrate and the D-enzyme degraded only the D-substrate. In addition, the reactivity of D-HIV PR and L-HIV PR was tested with enantiomers of an inhibitor called MVT101. As expected, the chirality of the enzyme determined the effectiveness of the inhibitor; L-MVT101 inhibited L-HIV PR but not D-HIV PR, and D-MVT101 inhibited D-HIV PR but not L-HIV PR.

The folding of the polypeptide chain into its three-dimensional structure is important for the specificity and catalytic activity of HIV-1 protease. D-HIV PR and L-HIV PR are mirror images of each other in their secondary, supersecondary, tertiary, and quaternary structures. In the primary structure, amino acids of only one chirality were used in the synthesis of each polypeptide chain; this single difference in the chirality of the polypeptide backbone resulted in mirror-image secondary, supersecondary, tertiary, and quaternary structures.

## Conclusion

The results of this experiment show that both enantiomers of the enzyme are reactive and should be reactive in vivo; yet, as a result of evolution, L-proteins are prevalent in living organisms while D-proteins are biologically inactive.

## References

deL. Milton, R. C., S. C. F. Milton, and S. B. H. Kent. "Total Chemical Synthesis of a D-Enzyme: The Enantiomers of HIV-1 Protease Show Reciprocal Chiral Substrate Specificity." Science 256 (1992): 1445-1448. Print.

Nitrogen fixation is a process in which N₂ is reduced to NH₃, either biologically or abiotically. The nitrogen in amino acids, pyrimidines, purines and other molecules all comes from the N₂ in our atmosphere. The term can also be applied to the conversion of nitrogen into forms other than ammonia, such as nitrogen dioxide. The triple bond in N₂ is very strong; it has a bond energy of 940 kJ/mol. The formation of ammonia from hydrogen and nitrogen is thermodynamically favorable, yet the reaction is kinetically very difficult because the intermediates can be unstable. It has been estimated that approximately 60 percent of the newly fixed nitrogen on Earth is produced by diazotrophic microorganisms, while lightning and ultraviolet radiation contribute another 15 percent and the remaining 25 percent comes from industrial processes.

## Nitrogen Fixation

The main avenue for entry of nitrogen into the biosphere is nitrogen fixation, in which dinitrogen (nitrogen gas) is reduced to ammonia. Because the triple bond of nitrogen gas is very stable, breaking it to generate ammonia requires a series of reduction steps and a high input of energy. Biologically, the conversion of nitrogen into ammonia is carried out by bacteria and archaea. The organisms responsible for nitrogen fixation are called diazotrophic microorganisms. For example, symbiotic Rhizobium bacteria invade the roots of leguminous plants and form root nodules, where they fix nitrogen. Other examples include Cyanobacteria, Azotobacteraceae, and Frankia.

Industrial processes of nitrogen fixation include dinitrogen complexes and ambient nitrogen reduction; the most common is the Haber process, invented in 1910. The Haber process uses high pressure, high temperature, and an iron or ruthenium catalyst to produce ammonia.

Biological nitrogen fixation is carried out by an enzyme complex called nitrogenase, which has multiple redox centers. The nitrogenase complex contains two proteins: a reductase, which provides electrons, and the nitrogenase proper, which uses these electrons to reduce nitrogen to ammonia. The transfer of electrons from the reductase to the nitrogenase is coupled to the hydrolysis of ATP by the reductase. The reaction for this process is N2 + 8 e- + 8 H+ → 2 NH3 + H2. It is an 8-electron process rather than a 6-electron process because one mole of hydrogen (H2) is generated along with every two moles of ammonia. The microorganisms that carry out nitrogen fixation often obtain the 8 electrons from reduced ferredoxin, which can be generated by photosynthesis or oxidative processes. Two ATP molecules are hydrolyzed for each electron transferred, for a total of 16 ATP per N2 reduced. ATP hydrolysis is not needed to make the reduction thermodynamically favorable, since it already is; rather, it makes the reaction kinetically possible.

Nitrogen-fixing bacteria generally separate anaerobic nitrogen fixation from aerobic metabolism by one of several mechanisms. In the ocean and in freshwater systems, cyanobacteria are the major nitrogen fixers. Within an ecosystem, nitrogen fixers ultimately make reduced nitrogen available for assimilation by nonfixing microbes and plants. However, nitrogen fixation is extremely energy intensive; thus the rate of fixation usually fails to meet the potential demand of other members of the ecosystem.

## Reference

Berg, Tymoczko, Stryer, Biochemistry Sixth Edition

Slonczewski, Joan L. Microbiology: An Evolving Science. Second Edition.

## Introduction

When there are unneeded amino acids from either protein digestion or turnover, they are broken down into certain compounds. This process usually occurs in the liver.

In amino acid degradation, the amino group is removed and the remaining carbon skeleton, an α-ketoacid, is modified so that the carbon chain can enter metabolism and eventually become glucose or an intermediate of the citric acid cycle.

The amino group is transferred to α-ketoglutarate, which forms glutamate. The glutamate is then oxidatively deaminated to form the ammonium ion, NH4+.

Aminotransferases catalyze the transfer of an α-amino group from an α-amino acid to an α-ketoacid. These enzymes funnel α-amino groups from a variety of amino acids to α-ketoglutarate for conversion to NH4+.

Aspartate aminotransferase catalyzes the transfer of the amino group of aspartate to α-ketoglutarate.

Alanine aminotransferase catalyzes the transfer of the amino group of alanine to α-ketoglutarate.

The nitrogen transferred to α-ketoglutarate in the transamination reaction, now carried by glutamate, is converted into an ammonium ion by oxidative deamination. This reaction is catalyzed by glutamate dehydrogenase. This enzyme is special in that it is able to utilize either NAD+ or NADP+. The reaction proceeds by dehydrogenation of the C-N bond, followed by hydrolysis of the resulting Schiff base to yield α-ketoglutarate.

The equilibrium for this reaction favors glutamate, but the reaction can be pushed forward by the consumption of ammonia. Glutamate dehydrogenase is found in the mitochondria. This compartmentalization sequesters free ammonia, which is toxic. In vertebrates, the activity of glutamate dehydrogenase is allosterically regulated.

NH4+ is converted into urea, which is then excreted as waste.


## Overview

To synthesize amino acids, there must be a source of nitrogen in a form that can be easily used. Various microorganisms provide this source by reducing inert nitrogen gas into two molecules of ammonia. The carbon backbone, on the other hand, can be provided in three different ways: by the citric acid cycle, the glycolytic pathway, and the pentose phosphate pathway.

Since amino acids are all chiral except for glycine, the biosynthesis of amino acids must generate the correct isomers efficiently. This is achieved by transamination reactions and by tight regulation of the biosynthetic pathways, through feedback and other mechanisms.

## Nitrogen Fixation

To reduce atmospheric nitrogen gas (N2) to ammonia (NH3), a process called nitrogen fixation, microorganisms require ATP. Nitrogen fixation is performed by the nitrogenase complex, an enzyme that has multiple redox centers. This complex is composed of a reductase and a nitrogenase. The reductase provides electrons, while the nitrogenase uses these electrons to reduce atmospheric nitrogen to ammonia in the following reaction:

N2 + 8 e- + 8 H+ <--> 2 NH3 + H2

Most microorganisms that are capable of nitrogen fixation carry out this reaction by generating reduced ferredoxin through photosynthesis, which provides the electrons. Two molecules of ATP are hydrolyzed for each electron transferred, meaning that 2 x 8 = 16 molecules of ATP are needed to generate the two molecules of ammonia. The total reaction can then be written as:

N2 + 8 e- + 8 H+ + 16 ATP + 16 H2O <--> 2 NH3 + H2 + 16 ADP + 16 Pi

Then, through the amino acids glutamine and glutamate, the ammonium ion (NH4+) is assimilated.
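The electron and ATP bookkeeping for the nitrogenase reaction can be checked with a short calculation. This is a minimal sketch using only the stoichiometry stated in the text (8 electrons per N2, of which 2 end up in H2, and 2 ATP per electron); the constant and function names are illustrative.

```python
ELECTRONS_FOR_AMMONIA = 6  # N2 + 6 e- + 6 H+ -> 2 NH3
ELECTRONS_FOR_H2 = 2       # 2 H+ + 2 e- -> H2, formed alongside the ammonia
ATP_PER_ELECTRON = 2       # ATP hydrolyzed per electron transferred (from the text)

def nitrogenase_stoichiometry(n2_molecules=1):
    """Return the reactant and product counts for reducing n2_molecules of N2."""
    electrons = n2_molecules * (ELECTRONS_FOR_AMMONIA + ELECTRONS_FOR_H2)
    return {
        "electrons": electrons,
        "protons": electrons,  # one H+ is consumed per electron
        "ATP": electrons * ATP_PER_ELECTRON,
        "NH3": 2 * n2_molecules,
        "H2": n2_molecules,
    }

def is_balanced(s, n2_molecules=1):
    """Check nitrogen atoms, hydrogen atoms and charge on both sides."""
    nitrogen_ok = 2 * n2_molecules == s["NH3"]
    hydrogen_ok = s["protons"] == 3 * s["NH3"] + 2 * s["H2"]
    charge_ok = s["protons"] - s["electrons"] == 0  # products are neutral
    return nitrogen_ok and hydrogen_ok and charge_ok

result = nitrogenase_stoichiometry(1)
print(result, is_balanced(result))
```

For one N2 this gives 8 electrons, 8 protons and 16 ATP, matching the overall equation above. The ATP, ADP, Pi and water terms are left out of the balance check because they appear in equal numbers on both sides.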

## Chirality

Of the 20 amino acids, humans can synthesize 11. These are referred to as nonessential amino acids. The remaining 9 are referred to as essential amino acids, and they must be provided in the diet. Synthesizing the 11 nonessential amino acids requires different intermediates, but one fact is common to all of them: their carbon skeletons come from intermediates of the glycolytic pathway, the citric acid cycle, and the pentose phosphate pathway. Also, in all these amino acids, the same step ensures the correct chirality. This step occurs in a transamination reaction, in which a quinonoid intermediate is protonated to form an external aldimine. The direction from which the proton is added dictates the amino acid's chirality.

## Regulation by Feedback

The rate of amino acid biosynthesis depends on the amount of enzymes present and the activity of those enzymes. However, there are other ways of regulating the biosynthesis of amino acids.

## Feedback Inhibition

The first irreversible reaction in the biosynthesis of an amino acid is referred to as the committed step. The final product of the pathway inhibits the enzyme that catalyzes the committed step, so the biosynthesis of amino acids is regulated by a negative feedback loop. There is a variety of different feedback mechanisms that regulate the synthetic pathways.

## Branched Pathways

Branched pathways are more complex in that they involve more sophisticated regulation. They can involve both positive and negative feedback. In other words, reactions have both feedback inhibition and feedback activation. An example of this is the enzyme threonine deaminase. This enzyme converts threonine to alpha-ketobutyrate; valine activates this process, while isoleucine inhibits it.

Branched pathways may also involve enzyme multiplicity, a phenomenon in which multiple enzymes regulate or catalyze one single reaction. These enzymes may all have different activities and different regulatory mechanisms. Lastly, in cumulative inhibition, multiple inhibitors are capable of inhibiting one enzyme's activity. Even if the inhibited enzyme is saturated with one inhibitor, the other inhibitors can still continue to reduce its activity. An example of this is the cumulative feedback inhibition of glutamine synthetase in E. coli.

An enzymatic cascade is another form of regulation in branched pathways. An enzymatic cascade is a reaction that requires successive steps of enzymatic catalysis after initiation. The advantages of this process are that it can amplify signals and greatly increase allosteric control. Because different enzymes are required, the cascade combines the regulation of each of them, so that the process as a whole is subject to all of these controls. This extends the potential for more efficient accruing of nitrogen in the cell.

## So What?

Why is the biosynthesis of amino acids important? Amino acids are not only the basic building blocks of all peptides and proteins; a wide variety of other biomolecules are also derived from them. Examples include the purine and pyrimidine bases in DNA and RNA, a vasodilator called histamine, the hormone thyroxine, and the hormone epinephrine, to name a few. Amino acids are also a part of other compounds in the body, such as buffers, antioxidants, and enzymes. Another molecule formed from amino acids is nitric oxide (NO). Nitric oxide is derived from arginine and serves as a messenger in signal transduction.

Because amino acids are involved in the synthesis of so many proteins and compounds within the body, a lack of amino acids has consequences. Various disorders may occur as a result of the lack of a certain amino acid, or of a certain compound derived from amino acids. An example is the porphyrias. This disorder may be inherited or acquired during one's lifetime, and it is due to a deficiency of heme pathway enzymes.

Source: Berg, Jeremy and Stryer, Lubert. Biochemistry: Fifth Edition. United States of America: W.H. Freeman and Company, 2002.

## Introduction

The formation of amyloid fibrils, protofibrils, and oligomers from β-amyloid peptides has been central to research on Alzheimer's disease. However, determining the structures of these aggregates has been a struggle. In the past five years, new data on these structures have been obtained through electron cryo-microscopy and NMR, which have enhanced scientists' understanding of the mechanism of Aβ aggregation and have opened new ways to assess the relevance of specific conformers to neurodegenerative pathologies.

## Structural diversity of β-amyloid aggregates

The β-amyloid (Aβ) peptide occurs in the human brain as a proteolytic fragment of the amyloid precursor protein. It has an amphiphilic structure, with a hydrophilic N-terminus and a hydrophobic C-terminus. The two most studied Aβ alloforms are Aβ(1-40) and Aβ(1-42), which contain 40 and 42 residues, respectively. More than 10 single-site sequence variants have been connected to similar forms of Alzheimer's disease. These alloforms are important because Aβ amyloid fibrils form the center of amyloid plaques inside the brain parenchyma and are therefore correlated with Alzheimer's disease. Scientists have been trying to determine the structure of these alloforms, but they cannot be isolated or easily purified in the laboratory. Thus, there is no reliable structural information on Aβ amyloid fibrils, which is a challenge for scientists who need this information to understand their biological properties.

## Cross-β structure of Aβ amyloid fibrils

Amyloid fibrils are fibrillar polypeptide aggregates with a cross-β structure. In cross-β structures, the β-sheet plane and the backbone hydrogen bonds connecting the β-strands are positioned parallel to the fibril axis, while the β-strands run perpendicular to the axis. Further study of these structures showed that these peptides contain structures called steric zippers. Steric zippers are composed of a pair of cross-β sheets with interlacing side chains. They are formed by many short peptide chains, such as Aβ residues 37-42 or 35-40. The structure of a steric zipper is also similar to that of the spine of amyloid fibrils.

## General topology and polymorphism of mature amyloid fibrils

TEM (transmission electron microscopy) and atomic force microscopy have shown that mature amyloid fibrils have a length greater than 1 µm, whereas previously analyzed fibrils were thought to have a length of about 25 nm. Mature Aβ amyloid fibrils have one or more protofilaments. Amyloid protofilaments are the substructures of mature fibrils, and TEM shows that these fibrils have a left-handed twist and are polar. Studying these structures shows that amyloid fibrils exhibit structural polymorphism, which is the variability in peptide conformation among fibrils. 3D reconstructions of polymorphic amyloid fibrils have revealed that fibrils differ in:

(i) number of protofilaments
(ii) different internal protofilament substructures
(iii) relative protofilament orientation

In addition to structural polymorphism (or inter-sample polymorphism), study of Aβ fibril samples with single-particle techniques has shown that there is considerable intra-sample polymorphism. For example, an analysis of Aβ(1-40) fibrils formed in 50 mM sodium borate at pH 9 revealed variations in fibril width (13 to 29 nm); however, most fibrils showed crossover distances of 100 to 200 nm. Thus, there is a wide range of morphologies, especially when fibrils are grown in buffer systems containing sodium or potassium chloride.

## Structural deformations report on the nanoscale flexibility properties of amyloid fibrils

Structural deformation is another cause of heterogeneity in amyloid samples besides polymorphism. The fibrils bend and twist, and although these deformations can create additional problems for structural analysis, they can be used to understand the nanoscale mechanical properties of amyloid fibrils.

## Structural methods for studying amyloid fibrils

Atomic structures of full-length Aβ fibrils have not been found because:

No fibril has yet formed a crystal suitable for X-ray crystallography.
The fibrils are too large for solution NMR techniques.

However, solid-state NMR and cryo-EM have been found to be capable of determining the structure of Aβ amyloid fibrils at or near atomic resolution.
Solid-state NMR can determine structural constraints such as chemical shift values, bond angles, and/or specific interatomic distances. This allows identification of the Aβ residues that take part in the β-sheet structure of the fibrils.
Cryo-EM can visualize the structure of the fibrils and can be used to calculate their 3D density. Because individual fibrils are observed, specific fibril morphologies can be determined.

## Protofilament structure of mature Aβ fibrils

The protofilament substructure of an Aβ fibril has been determined by cryo-EM. The protofilaments have cross-sectional dimensions of 4 x 11 nm and a cross-sectional subdivision into a central region of quasi-twofold symmetry (4 x 5 nm) and two peripheral regions. The Aβ(1-40) fibril contains two protofilaments and the Aβ(1-42) fibril contains only one protofilament. The single protofilament of the Aβ(1-42) fibril has two equally shaped peripheral regions, which are fully solvent-exposed and structurally disordered. In contrast, the two-protofilament Aβ(1-40) fibril has an arch-shaped peripheral region at the protofilament-protofilament interface. The other peripheral region is solvent-exposed and structurally disordered.

## Structural comparison of Aβ(1-40) and Aβ(1-42)

The Aβ(1-40) peptide is more pathogenic than the Aβ(1-40) peptide. For example, when it is expressed in Drosophila melanogaster, the Aβ(1-40) peptide is very toxic and halves the life-span of the animal; however, Aβ(1-40) don't present a discernible phenotype.Although of this difference, their chemical properties are pretty similar (the first 40 residues are identical) which leads to similarities in their conformation proerpties. Some of the differences include the Aβ(1-40) peptide having additional two C-terminal residues and the higher aggregation propensity of Aβ(1-40).Also, Aβ(1-40) can affect aggregation mechaisisms of Aβ(1-40) and thus prevents formation of matue Aβ(1-40) fibrils.
Cryo-EM of the two peptides shows differences in their protofilament packing. Aβ(1-42) fibrils have either a single-protofilament arrangement or a two-protofilament arrangement with a hollow core. Overall, however, the protofilaments of the two fibrils are very similar. For example, they have similar mass-per-length (MPL) values, cross-sectional areas, and shapes, and the cross-sections of the protofilaments are divided in a similar way into one central and two peripheral regions. This suggests that the peptides fold similarly. IR and NMR data also indicate that both fibrils have a parallel β-sheet structure.

## Reference

Fandrich, Marcus, Matthias Schmidt, and Nikolaus Grigorieff. "Recent Progress in Understanding Alzheimer's β-Amyloid Structures." Trends in Biochemical Sciences 36.6 (2011): 338-345. Academic Search Complete. Web. 21 Nov. 2012.

## General Information

Proteins are important organic compounds that serve as structural elements, transport channels, signal receptors and transmitters, and catalysts; they are the most versatile macromolecules found in living organisms. A protein is made up of one or more polypeptides, each built from combinations of the 20 different amino acid subunits. Polypeptides are linear polymer chains of amino acids joined by peptide bonds, which form between the carboxyl and amino groups of adjacent amino acid residues. Each amino acid has its own size, shape, and set of properties, and proteins have 50 to 2,000 amino acids connected end-to-end in many different combinations (Structures of Life 3). Because so many structures and shapes are possible, proteins can take on many different functions and roles in the body. One specific characteristic of proteins is that they are built almost exclusively from the L isomers of amino acids. There is no generally accepted explanation for why this is so. Proteins carry many different reactive functional groups that help define their properties and functions, which range from acting as very rigid structural elements to transmitting information between cells. In addition, proteins interact with each other and with other macromolecules to form complex assemblies. Proteins fold into secondary, tertiary, and quaternary structures based on the bonding between functional groups and can take on a variety of three-dimensional shapes depending on the amino acid sequence.
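As a rough illustration of how many "different combinations" are possible, the sketch below counts the linear sequences of a given length. It assumes every position can hold any of the 20 amino acids; the function name `sequence_count` is only illustrative.

```python
def sequence_count(length, alphabet_size=20):
    """Number of distinct linear sequences of the given length.

    Assumes each position is chosen independently from `alphabet_size`
    amino acids (20 by default).
    """
    return alphabet_size ** length


if __name__ == "__main__":
    # Chain lengths chosen arbitrarily, up to the lower bound quoted in the text.
    for n in (2, 10, 50):
        print(f"length {n}: {sequence_count(n)} possible sequences")
```

Even for a chain at the short end of the range, the number of possible sequences is far larger than the number of proteins any organism actually makes.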

One example of a protein is collagen, a fibrous structural protein that is the most abundant protein found in animals. Collagen is a triple helix of three polypeptide chains held together by hydrogen bonds, somewhat like the two strands of DNA's double helix. This structure was determined by X-ray crystallography. Several important properties enable proteins to perform a variety of crucial functions.

1. LINEAR POLYMERS: Proteins are built out of monomer units (amino acids). Based on the sequence of amino acids, proteins spontaneously fold up into three-dimensional structures.

2. CONTAIN A WIDE RANGE OF FUNCTIONAL GROUPS: Proteins contain functional groups such as alcohols, thiols, thioethers, carboxylic acids, etc. These functional groups are key to the variety of functions a protein can perform.

3. PROTEIN INTERACTION FOR COMPLEX ASSEMBLIES: Within complex assemblies, proteins act synergistically in order to achieve a specific function.

4. STRUCTURE: Proteins vary in flexibility. Rigid units of a protein can function as structural elements in the cytoskeleton of cells or in connective tissue. Protein structure is divided into four levels and is a crucial element in the specificity of protein function.

Proteins are usually portrayed as 3D structures. Their structure is described at four levels:

A picture of primary structure of protein.

Primary: The primary structure of a polypeptide is its amino acid sequence, from beginning to end. The primary structures of polypeptides are determined by encoding genes. Genes carry the information to make polypeptides with a defined amino acid sequence. An average polypeptide is about 300 amino acids in length, and some genes encode polypeptides that are a few thousand amino acids long.

Secondary: The amino acid sequence of a polypeptide, together with the laws of chemistry and physics, causes a polypeptide to fold into a more compact structure. Rotation is possible around bonds within the protein backbone. This is the reason proteins are flexible and can fold into a number of shapes. Folding can be irregular, or certain regions can adopt a repeating folding pattern. Such repeating patterns are called secondary structures. The two types are the α-helix and the β-sheet. In an α-helix, the polypeptide backbone forms a repeating helical structure that is stabilized by hydrogen bonds. These hydrogen bonds occur at regular intervals and cause the polypeptide backbone to form a helix. In a β-sheet, regions of the polypeptide backbone come to lie alongside each other. When these regions form hydrogen bonds, the polypeptide backbone forms a repeating zigzag shape called a β-sheet.

One type of secondary structure, an alpha helix.
Another type of secondary structure, a beta sheet.

Tertiary: As the secondary structure becomes established due to the primary structure, a polypeptide folds and refolds upon itself to assume a complex three-dimensional shape called the protein tertiary structure. The tertiary structure is the three-dimensional shape of a single polypeptide. For some proteins, such as ribonuclease, the tertiary structure is the final structure of a functional protein. Other proteins are composed of two or more polypeptides and adopt a quaternary structure.

Quaternary: Most functional proteins are composed of two or more polypeptides that each adopt a tertiary structure and then assemble with each other. The individual polypeptides are called protein subunits. Subunits can be identical polypeptides or can be different. When proteins consist of more than one polypeptide chain, they are said to have quaternary structure and are also known as multimeric proteins, meaning many parts. The subunits bind in a specific shape through interactions such as hydrogen bonding, salt bridges, and disulfide bonds. The two major structural categories of proteins are fibrous and globular. An example of a fibrous protein is keratin, which is found in wool, hair, fur, and nails; other examples include myosin and actin in muscles and fibrinogen for blood clotting. Examples of globular proteins include insulin, hemoglobin, and most enzymes.

A picture of hemoglobin, one of the best-known examples of protein quaternary structure.

## Factors that influence protein structure:

Several factors determine the way that polypeptides adopt their secondary, tertiary and quaternary structures. The amino acid sequences of polypeptides are the defining features that distinguish the structure of one protein from another. As polypeptides are synthesized in a cell, they fold into secondary and tertiary structures, which assemble into quaternary structures for most proteins. As mentioned, the laws of chemistry and physics, together with amino acid sequence, govern this process. Five factors are critical for protein folding and stability:

2. Ionic bonds and other polar interactions

## Protein Recognition

Protein functions such as molecular recognition and catalysis depend on complementary binding sites. They also depend on specialized microenvironments that result from a protein's tertiary structure. Such specialized microenvironments at binding sites contribute to catalysis. Binding sites have a diverse distribution of charges, which allows substrates to bind.

## Protein Denaturing

As the temperature rises, a protein starts to denature.

Upon addition of heat, proteins begin to denature. Denaturation occurs in the tertiary and secondary structures. Denaturation can lead to protein inactivity, or even cause the cell to stop functioning and die.

Heat is able to denature a protein because the rapid molecular vibrations it causes disrupt the bonds that hold the folded structure together.

Heat affects the tertiary and secondary structures. The primary structure of a protein is held together by peptide bonds, and heat is not strong enough to break these bonds, so heat has no effect on the primary structure.

## Protein Hormones

### Leptin and Insulin

Hyperphagia (overeating) and elevated levels of insulin and leptin are found in obesity, even though leptin is supposed to inhibit feeding, lower insulin levels, and suppress insulin production. Leptin may not function as predicted because hyperphagia appears to cause leptin resistance, which could be related to insulin resistance as well. Leptin is a strong modulator of biochemical pathways and metabolic fluxes, which in turn causes a redistribution of glucose fluxes. Research suggests that early leptin secretion due to overeating may be correlated with obesity and glucose intolerance. Overfeeding decreases the rate of glucose infusion needed to maintain regular glucose levels. Carbohydrate handling was drastically altered as a result: after 7 days of overeating, the rate of glucose intake had decreased. Overfeeding also drastically decreased insulin's inhibition of glucose production. Voluntary overfeeding decreases the extent to which leptin affects food consumption. This was shown in an experiment in which leptin was injected into overfed rats and into a control group. The overfed rats did not respond to the leptin, so their food intake did not decrease, whereas in the control group leptin functioned as expected and inhibited food intake. The increase in body mass due to increased food intake may contribute to insulin resistance as well as to the early increase in glucose production during hyperphagia. These results therefore indicate that increased food consumption plays a role in the impairment of the leptin system and in a decreased action of insulin on carbohydrate metabolism.

## References

Matthew D. Shoulders and Ronald T. Raines. "Collagen Structure and Stability." http://www.annualreviews.org/doi/full/10.1146/annurev.biochem.77.032207.120833?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%3dpubmed

"Quaternary Protein." Elmhurst College: Elmhurst, Illinois. Web. 12 Nov. 2011. <http://www.elmhurst.edu/~chm/vchembook/567quatprotein.html>.

http://diabetes.diabetesjournals.org/content/50/12/2786.full.pdf+html

Here is a summary of the primary structure of a protein:

I. Primary Structure:

1. It is a sequence of amino acids.
2. It is a linear polymer: the alpha-carboxyl group of one amino acid is linked to the alpha-amino group of another amino acid by a peptide bond (a covalent bond).
3. In some proteins, the linear polypeptide chain is cross-linked by disulfide bonds.

The primary structure is a polypeptide, in which:

+ each amino acid in the peptide is a residue.
+ there is a regularly repeating segment called the main chain or backbone, and a variable part, made up of the side chains.


## Primary Structure

The primary structure of a protein is a linear polymer made of a series of amino acids. These amino acids are connected by C-N bonds, also known as peptide bonds. The formation of each peptide bond releases a water molecule as a by-product: one amino acid loses a hydroxyl group (OH) from its alpha-carboxyl group while the other loses a hydrogen from its alpha-amino group. A polypeptide, or polypeptide chain, is thus a series of amino acids joined by peptide bonds. Each amino acid in a polypeptide chain is a unit commonly known as a residue. The backbone of the chain is planar at each peptide bond, because resonance between the carbonyl group and the nitrogen gives the C-N bond partial double-bond character. As a result, this bond is short and stable, and rotation about it is restricted. Peptide bonds almost always adopt the trans configuration, in which the R groups of adjacent residues avoid steric clashes. The primary structure of each protein is precisely determined by the gene that encodes it.

Amino acids are linked by peptide bonds to form a polypeptide chain, and each amino acid unit is known as a residue. A polypeptide chain consists of a regularly repeating part, known as the main chain or backbone, and a variable part, the R groups or side chains.
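Because each peptide bond releases one water molecule, the number of bonds and the mass of a short peptide follow from simple arithmetic. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the table holds approximate average molar masses for only three free amino acids (glycine, alanine, serine).

```python
WATER = 18.015  # approximate molar mass of H2O in g/mol

# Approximate average molar masses (g/mol) of three free amino acids.
# Only a small illustrative subset is included.
FREE_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}


def peptide_bonds(sequence):
    """A linear chain of n residues contains n - 1 peptide bonds."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence):
    """Sum of the free amino acid masses minus one water per peptide bond."""
    total = sum(FREE_MASS[residue] for residue in sequence)
    return total - peptide_bonds(sequence) * WATER


if __name__ == "__main__":
    for seq in ("GA", "GAS"):
        print(seq, peptide_bonds(seq), "bond(s),", round(peptide_mass(seq), 2), "g/mol")
```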

### Forces that stabilize Protein Structure

Protein structures are governed primarily by the hydrophobic effect and by interactions between polar residues and other types of bonds. The hydrophobic effect is the major determinant of native protein structure. The aggregation of nonpolar side chains in the interior of a protein is favored by the increase in entropy of the water molecules that would otherwise form cages around the hydrophobic groups. Hydrophobic side chains therefore give a good indication of which portions of a polypeptide chain are inside, out of contact with the aqueous solvent. Hydrogen bonding is a central feature of protein structure but makes only a minor contribution to protein stability. Hydrogen bonds fine-tune the tertiary structure by selecting the unique structure of a protein from among a relatively small number of hydrophobically stabilized conformations. Disulfide bonds can form within and between polypeptide chains as a protein folds to its native conformation. Metal ions may also function to cross-link proteins internally.

### Factors that cause denaturing

1) Temperature

2) pH

Extreme temperatures cause a polypeptide chain to unfold, leading to a change in structure and often a loss of function. If the protein functions as an enzyme, denaturation will cause it to lose its enzymatic activity. As the temperature of a solution containing the protein is raised, the extra heat causes twisting and bending of bonds. As a protein begins to denature, its secondary structure is lost and the chain adopts a random coil configuration. Interactions between amino acid side chains are also lost, and under sufficiently harsh conditions even covalent cross-links such as disulfide bonds can be disrupted.

At high or low pH, a protein denatures because ionizable groups lose or gain a proton and therefore lose their charge or become charged, depending on which way the pH is changed and by how much. This eliminates many of the ionic interactions that were necessary to maintain the folded shape of the protein. The resulting change in structure causes a change or loss of function.
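How strongly a pH change alters the charge of a side chain can be estimated with the Henderson-Hasselbalch relation. The sketch below is only an illustration: the pKa values (about 3.9 for an aspartate side chain and about 10.5 for a lysine side chain) are approximate textbook figures assumed here, and real values shift with the local environment inside a protein.

```python
def fraction_protonated(pH, pKa):
    """Fraction of an ionizable group that carries its proton at a given pH.

    Follows from the Henderson-Hasselbalch relation:
    [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10 ** (pH - pKa))


# Assumed, approximate side-chain pKa values (not taken from the text above).
PKA_ASP = 3.9   # acidic side chain: neutral when protonated, -1 when not
PKA_LYS = 10.5  # basic side chain: +1 when protonated, neutral when not

if __name__ == "__main__":
    for pH in (2.0, 7.0, 12.0):
        asp_charge = -(1.0 - fraction_protonated(pH, PKA_ASP))
        lys_charge = fraction_protonated(pH, PKA_LYS)
        print(f"pH {pH}: Asp {asp_charge:+.2f}, Lys {lys_charge:+.2f}")
```

Near neutral pH both groups are charged and can form an ionic interaction; at very low pH the acidic group is mostly neutral, and at very high pH the basic group is mostly neutral, so the interaction is lost in either direction.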

### Determination of Primary Structure: Amino Acid Sequencing

After the polypeptide has been purified, its composition should be established. To determine which amino acids are present and how much of each, the entire strand is degraded by amide hydrolysis (6 N HCl, 110 °C, 24 h) to produce a mixture of all the free amino acids. The mixture is separated and its composition recorded by an amino acid analyzer. The analyzer establishes the composition of a polypeptide by producing a chromatogram, which records a peak for each amino acid present in the sequence. However, the amino acid analyzer can only give the composition of a polypeptide, not the order in which the amino acids are bound to one another.
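The difference between composition and sequence can be made concrete with a short sketch. The two peptides below are hypothetical examples written in one-letter code; counting their residues mimics what total hydrolysis followed by amino acid analysis reports.

```python
from collections import Counter


def composition(sequence):
    """Count how many times each residue occurs, ignoring order."""
    return Counter(sequence)


if __name__ == "__main__":
    # Two hypothetical peptides with the same residues in a different order.
    first, second = "GASAG", "AGGSA"
    print(dict(composition(first)))
    print(dict(composition(second)))
    print("same composition:", composition(first) == composition(second))
    print("same sequence:", first == second)
```

Both peptides give identical counts, which is why a separate sequencing step is needed.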

Determining the amino acid sequence usually starts with identifying the amino-terminal residue of the polypeptide. The procedure is known as Edman degradation, and the reagent employed is phenyl isothiocyanate.

Phenyl isothiocyanate

In Edman degradation, the terminal amino group adds to the isothiocyanate reagent to produce a thiourea derivative. On treatment with mild acid, the tagged amino acid is released as a phenylthiohydantoin, and the remainder of the polypeptide is unchanged. Since the phenylthiohydantoins of all amino acids are known, the amino-terminal residue of the original polypeptide can be identified easily. The shortened polypeptide can then be put through another round to identify the next residue. However, each round identifies only the residue at the amino end, so for polypeptides made up of hundreds of amino acids the method is generally not practical. In addition, repeated rounds of degradation build up impurities that seriously affect the yield of peptide. Each round proceeds in high yield but is not completely quantitative, so with each step incompletely reacted peptide mixes with the new peptide, eventually resulting in an intractable mixture.
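The effect of a less-than-quantitative step compounds over many rounds. The sketch below assumes a hypothetical per-cycle yield of 98% (the text gives no figure) and treats every cycle as independent, which is a simplification.

```python
def fraction_in_register(per_cycle_yield, cycles):
    """Fraction of peptide molecules that reacted correctly in every cycle so far.

    Assumes each cycle succeeds independently with the same probability.
    """
    return per_cycle_yield ** cycles


if __name__ == "__main__":
    assumed_yield = 0.98  # hypothetical value chosen for illustration
    for n in (1, 10, 50, 100):
        print(f"after {n:3d} cycles: {fraction_in_register(assumed_yield, n):.3f}")
```

Under this assumption well under half of the molecules are still in step after 50 cycles, which is one way to see why long polypeptides are usually cleaved into shorter fragments before sequencing.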

Secondary structure refers to the spatial arrangement of amino acid residues that are nearby in the sequence. The alpha helix and beta strands are elements of secondary structure.

## Secondary Structure

Secondary structures of proteins are typically very regular in their conformation. They are the spatial arrangements of the primary structure. Alpha helices and beta pleated sheets are the two types of regular structure. Certain amino acids in a polypeptide prefer certain folding structures. The alpha helix seems to be the default, but because of interactions such as steric effects, certain amino acids prefer to fold into beta pleated sheets. For example, valine, isoleucine, and threonine all have branching at the beta carbon, which causes steric clashes in an alpha helix arrangement. These amino acids are therefore mostly found where their side chains fit well into the beta configuration. Glycine is the smallest amino acid and can fit into all structures, so it does not favor helix formation in particular.

The polypeptide main chain is rich in hydrogen-bonding potential: each residue has a carbonyl group (C=O), which is a good hydrogen-bond acceptor, and, except for proline, a nitrogen-hydrogen group (N-H), which is a good hydrogen-bond donor.

In an alpha helix, the side chains point toward the outside of the structure.

+ Right-handed: common; appears in the lower left of the Ramachandran plot

+ Left-handed (loop): rare; appears in the upper right of the Ramachandran plot


### Alpha Helix

#### Structure

The general physical properties of an alpha helix are:

Alpha helix project outward in helical array
Ribbon displaying the backbone of the alpha helix
• 3.6 residues per turn
• Translation (rise) of 1.5 Å per residue
• Rotation of 100 degrees per residue
• Pitch (or height of one turn) of 5.4 Å (1.5 Å × 3.6 residues)
Alpha helix with hydrogen bonds
• Screw sense = clockwise (right-handed), usually, because this is less sterically hindered
• The coiled backbone forms the inside of the helix, and the side chains project outward in a helical array
• Hydrogen bonding between the carbonyl (C=O) of residue i and the N-H of residue i+4
• The shorthand drawing of the alpha helix is a ribbon or rod
Ribbon shorthand notation for the alpha helix
Ramachandran diagram
• Alpha helix falls within quadrant 1 (left-handed helix) and 3 (right-handed helix) in the Ramachandran diagram
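The geometric values in the list are related by simple arithmetic, which the following sketch reproduces. It uses only the 3.6 residues per turn and 1.5 Å rise quoted above; the function names and the 20-residue example are illustrative.

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5  # Å along the helix axis


def pitch():
    """Height of one full turn: rise per residue times residues per turn."""
    return RISE_PER_RESIDUE * RESIDUES_PER_TURN


def rotation_per_residue():
    """Angle in degrees between successive residues around the helix axis."""
    return 360.0 / RESIDUES_PER_TURN


def helix_length(n_residues):
    """Approximate axial length in Å of an ideal helix of n residues."""
    return n_residues * RISE_PER_RESIDUE


if __name__ == "__main__":
    print(f"pitch: {pitch():.1f} Å")
    print(f"rotation per residue: {rotation_per_residue():.0f} degrees")
    print(f"20-residue helix: {helix_length(20):.0f} Å, "
          f"{20 / RESIDUES_PER_TURN:.1f} turns")
```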

#### Supersecondary Structure of Alpha Helix

##### Fibrous Proteins

I. COILED-COIL (α-keratin)

An alpha coiled coil consists of two or more alpha helices intertwined, creating a stable structure. This structure provides support to tissues and cells; it is found in the cell cytoskeleton and in muscle proteins such as myosin and tropomyosin. Alpha-keratin contains heptad repeats (imperfect repeats of a sequence of 7 amino acids), which facilitate bonding between the two or more helices.

II. COLLAGEN

Collagen is another type of fibrous protein that consists of three helical polypeptide chains. It is the most abundant protein found in mammals, making up a large component of skin, bone, tendon, cartilage, and teeth. Wrinkles are also caused by the degradation of this protein. In the structure of collagen, every third residue in the polypeptide is glycine because it is the only residue small enough to fit in the interior position of the superhelical cable. Unlike normal alpha helices, each collagen helix is stabilized by steric repulsion of the pyrrolidine rings of the proline and hydroxyproline residues. The three intertwined strands, however, are held together by hydrogen bonding.
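The "glycine at every third residue" rule is easy to check on a sequence written in one-letter code. The sketch below is a minimal illustration; the function name and the test sequences are hypothetical, and the reading frame is not assumed to start at the first residue.

```python
def glycine_every_third(sequence):
    """Return True if glycine ('G') occupies every third position in some frame.

    All three possible frames (offsets 0, 1, 2) are tried, since a fragment
    does not have to begin with the glycine of a repeat.
    """
    if len(sequence) < 3:
        return False
    return any(
        all(residue == "G" for residue in sequence[offset::3])
        for offset in range(3)
    )


if __name__ == "__main__":
    # Hypothetical fragments, not taken from a real collagen entry.
    for fragment in ("GPPGAPGPK", "PGPPGAPGP", "GPPAPPGPP"):
        print(fragment, glycine_every_third(fragment))
```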

#### Alpha Tertiary

I. MOTIFS

Motifs are simple combinations of secondary structure elements, such as the helix-turn-helix, which consists of two helices separated by a turn. The helix-turn-helix motif is usually found in DNA-binding proteins.

II. DOMAINS (GLOBULAR)

Domains, or compact globular units, consist of multiple motifs. A polypeptide chain may fold into two or more such compact regions connected by turns or loops. Their structure is roughly spherical, which is beneficial for the protein because it conserves space. Generally, the inside of a globular protein consists of hydrophobic amino acids such as leucine, valine, methionine, and phenylalanine. The outside consists of amino acids with hydrophilic tendencies such as aspartate, glutamate, lysine, and arginine. An example of a globular protein is myoglobin, which is the oxygen carrier in muscle. It is an extremely compact molecule in which about 70% of the chain is alpha helix and the remaining 30% is loops and turns.

#### Transmembrane and Non-Transmembrane Hydrophobic Helix

Studying the topography of transmembrane and non-transmembrane helices has helped answer many questions about membrane protein insertion. Specifically, studying how topography depends on sequence and lipid composition provides insights into post-translational topography changes. Furthermore, studying topography has led to the design of hydrophobic helices that have biomedical applications. For example, a tumor marker called pHLIP peptide has been designed.

Different tests have been used to show the various effects on hydrophobic helices. For example, hydrophilic residues such as tryptophan and tyrosine destabilize the transmembrane state. Hydrophilic domains cannot cross the membrane, so they block any equilibration between the transmembrane and non-transmembrane states. Charged, ionized residues also destabilize the transmembrane state. The transmembrane state can in turn be stabilized by helix-helix interactions. Moreover, anionic lipids promote membrane binding of hydrophobic peptides and proteins.

Alpha helices, beta strands, and turns are formed by a regular pattern of hydrogen bonds between the peptide N-H and C=O groups of amino acids that are near one another in the linear sequence. Such folded segments are called secondary structure.

#### Summary

The alpha helix consists of a single polypeptide chain in which the amino group (N-H) of each residue hydrogen bonds to a carbonyl group (C=O) 4 residues away. The alpha helix is a rod-like structure. The tightly coiled backbone of the chain forms the inner part of the rod, and the side chains extend outward in a helical array. This results in a clockwise coiled structure, which is known as a "right-handed" screw sense. This folding pattern, along with the beta pleated sheet, was proposed by Linus Pauling and Robert Corey about half a decade before the structures could actually be observed. Alpha helices are located in the lower left corner or upper right corner of the Ramachandran diagram. Essentially all alpha helices found in proteins fall in the right-handed helix area. An alpha helix is especially suited to membrane-spanning proteins because all of the amino hydrogen and carbonyl oxygen atoms of the peptide backbone can form intrachain hydrogen bonds while its aliphatic side chains are stabilized in the hydrophobic environment of the cell membrane.

Alanine, leucine, and glutamic acid (present as glutamate at physiological pH) are the most common residues in alpha helices.

The alpha-helix content of protein ranges widely, from none to almost 100%.

In general, the alpha helix is the "normal" shape of a polypeptide chain; however, features of certain amino acids disrupt alpha helix formation and instead favor beta strand formation. Amino acids with branching at the beta carbon (i.e. valine, threonine, and isoleucine) are problematic because they crowd the peptide backbone. Hydrogen-bond accepting or donating groups attached to the beta carbon (e.g. in serine, asparagine, and aspartate) can bond with backbone N-H and C=O groups, again interfering with alpha helix formation.

While individual amino acids may favor one form or another, predicting the 2° structure of even a short (<7 amino acid) peptide strand is only 60-70% accurate. Such variability suggests other factors, like tertiary interactions with amino acids further down the chain, influence the folding into its observed 3° structure.

A beta strand has backbone dihedral angles of roughly ψ = 120° and φ = -120°. With these angles the backbone forms an extended zigzag in which adjacent amino acids are about 3.5 Å apart.

### Beta Pleated Sheet

In contrast to the alpha helical structure, Beta Sheets are multiple strands of polypeptides connected to each other through hydrogen bonding in a sheet-like array. Hydrogen bonding occurs between the NH and CO groups between two different strands and not within one strand, as is the case for an alpha helical structure. Due to its often rippled or pleated appearance, this secondary structure conformation has been characterized as the beta pleated sheet. The beta strands can be arranged in a parallel, anti-parallel, or mixed (parallel and anti-parallel) manner.

Anti-parallel Beta Strand

The anti-parallel configuration is the simplest. The N and C terminals of adjacent polypeptide strands are opposite to one another, meaning the N terminal of one peptide chain is aligned with the C terminal of an adjacent chain. In the anti-parallel configuration, each amino acid is hydrogen bonded directly to a single amino acid in the adjacent chain.

Parallel Beta Strand

The parallel arrangement occurs when neighboring polypeptide chains run in the same direction, meaning the N and C terminals of the peptide chains align. As a result, an amino acid cannot bond directly to the complementary amino acid in an adjacent chain as in the anti-parallel configuration. Instead, the amino group from one chain is hydrogen bonded to a carbonyl group on the adjacent chain, and the carbonyl group from the initial chain then hydrogen bonds to an amino group two residues ahead on the adjacent chain. This distorts the hydrogen bonds and weakens them, because hydrogen bonds are strongest when the donor, hydrogen, and acceptor atoms lie in a straight line. Because of this distortion, parallel beta sheets are not as stable as anti-parallel beta sheets (for example, parallel beta sheets with fewer than 5 residues are very uncommon).

The side chains of beta strands are arranged alternately on opposite sides of the strand. The distance between adjacent amino acids in a beta strand is 3.5 Å, which is longer than the 1.5 Å rise per residue in an alpha helix. Because of this, beta sheets are more flexible than alpha helices and can be flat or somewhat twisted. The average length of a beta strand in a protein is 6 amino acid residues. The actual length ranges from 2 to 22 residues.
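The two per-residue distances can be compared directly by asking how many residues each conformation needs to cover the same distance. In the sketch below the 30 Å target is an arbitrary value chosen for illustration, and the function name is hypothetical.

```python
import math

HELIX_RISE = 1.5   # Å per residue along an alpha helix
STRAND_RISE = 3.5  # Å per residue along a beta strand


def residues_needed(distance, rise_per_residue):
    """Smallest whole number of residues whose combined rise covers the distance."""
    return math.ceil(distance / rise_per_residue)


if __name__ == "__main__":
    target = 30.0  # Å, arbitrary illustrative distance
    print("alpha helix:", residues_needed(target, HELIX_RISE), "residues")
    print("beta strand:", residues_needed(target, STRAND_RISE), "residues")
```

The extended strand covers the same distance with fewer than half as many residues as the helix.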

Ramachandran Plot: Beta strands are found in the purple region

Beta sheets are found in the upper left quadrant of a Ramachandran plot. This corresponds to ψ angles of 0° to 180° and Φ angles of -180° to 0°.
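The quadrant assignments used in this chapter can be written as a small lookup. This is a deliberately coarse sketch: it classifies by the signs of the two angles only, whereas the allowed regions on a real Ramachandran plot are much smaller than whole quadrants. The function name is illustrative, and the helix angles in the example (about -57°, -47°) are typical values assumed here.

```python
def ramachandran_region(phi, psi):
    """Coarse quadrant label for a pair of backbone dihedral angles in degrees."""
    if phi < 0 and psi > 0:
        return "beta strand region (upper left)"
    if phi < 0 and psi < 0:
        return "right-handed alpha helix region (lower left)"
    if phi > 0 and psi > 0:
        return "left-handed helix region (upper right)"
    return "other"


if __name__ == "__main__":
    examples = {
        "beta strand": (-120, 120),
        "right-handed helix": (-57, -47),
        "left-handed helix": (57, 47),
    }
    for name, (phi, psi) in examples.items():
        print(f"{name}: {ramachandran_region(phi, psi)}")
```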

The schematic model of beta sheets

Visual representations in 3D models for beta sheets are traditionally denoted by a flat arrow pointing in the direction of the strand.

A loop is any part of the chain that is not an alpha helix or a beta strand. Loops are nevertheless part of the secondary structure of a protein.

### Turn and Loop

Polypeptide chains can change direction by making reverse turns and loops. Alpha helices and beta strands are connected by these turns and loops. Most proteins have a compact, globular shape owing to reversals in the direction of their polypeptide chains, which allow the polypeptide to fold back onto itself. In many reverse turns, the CO group of residue i of a polypeptide is hydrogen bonded to the NH group of residue i+3. A turn helps to stabilize abrupt directional changes in the polypeptide chain. Loops are more elaborate chain-reversal structures that are nonetheless rigid and well defined. Loops and turns generally lie on the surfaces of proteins, so they often participate in interactions between proteins and other molecules. A loop has no regular, periodic structure of the kind found in helices or beta strands.
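The hydrogen-bonding pattern of a reverse turn (residue i to residue i+3) differs from that of the alpha helix (4 residues apart) only in the offset. The sketch below simply enumerates the residue pairs that such a pattern would connect in a short segment; the function name and the six-residue segment are illustrative.

```python
def backbone_hbond_pairs(n_residues, offset):
    """List (i, i + offset) pairs, 1-based, for a segment of n residues.

    Residue i contributes the C=O acceptor; residue i + offset contributes
    the N-H donor.
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


if __name__ == "__main__":
    print("reverse turn pattern (i -> i+3):", backbone_hbond_pairs(6, 3))
    print("alpha helix pattern  (i -> i+4):", backbone_hbond_pairs(6, 4))
```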

Two hypotheses have been proposed for the role of turns in protein folding. In one view, turns play a critical role in folding by bringing together interactions between regular secondary structure elements. This view is supported by mutagenesis studies indicating a critical role for particular residues in the turns of some proteins. Also, nonnative isomers of X-Proline peptide bonds in turns can completely block the conformational folding of some proteins. In the opposing view, turns play a passive role in folding. This view is supported by the poor amino-acid conservation observed in most turns. Also, non-native isomers of many X-Pro peptide bonds in turns have little or no effect on folding.

### Beta Hairpin Turns

A motif is a combination of secondary structure elements in a specific geometric arrangement. Beta hairpin turns are one such arrangement; they are among the simplest structures and are often found in globular proteins. After the turn, the antiparallel strands can bind effectively through hydrogen bonding between the backbone carbonyl oxygen and the backbone amide nitrogen (N-H). It has been shown that 70% of beta hairpins are less than seven residues long, the majority being 2 residues long. There are two types of two-residue beta hairpin turns. In the first, Type I, the first residue adopts a left-handed alpha-helical conformation with a positive phi angle. This position is typically occupied by glycine, asparagine, or aspartate: glycine has no side chain to sterically interfere with the turn, and asparagine and aspartate both readily form hydrogen bonds in which the carbonyl oxygen acts as the hydrogen-bond acceptor. The second amino acid in the Type I turn is usually glycine because of the steric hindrance that would result from any amino acid with a side chain. In a Type II beta hairpin turn, the first residue can only be glycine because of steric hindrance, while the second residue is usually polar, such as serine or threonine.

### Fibrous proteins

Fibrous proteins such as alpha-keratin consist of two right-handed alpha helices intertwined to form a type of left-handed superhelix called an alpha coiled coil. The two helices in this type of protein are usually cross-linked by weak interactions such as van der Waals forces and ionic interactions. The side chain interactions repeat every seven residues, forming heptad repeats. Another fibrous protein, collagen, exists as three helical polypeptide chains. These chains are relatively long, ~1000 residues, and because of overcrowding, glycine appears once every three residues. While each helix is stabilized by steric repulsion, the three strands are held together by hydrogen bonding. These proteins usually serve structural roles in organisms: coiled-coil proteins such as alpha-keratin are commonly found in the cytoskeleton of a cell, as well as in certain muscle proteins, and collagen is often found in teeth, skin, and tendons.

### Secondary Structure Prediction

The science of predicting which secondary structure (alpha helix, beta sheet or strand, or turn or loop) a polypeptide chain will adopt is not particularly exact. However, the frequencies with which individual amino acids occur in each secondary structure have been measured experimentally, and these values allow scientists to predict the folding of a protein from its amino acid sequence with about 60-70% accuracy. Stretches of six or fewer residues can usually be predicted with this accuracy. Although each amino acid tends to fold into its preferred conformation, there are exceptions, so secondary structure prediction is not always accurate. Tertiary interactions, that is, interactions between residues that are further apart in the sequence, can also determine the folding. Each amino acid has a preference for one secondary structure, but it is normally only a small preference relative to the alternatives, so on its own it carries little predictive weight. An amino acid can appear in an alpha helix in one protein and in a beta sheet in another. Because the secondary structure cannot be predicted reliably from a single sequence, secondary structures are also analyzed and predicted in relation to a family of similar sequences.

Various techniques for secondary structure prediction have emerged over time. With the aid of computers, prediction has become an active research topic in bioinformatics, and new approaches continue to be proposed. After Linus Pauling and Robert Corey described the periodic alpha helix and beta sheet structures within proteins in 1951, work on protein structure prediction began to grow. A major early method was the Chou-Fasman method, which yielded 50-60% accuracy. This method assigned a set of prediction values to each amino acid residue and then applied an algorithm to those values. Shortly after came the GOR method, developed in the late 1970s, which applied entropy and other information theory concepts to secondary structure prediction. When devised, the method was about 65% accurate, and it has since been improved. There are also comparative techniques, in which similar sequences are found in proteins of known structure. This is accomplished by having computer software search databases of identified proteins. The opposite approach is the ab initio method, which builds 3-dimensional models without looking for similar residue sequences. This method is based on hydrogen bonding principles and localization.

Other approaches analyze the basic chemical tendencies of amino acid side chains to determine their preferred secondary structure. The alpha-helix is taken as the default structure, so amino acids that destabilize alpha-helices are often found in beta-pleated sheets or in loops and turns. For instance, valine, threonine, and isoleucine often destabilize the helix because of branching at the beta carbon. These three residues are more often found in beta-pleated sheets, where their side chains project out of the plane containing the main chain. Some residues prefer neither alpha-helices nor beta-pleated sheets. Proline, for example, has a phi angle restricted to about -60 degrees and no NH group, both because its side chain is cyclic. It disrupts both alpha-helices and beta-pleated sheets and is therefore found mostly in loops and turns. A counter-intuitive example is glycine, which, because of its small size, could in theory fit easily into any structure but in practice also tends to avoid alpha-helices and beta-sheets. Folding also depends on chemical interactions between side chains, so neighboring residues affect these tendencies as well. The tendencies are reflected in the frequencies with which individual amino acids occur in each secondary structure.

The relative tendencies of secondary structures for particular amino acids are listed below:

alpha-helix: Glu, Ala, Leu, Met, Lys, Arg, Gln, His

beta-sheet: Val, Ile, Tyr, Cys, Trp, Phe, Thr

turns and loops: Gly, Asn, Asp, Pro, Ser
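The preference list above can be turned into a simple lookup table. The following is only a minimal sketch: the function names are hypothetical, and a bare majority vote is an illustrative assumption, not the Chou-Fasman or GOR algorithm. As noted earlier, individual preferences are weak, so a tally like this says little about any real protein.

```python
from collections import Counter

# Preference classes copied from the list above (three-letter codes).
PREFERENCE_GROUPS = {
    "helix": "Glu Ala Leu Met Lys Arg Gln His",
    "sheet": "Val Ile Tyr Cys Trp Phe Thr",
    "turn": "Gly Asn Asp Pro Ser",
}

# Flatten into a residue -> preferred class mapping.
PREFERENCE = {
    residue: label
    for label, residues in PREFERENCE_GROUPS.items()
    for residue in residues.split()
}


def tally_preferences(residues):
    """Count how many residues in a segment favour each class."""
    return Counter(PREFERENCE.get(r, "unknown") for r in residues)


def majority_class(residues):
    """Return the most common preferred class in the segment (hypothetical heuristic)."""
    return tally_preferences(residues).most_common(1)[0][0]
```

For a segment such as `["Glu", "Ala", "Leu", "Val"]`, three of the four residues are helix-favouring, so `majority_class` reports `"helix"`.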

#### Torsion Angles

Torsion angles are also called dihedral angles. A torsion angle measures, in degrees, the rotation about a bond between two atoms. The folding of a protein is influenced by how far the bonds of the backbone can rotate. Each residue in a polypeptide chain has two such torsion angles. Phi (φ) is the angle of rotation about the bond between the nitrogen atom and the α-carbon. Psi (ψ) is the angle of rotation about the bond between the α-carbon and the carbonyl carbon. To determine the sign of φ, one looks from the nitrogen atom towards the α-carbon. The angle is negative if the rotation is counterclockwise and positive if it is clockwise. Likewise, to determine the sign of ψ, one looks along the bond between the α-carbon and the carbonyl carbon, and the angle is again negative for a counterclockwise rotation and positive for a clockwise one.
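A torsion angle can be computed from the coordinates of four consecutive atoms. The sketch below uses a standard vector formula (project the two outer bonds onto the plane perpendicular to the central bond, then take the signed angle between the projections). The function name and the sample coordinates are illustrative assumptions. For φ the four atoms would be C of the previous residue, N, Cα, and C. For ψ they would be N, Cα, C, and N of the next residue.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond.

    Positive values correspond to a clockwise rotation of p3 relative
    to p0 when looking from p1 towards p2.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

With the central bond along the z axis, a fourth atom placed directly "behind" the first gives 0 degrees (cis) and one placed opposite gives 180 degrees (trans).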

#### Ramachandran Diagram

The Ramachandran diagram, created by Gopalasamudram Ramachandran, helps to determine whether amino acids will form alpha helices, beta strands, loops, or turns. The diagram is divided into four quadrants, with the angle φ on the x axis and the angle ψ on the y axis. The combination of torsion angles places each residue in a specific quadrant, which indicates the structure it is likely to form. Residues that fall in quadrants 1 and 3 several times in a row form alpha helices (left-handed in quadrant 1 and the far more common right-handed in quadrant 3), and those that repeat in quadrant 2 form beta strands. Quadrant 4 is generally disfavored because most torsion angle combinations in it would cause collisions between the atoms of the amino acids. This follows from the principle of steric exclusion, which states that two atoms cannot occupy the same place simultaneously. Residues whose angles land in different quadrants, with no repeats, form loops or turns.
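The quadrant description above can be expressed as a small classifier. This is a rough sketch only: real Ramachandran analysis uses contoured allowed regions rather than whole quadrants, and the function names are hypothetical. The sample angles in the usage note are commonly quoted textbook values and are not taken from this text.

```python
def ramachandran_quadrant(phi, psi):
    """Return the quadrant (1-4) of a (phi, psi) pair, with phi on x and psi on y."""
    if phi >= 0 and psi >= 0:
        return 1
    if phi < 0 and psi >= 0:
        return 2
    if phi < 0 and psi < 0:
        return 3
    return 4


# Coarse labels following the quadrant description in the text.
QUADRANT_LABEL = {
    1: "alpha helix (left-handed)",
    2: "beta strand",
    3: "alpha helix (right-handed)",
    4: "disfavored (steric clash)",
}


def classify_run(angles):
    """Label a run of (phi, psi) pairs: one repeated quadrant, else loop/turn."""
    quadrants = {ramachandran_quadrant(phi, psi) for phi, psi in angles}
    if len(quadrants) == 1:
        return QUADRANT_LABEL[quadrants.pop()]
    return "loop or turn"
```

A run of residues near (-57, -47) would be labelled a right-handed alpha helix, and a run near (-120, 130) a beta strand.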

Myoglobin is one example of tertiary structure. Myoglobin, the oxygen carrier in muscle, is a single polypeptide chain of 153 amino acids. The capacity of myoglobin to bind oxygen depends on the presence of heme, a nonpolypeptide prosthetic group consisting of protoporphyrin IX and a central iron atom. Myoglobin is an extremely compact molecule.

## Tertiary Structure

The tertiary structure of a protein is its overall three-dimensional structure. This structure is mostly determined by the amino acid sequence, that is, the primary structure of the protein; however, the sequence alone cannot entirely predict how the three-dimensional structure forms. Another contributing factor is the environment in which the protein is synthesized. The tertiary structure is stabilized by the burial of hydrophobic amino acid residues in the core of the protein. The interior consists of hydrophobic side chains, while the surface consists of hydrophilic amino acids that interact with the aqueous environment.

Tertiary structure is formed by interactions between the side chains of various amino acids, including disulfide bonds formed between two cysteine residues. At this stage some proteins are complete, while others incorporate multiple polypeptide subunits, which creates the quaternary structure.

Nucleation-condensation model: The tertiary folding process is highly structured and passes through key intermediates. When a protein starts to fold, localized areas of the protein fold first. Then the individual localized folds come together to complete the tertiary structure. The key concept is that when a correct fold is achieved, that fold is retained until all other parts of the protein are also correctly folded. This folding process is reasonable because a random trial-and-error search would not only take much more time to complete but would also require much more energy input.

Tertiary structure refers to the spatial arrangement of amino acid residues that are far apart in the sequence and to the pattern of disulfide bonds. Tertiary structure is also the most important protein structure that is used in determining the enzymatic activity of proteins.

### Structure

A lobster's exoskeleton is not an example of keratin (it is made of chitin, a polysaccharide).
A dog's fur is an example of keratin.

Cysteine, an amino acid containing a thiol group, is responsible for the disulfide bonds that help hold a tertiary structure together. When two helices come together in the tertiary structure, they may be linked by these disulfide bonds. Structures with fewer disulfide bonds are less rigid and more flexible, yet still strong and resistant to breakage, such as hair and wool. Structures with more disulfide cross-links between cysteine residues are stronger, stiffer, and harder. Examples of proteins that contain more disulfide bonds include those in claws, nails, and horns.

A structure made of two α-helices, such as keratin, can be found in living organisms. Immunoglobulins, also known as antibodies, are an example of an all beta-sheet protein fold. The immunoglobulin fold consists of approximately 7 anti-parallel beta-strands arranged in 2 beta-sheets. Such folds depend on specific residues: for instance, if a cysteine is mutated to another amino acid, a disulfide bond may be lost, which can lead to incorrect folding.

### Domains

Some polypeptide chains fold into several compact regions. These regions in a polypeptide chain are called domains and generally range from 30 to 400 amino acids. On average, domains contain roughly 100 amino acids. Each domain forms its own tertiary structure which contributes to the overall tertiary structure of the protein. These domains are independently stable. Stabilization is caused by metal ions or disulfide bridges that cause the folding of polypeptide chains. Different proteins may have the same domains even if the overall tertiary structure is different.

There are four types of domains:

• All-α domains - Domains made purely from α-helices.
• All-β domains - Domains made purely from β-sheets.
• α+β domains - Domains made both of α-helices and β-sheets.
• α/β domains - Domains made from both α-helices and β-sheets arranged in a β-α-β fashion, with an α-helix between two β-strands.

### Mutations

In order for a protein to be functional, it must have an intact tertiary structure. If the tertiary structure of a protein is disrupted, the protein is said to be denatured. Once a protein is denatured, it cannot perform its original function. A primary cause of altered tertiary structure is a mutation in the gene encoding the protein. Such a mutation can cause a domino effect that leads to the degradation of the tertiary structure. Degradation can cause several diseases, one of which is cystic fibrosis (CF). Cystic fibrosis is brought about by a mutation in the gene encoding the cystic fibrosis transmembrane conductance regulator (CFTR). This disease causes the exocrine glands to overproduce mucus. Most commonly, CF patients suffer lung failure in their 20s or 30s. Diabetes insipidus, familial hypercholesterolemia, and osteogenesis imperfecta are also diseases that originate from degraded proteins. Misfolding of the tertiary structure itself, without a mutation in the nucleotide sequence, can also lead to disease. Misfolded proteins can aggregate into insoluble deposits called amyloids and thereby lose the ability to function. A common error is for a hydrophobic R group to fold in, rather than out, in a hydrophobic environment. The inherited form of Alzheimer's disease is one disease associated with misfolded tertiary structure. Another is "mad cow" disease, in which α-helices (which are soluble) convert into β-sheets (which are insoluble and form amyloid deposits). [7]

### Folding

The folding of a protein depends on the amino acid sequence laid out in the primary structure. It also depends on the environment in which folding occurs. In a hydrophobic environment, the hydrophobic side chains of the protein fold outward while the hydrophilic side chains fold inward, and vice versa in a hydrophilic environment. An example of a protein folded in a hydrophobic environment is porin. Its hydrophilic side chains are folded inward, which creates a channel for water to pass through. Amino acids with nonpolar/hydrophobic side chains, such as leucine, valine, methionine, phenylalanine, and isoleucine, fold outward in a hydrophobic environment. Likewise, in a hydrophilic environment, amino acids with polar side chains such as glutamine and asparagine fold outward and the hydrophobic side chains fold inward.

### Determination of Tertiary Structure

The tertiary structure of a protein is determined through X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. X-ray crystallography was the first method used to determine the structure of proteins. It is one of the best methods because the wavelength of an X-ray is similar to the length of the covalent bonds found throughout proteins, which allows a clear visualization of a molecule's structure. The scattering of X-rays by electrons is analyzed to determine the structure. In order to use X-ray crystallography, the protein in question must be in crystal form. Some proteins crystallize readily, while others do not. For proteins that do not crystallize readily, NMR spectroscopy can be used instead. NMR spectroscopy uses the spin of nuclei with a magnetic dipole, together with chemical shifts, to determine the relative positions of atoms in a molecule.

Hemoglobin is one example of quaternary structure. Hemoglobin, the oxygen-carrying protein in blood, consists of two subunits of one type (designated alpha) and two subunits of another (designated beta).

## Quaternary Structure

Atomic structure of the 50S Subunit from Haloarcula marismortui. Proteins are shown in blue and the two RNA strands in orange and yellow.[11] This is an example of the tertiary structure of the large unit of a ribosome

A quaternary structure refers to two or more polypeptide chains held together by intermolecular interactions to form a multi-subunit complex. The interactions that hold these folded protein molecules together include disulfide bridges, hydrogen bonding, hydrophobic interactions, and London forces. These interactions are usually mediated by the side chains of the peptides.

These polypeptide chains are the subunits of a protein, capable of taking part in a variety of functions such as serving as enzymatic catalysts, providing structural support in the cytoskeletons of cells, and even composing the hair on our heads.

The peptides of the protein can be identical or different. HIV protease is a dimer of two identical subunits, insulin consists of two different chains, and hemoglobin is a tetramer consisting of two identical alpha subunits and two identical beta subunits.

### Naming Quaternary Structures

In naming quaternary structures, the number of subunits (each with its own tertiary structure) and the suffix -mer (Greek for "part, subunit") are used:

• 1 subunit = Monomer
• 2 subunits = Dimer
• 3 subunits = Trimer (These are sometimes viewed as cyclic trimers. For example: aliphatic and cyanic acids)
• 4 subunits = Tetramer

The pattern continues with pent-, hex-, hept-, oct-, and so forth.
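The naming pattern can be written as a short lookup. This is a minimal sketch with a hypothetical function name. Falling back to a numeric form such as "12-mer" for larger assemblies is an assumption made here for illustration.

```python
# Prefixes follow the list above: mono-, di-, tri-, tetra-, then pent-, hex-, hept-, oct-.
OLIGOMER_NAMES = {
    1: "monomer",
    2: "dimer",
    3: "trimer",
    4: "tetramer",
    5: "pentamer",
    6: "hexamer",
    7: "heptamer",
    8: "octamer",
}


def oligomer_name(subunit_count):
    """Return the quaternary-structure name for a given number of subunits."""
    if subunit_count < 1:
        raise ValueError("a protein needs at least one subunit")
    # Assumed fallback for counts beyond the table, e.g. "24-mer".
    return OLIGOMER_NAMES.get(subunit_count, f"{subunit_count}-mer")
```

For example, `oligomer_name(4)` gives the name used for hemoglobin, and `oligomer_name(24)` gives a numeric form.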

### Dimers

Computer-generated image of insulin hexamers highlighting the threefold symmetry, the zinc ions holding it together, and the histidine residues involved in zinc binding.
• Insulin
  • Dimer – alpha chain and beta chain
  • Linked by 2 disulfide bridges
• HIV Protease
  • Dimer
  • Composed of identical subunits

### Trimer

• Collagen
  • Composed of 3 helical polypeptide chains
  • Glycine appears at every third residue because there is no space in the center of the helix
  • Stabilized by steric repulsion of the pyrrolidine rings of the proline and hydroxyproline residues
  • Hydrogen bonds hold together the strands of the collagen fibers

### Tetramer

Structure of human hemoglobin. The protein's α and β subunits are in red and blue, and the iron-containing heme groups in green. From PDB 1GZX Proteopedia Hemoglobin
• Hemoglobin
  • Consists of 2 alpha and 2 beta subunits
  • Has a globular shape
  • Has reverse turns that contribute to the circular shape of the protein
• Aquaporin
  • Made of 6 alpha helices
  • Forms hydrophobic loops
  • Forms tetramers in the cell membrane, with each monomer acting as a water channel

### Breaking Apart the Quaternary Structure

The quaternary structure of a protein can be denatured by breaking the covalent and non-covalent forces that keep it together. Heat, urea or guanidinium chloride will denature a protein by disrupting the non-covalent forces, while beta-mercaptoethanol will break disulfide bridges by reducing the bridges.

### Protein Folding

A protein is never "half folded". At the midpoint of the transition, where the denaturant concentration lies between the values at which the protein is fully folded and fully unfolded, only two structures exist, folded and unfolded, at a ratio of 1:1.

Proteins are either folded, or not. There does not exist a stage where a protein is "half-folded". This can be observed by slowly adding denaturant to a protein. This will result in a sharp transition, from the folded state to the unfolded state, suggesting there only exist these two forms. This is a result of cooperative transition.

For instance, if a protein is put in a denaturant where only one part of the protein is unstable, the entire protein will unfold. This is due to the domino effect where destabilizing one part of the protein will in turn destabilize the remainder of the structure. When a protein is in conditions which correspond to the middle of the transition between folded and unfolded, there is a 50/50 mixture of folded and unfolded protein, instead of 'half-folded' protein.
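The sharp, two-state transition can be illustrated numerically. The sketch below assumes a linear dependence of the unfolding free energy on denaturant concentration, which is a common simplifying model and is not stated in this text. The function name and the parameter values (20 kJ/mol stability in water, 5 kJ/mol per molar denaturant, 298 K) are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def fraction_unfolded(denaturant_molar, dg_water=20000.0, m_value=5000.0, temp_k=298.0):
    """Fraction of molecules unfolded in a two-state (folded/unfolded) model.

    Assumes dG_unfold = dg_water - m_value * [denaturant] (illustrative values).
    """
    dg_unfold = dg_water - m_value * denaturant_molar
    k_unfold = math.exp(-dg_unfold / (R * temp_k))  # K = [unfolded] / [folded]
    return k_unfold / (1.0 + k_unfold)
```

With these assumed parameters the midpoint lies at 4 M denaturant, where exactly half of the molecules are folded and half are unfolded. Well below the midpoint nearly all molecules are folded, and well above it nearly all are unfolded.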

Although a protein is observed in one state or the other, there must be intermediate structures between them at the atomic level. This area is still under development, and much research is still being done. Theories such as the nucleation-condensation model address this aspect of protein folding.

The properties of quaternary structure are: 1. Polypeptide chains can assemble into multisubunit structure. 2. It refers to the spatial arrangement of subunits and the nature of their interactions.

### Analogy

If one takes each student in a class to be a different amino acid, each right hand to be an alpha-carboxyl group, each left hand to be an alpha-amino group, and the head to be the R group; then by joining right hands to left hands, the class will form a polypeptide. The "bonds" joining the hands will be peptide bonds. This can be considered the primary structure of a protein.

If one then "attracts" each student to the student 4 "bonds" away, the chain will fold into a secondary structure, namely the alpha-helix. If the students were instead arranged in lines and attracted to the corresponding students in another line, they would form a beta-pleated sheet.

Now imagine that the heads, or R groups, vary in personality, or polarity, and that like attracts like. The people who are more compatible will then gather together. For instance, hydrophobic areas will usually gather in the center, surrounded by hydrophilic areas. This makes up the tertiary structure.

Now add a different class. The people from the new class have their own tertiary structure, and they then come in and interact with the original class to form a quaternary structure.

### Human attempt to manipulate protein assemblies (Quaternary Structures)

Controlling quaternary structure is attracting growing interest in academic research. There are many advantages to manipulating protein assemblies. Firstly, people are able to grow or synthesize enzymes that are beneficial to humans, yet getting these enzymes to work is the hard part. For example, nitrogenase, the enzyme that fixes nitrogen gas to yield ammonia, works only in an anaerobic environment and requires ATP as an energy source. In addition, researchers have shown that nitrogenase is composed of two proteins: one handles ATP coupling and supplies electrons, and the other contains the reactive center for nitrogen fixation. The two proteins assemble to work as a whole. Recently, scientists removed the ATP-coupling protein and replaced it with a ruthenium complex, which turned out to provide electrons upon exposure to light. Scientists therefore no longer have to deal with the complicated chemistry of ATP coupling and can simply shine light on the engineered nitrogenase to make it work. Secondly, protein assemblies can have many clinical and materials applications. Ferritins are a family of high-order protein assemblies, usually 12-mers or 24-mers. Previous research showed that ferritin can absorb large amounts of iron ions. Many researchers are working to control the association and dissociation of ferritins, seeking solutions for drug delivery, gas storage, metal harvesting, and other uses. Many approaches have been developed to control protein assembly; some of them follow.

1. Transition metal-directed. Metal centers in proteins are important not only because they are reactive centers but also because they help stabilize the shape of the protein through coordination. Many amino acids are ligands in themselves; cysteine, histidine, and lysine are common ones. In addition, researchers can attach inorganic ligands to proteins by cysteine substitution. Introducing inorganic ligands thus greatly broadens the range of possible protein assemblies.

the structure of Phenanthroline (inorganic ligand).
the structure of Terpyridine (inorganic ligand).

Metal-ligand bonding has several useful properties. Most obviously, it is a strong interaction: stronger than a hydrogen bond but weaker than a covalent bond. The metal-ligand bond is therefore strong yet still reversible. Spatially, metals have preferred coordination geometries, mostly octahedral and tetrahedral. This property makes it convenient to arrange proteins in space.

shown is the cartoon model of a dimer of two terpyridine-labeled proteins.
shown is the cartoon model of a trimer of three phenanthroline-labeled proteins.

2. Hydrophobic interaction. In an aqueous environment, amino acids with hydrophobic side chains tend to aggregate to minimize their exposure to water. Researchers exploit this behavior by engineering matching pairs of non-polar amino acids onto proteins to obtain protein oligomers in aqueous solution.

3. Salt bridges. Amino acids have different pI values, so at a given pH some amino acids are negatively charged and some are positively charged. If one area of a protein is occupied mostly by negatively charged amino acids and another area by positively charged amino acids, proteins can aggregate by electrostatic attraction. However, this technique is usually not very selective.

More techniques for directing protein assembly, such as coiled coils, are being investigated. The ability to control quaternary structures is promising.

## Overview

In most archaebacteria, a protein coat is the primary structure that surrounds and shapes the cell. This coat of protein armor is composed of a paracrystalline array of “surface layer proteins.”

Half a million surface layer proteins line up next to each other to form a shell that encloses the cell. Inside the shell, they bind to sugar chains on the cell surface, or in the case of archaebacteria, interact directly with the membrane. This protein coat provides protection, and it can also assist in the gathering of nutrients and attachment to targets in the environment.

## Reference

1. Kern, J. et al. Structure of surface layer homology (SLH) domains from Bacillus anthracis surface array protein. J. Biol. Chem. 286, 26041-26049 (2011)

Protein folding is the process by which a polypeptide folds into a specific, stable, functional, three-dimensional structure, that is, the process by which a protein assumes its functional shape or conformation.

Proteins are formed from long chains of amino acids; they exist in an array of different structures which often dictate their functions. Proteins follow energetically favorable pathways to form stable, orderly, structures; this is known as the proteins’ native structure. Most proteins can only perform their various functions when they are folded. The proteins’ folding pathway, or mechanism, is the typical sequence of structural changes the protein undergoes in order to reach its native structure. Protein folding takes place in a highly crowded, complex, molecular environment within the cell, and often requires the assistance of molecular chaperones, in order to avoid aggregation or misfolding. Proteins are comprised of amino acids with various types of side chains, which may be hydrophobic, hydrophilic, or electrically charged. The characteristics of these side chains affect what shape the protein will form because they will interact differently intramolecularly and with the surrounding environment, favoring certain conformations and structures over others. Scientists believe that the instructions for folding a protein are encoded in the sequence. Researchers and scientists can easily determine the sequence of a protein, but have not cracked the code that governs folding (Structures of Life 8).

## Protein Folding Theory and Experiment

Early scientists who studied proteins and their structure speculated that templates directed proteins into their native conformations. This theory prompted a search for how proteins fold to attain their complex structures. It is now well known that under physiological conditions, proteins normally fold spontaneously into their native conformations. A protein's primary structure is therefore valuable, since it determines the protein's three-dimensional structure. Most biological structures do not need external templates to help with their formation and are thus called self-assembling.

### Protein Renaturation

Protein renaturation has been known since the 1930s. However, it was not until 1957, when Christian Anfinsen performed an experiment on bovine pancreatic RNase A, that protein renaturation was quantified. RNase A is a single-chain protein consisting of 124 residues. In a solution of 8 M urea containing 2-mercaptoethanol, RNase A is completely unfolded and its four disulfide bonds are cleaved through reduction. After the urea is removed by dialysis and the solution is exposed to O2 at pH 8, the resulting enzymatically active protein is physically indistinguishable from native RNase A. This experiment therefore demonstrated that the protein renatured spontaneously.

One criterion for the renaturation of RNase A is that its four disulfide bonds reform. The probability that one of the eight Cys residues of RNase A randomly forms a disulfide bond with its native partner, out of the seven other Cys residues, is 1/7. Likewise, the probability that one of the remaining six Cys residues then randomly forms its native disulfide bond is 1/5, and so on. The probability of RNase A reforming all four native disulfide links at random is therefore 1/7 × 1/5 × 1/3 × 1/1 = 1/105. Because renaturation nevertheless restores the native protein, the formation of the disulfide bonds in RNase A is not a random process.
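The 1/105 figure can be checked by counting the ways eight cysteines can be paired into four disulfide bonds. The sketch below computes the count from the product 7 × 5 × 3 × 1 and cross-checks it by brute-force enumeration. The function names and the residue labels are hypothetical placeholders, not actual RNase A positions.

```python
from fractions import Fraction


def count_pairings(n_cys):
    """Number of ways to pair n_cys cysteines into n_cys/2 disulfide bonds."""
    if n_cys % 2:
        raise ValueError("an even number of cysteines is required")
    total = 1
    for choices in range(n_cys - 1, 0, -2):  # 7, 5, 3, 1 for eight cysteines
        total *= choices
    return total


def enumerate_pairings(items):
    """Yield every complete pairing of the given items (brute force)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


def random_native_probability(n_cys):
    """Probability that random pairing reproduces the single native pattern."""
    return Fraction(1, count_pairings(n_cys))
```

For eight cysteines both approaches give 105 possible pairings, so only one random pairing in 105 matches the native pattern.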

When RNase A is reoxidized in 8 M urea, so that the disulfide bonds reform while the polypeptide chain is a random coil, it is only about 1 percent enzymatically active after the urea is removed. However, in the presence of 2-mercaptoethanol the protein can be made fully active again, as disulfide bond interchange reactions occur and the protein returns to its native state. The native state of RNase A is thermodynamically stable under physiological conditions. Any conformation more stable than the native state would lie behind a larger activation barrier and would therefore be kinetically inaccessible.

With the enzyme protein disulfide isomerase (PDI), the time it takes for scrambled RNase A to renature is reduced to about 2 minutes. This enzyme facilitates the disulfide interchange reactions. For PDI to be active, its two active-site Cys residues need to be in the -SH form. PDI catalyzes the random cleavage and reformation of the protein's disulfide bonds as the protein attains thermodynamically favorable conformations.

### Posttranslationally Modified Proteins Might Not Renature

Proteins in a "scrambled" state are renatured by PDI, whereas native proteins are unaffected by PDI because they are already in their stable conformations. However, proteins that are posttranslationally modified may need their disulfide bonds to stabilize an otherwise unstable native form. One example is insulin, a polypeptide hormone. This 51-residue polypeptide has two chains linked by two disulfide bonds and is inactivated by PDI. This observation led scientists to find that insulin is made from proinsulin, an 84-residue single chain. The disulfide bonds of proinsulin need to be intact before it is converted to insulin through proteolytic excision of its C chain, an internal 33-residue segment. However, two findings indicate that the C chain does not dictate the folding of the A and B chains but instead holds them together to allow formation of the disulfide bonds. First, under the right renaturing conditions, scrambled insulin can return to its native form with a 30% yield, and this yield can be increased if the A and B chains are cross-linked. Second, analysis of proinsulin sequences from many species shows that mutations are accepted in the C chain eight times more often than in the A and B chains.

## Determinants of Protein Folding

There are various interactions that help stabilize the structures of native proteins. It is important to examine how the interactions that form protein structures are organized. In addition, only a small number of possible polypeptide sequences allow for a stable conformation. It is therefore evident that specific sequences have been selected through evolution in biological systems.

## Helices and Sheets Predominate in Proteins because They Efficiently Fill Space

On average, about sixty percent of a protein consists of alpha helices and beta pleated sheets. Hydrophobic interactions allow the protein to achieve a compact nonpolar core, but they lack the specificity to restrict polypeptides to particular conformations. Polypeptide segments in the coil form have no less hydrogen bonding than alpha helices and beta pleated sheets. This observation shows that the conformations available to polypeptides are not limited by hydrogen bonding requirements. Ken Dill has suggested that helices and sheets arise as a result of steric hindrance in condensed polymers. Experiments and simulations of the conformations of simple flexible chains show that the proportion of beta pleated sheets and alpha helices increases as the level of compaction of the chains increases. It can therefore be concluded that helices and sheets are important in the complex structure of a protein because they are compact. The coupling of different forces such as hydrogen bonding, ion pairing, and van der Waals interactions further aids the formation of alpha helices and beta sheets.

## Protein Folding is Directed by Internal Residues

By investigating protein modification, the role of different classes of amino acid residues in protein folding can be determined. For example, in one study the free primary amino groups of RNase A were derivatized with poly-DL-alanine chains of 8 residues each. The poly-Ala chains are large and water-soluble, yet all 11 of the RNase's free amino groups can be derivatized without interfering with the native structure of the protein or its ability to refold. Because the free amino groups of RNase A are located on the exterior, it can be concluded that the protein's internal residues direct its native conformation. Furthermore, studies have shown that mutations of surface residues are common and are less likely to change the protein conformation than changes to internal residues. This finding suggests that protein folding is driven mainly by hydrophobic forces.

## Protein Structures Are Hierarchically Organized

George Rose demonstrated that protein domains consist of subdomains, which in turn consist of sub-subdomains, and so on. As a result, it is evident that large proteins have domains that are continuous, compact, and physically separable. If a polypeptide segment within a native protein is pictured as a tangled string, a plane can be found that cuts the string into two segments. This can be illustrated by coloring the first n/2 residues of an n-residue domain red and the last n/2 residues blue, and the process can then be repeated on each half. At every stage, the red and blue regions of the protein do not interpenetrate. An X-ray structure of HiPIP (high potential iron protein), with the first and last n/2 residues of the n-residue protein colored red and blue, illustrates this. In the subsequent structures, shown in the second and third rows, the n/2 splitting is repeated: the first and last halves of each fragment are colored red and blue while the rest of the chain is colored gray. This example shows that protein structures are organized hierarchically, meaning that polypeptide chains consist of subdomains that are themselves compact structures and that interact with adjacent structures. These interactions form a larger, well-organized structure, largely through hydrogen bonding, and this hierarchy plays an important role in understanding how polypeptides fold to form their native structure.

Since the side chains inside globular proteins fit together with a high degree of complementarity, their packing density can approach that of organic crystals. To test whether this high packing density is an important factor in determining protein structure, Eaton Lattman and George Rose examined whether any side-chain interactions are preferred in globular proteins. They analyzed a total of 67 well-studied structures of globular proteins and concluded that there were no preferred interactions. This result demonstrated that packing does not direct the native fold; instead, the native fold is necessary for the packing of a globular protein. This notion is further supported by the fact that members of a protein family adopt the same fold despite their lack of sequence similarity and their distant relationships.

In addition, structural data have shown that there are a variety of ways in which a protein's internal residues can pack together efficiently. In an extensive study by Brian Matthews of T4 lysozyme, which is produced by bacteriophage T4, changes in the residues of the enzyme caused only local shifts and did not result in any global structural change. The following link gives an X-ray view of T4 lysozyme and a brief biochemical description of the structure. Matthews compared over 300 different mutants of the 164-residue T4 lysozyme with one another. It was also observed that T4 lysozyme could withstand insertions of about 4 residues without any major change to the overall protein structure or to enzyme activity. Furthermore, assays showed that only 173 of the 2015 single-residue substitutions made in T4 lysozyme significantly diminished enzymatic activity. These experiments show that protein structures are extremely resilient.

Levinthal's paradox is a thought experiment, also constituting a self-reference in the theory of protein folding. In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations. An estimate of 3^300 or 10^143 was made in one of his papers.

The Levinthal paradox observes that if a protein were to fold by sequentially sampling all possible conformations, it would take an enormous amount of time to do so, even if the conformations were sampled at a rapid rate. Based upon the observation that proteins fold much faster than this, Levinthal then proposed that a random conformational search does not occur, and the protein must, therefore, fold through a series of meta-stable intermediate states.

In 1969 Cyrus Levinthal calculated that if a protein were to randomly sample every possible conformation as it folded from the unfolded state to the native state, it would take an astronomical amount of time, even if the protein sampled 100 billion conformations per second. This observation came to be known as the Levinthal paradox. Observing that proteins fold in a relatively short amount of time, Levinthal proposed that proteins fold in a fixed and directed process. We now know that while protein folding is not a random process, there does not seem to be a single fixed protein folding pathway. The paradox clearly reveals that proteins do not fold by trying every possible conformation. Instead, they must follow an at least partly defined folding pathway made up of intermediates between the fully denatured protein and its native structure.
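The scale of the paradox can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes 3 conformations per residue and 300 such units (the combination that gives 3^300, roughly 10^143), together with the sampling rate of 100 billion conformations per second mentioned above. The variable and function names are illustrative choices, and the calculation works in base-10 logarithms because the numbers themselves are too large to be meaningful as floats.

```python
import math

STATES_PER_UNIT = 3        # assumed number of conformations per residue
N_UNITS = 300              # exponent in the 3^300 estimate
SAMPLING_RATE = 1e11       # conformations per second (100 billion)
SECONDS_PER_YEAR = 3.156e7


def log10_conformations(states, units):
    """log10 of the number of conformations, states**units."""
    return units * math.log10(states)


def log10_search_years(states, units, rate):
    """log10 of the years needed to visit every conformation once."""
    log_seconds = log10_conformations(states, units) - math.log10(rate)
    return log_seconds - math.log10(SECONDS_PER_YEAR)


log_conf = log10_conformations(STATES_PER_UNIT, N_UNITS)
log_years = log10_search_years(STATES_PER_UNIT, N_UNITS, SAMPLING_RATE)

print(f"conformations ~ 10^{log_conf:.0f}")
print(f"exhaustive search ~ 10^{log_years:.0f} years")
```

Under these assumptions the exhaustive search would take vastly longer than the age of the universe (on the order of 10^10 years), which is the sense in which a purely random search is ruled out.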

## Cumulative Selection

The way out of the Levinthal paradox is to recognize the power of cumulative selection. Richard Dawkins asked how long it would take a monkey poking randomly at a typewriter to reproduce "Methinks it is like a weasel", Hamlet's remark to Polonius. A large number of keystrokes, of the order of 10^40, would be required. Yet if we suppose that each correct character was preserved, allowing the monkey to retype only the wrong ones, only a few thousand keystrokes, on average, would be needed. The crucial difference between these scenarios is that the first uses a completely random search, whereas in the second, partly correct intermediates are retained. This suggests that the essence of protein folding is the tendency to retain partly correct intermediates, although the protein-folding problem is much more difficult than the Shakespeare example above.
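The contrast between the two search strategies can be sketched in a short simulation. This is a simplified model, not Dawkins's own program: it assumes a 27-key typewriter (A to Z plus space), an upper-case target phrase, and a monkey that retypes every wrong position once per round while correct characters are kept. The function names are illustrative.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"    # 28 characters
ALPHABET = string.ascii_uppercase + " "    # assumed 27-key typewriter


def possible_phrases(target=TARGET, alphabet=ALPHABET):
    """Number of distinct strings a purely random search must choose among."""
    return len(alphabet) ** len(target)


def cumulative_keystrokes(target=TARGET, alphabet=ALPHABET, rng=None):
    """Keystrokes needed when every correct character is kept once typed."""
    rng = rng or random.Random()
    current = [rng.choice(alphabet) for _ in target]   # first full attempt
    keystrokes = len(target)
    while True:
        wrong = [i for i, ch in enumerate(current) if ch != target[i]]
        if not wrong:
            return keystrokes
        for i in wrong:                                # retype only the wrong ones
            current[i] = rng.choice(alphabet)
            keystrokes += 1


rng = random.Random(0)                                 # fixed seed for repeatability
trials = [cumulative_keystrokes(rng=rng) for _ in range(200)]
mean_keystrokes = sum(trials) / len(trials)

print(f"random search space: about {possible_phrases():.1e} phrases")
print(f"cumulative selection: about {mean_keystrokes:.0f} keystrokes on average")
```

In this particular model each position needs 27 tries on average, so the expected total is about 28 × 27 = 756 keystrokes. The exact figure depends on how the retyping rule is defined, but under any such rule it is dozens of orders of magnitude smaller than the random-search space.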

## Nucleation-Condensation model

In order to correctly understand the protein-folding problem, we must consider certain characteristics of proteins. Proteins are only marginally stable: the free-energy difference between the folded and the unfolded states of a typical 100-residue protein is 42 kJ mol−1, and thus each residue contributes on average only 0.42 kJ mol−1 of energy to maintain the folded state. This amount is less than the thermal energy, which is 2.5 kJ mol−1 at room temperature. This meagre stabilization energy means that correct intermediates, especially those formed early in folding, can be lost. The interactions that lead to cooperative folding, nonetheless, can stabilize intermediates as structure builds up. Thus, local regions that have a significant structural preference, though not necessarily stable on their own, will tend to adopt their favored structures and, as they form, can interact with one another, resulting in increased stabilization. The nucleation-condensation model refers to this conceptual framework for solving the protein-folding problem.
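The comparison between per-residue stabilization and thermal energy is simple arithmetic, sketched below. The total of 42 kJ mol−1 and the per-residue value of 0.42 kJ mol−1 come from the text and together imply a chain of about 100 residues; the room temperature of 298 K is an assumption, and the thermal energy is taken as RT.

```python
R = 8.314                  # gas constant, J mol^-1 K^-1
ROOM_TEMPERATURE = 298.0   # K, assumed value for "room temperature"

TOTAL_STABILITY = 42.0     # kJ mol^-1, folded vs unfolded (from the text)
N_RESIDUES = 100           # chain length implied by 42 / 0.42

per_residue = TOTAL_STABILITY / N_RESIDUES         # kJ mol^-1 per residue
thermal_energy = R * ROOM_TEMPERATURE / 1000.0     # RT in kJ mol^-1

print(f"per-residue stabilization: {per_residue:.2f} kJ/mol")
print(f"thermal energy RT:         {thermal_energy:.2f} kJ/mol")
print(f"ratio RT / per-residue:    {thermal_energy / per_residue:.1f}")
```

The thermal energy is several times larger than the average contribution of a single residue, which is why no individual residue can hold its native position on its own and why stabilization has to be cooperative.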

## Intramolecular Interactions Role in the Folding Mechanism

Protein folding produces energetically favorable structures stabilized by the clustering of hydrophobic groups, hydrogen bonding, and van der Waals forces between amino acids. Folding first forms secondary structures, such as alpha helices, beta sheets, and loops. Different amino acids have different tendencies to form alpha helices, beta sheets, or beta turns, depending on the polarity of the amino acid and on rotational barriers. For example, the amino acids valine, threonine, and isoleucine tend to destabilize alpha helices because of steric hindrance, and so they favor beta sheets over alpha helices. The relative frequencies of the amino acids in secondary structures are grouped according to their preferences for alpha helices, beta sheets, or turns (Table 1).

Table 1: Relative frequencies of amino acid residues in secondary structures

These secondary structures in turn fold to form tertiary structures, stabilized by the formation of intramolecular hydrogen bonds. Covalent bonding may also occur during folding to a tertiary structure, through the formation of disulfide bridges or metal clusters. According to Robert Pain’s “Mechanisms of Protein Folding”, molecules also often pass through an intermediate “molten globule” state formed by a hydrophobic collapse (in which the hydrophobic side chains suddenly move inside the protein or clump together) before reaching their native conformation. However, this collapse buries main-chain NH and CO groups in a non-polar environment, although they prefer an aqueous one, so the secondary structures must fit together very well for the stabilization from hydrogen bonding and van der Waals interactions to outweigh their hydrophilic tendencies. The strengths of hydrogen bonds in a protein vary depending on their position in the structure; H-bonds formed in the hydrophobic core contribute more to the stability of the native state than H-bonds exposed to the aqueous environment.

Water-soluble proteins fold into compact structures with non-polar, hydrophobic cores. The interior of the protein contains non-polar residues (e.g. leucine, valine, methionine, and phenylalanine), while the exterior contains primarily polar and charged residues (e.g. aspartate, glutamate, lysine, and arginine). In this way the polar and charged residues can interact with the surrounding water molecules while the hydrophobic residues are shielded from the aqueous surroundings. Minimizing the number of hydrophobic side chains on the outer part of the structure makes the protein structure thermodynamically more favorable, because hydrophobic groups prefer to cluster together when surrounded by an aqueous environment (the hydrophobic effect). Proteins that span biological membranes (e.g. porin) have an inside-out distribution relative to water-soluble proteins: their outer surfaces are covered with hydrophobic residues, and their water-filled centers are lined with charged and polar amino acids.

## Folding of Membrane Proteins

In “Folding Scene Investigation: Membrane Proteins”, a paper written by Paula J Booth and Paul Curnow, the authors attempt to explain how integral membrane proteins with α helical structures fold. Studying the folding of membrane proteins has always been difficult, as these proteins are generally large and made of more than one subunit. The proteins possess a high degree of conformational flexibility, which is necessary for them to perform their function in the cell. These proteins also have both hydrophobic surfaces, facing the membrane, and hydrophilic surfaces, facing the aqueous regions on either side of the membrane. The proteins move laterally and share the elastic properties of the lipid bilayer in which they are embedded. In order to study these proteins, Booth and Curnow believe that one must manipulate the lipid bilayer and combine kinetic and thermodynamic methods of investigation.

Reversible Folding and Linear Free Energy: The free energy of protein folding is measured by reversible chemical denaturation. The reversible folding of a protein depends on this free energy. For the α helical proteins studied, folding was shown to follow a reversible, two-state process. bR (bacteriorhodopsin, an α helical membrane protein) reversibly unfolds when SDS (an anionic detergent that acts as a denaturant) is added to mixed lipid/detergent micelles. The two-state reaction involves a partly unfolded SDS state and a folded bR state. Plotting the logarithms of the unfolding and folding rates against the SDS mole fraction gave a linear plot, indicating a linear relationship. This plot showed that bR is unexpectedly stable outside of its membrane. In fact, bR was so stable outside of the membrane that it would not unfold within a reasonable period of time without the addition of denaturant.

Comparison with Water-Soluble Proteins: Booth and Curnow studied the three membrane proteins about which the most information is available: bR, DGK (Escherichia coli diacylglycerol kinase), and KcsA (Streptomyces lividans potassium channel). These three membrane proteins were compared to water-soluble proteins (which fold by 2- or 3-state kinetics). The overall free energy change of unfolding in the absence of denaturant was the same for water-soluble proteins and membrane proteins of similar size. This suggests that it is the balance of weak forces, rather than the types of forces that stabilize the protein, that determines its stability. The H-bonds in the membrane proteins were found to be of similar strength to those of the water-soluble proteins, rather than stronger as had been expected.

Mechanical Strength and Unfolding Under Applied Force: Dynamic force microscopy can be used to measure the mechanical response of a particular region of a protein under applied force. The unfolding force in this case depends on the activation barrier and is unrelated to the thermodynamic stability of the protein. For unfolding under applied force, the membrane proteins (especially bR) seem to follow Hammond behavior: as the energy difference between two consecutive states of the reaction is reduced, the states become more similar in structure.

Influence of Surrounding Membrane: Membrane proteins are greatly influenced by the membranes that surround them. If lipids are incorporated into detergent micelles, increasing the stability of the lipid structure, both the protein and its folding are stabilized. Different combinations of lipids can result in different stabilities or folding of membrane proteins. The size of the membrane can also affect the membrane protein. Different types of lipids give rise to different membrane properties. PE lipids have a higher spontaneous curvature than PC lipids, so adding PE lipids to PC lipids increases the monolayer curvature of the bilayer. Increasing the curvature of the lipid bilayer increases the stability of the folded protein.

## Protein translocation in biological membranes

In mitochondria, proteins made on ribosomes are taken in directly from the cytosol. Mitochondrial proteins are first completely synthesized in the cytosol as mitochondrial precursor proteins and are then taken up into the mitochondria. Mitochondrial proteins contain a specific signal sequence at their N terminus. These signal sequences are often removed after import, but proteins destined for the outer membrane, the inner membrane, or the intermembrane space also have internal sequences that play a major role in translocation within the inner membrane.

Protein translocators play a major role in moving proteins across the mitochondrial membranes. Four major multi-subunit protein complexes are found in the outer and the inner membrane. TOM complexes are found in the outer membrane, and two types of TIM complexes are found integrated within the inner membrane: TIM23 and TIM22. The complexes act as receptors for the mitochondrial precursor proteins.

TOM: imports all nucleus-encoded proteins. It initiates the transport of the signal sequence into the intermembrane space and inserts transmembrane proteins into the outer membrane. A beta-barrel complex called the SAM complex is then in charge of properly folding the protein in the outer membrane. TIM23, found in the inner membrane, mediates the transport of soluble proteins into the matrix and facilitates the insertion of transmembrane proteins into the inner membrane. TIM22, another inner membrane complex, facilitates the insertion of a class of inner membrane proteins that includes the transporters that move ADP, ATP, and phosphate across the mitochondrial membranes. OXA, yet another inner membrane complex, helps insert inner membrane proteins that were synthesized within the mitochondria, as well as inner membrane proteins that were first transported into the matrix space.

## Folding on Ribosome

The place where the protein chain begins to fold is a widely studied topic. As the nascent chain passes through the “exit tunnel” of the ribosome and into the cellular environment, when does the chain begin to fold? The idea of cotranslational folding in the ribosomal tunnel is discussed here. The nascent chain of the protein is bound to the peptidyl transferase centre (PTC) at its C terminus and emerges in a vectorial manner. The tunnel is very narrow and enforces a certain rigidity on the nascent chain, and with the addition of each amino acid the conformational space of the protein increases. Cotranslational folding can greatly reduce the possible conformational space by helping the protein to acquire a significant degree of native-like structure while it is still in the ribosomal tunnel. The length of the chain can also give a good indication of its three-dimensional structure. Smaller chains tend to favor beta sheets, while longer chains (like those reaching 119 out of 153 residues) tend to favor the alpha helix.

The ribosomal tunnel is more than 80 Å in length, and its width is around 10-20 Å. Inside the tunnel are auxiliary molecules, such as the L23, L22, and L4 proteins, that interact with the nascent chain and help with folding. The tunnel also has hydrophilic character, which helps the nascent chain to travel through it without being hindered. Although rigid, the tunnel is not a passive conduit, but whether or not it has the ability to promote protein folding is unknown. A recent cryoEM experiment has shown that there are folding zones in the tunnel. At the exit port (some 80 Å from the PTC), the nascent chain has assumed a preferred low-order conformation. This reinforces the suggestion that the chain can have degrees of folding at certain regions. Although some low-order folding can occur, the adoption of the native state occurs outside the tunnel, but not necessarily only once the nascent chain has been released. The ribosome-bound nascent chain (RNC) adopts a partially folded structure, and in a crowded cellular environment this can cause the chain to self-associate. This self-association, however, is relieved by the staggered arrangement of the ribosomes, which maximizes the distance between the RNCs.

Generation of RNC for studies:

One technique for generating RNCs and taking snapshots of the chain as it emerges from the tunnel is to arrest translation. A truncated DNA without a termination sequence is used, which allows the nascent chain to remain bound for as long as desired. To identify the residues of the chain, they can be labeled with carbon-13 or nitrogen-15 and later detected by NMR spectroscopy. Another technique is the PURE method, which contains the minimal components required for translation. This method has been used to study the interaction of the chains with auxiliary molecules like the TF chaperone, and it is coupled with the quartz-crystal microbalance technique to analyze the synthesis by mass. An in vivo technique for generating RNCs involves stimulating their production at high cell density. This is initially done in an unlabeled environment, and the cells are then transferred to a labeled medium. The RNC is generated by SecM, purified by affinity chromatography, and detected by SDS-PAGE or immunoblotting.

By generating the RNCs, many experiments can be done to study more about the emerging nascent chain. As mentioned above, the chain emerges from the exit tunnel in a vectorial manner. This enables the chain to sample the native folding and increases the probability of folding to the native state. Along with this vectorial folding, chaperones also help in favorable folding rates and correct folding.

## Protein Folding in the Endoplasmic Reticulum

Protein Entering the Mammalian ER: The endoplasmic reticulum (ER) is a main checkpoint for protein maturation, ensuring that only correctly folded proteins are secreted and delivered to their site of action. Entry of a protein into the ER begins with recognition of an N-terminal signal sequence. Specifically, this sequence is detected by the signal recognition particle (SRP), causing the ribosome/nascent chain/SRP complex to bind to the ER membrane. The nascent chain then passes through a proteinaceous pore called the Sec61 translocon, which allows the polypeptide chain to enter the lumen of the ER.

Processes in Conflict During Protein Folding: After the protein enters the ER, it populates an ensemble of folding intermediates. These intermediates take one of three routes: they are folded properly and exported from the ER, they aggregate, or they are selected for degradation. These three processes compete with one another. In order for a protein to be properly secreted, the competition between folding, aggregation, and degradation must favor folding, so that folding occurs faster than the other processes. This balance is termed proteostasis. The balance of proteostasis can be tipped in favor of folding either by using small molecules that stabilize the protein (called co-factors) or by increasing the concentrations of folding factors. This ability to control proteostasis may allow scientists to overcome some protein folding diseases, such as cystic fibrosis.
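The idea that folding must outpace aggregation and degradation can be made concrete with a minimal kinetic-partitioning sketch. It assumes, purely for illustration, that the three processes behave as parallel first-order reactions drawing on the same pool of intermediates, so that the fraction of protein taking each route is its rate constant divided by the sum of all three. Real aggregation is generally concentration dependent, and the rate constants below are hypothetical numbers, not measured values.

```python
def partition_fractions(k_fold, k_aggregate, k_degrade):
    """Fraction of intermediates taking each route, for parallel first-order steps."""
    total = k_fold + k_aggregate + k_degrade
    return {
        "folded": k_fold / total,
        "aggregated": k_aggregate / total,
        "degraded": k_degrade / total,
    }


# Hypothetical rate constants in arbitrary units (illustration only).
baseline = partition_fractions(k_fold=1.0, k_aggregate=0.5, k_degrade=0.5)

# Tipping the balance: assume extra folding factors triple the folding rate.
boosted = partition_fractions(k_fold=3.0, k_aggregate=0.5, k_degrade=0.5)

for label, result in (("baseline", baseline), ("boosted", boosted)):
    summary = ", ".join(f"{route} {value:.2f}" for route, value in result.items())
    print(f"{label}: {summary}")
```

In this toy model, raising the folding rate increases the folded fraction at the expense of both competing routes, which is the qualitative effect described for stabilizing small molecules and additional folding factors.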

The proteins that are folded properly are ready for anterograde transport and are exported from the ER with the help of a cargo receptor that recognizes the properly folded protein. The proteins that are incorrectly folded are not secreted and are either targeted for degradation or aggregated. The aggregated proteins are able to re-enter the ensemble of folding intermediates so that they may try again to fold properly.

Protein Folding in ER

Folding Factors in the Endoplasmic Reticulum:

Biochemical research on folding pathways has provided a comprehensive list of folding factors, or chaperones, involved with protein folding in the ER. Folding factors are categorized based on whether they catalyze certain steps or if they interact with intermediates in the folding pathway. General protein folding factors are typically separated into four different groups: heat shock proteins as chaperones or cochaperones, peptidyl prolyl cis/trans isomerases (PPIases), oxidoreductases, and glycan-binding proteins.

Many folding factors are multi-functional: one folding factor can act on different parts of the folding pathway. This leads to redundancy, because different classes of proteins carry out overlapping functions. This functional redundancy complicates the understanding of the specific roles of individual folding factors in aiding the maturation of client proteins. Folding factors also tend to act in concert during the maturation process, which further obscures the individual role of each factor. Since these roles are not clear, it is difficult to confirm that a folding factor that handles a particular reaction in one protein will carry out the same function in another.

In addition to aiding non-covalent folding and unfolding of proteins, folding factors in the ER sometimes delay their interactions with the protein. This allows time for nascent proteins to fold properly and enables folded proteins to backtrack along their folding pathway, which prolongs the equilibrium in a less folded state and prevents the protein from being held in a non-native state.

Folding after Endoplasmic Reticulum: Although the ER releases only correctly assembled proteins for secretion, some examples exist in which proteins change conformation in the Golgi bodies and beyond. Typically, newly folded proteins are sensitive and prone to unfolding while in the ER but resistant to unfolding after exit. In an environment without chaperones and other folding enzymes, proteins are compact and relatively resistant to change after exiting the ER. However, this does not necessarily mean that protein folding ends, because some molecular chaperones like Hsp 70s and Hsp 90s continue to assist in protein conformation throughout the protein’s existence.

Folding Factors in the ER and their Functions

## Techniques for Studying Protein Folding

A strategy for studying the folding of proteins is to unfold the protein molecules in high concentrations of a chemical denaturant like guanidinium chloride. The solution is then diluted rapidly until the denaturant concentration is lowered to a level where the native state is thermodynamically stable again. Afterwards, the structural changes of the protein as it folds may be observed. In theory, this sounds simple. However, such experiments are complex, since unfolded proteins have random coil states in chemical denaturants. Moreover, analyzing the structural changes taking place in a sample is difficult, since the molecules may all have significantly different conformations until the final stages of the reaction. As such, the analysis would have to be performed in a matter of seconds, rather than the days or weeks that are normally needed to deduce the structure of a single conformation of a native protein. To avoid this problem, the disulphide bonds can be reduced after the protein is unfolded and reformed under oxidative conditions. The protein can then be analyzed by standard techniques such as mass spectrometry to draw conclusions about the structure present at the stages of folding where disulfide bonds are formed.

Multiple techniques are used to monitor structural changes during refolding. For instance, in circular dichroism, far-UV measurements track the appearance of secondary structure during folding, while near-UV measurements monitor the formation of the close-packed environment for aromatic residues. NMR is also a useful technique for characterizing conformations at the level of individual amino-acid residues. It can also be used to monitor how the development of structure protects amide hydrogens from solvent exchange.

Circular Dichroism: This type of spectroscopy measures the absorption of circularly polarized light. Protein structures such as the alpha helix and the beta sheet are chiral and therefore absorb this sort of light, so the absorption indicates the degree to which the protein is folded. The technique can also follow the equilibrium unfolding of a protein by measuring the change in absorption against denaturant concentration or temperature. The denaturant melt measures the free energy of unfolding, while the temperature melt measures the melting point of the protein. This technique is the most general and basic strategy for studying protein folding.
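How a denaturant melt yields a free energy of unfolding can be sketched with the commonly used two-state, linear extrapolation treatment. This is an assumption brought in for illustration, not a method stated in the text: the unfolding free energy is taken to fall linearly with denaturant concentration, ΔG = ΔG(water) − m·[denaturant], and the fraction unfolded follows from the resulting equilibrium constant. The parameter values below are hypothetical, not measured data.

```python
import math

R = 8.314e-3   # gas constant in kJ mol^-1 K^-1


def fraction_unfolded(denaturant, dg_water, m_value, temperature=298.0):
    """Two-state fraction unfolded at a given denaturant concentration (M)."""
    dg_unfold = dg_water - m_value * denaturant    # assumed linear dependence
    return 1.0 / (1.0 + math.exp(dg_unfold / (R * temperature)))


def melt_midpoint(dg_water, m_value):
    """Denaturant concentration at which half of the protein is unfolded."""
    return dg_water / m_value


# Hypothetical parameters for illustration only.
DG_WATER = 42.0   # kJ mol^-1, unfolding free energy without denaturant
M_VALUE = 10.5    # kJ mol^-1 M^-1, sensitivity to denaturant

for conc in (0.0, 2.0, 4.0, 6.0, 8.0):
    f_u = fraction_unfolded(conc, DG_WATER, M_VALUE)
    print(f"[denaturant] = {conc:.1f} M  ->  fraction unfolded = {f_u:.3f}")

print(f"midpoint = {melt_midpoint(DG_WATER, M_VALUE):.1f} M")
```

In practice the analysis runs in the opposite direction: the measured signal is converted to a fraction unfolded at each denaturant concentration, and ΔG(water) and m are obtained by fitting a curve of this form.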

Dual Polarization Interferometry: This technique uses an evanescent wave of a laser beam confined to a waveguide to probe protein layers that have been adsorbed to the surface of the waveguide. Laser light is focused on two waveguides, one that senses the beam and has an exposed surface, and one that is used to create a reference beam and to excite the polarization modes of the waveguides. A two-dimensional pattern is obtained in the far field when the light that has passed through the two waveguides is combined. Measurement of this interferogram can be used to calculate the protein density or fold and the size of the adsorbed layer, and to infer structural information about molecular interactions at subatomic resolution.

Mass Spectrometry: The advantages of using mass spectrometry to study protein folding include the ability to detect molecules with different amounts of deuterium, which allows the heterogeneity of protein folding reactions to be studied. It can also measure the conformation of folding intermediates bound to molecular chaperones without disrupting the complex. Mass spectrometry can also directly compare refolding properties, since mixtures of proteins can be studied without separation if the proteins have sufficiently different molecular weights.

High Time Resolution: These are fast time-resolved techniques where a sample of unfolded protein is triggered to fold rapidly. The resulting dynamics are then studied. Ways to accomplish this include fast mixing of solutions, photochemical methods, and laser temperature jump spectroscopy.

Computational Prediction of Protein Tertiary Structure: This is a distinct form of protein structure analysis in that it involves protein folding. These programs can simulate the lengthy folding processes, provide information on statistical potential, and reproduce folding pathways.

## Protein Misfolding

Protein misfolding refers to the failure of a protein to achieve its tightly packed native conformation efficiently, or the failure to maintain that conformation because of a reduction in stability caused by environmental change or mutation. It has been established that failure of protein folding is a general phenomenon at elevated temperatures and under other stressful circumstances. The two most common fates of misfolded proteins are degradation and aggregation. When a polypeptide emerges from the ribosome, it may fold to the native state, be degraded by proteolysis, or form aggregates with other molecules. Proteins are in constant dynamic equilibrium, so even if the folding process is complete, unfolding in the cellular environment can occur. Unfolded proteins usually refold back into their native states, but if control processes fail, misfolding leads to cellular malfunction and consequently to disease. Diseases associated with misfolding cover a wide array of pathological conditions, such as cystic fibrosis, where mutations in the gene encoding the CFTR protein result in folding to a conformer whose secretion is prevented by quality-control mechanisms in the cell. About 50% of cancers are associated with mutations of the p53 protein that eventually lead to the loss of cell-cycle control and the growth of tumors. Failure of proteins to stay folded can result in aggregation, a common characteristic of a group of genetic, sporadic, and infectious conditions known as amyloidoses. Aggregation usually results in disordered species that can be degraded within the organism, but it may also result in highly insoluble fibrils that accumulate in tissue. There are about twenty known diseases resulting from the formation of amyloid material, including Alzheimer’s, Type II diabetes, and Parkinson’s disease. Amyloid fibrils are ordered protein aggregates that have an extensive beta sheet structure due to intermolecular hydrogen bonds and have a similar overall appearance regardless of the proteins they are derived from. The formation of amyloid fibrils is the result of prolonged exposure to at least partially denaturing conditions.

An abnormal number of plaques and tangles can kill surrounding neurons.

Alzheimer's: This neurological degeneration is caused by the accumulation of plaques and tangles in the nerve cells of the brain.[2] Plaques, composed almost entirely of a single protein, are aggregates of the protein beta-amyloid in the spaces between nerve cells, and tangles are aggregates of the protein tau inside the nerve cells. Tangles are common in a wide range of nerve cell diseases, whereas neuritic plaques are more specific to Alzheimer's. Although scientists are unsure what role plaques and tangles play in the development of Alzheimer's, one theory is that these accumulated proteins impede the nerve cells' ability to communicate with each other and make it difficult for them to survive. Studies have shown that plaques and tangles occur naturally as people age, but more are formed in people with Alzheimer's. The reasons for this increase are still unknown.

Creutzfeldt-Jakob Disease (a variant form is linked to bovine spongiform encephalopathy, or "mad cow disease"): This disease is caused by abnormal proteins called prions, which produce hole-like lesions in the brain. Prions (proteinaceous infectious particles) were discovered to be proteins with an altered conformation. Scientists hypothesize that these infectious agents can bind to other similar proteins and induce a change in their conformation as well, propagating new, infectious proteins.[3] Prions are highly resistant to heat, ultraviolet light, and radiation, which makes them difficult to eliminate. In Creutzfeldt-Jakob Disease there is an incubation period that can last for years, followed by rapid progression through depression, difficulty walking, dementia, and death. Currently there is no effective treatment for prion diseases and all are fatal.[4]

Parkinson's disease: A mutation in the gene that codes for alpha-synuclein is the cause of some rare familial forms of Parkinson's disease. Three point mutations have been identified thus far: A53T, A30P and E46K. Duplication and triplication of the gene may also be the cause in other lineages. People with Parkinson's disease have primary symptoms that result from decreased stimulation of the motor cortex by the basal ganglia, normally caused by insufficient formation and action of dopamine. Dopamine is produced in the dopaminergic neurons of the brain. People who suffer from this disease lose brain cells (death of dopaminergic neurons), which may be caused by abnormal accumulation of the protein alpha-synuclein bound to ubiquitin in the damaged cells. The alpha-synuclein-ubiquitin complex cannot be directed to the proteasome. Newer research suggests that faulty transport of proteins between the endoplasmic reticulum and the Golgi apparatus might be how alpha-synuclein causes the loss of dopaminergic neurons.

Cystic Fibrosis: Francis Collins first identified the hereditary genetic mutation in 1989. The problem lies in the cystic fibrosis transmembrane conductance regulator (CFTR), a protein that regulates chloride ion transport across the cell membrane and thereby helps to regulate salt levels and prevent bacterial growth. The disease arises when the normal processing and function of CFTR are disturbed.[5] In the most common mutation a single amino acid (phenylalanine 508) is deleted; the defective protein leaves the lungs unable to kill bacteria, causing chronic lung infections that eventually lead to an early death.[6] Scientists have used nuclear magnetic resonance (NMR) spectroscopy to study cystic fibrosis and its effects.

Normal and sickle-shaped red blood cells.

Sickle Cell Anemia: Sickle cell anemia is defined by sickle-shaped red blood cells that cling to the walls of narrow blood vessels and obstruct the flow of blood. The shortage of red blood cells in the bloodstream, together with the reduced oxygen-carrying capacity of the blood, causes serious medical problems. The disease appears when a person inherits two defective copies of the hemoglobin gene. The sickle shape forms when hemoglobin molecules give up their oxygen and assemble into stiff, rod-like structures that distort the red blood cells. Symptoms include fatigue, shortness of breath, pain in any joint or body organ lasting for varying amounts of time, eye problems potentially leading to blindness, and yellowing of the skin and eyes due to the rapid breakdown of red blood cells. Sickle cell anemia can be detected by a simple blood test using hemoglobin electrophoresis. Even though there is no cure, blood transfusions, oral antibiotics, and hydroxyurea are treatments that reduce the pain caused.[7]

Huntington's Disease: Huntington's disease is a trinucleotide repeat disorder that results from glutamine repeats in the Huntingtin protein. Roughly 40 or more copies of the codon C-A-G (which encodes glutamine) will result in Huntington's disease, whereas the normal number is between 10 and 35 copies. During post-translational processing of the mutated Huntingtin protein (mHTT), small fragments containing the polyglutamine expansion misfold to form inclusion bodies, which are toxic to brain cells. The precise effect of this alteration of the Huntingtin protein is not well defined, beyond the fact that it affects nerve cell function.[8] This incurable disease affects muscle coordination and some cognitive functions.
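The repeat thresholds quoted above can be illustrated with a minimal Python sketch. The function names and the "intermediate" label for counts between the two quoted ranges are assumptions made for illustration, and the search ignores reading frame, so this is not a diagnostic method.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Return the number of repeats in the longest uninterrupted CAG run.

    Illustrative only: the search is not reading-frame aware.
    """
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat_count(repeats: int) -> str:
    """Classify a repeat count using the thresholds quoted in the text."""
    if repeats >= 40:      # "roughly 40 or more copies"
        return "disease-associated"
    if repeats <= 35:      # the text gives 10 to 35 copies as normal
        return "normal"
    return "intermediate"  # assumed label for the gap between the two ranges


# Hypothetical sequence: a start codon, 42 CAG repeats, and a stop codon.
example = "ATG" + "CAG" * 42 + "TAA"
print(longest_cag_run(example), classify_repeat_count(longest_cag_run(example)))
```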

Cataract in human eye

Cataracts: The lens of the eye is made up of proteins called crystallins, which have a jelly-like texture in the lens cytoplasm. Cataracts, currently the leading cause of blindness in the world, occur when crystallin molecules form aggregates that scatter visible light, causing the lens of the eye to become cloudy. UV light and oxidizing agents are thought to contribute to cataracts because they may chemically modify crystallins. In children, it has been observed that deletion or mutation of αB-crystallin facilitates cataract formation. The likelihood of developing cataracts increases exponentially with age.

Pain, Roger H. (2000). Mechanisms of Protein Folding. Oxford University Press. pp. 420–421. ISBN 019963788. Retrieved 2009-10-18.

### Amyloid Fibrils

Protein misfolding caused by impaired folding efficiency reduces the number of protein molecules available to carry out their normal role and leads to the formation of amyloid fibrils, aggregated protein structures with a cross-β architecture that can give rise to numerous biological effects. Protein aggregation can result from different processes occurring after translation, including an increased likelihood of degradation by the quality control system of the endoplasmic reticulum (ER), improper protein trafficking, or conversion of specific peptides and proteins from their soluble functional states into highly organized fibrillar aggregates.

#### Structures

##### X-ray Crystallography

Three-dimensional crystals of amyloid-forming peptide fragments were grown, and X-ray crystallography was used to examine the structure of the peptides and how the molecules pack together. In one particular fragment, the crystal was found to contain pairs of parallel β-sheets in which each peptide contributes a single β-strand. The β-strands are stacked and the β-sheets they form are parallel. The side chains Asn2, Gln4 and Asn6 interact with each other in such a way that water is excluded from the region between the two β-sheets, while the remaining side chains on the outside are hydrated and lie further away from the next β-sheet.

##### Solid State Nuclear Magnetic Resonance (SSNMR)

Through solid-state nuclear magnetic resonance (SSNMR), with the help of other methods such as computational energy minimization, electron paramagnetic resonance, site-directed fluorescence labeling, hydrogen-deuterium exchange, mass spectrometry, limited proteolysis and proline-scanning mutagenesis, the structure of an amyloid fibril was suggested to consist of four β-sheets separated by approximately 10 Å.

Through NMR with computational energy minimization, a 40-residue form of the amyloid β peptide at pH 7.4 and 24 °C was determined to contribute a pair of β-strands, connected by a loop, to the core of the fibril. The amyloid β peptides are stacked on each other in a parallel fashion.

From experiments using site-directed spin labeling coupled to electron paramagnetic resonance (SDSL-EPR), the molecule was found to be highly structured in the fibrils and in a parallel arrangement. SDSL-EPR, along with hydrogen-deuterium exchange, mass spectrometry, limited proteolysis and proline-scanning mutagenesis, suggests that the N-terminal region is highly flexible and exposed to solvent, while the other parts of the structure are rigid.

Experiments using SSNMR with fluorescence labeling and hydrogen-deuterium exchange determined that the C-terminal regions are involved in the core of the fibril structure, with each molecule contributing four β-strands: strands one and three form one β-sheet and strands two and four form another β-sheet about 10 Å away.

Further experiments approaching atomic resolution with SSNMR techniques produced very narrow resonance lines in the spectra, showing that the molecules within the fibrils are structurally uniform and that the peptides adopt extended β-strands within the fibrils.

##### Conclusion

The structures determined from X-ray crystallography or SSNMR were similar to structures previously proposed from cryo-electron microscopy (EM) of fibrils formed from insulin. EM, which produces electron density maps, revealed untwisted β-sheets in the structure. The similarities among the structures found in these experiments suggest that many amyloid fibrils share characteristics such as side-chain packing, alignment of β-strands and separation of the β-sheets. [9] Annu. Rev. Biochem. 2006. 75:333-366. www.annualreviews.org. Retrieved 24 Oct 2011.

#### Formation

The view that the capability to form amyloid structures is a generic property of polypeptide chains comes from the finding that an increasing number of proteins with no known link to protein-related diseases can form such structures. It has also been found that amyloid structures can arise from proteins that have a normal function in living organisms rather than disease-related characteristics.

Different factors affect amyloid fibril formation, and different polypeptide chains form amyloid fibrils at different speeds. In different polypeptide molecules, hydrophobicity, hydrophilicity, changes in charge, degree of exposure to solvent, the number of aromatic side chains, surface area, and dipole moment can all affect the rate of protein aggregation. It has been found that the concentration of the protein, the pH and ionic strength of the solution it is in, and its amino acid sequence together determine the aggregation rate of unstructured, non-homologous protein sequences.

An increase or decrease in the hydrophobicity of the side chains can change the tendency of the protein to aggregate.

Charge in a protein can promote aggregation through interaction of the polypeptide chain with other macromolecules around it. Also, a low tendency to form α-helices together with a high tendency to form β-sheets facilitates amyloid formation.

It was found that the degree to which regions of the protein sequence are exposed to solvent tends to affect the formation of amyloids. Regions that are exposed to solvent seem to promote aggregation. Some regions with a high intrinsic tendency to aggregate were not involved in aggregation, apparently because they were at least partially buried, whereas solvent-exposed regions that were not involved in aggregation had a low intrinsic tendency to form amyloid fibrils.

It has even been proposed that protein sequences have evolved over time to avoid clusters of hydrophobic residues by alternating hydrophobic and hydrophilic regions, lowering the tendency for protein aggregation to occur. [9]

#### The Effects of Sequence on the Formation of Amyloid Proteins

Amyloid formation arises mostly from properties of the polypeptide chain that are common to all peptides and proteins, but the sequence also affects the relative stabilities of the conformational states of the molecules. As a result, polypeptide chains with different sequences form amyloid fibrils at different rates. Sequence differences can affect the aggregation behavior of a protein directly rather than through the stability of the protein fold. Various physicochemical factors affect the formation of amyloid structure by unfolded polypeptide chains.

Hydrophobicity of the side chains affects the aggregation of unfolded polypeptide chains. Amino acid substitutions in regions involved in aggregation can change the aggregation propensity of a sequence when they increase or decrease the hydrophobicity at the site of mutation. Over time, sequences have evolved to avoid clumps of hydrophobic residues by alternating hydrophobic and hydrophilic areas of the protein.

Charge also affects aggregation. A high net charge can impede self-association of the protein. Mutations that decrease the positive net charge have the opposite effect on aggregate formation from mutations that increase it. It has also been found that aggregation of polypeptide chains can be driven by interactions with highly charged macromolecules, which shows the importance of charge in protein aggregation.

Secondary structure propensity affects amyloid aggregation as well. Studies show that a low probability of forming α-helical structure and a high probability of forming β-sheet structure are contributing factors in amyloid formation. However, β-sheet formation does not seem to be particularly favored by nature, since sequence patterns of alternating hydrophilic and hydrophobic residues are found only rarely.

The characteristics of the amino acid sequence affect both the amyloid fibril structure and the rate of aggregation. Different mutations, including changes in the number of aromatic side chains, the amount of exposed surface area and the dipole moment, have been reported to change the aggregation rates of many polypeptide chains.

Unfolded regions play vital roles in promoting the aggregation of partially folded proteins. Some regions that are flexible or exposed to solvent were found to be prone to aggregation. Other regions that are not involved in aggregation were found to be at least partly buried even though they have a high propensity to aggregate, while exposed regions that are not involved in aggregation have a low propensity to form amyloid fibrils. Fibrils tend to assemble by association of unfolded polypeptide segments rather than by docking of structured elements.

Overall, it has been found that unfolded proteins have lower hydrophobicity and higher net charge than folded proteins. Residues that tend not to form β-sheet secondary structure seem to inhibit amyloid aggregation. Protein concentration, pH and ionic strength, together with the amino acid sequence, were found to affect the rate of aggregation.

[9]
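Two of the sequence properties discussed above, hydrophobicity and net charge, can be estimated directly from a sequence. The Python sketch below is only a rough illustration: it assumes the Kyte-Doolittle hydropathy scale and a crude charge model (Lys and Arg counted as +1, Asp and Glu as -1, with histidine and the chain termini ignored). The example sequences are hypothetical, and the source does not prescribe this particular calculation.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Crude charge model near neutral pH (an assumption for illustration).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def mean_hydrophobicity(sequence: str) -> float:
    """Average Kyte-Doolittle hydropathy over a one-letter sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def net_charge(sequence: str) -> int:
    """Approximate net charge from charged side chains only."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in sequence)


# Hypothetical peptides: one hydrophobic and neutral, one polar and charged.
for peptide in ("VIVIAL", "KEKDKR"):
    print(peptide, round(mean_hydrophobicity(peptide), 2), net_charge(peptide))
```

On this simple picture, the first peptide (high hydrophobicity, no net charge) would be expected to be the more aggregation-prone of the two, in line with the trends described in the text.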

## Environmental Effects

It is understood that the primary structure (the amino acid sequence) of a protein predisposes it to a specific three-dimensional structure and determines how it will fold from the unfolded form to the native state. The concentration of salts, the temperature, the nature of the primary solvent, macromolecular crowding, and the presence of chaperones are all factors that affect the mechanism of folding and the ratio of unfolded proteins to those in the native state. Above all, these environmental factors affect the likelihood of any single protein reaching the correct final structure.

Isolated proteins placed in proper environments (specific solvent, solute concentrations, pH, temperature, etc.) tend to “self-fold” into the correct native conformation. Altering any of these environmental characteristics can disrupt the structure and/or interfere with the folding mechanism. A pH outside the “normal” range of a given protein can change the ionization of specific amino acids or interfere with the polar and dipole-dipole intramolecular forces that would otherwise stabilize the structure. Excess heat (as in cooking) can break hydrogen bonds essential to the secondary structure of proteins.

Extreme environments or the presence of chemical denaturants (such as reducing agents that can break disulfide bonds) can cause proteins to denature and lose their secondary and tertiary structure, forming a “random coil.” Under certain conditions fully denatured proteins can return to their native state. Intentional denaturing is used in various methods to analyze biomolecules.

The complex environments within cells often necessitate chaperones and other biomolecules for proteins to properly form the native state.

## Molecular Chaperones

Molecular chaperones are known mainly for assisting the folding of proteins, but they are not involved only in the initial stages of a protein’s life. Molecular chaperones take part in producing, maintaining, and recycling the structure and subunits of proteins. Chaperones are present in the cytosol and also in cellular compartments such as the membrane-bound mitochondria and endoplasmic reticulum.

The necessity of chaperones for the proper folding of proteins varies. Many prokaryotes have few chaperones and little redundancy among chaperone types, whereas eukaryotes have large families of chaperones with some redundancy. It is hypothesized that some chaperones are essential to proper protein folding, as in prokaryotes, which have fewer variants of a chaperone family available. Other chaperones play a less essential role, as in eukaryotes, where more variants within a chaperone family exist and there are gradients of efficiency or affinity. This redundancy, or the existence of less efficient chaperones, may hold in one state, but the effectiveness of a chaperone is also a function of its environment. The pH, available space, temperature, protein aggregation and other external factors may turn a chaperone that was once ineffective into a more essential one.

These environmental factors show why it is important to simulate in vivo cellular conditions in order to understand the conditions that require the use of chaperones, and they summarize the difficulty of analyzing and comparing chaperone function in vivo and in vitro. Simulating the environment within the cell is important not just because of physical factors such as pH or temperature but also because of the time at which the chaperone begins to act on the polypeptide. Some chaperones sit near the ribosome and attach immediately to the polypeptide to prevent misfolding. Other chaperones allow the polypeptide to begin folding by itself and attach later on. Thus the role of each chaperone is specific to its proximity to the polypeptide and to the time and place at which it assists folding.

Recent research has indicated that chaperones within the nucleolus not only assist protein folding but also carry out other functions important to maintaining a healthy cell. These nucleolar chaperones are called Nucleolar Multitasking Proteins (NoMPs). Heat shock proteins, for example, not only help other proteins fold but also act during moments of stress to regulate protein homeostasis. Furthermore, there is evidence that chaperones work together in networks to oversee certain functions such as dealing with toxins, starvation or infection.

The nucleolar chaperone network is divided into different branches that have specific functions. The network is dynamic, and the concentration or location of its components can vary with changes in the physiology and environment of the cell. Heat shock proteins (HSPs), which are classified by their molecular weights, are integral components of the chaperone network. HSP70s and HSP90s maintain proteostasis by ensuring that proteins are properly folded and by preventing proteotoxicity, which is damage to cell function caused by a misfolded protein. HSP70s help to fold recently synthesized proteins, while HSP90s help later in the folding process.

The nucleolar network also contains chaperones that take part in ribosome biogenesis, the synthesis of ribosomes in the cell. Proteins in the HSP70 and DNAJ families are regularly found in the protein complexes that process pre-rRNA in Saccharomyces cerevisiae (a species of yeast). Other HSPs are important in ribosome biogenesis as well, including HSP90, which works together with TAH1 and PIH1 to create small nucleolar ribonucleoproteins.

The nucleolar chaperone network provides the organization and assistance needed to complete the biological tasks necessary for cell survival, and if it does not function properly there can be many problems. For instance, when cancer cells have increased levels of rRNA synthesis, ribosome biogenesis is increased. Scientists are researching the compound CX-3543, which can stop nucleolin from binding to rDNA and impede RNA synthesis, leading to cell death. It may be possible to use drugs designed to target specific branches of the nucleolar chaperone network in malfunctioning cells.

Other chaperone networks include those that specifically participate in de novo protein folding, meaning they help to fold newly made proteins, and in the refolding of proteins that have been damaged. One chaperone network that exists in tumor cell mitochondria contains HSP90 and TRAP1, which protect the mitochondria and prevent cell death, allowing the cancer cells to continue to spread uncontrollably.[10]

### Example: Molecular Chaperone (HSP 70)

HSP70 is a protein in the heat shock protein family, along with HSP90. It works together with HSP90 to support protein homeostasis and binds to newly synthesized proteins early in the folding process. It has three major domains: the N-terminal ATPase domain, the substrate-binding domain, and the C-terminal domain. The N-terminal ATPase domain binds and hydrolyzes ATP; the substrate-binding domain has an affinity for stretches of neutral, hydrophobic amino acid residues up to seven residues in length; and the C-terminal domain acts as a sort of lid for the substrate-binding domain. This lid is open when HSP70 is ATP-bound and closed when HSP70 is ADP-bound. DnaK, the bacterial HSP70, helps in folding by clamping down on a peptide.[11]

### Example: GroEL and GroES

GroEL and GroES, with subunits of about 60 kDa and 10 kDa respectively, are both bacterial chaperones. They are built from stacked rings with an empty center, and the substrate protein fits into this hollow center. Conformational changes within the chamber can then change the shape and folding of the protein.[11]

### Example: Molecular Chaperone (HSP 90)

HSP90 is a protein in the heat shock protein family. It differs from other chaperones in that its role in protein folding is limited. HSP90 is nevertheless vital to study and understand, because many cancer cells are able to co-opt HSP90 in order to survive in hostile surroundings. If HSP90 could be studied structurally and targeted with inhibitors, there could be a way to stop cancer cells from spreading. Many studies have tested whether the HSP90 chaperone cycle is driven by ATP binding and hydrolysis or by some other factor. Research by Southworth and Agard provided evidence that HSP90 can change conformation without nucleotide binding; rather, the stabilization of a conformational equilibrium is what shifts HSP90 among its open, closed, and compact states. The three conformations were found through X-ray crystallography and single-particle electron microscopy, and studies of the three-state conformational changes in yeast HSP90, human HSP90 and bacterial HSP90 (HtpG) made clear that the conformational changes are distinct for each species. Overall, HSP90 is a chaperone that is more involved in maintaining homeostasis within a cell than in protein folding itself. Because it plays such an essential role in the survival of cancer cells, HSP90 has growing potential in drug development.

### Example: Molecular Chaperone (TF)

Trigger factor (TF) is the first chaperone to interact with the nascent chain as it exits the ribosome tunnel. In the absence of a nascent chain, TF cycles on and off the ribosome, but once the nascent chain is present, TF binds to it and forms a protective cavity around it. To carry out its function, TF scans for any exposed hydrophobic segment of the nascent chain, and it can also re-associate with the chain. Folding is found to be more efficient in the presence of TF, but this comes at the expense of speed, since TF can stay with the chain for more than 30 seconds. The release of the chain is triggered when the hydrophobic portions are buried as folding progresses toward the native state.

### Example: Molecular Chaperone (YidC, Alb3, Oxa1)

YidC, Alb3, and Oxa1 are proteins that facilitate the insertion of proteins into membranes. YidC is a protein that has only two polypeptide chains, and the formation of its structure is supported by particular phospholipids. YidC proteins can be found in Gram-negative and Gram-positive bacteria. Oxa1 is found in the inner membrane of the mitochondria, and Alb3 is located in the thylakoid membrane inside the chloroplast. Experiments showed that YidC actively contributes to the insertion of the Pf3 coat protein and makes direct contact with its hydrophobic segment. Although Oxa1 is found only in the mitochondria, it can also facilitate the insertion of membrane proteins encoded in the nucleus. The roles of YidC and Alb3 seem to be interchangeable, because Alb3 can replace YidC in E. coli. Moreover, YidC, Oxa1, and Alb3 all support the insertion of Sec-independent proteins. Oxa1 supports only the insertion of Sec-independent proteins because the mitochondria in yeast cells do not have Sec proteins.

### NLR

Nucleotide-binding domain and leucine-rich repeat (NLR) proteins provide a pathogen-sensing mechanism that is present in both plants and animals. They can be triggered either directly or indirectly by pathogen-derived molecules, through mechanisms that remain elusive. Research shows that molecular chaperones such as HSP90, SGT1, and RAR1 are major stabilizing components for NLR proteins. HSP90 can modulate the function of its NLR clients in three practical ways: promoting a steady state at the functional threshold, enabling stimulus-dependent activation, and raising the capacity to evolve.

Plants contain many NLR genes that are highly polymorphic in the LRR domain, which allows them to recognize highly diversified pathogen effectors. The stability of the NLR sensor is the mechanism that determines pathogen recognition. The HSP90 system is advantageous for plants because it binds metastable NLR proteins and stabilizes them in a signaling-competent condition. This allows mutations that would otherwise be detrimental to be masked.

### Molecular Chaperone Mechanism for Substrate Binding in Protein Folding

It is known that chaperones work together to aid protein folding and prevent misfolding. However, the mechanism by which chaperones help in protein folding was not fully understood. Recent studies of Hsp40 and Hsp70 have provided more insight into how chaperones act on their substrates. The Hsp40 family consists of many members with different J-domains, and different J-domains bring about different Hsp70 ATPase activities when Hsp40 binds to Hsp70. In protein folding, an unfolded polypeptide first binds to an Hsp40 co-chaperone. The J-domain of Hsp40 then binds to the nucleotide-binding domain (NBD) of Hsp70. Hydrolysis of ATP to ADP on the Hsp70 NBD causes a conformational change in the Hsp70 substrate-binding domain. This gives Hsp70 a higher affinity for the polypeptide substrate, which is then released from Hsp40. When ADP is exchanged for ATP, the polypeptide substrate is released from Hsp70. Studies have shown that nucleotide exchange factors alter a lobe of the Hsp70 ATPase domain in a way that decreases Hsp70’s affinity for ADP. Once the polypeptide is released from Hsp70, it can fold to its native state, or it can be refolded by the chaperones if it has misfolded. If a polypeptide bound to Hsp70 is recognized by the E3 ubiquitin ligase CHIP, it will be degraded.[12]
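The sequence of events in the Hsp40/Hsp70 cycle can be summarized as a small state machine. The Python sketch below is a bookkeeping aid only: the state and event names are hypothetical labels chosen for illustration, and the model leaves out kinetics, nucleotide exchange factors, and every detail beyond the order of steps described above.

```python
# Allowed transitions: (current state, event) -> next state.
# All labels are illustrative assumptions, not established nomenclature.
TRANSITIONS = {
    ("unfolded", "bind_hsp40"): "bound_to_hsp40",
    ("bound_to_hsp40", "atp_hydrolysis"): "bound_to_hsp70_adp",
    ("bound_to_hsp70_adp", "nucleotide_exchange"): "released",
    ("bound_to_hsp70_adp", "chip_recognition"): "degraded",
    ("released", "fold"): "native",
    ("released", "misfold"): "unfolded",  # re-enters the cycle
}


def run_cycle(events, state="unfolded"):
    """Apply a sequence of events and return the final substrate state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state


productive = ["bind_hsp40", "atp_hydrolysis", "nucleotide_exchange", "fold"]
print(run_cycle(productive))
```

A misfolded substrate simply returns to the `unfolded` state and can pass through the cycle again, while recognition by CHIP ends in the `degraded` state.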

### Small Heat Shock Proteins & α-crystallins as Molecular Chaperones

It is known that small heat shock proteins (sHSPs) and the related α-crystallins (αCs) are virtually ubiquitous proteins that are strongly induced by a variety of stresses, but that also function constitutively in multiple cell types in many organisms. Extensive research has demonstrated that a majority of sHSPs and αCs can act as ATP-independent molecular chaperones by binding denaturing proteins and thereby protecting cells from damage due to irreversible protein aggregation. Many inherited diseases have been discovered to result from defects in sHSP/αCs, and these proteins accumulate in neurodegenerative disorders and other diseases linked to aberrant protein folding. sHSP/αC proteins range in size from ~12 to 42 kDa and share a C-terminally located domain of ~90 amino acids, known as the αC domain. sHSP-substrate complexes can be observed by size exclusion chromatography. They are large and heterogeneous, and their size distribution depends on the ratio of sHSP/αC to substrate as well as on the rate of substrate aggregation, which is affected by concentration and temperature. Substrate binding is generally facilitated by an increase in available hydrophobic surface on the sHSP/αC, which seems to occur without significant loss of defined sHSP/αC secondary and tertiary structure. There is no single, specific substrate-binding surface on sHSP/αCs. Rather, it appears that many sites contribute to substrate interactions, and binding is probably different for different substrates, depending on the conformation of the surfaces exposed when a substrate unfolds. However, some sHSP/αCs recognize almost any unfolding protein, which suggests that they act on any labile or damaged cellular component.

## The Energy Landscape for Protein Folding

If proteins folded randomly and unpredictably, the time taken to reach the native conformation would be far longer than the time it actually takes. The current theory of how protein folding occurs naturally and efficiently involves a "funnel" of sorts: the idea is that there is no single step-by-step route to the correct 3-D structure, but rather a number of paths that become progressively narrower from top to bottom. The funnel proceeds downward from energetically unfavorable conformations at the top to the energetically favorable, properly folded structure at the bottom.

The experiment that sparked the idea that proteins rely on energetics and thermodynamics to reach their native fold was conducted by Christian Anfinsen in 1961, when he discovered that ribonuclease could spontaneously refold into its proper structure after being denatured, without the help of other molecules. Further theoretical support for the idea that protein folding is not random comes from Levinthal's Paradox, which states that it would take roughly 10^81 years for a protein 100 amino acids long to reach the proper conformation by random search, when in reality it takes anywhere from a millisecond to a day.
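The arithmetic behind a Levinthal-style estimate is simple enough to sketch in Python. The number of conformations per residue and the sampling rate used below are illustrative assumptions, not values taken from the source. The result is extremely sensitive to them, which is why different texts quote very different figures, yet every choice gives a time vastly longer than observed folding times.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_log10_years(n_residues, states_per_residue, samples_per_second):
    """Return log10 of the years needed to sample every conformation once.

    Works in log space because the raw numbers overflow ordinary floats
    for longer chains.
    """
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


# Assumed parameters: 100 residues, sampling 10^13 conformations per second.
for states in (3, 10):
    exponent = levinthal_log10_years(100, states, 1e13)
    print(f"{states} states per residue: about 10^{exponent:.0f} years")
```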

These funnel models (such as the Go-type model) show funnels with hills and bumps, with the protein taking the path of least resistance as it moves down the energy funnel. The bumps are termed "points of frustration". It is believed that proteins whose funnels have the fewest frustration points or bumps fold into their native forms faster, since fewer energy barriers exist. Although these models are simplified and do not account for misfolding, they nonetheless prove accurate for many proteins.

Another model that uses algorithms and computers is the empirical force field. This approach uses hundreds of thousands of otherwise idle computers to compute folding scenarios for proteins under 50 amino acids long with surprising accuracy. However, these computer models will sometimes overestimate unlikely folded structures or produce folding patterns that are rarely or never seen. For example, some simulations and algorithms tend to get stuck in a local minimum and are unable to reach the global minimum, which corresponds to the correctly folded protein. Simple models such as Go-type models predict not only the folded protein but also the transition states that determine the rate of protein folding.
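The local-minimum problem can be shown with a toy example. The Python sketch below runs a greedy downhill search over a hypothetical one-dimensional list of energies. It is not a force-field simulation, and the landscape values are invented purely for illustration.

```python
def greedy_descent(energies, start):
    """Step to the lowest-energy neighbor until no neighbor is lower.

    Returns the index where the search stops, which may be only a
    local minimum rather than the global one.
    """
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(energies)]
        if not neighbors:
            return i
        best = min(neighbors, key=lambda j: energies[j])
        if energies[best] >= energies[i]:
            return i  # no downhill move is available
        i = best


# Hypothetical rugged landscape: global minimum at index 5 (energy 0),
# with a shallower local minimum at index 1 (energy 3).
landscape = [5, 3, 4, 6, 2, 0, 1, 7]

print(greedy_descent(landscape, start=0))  # stops in the local minimum
print(greedy_descent(landscape, start=3))  # reaches the global minimum
```

Starting from index 0 the search is trapped behind the barrier at index 3, while starting from index 3 it slides into the global minimum. Practical methods try to avoid such traps, for example by allowing occasional uphill moves.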

These models are just beginning to show the dynamics of the intermediate stages of protein folding, and this remains an area under investigation. The kinetics of protein folding is less well understood, and how proteins move from the initial amino acid strand to the final product is also under investigation. The energy landscape model also has trouble accounting for external factors such as crowding and aggregation. One example of such an external interaction, called "domain swapping", involves the swapping of monomers from one protein to another in order to activate the correct folding of both proteins.

Recent studies have combined human and computer power to correctly predict the protein conformation. Websites like fold.it, overseen by the University of Washington's Computer Science department, turn the folding problem into a video game, allowing people around the world to solve protein folding problems like puzzle games. Users are given partially folded proteins, usually those stuck in a locally favorable conformation that seems optimal to a computer, and asked to reconfigure the protein into a shape that looks more stable. Utilizing a computer's computing power and speed along with a human's ability to manipulate objects in space shows promise in helping to solve protein folding problems more efficiently.

## Co-operativity and Protein Folding Rates

Cooperativity is one of the most remarkable aspects of protein folding. Contrary to the traditional view that folding involves complex and heterogeneous mechanisms, the cooperative two-state folding kinetics shown by many proteins is relatively simple. Because of this simplicity, recent efforts to understand what determines the co-operativity and the diversity of protein folding rates have focused on cooperative two-state folding kinetics.

Co-operativity in protein folding usually refers to the mechanism by which the presence of one structured region makes additional ordering more favorable. As mentioned previously, the cooperative two-state folding kinetics of small globular proteins is relatively simple and has become a subject of interest to many scientists. Single-molecule experiments that are sensitive enough to allow estimation of the transition time reveal two-state co-operativity.

The general trends revealed by two-state folding proteins may be summarized in two points. First, more topologically complex proteins tend to fold more slowly than proteins with simpler, local topology; second, larger proteins tend to fold more slowly than smaller proteins. The size of a protein here is defined by its chain length.

Protein folding kinetics is controlled by the free energy barrier, which is determined by the gain of energy and the loss of entropy in the transition state. To describe this pattern, scientists introduced the principle of minimum frustration from energy landscape theory. The principle holds that native-like structures have lower free energy than other, random configurations during protein folding. Thus, native-like structures encourage fast folding and serve as a driving force toward the native state, the functional form or tertiary structure of the protein. This principle can be expressed by the funnel energy landscape.

Funnel Energy Landscape

The funnel energy landscape depicts the energy landscape of a folding protein as a rough funnel. The roughness comes from non-native contacts formed during the folding process. The landscape is inherently many-dimensional, so the funnel is a projection onto a two-dimensional graph. The depth of the funnel represents the energy of a conformational state; the width of the funnel represents a measure of entropy. The bottleneck of the funnel represents the transition state configuration of the folding protein, whereas the bottom of the funnel represents the native state. As the protein moves toward its native state, it loses entropy and reaches a lower energy state. The funnel energy landscape serves as a convenient illustration for scientists to envision the thermodynamics and kinetics of the protein folding process.
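How a free energy barrier can arise from the competition between energy gain and entropy loss may be sketched with a toy profile along a single folding coordinate Q (0 for unfolded, 1 for native). The functional forms and every parameter below are assumptions chosen for illustration: entropy is lost linearly with Q, while energy is gained quadratically, so that the entropy cost is paid before most of the energy is recovered.

```python
def free_energy(q, n_residues=50, epsilon=1.0, entropy_per_residue=1.0,
                temperature=1.0):
    """Toy free energy F(Q) = E(Q) - T*S(Q) along a folding coordinate Q.

    Q runs from 0 (unfolded) to 1 (native). All parameters are
    hypothetical and in arbitrary units.
    """
    energy = -epsilon * n_residues * q ** 2                 # energy gained on folding
    entropy = entropy_per_residue * n_residues * (1.0 - q)  # entropy remaining
    return energy - temperature * entropy

grid = [i / 100 for i in range(101)]
profile = [free_energy(q) for q in grid]

q_barrier = grid[profile.index(max(profile))]   # position of the barrier top
barrier_height = max(profile) - profile[0]      # measured from the unfolded state
print(q_barrier, barrier_height)
```

In this toy the maximum of F(Q) plays the role of the bottleneck (transition state) of the funnel, with the unfolded and native states on either side of it.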

φ (phi) value

Another concept that plays a role in the study of protein folding kinetics is the φ (phi) value. The value is an approximate measure of how much native structure is present in the transition state configuration. Comparison with measured φ values is one of the ways to test the various models of protein folding kinetics.
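A φ value is commonly estimated for a point mutation as the change in the folding barrier divided by the change in native-state stability. The sketch below assumes that definition; the function name, units, and default temperature are illustrative choices and do not come from the text.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def phi_value(k_fold_wt, k_fold_mut, ddg_equilibrium, temperature=298.0):
    """Estimate phi from folding rates and the equilibrium destabilization.

    ddg_equilibrium: loss of native-state stability caused by the
    mutation, in kcal/mol (positive when the mutant is less stable).
    """
    # Change in the folding barrier, from the ratio of folding rates.
    ddg_barrier = R_KCAL * temperature * math.log(k_fold_wt / k_fold_mut)
    return ddg_barrier / ddg_equilibrium

# Hypothetical mutant that folds 10 times more slowly and is 2 kcal/mol less stable.
print(round(phi_value(100.0, 10.0, 2.0), 2))
```

A value near 1 suggests that the contacts made by the mutated residue are already native-like in the transition state, while a value near 0 suggests that they have not yet formed.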

General observations

The first trend may be easily understood from an entropic point of view. More topologically complex proteins, that is, proteins with long-range contacts, are expected to pay a higher entropic cost on folding than proteins with short-range contacts. The second trend was recently confirmed by experiments focused on the influence of protein size on folding rates. It was found that a simple model based only on chain length could roughly predict a protein’s folding rate and stability.
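One common way to quantify topological complexity is relative contact order: the average sequence separation between residues in contact in the native structure, divided by the chain length. A minimal sketch follows; the contact lists are invented, and in practice contacts would be derived from a solved structure using a distance cutoff.

```python
def relative_contact_order(contacts, chain_length):
    """Average sequence separation of native contacts, divided by chain length.

    contacts: iterable of (i, j) residue index pairs that touch in the
    native structure.
    """
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (len(contacts) * chain_length)

# Hypothetical 20-residue chains with the same number of contacts.
local_contacts = [(1, 4), (5, 8), (9, 12), (13, 16)]      # short range
nonlocal_contacts = [(1, 20), (2, 19), (3, 18), (4, 17)]  # long range

print(relative_contact_order(local_contacts, 20))
print(relative_contact_order(nonlocal_contacts, 20))
```

The chain dominated by long-range contacts has the higher relative contact order, matching the expectation that such topologies carry a larger entropic cost.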

Gō model

Coarse-grained topology models (Gō models) are widely used to study the co-operativity and kinetics of protein folding, since the topology of the native protein is thought to determine the folding mechanism. A typical Gō model simplifies the protein so that only one kind of interaction, that between native contacts, stabilizes the folding protein. Early models often examined the non-additive forces acting in protein folding, such as side-chain ordering and hydrophobic effects. Recently, a wider variety of Gō models has been used to study protein folding kinetics.
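The core idea of a Gō-type energy function can be sketched in a few lines: only contacts that exist in the native structure lower the energy. This is a deliberately minimal, pairwise-additive version with a single hypothetical contact energy `epsilon`; published models add chain connectivity, excluded volume, and the many-body terms discussed in this section.

```python
def go_energy(formed_contacts, native_contacts, epsilon=1.0):
    """Energy of a conformation in a minimal Go-type model.

    Each contact that is also present in the native structure lowers the
    energy by epsilon; non-native contacts contribute nothing.
    """
    native = {frozenset(pair) for pair in native_contacts}
    formed = {frozenset(pair) for pair in formed_contacts}
    return -epsilon * len(native & formed)

# Hypothetical native contact map and a partially folded conformation.
native_contacts = [(1, 5), (2, 8), (3, 9)]
conformation = [(5, 1), (2, 8), (4, 7)]   # two native contacts, one non-native

print(go_energy(conformation, native_contacts))
```

Because the energy depends only on native contacts, the landscape of such a model is funnel-like by construction, with the native structure at the bottom.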

• The Gō model with nonpairwise-additive interactions between the native contacts of the protein (here, Eastwood and Wolynes’ model) demonstrates that short-ranged multi-body interactions can increase the free energy barrier and make the transition state configuration more localized.
• The lattice Gō model, on the other hand, demonstrates that coupling local and core burial interactions promotes co-operativity and increases the correlation with contact order.
• The Gō model with pairwise-additive interactions, particularly the variants focusing on the effects of varying the strength of three-body interactions and on φ values, shows that three-body interactions increase the energy barrier and improve the agreement with measured φ values.
• In addition, solvent-mediated interactions have been introduced into the Gō model. When the interactions between contacts are replaced by a solvent-separated minimum and a desolvation barrier, it is observed that the kinetics and co-operativity increase as a function of the height of the desolvation barrier. The advantage of the solvent-mediated Gō model is that it helps distinguish short-ranged from long-ranged contacts, and therefore proteins with simple topologies from those with more complex topologies. Studies of the solvent-mediated Gō model often use the chevron plot, which represents protein folding kinetic data at varying concentrations of a denaturant that disrupts the native structure of the protein.
• The variational Gō model improves co-operativity by means of an excluded-volume force between residues that are in close contact in the native state. In this model, a) co-operativity is stronger for long-ranged contacts; b) the range of calculated rates is broadened; c) the calculated φ values are improved. There are also Gō models that focus entirely on the funnel aspect of the protein folding energy landscape and ignore the effects of non-native contacts.
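The chevron plot mentioned in the list can be sketched for an ideal two-state folder. The sketch assumes the usual linear dependence of the logarithms of the folding and unfolding rates on denaturant concentration; all rate constants and slopes are hypothetical.

```python
import math

def chevron_ln_kobs(denaturant, kf_water, ku_water, m_fold, m_unfold):
    """ln of the observed relaxation rate for two-state folding.

    Assumes ln(k_fold) falls and ln(k_unfold) rises linearly with
    denaturant concentration; the slopes m_fold and m_unfold are given
    directly in units of 1/M.
    """
    k_fold = kf_water * math.exp(-m_fold * denaturant)
    k_unfold = ku_water * math.exp(m_unfold * denaturant)
    return math.log(k_fold + k_unfold)

# Hypothetical parameters: kf = 100 /s and ku = 0.01 /s in water.
concentrations = [i / 10 for i in range(0, 81)]   # 0 to 8 M denaturant
curve = [chevron_ln_kobs(c, 100.0, 0.01, 2.0, 1.0) for c in concentrations]
rate_minimum = concentrations[curve.index(min(curve))]  # bottom of the "V"
print(rate_minimum)
```

Plotting `curve` against `concentrations` gives the characteristic V shape: the left arm is dominated by folding and the right arm by unfolding.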

Other models, such as the capillarity model, assume that the volume of the folding nucleus scales with the number of monomers. In such a model, increased co-operativity tends to slow down the kinetics and smooth the energy landscape.

Conclusion

Topological models with non-additive forces are becoming a more popular and reliable way to understand the co-operativity of protein folding and folding rates. Refinement of these models shows promise for a more explicit and thorough understanding of what determines protein folding rates and mechanisms. Gō models in which long-ranged contacts become more cooperative and φ values more accurate need further improvement and more attention in the study of protein folding kinetics and the folding mechanism.

## Relationship between Protein Sequence, Structure, and Function

There have been several protein prediction methods developed in the past 20 years. A universal method has not been developed that applies to all proteins because each method has its advantages and disadvantages. The difficulty of developing such a method is due to our incomplete understanding of the highly intricate relationship between protein sequence, structure, and function.

The link between amino acid sequence and structure was demonstrated by Anfinsen, who showed that a denatured (unfolded) protein could regain its native tertiary structure spontaneously. This link is also useful for assigning function to protein structure. A protein researcher could predict that hydrophobic substrates could potentially bind to hydrophobic regions of the protein, and likewise for charged regions. The problem with this approach is that it does not take into account certain factors such as atypical environmental conditions.

It was thought that similar sequences imply related structures. This theory only holds true for a handful of proteins. Researchers saw that similarities in protein folds are not always related to sequence. Because of these findings, the ‘Paracelsus Challenge’ was proposed in 1995. The challenge was to design two proteins that were more than 50% identical in sequence but had completely different folds. The challenge was satisfied in 1997, and later work produced two protein sequences that share 88% sequence identity (GA88 and GB88) yet adopt different folds. Recent studies show that as few as 3 mutations are enough to induce different folding patterns. Although the outcomes of the ‘Paracelsus Challenge’ are very interesting, such cases rarely occur in nature.

Functional convergence causes problems in assigning a specific function to a structure. Various structures can adopt similar functions, but some can adopt very different functions as well. However, there is a significant correlation between certain folds and specific functions. There are two major variables in function prediction: (1) the location of the binding site, and (2) the range of functions at the site. Metal ions, cofactors, and other proteins that contribute to function must be taken into consideration as well. A problem with these factors arises when a structure is determined via crystallography. The PROCOGNATE resource and the PIDA database offer a solution to this problem.

A widely used definition of protein function is derived from the Gene Ontology, which consists of three graph structures in which functional terms and the relationships between them are defined. Limitations of the Gene Ontology arise with proteins that are non-positional and with proteins that have no defined relationship to the ligand in their crystal structure. Other developments that attempt to bridge this gap include the Protein Feature Ontology (PFO [29]) and the Distributed Annotation System (DAS [30]).

Two approaches are used to determine a functional site: (1) with no knowledge of where the site is or what it binds, or (2) with prior knowledge of the interaction partner. The most widely used methods involve bioinformatics, such as the SOIPPA method. Sequence conservation is a very important contributor to assigning function to a protein, but it is difficult to determine whether residues are conserved for structural or functional reasons. Another method takes an energy-based approach. A recent development is the ProFunc server, which combines methods such as InterProScan and BLAST search.

Predicting binding sites (which are immensely complex in their own right) is only the first piece of the puzzle. The next step is to determine the overall function in terms of biochemical function, and even more challenging is determining the biological role. The analysis of protein function became another order of magnitude more complex when researchers found that protein function may not depend only on the final folded product. A protein can have functions in its partially denatured state and in its fully denatured state. It is therefore safe to say that there is still a lot to learn about the relationship between the sequence, structure, and function of proteins.

## Domain Swapping, Folding and Misfolding

Domain swapping may be important in the folding or misfolding of proteins. It occurs when two or more identical protein chains exchange part of their structure with each other, and it can be thought of as a mechanism for interconverting monomers and oligomers. In oligomeric swapping, part of one monomer swaps with the identical part of another monomer. This mechanism has been observed in more than 40 different proteins and is important for some protein functions. For one protein, p13suc1, swapping and aggregation correlate, suggesting that they share a common mechanism. p13suc1 is required for cyclin-dependent kinase (Cdk) during cell cycle progression. It has two states, a monomer and a domain-swapped dimer. The swapped part is a β strand, not an independently folded domain. Studies found that β4 has a critical role through its contact with β2, because the two strands pair with each other early in the folding process. Therefore, for p13suc1, the regions that are interchanged have been shown to be responsible for both the folding and the misfolding of the protein. There seems to be a competition between folding and misfolding, because polypeptide chains can either fold into their native structures or misfold into amyloid fibrils. What seems to be even more crucial in protein folding is the presence of a folding nucleus, formed by part of the protein chain in the transition state. A correlation has been found between the locations of residues involved in folding nuclei and amyloidogenic regions, which suggests that fibril formation and protein folding may depend on the same key residues.
Modeling protein folding and examining the exchangeable regions in the oligomeric form shows that these regions are responsible for both folding and misfolding. This may take researchers one step closer to solving the protein folding problem and to understanding how proteins get their folding instructions. Reference: http://www.benthamscience.com/open/tobiocj/articles/V005/27TOBIOCJ.pdf

## Death-fold Superfamily[13]

There are 4 subfamily structures in the death-fold superfamily. They consist of Death Domains (DDs), Death Effector Domains (DEDs), CAspase Recruitment Domains (CARDs) and PYrin Domains (PYDs). These subfamily structures are involved in the assembly of multimeric complexes which may be implicated in cell inflammation and death.

Structure and Function of a Death-Fold Domain

There are currently 102 known proteins that have death-fold superfamily domains, which engage in homotypic interactions. These proteins comprise 39 DDs, 8 DEDs, 33 CARDs, and 22 PYDs. Although these domains differ in sequence by up to 90%, they all have the characteristic death-fold. This fold consists of a "globular structure where 6 amphipathic alpha-helices are arranged in an anti-parallel alpha-helix bundle with Greek key topology" (Peter Vandenabeele et al., 2012). The differences between the death-fold domains of the various subfamilies lie in the length and orientation of the alpha-helices and in the distribution of hydrophobic and charged residues along the surfaces of the complexes.

The death-fold domains are believed to mediate the assembly of large oligomeric signaling complexes. At these complexes, caspase and kinase activity is increased. Until recently, little was known about the structural conformation of protein assemblies with death-fold domains.

Three distinct Interaction Types

• Type I Interaction: Residues from helices 1 and 4 (Patch Ia) of one death-fold domain interact with residues from helices 2 and 3 (Patch Ib) of another death-fold domain.
• Type II Interaction: Residues from helix 4 and the loop between helices 4 and 5 (Patch IIa) of one death-fold domain interact with residues of the loop between helices 5 and 6 (Patch IIb) of another death-fold domain.
• Type III Interaction: Residues from helix 3 (Patch IIIa) of one death-fold domain interact with residues located on the loops between helices 1 and 2 and between helices 3 and 4 (Patch IIIb) of another death-fold domain.

Previous theory suggested that the three interaction types were conserved throughout the death-fold superfamily but it now seems that there are differences seen between interactions of the same type of death-fold domains.

Crystal Analysis of Death-Fold Domains

Only three DD complexes have had their crystal structure analyzed. They are PIDDosome, MyDDosome, and the Fas/FADD-DISC. The analyses of these structures have shown that DDs can engage in up to six interactions.

Death-Domains and Medicine

Death-domains have been shown to facilitate the assembly of multimeric complexes that lead to inflammation and cell death. Understanding these structures could yield therapeutic benefit by making it possible to prevent or trigger the formation of these oligomeric complexes. Diseases that may be affected by these interactions include neurodegenerative and inflammatory disorders, as well as many others that are characterized by inflammation or excessive cell death.

## Disordered Proteins

While folding is typically a major contributor to protein function, some proteins do not fold into a specific structure, yet still possess a function. Instead of a specific structure, these proteins often shift between different forms and/or have disordered regions that do not hold to a particular shape.

Just as a protein's folding is determined by its amino acid sequence, non-folding proteins are non-folding because of their sequence. These proteins tend to have far fewer of certain amino acids than folding proteins, and far more of others. Specifically, non-folding proteins have fewer of the amino acids that form the hydrophobic cores of folding proteins and more of the surface amino acids. The formation of a hydrophobic core is one of the first steps in most protein folds and, once formed, the core tends to provide the driving force for stable final structures. Without the amino acids to form a core, proteins are not driven towards a specific structure.
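The compositional difference described above can be illustrated by counting hydrophobic residues in a sequence. This is only a rough sketch: the set of residues treated as hydrophobic is an assumption, the sequences are made up, and real disorder predictors use considerably richer features than a single fraction.

```python
# Assumption: a common but not universal choice of "hydrophobic" residues.
HYDROPHOBIC = set("AVLIMFWC")

def hydrophobic_fraction(sequence):
    """Fraction of residues in a one-letter sequence that are hydrophobic."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(residue in HYDROPHOBIC for residue in sequence) / len(sequence)

# Made-up sequences, for illustration only.
core_rich = "MVLIAFLVAWGLIV"
polar_rich = "SEEKPKDSGQRPEE"

print(hydrophobic_fraction(core_rich))
print(hydrophobic_fraction(polar_rich))
```

A sequence with a very low hydrophobic fraction has few residues available to form a core, which is consistent with a tendency toward disorder.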

### Concept

Several molecular chaperones that are fully folded and inactive under non-stress conditions are known as conditionally disordered proteins. These chaperones adopt a partially disordered conformation when they are exposed to distinct stress conditions. This disorder is very important because it enables them to protect cells against stressors. The study of these disordered chaperones has led to a better understanding of the functional role of protein disorder in molecular recognition. X-ray crystallography is a useful technique for visualizing protein structures. With this technique, only about 25% of crystal structures represent more than 95% of the entire molecule; all others have missing electron density for more than 5% of their sequence because these regions adopt multiple conformations. Proteins span a continuous spectrum from static to very flexible structural states, and disordered proteins lie at the flexible extreme. Either only a part of the protein or the complete polypeptide chain may be disordered. Therefore, investigating only some parts of a protein does not capture its overall flexibility. The term “conditionally disordered” means that the protein is disordered under certain conditions but not under others. Intrinsic disorder is very common in proteins. For example, between 30% and 50% of eukaryotic proteins are estimated to contain regions of more than 30 amino acids that lack defined secondary structure in vitro, and many completely unstructured proteins have been predicted to exist as well. Despite the many computational methods in use, it remains very challenging to verify the folding status of proteins inside cells. Many proteins that appear partially or fully folded may in fact be unstructured in cells, although how many is still uncertain.
However, it is thought that the presence of appropriate binding partners brings disordered proteins into their folded state, which means that the percentage of proteins that are intrinsically disordered in vitro might be lower in the cell. The extent of disorder might also be decreased by stabilizing interactions within the cell. Through chemical shift, residual dipolar coupling, and paramagnetic relaxation enhancement measurements, NMR provides detailed information on the extent of disorder in proteins.

### Conditionally Disordered Proteins

Conditionally disordered proteins have two states: one shows a high degree of flexibility, and in the other the protein is more ordered. Thus, in order to understand the cause-and-effect relationships between disorder and function, it is essential to study both states. Many disordered proteins fold once they find a binding partner such as DNA, another protein, or a membrane. Order-to-disorder-to-order transitions can also occur. Proteins that bind multiple partners are very good examples of conditional disorder. Binding surfaces that are disordered before binding can fold into distinct conformations with different partners more readily than binding surfaces that are already well organized. The ‘conformational selection hypothesis’ suggests that different members of the conformational ensemble can be stabilized by the binding of different partners. The ‘folding upon binding’ model, on the other hand, proposes that proteins fold into different conformations as they bind different partners.

### Frequency

Predictions done on whole proteomes suggest that the frequency of disordered proteins in eukaryotes is much larger than in prokaryotes, with the frequencies in the two groups of prokaryotes, archaea and eubacteria, being similar. In mammals, about half of all proteins are predicted to have large unordered regions, with about a quarter being fully disordered.

### Function

Disordered proteins are prevalent in signaling and regulation, especially in interactions with biomolecules such as nucleic acids and other proteins. Molecular recognition and protein assembly and modification frequently involve proteins with disordered regions. The ability of these proteins to interact with multiple molecular partners means that they are also common in protein-protein networks, either as hub proteins or as proteins interacting with hub proteins.

### Diseases

Disordered proteins are implicated in a number of human diseases. In particular, the amyloid diseases, which involve the accumulation of misfolded proteins, seem to be associated with disordered proteins, probably because their variable regions make them more likely to have a structure that favors their accumulation. This category includes many neurodegenerative diseases, such as Alzheimer's and Parkinson's.

## The Role of Computers in Determining Structure and Function of Proteins

The structure or folding of a protein, and by extension its function, can be analyzed and compared through its primary structure, or amino acid sequence, using computer algorithms. Computers make it much easier to compare amino acid sequences of unknown folding pattern with similar sequences of known folding. The Protein Basic Local Alignment Search Tool, or protein BLAST, is a free, publicly available search tool that allows quick comparison of an amino acid sequence against an online database. Its output includes the percent match of amino acids and the known properties of the matching sequences. Furthermore, because amino acid sequences are encoded by DNA sequences, with three bases coding for one amino acid, the protein under scrutiny can also be analyzed at the DNA level using DNA BLAST. The integration of public databases of amino acid and DNA sequences with computer algorithms has accelerated the fields of genomics and proteomics by allowing scientists around the world to share and analyze sequences.
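The "percent match" idea can be illustrated for two sequences that have already been aligned. This sketch is not BLAST: it performs no database search or alignment, and conventions for counting gaps differ between tools, so the number it returns will not necessarily equal the identity a BLAST report would show. The sequences are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, non-gap positions with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        matches += (a == b)
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared

# Made-up aligned fragments differing at one position.
print(percent_identity("MKTAYIAK", "MKTAHIAK"))
```

Here gap positions are simply ignored, which is one of several possible conventions.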

Appendix

The Role of Computers

The scientists credited for creating the BLAST program are Webb Miller, David J. Lipman, Warren Gish, Eugene Myers, and Stephen Altschul from the NIH

Molecular Chaperones

Pain, Roger H. Mechanisms of Protein Folding. 2nd ed. 364-85

The Energy Landscape for Protein Folding

Cho, Samuel S. "Energy Landscapes for Protein Folding, Binding, and Aggregation: Simple Funnels and Beyond." UCSD Dissertation (2007).

Cheung, Margaret S. "Energy Landscape Aspects of Protein Folding Dynamics Relevant to Molecular Functions." UCSD Dissertation (2003).

Yang, Sichun. "Extending the Theoretical Framework of Protein Folding Dynamics." UCSD Dissertation (2006).

Intramolecular Interactions

Pain, Roger H. "Mechanisms of Protein Folding" 2nd ed.

http://www.nature.com/horizon/proteinfolding/background/importance.html

Berg "Biochemistry" 6 Edition

## Co-translational protein folding

In silico modeling studies have helped identify several characteristics of the co-translational folding pathway. First, it was determined that in vivo protein folding is a vectorial process, which is a dispersion change. Second, co-translational vectorial folding of the developing polypeptide from its N-terminal end to its C-terminal end results in a sequential structuring of the distinct regions of the polypeptide as they emerge from the ribosomal tunnel. Third, attachment of the developing polypeptide chain to the ribosome during protein synthesis reduces the conformational space and the degrees of freedom of the growing chain. This limits the number of possible intermediates and reduces the number of possible folding pathways. Fourth, co-translational protein folding begins early during polypeptide chain synthesis on the ribosome, with some structural elements forming inside the ribosomal tunnel. Fifth, folding catalysts and molecular chaperones interact with the growing chain as soon as it emerges from the tunnel. This accelerates the slow steps in protein folding and prevents misfolding.

## References

1. Berg, Jeremy, Tymoczko J., Stryer, L. (2012). Protein Composition and Structure. Biochemistry (7th Edition). W.H. Freeman and Company. ISBN 1-4292-2936-5
2. Lindquist, Susan (1999). "What is a Prion?". Retrieved 2009-10-09.
3. "Unraveling the Mystery of Protein Folding". [Thomasson, W.A. "Unraveling the Mystery of Protein Folding]. Retrieved 2009-10-18.
4. "Folding Away Cystic Fibrosis". [1]. Retrieved 2009-10-18.
5. "The Basics of Huntington's Disease". [3]. Retrieved 2009-10-18.
6. a b c Invalid <ref> tag; no text was provided for refs named annu
7. Piotr Banski, Mohamed Kodiha and Ursula Stochaj (2010). "Chaperones and multitasking proteins in the nucleolus: networking together for survival?". Retrieved 2010-10-16.
8. a b Joan L. Slonczewski, John W. Foster. "Microbiology: An Evolving Science."
9. Summers, Daniel W., and Peter M. Dougla (2009). "Polypeptide Transfer from Hsp40 to Hsp70 Molecular Chaperones.". Retrieved 2010-10-24.
10. 11. Kersse K, Verspurten J, Vanden Berghe T, Vandenabeele P. The death-fold superfamily of homotypic interaction motifs. Trends in biochemical sciences. 2011;36(10):541–52. Available at: http://www.ncbi.nlm.nih.gov/pubmed/21798745. Accessed October 29, 2012.

12. Small heat shock proteins and α-crystallins: dynamic proteins with flexible functions. Basha E, O'Neill H, Vierling E. Trends Biochem Sci. 2012 Mar;37(3):106-17. Epub 2011 Dec 14

Conditional disorder in chaperone action. Bardwell JC, Jakob U. Trends Biochem Sci. 2012 Sep 24. pii: S0968-0004(12)00127-2. doi: 10.1016/j.tibs.2012.08.006. [Epub PMID 23018052 [PubMed - as supplied by publisher] "Molecular Biology of the cell." Fifth Ed-Alberts, Johnson, Lewis, Raff, Roberts, Walter. pg. 716-717

Braakman, Ineke, and Neil J. Bulleid. "Protein Folding and Modification in the Mammalian Endoplasmic Reticulum." Annual Review of Biochemistry. 80. (2011): 71-99. Web. 29 Oct. 2011. <http://www.annualreviews.org/doi/pdf/10.1146/annurev-biochem-062209-093836>.

Cabrita LD, Dobson CM, Christodoulou J. Protein folding on the ribosome. Current Opinion in Structural Biology 2010, doi:10.1016/j.sbi.2010.01.005

A Keith Dunker, Israel Silman, Vladimir N Uversky, Joel L Sussman. "Function and structure of inherently disordered protein." Curr Opin Struct Biol. 2008 Dec;18(6):756-64

Booth Paula J, Curnow Paul. Folding Scene Investigation: Membrane Proteins. Current Opinion in Structural Biology 2009, doi:10.1016/j.sbi.2008.12.005

Heijne, Gunnar Von. "Membrane Protein Folding and Insertion." Annual Review of Biochemistry 80 (2011): 157-60. 26 Oct. 2011 <http://www.annualreviews.org/doi/full/10.1146/annurev-biochem-111910-091345?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%3dpubmed>

Kuhn, Andreas, Rosemary Stuart, Ralph Henry, and Ross E. Dalbey. "The Alb3/Oxa1/YidC protein family: membrane-localized chaperones facilitating membrane protein insertion?" TRENDS in Cell Biology 13 (2003): 510-16. 26 Oct. 2011 <http://www.cell.com/trends/cell-biology/abstract/S0962-8924(03)00196-X>

Table 1: Berg, Jeremy. Relative Frequencies of Amino Acid Residues in Secondary Structures. 2012. Biochemistry, New York . Print.

Voet, Donald, Judith G. Voet. Biochemistry 3rd ed. New Jersey: John Wiley & Sons, Inc, 2004. Print.

Original hard-sphere, reduced-radius, and relaxed-tau φ,ψ regions from Ramachandran, with -180 to +180 axes
Backbone dihedral angles φ and ψ (and ω)

A Ramachandran plot, also known as a Ramachandran diagram or a [φ,ψ] plot, was originally developed by Gopalasamudram Ramachandran, an Indian physicist, in 1963. The Ramachandran plot is a way to visualize the dihedral angle ψ against φ for the amino acid residues in a protein structure. Ramachandran recognized that many combinations of angles in a polypeptide chain are forbidden because of steric collisions between atoms. His two-dimensional plot shows the allowed and disfavored values of ψ and φ: three-quarters of the possible combinations are excluded simply by local steric clashes. Steric exclusion, the fact that two atoms cannot be in the same place at the same time, is the powerful organizing principle behind the Ramachandran plot.

## Torsion Angles

The two torsion angles of the polypeptide chain, also called Ramachandran angles, describe the rotations of the polypeptide backbone around the bonds between N-Cα (called Phi, φ) and Cα-C (called Psi, ψ). The Ramachandran plot provides an easy way to view the distribution of torsion angles of a protein structure. It also provides an overview of allowed and disallowed regions of torsion angle values, serving as an important factor in the assessment of the quality of protein three-dimensional structures.

Torsion angles are among the most important local structural parameters that control protein folding: essentially, if we had a way to predict the Ramachandran angles for a particular protein, we would be able to predict its 3D structure. The reason is that these angles provide the flexibility required for folding of the polypeptide backbone, since the third possible torsion angle within the protein backbone (called omega, ω) is essentially flat and fixed at 180 degrees. This is due to the partial double-bond character of the peptide bond, which restricts rotation around the C-N bond, placing two successive alpha-carbons and the C, O, N and H atoms between them in one plane. Thus, rotation of the main chain (backbone) of a protein can be described as the rotation of the peptide bond planes relative to each other.
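A torsion angle can be computed from the coordinates of four consecutive atoms. For φ these are C(i-1), N(i), Cα(i), C(i); for ψ they are N(i), Cα(i), C(i), N(i+1). The sketch below uses a standard vector formula; the helper names are illustrative, and the coordinates are simple test geometries rather than real protein atoms.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Four points in one plane in a zigzag (trans) arrangement.
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
# Same first three points, fourth point on the same side (cis).
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
print(abs(trans), cis)
```

Applying the same function to successive backbone atoms of every residue yields the (φ, ψ) pairs that are plotted on a Ramachandran plot.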

## Regions in Ramachandran Plot

The Ramachandran Plot helps with determination of secondary structures of proteins.

• Quadrant I shows a region where some conformations are allowed. This is where rare left-handed alpha helices lie.
• Quadrant II shows the biggest region in the graph. This region has the most favorable conformations of atoms. It shows the sterically allowed conformations for beta strands.
• Quadrant III shows the next biggest region in the graph. This is where right-handed alpha helices lie.
• Quadrant IV has almost no outlined region. This conformation (ψ around -180 to 0 degrees, φ around 0 to 180 degrees) is disfavored due to steric clash.
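The quadrant description above can be sketched as a small lookup. This is only a coarse illustration that assumes angles in degrees between -180 and 180 and uses the sign of φ and ψ alone; real validation tools use much finer, empirically derived contours. The function name is hypothetical.

```python
def ramachandran_quadrant(phi, psi):
    """Return the quadrant label and the typical structure listed for it."""
    if phi >= 0 and psi >= 0:
        return "I", "left-handed alpha helix (rare)"
    if phi < 0 and psi >= 0:
        return "II", "beta strand"
    if phi < 0 and psi < 0:
        return "III", "right-handed alpha helix"
    return "IV", "disfavored (steric clash)"

# Approximate textbook angles for a right-handed alpha helix.
print(ramachandran_quadrant(-60, -45))
```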

## Exceptions

Glycine is an exception to the principle of clustering around the α-helix and β-strand regions. Its side chain is only a hydrogen atom, which gives the polypeptide chain high flexibility and allows torsion angles that are normally not accessible to other amino acid residues. That is why glycine is often found in loop regions, where the polypeptide chain makes a sharp turn. This is also the reason for the high conservation of glycine residues in protein families, since the presence of turns at certain positions is characteristic of a particular protein fold.

Another residue with special properties in terms of its torsion angles is proline. In contrast to glycine, proline restricts the torsion angles to values that are very close to those of an extended conformation of the polypeptide (as in a beta-sheet). Proline is often found at the ends of helices and functions as a helix disruptor.


The “protein folding problem” consists of three closely related puzzles:

1. What is the folding code?
2. What is the folding mechanism?
3. Can we predict the native structure of a protein from its amino acid sequence?

## Protein Folding Problem

The protein folding problem is the obstacle that scientists confront when they try to predict the 3D structure of a protein from its amino acid sequence. Although a given sequence of amino acids almost always folds into the same 3D structure with a particular function, the exact folding pattern cannot yet be predicted with high precision. Explaining how proteins fold so quickly is a further challenge. Understanding any biochemical reaction requires the isolation and structure determination of reactants, intermediates and products. In protein folding this is complicated because most interactions in proteins are weak and non-covalent, which leads to rapid interconversion between the states. Folding intermediates are therefore hard to isolate and are largely inaccessible to X-ray crystallography. Nevertheless, several advances have been made in characterizing reactants and intermediates. Reflecting this complexity, the protein folding problem is usually divided into three parts: the folding code, structure prediction, and the folding speed and mechanism.

## The Three Folding Problems

### The Folding Code

In the late 1980s, scientists found that the amino acid sequence acts as a code that makes a protein fold in a particular way. The starting point of protein folding is the unfolded polypeptide, defined only by its primary structure (the sequence of amino acids); this is known as the denatured state of the protein. Even a small amount of structure in the denatured state can trigger nucleation and propagation that carry through the protein folding pathway. Characterizing the denatured states of proteins under physiological conditions is very difficult because the proteins must be unfolded without the presence of denaturants [2, Travaglini-Allocatelli et al.].

Recent research has advanced the study of denatured states through the single-molecule approach. Researchers used single-molecule experiments to examine the coil-to-globule transition of proteins and demonstrated that the denatured state expands steadily as the concentration of denaturant is increased. Conversely, at low denaturant concentrations, the peptide chain of the protein collapses in a sequence-dependent manner [2, Travaglini-Allocatelli et al.].

There have also been advances in the study of protein folding intermediates. For example, the engrailed homeodomain (En-HD) was engineered to be denatured under physiological conditions, and nuclear magnetic resonance (NMR) has shown that this denatured state resembles a folding intermediate. An additional study discovered that a specific section of En-HD, the helix-turn-helix (HTH) motif, behaves as an independent folding domain. In the context of the full protein, the HTH motif represents a folding intermediate in the En-HD folding pathway [2, Travaglini-Allocatelli et al.].

Although protein folding is still an enigma, scientists have taken advantage of the available information about proteins to design new materials, such as medicines, reagents and inhibitors, that benefit society.

### Structure Prediction

Nowadays, researchers predict the structure of a protein by entering its amino acid sequence into a computer. Modeling software then produces a predicted structure. The predicted structure is not perfectly accurate, since some degree of error is always present. Nevertheless, it can speed up the discovery of new medications because the digital structure can be manipulated.

Secondary structure prediction

Secondary structure prediction is a set of techniques that aim to predict the secondary structures of proteins and RNA based only on their primary structure, that is, the amino acid or nucleotide sequence. For proteins, a prediction consists of assigning regions of the amino acid sequence as alpha helices, beta strands, or turns. The success of a prediction is determined by comparing it with the result of the DSSP algorithm applied to the crystal structure of the protein; DSSP is the standard method for assigning secondary structure to the amino acids of a protein, given its atomic-resolution coordinates. For nucleic acids, the secondary structure may be determined from the hydrogen bonding pattern. Specialized algorithms have been developed for the detection of specific well-defined patterns such as transmembrane helices and coiled coils in proteins, or microRNA structures in RNA.
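Comparing a prediction with a DSSP assignment is often summarized as a per-residue accuracy, commonly called Q3. The sketch below is illustrative only: the reduction of the eight DSSP codes to three classes follows one widely used convention (other reductions exist), and the sequences in the example are made up.

```python
# Assumed mapping from DSSP codes to three classes: helix (H), strand (E), coil (C).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def to_three_state(dssp):
    """Collapse a DSSP string to H/E/C; every unlisted code is treated as coil."""
    return "".join(DSSP_TO_3.get(code, "C") for code in dssp)

def q3(predicted, observed_dssp):
    """Fraction of residues whose predicted class matches the DSSP-derived class."""
    observed = to_three_state(observed_dssp)
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical six-residue example: five of six residues agree.
print(q3("HHHECC", "HHEETS"))
```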

Tertiary structure prediction

Experimental methods such as NMR spectroscopy and X-ray diffraction analysis are widely used to determine tertiary protein structures. However, the rate at which protein structures can be determined experimentally is much lower than the rate at which new genes are identified by the various genome projects.

Ab initio protein modelling methods build 3D protein models based on physical principles rather than on previously solved structures. There are many possible procedures that either attempt to mimic protein folding or apply some stochastic method to search possible solutions (for example, global optimization of a suitable energy function). These procedures require massive computational resources and have thus only been carried out for tiny proteins. Predicting the structure of larger proteins will require better algorithms and larger computational resources, such as those afforded by powerful supercomputers. Although these computational barriers are massive, the potential benefits of structure prediction make ab initio modelling an active research topic.

Side-chain geometry prediction describes a computational approach that can make predictions for a series of coiled-coil dimers. This method comprises a dual strategy that augments extensive conformational sampling with molecular mechanics minimization.

Quaternary structure

In the case of complexes of two or more proteins, where the structures of the proteins are known or can be predicted with high accuracy, protein–protein docking methods can be used to predict the structure of the complex.


### Folding Speed and Mechanism

In 1968, Cyrus Levinthal pointed out that a protein cannot find its native structure by randomly searching all of its possible conformations, because such a search would take far longer than the age of the universe, yet proteins fold reliably in microseconds to seconds. This is known as Levinthal's paradox. Nowadays, we have advanced methods such as mutational methods, which yield Φ-values that report on the structure formed during folding, and hydrogen exchange methods, which allow us to see structural folding events. However, the dynamics and mechanism of protein folding still require additional research and understanding.
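The scale of Levinthal's paradox can be illustrated with a back-of-the-envelope calculation. The numbers below are assumptions chosen for illustration (a 100-residue chain, 3 conformations per residue, 10^13 conformations sampled per second); they are not measured values.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def random_search_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to visit every conformation once in an exhaustive random search."""
    conformations = states_per_residue ** n_residues
    return conformations / samples_per_second / SECONDS_PER_YEAR

# Under these assumptions the search takes on the order of 10**27 years.
print(f"{random_search_years(100):.2e} years")
```

Since real proteins fold many orders of magnitude faster than this, folding cannot proceed by an unbiased search of all conformations.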

The dynamics and kinetics of the unfolded polypeptide chain have been addressed by recent studies of loop formation by Kiefhaber and coworkers. They used different model systems, each representing a different type of loop: end-to-end, end-to-interior, or interior-to-interior. Their experiments showed that end-to-interior and interior-to-interior loops form more slowly than end-to-end loops. This discovery suggests that the motion of one part of the unfolded polypeptide chain is coupled to other parts of the chain. These kinetics experiments also revealed that protein folding processes take place on different time scales and thus that there is a hierarchy in loop formation [2, Travaglini-Allocatelli et al.].

Although additional research is necessary to understand the mechanisms of protein folding, two classical mechanisms have been used to describe the folding of single-domain proteins. The first is called the diffusion-collision model. Proteins that follow this mechanism fold in a stepwise manner that involves growing secondary structure elements. These elements then collide, combine and strengthen. For example, there is evidence that the En-HD mentioned above follows the diffusion-collision model. The second mechanism is known as the nucleation-condensation model. Proteins following this mechanism fold from an unstructured denatured state with simultaneous formation of secondary and tertiary structure. For example, a homologous protein of En-HD called hTRF1 has been shown to follow this model. However, many proteins exhibit pathways with characteristics of both the diffusion-collision and nucleation-condensation models [2, Travaglini-Allocatelli et al.].

The starting point of protein folding: the denatured state

In the denatured state, residual structure can trigger nucleation and propagation, which may carry through the folding pathway. Characterizing the denatured states of proteins under physiological conditions is a hard task, because the native state has to be disfavored without adding denaturants. Chemically denatured states may behave like random-coil polymers at high denaturant concentrations. Sherman and Haran used single-molecule experiments to analyze the coil-to-globule transition of protein L and showed that the denatured state of the protein expands as the denaturant concentration increases. Eaton and co-workers also compared the size and dynamics of the denatured states of two proteins of similar length, 64 and 66 amino acids.

Mechanisms of protein folding

Two different mechanisms have been used to describe the folding of single-domain proteins. Some proteins, such as barnase, have been described as folding in a stepwise manner, with rapid formation of distinct nuclei followed by their collision and consolidation. Other proteins, with chymotrypsin inhibitor 2 as an example, follow the nucleation-condensation model. The folding pathway of a small alpha/beta protein domain has been shown to be distinct from both pure nucleation-condensation and pure diffusion-collision, while still displaying characteristics of both models.

Folding stability and function

The inherent stability of individual protein segments is a key factor in determining the folding mechanism of a given protein. The life of a cell often relies on the ability of its constituent proteins to fold into the 3D structures that are crucial for their function. The amount of folded, functional protein in a cell depends on several factors, such as the rates of protein biosynthesis and degradation.

An open question is whether the stability and folding of fully folded proteins can be related to their activity. Allostery can be the bridge where protein folding meets function. Allosteric effects involve communication between ligand binding sites, which is critical to many physiological processes. Because allostery is a thermodynamic process, it should be considered not only in terms of changes in conformation but also in terms of changes in the dynamics around the mean conformation.

Therefore more research is necessary to fully comprehend the mechanism of protein folding and find a solution to the protein folding problem.

## Reference

1. Ken A Dill, S Banu Ozkan, Thomas R Weikl, John D Chodera and Vincent A Voelz. The protein folding problem: when will it be solved? Current Opinion in Structural Biology 2007.
2. Carlo Travaglini-Allocatelli, Ylva Ivarsson, Per Jemth and Stefano Gianni. Folding and stability of globular proteins and implications for function. Current Opinion in Structural Biology 2009, 19:3-7.
3. Mount DM (2004). Bioinformatics: Sequence and Genome Analysis. 2. Cold Spring Harbor Laboratory Press. ISBN 0879697121
4. Zhang Y (2008). "Progress and challenges in protein structure prediction". Curr Opin Struct Biol 18 (3): 342–8. doi:10.1016/j.sbi.2008.02.004. PMC 2680823. PMID 18436442

Although much work has been done on protein folding "in vitro", little research has significantly advanced the understanding of protein folding "in vivo". The latter is important because protein folding in the cell is presumably guided by molecular machinery, rather than each protein independently folding into its lowest-energy conformation. Chaperone proteins are known to help proteins reach their native state, but it seems that even as a new protein is being made, something must assist the development of its secondary and tertiary structure. Lisa D. Cabrita, Christopher M. Dobson, and John Christodoulou have published an update in Current Opinion in Structural Biology on recent discoveries of how the nascent chain of a newly synthesized protein emerges, in an article entitled "Protein Folding on the Ribosome."

## Folding on Ribosome

Where the protein chain begins to fold is a heavily studied question. As the nascent chain passes through the “exit tunnel” of the ribosome and into the cellular environment, when does it begin to fold? This section discusses the idea of co-translational folding in the ribosomal tunnel. The nascent chain is bound to the peptidyl transferase centre (PTC) at its C terminus and emerges in a vectorial manner. The tunnel is very narrow and enforces a certain rigidity on the nascent chain, while the conformational space of the protein increases with the addition of each amino acid. Co-translational folding can greatly reduce this conformational space by helping the protein acquire a significant level of native structure while still in the ribosomal tunnel. The length of the chain can also give an indication of its three-dimensional structure. Shorter chains tend to favor beta sheets, while longer chains (like those reaching 119 out of 153 residues) tend to favor the alpha helix.

The ribosomal tunnel is more than 80 angstroms (Å) in length and around 10-20 Å in width. Inside the tunnel are auxiliary molecules such as the L23, L22, and L4 proteins, which interact with the nascent chain and help with folding. The tunnel also has hydrophilic character, which helps the nascent chain travel through it without being hindered. Although rigid, the tunnel is not a passive conduit, but whether it can actively promote protein folding is unknown. A recent cryoEM experiment has shown that there are folding zones in the tunnel. At the exit port (some 80 Å from the PTC), the nascent chain has assumed a preferred low-order conformation. This reinforces the suggestion that the chain can have degrees of folding in certain regions. Although some low-order folding can occur in the tunnel, adoption of the native state occurs outside it, though not necessarily only after the nascent chain has been released. The ribosome-bound nascent chain (RNC) adopts a partially folded structure, and in a crowded cellular environment this can cause the chains to self-associate. This self-association is, however, relieved by the staggered arrangement of the ribosomes, which positions the exit tunnels so as to maximize the distance between RNCs.

The current understanding of protein folding has come from in vitro studies of protein renaturation in a variety of environments, as well as from in silico computer simulations. These studies can only be extrapolated to part of the in vivo process of protein formation. In the cell, folding begins while the nascent polypeptide chain is still being synthesized by the ribosome. The start of protein folding is therefore coupled with the continuing synthesis of the polypeptide chain.

Currently, protein folding is viewed as a process driven by interactions between the amino acids of the protein, which can follow certain paths to reach the lowest-energy state, the native state. However, a protein may also start to fold along a path that leads to a conformation that is of low energy but is not the native state. The protein has no way of leaving this conformation without a significant input of energy. Such a non-native state is one way in which a protein can be misfolded, and it can lead to aggregation. Another factor that influences the likelihood of reaching the native state is size: larger proteins have more possible ways of folding, which decreases the likelihood of forming the most energetically favorable state. Proteins use co-translational folding to reduce the extent of conformational space available to them. In addition, molecular chaperones help proteins to achieve their native conformational state.

### Generation of RNC for studies

One technique for generating RNCs and taking snapshots of the chain as it emerges from the tunnel is to arrest translation. A truncated DNA without a termination sequence is used, which allows the nascent chain to remain bound for as long as desired. To identify the residues of the chain, they can be labeled with carbon-13 or nitrogen-15 and later detected by NMR spectroscopy. Another technique is the PURE method, which contains only the minimal components required for translation. This method has been used to study the interaction between nascent chains and auxiliary molecules such as the TF chaperone, and it has been coupled with the quartz-crystal microbalance technique to follow synthesis by mass. RNCs can also be generated in vivo by growing cells to a high density. This is initially done in an unlabeled medium, and the cells are then transferred to a labeled medium. The RNC is generated using SecM, purified by affinity chromatography, and detected by SDS-PAGE or immunoblotting.

Once RNCs have been generated, many experiments can be done to learn more about the emerging nascent chain. As mentioned above, the chain emerges from the exit tunnel in a vectorial manner. This enables the chain to sample native-like conformations as it emerges and increases the probability of folding to the native state. Along with this vectorial folding, chaperones help to achieve favorable folding rates and correct folding.

## Ribosome Structure and Co-translational Protein Folding

In E. coli the 70S ribosomal particle is composed of more than 50 proteins and three RNA molecules. With regard to protein folding, its most interesting structural feature is the ribosomal exit tunnel, a channel that links the PTC (peptidyl transferase centre) with the cellular environment. The tunnel is about 80 angstroms long and 10-20 angstroms wide. It is lined with a large RNA molecule and the L4 and L22 ribosomal proteins, while L23 serves as a docking point for other molecules that assist in the folding process. Recent cryoEM studies have shown that the L4 and L22 proteins in the exit tunnel can interfere with protein synthesis and make other interactions with the nascent chain. In addition, arginine residues have been observed to stop the translation process by changing electrostatic potentials. Although the ribosomal exit tunnel is presumed to have a more or less rigid structure, it does seem to support nascent chain folding to some degree. This is evidenced by the fact that on average the tunnel is able to accommodate about 30-40 residues, which is considerably more than would fit as a fully extended polypeptide chain. The degree to which a nascent chain folds seems to vary depending on the kind of protein being synthesized. Certain nascent chains with transmembrane protein sequences appear to form an alpha-helical structure already inside the tunnel. Studying nascent chains emerging from the ribosomal exit tunnel has proven to be a significant challenge for all current methods of structural and cellular biology. One idea presented in the paper is to take a "snapshot" of the elongation process. To do this, translation must be arrested artificially, which involves engineering DNA strands that lack a stop codon. Another issue is focusing on the particular residues of interest on the nascent chain within the sea of other residues from the ribosome.
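The argument that 30-40 residues in an 80 angstrom tunnel implies some compaction can be checked with simple arithmetic. The sketch below assumes approximate textbook values for the rise per residue (about 3.5 angstroms for an extended chain and 1.5 angstroms for an alpha helix); these values and the function name are assumptions for illustration.

```python
import math

EXTENDED_RISE = 3.5   # angstroms per residue, assumed for an extended chain
HELIX_RISE = 1.5      # angstroms per residue, assumed for an alpha helix

def residues_spanning(length_angstrom, rise_per_residue):
    """Whole number of residues needed to span a given length."""
    return math.floor(length_angstrom / rise_per_residue)

tunnel_length = 80.0  # angstroms, from the text
print(residues_spanning(tunnel_length, EXTENDED_RISE))  # fully extended chain
print(residues_spanning(tunnel_length, HELIX_RISE))     # fully helical chain
```

Under these assumptions, a fully extended chain would place roughly 22 residues in the tunnel and a fully helical one roughly 53, so the reported 30-40 residues lies between the two limits, consistent with partial compaction of the nascent chain.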

## Understanding Co-translational Folding by Biochemical and Biophysical Studies

One example highlighted in the article is the use of SDS-PAGE on ribosome-bound nascent chains (RNCs) of influenza haemagglutinin, which showed that they can form disulfide bonds and undergo glycosylation. Also, using monoclonal antibodies, it has been discovered that there is variability in the emergence of the nascent chain from the tunnel. These examples, among others, demonstrate that nascent chains can acquire not only structure but also activity while still attached to the ribosome. The speed of folding for nascent chains seems to be related to the number of stop and rare codons present. The reasoning is that a discontinuous translation rate slows down the folding process. However, slower rates seem to produce more efficient folding, since the nascent chain has more time to develop its native structure. Most of what is known about co-translational folding comes from biochemical and biophysical methods rather than X-ray crystallography, because the dynamic nature of the folding process is very difficult to capture in a crystal.

## Auxiliary Factors in Co-translational folding

As the nascent chain starts emerging from the tunnel, it has the opportunity to interact with molecules that assist in the folding process. These include molecular chaperones, peptide deformylase, and the signal recognition particle (SRP). The first molecule to assist the nascent chain in folding is the 48 kDa trigger factor (TF), which docks on L23. In the absence of a nascent chain this protein docks on and off; in the presence of a nascent chain, its affinity for L23 increases. TF undergoes a conformational change in which a protective cavity is formed for the nascent chain. TF allows enough of the polypeptide chain to emerge for a significant degree of folding to be achieved. It does this by binding to hydrophobic segments of the chain, even after it has been released from L23. Once hydrophobic regions of the chain are no longer exposed, TF seems to unbind and allow further helper molecules to assist in protein folding. TF seems to increase folding efficiency, but at the expense of slower folding. Protein translocation is then carried out by the SRP, which shuttles the ribosome-nascent chain complex to a heterotrimeric integral membrane protein. This allows further processing and folding.

## Ribosome subunit in prokaryote cells and eukaryote cells

Ribosomes catalyze peptide bond formation, in a process called peptidyl transfer catalysis, and synthesize polypeptides by reading the genetic code of the mRNA. The ribosome is composed of a large and a small subunit in both prokaryotic and eukaryotic cells. Prokaryotes have 70S ribosomes, each consisting of a small (30S) and a large (50S) subunit. Eukaryotes have 80S ribosomes, each consisting of a small (40S) and a large (60S) subunit. Because of the differences in their structures, bacterial 70S ribosomes are vulnerable to certain antibiotics while eukaryotic 80S ribosomes are not. Mitochondria have ribosomes similar to the bacterial ones; however, mitochondria within eukaryotic cells are not affected by these antibiotics because the organelle is surrounded by membranes. The initiation of translation in bacteria was found to take place on the 30S subunit. Assembly of this subunit into the correct tertiary structure, which is encoded in the sequences of its components, requires an increase in both the incubation temperature and the ionic strength. Dr. Masayasu Nomura's research on the synthesis of ribosomes and ribosomal components in E. coli also found that the information for the correct assembly of ribosomal particles resides in the structures of their own molecular components and not in other, nonribosomal factors.

The ribosome is the essential component of protein synthesis. It is assembled on the translation initiation region (TIR) of the mRNA during the initiation phase of translation. The mRNA is decoded as it slides through the small ribosomal subunit, while the growing polypeptide chain is assembled in the large subunit. The newly synthesized protein dissociates once the ribosome reaches the stop codon. In the final ribosome recycling phase, the ribosomal subunits dissociate and the mRNA is released. The main events of the translation process are relatively similar in prokaryotic and eukaryotic cells, but major differences exist in the detailed mechanism of each phase. Bacterial translation involves relatively few factors, in contrast to the more complex process in eukaryotes.

## Peptidyl Transfer Catalysis By Ribosome

During protein elongation, the ribosome PTC acts as a catalyst to cleave the


## Reference

Ki Yun Leung, Edward, et al. (2011). The Mechanism of Peptidyl Transfer Catalysis by the Ribosome, 80(1):527-555.

The basic process of forming membrane proteins into complexes

## Assembly of bacterial inner membrane proteins

Many membrane proteins form multi-subunit protein complexes that possess integral and peripheral subunits. The Sec translocase and the YidC insertase insert bacterial membrane proteins into the inner membrane. Their folding is assisted by YidC and the phospholipid phosphatidylethanolamine. Glycine zippers and other motifs also promote transmembrane helix-helix interactions that can form the alpha-helical bundles of membrane proteins. During or after membrane insertion, the subunits of oligomeric membrane proteins have to locate each other to construct homo-oligomeric and hetero-oligomeric membrane complexes. Even though chaperones can serve as assembly factors in constructing the oligomer, numerous protein oligomers seem to fold and oligomerize spontaneously. Experiments have shown that the subunits of many hetero-oligomers are assembled along a sequential and ordered pathway to create the membrane protein complex. If an inserted protein folds improperly or a membrane protein complex is assembled incorrectly, quality control mechanisms can deactivate the proteins.

Membrane Proteins

## Overview

Membrane proteins perform a large variety of functions in the cell, from metabolite exchange to cell signaling and nerve conduction. They can function as ATPases, electron carriers, ion channels, transporters, sheddases, and photosynthetic reaction centers. They are abundant in both eukaryotic and prokaryotic cells and comprise about 20 to 30 percent of all proteins.

Many of the integral inner membrane proteins are alpha-helical bundles with alpha-helical membrane-spanning regions. Research has shown that membrane protein structures contain not only straight membrane-spanning helices but also strongly curved helices and helices that span the membrane only partway. Alpha-helical membrane proteins can exist as monomers or as multimeric complexes.

To function properly, membrane proteins must be targeted to their destined membrane in the cell and then inserted and folded into the appropriate structure. Membrane targeting is more complicated in eukaryotic cells than in eubacteria. Eukaryotic cells must target proteins to at least 10 different membranes, while eubacteria have only 1 or 2 membranes, in gram-positive and gram-negative bacteria, respectively. After targeting, membrane protein integration and topogenesis are directed by the coordinated action of topogenic sequences and translocases. While this process is occurring, the transmembrane segments and extramembranous loops are folded.

The assembly of bacterial inner membrane proteins into the membrane is very complex. The mechanisms that control protein targeting and insertion into the membrane, folding of the alpha-helical bundles, and assembly into oligomeric membrane protein complexes are explored in more depth below.

## Recognition and Targeting

The targeting of nascent chains to the membrane begins during protein synthesis. It happens very early, even before the polypeptide appears from the ribosomal channel. Nascent chains can already send signals while inside the ribosome, which is required for recruitment of the signal recognition particle (SRP). The SRP is made up of a protein component, Ffh, and a 4.5S RNA. The SRP binds to a hydrophobic part of a membrane protein as it comes out of the ribosome at the membrane surface. The SRP-interacting region is most commonly the first TM region, but it can also lie further along the chain and be distinct from the TM segments. Structural studies have shown that a groove in the SRP M domain binds to the apolar segment.

When the SRP-ribosome-nascent chain complex reaches the SRP receptor FtsY, an SRP/FtsY complex is formed. Disassembly of this complex and release of the targeted protein require GTP hydrolysis. The SRP and FtsY both start out GTP-bound and then form a complex through the interaction of their NG domains. Ffh and FtsY each have two homologous domains and one distinct domain. Analysis of the structure of the Ffh and FtsY NG domain complex revealed a shared composite active site in the Ffh/FtsY heterodimer, formed together with the two bound nucleotides. After GTP hydrolysis, the membrane protein-nascent chain complex is passed to the SecYEG translocation channel, and the SRP and FtsY separate from each other, which enables the SRP to be recycled for another round of SRP targeting. The transfer of the nascent chain to the translocation channel is assisted by the interaction of FtsY with SecY.

## Insertion of the membrane proteins

Translocases and insertases are required to insert freshly synthesized proteins into membranes. In bacteria, the SecYEG translocase and the YidC insertase have been characterized, and both display their translocation and insertion functions in reconstituted systems. In addition, both are essential for bacterial life.

## Sec Translocase Complex

The Sec translocase catalyzes bacterial membrane protein insertion. It is made up of the membrane-embedded SecYEG and SecDFYajC complexes, in addition to the peripheral membrane component SecA. SecYEG provides the protein-conducting channel, which is necessary for translocation and for efficient membrane protein insertion. SecA, the motor ATPase, is crucial for the translocation of preproteins through the membrane and for the translocation of particular hydrophilic regions of membrane proteins. SecA uses ATP hydrolysis to propel the inserting polypeptide chain through the Sec channel 20 to 30 residues at a time.
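The stepwise movement described above lends itself to a quick back-of-the-envelope calculation. The Python sketch below estimates how many SecA-driven steps would be needed to move a segment of a given length through the channel, assuming every step moves a fixed number of residues within the 20 to 30 residue range quoted above. The function names `seca_steps` and `seca_step_range` are illustrative, and the fixed step size is a simplifying assumption rather than a measured property.

```python
import math


def seca_steps(segment_length, residues_per_step):
    """Steps needed to move a segment, assuming a fixed step size (an assumption)."""
    if segment_length < 0 or residues_per_step <= 0:
        raise ValueError("lengths must be non-negative and step size positive")
    # A partial final step still counts as one step.
    return math.ceil(segment_length / residues_per_step)


def seca_step_range(segment_length, min_step=20, max_step=30):
    """Fewest and most steps for step sizes between min_step and max_step residues."""
    fewest = seca_steps(segment_length, max_step)  # larger steps, fewer of them
    most = seca_steps(segment_length, min_step)    # smaller steps, more of them
    return fewest, most


if __name__ == "__main__":
    fewest, most = seca_step_range(300)
    print(f"A 300-residue segment needs between {fewest} and {most} steps")
```

Under these assumptions, a 300-residue hydrophilic segment would need between 10 and 15 steps.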

A major discovery in the study of protein export was the determination of the structure of the SecY complex from the archaeon Methanococcus jannaschii. This complex is made up of SecYEβ. Secβ does not have sequence homology to the eubacterial SecG, but it does have sequence homology to the eukaryotic Sec61β. The SecY channel has an hourglass shape with a narrow hydrophobic constriction, about 3 to 5 Å across, at the center of the channel. This constriction within SecYEβ separates the hydrophilic cavities that open toward the periplasmic and cytoplasmic sides of the membrane. The constriction is formed by a hydrophobic pore ring, which consists of four isoleucine residues, one valine residue, and one leucine residue. The aliphatic side chains of these amino acids point toward each other, creating a hydrophobic collar through which the hydrophilic regions of the polypeptide chain are transported during translocation across the membrane.

In the crystal structure, the SecY channel is in a sealed state, with the pore ring closed off by a plug helix on the luminal side. When the Sec channel opens through binding of a signal peptide to the SecY TM2-TM7 region, the plug moves out of the channel to a position about 20 Å away, near the SecE helix.

Another important feature of the SecY channel is the lateral gate. It allows the TM regions of inserting membrane proteins to leave the channel laterally and partition into the lipid phase. The lateral gate lies at the interface of TM2 and TM7 of SecY (Sec61α), at the front side of the Sec channel. TM2 and TM7 of Sec61α had earlier been thought to form the signal peptide-binding region, because the signal peptide of a preprotein can be cross-linked to these TM segments during posttranslational translocation. The lateral gate opens when a polypeptide chain is translocated. This opening is functionally significant: locking the lateral gate by disulfide cross-linking blocks SecA-mediated preprotein translocation in Escherichia coli.

It is important to understand how SecA works with the SecY channel to translocate hydrophilic domains of membrane proteins across the membrane. The 4.5 Å structure of the SecA/SecYEG complex from Thermotoga maritima helps explain this process. In the structure, one copy of SecA is bound to one copy of the SecY channel. SecA lies flat on the SecY channel, roughly parallel to the membrane surface. Notably, a two-helix finger domain of SecA sits at the opening of the SecYEG channel, where it may serve to move substrates into the channel.

## YidC Insertase

The YidC insertase is important because it inserts small proteins into the membrane. It was discovered that YidC influences membrane protein insertion: when the amount of YidC in the cell is reduced, the insertion of Sec-independent proteins is slowed and inhibited. These proteins had previously been thought to insert into the membrane spontaneously.

Experiments indicate that YidC acts in the insertion of Sec-independent substrates. Photo-cross-linking studies using a cell-free system showed that membrane proteins stalled at different stages of insertion interact with YidC. Lipid vesicles containing YidC are sufficient to insert the Sec-independent Pf3 coat protein and ATP synthase subunit c. The Pf3 coat protein was also found to bind to YidC, which leads to a significant conformational change in the YidC protein.

## Assembly of Multispanning Membrane Proteins

Many important membrane proteins span the lipid bilayer multiple times. They do so in such a way that sequential TM segments alternate between an N-to-C and a C-to-N orientation of their alpha helices across the membrane. The TM segments are connected by cytoplasmic and periplasmic loops. These loops are primarily hydrophilic and vary in size and charge. Short loops simply join two helices together. Longer loops, on the other hand, fold into separate domains. This plays a role in how the protein behaves and functions.
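The alternating arrangement can be made concrete with a small Python sketch. Given the number of TM segments and the side on which the N-terminus sits, it lists the direction in which each helix crosses the membrane and reports where the C-terminus ends up. The function name `tm_topology` and the two side labels are illustrative choices, and the model assumes strict alternation with no re-entrant or half-spanning segments.

```python
SIDES = ("cytoplasm", "periplasm")


def tm_topology(n_segments, n_terminus_side="cytoplasm"):
    """Return ([(segment number, start side, end side), ...], C-terminus side).

    Assumes every TM segment fully crosses the bilayer, so consecutive
    helices run in opposite directions (a simplifying assumption).
    """
    if n_terminus_side not in SIDES:
        raise ValueError(f"n_terminus_side must be one of {SIDES}")
    if n_segments < 0:
        raise ValueError("n_segments must be non-negative")

    current = n_terminus_side
    helices = []
    for number in range(1, n_segments + 1):
        # Each helix ends on the side opposite to where it started.
        other = SIDES[1] if current == SIDES[0] else SIDES[0]
        helices.append((number, current, other))
        current = other
    return helices, current


if __name__ == "__main__":
    helices, c_terminus = tm_topology(4)
    for number, start, end in helices:
        print(f"TM{number}: {start} -> {end}")
    print("C-terminus:", c_terminus)
```

In this model, an even number of TM segments leaves both termini on the same side of the membrane, while an odd number places them on opposite sides.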

[1]

## References

1. Ross E. Dalbey, Peng Wang, and Andreas Kuhn (2011). [4]. "PubMed", p. 3-6.

Enzymes depend on several mechanisms in order to function and persist in the biological world. The ability of proteins to fold themselves into their functional states after synthesis is one of the most fascinating mechanisms studied by researchers.

## Basis of Protein Folding

In a living cell, protein folding occurs in a highly complex environment and relies on a range of auxiliary proteins. Some of these proteins serve only to protect the incompletely folded polypeptide chain from interactions other than productive folding, especially those that could lead to aggregation. Others act as folding catalysts that speed up potentially slow steps in folding, such as isomerization or the formation of disulphide bonds. There are exceptions in which auxiliary proteins are not needed. Evidence shows that the code for protein folding is contained within the protein sequence itself: studies have shown that proteins can fold in vitro and function in the same way as proteins folded with the help of auxiliary proteins, as long as the in vitro folding takes place under appropriate conditions.

## Protein Folding Mechanisms

A large number of studies on the mechanism of protein folding have been carried out in recent years, and many of them have produced successful results. Both experimental and theoretical approaches have contributed to the current understanding of how proteins fold.

One of the most influential descriptions of protein folding treats it as a "stochastic process". A stochastic process is a random process in which many different pathways and final outcomes are possible. It is the opposite of a deterministic process, in which a given starting point always leads to the same single result. A stochastic process may start from one state but end in several different plausible results, some more probable than others.
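The contrast between the two kinds of process can be illustrated with a short Python sketch. The deterministic function always returns the same result, while the stochastic one draws its result from a set of outcomes with unequal probabilities. The outcome labels and the weights 0.7, 0.2 and 0.1 are invented purely for illustration and do not describe any real protein.

```python
import random
from collections import Counter

# Hypothetical outcomes and probabilities, chosen only for illustration.
OUTCOMES = ["result 1", "result 2", "result 3"]
WEIGHTS = [0.7, 0.2, 0.1]


def deterministic_process(start):
    """The same starting point always gives the same result."""
    return "result 1"


def stochastic_process(start, rng):
    """The same starting point can give different results, some more likely."""
    return rng.choices(OUTCOMES, weights=WEIGHTS, k=1)[0]


def tally(n_runs, seed=0):
    """Repeat the stochastic process n_runs times and count each outcome."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return Counter(stochastic_process("start", rng) for _ in range(n_runs))


if __name__ == "__main__":
    print({deterministic_process("start") for _ in range(1000)})
    print(tally(1000))
```

Repeating the deterministic process yields a single result every time, whereas the tally for the stochastic process contains all three outcomes, with the most probable one appearing most often.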

The search is nonetheless biased, because native interactions within a protein are on average more stable than non-native ones. Studies have shown that protein sequences can fold reliably even in the very complex environment within a cell. However, when a protein folds incorrectly, or fails to stay folded in living cells, diseases of different types can occur.

One such group of diseases is the amyloidoses. Well-known diseases of this kind include Alzheimer's disease and the spongiform encephalopathies. These diseases occur when proteins aggregate after failing to fold or to stay folded. An interesting fact about the amyloidoses is that the ability to form such aggregates appears to be a general property of polypeptides, not just a feature of proteins that fold poorly. Amyloid aggregates are not normally found in living systems, which raises the question of which mechanisms normally prevent them and how those mechanisms break down. To prevent such diseases from developing, the study of protein folding is crucial for understanding both the structure of a protein and its function in living cells.

## Issues and Possible Results of New Protein Folding Mechanisms

Although many groundbreaking discoveries have been made in the study of protein folding, several issues arise. Manipulating the folding of a protein raises the question of why a naturally occurring mechanism should be altered in the first place. Because of the number and magnitude of the conformational changes a protein chain undergoes, an experiment is likely to reveal a stochastic process with several pathways and results. In addition, because of the strong heterogeneity present until the end of the folding process, changes to the folding of a protein can alter the desired results. According to Christopher Dobson, a researcher at Oxford Centre for Molecular Sciences in the University of Oxford, "there are two main approaches to try and overcome this issue".

The first approach uses biophysical techniques that can monitor the properties of the polypeptide chain as folding takes place. Because folding occurs rapidly, several complementary methods are needed to map out the individual properties of the chain. For example, ultraviolet circular dichroism can be used to monitor the evolution of secondary structure, and fluorescence measurements can monitor the formation of tertiary structure.

The second approach is to use protein engineering to study the mechanism of protein folding. Protein engineering is a particularly good method for studying the folding process because it can also map out the transition states of the folding reaction. Folding and unfolding are examined after mutating individual amino acids in the sequence. Studies of the intermediate steps of folding show that native-like structure forms around a small number of important amino acids. This provides evidence for a mechanism called "nucleation-condensation", in which the majority of the structure forms rapidly once the folding nucleus has formed.

## Reference

Dobson, Christopher M. Biochem. Soc. Symp. (2001) 68, (1–26) (Printed in Great Britain). http://symposia.biochemistry.org/bssymp/068/bss0680001.htm. Last accessed: 1 Dec. 2011.

## Introduction

A fibrous protein is a protein with an elongated shape. Fibrous proteins provide structural support for cells and tissues. Special types of helices are present in two fibrous proteins, α-keratin and collagen. These proteins form long fibers that serve a structural role in the human body. Fibrous proteins are distinguished from globular proteins by their filamentous, elongated form. Fibrous proteins also have low solubility in water, whereas globular proteins are highly soluble. Most of them play structural roles in animal cells and tissues, holding things together. Fibrous proteins have amino acid sequences that favour a particular kind of secondary structure, which in turn confers particular mechanical properties on the proteins.

## Examples

Collagen is a triple helix formed by three extended proteins that wrap around one another. Many rodlike collagen molecules are cross-linked together in the extracellular space to form collagen fibrils that have the tensile strength of steel. The striping on the collagen fibril is caused by regular repeating arrangement of the collagen molecules within the fibril.

Elastin polypeptide chains are cross-linked together to form rubberlike, elastic fibers. Each elastin molecule uncoils into a more extended conformation when the fiber is stretched and will recoil spontaneously as soon as the stretching force is relaxed.

| | alpha helix | beta pleated sheet | triple helix |
|---|---|---|---|
| Hydrogen bonding | Peptide -C=O···HN-; intrachain, between residues n and n+4; parallel to helix axis | Peptide -C=O···HN-; interchain; perpendicular to chain axis | Peptide -C=O···HN- and -C=O···HO- (hydroxyl from side chain of Hyp); interchain |
| Residues | Many types; small or uncharged residues, such as Ala, Leu, and Phe, most common; Pro never found | Mostly Gly, Ala, and Ser | Many types; Gly every third residue; Pro and Hyp common |
| Chain direction and aggregation | Four parallel right-handed alpha helices form a left-handed supercoil | Antiparallel chains | Three parallel left-handed helices form a right-handed supercoil |

The unfolded protein response (UPR) is a response to cellular stress in the endoplasmic reticulum (ER). It has been studied mainly in mammalian species, but has also been found in yeast and worms.

When ER conditions are disrupted (for example by alterations in redox state or calcium levels, or by failure to posttranslationally modify secretory proteins) or when the chaperone proteins that assist protein folding are over capacity, the cell experiences ER stress. It then launches signals that try to deal with these changes and restore a favorable folding environment. When the UPR is not sufficient to deal with this stress, apoptotic cell death follows.

## Introduction

The environment of the ER lumen favors the production of secretory and membrane proteins, yet a considerable fraction of these proteins is rapidly degraded, probably because of improper folding. This poses a risk for the cell, since misfolded proteins can build up. The problem becomes worse if the ER environment changes, because such changes reduce the overall ability to produce properly folded proteins and more misfolded proteins accumulate.

The UPR monitors and responds to changes in the ER protein-folding environment. It monitors the protein-folding capacity of the ER and signals cellular responses that help maintain this capacity and prevent a buildup of unwanted protein products. In mammals, the response consists of a transient inhibition of protein synthesis to limit the production of new proteins, followed by transcriptional induction of chaperone genes to assist protein folding and activation of the ER-associated degradation system. If this process fails, the UPR directs the cell toward a destructive pathway. The UPR has three main signaling systems: IRE1, PERK, and ATF6.

## UPR Signaling

### IRE1 Pathway

IRE1 is a type I transmembrane protein with serine/threonine kinase activity that acts as a stress sensor. Once IRE1 is activated, the endoribonuclease activity in its carboxyl terminus catalyzes splicing of the mRNA of HAC1, which is responsible for inducing the expression of ER stress response genes.

In yeast, IRE1 contains nuclear localization sequences in the carboxyl terminus, which can interact with components of the nuclear pore complex and target IRE1 to the inner nuclear membrane. As a result, the COOH-terminal domain faces the inside of the nucleus and has access to nuclear mRNA. HAC1 then moves into the nucleus and binds to a promoter element to induce the expression of genes required for various reactions.

In mammals, the IRE1 pathway resembles that of yeast, except that two IRE1 genes have been cloned: IRE1α and IRE1β. Unlike yeast IRE1, mammalian IRE1 does not contain nuclear localization sequences. IRE1 has also been shown to mediate cleavage of additional mRNAs targeted to the endoplasmic reticulum, as well as cleavage of the 28S ribosomal RNA. This has led to the belief that IRE1 plays a role in translation attenuation by degrading these mRNA transcripts and/or the ribosomal RNA.

### PERK Pathway

The first response to ER stress is a transient global attenuation of translation, which is mediated by PERK. PERK is a type I ER-resident transmembrane protein that detects stress through its lumenal domain. It is normally bound to the chaperone protein Grp78, but when unfolded proteins start to build up during ER stress, Grp78 dissociates and PERK dimerizes and autophosphorylates. Once activated, PERK phosphorylates serine-51 of eukaryotic initiation factor 2α (eIF2α). Phosphorylated eIF2α is unable to start translation, which leads to inhibition of global protein synthesis. Conversely, phosphorylated eIF2α promotes translation of ATF4 mRNA, and ATF4 upregulates ER stress genes. Translational recovery is mediated by the stress-induced phosphatase growth arrest and DNA damage-inducible gene 34 (GADD34).

### ATF6 Pathway

ATF6 exists in two isoforms, ATF6α and ATF6β, which have fairly even tissue distributions. Activation of the ATF6 pathway involves a mechanism called regulated intramembrane proteolysis (RIP). In RIP, the protein translocates from the ER to the Golgi for proteolytic processing. The stress-sensing mechanism of ATF6 involves dissociation of Grp78 from its lumenal domain, similar to the IRE1 and PERK pathways. Release of Grp78 exposes two Golgi localization signals, which allow ATF6 to enter COPII vesicles and translocate to the Golgi compartment. Disulfide bonds in the ATF6 lumenal domain are also believed to keep ATF6 inactive. During ER stress these disulfide bonds are reduced, which increases the ability of ATF6 to exit the ER.

## Apoptosis

The three UPR pathways not only help repair improperly folded proteins; they can also contribute to apoptosis of the cell if the UPR fails to restore folding capacity.

## References

1. Physiology Online [9]
2. Nature [10]

## Overview

Technological advances in sequencing and microarrays allow us to better understand pre-mRNA splicing patterns in different cells. For example, splicing in a cell changes when the cell is stimulated by factors such as DNA damage, neuron depolarization, or metabolic changes. In the last few years, more studies have examined the mechanisms that link cellular stimuli to downstream control of alternative splicing. These mechanisms include degradation of splicing factors, altered nuclear translocation, and regulated synthesis of splicing factors.

## What is alternative splicing and how does it work?

Splicing overview

Alternative splicing is a process that occurs during gene expression and allows multiple proteins (protein isoforms) to be produced from a single gene. It arises from the different ways in which an exon can be included in or excluded from the messenger RNA. It can also occur when portions of an exon are included or excluded, or when introns are retained. For example, if a pre-mRNA has four exons (A, B, C, and D), these four exons can be spliced and translated in a number of different combinations. Exons A, B, and C can be translated together, or exons A, C, and D can be translated together. Each combination gives a different alternatively spliced product.
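The four-exon example can be explored with a short Python sketch that lists every combination of exons that keeps the original exon order. This is a deliberately simplified counting model: it assumes any exon may be skipped independently, whereas a real cell produces only the subset of isoforms permitted by its splice sites and regulatory proteins. The function name `possible_isoforms` and its `min_exons` parameter are illustrative.

```python
from itertools import combinations


def possible_isoforms(exons, min_exons=1):
    """List exon combinations that preserve the 5'-to-3' exon order.

    Toy model: every exon may be included or skipped independently.
    """
    isoforms = []
    for size in range(min_exons, len(exons) + 1):
        # combinations() keeps elements in their original order.
        isoforms.extend(combinations(exons, size))
    return isoforms


if __name__ == "__main__":
    isoforms = possible_isoforms(["A", "B", "C", "D"])
    for isoform in isoforms:
        print("-".join(isoform))
    print(len(isoforms), "combinations in this toy model")
```

For four exons the model gives 15 combinations, including the A-B-C and A-C-D examples mentioned above.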

The pattern of splicing and the production of alternatively spliced messenger RNAs are controlled by the binding of regulatory proteins (trans-acting proteins) to cis-acting sites found on the pre-mRNA. These regulatory proteins include splicing activators (proteins that promote the use of certain splice sites) and splicing repressors (proteins that reduce the use of certain sites). Common splicing repressors include heterogeneous nuclear ribonucleoproteins (hnRNPs) and polypyrimidine tract binding protein (PTB). Proteins translated from alternatively spliced messenger RNAs differ in their amino acid sequences, which results in altered protein function. This is why the human genome can encode such a wide diversity of proteins. Alternative splicing is common in eukaryotes; most multi-exon genes in humans are alternatively spliced. Unfortunately, abnormal variations in splicing also underlie many genetic diseases and disorders.

A complex

### Spliceosome

The splicing of messenger RNA is catalyzed by a macromolecular complex known as the spliceosome. The sites of cleavage and ligation are determined by the binding of the many subunits of the spliceosome to sequence elements in the pre-mRNA. These elements include the branch site (A) and the 5' and 3' splice sites. Interactions between these elements and the small nuclear ribonucleoproteins (snRNPs) of the spliceosome create a spliceosome A complex, which helps determine which introns to remove and which exons to keep and join together. Once the introns are cleaved and removed, the exons are joined together by a phosphodiester bond.

### Regulatory Proteins

As noted above, splicing is regulated by repressor proteins and activator proteins, which are also known as trans-acting proteins. Equally important are the silencers and enhancers found on the messenger RNAs, also known as cis-acting sites. These regulatory elements work together to create a splicing code that determines alternative splicing. The cis-acting sites are discussed here.

Splicing repression

Splicing silencers are regulatory sites in pre-messenger RNAs to which splicing repressor proteins bind. When a repressor binds to a silencer site, it reduces the chance that a nearby site will be chosen as a splicing junction. Silencer sites can be found in introns or in exons. In introns they are known as intronic splicing silencers, and in exons they are called exonic splicing silencers. The sequences found at these sites are diverse, which allows different kinds of proteins to bind.

Splicing activation

Splicing enhancers, on the other hand, are regulatory sites to which splicing activator proteins bind. When an activator protein binds to an enhancer site, it increases the chance that a nearby site will be chosen as a splicing junction. Like splicing silencers, these sites can be found in both introns and exons. In introns they are called intronic splicing enhancers, and in exons they are called exonic splicing enhancers. Unlike their silencer counterparts, however, enhancer sites usually bind activator proteins that belong to the family of SR proteins. These proteins are rich in arginine and serine.

How is alternative splicing regulated by specific signals? Alternative splicing has recently been shown to occur in nearly all human genes. Most typically, a specific exon is either included or excluded in different cell types or growth conditions. In each case, the pattern of splicing is generally determined by the binding of regulatory proteins to cis-acting auxiliary sequences, which in turn control where and/or how the enzymatic complex acts at neighboring splice sites (see Combinatorial Regulation of Alternative Splicing). Importantly, any of these differential patterns can alter the open reading frame of the resulting mRNA or the presence of cis-regulatory elements that control mRNA stability or translation. Precise control of alternative splicing is therefore required to shape the proteome of any given cell, and changes in splicing patterns can significantly alter how a cell responds to changing environmental conditions.

Representation of intron and exons within a simple gene containing a single intron.

Combinatorial Regulation of Alternative Splicing

The spliceosome is a macromolecular complex that catalyzes the removal of introns and the joining of exons. The binding of various subunits of the spliceosome to sequence elements at the intron and exon boundaries of a pre-mRNA determines the precise sites of cleavage and ligation. These sequence elements are the 5' splice site, the branch point sequence, a pyrimidine-rich tract, and the 3' splice site. In mammals, however, the splice sites are poorly conserved and are typically not sufficient to bind the spliceosome with high affinity. Proteins bound to non-splice-site sequences within the exon or intron can therefore influence the efficiency of spliceosomal binding. Exonic or intronic splicing enhancers are sequences that promote spliceosomal recognition of an exon, while splicing silencers inhibit recognition of the exon. Exon inclusion is promoted by the binding of members of the ubiquitously expressed SRSF protein family to enhancers, while exon usage is repressed by members of the hnRNP family of proteins acting through silencer elements. FOX, CELF, neuro-oncological ventral antigen (NOVA) and muscleblind-like (MBNL) proteins are other splicing regulators with more tissue-restricted expression; they can function as both enhancers and repressors of splicing through mechanisms that are still largely undefined. As a result, the ratio of mRNA isoforms can often be altered by the binding of a single regulatory protein or by subtle changes in the balance of regulator expression.

Post-Translational Modification of Splicing Proteins

In many cases, splicing regulatory proteins are modified by phosphorylation, acetylation, methylation, sumoylation, or hydroxylation. The best characterized modification is the phosphorylation of the extensive Arg-Ser dipeptides found within SR proteins. hnRNP proteins and other non-SR splicing factors are also extensively modified post-translationally.

## Alternative Splicing and its Signals

An example of regulated degradation of a RNA-binding protein modulating alternative splicing.

Recently, technical tools such as deep sequencing and sensitive microarrays have greatly expanded knowledge of alternative splicing events. Almost all human genes undergo some form of alternative splicing, which includes differential exclusion or inclusion of a specific exon, exclusion of part of an exon, and inclusion of introns. These differential patterns can change the reading frame of the processed mRNA or alter cis-regulatory elements that control mRNA translation or stability. For that reason, the regulation of alternative splicing is crucial in shaping the proteome of cells; alterations in splicing patterns can change cell function in response to environmental changes. Observations of developing heart tissue, of neurons before and after depolarization, and of cells before and after apoptosis have shown that alternative splicing events play a large role in the functional outcome of signaling and developmental processes.

Alternative splicing is generally determined by the binding of regulatory proteins to auxiliary sequences that control the location of binding and the activity of the enzymatic complex at neighboring splice sites. Cells use this control in response to DNA damage and T cell activation. One case involving DNA damage is the alternative splicing of the E3 ubiquitin ligase murine double minute-2 (MDM2). MDM2 specifically controls levels of the tumor suppressor p53 by targeting it for proteasomal degradation. Once DNA damage is detected, Mdm2 exons are skipped to reduce the activity of MDM2, allowing p53 to accumulate. This induced regulation of MDM2 provides an example of how splicing is coupled with transcription, as the exon skipping mirrors the DNA damage. In this case, cells show a "tight control of alternative splicing that helps regulate protein expression due to changing conditions in the cell."[3]

Altering the interactions between proteins is another way in which alternative splicing can be controlled. One demonstration of this is T cell activation. As in the DNA damage response, a signal changes the behavior of a splicing regulator: here, altered protein-protein interactions regulate the splicing of the CD45 gene. In resting T cells, PSF, an RNA-binding protein, is phosphorylated by the enzyme GSK3, and the phosphorylated PSF forms a complex with TRAP150. As a result, PSF cannot bind to the CD45 RNA. This prevents PSF-dependent exon exclusion, so PSF takes no part in splicing. In an activated T cell, however, there is little to no GSK3 activity, because binding of an antigen to the T cell receptor leads to an inhibitory phosphorylation of GSK3. Without GSK3 activity, PSF is not bound to TRAP150 and is free to bind to the RNA. This is a major example of how splicing is controlled by signal-induced changes in protein interactions.
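The chain of events in the CD45 example can be written out as a small Boolean model in Python. Each step follows directly from the previous one as described in the text; the all-or-nothing treatment of GSK3 activity is a simplification, and the function name `cd45_splicing_state` and the dictionary keys are illustrative.

```python
def cd45_splicing_state(antigen_bound):
    """Trace the PSF regulation steps for a resting or activated T cell.

    Simplified on/off model of the steps described in the text.
    """
    # Antigen binding to the T cell receptor shuts down GSK3 activity.
    gsk3_active = not antigen_bound
    # Active GSK3 phosphorylates PSF.
    psf_phosphorylated = gsk3_active
    # Phosphorylated PSF is held in a complex with TRAP150.
    psf_bound_to_trap150 = psf_phosphorylated
    # Only free PSF can bind the CD45 RNA and take part in exon exclusion.
    psf_binds_cd45_rna = not psf_bound_to_trap150
    return {
        "gsk3_active": gsk3_active,
        "psf_phosphorylated": psf_phosphorylated,
        "psf_bound_to_trap150": psf_bound_to_trap150,
        "psf_binds_cd45_rna": psf_binds_cd45_rna,
    }


if __name__ == "__main__":
    print("resting:  ", cd45_splicing_state(antigen_bound=False))
    print("activated:", cd45_splicing_state(antigen_bound=True))
```

In the resting state the model reports PSF as sequestered by TRAP150, and in the activated state it reports PSF as free to bind the CD45 RNA.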

## RNA-binding Proteins Regulate Splicing

Altering the expression level of a regulatory protein is the simplest way to affect alternative splicing. Because of the complex, combined influences acting on a given transcript, a small change in the expression of one splicing factor can tip the balance that determines exon exclusion or inclusion. Transcriptional activators such as nuclear factor-kappa B and nuclear factor of activated T cells are known to be controlled by signaling pathways. Signaling can therefore induce the transcription of genes encoding SR proteins or other splicing regulators, which in turn changes the splicing of genes that respond to these factors. In one instance, it has been proposed that stimulation of T cells triggers signal-induced splicing of the gene that encodes the tyrosine phosphatase CD45. Furthermore, the proteins PTB-associated splicing factor and hnRNP L-like promote the skipping of CD45 exons 4 and 6. Inducible changes in protein expression do not result only from transcription, however. In the case of the splicing regulatory protein CELF1, increased protein levels result from increased phosphorylation and stability of CELF1, which raises its steady-state levels. This increase in phosphorylation is attributed to protein kinase C activity in myotonic dystrophy (DM) cells. CELF1 protein levels are also controlled by miRNAs during heart development. These two coupled mechanisms highlight the idea that regulating the expression of regulatory proteins is important for maintaining the splicing patterns required for cell function. [3]

### Localization of RNA-binding proteins

In addition to changes in protein expression and stability, alternative splicing can be altered when signals change the localization of regulatory proteins. Many regulatory proteins, such as the SR proteins and hnRNPs mentioned above, shuttle between the nucleus and the cytoplasm. Signaling pathways that alter the relative distribution of these proteins between the nucleus and the cytoplasm therefore lead to splicing differences. Two regulatory proteins whose distributions are regulated in this way are SRPK1 and hnRNP A1. SRPK1 is normally found in the cytoplasm because of its interactions with heat shock proteins. When the cell undergoes osmotic shock, however, SRPK1 moves to the nucleus and phosphorylates SR proteins. This phosphorylation changes the interactions between the SR proteins and their targets and produces different splicing patterns. Osmotic shock has the opposite effect on the localization of hnRNP A1. hnRNP A1 is normally found mainly in the nucleus, but after osmotic shock it becomes phosphorylated in the cytoplasm, and this phosphorylation prevents it from re-entering the nucleus.

## Feedback Loops in Alternative Splicing

An example of a feedback loop in alternative splicing.

Like whole organisms, cells maintain homeostasis. To do so, they must turn off induced splicing signals once conditions return to normal, for example after antigen has been cleared, DNA has been repaired, or a neuron has repolarized. One way to reset gene expression is to deactivate the signal by removing the initial receptors or signaling factors themselves. Indeed, signaling proteins such as phosphatases and kinases undergo autoinhibitory, signal-induced alternative splicing. For instance, in response to T cell activation, alternative splicing of CD45 reduces the sensitivity of the cell to antigen stimulation. In another example, transcripts that encode kinases responsible for activating T-cell signaling, such as the FYN proto-oncogene, signal-regulated kinase-1, and tyrosine kinase 2 beta protein, all undergo alternative splicing upon T cell activation that lowers their expression or changes their localization.

Inducing the expression of an opposing regulatory factor can also help reset induced splicing signals. Chronic depolarization of neurons is an example: it results in increased skipping of exons controlled by CaRREs, yet some of these CaRRE-repressed exons reappear during prolonged depolarization. This splicing pattern is related to CaMK-induced alternative splicing of FOX1, which encodes an RNA-binding protein that regulates the splicing of genes involved in synaptic activity. In addition, many genes controlled by CaRREs also have a FOX1 binding site, which can have an effect on exon inclusion that is antagonistic to that of the CaRRE sequence. Since most studies examine only a few genes, many further studies are needed to gain a fuller grasp of the alternative splicing that occurs downstream of a given pathway. [3]

## What is next for alternative splicing?

Despite the mechanisms described above, the overall picture of how signaling pathways regulate alternative splicing is far from complete, and the study of these pathways is still very much in progress. The mechanisms introduced here have usually been characterized for the alternative splicing of only a few genes. As a result, more progress is needed to understand how alternative splicing is regulated across an entire pathway.

## References

1. Black, Douglas L. (2003). "Mechanisms of alternative pre-messenger RNA splicing". Annual Reviews of Biochemistry 72 (1): 291–336.

2. Clark, David (2005). Molecular biology. Amsterdam: Elsevier Academic Press.

3. Heyd, Florian; Lynch, Kristen W. (2011). "Degrade, move, regroup: signaling control of splicing proteins". Trends in Biochemical Sciences 36 (8): 397–404.

4. Matlin, AJ; Clark F, Smith, CWJ (May 2005). "Understanding alternative splicing: towards a cellular code". Nature Reviews 6 (5): 386–398.

5. Nilsen, T.W. and Graveley, B.R. (2010) Expansion of the Eukaryotic Proteome by Alternative Splicing. Nature 463, 457-463.

6. Pan, Q; Shai O, Lee LJ, Frey BJ, Blencowe BJ (Dec 2008). "Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing". Nature Genetics 40 (12): 1413–1415.

7. Images: Wiki-Media Commons

8. Heyd F, Lynch KW. Trends Biochem Sci. 2011 Aug;36(8):397-404. Epub 2011 May 17. Review. PMID: 21596569.

9. Barash, Y; et al (2010). "Deciphering the splicing code". Nature 465 (7294): 53–59.

10. Wang, Z; Burge, CB (2008). "Splicing regulation: from a parts list of regulatory elements to an integrated splicing code".

## Introduction

To understand structure-function relationships, it is crucial to study individual amino acid residues and their molecular interactions within protein structures. Work in this area has shown that residue interaction networks (RINs) derived from a 3D protein structure provide additional insight into the structural and functional roles of interacting residues. Two software tools, the RINerator and the RINalyzer, are used to generate these networks and to visualize them in 2D.

## Protein structure visualization and residue networks

3D protein structures are obtained by X-ray crystallography and NMR spectroscopy. Although 3D visualization is very important for observing protein structures, 2D representations in the form of residue interaction networks (RINs) have started to become popular.
RINs simplify the visual complexity of 3D protein structures and allow the scientist to focus on individual residues and their interactions at the molecular level. A RIN is derived from the 3D coordinates of a protein model. Each RIN is composed of nodes, which represent amino acid residues, and edges, which represent the interactions between them. RINs can be used to study residue interactions in many application scenarios, for example with regard to protein dynamics.
Recently, RINs have been applied to study protein-ligand interactions and to observe the structural and functional effects of residue changes associated with drug use or disease.
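
The idea of deriving a network from 3D coordinates can be illustrated with a minimal sketch. It assumes one representative point per residue and a simple distance cutoff of 5 Å; the residue labels, coordinates, cutoff, and the function name `build_rin` are all hypothetical, and this is not how the RINerator itself detects interactions.

```python
from itertools import combinations
from math import dist

# Hypothetical residues: (label, (x, y, z)) with one representative point each.
residues = [
    ("ALA1", (0.0, 0.0, 0.0)),
    ("GLY2", (3.8, 0.0, 0.0)),
    ("SER3", (7.6, 0.0, 0.0)),
    ("LEU4", (3.8, 4.0, 0.0)),
]

def build_rin(residues, cutoff=5.0):
    """Return an adjacency dict: residue label -> set of residues within cutoff."""
    network = {label: set() for label, _ in residues}
    for (a, pos_a), (b, pos_b) in combinations(residues, 2):
        # Assumption: two residues "interact" if their points are close enough.
        if dist(pos_a, pos_b) <= cutoff:
            network[a].add(b)
            network[b].add(a)
    return network

rin = build_rin(residues)
# Node degree: how many interaction partners each residue has.
degree = {label: len(neighbours) for label, neighbours in rin.items()}
```

In this toy network GLY2 has the highest degree, which is the kind of observation a 2D network view makes easy to spot.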

## Visual analysis of RINs

The RINalyzer (http://www.rinalyzer.de) is a software tool that provides versatile structural analysis tools for RINs and lets one view a structure in either 2D or 3D. Residue nodes of interest are automatically highlighted in the RINalyzer.
The Cytoscape plugin structureViz (http://www.cgl.ucsf.edu/cytoscape/structureViz/) supports the structural analysis of protein-protein interactions.

## Network approaches to protein structure analysis

One feature of the software is the ability to compare the residue interactions of two proteins by looking at the similarities and differences between them. Binding site similarities can be observed in the same way.

## Generation of RINs

The RINerator module generates RINs from a 3D protein structure. It provides a more realistic picture by sampling contacts on the van der Waals surface of each atom. In this way, different residue interaction types can be distinguished and the strength of each interaction can be determined as well.

## References

Doncheva, Nadezhda T, et al. "Analyzing and visualizing residue networks of protein structures" Trends in Biochemical Sciences 36.4 (2011) 179-182. Academic Search Complete. Web. 05 December. 2012.

## Introduction

Protein binding sites are the regions where proteins interact with each other. Such a region usually comprises a specific part of the three-dimensional structure of the protein. If we can identify the binding sites, we can proceed to study their function and to model protein-protein docking with docking algorithms.

The Protein Data Bank (PDB) serves as a repository of protein structures, including those of protein complexes. Biochemists continually try to obtain the structures of specific proteins, but structures are hard to obtain experimentally, particularly when crystallization is required. These experimental difficulties led to the development of computational protein-protein docking.

## Binding-site prediction and protein-protein docking

Protein-protein docking is a computational approach for predicting the three-dimensional structure of protein complexes. The success of this technique depends largely on prior knowledge of the protein-protein binding sites. To predict the structure, the computational approach must focus on the differences in binding sites between the interfaces of a set of proteins. In many cases several proteins bind at the same region, which then becomes a hotspot, whereas other regions vary.

Because the binding sites must be known precisely, biochemists developed an algorithm that predicts protein binding sites from the conservation of protein surface structure and of the properties of the underlying protein structure. This algorithm is implemented in ProBiS, a tool for detecting protein binding sites. The idea behind the algorithm is that the most conserved parts of a protein surface are those that associate with other proteins or ligands. To find the conserved parts of a protein surface, we have to find local surface regions that are similar between the protein of interest and other proteins.

To illustrate the method, we choose two unbound interacting proteins: FKBP12 (an immunophilin) and TBR-1 (a growth factor receptor), with PDB codes 1d6o and 1ias. Proteins that appear to share structural similarities with FKBP12 are 1ix5, 1jvw, 1pbk, 1q6h, 1r9h, 1u79, 2awg, 2d9f, 2if4, 2ofn, 2pbc, 2uz5, and 3b7x; those similar to TBR-1 are 1ckj, 1kob, 1m17, 1o6k, 1o9u, 1u59, 1wak, 1yhv, 1yvj, 2b7a, 2bfy, 2csn, 2f4j, 2ivt, 2izs, 2j0l, 2jbo, 2pzy, 2qkw, 2qlu, 2qr7, 2v7o, and 3bkb.

ProBiS is then used to predict the binding sites. The fundamental protein has to interact with the polypeptide chain. The goal is to find the surfaces that are similar among these proteins, so the dissimilarities are minimized as much as possible. In the resulting picture, all the conserved regions are mapped over one another.

AutoDock 4.0 is then used to dock the protein FKBP12 to the protein TBR-1. This program is computationally demanding because it works with whole protein structures, so it needs a precise model. AutoDock uses a force field, which is strengthened here to give a stronger attraction between the atoms of the predicted binding sites. The success of docking is judged by comparing the predicted binding site residues with the corresponding actual ones. The strength of this force field affects the docking: a force field five times stronger gives the highest number of best-docked structures.

The 5x force field gives 9 different structures between the predicted and the actual binding site residues. The most favorable clustering also belongs to this force field, since it has the most best-docked structures. These results suggest that the docking algorithm is able to explain the structure of the protein complex.

## Reference

Scientific Paper. Binding-sites Prediction Assisting Protein-protein Docking. Acta Chim. Slov. 2011, 58, 396–401

## Proposed New Protein Structure Classification

Three scientists in the field of structural biochemistry from the University of California, San Diego (Ruben E. Valas, Song Yang, and Philip E. Bourne) have proposed a new method of protein classification. The idea arises from the great number of macromolecular structures that have been solved and the many that have yet to be elucidated, which makes it difficult to assimilate the large amount of structural information available. In addition, the present manner of classification seems insufficient to reveal the network of structural lineages that evolution has produced. Their strategy is therefore to employ a reductionist approach to better interpret the evolutionary basis of protein structure and the lineages among the diverse populations of structures.

Two methods of protein classification are readily used today:

## Bottom-up Approach

The bottom-up approach uses algorithms to compare proteins based on geometry, the quality of superposition measured as a root-mean-square deviation (RMSD), length of alignment, number of gaps, and a score of statistical significance. The end result is a comparison of protein domains that carries very little biological significance.
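
As a small illustration of the RMSD measure mentioned above, the sketch below computes the root-mean-square deviation between two sets of atom coordinates. It assumes the two structures have already been superimposed and that atoms are paired in order; real comparison tools first search for the optimal superposition (for example with the Kabsch algorithm), which is not done here. The coordinates are made up.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points, paired by index."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))

# Hypothetical structures: the second is the first shifted by 1 unit along z.
structure_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
deviation = rmsd(structure_a, structure_b)
```

A lower value means the paired atoms lie closer together, so identical structures give an RMSD of zero.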

Because of the diversity of methods available, there is usually more than one result for each sequence of amino acids analyzed. One drawback to the bottom-up approach is that, since sequences of amino acids in their primary state do not reveal much about the biological function of the protein, it is impossible to decide which one of the results is the most biologically important one. The benefit to the bottom-up approach is that it is a useful bit of reductionism that does give a representative comparison of different protein domains, which can prove useful.

## Top-down Approach

Top-down approaches, exemplified by CATH and SCOP, are considered today's gold standard. These methods primarily use homologous sequence comparisons to reflect relationships among different protein domains and, as a result, a biological context. The authors argue that this technique can be taken one step further, on the premise that structural classification arises as a consequence of the evolutionary links among species. Furthermore, the authors propose that gene duplication, convergence versus divergence, and co-evolution in a functional context should be incorporated into protein classification in the future.

## The protein domain: a good unit of structural classification?

Both the bottom-up and top-down approaches rely on protein domains as the units of comparison. Domains are complicated units. Some domains have similar sequences and are evolutionarily related; some are vaguely related, with similar structures but different sequences; and some have similar topologies, but not similar enough to establish an evolutionary connection. The basic problem is that a domain can be an evolutionary or a non-evolutionary unit. Many proteins are multi-domain proteins, which further increases the complexity.

The presence of folds, which are considered discrete components in most top-down classifications, further complicates matters. Folds are not a direct result of evolution, but they do provide insight into evolutionary processes. Folds sometimes change during evolution; it is possible for an alpha fold to change into a beta fold through a change in secondary structure. It is also possible to create two peptides with similar sequences but different folds, leading to completely different functions. There are also chameleon sequences that can take on multiple different folds. Because of this structural variation, folds are not suitable units of classification. In essence, whether or not two proteins are in the same fold is really a matter of semantics, whereas determining which one led to the other evolutionarily actually gives insight into the relationship between proteins. The reason the evolutionary approach has not been widely used is simply that it is more difficult than clustering similar structures.

## Examples of Evolutionary Selection

Valas et al. illustrate the prevalence of evolutionary selection with two examples. In the first, Basu et al. found 215 strongly promiscuous domains in a genomic analysis of 28 different eukaryotes. Basu et al. define strongly promiscuous domains as those that occur in diverse domain architectures, where these architectures are represented as linear combinations of domains. "Domain architectures arise through domain shuffling, domain duplication, and domain insertion and deletion leading to new functions." The degree of domain promiscuity depends on how frequently a domain occurs with different domain partners. In the second example, Vogel et al. found an over-representation of certain 2-domain or 3-domain combinations, which were termed "supradomains" or macrodomains. These are structures whose internal domain arrangement has remained stable throughout protein evolution. Over 1400 of these macrodomains have been found; they show a natural associativity that seems to be evolutionarily advantageous.
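
The notion that promiscuity depends on how many different partners a domain has can be sketched as a simple count. The example below treats each architecture as a linear list of domain names and counts the distinct neighbours of each domain; the architectures are invented for illustration, and this plain count is a stand-in assumption, not the statistical measure Basu et al. actually used.

```python
from collections import defaultdict

# Hypothetical domain architectures, each a linear arrangement of domains.
architectures = [
    ["SH3", "SH2", "Kinase"],
    ["SH2", "Kinase"],
    ["PH", "SH2"],
    ["SH3", "PDZ"],
]

def partner_counts(architectures):
    """Count the distinct adjacent partners of every domain."""
    partners = defaultdict(set)
    for arch in architectures:
        for left, right in zip(arch, arch[1:]):
            if left != right:  # ignore tandem repeats of the same domain
                partners[left].add(right)
                partners[right].add(left)
    return {domain: len(found) for domain, found in partners.items()}

counts = partner_counts(architectures)
most_promiscuous = max(counts, key=counts.get)
```

Under this toy definition the domain with the most distinct neighbours would be ranked as the most promiscuous.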

## Pluralistic Approach to Protein Classification

The protein domain has been the only unit used for evaluating the evolution of protein structure. Although evolutionary analysis of the protein domain alone has proven successful, other factors appear to be needed to supply the unknown pieces of the evolutionary network. Therefore, the authors propose a pluralistic approach to protein structure classification that incorporates not just domains but also subdomains, macrodomains, and both convergent and divergent evolution. With regard to subdomains, the authors mention several kinds of subdomain units that could be important for connecting the evolutionary network of proteins.

There are many tools that can be used to compare proteins at the subdomain level. One database called Fragnostic facilitates analysis based on fragments from different proteins that share structural and/or sequence similarity. The edges of the fragments are ambiguous; that is, they are not defined as divergent or convergent evolution, but combined with other information the fragments can be tested for structural evolution.

Closed loops are another subdomain unit. Most protein structures consist of loops spanning 25-30 residues. Domain Hierarchy and closed Loops (DHcL) uses van der Waals energies to elucidate domains and closed loops from protein structures. Researchers have discovered that fragments that correlate to closed loops were more likely to form large clusters, which have connections to one another. This description might represent a more detailed view of protein function. Similar closed loops in different structures can be evidence that those structures once shared a common ancestor.

Another subdomain unit is the functional site. Many different proteins can bind to the same ligand, which implies that perhaps they share a common ancestor that bound to the ligand in question. The proteins diverged in structure during evolution, but the functional site remained. SMAP can find such functional sites that have both sequence and structural conservation, a perfect example of divergent evolution. On the other hand, different proteins can converge on the same ligand. The PROCOGNATE database uses information from the PDB to put together which proteins bind to which ligand. A combination of these methods could incorporate both divergent and convergent evolution.

Besides subdomains, macrodomains can also be used to aid in classification. Divergent evolution is evident in some protein–protein interaction sites (a macrodomain feature). In those cases, while the proteins differentiate over time, the domain interface stays the same. Many of the protein–protein interfaces in the PDB contain very similar interfaces in vastly different proteins.

In essence, a domain-based scheme would be less informative, as it would only be able to determine that the proteins evolved from a common ancestor, while an examination that includes analysis of both subdomains and macrodomains would provide an evolutionary hypothesis. One problem facing the pluralistic approach to protein classification is convergent evolution. The fact that two proteins with completely different evolutionary lineages can converge on very similar structures poses a great problem for connecting the protein evolutionary network.

The authors argue that to trace proteins back to the last universal common ancestor (LUCA), it is necessary to look at more than the amino acid sequence, as has been done so far, and to incorporate other structural aspects in order to piece together the evolutionary puzzle.

Protein studies involve various steps of sample preparation. The main protocols of protein studies are as follows:

1. Protein Synthesis
2. Purification
3. Evaluation of purified protein
4. Determination of Amino Acid Sequence
5. Calculation of Protein's Mass
6. Determination of Protein's 3-D Structure

There are many methods used to study proteins, including their shape and structure. For instance, X-ray crystallography is used to determine the structure of a protein. Such information is used extensively in determining the characteristics of the protein as well as how it functions and under which circumstances. Other methods include amino acid sequencing, fluorescence microscopy, mass spectrometry, and NMR.

# Carbohydrate-Binding Proteins

Carbohydrate-Binding proteins (CBPs) are identified as important mediators for numerous different types of cellular events through interactions between carbohydrates and proteins. There are three main families of CBPs.

1. the C-type lectin family (including the Selectins)
2. the Siglec family
3. the galectin family

## C-type Lectins (including Selectins)

C-type Lectins and selectins are present in humans and murines (household rats and mice). Roles of these specific CBPs include

• promoting primary immune response
• mediating leukocyte trafficking to sites of inflammation
• mediating lymphocyte recirculation
• mediating platelet binding to neutrophils

### Clinical Use of Lectins

Pure forms of lectins are used for blood typing. Specifically, lectins are used to identify some glycolipids and glycoproteins on an individual's red blood cells. In the brain, PHA-L, a lectin from a kidney bean, helps to trace the path of efferent axons through the anterograde labeling method.

PHA-L

## Siglecs

Siglecs occur mostly in humans, but some are also found in murines. Some of the primary roles of these CBPs are:

• regulator in B cell activation
• maintenance of myelin
• inhibitor of axonal growth

## Galectins

Galectins are also found in humans, mice, and rats. These CBPs are abundant in most organs and tissues, such as muscle, heart, lung, liver, lymph nodes, thymus, colon and stomach epithelial cells, the gastrointestinal tract, erythrocytes, skin, brain, kidney, and lens, as well as in Hodgkin's lymphoma. Roles include:

• acting as a marker for cell recognition
• binding specificity

# References

http://web.mit.edu/glycomics/consortium/organization/program/program1.pdf

## Introduction

Two methods of counting protein molecules have been widely used: stepwise photobleaching and ratio comparison to fluorescent standards.

Fluorescence takes place when a fluorophore emits light after absorbing light. GFP is able to fluoresce without enzymatic modification or a cofactor, so a single gene can be expressed to give detectable emission in any organism. Counting the number of protein molecules in live cells allows researchers to determine the stoichiometry of functional protein complexes and to develop models of cellular structures. Genome-wide studies may not capture information about low-abundance proteins or local protein concentrations; single-molecule techniques, if successful, would be able to solve this problem.

## Stepwise photobleaching

Stepwise photobleaching is a fluorescence microscopy method for counting protein molecules, which “relies on the irreversible and stochastic loss of fluorescence from repeated exposure of fluorescent proteins (FPs) to a light source.” The sample is continuously exposed to excitation light at low intensity to allow it to be “slowly bleached until its emission intensity reaches background level.” The number of fluorescent molecules present in the structure determines the suitable light intensity and exposure time. Missed bleaching events need to be minimized, because a missed event shows up as a single step approximately twice the size of the other steps. The bleaching method is only useful for low protein numbers, as the probability of missed events increases exponentially with the number of molecules in a structure. “Das et al. estimated that a maximum of 15 bleaching steps can be directly detected without mathematical extrapolation, although they detected no more than seven steps in their experiments.” The maximum number of molecules that can be counted by photobleaching can be increased to approximately 30 using mathematical aids. A background correction is needed to eliminate fluorescence from diffuse proteins and to calibrate the starting intensity. During photobleaching, regions of interest (ROIs) should be selected so as to avoid confusing multiple structures. It is also essential to filter the data to reveal the discrete drops, as the raw data are noisy. For example, the Chung-Kennedy filter is the filter most commonly used for quantification of the bacterial replisome. “It calculates the mean and standard deviation in two consecutive sets within the data from one photobleaching ROI, and reports the mean of the set with the lower standard deviation.” The number of data points averaged in each set should be large enough to reduce the noise but small enough to ensure that few steps are missed.
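
The filtering idea quoted above can be sketched in a few lines. The following is a simplified, Chung-Kennedy-style edge-preserving filter, not the published implementation: for each time point it compares a window ending at that point with a window starting at it and reports the mean of the window with the lower standard deviation. The intensity trace, the window length, the drop threshold, and the function names are all hypothetical.

```python
from statistics import mean, pstdev

def edge_preserving_filter(trace, window=3):
    """For each point, report the mean of the quieter of two adjacent windows."""
    smoothed = []
    for i in range(len(trace)):
        before = trace[max(0, i - window + 1): i + 1]  # window ending at i
        after = trace[i: i + window]                   # window starting at i
        # Only full-length windows compete; near the ends use whatever is full.
        candidates = [w for w in (before, after) if len(w) == window]
        if not candidates:
            candidates = [before, after]
        smoothed.append(mean(min(candidates, key=pstdev)))
    return smoothed

def count_steps(smoothed, min_drop):
    """Count downward jumps of at least min_drop between consecutive points."""
    return sum(1 for a, b in zip(smoothed, smoothed[1:]) if a - b >= min_drop)

# Hypothetical noisy trace with two bleaching steps of about 5 intensity units.
trace = [10.2, 9.8, 10.1, 9.9, 5.1, 4.9, 5.0, 5.2, 0.1, -0.1, 0.0, 0.1]
filtered = edge_preserving_filter(trace, window=3)
steps = count_steps(filtered, min_drop=2.5)
```

Because the filter never averages across a step, the drops stay sharp while the noise within each plateau is reduced. A larger `window` smooths more but risks merging closely spaced steps, which mirrors the trade-off described in the text.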

## Quantification by ratio comparison to fluorescent standards

This method involves measuring the ratio of the fluorescence intensity of a protein sample to that of a standard. It uses a series of images of cells that express either the protein sample or the standard, each of which has been made fluorescent by fusion with an FP. “If the standard can be distinguished from the protein of interest, it is desirable to include cells that express the standard and experimental fusion proteins on the same slide to ensure comparable illumination. If the standard is not distinguishable, images can be taken consecutively or another marker can be imaged separately to distinguish the control cells distant to eliminate Forster resonance energy transfer.” The advantage of this method is that a relatively large number of protein molecules can be counted. Corrections need to be made to achieve more accurate measurements. For example, uneven illumination in the microscope system needs to be corrected if the whole field is used. Also, if sample molecules are at different depths relative to the coverslip, the effect of depth on intensity should be calibrated using fluorescent beads. Different exposure times can be used to control the signal-to-noise ratio and avoid saturation. Excitation intensity should be kept constant to avoid nonlinear changes in photon counts due to blinking molecules. “The background should be taken from a concentric area unless there are overlapping neighbouring signals or an inhomogeneous cytoplasmic intensity.” It is important to use a trustworthy standard for this method. When proteins of different sizes, or proteins of the same structure with very different intensities, are compared, the sum of the intensities of multiple z sections should be used. Additional verification using methods such as genomic DNA sequencing should be used to ensure the accuracy of the numbers measured. The number of molecules of each protein and their relative stoichiometries can be obtained using the ratio method at one or many time points.
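
The arithmetic behind the ratio method can be written as a short sketch. It assumes that fluorescence intensity scales linearly with the number of FP molecules and that the same background applies to both measurements; the function name and all numbers are hypothetical, and the corrections for illumination and depth described above are not included.

```python
def molecules_from_ratio(sample_intensity, standard_intensity,
                         standard_count, background=0.0):
    """Estimate molecule number from the intensity ratio to a known standard."""
    sample = sample_intensity - background
    standard = standard_intensity - background
    if standard <= 0:
        raise ValueError("standard intensity must exceed the background")
    # Assumption: intensity is proportional to the number of FP molecules.
    return standard_count * sample / standard

# Hypothetical values: a standard known to contain 20 molecules.
estimate = molecules_from_ratio(
    sample_intensity=1300.0,
    standard_intensity=500.0,
    standard_count=20,
    background=100.0,
)
```

Subtracting the background before taking the ratio matters: without it, the same raw intensities would give a noticeably different estimate.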

## Important considerations in counting proteins

Genetically encoded FPs should be used in order to generate a 1:1 stoichiometry with the protein sample, although the fusion may affect the maturation efficiency or the proportion of unfolded FPs.

## Properties of FPs

Researchers should use the best available FPs to maximize the signal-to-noise ratio, especially for less abundant proteins. The folding and maturation efficiency, brightness, and photostability of the FPs to be used in the fusions should be taken into consideration before constructing fusion proteins. Research conducted in the budding yeast Saccharomyces cerevisiae and on folded YFP in E. coli suggests that YFP maturation and folding efficiency are not major issues for counting proteins, in particular for proteins with low turnover rates.

## Functionality of fluorescent fusion proteins

It is advantageous to use yeast, because a fluorescent fusion protein can replace the native protein through homologous recombination, which allows the functionality of the fusion protein to be determined. The functionality of some fusion proteins can be improved by using a flexible linker between the FP and the protein sample. When endogenous genes cannot be replaced with a tagged version, alternative methods of protein counting need to be used. The local actin abundance in actin patches can be quantified by making corrections after immunoblotting. However, this method is only possible under the assumption that tagged and untagged actins are utilized with similar efficiency in actin patches. Engel et al. used the stepwise photobleaching method in a mutant background to count exogenous tagged proteins in the flagella of the green alga Chlamydomonas reinhardtii. Since the endogenous proteins do not localize, the assumption about the ratio of tagged to untagged protein does not need to hold. The recent development of ‘genome editing’ techniques has allowed endogenous genes to be tagged in any model organism in which “the zinc finger nuclease or transcription activator-like effector nuclease genes can be introduced.”

## In vivo versus in vitro standards and quenching

The environment in which the number of proteins is measured is important. Early studies employed in vitro standards, for which the effect of the cellular background on fluorescence intensity is unknown. This meant that immunoblotting or internal standards were needed to calibrate the fluorescence intensity inside the cell. Recent experiments suggest that YFP/GFP in vitro is comparable to YFP/GFP in bacteria or yeast. Fluorescence quenching can also take place if FPs are packed into very tight structures. The effects of quenching on counting proteins should be examined individually, depending on the specific structures of interest. Fluorescence lifetime imaging, with the aid of specialized equipment and analysis, can be used to measure quenching due to environmental changes.

## Validation of protein quantification by complementary approaches

Cellular concentrations should be validated by flow cytometry or, for higher resolution, by fluorescence correlation spectroscopy. It is also important to ensure that protein concentrations from fluorescence microscopy are consistent with quantitative immunoblotting. In any protein counting experiment, suitable fluorescent protein genes, suitable standards, and controls for environmental changes or the possibility of quenching will ensure appropriate interpretation of the data, which can then be confirmed with complementary experiments.

## Future of counting proteins using fluorescence microscopy

Super-resolution microscopy techniques can produce high-resolution images of intracellular structures that pinpoint the exact locations of individual fluorescent molecules. For such techniques, it is most important to simplify the analysis of high-density images of FPs and to minimize errors due to blinking or failure to photobleach. Single-molecule techniques are now more commonly used because stochastic events cannot be observed in average population behaviours. The advantage of such techniques is that molecules can be counted directly without using collective images, and different protein complexes can even be distinguished within a diffraction-limited area. Super-resolution imaging could lead to the quantification of higher numbers of proteins.

## Conclusion

Counting protein molecules in a cell is essential for determining structural models and protein function. In vitro, protein numbers help determine reaction rates and also improve the understanding of multiprotein complexes. The two methods introduced are stepwise photobleaching and ratio comparison to a given standard. These may be used in any laboratory with a fluorescence microscope to quantify a particular protein. Each method has its own advantages and disadvantages. The properties of the FPs are of high significance to both methods. Other methods, such as electron microscopy, can help validate the quantity of proteins. Fluorescence microscopy has helped determine exact numbers of proteins and also their binding ranges.

Source: Coffman VC, Wu JQ. Trends Biochem Sci. 2012 Sep 1.

Proteostasis is the term referring to "protein homeostasis", in which a system of biological pathways leads to proper protein function. The system is called the proteostasis network, and it is responsible for successful protein transport, proper folding of proteins, and elimination of misfolded proteins. The factors responsible for improper protein function include genetic diseases and environmental stress. Knowledge of the proteostasis network is still incomplete, but researchers have studied some of its pathways in order to create pharmaceutical agents and provide therapy for such protein abnormalities. The pharmaceutical agents used to modify the network pathways are called protein regulators, each of which affects a pathway in a specific manner. For example, the antibiotic geldanamycin is known to act as an inhibitor of the chaperone protein HSP90. The HSP90 chaperone is involved in network pathways for protein folding, and its success in assisting protein folding results in cell proliferation. Cancer cells are more sensitive to HSP90 inhibitors; consequently, using geldanamycin as a protein regulator to inhibit HSP90 function leads to cancer cell death. Research on the effects of HSP90 inhibitors is ongoing, with the aim of proposing a therapeutic treatment for cancer. Although the number of pathways involved in protein regulation is great, detailed study of these pathways may lead to successful treatments that ensure proteostasis.

Some diseases that can be caused by failures of protein homeostasis are Parkinson’s, Alzheimer’s, and cystic fibrosis. These diseases can occur as a result of the proteostasis network’s decreased ability to cope with misfolding-prone proteins, aging, or environmental stress.

The protein homeostasis network and its components are also controlled by integrated signaling pathways. These signaling pathways can maximize the capacity of the network in order to ensure consistent and correct protein function. Examples include the pathways that regulate protein synthesis and aggregation, as well as the degradative pathways of proteostasis.

## Managing Proteostasis

For the proteostasis network to function correctly and in a stable condition, there are many interactions that help monitor and facilitate the process of successful protein folding.

1. The proteostasis network is made up of ribosomes, chaperones, aggregases, and disaggregases that control protein folding. There are also special pathways, such as the ubiquitin-proteasome system, endoplasmic reticulum-associated degradation systems, proteases, and autophagic pathways, that deal with the degradation of proteins.

2. Signaling pathways, such as those related to mitochondria, aging, the heat shock response, and the unfolded protein response, affect the process of protein folding within the proteostasis network. This is perhaps the most direct influence that can alter the folding and stability of proteins.

3. Outside influences, including metabolites, physiological stress, genetics, and epigenetics, affect the overall activity of the proteostasis network. These influences can also alter the process of protein folding, but the effects of some, like metabolites and physiological stress, can be countered by the use of pharmacological chaperones and proteostasis regulators.

The interior of the cell is crowded and divided into many compartments, and this lack of space promotes aggregation. Aggregation is related to toxicity and has to be kept in balance, most importantly when the cell deals with chemical, physical, and metabolic stresses.

The state and functionality of the proteostasis network directly influence a protein’s functional performance, and proteins usually require intracellular assistance in order to fold.

## Pharmacologic Chaperones and Proteostasis Regulators

Proteostasis, or “protein homeostasis”, must be maintained at a stable level of activity for the cell to function correctly. The proteostasis boundary refers to the folding energetics that a protein must have in order to achieve some level of functionality in a given proteostasis network. This boundary can be regulated by both pharmacologic chaperones and proteostasis regulators: proteostasis regulators can expand the boundary to envelop a destabilized protein (known as a node), while pharmacologic chaperones can move the node from outside the boundary to the inside in order to stabilize it. If the proteostasis boundary is not regulated, loss-of-function misfolding diseases can result, some of which are potentially life-threatening.

Pharmacologic chaperones (PCs) perform their regulation by binding to a destabilized node outside the boundary in order to stabilize it. After binding to the node, the PCs move the now stabilized protein inside the proteostasis boundary, which increases function within the proteostasis network and maintains a stable level of activity. This stability in turn translates to fewer misfolding diseases. PCs can correct a misfolding disease in three ways:

1. The destabilized node can be thermodynamically stabilized

2. The folding rate of the node can be increased by stabilizing the transition state of folding

3. The misfolding rate can be decreased by stabilizing the native state

On the other hand, the use of proteostasis regulators (PRs) allows the proteostasis boundary to be expanded for a number of destabilized nodes at once (as long as the nodes all share the same proteostasis network). By expanding the boundary, PRs can favor the folding of proteins by adjusting the composition, concentration, and capacity of the proteostasis network. Besides promoting stable proteostasis so that proteins fold correctly, PRs can also prepare the network to handle metabolic stress and aging. Expanding the proteostasis boundary increases the protective capacity of the network and so helps prepare it for future stress.

The overall energy of a protein is affected by the folding machinery of the proteostasis network. Folding enzymes and chaperones help a protein reach a favorable energy distribution by decreasing aggregation and improving folding. They do so by attaching to folding intermediates and to the transition state. Binding to the transition state helps stabilize the protein, so that misfolding and aggregation decrease.

Chaperones encourage folding and also play a protective role in the cell by increasing correct folding and decreasing aggregation and misfolding. A chaperone is understood to be a large molecule that attaches to exposed hydrophobic regions of proteins that are prone to aggregation. Chaperones are specific, and different cellular compartments have different chaperones.

In all, pharmacologic chaperones and proteostasis regulators both aid the proteostasis network in preventing numerous loss-of-function misfolding diseases. The choice between them depends on whether the goal is to bring in one destabilized protein (via pharmacologic chaperones) or to bring in a collection of similar destabilized proteins by expanding the proteostasis boundary (via proteostasis regulators).

## Models for the Proteostasis Network

FoldEX and FoldFX are both models that represent proteostasis boundaries. FoldEX shows when a protein would be exported from the endoplasmic reticulum, whereas FoldFX shows when a protein would be functional, and hence where proteostasis is working. FoldFX stands for Folding for the Function of Protein X. The models have three dimensions: the folding rate, the misfolding rate, and the stability of the protein.

The FoldEX model is important because it establishes a threshold for protein export. This boundary is characterized by the protein’s folding rate, misfolding rate, and stability. Proteins will be exported if their folding energetics meet the threshold.

In a healthy cell, all the proteins would be situated usually well within the boundaries of the FoldFX model and all the enzymes would be working. However, when there is a disease that affects protein folding or if proteostasis is not quite working well, there could be proteins represented that fall outside the boundaries, which would mean that the proteins are not functioning properly.

In a conservative mutation, the substitution does not have a heavy impact on the kinetics or thermodynamics of folding. It has little effect on function because the replacement amino acid is similar to the one it replaced. In contrast, a less conservative missense mutation, or the deletion of an amino acid, does affect the thermodynamics and kinetics of protein folding, and it can move the protein outside the proteostasis boundary.

However, there are ways to correct this. One way is the application of pharmacologic chaperones (PCs). Pharmacologic chaperones specifically target proteins that fall outside the proteostasis boundary and push them within it, giving them the ability to fold properly and function. They do so by increasing the folding rate, decreasing the misfolding rate, or stabilizing the structure of the protein. Another way is by means of proteostasis regulators (PRs). Proteostasis regulators can either expand or retract the proteostasis boundary, allowing more or fewer proteins to be correctly folded.

## References

Powers, Evan T.; Morimoto, Richard; Dillin, Andrew; Kelly, Jeffrey W.; Balch, William E. Biological and Chemical Approaches to Diseases of Proteostasis Deficiency. Annual Review of Biochemistry, 2009.

The most popular method for synthesizing peptides of more than 50 amino acids is automated solid-phase peptide synthesis. R. Bruce Merrifield first developed this method, and the solid-phase approach can also be used to synthesize DNA and RNA. To begin the process, the carboxyl-terminal amino acid of the desired sequence is anchored to polystyrene beads, and the peptide is synthesized from the C-terminal end to the N-terminal end (the reverse of the usual N-terminal to C-terminal direction). The t-Boc protecting group of this amino acid is then removed by a wash with trifluoroacetic acid (CF3COOH) in methylene chloride (CH2Cl2), which does not cleave the bond anchoring the amino acid to the beads. The next amino acid, with its N-terminus protected by t-Boc (tert-butyloxycarbonyl, introduced with di-tert-butyl dicarbonate) and its C-terminus activated by DCC (dicyclohexylcarbodiimide), is added to the reaction column. After the peptide bond has formed, the excess reagents and dicyclohexylurea are washed away with an appropriate solvent. To elongate the peptide, further amino acids are added in the same manner. At the end of the synthesis, the peptide is released from the polystyrene beads by adding hydrofluoric acid (HF), which cleaves the ester bond without destroying the peptide bonds. Protecting groups on reactive side chains, such as those of lysine or histidine, are also removed at this time. The great advantage of this method, besides the fact that it is automated, lies in the purification step. Because the impurities are not bound to the reaction column, they can be washed away without losing the synthesized product. In the laboratory, this technique is used to synthesize drugs such as insulin.

## Processes

### Transcription

Transcription starts in the nucleus. It resembles DNA replication in that the double helix is locally "unzipped", exposing one nucleotide strand that serves as the template to be copied.

Summary of the three steps of transcription (producing an RNA message from DNA):

(A) Binding and Initiation

The DNA transcription unit includes regulatory sequences such as the TATA box and enhancer regions. TBP (the TATA-binding protein) binds to the TATA box, and other transcription factors, such as TFIIA and TFIIB, bind at the TATA region as well. RNA polymerase cannot bind to the DNA directly; a transcription factor must bind first. Transcription begins when RNA polymerase binds at the promoter (near the initiation site) and separates the DNA into two strands, which requires energy from ATP. Initiation thus sets the location on the DNA strand where transcription begins.

(B) Elongation

RNA polymerase moves along the DNA away from the promoter region, performing two steps as it elongates the transcript:

1) It unwinds the DNA double helix about 10 bases at a time (adjacent bases are spaced about 3.4 Å apart).

2) It adds nucleotides to the 3’ end of the growing RNA.

As the RNA polymerase moves along, the growing mRNA molecule is built base by base, at about 60 nucleotides per second. Adenine in the DNA pairs with uracil in the RNA (and thymine in the DNA pairs with adenine in the RNA). Guanine pairs with cytosine.
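
The base-pairing rules of elongation can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` and the convention that the template is written 3’ to 5’ (so the returned mRNA reads 5’ to 3’) are assumptions made for this example.

```python
# DNA template base -> complementary RNA base
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the mRNA (5'->3') made from a DNA template strand written 3'->5'."""
    try:
        return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]}") from None


print(transcribe("TACCGGATT"))  # AUGGCCUAA
```

Each nucleotide appended to the string corresponds to one addition at the 3’ end of the growing RNA.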

(C) Termination

Transcription proceeds until the RNA polymerase reaches a termination site. No more RNA nucleotides are added and the mRNA is released. The mRNA then moves out of the nucleus into the cytoplasm for use in protein synthesis.

### Translation

The mRNA codons are translated into a polypeptide chain of amino acids in three steps.

Summary of the three steps of translation:

Initiation

1. The small ribosomal subunit attaches to the mRNA. The initiator tRNA, which carries an amino acid, pairs its anticodon (a nucleotide triplet in the tRNA) with the three-nucleotide start codon of the mRNA. The large ribosomal subunit then binds to the small subunit. The ribosome has an A site (the entry site for incoming tRNAs) and a P site (which holds the tRNA carrying the growing chain); the initiator tRNA occupies the P site.

Elongation

2. With the initiator tRNA in the P site, the A site is open for the tRNA that matches the second codon to enter with its amino acid. Once the second tRNA is bound in the A site, the two amino acids are joined by a peptide bond. The second tRNA then moves to the P site and the third tRNA enters the A site, as the ribosome moves along the mRNA from 5’ to 3’.

3. Ribosomal enzymes link the amino acids into a chain. The process continues until a stop codon (such as UAA) is reached.

Termination

4. A stop codon is reached (UAA, UAG, or UGA). A protein called a release factor binds to the termination codon in the A site. The ribosome adds a water molecule to the end of the polypeptide chain, releasing it.

5. The ribosome dissociates into its component parts.
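
The codon-by-codon logic of translation can be sketched as follows. This is a simplified model rather than a description of the ribosome: it uses the standard genetic code, starts at the first AUG it finds, and stops at the first in-frame stop codon. The names `CODON_TABLE` and `translate` are illustrative choices for this example.

```python
from itertools import product

# Standard genetic code packed in UCAG order ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA (5'->3') into one-letter amino acid codes."""
    start = mrna.find("AUG")  # initiation at the first start codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # elongation, one codon per step
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination at UAA, UAG, or UGA
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("AUGGCCUAA"))  # MA
```

Reading in steps of three from the start codon mirrors the ribosome moving along the mRNA in the 5’ to 3’ direction.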

Solid-phase peptide synthesis gives good yield and high purity. All reactions are carried out in a single vessel, eliminating the losses caused by repeated transfer of products. The method is well suited to synthesizing long peptide chains (50 residues and above).

## Synthetic Peptides

Peptides can be made synthetically by linking an amino group of one amino acid to the carboxyl group of another; this being an example of a condensation reaction. A condensation reaction is the reaction when two molecules come together, releasing water, to form one molecule.

Peptide synthesis can be specific, meaning that desired products can be formed selectively. To make unique products and to prevent side reactions, protecting groups such as tert-butyloxycarbonyl (t-Boc) are used. t-Boc is used in the first step of the formation of simple peptides. In order to block the alpha-amino group, this protecting group reacts with it to form a complex known as a t-butyloxycarbonyl amino acid. The blocking of the amino group is followed by the activation of the carboxyl group of the same amino acid. The carboxyl group is activated by dicyclohexylcarbodiimide (DCC).

Now, with the alterations being done to the amino group and the carboxyl group of the first amino acid, a second amino acid can be linked to the first amino acid. The second amino acid has a free amino group, meaning not blocked, and it links to the activated carboxyl group of the first; forming a rigid peptide bond and releasing dicyclohexylurea. The carboxyl group of the newly formed dipeptide is activated with DCC and ready to react with a third amino acid which has a free amino group. Again, a new peptide bond is formed and dicyclohexylurea is released. This process can be performed continuously until the desired peptide is synthesized. To end the synthesis, dilute acid, which removes the t-Boc and leaves the peptide undisturbed, is added.

Dicyclohexylcarbodiimide (DCC)

The solid-phase method is used to form synthetic peptides that contain more than 50 amino acids. It involves binding the carboxyl group of the last (C-terminal) amino acid to polystyrene beads. The t-Boc group of the anchored amino acid is removed, and the next amino acid, with a t-Boc protected amino group and a DCC-activated carboxyl group, is added to the amino acid on the polystyrene beads. The peptide bond forms, and the peptide on the polystyrene beads is filtered and washed, so that the peptide is pure before the synthesis continues. The following amino acids are linked by the same process until the desired peptide is synthesized. Finally, the finished peptide is removed from the beads using hydrofluoric acid (HF).

Peptide ligation is used to synthesize peptides with more than 100 amino acids. The long peptide is formed from two or more smaller peptides that carry no protecting groups. Native thiol ligation (also called native chemical ligation) is the most powerful and widely used form of peptide ligation. It joins a peptide that has a thioester on its C-terminal carboxyl group to another peptide that has a cysteine at its N-terminus. The C-terminal thioester of one peptide reacts with the N-terminal cysteine of the other to form a thioester-linked intermediate. The intermediate then rearranges (an S→N acyl shift) to form a peptide bond. Small unprotected peptides are linked by this process to build up the long peptide.

## Utilization

Synthetic peptides are made for many purposes. These peptides can act as antigens, which stimulate the body's immune system to produce antibodies that target the peptide. These antibodies can then be used to isolate a protein. Peptides can also be used to isolate receptors for hormones.

Synthetic peptides can also be used as drugs. One example is the synthetic analog of vasopressin, known as 1-desamino-8-D-arginine vasopressin. This synthetic peptide is used to treat patients with diabetes insipidus, who lack the peptide hormone vasopressin and therefore excrete excess water in their urine. Such patients can be treated by substituting the analog for natural vasopressin.

Vasopressin

Lastly, synthetic peptides can be used to gain a greater understanding of the 3D structure of proteins. Using synthetic peptides for this purpose is extremely helpful because they can include many amino acids that are not found in normal proteins; that is, they are not limited to the 20 standard amino acids. This results in a much greater variety of structures.

## Solid-Phase Peptide Synthesis

Polypeptide synthesis can be automated. The automated method, known as Merrifield solid-phase peptide synthesis, uses a solid support of polystyrene to hold the peptide chain. Polystyrene is a polymer whose subunits are derived from ethenylbenzene (styrene).

Polystyrene

The beads of polystyrene are insoluble and rigid when they are dry; however, they swell in certain organic solvents, dichloromethane for example. Therefore, reagents are able to move in and out of the polymer matrix easily. The phenyl groups on polystyrene are functionalized by electrophilic aromatic substitution.

Electrophilic chloromethylation of polystyrene

Using a dipeptide as an example, the solid-phase synthesis of peptide on chloromethylated polystyrene proceeds as follows.

1. Attach protected amino acid

2. Deprotect amino terminal

3. Coupling to the second protected amino acid

4. Deprotect amino terminal

5. Disconnect dipeptide from polystyrene
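
The order of operations in the five steps above generalizes to longer peptides, which the following sketch makes explicit. It is only a bookkeeping model (no chemistry is simulated), it assumes t-Boc as the N-terminal protecting group, and the function name `solid_phase_steps` is hypothetical.

```python
def solid_phase_steps(peptide):
    """List the solid-phase steps for a peptide given N-terminus first.

    Synthesis runs from the C-terminus to the N-terminus, so the last
    residue in the list is the one attached to the resin.
    """
    if not peptide:
        raise ValueError("peptide must contain at least one residue")
    c_to_n = list(reversed(peptide))
    steps = [f"Attach Boc-{c_to_n[0]} to resin", "Deprotect amino terminus"]
    for residue in c_to_n[1:]:
        steps.append(f"Couple Boc-{residue}")
        steps.append("Deprotect amino terminus")
    steps.append("Cleave " + "-".join(peptide) + " from resin")
    return steps


for number, step in enumerate(solid_phase_steps(["Ala", "Val"]), start=1):
    print(number, step)
```

For the dipeptide Ala-Val this yields five steps, with valine attached to the resin first because it is the C-terminal residue.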

## Purpose of dicyclohexylcarbodiimide (DCC)

Dicyclohexylcarbodiimide (DCC) is used in peptide synthesis to increase the electrophilicity of the carboxylate group. This makes the C-terminus a more favorable attachment site for other amino acids. The negatively charged oxygen of the carboxylate acts as a nucleophile and attacks the central carbon of DCC. The DCC portion of the resulting intermediate is eventually converted into dicyclohexylurea, a stable end product that is relatively unreactive throughout the rest of the peptide synthesis. In addition, DCC activation may sometimes racemize the activated amino acid if not monitored correctly; therefore triazoles, which do not racemize the stereochemistry of peptides, are sometimes used instead.

Solid-phase synthesis of a peptide

The advantage of solid-phase synthesis is that the products can be isolated easily, since all the intermediates are immobilized on polystyrene. Thus, the products can be purified by filtration and washing. Repeating the deprotection-coupling cycle produces larger peptides. A machine designed by Merrifield is able to carry out the series of manipulations automatically.

## Protecting Groups

A peptide bond can be formed from the carboxyl group and amino group on the main chains of amino acids. It can also be formed from reactive groups on the side chains, which gives an undesired peptide. In order to synthesize a desired peptide, protecting groups are used to prevent the formation of undesired products. They also prevent polymerization of the excess amino acids used in the reaction. Protecting groups also help ensure that the stereochemistry of the amino acids remains unchanged; without proper protection, amino acids may be racemized.

## t-butyloxycarbonyl(t-Boc) protecting group

The t-Boc group is used to protect N-terminal amino groups as well as the side chains of lysine, arginine, asparagine, and glutamine. Di-t-butyl dicarbonate reacts with the NH2 of an amino acid to form a t-Boc-amino acid. The t-Boc group can be removed under acidic conditions, typically by treatment with a strong acid such as trifluoroacetic acid (TFA), CF3COOH. Because Boc-amino acids can be synthesized easily in large quantities, they are commercially available, and people who synthesize peptides do not have to make them on their own. Solid-phase synthesis is effective because it allows the growing chain to remain in a primary-structure configuration rather than being complicated by secondary or tertiary interactions.

Boc-group, synthesized and removed
Mechanism of how T-boc is added to the amino acid
Mechanism of how T-boc is removed from the amino acid using HCl
Trifluoroacetic acid used to remove t-Boc group

## Solution-Phase Peptide Synthesis (Using Benzyloxycarbonyl(Z) as protecting group)

Benzyloxycarbonyl is used to protect the N-terminal amino groups as well as the side chains of lysine, arginine, asparagine, and glutamine. The synthesis starts at the N-terminus and ends at C-terminus. For example, here are steps to synthesize a simple peptide such as Ala-Val:

First Step: Benzyl chloroformate reacts with the N-terminus of alanine, forming benzyloxycarbonyl alanine (alanine with its N-terminus protected by the Z group). Typically, triethylamine is used as a base for this reaction.

Second Step: The protected alanine is treated with ethyl chloroformate. The carboxyl group of the alanine is activated by forming an anhydride, which is susceptible to nucleophilic attack by the N-terminus of valine.

Third Step: Valine is added to the protected, activated alanine. This forms a peptide bond connecting valine and alanine, giving the product Z-Ala-Val. Notice that the N-terminus is still protected after this step.

Final Step: The Z protecting group is removed by hydrogenolysis under mild conditions, with a metal such as Pd acting as the catalyst. (Check the image for the detailed reactions in each step.)

Synthesis of Ala-Valine, using solution-phase synthesis

In order to synthesize a larger peptide, we have to repeat the second and third steps: activating the C-terminus and then coupling the next amino acid. The advantages of this synthesis are that it works very fast and gives a good percentage yield of the product. However, it can only be used for short chains, because the yield becomes smaller as the chain grows. Therefore, solid-phase synthesis is preferred for large proteins.
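
The drop in yield with chain length follows from simple arithmetic: if every coupling step succeeds with the same fractional yield, the overall yield is that fraction raised to the number of steps. The sketch below assumes a constant per-step yield, which is an idealization; the function name and the example values are hypothetical.

```python
def overall_yield(step_yield, n_steps):
    """Overall fractional yield after n_steps, each with the same step_yield."""
    if not 0.0 <= step_yield <= 1.0:
        raise ValueError("step_yield must be a fraction between 0 and 1")
    return step_yield ** n_steps


# Hypothetical per-step yields for peptides of increasing length.
for step_yield in (0.90, 0.99):
    for n_steps in (2, 10, 50):
        percent = 100 * overall_yield(step_yield, n_steps)
        print(f"step yield {step_yield:.2f}, {n_steps:2d} steps: {percent:5.1f}%")
```

Even at 99% per step, fifty steps leave only about 60% of the starting material as full-length product, which is why high per-step efficiency matters for long chains.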

Z-group protecting group

## 9-Fluorenylmethyloxycarbonyl (Fmoc) protecting group

It is used to protect the N-terminal amino groups as well as the side chains of lysine, arginine, asparagine, and glutamine. Fmoc can be removed by piperidine/DMF.

Fmoc protecting group
Piperidine. Used to remove Fmoc group

## t-butyl and benzyl protecting groups

type I NRPS production of the antibiotic tyrocidine

## Domains found in NRPS

• F: Formylation (optional)
• A: Adenylation (required in a module)
• PCP: Thiolation and Peptide Carrier Protein with attached 4'-phospho-pantetheine (required in a module)
• C: Condensation forming the amide bond (required in a module)
• Cy: Cyclization into thiazolines or oxazolines (optional)
• Ox: Oxidation of thiazolines or oxazolines to thiazoles or oxazoles (optional)
• Red: Reduction of thiazolines or oxazolines to thiazolidines or oxazolidines (optional)
• E: Epimerization into D-amino acids (optional)
• NMT: N-methylation (optional)
• TE: Termination by a thioesterase (only found once in an NRPS)
• R: Reduction to terminal aldehyde or alcohol (optional)

After the peptide chain is synthesized, it can be modified by halogenation, hydroxylation, acylation, or glycosylation, typically carried out by an enzyme encoded in the same operon or gene cluster as the carrier protein. Since NRPS is similar to PKS and FAS, components of these other routes of metabolite synthesis are often linked with it and combine to form natural products.

## Overview of Bacterial Gene to protein

The DNA has two strands, a sense strand and a template strand. The sense strand has the same sequence as the mRNA that will be transcribed, except that each T in the DNA is replaced by a U in the mRNA. RNA polymerase makes a complementary mRNA transcript from the template strand of the DNA.

#### Transcription

1. Initiation: RNA polymerase moves along the DNA, looking for the -35 region and -10 region of the sigma-70 promoter in E. coli. Once it finds the promoter, RNA polymerase binds to it, loosely at first and then more tightly once the DNA starts to unwind. RNA polymerase then adds a ribonucleoside triphosphate (rNTP), usually a purine. This rNTP is complementary to the nucleotide at the +1 position of the DNA template. [1]
2. Termination: The transcription termination site is located downstream from the translation stop codon. In bacteria, two types of termination are possible:
• Rho dependent:
A Rho factor binds to the RNA in a region called the transcription terminator pause site, which is rich in guanine and cytosine and lies after the part of the gene that codes for protein. Rho then wraps the downstream RNA (the RNA between where Rho binds and the RNA polymerase) around itself and slowly pulls itself toward the RNA polymerase, which is now paused. When Rho comes into contact with the RNA polymerase, termination occurs and the mRNA transcript and RNA polymerase are released from the DNA template. [1]
• Rho independent:
A region of the mRNA transcript that is rich in guanine and cytosine forms an RNA stem loop that holds onto the RNA polymerase and causes it to pause. During this pause, the base pairing between the poly-U stretch at the 3’ end of the mRNA and the poly-A stretch of the DNA template is weak and therefore easy to melt. Transcription stops when these base pairs melt, and the mRNA transcript and RNA polymerase are released. [1]

#### Translation

1. Initiation: In bacteria, initiation factors (IFs) are involved in the initiation of translation. IF3 brings the mRNA and the 30S subunit of the ribosome together. The ribosome binding site on the mRNA can then bind the complementary sequence on the 16S rRNA. IF1 binds to the A site of the 30S ribosomal subunit and blocks it. IF2, with GTP attached, can then bring the initiator fMet-tRNA (N-formylmethionyl-tRNA) to the start codon in the P site of the 30S ribosomal subunit. With the attachment of the initiator tRNA, IF3 is released and the 50S subunit of the ribosome attaches to the 30S subunit. This leads to the hydrolysis of GTP and therefore the release of IF2 and IF1. The ribosome then continues through translation. [1]
2. Termination: The ribosome encounters a stop codon (UAA, UAG, or UGA) in its A site. Instead of a tRNA binding, a protein release factor, either RF1 or RF2, enters the A site of the ribosome. Peptidyltransferase then cuts the bond between the finished protein and the tRNA in the P site. Once the protein is released from the ribosome, RF3 causes the release factor to leave the ribosome. Afterwards, a ribosome recycling factor (RRF) and a bound EF-G bind at the A site of the ribosome. GTP hydrolysis then separates the 30S and 50S ribosomal subunits. IF3 binds to the 30S subunit to remove any tRNA or mRNA left on it. The result is a synthesized bacterial protein and ribosomal subunits that can take part in further rounds of translation. [1]

## References

1. Slonczewski, Joan L. Foster, John W. Microbiology: An Evolving Science, Second Edition, W.W. Norton & Company. 2009.

## General Information

Protein purification is the process of separating proteins for individual analysis. It is the second step in studying proteins, the first being the development of an assay. An assay is a procedure that measures enzyme activity, thus confirming the presence of the protein or proteins of interest. Popular assays include Western blotting and ELISA (enzyme-linked immunosorbent assay). Before the purification process, cell disruption is used to homogenize the cell's contents. After the cell has been opened up, the proteins can be purified from one another and from the organelles by several different methods. Protein mixtures are normally separated multiple times, each time based on a different property, such as:

• Solubility
• Size
• Molecular Weight
• Charge
• Binding affinity

The intended reason for purifying a specific protein governs the level and degree of purification. At times, a sample of protein that is only moderately purified suffices for its intended application; other situations require a higher degree of purification, especially if the fundamental aim is to study the characteristics and behavior of the specific protein of interest. By considering solubility, size, molecular weight, charge, and binding affinity, the scientist conducting the purification aims to reach the necessary level of purity while obtaining a protein yield that is ample for further research and application. This means using the fewest possible steps in order to keep the yield high, as each purification step incurs some loss of product. Two factors therefore have to be balanced in protein purification: yield and purification level. The main goal of a protein purification project falls into one of two categories: analytical (for study and research purposes) and preparative (for production of commercial products).
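
The trade-off between yield and purification level is commonly tracked with a purification table. The sketch below computes the usual quantities: specific activity (activity per mg of protein), yield (activity retained relative to the first step), and fold purification (specific activity relative to the first step). The function name and all numbers are hypothetical and serve only to show the arithmetic.

```python
def purification_summary(steps):
    """Summarize a purification from (name, total_protein_mg, total_activity_units)."""
    _, first_protein, first_activity = steps[0]
    first_specific = first_activity / first_protein
    rows = []
    for name, protein_mg, activity_units in steps:
        specific = activity_units / protein_mg
        rows.append({
            "step": name,
            "specific_activity": specific,                       # units per mg
            "yield_percent": 100 * activity_units / first_activity,
            "fold_purification": specific / first_specific,
        })
    return rows


# Hypothetical measurements, for illustration only.
example = [
    ("Homogenate", 1000.0, 2000.0),
    ("Salting out", 200.0, 1600.0),
    ("Affinity chromatography", 2.0, 1000.0),
]
for row in purification_summary(example):
    print(row)
```

In this made-up example the final step reaches a 250-fold purification but keeps only half of the starting activity, which shows how each added step trades yield for purity.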

There are many methods of purification including:

Protein Purification Methods

| Method | Principle |
| --- | --- |
| Differential Centrifugation | Proteins are separated based on mass or density by centrifugal force. Centrifugation enables the separation of proteins in different cell compartments. |
| Salting Out | Different proteins precipitate at different salt concentrations. As the concentration of salt increases, more proteins can be separated. |
| Gel-Filtration Chromatography | Large molecules flow more rapidly to the bottom of the column. |
| Ion-Exchange Chromatography | Proteins are separated according to their charge. Positively charged proteins bind to negatively charged beads, while negatively charged proteins are not retained and flow through faster. |
| Affinity Chromatography | Many proteins have high affinity for specific chemical groups. |
| Hydrophobic Interaction Chromatography | Proteins separate according to different levels of hydrophobicity. |
| Gel Electrophoresis | Electrophoresis separates proteins while the gel enhances the separation. Small proteins move more rapidly through the gel. |
| Isoelectric Focusing | Different proteins have different pI (isoelectric point). |
| Two-Dimensional Electrophoresis | Proteins are separated horizontally based on pI and vertically based on mass. |
| Dialysis | Proteins are separated through a semi-permeable membrane. Since proteins are generally larger than the pores of the membrane, they do not pass through and are separated from smaller molecules. |

Purpose: the protein of interest starts out inside cells, mixed with many other proteins. The goal is to remove the other proteins and keep the one you want.

## General Information

Differential centrifugation is a method used to separate the different components of a cell on the basis of mass. The cell membrane is first ruptured to release the cell’s components by using a homogenizer. The resulting mixture is referred to as the homogenate. The homogenate is centrifuged to obtain a pellet containing the most dense organelles. Compounds that are the most dense will form a pellet at lower centrifuge speeds while the less dense compounds will likely remain in the liquid supernatant above the pellet. Each time, the supernatant may be centrifuged at faster speeds to obtain the less dense organelles. Performing centrifugation in a stepwise fashion, in which the centrifugation speed is increased each time, allows the components to be separated by mass. The rather dense nucleus is most likely to be found after the first centrifugation step, followed by the mitochondria, then smaller organelles, and finally the cytoplasm, which may contain soluble proteins.[1]
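The stepwise increase in speed described above is usually expressed as relative centrifugal force (RCF, in multiples of g) and not as raw rotor speed. The sketch below uses the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm², with r in centimetres. The function name, the rotor radius, and the three speeds are illustrative assumptions, not values from the text.

```python
def relative_centrifugal_force(rpm: float, radius_cm: float) -> float:
    """Return the RCF (in multiples of g) for a rotor speed in rpm
    and a rotor radius in centimetres."""
    return 1.118e-5 * radius_cm * rpm ** 2


# Hypothetical stepwise protocol: speeds and radius are assumptions
# chosen only to show that each step applies a larger force.
radius_cm = 8.0
steps_rpm = [3000, 10000, 40000]
forces = [relative_centrifugal_force(rpm, radius_cm) for rpm in steps_rpm]

for rpm, rcf in zip(steps_rpm, forces):
    print(f"{rpm:>6} rpm -> {rcf:,.0f} x g")
```

Because the force grows with the square of the speed, each step in such a protocol applies a much larger force than the previous one, which is what lets progressively less dense components be pelleted.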

The result of the centrifugation of blood: components are separated by their weight.

Sedimentation equilibrium is quite useful because a pellet is not formed. The speed of rotation creates enough force to move the protein toward the outside of the rotor, but not enough to compact it into a pellet. Instead, a gradient in the concentration of the protein is produced. Diffusion acts to counter the creation of the gradient, and after a certain amount of time a balance between sedimentation and diffusion is reached.

Sedimentation equilibrium is also practical for studying the interactions between proteins. In particular, it is used to ascertain the native state, or native conformation, of the protein. The native state describes the structure in three dimensions, including whether the protein is a monomer, dimer, trimer, tetramer, and so on. A monomer is a protein made up of one subunit, a dimer is made up of two subunits (often related by a 180-degree rotation), a trimer of three, and so on. This type of experiment also shows whether the proteins can form oligomers (two or more identical polypeptide chains that together make up a protein). Additionally, sedimentation equilibrium can be used to determine equilibrium constants for protein-protein and protein-ligand interactions, expressed as the dissociation constant (Kd). The value of Kd is often between 1 nM and 1 mM. A final use is to determine the stoichiometric ratios within protein complexes, for example between a ligand and its receptor or within an antigen-antibody pair.
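To give a feel for what a Kd in the 1 nM to 1 mM range means, the following sketch computes the fraction of protein bound for simple 1:1 binding. It assumes a single binding site and that the free ligand concentration is known; the function name and the example concentrations are hypothetical and not taken from the text.

```python
def fraction_bound(free_ligand_molar: float, kd_molar: float) -> float:
    """Fraction of protein occupied by ligand for 1:1 binding,
    assuming one site and a known free ligand concentration."""
    return free_ligand_molar / (kd_molar + free_ligand_molar)


# Illustrative values only: a 1 uM ligand against a tight (1 nM)
# and a weak (1 mM) interaction.
tight = fraction_bound(1e-6, 1e-9)
weak = fraction_bound(1e-6, 1e-3)
print(f"Kd = 1 nM: {tight:.3f} bound; Kd = 1 mM: {weak:.4f} bound")
```

When the free ligand concentration equals Kd, half of the protein is bound, which is a convenient way to read the constant.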

## References

1. Berg, Jeremy (2006). Biochemistry (6th ed.). W. H. Freeman. ISBN 0716787245.

Durdik, Jeaninne. "Sedimentation Equilibrium". Alliance Protein Laboratories. Retrieved 2009-10-10.

## Introduction

The process of "salting out" is a purification method based on protein solubility. It relies on the principle that most proteins are less soluble in solutions of high salt concentration, because the added salt ions shield the multiple charges on the protein surface. With these charges shielded, protein molecules interact more readily, aggregate, and precipitate. The exact concentration resulting in precipitation varies from protein to protein, which allows different proteins to be separated as the salt concentration is increased. Salting out can also concentrate dilute solutions of proteins; once the protein precipitates, the remaining liquid can be removed. However, the salt itself can compromise the purity of the protein.

"Salting in" refers to the observation that at low salt concentrations, the solubility of a protein increases as salt is added. By contrast, "salting out" requires a high salt concentration: because the salt is more soluble than the protein, it dissolves preferentially and takes up space in the solution, so the proteins aggregate and precipitate. There are two ways of "salting out". In one method, proteins are exposed to high concentrations of salt, and in the other, the proteins are exposed to a series of solutions of low concentration.

Proteins contain various sequences and compositions of amino acids. Their solubility in water therefore differs depending on how hydrophobic or hydrophilic their surface is. Proteins with more hydrophobic surfaces precipitate more readily. The addition of ions creates an electrostatic shielding effect that cancels some of the interaction between water molecules and the protein, reducing solubility as the proteins bind to each other and begin to aggregate. Generally, larger proteins require lower salt concentrations to precipitate than smaller, lighter proteins.

In the method that uses low salt concentrations, the proteins are precipitated early in the process. To extract the proteins, cold solutions of ammonium sulfate at a series of decreasing concentrations are applied to the precipitate. The extracted protein is then recovered by recrystallization, which is brought about by warming the cold solution to room temperature. Depending on the extracted protein, the efficiency of this process can run anywhere from 30% to 90%, and it rarely fails.

Ammonium sulfate is a common substance used to precipitate proteins selectively. Because it is very soluble in water, it allows high concentrations of about 4 M. At such concentrations, harmful effects on proteins such as irreversible denaturation are absent, and both NH₄⁺ and SO₄²⁻ lie at the favourable, non-denaturing end of the Hofmeister series. Ammonium sulfate provides quantitative precipitation of one protein from a mixture. This method is very useful for purifying soluble proteins from cell extracts.[4]
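Part of the reason ammonium sulfate is so effective is that each formula unit contributes a doubly charged anion, so the ionic strength rises quickly with concentration. The sketch below applies the standard definition I = ½ Σ cᵢzᵢ². It assumes complete dissociation and ideal behaviour, which is only a rough approximation at such high concentrations; the function names are hypothetical.

```python
def ionic_strength(ions):
    """Ionic strength from an iterable of (molar concentration, charge) pairs."""
    return 0.5 * sum(conc * charge ** 2 for conc, charge in ions)


def ammonium_sulfate_ions(molarity: float):
    """Ions released by (NH4)2SO4, assuming complete dissociation:
    two NH4+ and one SO4(2-) per formula unit."""
    return [(2 * molarity, +1), (molarity, -2)]


# About 4 M is the concentration quoted in the text.
strength = ionic_strength(ammonium_sulfate_ions(4.0))
print(f"Ionic strength of 4 M ammonium sulfate: {strength} M")
```

Under these assumptions the ionic strength is three times the molar concentration of the salt.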

Although salting out is an efficient method of protein separation, it requires that the solubility of the protein be calculated or known in advance. Proteins differ in their amino acid chains and in their solubility. When the salt concentration is changed to reach the point where the protein becomes insoluble, different ions can either increase or decrease the solubility of the protein. Hence, one must be careful to select the correct ions for altering the salt concentration. A protein is typically least soluble near its isoelectric point, pI, where it carries minimal net charge. Precipitation by salting out results in fractionation: one amount of precipitated protein is collected at one salt concentration and another amount at a different concentration, because the proteins in the mixture differ in solubility.

Proteins with different pI values can be separated by salting out if the pH is varied along with the salt concentration. Since proteins are least soluble near their isoelectric point (pI), it is possible to precipitate them out of solution by increasing the salt concentration. This works because the hydration shell surrounding the protein is displaced by the increasing ionic concentration of the solvent. Replacing the hydration shell with other ions destabilizes the water networks that keep proteins in solution, and at high salt concentration the hydrophobic groups come together so that the proteins aggregate. Ultimately the proteins precipitate as aggregates (they are "crashed out"). This technique can be used to separate proteins that initially have similar precipitation points. By modifying the pH of the solution, one can increase or decrease the solubility of one protein without affecting the target protein. The solution can later be purified by dialysis to remove the salt ions.

Dialysis Process

## Hofmeister Series

The relative effectiveness of different ions was established by Franz Hofmeister in 1888. In both the anion and the cation series, the first ion is the most effective at precipitating a protein and the last ion is the least effective. Ions at the start are dubbed "kosmotropes" (ions that interact well with water, forming H-bonds and dehydrating proteins); ions at the end are "chaotropes" (ions that free up water by breaking H-bonds between water molecules, increasing protein solubility).

Cations: N(CH₃)₄⁺ > NH₄⁺ > K⁺ > Li⁺ > Mg²⁺ > Ca²⁺ > Al³⁺ > guanidinium

Anions: SO₄²⁻ > HPO₄²⁻ > CH₃COO⁻ > citrate > tartrate > F⁻ > Cl⁻ > Br⁻ > I⁻ > NO₃⁻ > ClO₄⁻ > SCN⁻

The ions at the start of each series strengthen hydrophobic interactions by decreasing the solubility of nonpolar molecules, thus salting out the system. The later ions, however, begin to denature the protein because of strong ionic interactions that disrupt hydrogen bonding. Although proteins can be salted out with solutions such as ammonium sulfate, the ions late in the series can instead cause salting in, where the solubility of the protein increases.

## Dialysis

Dialysis is a protein purification process that separates proteins from small molecules, such as salts, by using a semipermeable membrane. The membrane contains microscopic pores through which small molecules can escape, whereas protein molecules with dimensions significantly greater than the pore diameter are retained inside the dialysis bag. The small molecules and salt diffuse out through the membrane into the dialysate outside the bag. This technique is useful for removing salt ions and other small molecules, but it cannot be used to distinguish one protein from another. To enhance the separation of the proteins in the bag from impurities such as salt, we can take advantage of equilibrium. In an aqueous environment, the salt flows through the dialysis membrane until its concentration outside the bag equals its concentration inside the bag. At this point equilibrium is reached and there is no net flow of salt through the membrane. If the dialysate is then replaced with fresh buffer, the remaining salt again flows out of the bag until the concentration in the new buffer equals the concentration in the bag. Repeatedly replacing the buffer therefore increases the purity of the proteins inside the bag, because each replacement forces more salt to flow out in order to re-establish equilibrium. The same principle applies to any other impurity that is able to pass through the membrane.
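The effect of repeated buffer changes can be sketched numerically. The example below assumes that each round runs to full equilibrium, that the salt passes freely through the membrane, and that the bag volume stays constant; the function name and the volumes are illustrative assumptions.

```python
def dialyze(salt_conc: float, bag_volume: float,
            buffer_volume: float, exchanges: int) -> list:
    """Return the salt concentration inside the bag before dialysis
    and after each buffer exchange, assuming full equilibration."""
    history = [salt_conc]
    for _ in range(exchanges):
        # At equilibrium the salt is spread evenly over the bag plus
        # the fresh, salt-free buffer outside it.
        salt_conc = salt_conc * bag_volume / (bag_volume + buffer_volume)
        history.append(salt_conc)
    return history


# Hypothetical run: a 10 mL bag at 1.0 M salt, dialyzed twice
# against 990 mL of fresh buffer.
history = dialyze(1.0, 10.0, 990.0, 2)
print(history)
```

Under these assumptions each exchange dilutes the salt by the same factor, so several changes of a moderate buffer volume remove more salt than one round against the same total volume.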

## Dialysis in human body

In patients with compromised kidney function, dialysis is often used to remove undesirable solutes from the blood. For example, the concentrations of calcium, potassium, and urea in the dialysate are kept low, enabling these solutes to diffuse out of the blood across the semi-permeable membrane. This requires the dialysate to be constantly refreshed in order to prevent the concentrations from equilibrating, which would ultimately allow the concentration of unwanted solutes in the blood to rise. Solutes can also be introduced into the blood. For example, bicarbonate ions are kept at high concentration in the dialysate so that they diffuse across the membrane into the blood. This is done to prevent metabolic acidosis.

## References

1. Berg, Jeremy M. (2007). Biochemistry (6th ed.). New York: W. H. Freeman. pp. 68-69, 78.

2. Voet, Voet, Pratt (2004). Fundamentals of Biochemistry.

3. Atlas of Diseases of the Kidney, Volume 5, Principles of Dialysis: Diffusion, Convection, and Dialysis Machines.

4. Whitford (2005). "Chapter 9: Protein expression, purification and characterization". Proteins: Structure and Function. John Wiley & Sons, Ltd.

## Capillary Electrophoresis

Capillary electrophoresis is a family of techniques that use narrow-bore capillaries to perform high-efficiency separations of both large and small molecules. Driven by a high-voltage power supply, the solution travels from the anode to the cathode through the capillary. On the way, the solution passes a detector, and from the flow of the molecules the integrator computes how the molecules of the original solution have separated. There are five modes of capillary electrophoresis: capillary zone electrophoresis, isoelectric focusing, capillary gel electrophoresis, isotachophoresis, and micellar electrokinetic capillary chromatography.

### Capillary Zone Electrophoresis

Capillary zone electrophoresis is a separation mechanism based on differences in the charge-to-mass ratio of the molecules. The homogeneity of the buffer solution and a constant field strength are fundamental to the process. It can be used to separate both large molecules (such as DNA) and small molecules (such as drugs). Capillary zone electrophoresis is the simplest form of capillary electrophoresis.

### Isoelectric Focusing

In isoelectric focusing, the sample is run through a pH gradient in which the pH is low at the anode and high at the cathode. When a voltage is applied, the ampholyte mixture separates in the capillary, with each species migrating until it reaches the pH that equals its isoelectric point.

### Capillary Gel Electrophoresis

Capillary gel electrophoresis is conducted in an anticonvective medium, often a polyacrylamide or agarose gel. The medium serves as a molecular sieve, so that separation is based on size.

### Isotachophoresis

In isotachophoresis, there is zero electroosmotic flow and the buffer is heterogeneous. The capillary is filled with a leading electrolyte, whose mobility is higher than that of any sample component, and a terminating electrolyte, whose ionic mobility is lower than that of any sample component. As a result, the sample components are separated between the leading and terminating electrolytes.

### Micellar Electrokinetic Capillary Chromatography

In Micellar Electrokinetic Capillary Chromatography (MECC or MEKC), the use of micelle-forming surfactant solutions can give rise to separations that resemble reverse-phase liquid chromatography. The analytes are separated according to their hydrophobic and electrostatic interactions with the micelles.

## Electroosmotic Flow

In comparison to HPLC which uses hydrodynamic flow, capillary electrophoresis is based on electroosmotic flow (EOF). The factors that influence the rate of electroosmotic flow are pH, voltage, temperature and the concentration of the buffer.

### pH

At neutral to alkaline pH, the electroosmotic flow is strong enough relative to electrophoretic migration that all species are swept towards the negative electrode. At high pH, the electroosmotic flow is large and the peptide is negatively charged; despite the peptide's electrophoretic migration towards the positive electrode (anode), the EOF dominates and the peptide migrates towards the negative electrode (cathode). At low pH, the peptide is positively charged and the EOF is very small, so both the peptide's electrophoretic migration and the EOF are directed towards the negative electrode. When the buffer pH is above 7.0, most solutes migrate towards the negative electrode regardless of charge. The pH is often chosen to be at least two units above or below the pKa of the analyte in order to ensure essentially complete ionization.
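The "two pH units from the pKa" guideline follows from the Henderson-Hasselbalch relationship. The sketch below computes the ionized fraction of a single ionizable group; it assumes one titratable group with ideal behaviour, and the function names and the pKa of 4.0 are illustrative only.

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Fraction of a single ionizable group in its deprotonated form
    (the charged form for an acid such as a carboxyl group)."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def fraction_protonated(pH: float, pKa: float) -> float:
    """Fraction in the protonated form (the charged form for a base
    such as an amino group)."""
    return 1.0 - fraction_deprotonated(pH, pKa)


# Hypothetical acidic group with pKa 4.0, examined at the pKa
# and two pH units above it.
at_pka = fraction_deprotonated(4.0, 4.0)
two_units_above = fraction_deprotonated(6.0, 4.0)
print(f"at pKa: {at_pka:.2f}, two units above: {two_units_above:.4f}")
```

Two units away from the pKa, roughly 99% of the group is in a single charge state, which is why that margin is treated as effectively complete ionization.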

### Voltage

High voltages provide for greatest efficiency by decreasing the separation time.

### Temperature

At high temperatures, the viscosity of the solution is lower and the electroosmotic flow increases as a result. However, the pH of some buffers is known to change with temperature.

### Buffer Concentration

When the buffer concentration is reduced, the focusing effect decreases and peak efficiency is reduced.

## References

Wätzig, H., Degenhardt, M. and Kunkel, A. (1998), Strategies for capillary electrophoresis: Method development and validation for pharmaceutical and biological applications. ELECTROPHORESIS, 19: 2695–2752. doi: 10.1002/elps.1150191603

## Introduction

High Pressure Liquid Chromatography (also known as High Performance Liquid Chromatography, or simply HPLC) is an enhanced form of column chromatography that is commonly used in biochemistry to separate and purify the components of mixtures. Instead of dripping through the column under gravity, as in other methods of chromatography, the solvent is pushed through at high pressure.

The column materials of HPLC are much more finely divided, so there are more opportunities for interaction and greater resolving (separating) power. Because the packing is so fine, pressure must be applied to the column to obtain acceptable flow rates. The final result is high resolution and very fast separation.

## History

HPLC was developed and improved with new column technologies in the mid-1970s, replacing older column chromatographic techniques that failed when it came to quantifying and purifying similar compounds. High pressure liquid chromatography proved to be much less time consuming than the old methods. In classical column chromatography the flow is driven by gravity and a separation can take hours or even days, whereas HPLC can produce results in five to thirty minutes.

By the 1980s HPLC was used frequently for compound purification. Computers and other improved technology added to the convenience of HPLC. Improvements in the types of columns, and consequently in the reproducibility of HPLC, led to the development of micro-columns, affinity columns, and fast HPLC.

The past decade has seen a vast advancement in the development of micro-columns, now commonly used for HPLC, and other specialized columns. The diameter of a typical HPLC column is about 3-5 mm, whereas the diameter of micro-columns, or capillary columns, is considerably smaller, ranging from 3 µm to 200 µm. Fast HPLC uses a column that is shorter than the typical column and is packed with smaller particles.

These days, one has the option of several types of columns for the purification of mixtures, as well as a variety of detectors to work with the HPLC in order to get the best possible analysis of the compound.

## Theory

A small volume of the sample is injected into the HPLC system, where a mobile phase moves it through the stationary phase. In chromatography generally the mobile phase is a gas or a liquid (in HPLC it is a liquid), and the stationary phase is immobile and immiscible with it. The stationary phase slows the passage of the sample components according to their physical or chemical properties (size, net charge, or other differences, depending on the type of HPLC), and this is what separates them. Because the stationary phase affects the impurities differently from the desired compound, the different components of the sample come out at different times. The time at which a component comes out of the column is called the retention time. The retention time should be unique to the component in the particular sample, so that no two components being analyzed elute at the same time and obscure each other. If the solvent composition cannot be adjusted to separate the components effectively, then a different type of chromatography might be better suited. HPLC, unlike other column chromatography techniques, uses pumps to push components through more finely packed columns under pressure, which speeds up analysis and makes practical those combinations of component and column that would take too long to elute on their own.

## Mobile Phase

The mobile phase is a solvent or mixture of solvents that carries the sample through the stationary phase. As it moves through the stationary phase, molecular interactions between the sample's components and the column material determine the retention time of the different components. Components that interact more strongly with the mobile phase than with the column will "prefer" the mobile phase and elute more quickly, with a shorter retention time, while components that interact more strongly with the stationary phase than with the solvent will "prefer" the column and elute more slowly, with a longer retention time. This is how HPLC separates, filters, and aids in purification of the compound. Different mobile phase techniques are used to optimize retention time, separation, and peak clarity. These are isocratic, gradient, and polytyptic.

### Isocratic

Isocratic elution involves a constant mobile phase composition. An example is a mobile phase of 50% acetonitrile and 50% water for a reversed phase HPLC (RP-HPLC) run that remains unchanged through the entire analysis. A solvent system is chosen and is used for the entire duration of the HPLC run. The sample is injected as the mobile phase flows through, enters the HPLC at a constant flow rate, and passes through the chosen column. This method is generally used when the sample being analyzed is simple enough that all of its components come out at different times with sufficient clarity and do not have impractically long retention times.

### Gradient

Most samples are not so easy to work with. In these cases, a gradient elution method is set up. The mobile phase mixture shifts as the run proceeds, and the concentrations of the solvents are modified so that the run begins with the "weaker" solvent and ends with the "stronger" solvent at its highest concentration. One such example is a reversed phase HPLC run that begins with mostly mobile phase A, a mixture of 95% water and 5% acetonitrile, and gradually increases mobile phase B, which is 100% acetonitrile, until at the end of the run the majority of the mobile phase flowing through the column is mobile phase B. Usually for reversed phase HPLC, the mobile phase begins with the more polar solvent combination and the concentration of the less polar solvent combination is increased as the run proceeds. In this way the less polar molecules (relative to the mobile phase and stationary phase being used) eventually elute because of the higher concentration of less polar solvent, and the run time needed for the analysis can be shortened. An isocratic mobile phase can have a polarity too close to that of the stationary phase, so that components elute together immediately and their peaks overlap, or a polarity too different from that of the nonpolar stationary phase, so that nonpolar components take too long to elute. This is why a gradient mobile phase is often used in analysis: the ratio of less-polar to more-polar solvent can be modified to obtain optimal peak separation.
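A gradient of this kind can be sketched as a simple program of solvent composition against time. The example below assumes a linear gradient from 0% to 100% mobile phase B, which is a simplification; the text only says that B forms the majority by the end. The compositions of A and B follow the example in the text, while the function names and the 20-minute run time are hypothetical.

```python
def percent_b(t_min: float, run_time_min: float,
              start_b: float = 0.0, end_b: float = 100.0) -> float:
    """Percentage of mobile phase B at time t for a linear gradient.
    Times outside the run are clamped to the start or end value."""
    progress = min(max(t_min / run_time_min, 0.0), 1.0)
    return start_b + (end_b - start_b) * progress


def acetonitrile_percent(t_min: float, run_time_min: float) -> float:
    """Overall acetonitrile content of the mixed mobile phase, with
    A = 95% water / 5% acetonitrile and B = 100% acetonitrile."""
    b = percent_b(t_min, run_time_min) / 100.0
    return (1.0 - b) * 5.0 + b * 100.0


# Hypothetical 20-minute run sampled every 5 minutes.
for t in (0, 5, 10, 15, 20):
    print(f"t = {t:>2} min: {acetonitrile_percent(t, 20):.2f}% acetonitrile")
```

An isocratic run corresponds to the special case where the start and end percentages are equal, so the composition never changes.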

### Polytyptic

Polytyptic elution, also known as mixed-mode chromatography, involves the use of a special column that can switch modes of analysis depending on the solvent. The same column can perform size exclusion, ion exchange, or affinity chromatography depending on the type of solvent that flows through it.

## Retention Time

Retention times depend on the interactions among the sample component, the mobile phase, and the stationary phase. A well-designed HPLC run therefore relies on choosing the correct type of column for the desired analysis and the right combination of mobile phases for the analyte and the column.

## Column Efficiency

Column efficiency describes how well the stationary phase separates components, which depends on how well it is packed and how readily analytes move along it. There are a couple of ways to measure column efficiency, but they all use the same formula:

$$N = a\,\frac{t_r^2}{W^2}$$

$N$ = number of theoretical plates
$a$ = constant that depends on the height at which the peak width is measured
$t_r$ = retention time
$W$ = width of a peak
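As a worked illustration of the plate-count formula, the sketch below evaluates N for two common conventions: a = 16 when W is the peak width at the baseline, and a = 5.54 when W is the width at half height. The function name and the retention time and peak widths are made-up example values, not measurements from the text.

```python
def theoretical_plates(retention_time: float, peak_width: float,
                       a: float = 16.0) -> float:
    """Number of theoretical plates N = a * (t_r / W)**2.
    Retention time and peak width must be in the same units."""
    return a * (retention_time / peak_width) ** 2


# Hypothetical peak eluting at 10.0 min.
n_baseline = theoretical_plates(10.0, 0.5)             # baseline width, a = 16
n_half_height = theoretical_plates(10.0, 0.25, a=5.54)  # half-height width
print(f"N (baseline) = {n_baseline:.0f}, N (half height) = {n_half_height:.0f}")
```

A narrower peak at the same retention time gives a larger N, which is the sense in which a higher plate count means a more efficient column.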

## Applications of HPLC

### Normal phase chromatography

Normal phase chromatography, or NP-HPLC, was the first kind of HPLC developed. This method uses a polar stationary phase and a non-polar mobile phase to separate analytes based on their polarity. Since the polar phase is stationary, polar analytes bind to that phase. Their adsorption strength and elution time depend on the polarity of the analyte and on its steric factors. Because the elution time depends on steric effects, it is possible to differentiate and separate structural isomers, since each isomer has a different steric profile. The elution time can be increased by adding a non-polar solvent to the non-polar mobile phase. The retention time of the analytes can be decreased by adding polar substances to the mobile phase, which can even occupy the surface of the stationary phase and prevent the polar analytes from binding to it.

In the past, this method fell out of favor because water or protic organic solvents changed the hydration state of the media in the system. However, this problem was addressed by another version of NP-HPLC called hydrophilic interaction liquid chromatography, which uses a variety of phases that give better retention times.

### Reversed phase chromatography

Reversed phase chromatography, as the name suggests, is the opposite of normal phase chromatography: it has a non-polar stationary phase and a polar mobile phase. Consequently, non-polar analytes bind to the non-polar phase, and the elution time of an analyte depends on how non-polar it is. The elution time can be increased by adding a polar solvent to the mobile phase or decreased by adding a non-polar solvent to it. However, unlike NP-HPLC, the method depends on hydrophobic interactions.

Several factors influence hydrophobic interactions. One of them is surface area. An analyte with a larger hydrophobic surface area has a longer retention time, since there is more interaction between the analyte and the non-polar surface. However, an analyte that is too large cannot enter the pores of the non-polar phase and so does not interact with it. The interaction is also strengthened by the tendency of water to reduce the cavity it forms around the analyte ("cavity reduction"); the energy released in this process depends on the surface tension of the eluent, which in this case is water.

Another factor that can affect hydrophobic interactions is the pH. The ideal environment is one that is uncharged. Chemists therefore use buffering agents, such as sodium phosphate, to regulate the pH and neutralize both the charge on the analyte and the charge on any exposed media of the stationary phase, which is usually composed of silica.

Reversed phase columns are more robust than normal silica columns, but they still have some weaknesses. Aqueous bases should not be used with columns consisting of alkyl-derivatized silica particles, since the base will destroy the underlying silica particle. Also, if an aqueous acid is used, it should not be left in contact with the column for too long, in order to prevent corrosion.

### Gel filtration

Gel-filtration chromatography separates proteins based on differences in size. The process involves a gel in a buffer solution that is packed into a column. The gel consists of many porous, bead-like particles of carbohydrate polymer. The size of the pores is selected so that only proteins below a certain size can diffuse into them. Molecules that are small enough to enter the pores of the beads are slowed down because they are forced to enter the stationary phase of the column. The larger molecules, on the other hand, move through the column faster because they cannot enter the internal volume of the beads.

The most important advantage of gel-filtration chromatography is its ability to separate proteins in their original, non-denatured condition, giving a sample that is in a suitable form for possible further analysis. Another advantage is the high resolution that is obtained by applying pressure to the column to get adequate flow. Improved resolution is achieved with slower flow rates. An optimum flow rate for protein fractionation of approximately 5 mL/cm²/h is recommended for most gels.

Reference: Aguilar, Marie-Isabel. HPLC of Peptides and Proteins Methods and Protocols. volume 251. Humana Press.

### Ion exchange

Ion-exchange chromatography separates proteins based on their charge. It is efficient enough to resolve proteins that differ by only a single charged group. It depends on the formation of ionic bonds between the charged groups on the proteins and an ion-exchange gel carrying the opposite charge in a column. Proteins that carry no electrical charge are removed by washing. Proteins that do form ionic bonds are recovered by elution with a buffer of higher ionic strength or of a different pH. The more oppositely charged groups the protein and the gel medium carry, the stronger the attraction between them and the longer the retention time.

There are two types of ion exchangers. One is the anion exchanger, which has positively charged groups fixed in a gel medium that interact with and bind negatively charged groups on the protein. The other is the cation exchanger, which has negatively charged groups fixed in a gel medium that interact with and bind positively charged groups on the protein.

The pH of the solution also alters the ionic interaction between the protein and the gel medium. When the pH is equal to the isoelectric point of the protein (the point where the net charge is zero), the protein carries no net charge and is not expected to bind to either exchanger. When the pH is less than the isoelectric point, the net electric charge on the protein is positive and it binds to cation exchangers. Finally, if the pH is greater than the isoelectric point, the net charge on the protein is negative and it binds to anion exchangers. Therefore, by controlling the pH of the solution we can control how the protein is separated, since it is these exchangers that separate the protein.
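The relationship between buffer pH, isoelectric point, and the type of exchanger can be written as a small decision rule. In this sketch the case where the pH equals the pI is treated, as an assumption, as binding to neither exchanger. The function name and the pI of 6.5 are hypothetical.

```python
def exchanger_for(pH: float, pI: float) -> str:
    """Return the type of ion exchanger a protein is expected to bind
    at a given buffer pH, based only on the sign of its net charge."""
    if pH < pI:
        # Below the pI the protein carries a net positive charge.
        return "cation exchanger"
    if pH > pI:
        # Above the pI the protein carries a net negative charge.
        return "anion exchanger"
    # At the pI the net charge is zero (assumed here to bind neither).
    return "neither"


# Hypothetical protein with pI 6.5 at three buffer pH values.
for pH in (5.0, 6.5, 8.0):
    print(f"pH {pH}: {exchanger_for(pH, 6.5)}")
```

The rule considers only net charge; in practice a protein near its pI may still bind weakly through local patches of charge, which this sketch does not model.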

Reference: Aguilar, Marie-Isabel. HPLC of Peptides and Proteins Methods and Protocols. volume 251. Humana Press.

### Affinity chromatography

Affinity chromatography is a method of separating biochemical mixtures based on a highly specific biological interaction. It is used to purify a molecule from a mixture and concentrate it into a buffer solution, and also to identify which biological compounds bind to another molecule, such as a drug. It was developed in 1968 by Pedro Cuatrecasas and Meir Wilchek.

The process involves trapping the target protein (or molecule) that is to be separated from the mixture on a solid support or medium. In the example here, a column is filled with beads carrying covalently attached glucose residues, a ligand chosen to match a glucose-binding target protein. The proteins travel down through the beads as the mixture is poured into the column, and the target protein is retained on the column because of its affinity for glucose. The rest of the proteins run down the column and are thereby separated from it. A portion of buffer is then added to the column to wash out the unbound proteins. Lastly, a concentrated solution of glucose is added to release the target protein from the column-attached glucose residues, leaving the protein purified out of the mixture.

Adsorption chromatography (adsorption being the accumulation of solutes on the surface of a solid or liquid) is useful for separating a mixture of solutes based on their different polarities. It is based on the notion that a polar solute binds more tightly to the polar stationary phase than a less polar solute does. An insoluble, polar material such as silica gel (derived from silicic acid, Si(OH)₄) is packed into a glass column, making it the stationary phase. The sample mixture is carried by the mobile phase, which can be a liquid or a gas, and is poured onto the column, where each solute binds to the stationary phase to a different degree depending on its polarity. The polar solutes bind tightly to the stationary phase, the less polar ones bind more loosely, and the neutral ones pass right through the column. The bound solutes can be eluted with solvents of progressively higher polarity, so that they come off in order of increasing polarity. Thus neutral solutes pass right through the column, the less polar ones are eluted first, and very polar solutes are eluted last.

Reference: Principles of Biochemistry, 4th Edition. Nelson, David L.; Cox, Michael M. W. H. Freeman and Company, New York.

• Practical HPLC Method Development 2nd Edition. Snyder, Lloyd R.; Kirkland, Joseph Jack; Glajch, Joseph L. New York.
• Handbook of Pharmaceutical Analysis by HPLC. M. W. Dong. Elsevier.

A gel filtration column.

Gel-filtration chromatography, also known as size-exclusion chromatography, molecular exclusion chromatography, or molecular sieve chromatography, is a simple and mild technique that separates molecules based on differences in their size (hydrodynamic volume). Each polypeptide can be separated from polypeptides of other sizes by passing the sample through a gel-filtration medium packed into a column. Unlike in ion-exchange or affinity chromatography, the molecules passing through the column do not bind to the chromatography medium. A major advantage of gel-filtration chromatography is that the medium can be chosen to suit the properties of the sample and the needs of any further purification steps.

When an organic solvent is used as the mobile phase, chemists tend to call the technique gel permeation chromatography, which is used to analyze the molar mass distribution of organic-soluble polymers. The buffer or organic solvent used as the mobile phase is chosen based on the chemical and physical properties of the specific sample. The stationary phase is simply a bed of porous carbohydrate polymer beads, and molecules carried by the mobile phase move through it at different speeds depending on their size. The technique was invented by Grant Henry Lathe and Colin Ruthven, who were working at Queen Charlotte's Hospital in London, United Kingdom.

Gel-filtration chromatography can be applied in two different ways: group separation and high-resolution fractionation of biomolecules. Group separation divides the compounds in a sample into groups based on size range and is used to free a sample from high- or low-molecular-weight contaminants. High-resolution fractionation is a more precise technique. It can be used to isolate one or more components of a sample, to separate monomers from aggregates, to determine molecular weight, or to analyze a molecular weight distribution. Gel-filtration chromatography is well suited to biomolecules that are sensitive to changes in pH, metal ion concentration, or cofactors.

Within the size range of molecules that a particular bead pore size can separate, there is a linear relationship between the relative elution volume of a substance (i.e., the volume of the fractions in which the molecule is found) and the logarithm of its molecular mass, assuming that the molecules have similar shapes. If a given gel-filtration column is calibrated with several proteins of known molecular mass, the mass of an unknown protein can be estimated from its elution position.
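The calibration idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standards and their relative elution volumes are invented numbers, the function names `fit_calibration` and `estimate_mass` are hypothetical, and a real calibration would use measured values that fall within the linear range of the column.

```python
import math


def fit_calibration(standards):
    """Fit relative_elution_volume = slope * log10(mass) + intercept by least squares."""
    xs = [math.log10(mass) for mass, _ in standards]
    ys = [volume for _, volume in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mass(relative_elution_volume, slope, intercept):
    """Invert the calibration line to estimate molecular mass in Da."""
    return 10 ** ((relative_elution_volume - intercept) / slope)


# Hypothetical standards: (molecular mass in Da, relative elution volume).
standards = [(10_000, 2.2), (100_000, 1.8), (1_000_000, 1.4)]
slope, intercept = fit_calibration(standards)

# An unknown protein eluting between the two smaller standards.
unknown_mass = estimate_mass(2.0, slope, intercept)
print(f"slope = {slope:.2f}, estimated mass = {unknown_mass:.0f} Da")
```

The slope is negative because larger molecules elute earlier. The estimate is only as good as the assumption that the unknown protein has a shape similar to the standards.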

## Analogy

An analogy (this is CONCEPTUAL, not even remotely a literal representation of what happens in molecular exclusion chromatography) for why gel filtration works is to picture several whiffle balls (or sponges, or Swiss cheese, whatever cratered object works for you) suspended in a glass tank of water. Now imagine that you have a mixture of sand, small marbles, and golf balls in a bucket, and you dump it in. As you watch, first the golf balls reach the floor of the tank, then the marbles, and finally a layer of sand settles. Why? Essentially all of the sand goes into the holes of the whiffle balls, and it tends to fall from the interior of one whiffle ball to the interior of another, significantly slowing its passage to the bottom of the tank. The marbles are only slightly smaller than the holes in the whiffle balls, so they sometimes fall into the holes on the way down but also sometimes bounce off; again, the whiffle balls slow their progress, but to a lesser extent. The golf balls are way too big to fit into the holes, so they go straight past the whiffle balls, which is the fastest and most direct route.

Key: sand = small molecules; marbles = medium molecules; golf balls = large molecules; whiffle balls = porous beads; tank of water = column and aqueous solution.

## Utilization

Gel-filtration chromatography is commonly used to analyze synthetic and biological polymers such as nucleic acids, proteins, and polysaccharides. One downside of the technique is that the stationary phase may interact with a molecule in an undesirable way and affect its retention time. A major drawback is its limited resolution. An alternative is discontinuous (disc) electrophoresis. Disc electrophoresis uses gels with different pH values, and the proteins form sharp bands as they pass from one gel to the next, which gives high-resolution separations.[1] This technique requires three different gels: the sample gel, the stacking gel, and the running gel. The proteins move through the stacking gel, which lies between the sample and running gels, before they enter the running gel. This compresses the proteins into narrow bands and increases the resolution.[2]

Gel-filtration chromatography should not be confused with gel electrophoresis, in which an electric field is applied to drive molecules through the gel toward an electrode (anode or cathode) according to their electric charge. In addition, large molecules migrate down the column first in gel-filtration chromatography, whereas small molecules migrate through the gel fastest in gel electrophoresis.

## References

1. "Discontinuous Electrophoresis." The University of Adelaide, Australia, Department of Chemistry. http://www.chemistry.adelaide.edu.au/disciplines/chemistry.
2. "EXPERIMENTAL TECHNIQUES, ELECTROPHORESIS." Department of Biochemistry and Molecular Biophysics. 2006. http://www.biochem.arizona.edu/classes/bioc462/462a/462a.html.

Viadiu, Hector. Biochemistry 114A Lecture. "Protein Techniques." 10/15/12

Purpose: To separate a specific protein from its mixture by using the property of ionic charge.

## General information

An ion exchange column.

Ion-exchange chromatography (IEC) is a purification method that separates proteins on the basis of their net charge, which depends on the composition of the mobile phase (the solution in which the mixture is dissolved and carried through the column). Adjusting the pH or the ionic strength of the mobile phase is what makes the separation possible. For example, if a protein has a net positive charge at pH 7, it will bind to a column of negatively charged beads, such as beads that carry carboxylate groups, whereas a negatively charged protein will not. The bound protein is then eluted by increasing the concentration of sodium chloride. How readily a protein moves through the column depends on the density of its net charge, so proteins with a low density of net positive charge emerge first.

Proteins bind to ion exchangers through electrostatic forces between charges on the protein surface and clusters of charged groups on the exchanger. The column is packed with a resin (usually cellulose or agarose) to which a charged group is bonded. Negatively charged beads bind positively charged proteins and let negatively charged proteins flow through (cation exchange), while positively charged beads do the reverse (anion exchange); ion-exchange chromatography therefore comprises cation-exchange chromatography and anion-exchange chromatography. To become attached, a protein must displace the counterions associated with the resin; in other words, the net charge on the protein has the same sign as that of the counterions it displaces, hence the name "ion exchange." The protein molecules in solution are also neutralized by counterions, and the overall reaction must be electrically neutral.

Whatever one wants to purify is known as the sample, and the components that are separated are known as the analytes. The sample is added to the top of the column, and a buffered solution is used to elute it.

## Anion-Exchange Chromatography

Anion-exchange chromatography uses positively charged beads. It is used to purify acidic molecules, which often carry a negative charge on their carboxyl groups. Anion-exchange chromatography retains biomolecules mainly through the interaction of amine groups on the ion-exchange resin with aspartic or glutamic acid side chains, which have a pKa of about 4.4. The mobile phase is buffered at pH > 4.4, because below that pH the acidic side chains start to become protonated and retention declines.

Above pH 4.4, retention depends fundamentally on the number of anionic side chains in the protein. Proteins with the same number of anionic side chains can often be separated by adjusting the mobile-phase pH between 7 and 10, where histidine is not protonated and lysine starts to deprotonate.

Subtle changes occur in proteins in this pH region that affect their interaction with the resin and allow fine-tuning of the anion-exchange separation. A mobile phase with pH > 10 is not usually recommended because of possible protein degradation, such as deamidation, at higher pH.

## Techniques

Ion-exchange chromatography.

In cation-exchange chromatography, a sample containing a protein that bears a net positive charge at a certain pH is added to a column. In anion-exchange chromatography, the protein of interest bears a net negative charge at a certain pH. Recall that the net charge is the sum of the charges on the ionizable groups of the protein, chiefly the amino acid R groups, at a given pH. The column resin consists of cellulose (or agarose) beads with a functional group covalently bonded to them: a carboxylate group for cation exchange, and a diethylaminoethyl group for anion exchange. The buffer solution, also called the mobile phase, has its pH set between the pI (or pKa) of the protein and the pKa of the groups on the beads, so that the protein and the beads carry opposite charges. The buffer solution then carries the sample through the column. Molecules with no charge or with the same charge as the beads pass through, while molecules with the opposite charge bind to the beads. Like a magnet, they stick and stay there.

To elute the bound proteins, the column is flushed with a salt solution, usually excess NaCl. In cation-exchange chromatography the Na+ ion competes with the bound protein for the negative functional groups, and in anion-exchange chromatography the Cl- ion competes for the positive functional groups. Another way to flush an anion-exchange column is with a low-pH buffer. The more acidic conditions make the net charge of the protein more positive. Once the protein bears a net positive charge, it is no longer attracted to the resin, which is now like-charged (and like charges repel), so it comes off the column.

Knowing the isoelectric point (pI) of the protein can be helpful in ion-exchange chromatography. Recall that the pI is the pH at which a compound's net charge is zero. If a compound has a high pI, for example 10, then bringing the pH down to 7 gives it a net positive charge. Conversely, if the pH of the solution is higher than the pI, the protein becomes negative overall. Thus, depending on the pI of the protein, buffers at specific pH values can be chosen to purify it. This also implies that two proteins with significantly different pI values are the easiest to separate by ion exchange.
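The relationship between pH, pI, and net charge can be made concrete with a rough calculation. The sketch below is an approximation under stated assumptions, not a method from this text: it applies the Henderson-Hasselbalch equation to each ionizable side chain independently, ignores the chain termini, and uses assumed, approximate pKa values (the acidic value is the ~4.4 quoted in the anion-exchange section). The protein composition and the function names are hypothetical.

```python
# Approximate side-chain pKa values. These are illustrative assumptions:
# real pKa values shift with the local environment inside a protein.
ACIDIC_PKA = {"Asp": 4.4, "Glu": 4.4}               # neutral -> negative when deprotonated
BASIC_PKA = {"His": 6.0, "Lys": 10.5, "Arg": 12.5}  # positive -> neutral when deprotonated


def net_charge(residue_counts, ph):
    """Estimate net charge at a given pH with the Henderson-Hasselbalch equation."""
    charge = 0.0
    for residue, count in residue_counts.items():
        if residue in ACIDIC_PKA:
            # Fraction of acidic groups that are deprotonated (charge -1 each).
            charge -= count / (1.0 + 10 ** (ACIDIC_PKA[residue] - ph))
        elif residue in BASIC_PKA:
            # Fraction of basic groups that are protonated (charge +1 each).
            charge += count / (1.0 + 10 ** (ph - BASIC_PKA[residue]))
    return charge


def isoelectric_point(residue_counts, low=0.0, high=14.0):
    """Find the pH of zero net charge by bisection (charge falls as pH rises)."""
    for _ in range(60):
        mid = (low + high) / 2.0
        if net_charge(residue_counts, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def suggest_exchanger(residue_counts, ph):
    """Positive proteins bind cation exchangers; negative ones bind anion exchangers."""
    charge = net_charge(residue_counts, ph)
    if charge > 0:
        return "cation exchange"
    if charge < 0:
        return "anion exchange"
    return "no binding expected"


# Hypothetical protein, described only by its ionizable side chains.
protein = {"Asp": 5, "Glu": 5, "His": 2, "Lys": 12, "Arg": 4}
print(round(net_charge(protein, 7.0), 2))
print(round(isoelectric_point(protein), 2))
print(suggest_exchanger(protein, 7.0))
```

Because this hypothetical protein has more basic than acidic side chains, its estimated pI lies above 7. It therefore carries a net positive charge at pH 7 and would be expected to bind a cation exchanger, in line with the reasoning above.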

If the sample contains impurities whose charge is similar to that of the protein being isolated, a buffer with a pH gradient is needed. Unless two proteins have exactly the same amino acid composition, it is unlikely that they will have exactly the same charge at exactly the same pH. Raising (or lowering) the pH causes more groups to be deprotonated (or protonated), which shifts the charge of the molecule slightly in the negative (or positive) direction. This changes the ionic interaction between the molecule and the resin and causes some molecules to elute from the column. As the pH changes, different molecules take on different charge densities (degrees of charge, such as -1, -2, or -3). At a given pH, then, a protein may have a higher or lower charge density and bind to the resin more or less strongly, and the proteins with the lowest charge density elute first.

As another example, suppose we are analyzing an air sample that has been collected on an air filter and put through filter extraction (adding water to the filter, purifying the extract by passing it through another filter, and collecting the water as the sample). Each sample is then prepared for the ion chromatograph (IC) by combining a given amount of the sample with a given amount of water. A series of standard solutions and water is first run through the IC to calibrate the instrument. The standard solutions contain the particular cations or anions, depending on which type of ion chromatography is being performed, that are to be detected in the samples. Once all the samples have been run through the IC, an ion chromatogram (see image) is produced for each standard and sample solution. The ion chromatogram shows the separation of the analytes: each analyte travels through the column at a different rate because of its interaction with the positively or negatively charged resin. The chromatogram shows both the time each analyte takes to pass through the column and the amount present. Because a given analyte elutes at a consistent time in every sample, each peak can be assigned to a particular analyte.
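Assigning peaks by their elution time can be sketched as follows. Everything specific here is an assumption made for illustration: the ion names, retention times, response factors, matching tolerance, and the function name `identify_peaks` are hypothetical. The sketch also assumes that peak area is directly proportional to concentration, whereas a real calibration would be built from the full series of standard solutions.

```python
# Hypothetical standards: ion -> (retention time in min, peak area per mg/L).
STANDARDS = {
    "chloride": (3.2, 50.0),
    "nitrate": (5.1, 40.0),
    "sulfate": (8.4, 25.0),
}


def identify_peaks(peaks, standards, tolerance=0.2):
    """Match each (retention_time, area) peak to the closest standard.

    Returns a list of (ion_name, concentration_mg_per_L) tuples, or
    (None, None) when no standard elutes within the tolerance.
    """
    results = []
    for retention_time, area in peaks:
        # Find the standard whose retention time is closest to this peak.
        name, (std_time, response) = min(
            standards.items(), key=lambda item: abs(item[1][0] - retention_time)
        )
        if abs(std_time - retention_time) <= tolerance:
            # Assumes area is proportional to concentration.
            results.append((name, area / response))
        else:
            results.append((None, None))
    return results


# Hypothetical sample chromatogram: (retention time in min, peak area).
sample_peaks = [(3.25, 100.0), (8.3, 50.0), (6.5, 10.0)]
identified = identify_peaks(sample_peaks, STANDARDS)
for peak, (name, concentration) in zip(sample_peaks, identified):
    print(peak, name, concentration)
```

A peak that matches no standard, like the third one here, is left unassigned rather than forced onto the nearest ion.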