What is protein folding?

Protein folding is commonly a fast process, often but not always reversible, that typically takes no more than a few milliseconds. It can be viewed as a complex compromise between the different chemical interactions that can occur among the amino acid side chains, the amide backbone and the solvent. A polypeptide chain has an astronomically large number of possible three-dimensional configurations, often with minimal energetic differences between them, which is why we are still almost unable to predict the folding of a given polypeptide chain ab initio. The protein folding problem is that scientists have not yet cracked the code that governs folding. Moreover, the ability of biological polymers such as proteins to fold into well-defined structures is thermodynamically remarkable. An unfolded polymer exists as a random coil: each copy of an unfolded polymer has a different conformation, yielding a mixture of many possible conformations.

Folding depends only on primary structure

In the late 1950s and early 1960s, Christian Anfinsen demonstrated that the folding of a protein in a given environment depends only on its primary structure, the amino acid sequence. This conclusion was by no means obvious, given the complexity of the folding process and the paucity of biochemical knowledge at the time. Folding often begins co-translationally, so that the N-terminus of the protein begins to fold while the C-terminal portion is still being synthesized by the ribosome. Specialized proteins called chaperones assist in the folding of other proteins.

Protein folding is a thermodynamically driven process: proteins fold by reaching their thermodynamically most stable structure. The path followed by the protein through the potential energy landscape is far from obvious, however. Many local and non-local interactions take part in the process, and the space of possible structures is enormous. Molecular dynamics simulations now give invaluable hints about the first stages of folding. It is now known that the unfolded state still retains key long-range interactions, and that the local propensity of the sequence to fold into a given secondary structure element narrows the "search" in the so-called conformational space. This suggests that biological proteins have evolved to fold properly. In fact, many random amino acid sequences acquire only ill-defined structures (molten globules) or no structure at all.

There are some general rules, however. Hydrophobic amino acids tend to be kept inside the structure, with little or no contact with the surrounding water; conversely, polar or charged amino acids are often exposed to the solvent. Very long proteins often fold into several distinct modules instead of a single large structure.

The Ramachandran plot

The Ramachandran plot was devised by Professor G. N. Ramachandran, an eminent scientist from India. He also proposed the triple-helix structure of collagen in 1954, a year after the DNA double helix was put forth. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate. These rotations are described by the torsion angles phi and psi, respectively.

G. N. Ramachandran used computer models of small polypeptides to systematically vary phi and psi with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii. Therefore, phi and psi angles that cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
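As a rough illustration of how a torsion angle such as phi or psi can be obtained from atomic coordinates, the sketch below computes the dihedral angle defined by four points. It assumes the usual definition in which phi is measured over the atoms C(i-1), N, Cα, C and psi over N, Cα, C, N(i+1); the function name and the example coordinates are made up for illustration and do not come from a real structure.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond (cis = 0, trans = +/-180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: four points arranged so the torsion is a right angle.
angle = dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(round(angle, 1))
```

Feeding the backbone atoms of each residue through such a function gives one (phi, psi) pair per residue, which is what a Ramachandran plot displays.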

Intramolecular forces in protein folding

The tertiary structure is held together by hydrogen bonds, hydrophobic interactions, ionic interactions, and/or disulfide bonds.

The protein folding problem

The protein folding problem relates to what is known as the Levinthal paradox. Levinthal calculated that if a fairly small protein is composed of 100 amino acids and each amino acid residue has only 3 possible conformations (an underestimate), then the entire protein can adopt 3¹⁰⁰, or about 5×10⁴⁷, possible conformations. Even if it takes only 10⁻¹³ seconds to try each conformation, it would take about 10²⁷ years to try them all. Obviously a protein does not take that long to fold, so randomly trying out all possible conformations is not the way proteins fold. Since most proteins fold on a timescale of the order of milliseconds, it is clear that the process is directed in some manner that depends on the constituents of the chain.

The protein folding problem, which has perplexed scientists for over thirty years, is that of understanding how the tertiary structure of a protein is related to its primary structure, because it has been shown that the primary structure of a protein holds the only information necessary for the protein to fold. Ultimately the aim is also to be able to predict what pathway the protein will take.
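The arithmetic behind the Levinthal estimate can be checked in a few lines. The sketch below uses only the figures quoted in the text (100 residues, 3 conformations per residue, 10⁻¹³ seconds per trial); the function name and the year length of 365.25 days are choices made here for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def levinthal_estimate(n_residues=100, states_per_residue=3, seconds_per_trial=1e-13):
    """Return (number of conformations, years needed to try them all one by one)."""
    conformations = states_per_residue ** n_residues
    total_seconds = conformations * seconds_per_trial
    return conformations, total_seconds / SECONDS_PER_YEAR

conformations, years = levinthal_estimate()
print(f"{conformations:.2e} conformations, about {years:.1e} years")
# prints "5.15e+47 conformations, about 1.6e+27 years"
```

Because the count grows exponentially with chain length, adding even a few residues multiplies the search time by a large factor, which is the core of the paradox.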

Folding in extreme environments

Most proteins are not capable of maintaining their three-dimensional shape when they are exposed to environmental extremes such as a low or high pH, or a highly variable temperature.

Changes in the pH of the protein's environment may alter the charges on the amino acid side chains that make up the protein, so that repulsive or attractive forces may form and alter the secondary and tertiary structure of the protein as a whole. As a result, the shape of the protein is warped, and the now non-functional protein is said to be denatured.
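How strongly pH affects the charge of an ionizable side chain can be sketched with the Henderson-Hasselbalch relation. The pKa values below are approximate textbook figures for free side chains and are an assumption of this example; the actual pKa of a group inside a folded protein can differ noticeably.

```python
# Approximate side-chain pKa values (assumed, textbook-style figures).
EXAMPLE_PKA = {"Asp": 3.9, "Glu": 4.2, "His": 6.0, "Lys": 10.5}
ACIDIC = {"Asp", "Glu"}  # deprotonated form carries a negative charge

def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of the group that has lost its proton."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

def mean_charge(pka, ph, acidic):
    """Average charge of one ionizable side chain at the given pH."""
    f = fraction_deprotonated(pka, ph)
    return -f if acidic else 1.0 - f

for ph in (2.0, 7.0, 12.0):
    charges = {name: round(mean_charge(pka, ph, name in ACIDIC), 2)
               for name, pka in EXAMPLE_PKA.items()}
    print(ph, charges)
```

At low pH the acidic side chains lose their negative charge, and at high pH lysine loses its positive charge, so ionic interactions that stabilized the fold can disappear or turn repulsive.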

A high or exceptionally low temperature can break the bonds that hold the protein's structure together, again rendering the protein non-functional.

It is important to note that certain proteins, mainly digestive enzymes such as pepsin, are capable of withstanding a pH as low as 1. If the pH of such an enzyme's environment were to increase to approximately pH 5, it would be inactivated.

Protein misfolding

The way that a protein folds is one of the most important factors influencing its properties, as this determines which active groups are exposed for interaction. If the protein misfolds, its properties can be markedly changed. One example is the transmissible spongiform encephalopathies, such as BSE and scrapie. In these diseases, the prion protein, which is thought to be involved in the brain's copper metabolism, misfolds and starts forming plaques, which destroy brain tissue.

Protein structural levels

Biochemists refer to four distinct aspects of a protein's structure:

Primary structure

Primary structure is practically a synonym of the amino acid sequence: the order in which the amino acids are linked by peptide bonds. It is typically written as a string of three-letter codes, each representing an amino acid. Peptides and proteins must have the correct sequence of amino acids to fold and function properly.
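As a small illustration of this notation, the sketch below converts a sequence written in three-letter codes into the more compact one-letter form. The hyphen separator, the function name and the example peptide are assumptions made for the example; the code table is the standard one for the 20 common amino acids.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(sequence, separator="-"):
    """Convert e.g. 'Met-Gly-Leu' into 'MGL'; raises KeyError on unknown codes."""
    return "".join(THREE_TO_ONE[code] for code in sequence.split(separator))

# Hypothetical pentapeptide, written from N-terminus to C-terminus.
print(to_one_letter("Met-Gly-Leu-Ser-Asp"))  # prints "MGLSD"
```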

Secondary structure

Secondary structure elements are elementary structural patterns that are present in most, if not all, known proteins. These are either highly patterned sub-structures (the alpha helix and the beta sheet) or segments of the polypeptide chain that assume no stable shape, such as the loops between those elements. Secondary structure elements, when mapped onto the sequence and depicted in their positions relative to each other, define the topology of the protein. Hydrogen bonding between backbone groups of different residues gives rise to secondary structure features; secondary structure is usually described to beginning biochemists as (almost) entirely independent of residue side-chain interactions.

Tertiary structure

Tertiary structure is the overall shape of a single protein molecule. Although tertiary structure is sometimes described (especially to beginning biology and biochemistry students) as the result of interactions between amino acid residue side chains, a more accurate description is that it arises from the interactions between elements of secondary structure, i.e. alpha helices and beta-pleated sheets. Tertiary structure is often referred to as the "fold structure" of a protein, since it is the result of the complex three-dimensional interplay of other structural and environmental elements.

Super-tertiary structure (protein modules)

Some literature refers to elements of super-tertiary structure, meaning elements of folding that do not fit neatly into the category of tertiary structure. This level of distinction is often reserved for graduate-level coursework. Protein denaturation can be a reversible or an irreversible process; that is, it may or may not be possible to make the protein regain its original spatial conformation.

Quaternary structure

Quaternary structure is the shape or structure that results from the union of more than one protein molecule, usually called protein subunits in this context, which function as part of the larger assembly or protein complex.

It refers to the regular association of two or more polypeptide chains to form a complex. A multi-subunit protein may be composed of two or more identical polypeptides, or it may include different polypeptides.

Quaternary structure tends to be stabilized mainly by weak interactions between residues exposed on the surfaces of the polypeptides within a complex.

Secondary structure elements

The alpha helix

The alpha helix is a periodic structure formed when main-chain atoms from residues spaced four residues apart hydrogen bond with one another. This gives rise to a helical structure, which in natural proteins is almost always right-handed. Each turn of the helix comprises 3.6 amino acids. Alpha helices are stiff, rod-like structures found in many unrelated proteins. They often show a bias in the distribution of hydrophobic residues, which tend to occur primarily on one face of the helix.

The amino acids in an α helix are arranged in a helical structure about 5 Å wide (measured across the backbone). Each amino acid corresponds to a 100° turn of the helix and to a translation of 1.5 Å along the helical axis. The helix is tightly packed; there is almost no free space within it. All amino acid side chains point outward from the helix. The C=O group of amino acid (n) can establish a hydrogen bond with the N-H group of amino acid (n+4).
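The geometric figures quoted for the α helix fit together, as the short calculation below shows. It uses only the 100° turn and 1.5 Å rise per residue given in the text; the function names are made up for this sketch, and the "wheel angle" is simply the angular position of a residue when the helix is viewed down its axis.

```python
TURN_PER_RESIDUE_DEG = 100.0   # rotation contributed by each residue
RISE_PER_RESIDUE_A = 1.5       # translation along the axis per residue, in angstroms

residues_per_turn = 360.0 / TURN_PER_RESIDUE_DEG      # 3.6 residues
pitch = residues_per_turn * RISE_PER_RESIDUE_A        # about 5.4 angstroms per turn

def helix_length(n_residues):
    """Approximate length along the axis of an ideal helix, in angstroms."""
    return n_residues * RISE_PER_RESIDUE_A

def wheel_angle(index):
    """Angular position (degrees) of residue `index` viewed down the helix axis."""
    return (index * TURN_PER_RESIDUE_DEG) % 360.0

print(residues_per_turn, round(pitch, 2), helix_length(20))
# Residues i, i+3, i+4 and i+7 land within a narrow arc of the wheel,
# which is why hydrophobic residues at such spacings form one face of the helix.
print([wheel_angle(i) for i in (0, 3, 4, 7)])  # prints [0.0, 300.0, 40.0, 340.0]
```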

Short polypeptides usually are not able to adopt the alpha helical structure, since the entropic cost associated with the folding of the polypeptide chain is too high. Some amino acids (called helix breakers) like proline will disrupt the helical structure.

Ordinarily, a helix has a buildup of positive charge at the N-terminal end and negative charge at the C-terminal end, which is a destabilizing influence. As a result, α helices are often capped at the N-terminal end by a negatively charged amino acid (such as glutamic acid) in order to stabilize the helix dipole. Less common (and less effective) is C-terminal capping with a positively charged amino acid such as lysine.

α helices have particular significance in DNA binding motifs, including helix-turn-helix motifs, leucine zipper motifs and zinc finger motifs. This is because the diameter of an α helix including its side chains, about 12 Å, coincides with the width of the major groove in B-form DNA.

α helices are one of the basic structural elements in proteins, together with beta sheets.

The peptide backbone of an α helix has 3.6 amino acids per turn.

The beta sheet

Figure: diagram of a β-pleated sheet and the bond structure of the protein.

The β sheet (also β-pleated sheet) is a commonly occurring form of regular secondary structure in proteins, first proposed by Linus Pauling and Robert Corey in 1951. It consists of two or more stretches of the amino acid chain (β strands) within the same protein that lie side by side, so that hydrogen bonds can form between neighbouring strands. The amino acid chain is almost fully extended throughout a β strand, lessening the probability of steric clashes between bulky groups. The Ramachandran plot shows the optimal conformation at phi = -120 to -60 degrees and psi = 120 to 160 degrees.

The N-H groups in the backbone of one strand establish hydrogen bonds with the C=O groups in the backbone of the adjacent strand(s). The cumulative effect of many such hydrogen bonds contributes to the sheet's stability and structural rigidity; a comparable cumulative effect of hydrogen bonding between chains stiffens cellulose, a beta-1,4-linked glucose polymer. The side chains of the residues in a β sheet may also be arranged so that many of the adjacent side chains on one side of the sheet are hydrophobic, while many of those on the other side are polar or charged (hydrophilic). Some sequences involved in a β sheet, when traced along the backbone, make a hairpin turn that reverses their direction, sometimes through one or more prolines. Successive residues along a strand are about 3.5 Å apart, and adjacent strands lie roughly 5 Å apart.

In a parallel β sheet the strands run in the same direction; in an anti-parallel β sheet, neighbouring strands run in opposite directions. Hydrogen bonds form in both arrangements, but in the anti-parallel sheet they connect directly across from one another instead of diagonally. In short, beta sheets can be purely parallel, purely anti-parallel, or mixed.
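The phi and psi ranges quoted above for the β strand can be turned into a simple region test. This is only a sketch: the rectangular window reproduces the ranges stated in the text rather than a statistically derived region, and the function name and the sample angle pairs are hypothetical.

```python
BETA_PHI_RANGE = (-120.0, -60.0)   # degrees, as quoted in the text
BETA_PSI_RANGE = (120.0, 160.0)    # degrees, as quoted in the text

def in_beta_region(phi, psi):
    """True if (phi, psi) falls inside the quoted beta-strand window."""
    return (BETA_PHI_RANGE[0] <= phi <= BETA_PHI_RANGE[1]
            and BETA_PSI_RANGE[0] <= psi <= BETA_PSI_RANGE[1])

# Hypothetical backbone angles for a short stretch of residues.
samples = [(-110.0, 135.0), (-60.0, -45.0), (-95.0, 150.0)]
print([in_beta_region(phi, psi) for phi, psi in samples])  # prints [True, False, True]
```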

The random coil

A protein that completely lacks secondary structure is a random coil. In a random coil, the only fixed relationship between amino acids is that between adjacent residues through the peptide bond. As a result, random coil can be detected from the absence of the signals in a multidimensional nuclear magnetic resonance experiment that depend on particular peptide-peptide interactions. Likewise, in the images produced in crystallography experiments, segments of random coil appear simply as an absence of "electron density" or contrast. Random coil is also easily distinguished by circular dichroism. Complete denaturation reduces a protein entirely to random coil.

Less common secondary structure elements

Certain other periodic structures appear in proteins only rarely, some of which are similar to the more common types. For example, two variants of the alpha helix also occur, the 3-10 helix and the pi helix. These have 3 and 4.4 residues per turn respectively, corresponding to hydrogen bonds forming between residues i and i+3 for the 3-10 helix and between i and i+5 for the pi helix. Both are usually very short (1 turn or so) and are seen mostly at the ends of alpha helices.
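The three helix types differ mainly in which residues are joined by backbone hydrogen bonds. The sketch below lists the bonded pairs for an ideal helix of a given length, using the i to i+3, i+4 and i+5 spacings described in the text; the function name and the 1-based residue numbering are choices made for this example.

```python
HBOND_OFFSET = {"3-10 helix": 3, "alpha helix": 4, "pi helix": 5}

def backbone_hbonds(n_residues, offset):
    """Pairs (i, i + offset) of residues joined by a backbone hydrogen bond."""
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]

for name, offset in HBOND_OFFSET.items():
    pairs = backbone_hbonds(10, offset)
    print(f"{name}: {len(pairs)} hydrogen bonds, first pair {pairs[0]}")
```

For the same number of residues, a larger offset leaves more residues at the ends without a bonding partner, which is one way to see why very short helices of these types are only marginally stabilized.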

Tertiary structure elements

Alpha-only structures

Structures that contain only alpha helices.

Beta-only structures

Common folds and modules

The Rossmann fold

The Rossmann fold is a protein structural motif found in proteins that bind nucleotides, especially the cofactor NAD. The structure is composed of three or more parallel beta strands linked by two alpha helices in the topological order beta-alpha-beta-alpha-beta. Because each Rossmann fold can bind one nucleotide, binding domains for dinucleotides such as NAD consist of two paired Rossmann folds that each bind one nucleotide moiety of the cofactor molecule. Single Rossmann folds can bind mononucleotides such as the cofactor FMN.

The motif is named for Michael Rossmann who first pointed out that this was a frequently occurring motif in nucleotide binding proteins.

The hemoglobin fold

The immunoglobulin fold

Keratin

Keratin is found in hair and in the skin of the palms of human hands.

Collagen