Structural Biochemistry/Proteins/Protein Folding Problem

The “protein folding problem” consists of three closely related puzzles:

What is the folding code?
What is the folding mechanism?
Can we predict the native structure of a protein from its amino acid sequence?

Protein Folding Problem

The protein folding problem is the obstacle that scientists confront when they try to predict the 3D structure of a protein from its amino acid sequence. Although a given amino acid sequence almost always folds into the same 3D structure with a specific function, it is still difficult to predict the exact folding pattern with high precision from the sequence alone. Explaining the speed of folding, which is extremely fast, is a further challenge. Understanding any biochemical reaction requires isolating the reactants, intermediates and products and determining their structures. In protein folding this is complicated because most interactions within a protein are weak and non-covalent, so the states along the reaction interconvert rapidly. Folding intermediates are therefore difficult to isolate and are largely inaccessible to X-ray crystallography. Nevertheless, several advances have been made in characterizing the reactants and intermediates of folding. Given this complexity, the protein folding problem is usually divided into three parts: the folding code, structure prediction, and the folding speed and mechanism.

The Three Folding Problems

The Folding Code

In the late 1980s, scientists discovered that the amino acid sequence carries a code that directs a protein to fold in a particular way. The starting point of protein folding is the unfolded polypeptide chain, which has only primary structure (the sequence of amino acids) and is known as the denatured state of the protein. Even a small amount of residual structure in the denatured state can trigger nucleation and propagation, which may carry through the folding pathway. Characterizing the denatured states of proteins under physiological conditions is very difficult, because the protein must be unfolded without the presence of denaturants [2, Travaglini-Allocatelli et al.].

Recent research has advanced the study of denatured states through the single-molecule approach. Researchers used single-molecule experiments to examine the coil-to-globule transition of proteins and demonstrated that the denatured state expands steadily as the concentration of denaturant is increased. Conversely, at low denaturant concentrations, the peptide chain of the protein collapses in a sequence-dependent manner [2, Travaglini-Allocatelli et al.].

There have also been advances in the study of folding intermediates. For example, the engrailed homeodomain (En-HD) was engineered to be denatured under physiological conditions, and nuclear magnetic resonance (NMR) has shown that this denatured state resembles a folding intermediate. An additional study found that a specific section of En-HD, the helix-turn-helix (HTH) motif, behaves as an independent folding domain. In the context of the full protein, the HTH motif represents a folding intermediate in the En-HD folding pathway [2, Travaglini-Allocatelli et al.].

Although protein folding remains an enigma, scientists have taken advantage of what is known about proteins to design new materials, such as medicines, reagents and inhibitors, that benefit society.

Structure Prediction

Today, researchers predict the structure of a protein by entering its amino acid sequence into a computer. Modern modeling software then generates a predicted structure. However, the predicted structure is not fully accurate, as some degree of error is always present. Nevertheless, prediction can speed up the discovery of new medications, since the digital structure can be manipulated.

Secondary structure prediction

Secondary structure prediction is a set of techniques that aim to predict the secondary structures of proteins and RNA based only on their primary structure, that is, their amino acid or nucleotide sequence. For proteins, a prediction consists of assigning regions of the amino acid sequence as alpha helices, beta strands, or turns. The success of a prediction is determined by comparing it to the results of the DSSP algorithm applied to the crystal structure of the protein; DSSP is the standard method for assigning secondary structure to the amino acids of a protein, given the atomic-resolution coordinates of the protein. For nucleic acids, the secondary structure may be determined from the hydrogen bonding pattern. Specialized algorithms have been developed for the detection of specific well-defined patterns such as transmembrane helices and coiled coils in proteins, or microRNA structures in RNA.
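The comparison against DSSP described above can be sketched in a few lines of Python. The sketch below computes a per-residue agreement score, often called Q3 in the literature; the text itself does not name a specific score, so this choice is an assumption. The reduction of the eight DSSP codes to three states (helix H, strand E, other C) follows one common convention, and the function names and example strings are hypothetical.

```python
# Assumed convention: collapse DSSP's eight codes to three states.
# H, G, I -> helix (H); E, B -> strand (E); everything else -> other (C).
DSSP_TO_3STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_3state(dssp):
    """Map a DSSP assignment string onto the three states H, E and C."""
    return "".join(DSSP_TO_3STATE.get(code, "C") for code in dssp)


def q3(predicted, dssp):
    """Fraction of residues whose predicted state matches the reduced DSSP state."""
    reference = reduce_to_3state(dssp)
    if not reference or len(predicted) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)
```

For example, calling q3 with a made-up ten-residue prediction and a made-up DSSP string of the same length returns a value between 0 and 1, where 1 means that every residue was assigned to the same state as in the reference.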

Tertiary structure prediction

Experimental methods such as NMR spectroscopy and X-ray diffraction analysis are widely used to determine tertiary protein structures. However, the rate at which protein structures can be determined experimentally is much lower than the rate at which new genes are identified by the various genome projects.

Ab initio protein modelling methods build 3D protein models based on physical principles rather than on previously solved structures. There are many possible procedures, which either attempt to mimic protein folding or apply some stochastic method to search possible solutions (for example, global optimization of a suitable energy function). These procedures require massive computational resources and have thus only been carried out for very small proteins. Predicting the structure of larger proteins will require better algorithms and larger computational resources, such as those afforded by powerful supercomputers. Although these computational barriers are massive, the potential benefits of structure prediction make ab initio modelling an active research topic.
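To illustrate what a stochastic search over an energy function looks like, here is a minimal sketch that uses a toy model rather than a real force field. It assumes the two-dimensional HP lattice model (each residue is either hydrophobic, H, or polar, P, and the energy is -1 for every pair of non-bonded H residues on neighboring lattice sites) and simulated annealing with the Metropolis acceptance rule. The model, the function names and all parameter values are illustrative assumptions, not a method described in the text.

```python
import math
import random

# Unit steps on a square lattice; a conformation is a string of these moves.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def coords(moves):
    """Return the lattice position of each residue, or None if the chain overlaps itself."""
    pos = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        nxt = (pos[-1][0] + dx, pos[-1][1] + dy)
        if nxt in pos:
            return None
        pos.append(nxt)
    return pos


def energy(seq, pos):
    """Toy HP energy: -1 per pair of H residues that touch without being bonded."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H":
                if abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1]) == 1:
                    e -= 1
    return e


def anneal(seq, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over move strings; returns the best conformation and its energy."""
    rng = random.Random(seed)
    moves = ["R"] * (len(seq) - 1)  # start from a fully extended chain
    cur_e = energy(seq, coords(moves))
    best, best_e = list(moves), cur_e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        trial = list(moves)
        trial[rng.randrange(len(trial))] = rng.choice("UDLR")
        pos = coords(trial)
        if pos is None:
            continue  # reject self-overlapping chains
        e = energy(seq, pos)
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if e <= cur_e or rng.random() < math.exp((cur_e - e) / t):
            moves, cur_e = trial, e
            if e < best_e:
                best, best_e = list(trial), e
    return "".join(best), best_e
```

Calling anneal("HPHPPHHPHH") returns a move string and its energy for that hypothetical ten-residue sequence. Even in this toy setting the number of possible chains grows exponentially with length, which hints at why realistic energy functions and larger proteins demand so much computation.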

Side-chain geometry prediction describes a computational approach that can make predictions for a series of coiled-coil dimers. This method comprises a dual strategy that augments extensive conformational sampling with molecular mechanics minimization.

Quaternary structure

In the case of complexes of two or more proteins, where the structures of the proteins are known or can be predicted with high accuracy, protein–protein docking methods can be used to predict the structure of the complex.

Folding Speed and Mechanism

In 1968, Cyrus Levinthal pointed out that a protein cannot find its native structure by randomly searching all of its possible conformations, because such a search would take an astronomically long time, yet proteins fold precisely in as little as microseconds. This apparent contradiction is known as Levinthal's paradox. Today, advanced methods are available, such as mutational methods, which yield Φ-values (phi-values) that report on the structure formed in the transition state during folding, and hydrogen exchange methods, which reveal structural folding events. However, the dynamics and mechanism of protein folding still require additional research and understanding.
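Levinthal's argument can be illustrated with a back-of-the-envelope estimate of how long a purely random conformational search would take. The numbers used below (100 residues, 3 conformations per residue, 10^13 conformations sampled per second) are illustrative assumptions often used in textbook versions of the argument, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_time(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to sample each one once)."""
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    return conformations, seconds / SECONDS_PER_YEAR


conformations, years = levinthal_search_time(100)
print(f"{conformations:.2e} conformations, about {years:.1e} years")
```

Under these assumptions the search would take on the order of 10^27 years, vastly longer than the age of the universe, which is why folding cannot proceed by an unbiased random search.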

The dynamics and kinetics of the unfolded polypeptide chain have been addressed by recent studies of loop formation by Kiefhaber and coworkers. They used different model systems, each representing a different type of loop: end-to-end, end-to-interior, or interior-to-interior. Their experiments showed that end-to-interior and interior-to-interior loops form more slowly than end-to-end loops. This finding suggests that the motion of one part of the unfolded polypeptide chain is coupled to the motion of other parts of the chain. These kinetic experiments also revealed that protein folding processes take place on different time scales and thus that there is a hierarchy in loop formation [2, Travaglini-Allocatelli et al.].

Although additional research is necessary to understand the mechanisms of protein folding, two classical mechanisms have been used to describe the folding of single-domain proteins. The first is called the diffusion-collision model. Proteins that follow this mechanism fold in a stepwise manner that involves growing secondary structure elements. These elements then collide, combine and strengthen. For example, there is evidence that the En-HD mentioned above follows the diffusion-collision model. The second mechanism is known as the nucleation-condensation model. Proteins that follow this mechanism have been seen to fold from an unstructured denatured state with simultaneous formation of secondary and tertiary structure. For example, a homologous protein of En-HD called hTRF1 has been shown to follow this model. However, many proteins exhibit pathways with characteristics of both the diffusion-collision and nucleation-condensation models [2, Travaglini-Allocatelli et al.].

The starting point of protein folding: the denatured state

Residual structure in the denatured state can trigger nucleation and propagation, which may carry through the folding pathway. Characterizing the denatured states of proteins under physiological conditions is a hard task, because the native state must be disfavored without adding denaturants. Chemically denatured states may behave like random-coil polymers at high denaturant concentrations. Sherman and Haran used single-molecule experiments to analyze the coil-to-globule transition of protein L and showed that the denatured state of the protein expands as the denaturant concentration increases. Eaton and co-workers also compared the size and dynamics of the denatured states of two proteins of similar length, 64 and 66 amino acids.

Mechanisms of protein folding

Two different mechanisms have been used to describe the folding of single-domain proteins. Some proteins, such as barnase, have been described as folding in a stepwise manner, with rapid formation of distinct nuclei followed by their collision and consolidation (the diffusion-collision model). Other proteins, with chymotrypsin inhibitor 2 as an example, follow the nucleation-condensation model. The folding pathway of a small alpha/beta protein domain has been shown to be distinct from both pure nucleation-condensation and pure diffusion-collision, while still displaying characteristics of both models.

Folding stability and function

The inherent stability of individual protein segments is a key factor in determining the folding mechanism of a given protein. In many cases, a cell's life relies on the ability of its constituent proteins to fold into the 3D structures that are crucial for their function. The amount of folded, functional protein in a cell depends on several factors, such as the rates of protein biosynthesis and degradation.

An open question is whether the stability and folding of fully folded proteins can be related to their activity. Allostery may be the bridge where protein folding meets function. Allosteric effects involve communication between ligand binding sites, which is critical to many physiological processes. Because allostery is a thermodynamic phenomenon, it should be considered not only in terms of changes in conformation but also in terms of changes in the dynamics about the mean conformation.

Therefore more research is necessary to fully comprehend the mechanism of protein folding and find a solution to the protein folding problem.

References

Ken A Dill, S Banu Ozkan, Thomas R Weikl, John D Chodera and Vincent A Voelz. The protein folding problem: when will it be solved? Current Opinion in Structural Biology 2007.
Carlo Travaglini-Allocatelli, Ylva Ivarsson, Per Jemth and Stefano Gianni. Folding and stability of globular proteins and implications for function. Current Opinion in Structural Biology 2009, 19:3-7.
Mount DW (2004). Bioinformatics: Sequence and Genome Analysis. 2nd ed. Cold Spring Harbor Laboratory Press. ISBN 0879697121.
Zhang Y (2008). "Progress and challenges in protein structure prediction". Curr Opin Struct Biol 18 (3): 342–8. doi:10.1016/j.sbi.2008.02.004. PMC 2680823. PMID 18436442.