Proteomics/Introduction to Proteomics/Principles of Proteomics
The Process of Proteomics 
The proteome, as defined above, is extremely dynamic: protein expression in a cell changes over time in response to many internal and external environmental conditions. This dynamic nature is both beneficial and a potential complication. If every protein a cell ever needed were always present, it would likely be easier to correlate genes with proteins. On the other hand, variable protein expression lets us observe which proteins are up- and down-regulated under a given set of circumstances. This is very useful for identifying proteins related to disease, or affected by conditions such as temperature or the presence of certain molecules in the environment (such as lactose in E. coli with the lac operon).
One of the important goals of proteomics as a field is to enhance understanding of the biological processes involved in protein construction and conformation. Each protein folds into a specific shape, which directly affects its function. Understanding how proteins fold, as well as how they interact with one another and with other molecules, directly relates to understanding the biological processes occurring in cells. However, there are currently limits on the speed at which proteomics-related data can be retrieved and analyzed. Structure determination is often the most time-consuming step, and is usually done by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. Protein structure determination by NMR is relatively new, and may help obtain structures for proteins that are difficult to crystallize (such as those with many hydrophobic regions). There is also work toward an automated process for structure determination by NMR, which, if accurate, would enable proteins to be fully characterized faster than ever before.
Many techniques are used in the study of proteins, such as gel electrophoresis, liquid chromatography, and mass spectrometry, for separating, identifying, sequencing, and determining other features of proteins. Later chapters discuss these protocols and techniques thoroughly, to give a better understanding of how the study of the proteome is being furthered. Emerging visualization technologies are also being developed to help interpret data and produce more detailed results during analysis. Protein microarrays, also called protein chips or, more specifically, biochips, are techniques in development for measuring protein concentrations, especially for peptides and samples at very low concentrations. Chemical microarrays are another tool used in proteomics, applied to the analysis of peptides and other small molecules.
High-performance imaging software and tools allow real-time scanning of 2D gels, and more specifically of western blots. Even the infrared spectrum is being used to reduce noise in gel imaging. Top-down analysis techniques are also in development and in practice, in order to increase sample throughput for improved time efficiency in the lab.
Further advances in proteomic technology include Free Flow Electrophoresis, which allows higher resolution in size-based separation, and ProteomeLab's PF2D system, which separates proteins based on charge and hydrophobicity. Nanotechnology appears to be a very promising field for proteomics, and some scientists are producing nanoparticles, such as SERRS tags, as labeling devices.
One difficulty encountered in proteomics is the study of proteins present at relatively low concentrations in a cell. Since many proteins exist at very high concentrations, their presence can mask lower-abundance proteins. Along the same lines, it is very difficult to isolate these low-abundance proteins, as the resolution of current methods does not allow visualization of proteins below a certain concentration threshold. Methods with fairly high resolution, in turn, cannot isolate a large number of different proteins, and require a great deal of sample preparation. The span between the highest and lowest protein concentrations in a system, or detectable with a given technique, is known in proteomics as the dynamic range. Combining techniques with complementary strengths shows a great deal of promise in proteomics studies.
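Dynamic range is commonly expressed in orders of magnitude, i.e. the base-10 logarithm of the ratio between the highest and lowest detectable concentrations. A minimal sketch in Python (the concentration values below are hypothetical, chosen only for illustration):

```python
import math

def dynamic_range_orders(highest, lowest):
    """Dynamic range in orders of magnitude: log10 of the ratio
    between the highest and lowest detectable concentrations."""
    return math.log10(highest / lowest)

# Hypothetical example: an abundant protein at 35 mg/mL versus a
# trace protein at 5 pg/mL (both expressed in g/mL for consistent units).
print(dynamic_range_orders(35e-3, 5e-12))  # about 9.8 orders of magnitude
```

A span of nearly ten orders of magnitude, as in this made-up example, is far beyond what any single separation or detection technique can cover, which is why complementary methods are combined.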
Another difficulty in proteomics is the analysis of the data gathered in a proteomic study. Simulating the folding of a single protein on a single computer can take up to thirty years; however, a number of efforts aim to speed up this process. One way scientists are trying to analyze the data and simulate protein folding is through distributed computing. Distributed computing essentially uses many computers, taking advantage of their normal idle time, to characterize and analyze pieces of data that a parent computer sends them. With this approach, the time required to model one protein's folding can be cut from thirty years to ten days. These and other improvements show great promise in furthering the study of proteins.
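The distributed-computing idea can be sketched, very loosely, as splitting independent work units among many workers. The toy Python example below uses a local process pool to play the role of the volunteer machines; `simulate_fold` is a hypothetical stand-in computation, not a real folding simulation:

```python
from multiprocessing import Pool

def simulate_fold(work_unit):
    """Stand-in for one independent work unit; a real project would
    run a piece of a folding simulation here."""
    return sum(i * i for i in range(work_unit))

if __name__ == "__main__":
    work_units = [1_000, 2_000, 3_000, 4_000]
    # The pool plays the role of idle volunteer machines: the parent
    # process farms out the work units and gathers the results.
    with Pool(processes=4) as pool:
        results = pool.map(simulate_fold, work_units)
    print(results)
```

Real volunteer-computing projects layer scheduling, checkpointing, and result verification on top of this basic farm-out pattern, but the speedup comes from the same idea: many independent units running in parallel.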
High-throughput analysis and tools are being used to bypass these limitations in the data collection process. Bottlenecks in proteomics include large, uncurated databases; poor separation of peptides; and the many types of post-translational modifications (PTMs) of proteins, which add complexity not only to tertiary and quaternary structure but also to the actions, effects, and functions of a given protein. High-throughput proteomics incorporates large-scale purification methods, such as affinity tags. Microscale approaches, including protein chips and microfluidic separations, are being used to examine protein profiles. High-throughput prescreening techniques are also being developed to help ensure sample quality and allow quicker test results, such as for cancer patients.