

When scientists study the relationship between proteins from different species, they must determine whether the proteins are homologous to one another. Homologous proteins are proteins that share a common ancestor. There are two classes of homologs: orthologs and paralogs. Orthologs are homologous proteins that are found in different species and have similar functions; they arise through speciation. Paralogs are homologous proteins that are found within the same species; they have very similar structures but serve different functions within the organism, and they arise through gene duplication within the species. Scientists determine whether proteins are homologous by studying the DNA sequence, the amino acid sequence, and the tertiary structure of the protein. In other words, they study the sequence alignment and the structural alignment of the proteins.

Sequence Alignment

When analyzing the similarities and differences between DNA sequences or amino acid sequences, scientists study the alignment of the DNA sequences or the alignment of the proteins' amino acid sequences. Alignment of DNA sequences is less sensitive than alignment of amino acid sequences. DNA consists of only 4 bases, so the chance that two aligned residues are identical purely by coincidence is about 1 in 4. For amino acid sequences, with 20 standard amino acids, that chance is about 1 in 20 (assuming all residues are equally frequent). The degree of similarity of an alignment is scored with a scoring matrix. There are a variety of methods for determining the optimal alignment of two sequences. One well-known method is the Needleman-Wunsch algorithm.
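The Needleman-Wunsch algorithm mentioned above can be illustrated with a minimal sketch. The scoring values used here (match = 1, mismatch = -1, gap = -1) and the function name `needleman_wunsch` are assumptions chosen for illustration; real analyses typically score residue pairs with a substitution matrix and may use different gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    The default scores are illustrative assumptions, not standard values.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell extends the best of three neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

With these assumed scores, aligning "ACGT" with "AGT" places a single gap opposite the C, so three residues match and one gap is penalized. When several alignments share the best score, this sketch returns only one of them.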


A scoring matrix is used to determine the degree of similarity between sequences (DNA or amino acid). The matrix takes two factors into account: conservation and frequency.

Conservation determines whether one residue can substitute for another by comparing the residues' physical properties, such as hydrophobicity, charge, and size. For example, if two residues are both hydrophobic, they may substitute for each other. If two residues have different charges, the chance that they can substitute for one another is very slim.

Frequency describes how often a residue occurs. For example, if sequence A contains 20% of residue X and 30% of residue Y, and sequence B contains 21% of residue X and 35% of residue Y, the similar prevalence of these residues may indicate that the sequences are related to one another.
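The frequency comparison described above can be sketched in a few lines. The helper names `composition` and `composition_distance` are hypothetical, and the distance measure (the sum of absolute differences in residue fractions) is just one simple assumption about how to compare two compositions; similar composition alone does not establish that two sequences are related.

```python
from collections import Counter


def composition(seq):
    """Return the fraction of each residue in seq, e.g. {'A': 0.5, 'B': 0.5}."""
    counts = Counter(seq)
    total = len(seq)
    return {residue: n / total for residue, n in counts.items()}


def composition_distance(seq_a, seq_b):
    """Sum of absolute differences in residue fractions (0.0 = same composition)."""
    freq_a, freq_b = composition(seq_a), composition(seq_b)
    residues = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(r, 0.0) - freq_b.get(r, 0.0)) for r in residues)


if __name__ == "__main__":
    print(composition("AABB"))
    print(composition_distance("AABB", "ABAB"))
    print(composition_distance("AAAA", "BBBB"))
```

A distance near 0 means the two sequences use their residues in similar proportions, while the largest value this measure can take is 2, reached when the sequences share no residues at all.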

The techniques used when scoring with a matrix are sliding one sequence relative to the other, introducing gaps within a sequence, and deleting certain residues from a sequence. These techniques are used to improve the similarity of the alignment. In sliding, one sequence is shifted relative to the other. For example, after sliding a sequence by one residue, the similarity between the sequences may increase because more residues are identical or similar to one another. Gaps are introduced within a sequence to increase the residue similarity. For example, suppose sequence A has a range of residues that is not critical to its function or structure, followed by a range of important residues, while sequence B lacks the unimportant range but has the same important range as sequence A. A gap is then introduced in sequence B to make up for the unimportant range, so that the important ranges line up. In deletion, certain residues are omitted from a sequence. Unimportant residues are usually deleted in order to increase the similarity between the sequences.
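The sliding step described above can be sketched as follows. The function name `best_offset` is hypothetical, and counting identical residues is an assumed, simplified measure of similarity; gaps and deletions are not handled in this sketch.

```python
def best_offset(a, b):
    """Slide sequence b along sequence a without gaps.

    Returns (offset, matches): the shift of b relative to a that gives the
    most identical aligned residues, and that number of identical residues.
    An offset of k means b[0] is placed opposite a[k]; k may be negative.
    """
    best = (0, -1)
    for offset in range(-(len(b) - 1), len(a)):
        # Count positions where the overlapping residues are identical.
        matches = sum(
            1
            for j, residue in enumerate(b)
            if 0 <= offset + j < len(a) and a[offset + j] == residue
        )
        if matches > best[1]:
            best = (offset, matches)
    return best


if __name__ == "__main__":
    print(best_offset("ACGTACGT", "CGTA"))
```

In this example the unshifted sequences (offset 0) share no identical positions, but sliding the second sequence by one residue lines up all four of its residues with the first sequence.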

There are two types of matrices: the identity matrix and the substitution matrix. An identity matrix assigns points only to residues that are identical, whereas a substitution matrix may also assign points to residues that are different yet similar from a conservation standpoint. Substitution matrices are more accurate than identity matrices because, by accounting for conservative substitutions, they give a large positive score where frequently occurring substitutions are found and a large negative score where rare substitutions are found. Hence a substitution matrix is more sensitive than an identity matrix.
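The difference between the two kinds of matrix can be illustrated with a toy example. The residue classes and the scores below (+2 for identical residues, +1 for residues in the same class, -1 otherwise) are hypothetical values invented for this sketch; they are not taken from any published substitution matrix, which would be derived from observed substitution frequencies.

```python
# Hypothetical physicochemical classes, for illustration only.
CLASSES = {
    "hydrophobic": set("AVLIMF"),
    "positive": set("KR"),
    "negative": set("DE"),
}


def residue_class(residue):
    """Return the name of the toy class containing residue, or None."""
    for name, members in CLASSES.items():
        if residue in members:
            return name
    return None


def identity_score(x, y):
    """Identity matrix: points only for identical residues."""
    return 1 if x == y else 0


def substitution_score(x, y):
    """Toy substitution matrix: also rewards conservative substitutions."""
    if x == y:
        return 2
    class_x, class_y = residue_class(x), residue_class(y)
    if class_x is not None and class_x == class_y:
        return 1  # different residues, similar properties
    return -1     # non-conservative substitution


def score_alignment(a, b, scorer):
    """Sum the per-position scores of two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(scorer(x, y) for x, y in zip(a, b))


if __name__ == "__main__":
    for other in ("IRED", "DLKK"):
        print(other,
              score_alignment("LKDE", other, identity_score),
              score_alignment("LKDE", other, substitution_score))
```

Neither "IRED" nor "DLKK" shares an identical residue with "LKDE", so the identity matrix scores both comparisons as 0. The toy substitution matrix separates them: every position of "IRED" is a conservative substitution, while every position of "DLKK" is not.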

Structural Alignment

Structural alignment is the analysis of the degree of similarity between primary, secondary, and tertiary structures. In proteins, the tertiary structure is more conserved than the primary structure because the tertiary structure is more closely related to the protein's function. The purpose of structural alignment is to improve sequence alignment (the method discussed above) by creating a sequence template. Since some regions are more conserved than others, a sequence template maps out the conserved amino acid residues that are structurally and functionally important to the members of a particular protein family. Hence a sequence template helps find proteins that belong to a certain family.
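One way to picture how a sequence template is used to search for family members is the sketch below. The template representation (a list in which each position is either a set of allowed residues or `None` for "any residue"), the example template, and the function names are all assumptions made for illustration; the sketch does not show how a template is derived from a structural alignment.

```python
def matches_template(segment, template):
    """Return True if every residue in segment is allowed at its position."""
    if len(segment) != len(template):
        return False
    return all(
        allowed is None or residue in allowed
        for residue, allowed in zip(segment, template)
    )


def find_template(seq, template):
    """Return the start positions in seq where the template matches."""
    width = len(template)
    return [
        start
        for start in range(len(seq) - width + 1)
        if matches_template(seq[start:start + width], template)
    ]


if __name__ == "__main__":
    # Hypothetical template: C, any, any, C, then H or K.
    template = [{"C"}, None, None, {"C"}, {"H", "K"}]
    print(find_template("AACGGCHAACAACK", template))
```

Only the conserved positions constrain the match, so two segments with different residues at the unconstrained positions can both satisfy the same template.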

SIRCh Link for Bioinformatics (Selected Internet Resources)