Structural Biochemistry/Genome Analysis/Sequenced Genomes


Thanks to modern techniques of DNA analysis, many genomes have been sequenced and analyzed. A famous example is the human genome, sequenced through the Human Genome Project.

Human Genome Project

The Human Genome Project was an international scientific research effort to map the human genome in full. The project was initially headed by James D. Watson at the US National Institutes of Health, but research centers all over the world worked on it, including centers in France, Germany, Japan, China, the United Kingdom, and India. So far about 92.3% of the genome has been sequenced, but the exact figure is difficult to determine because of non-coding sequences of DNA, or "junk" DNA.

One key finding of the project is that the genomes of any two humans are about 99.9% alike.
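To make the 99.9% figure concrete, here is a small back-of-the-envelope calculation. It assumes a genome size of about 3.2 Gb (the figure listed for Homo sapiens in the mammal table further down); the variable names are illustrative only.

```python
# Approximate human genome size in base pairs (3.2 Gb, an approximation).
GENOME_SIZE_BP = 3_200_000_000

# 99.9% alike means 0.1% differs, i.e. roughly 1 base pair in every 1,000.
differing_bp = GENOME_SIZE_BP // 1000

print(f"Roughly {differing_bp:,} base pairs differ between two individuals")
```

Under these assumptions, two people still differ at a few million positions, which is why individual genomes remain medically informative.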


Sequencing genomes allows scientists to identify homologous proteins and establish evolutionary relationships. Furthermore, if a newly discovered protein is homologous to a known protein, scientists can use that homology to make an educated guess about how the new protein functions.
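One simple way to quantify how similar two already-aligned protein sequences are is percent identity. The sketch below is a minimal illustration, not a full homology search: it assumes the two sequences have already been aligned to the same length, and the example fragments and the function name `percent_identity` are invented for demonstration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions holding the same residue.

    Assumes both sequences are pre-aligned (same length); a gap
    character '-' never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Hypothetical aligned fragments, invented for illustration only.
known_protein = "MKTAYIAKQR"
novel_protein = "MKSAYLAKQR"
print(percent_identity(known_protein, novel_protein))  # 80.0
```

High identity alone does not prove homology, but it is often the first hint that a new protein may share the function of a known one.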

The Impact of Sequencing on Medicine

The ability to sequence a human genome quickly may have significant impacts on medicine in the future. Knowledge about genes and an individual's DNA has already given scientists a way to predict the likelihood of certain diseases in individuals. It also allows them to analyze chromosomal structure, the effects of evolution upon the genome, and protein structures and functions. In the future, gene therapy, genomic medicine, and preventative treatments may reduce the likelihood of disease and allow manufacturers to tailor drugs to specific individuals.

Sequenced Eukaryotic Genomes

Eukaryotes are organisms whose cells contain complex, membrane-bound organelles. The defining characteristic that sets eukaryotes apart from prokaryotes is the nucleus, enclosed by the nuclear envelope, in which the organism's genetic information is contained.

The first eukaryotic genome to be sequenced was that of Saccharomyces cerevisiae (S. cerevisiae), commonly known as brewer's yeast, in 1996. Because of its utility in baking and brewing, S. cerevisiae is among the most useful yeasts, and it is one of the most studied eukaryotic model organisms in molecular and cell biology, playing a role similar to that of E. coli in the study of prokaryotic organisms. Many proteins that are important to humans are studied by examining their homologs in yeast. For example, signaling proteins and protein-processing enzymes have been discovered with the help of the yeast genome.

Other fully sequenced organisms include the roundworm, the fruit fly, the pufferfish (the first vertebrate to be sequenced after humans), and Arabidopsis thaliana.

The tables below are taken from Wikipedia's list of sequenced eukaryotic genomes.



The Chromista are a group of protists that contains the algal phyla Heterokontophyta, Haptophyta and Cryptophyta. Members of this group are mostly studied for evolutionary interest.

Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion
Guillardia theta | Cryptomonad | Model organism | 0.551 Mb (nucleomorph genome only) | 464[1] | Canadian Institute of Advanced Research, Philipps-University Marburg and the University of British Columbia | 2001[1]
Thalassiosira pseudonana (Strain: CCMP 1335) | Diatom | | 2.5 Mb | 11,242[2] | Joint Genome Institute and the University of Washington | 2004[2]
Phaeodactylum tricornutum (Strain: CCAP1055/1) | Diatom | | 27.4 Mb | 10,402 | Joint Genome Institute | 2008[3]


Alveolata are a group of protists which includes the Ciliophora, Apicomplexa and Dinoflagellata. Members of this group are of particular interest to science as the cause of serious human and livestock diseases.

Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion
Babesia bovis | Parasitic protozoan | Cattle pathogen | 8.2 Mb | 3,671 | | 2007[4]
Cryptosporidium hominis | Parasitic protozoan | Human pathogen | 10.4 Mb | 3,994[5] | Virginia Commonwealth University | 2004[5]
Cryptosporidium parvum (C- or genotype 2 isolate) | Parasitic protozoan | Human pathogen | 16.5 Mb | 3,807[6] | UCSF and University of Minnesota | 2004[6]
Paramecium tetraurelia | Ciliate | Model organism | 72 Mb | 39,642[7] | Genoscope | 2006[7]
Plasmodium falciparum | Parasitic protozoan | Human pathogen (malaria) | 22.9 Mb | 5,268[8] | Malaria Genome Project Consortium | 2002[8]
Plasmodium knowlesi | Parasitic protozoan | Primate pathogen (malaria) | 23.5 Mb | 5,188[9] | | 2008[9]
Plasmodium vivax | Parasitic protozoan | Human pathogen (malaria) | 26.8 Mb | 5,433[10] | | 2008[10]
Plasmodium yoelii yoelii | Parasitic protozoan | Rodent pathogen (malaria) | 23.1 Mb | 5,878[11] | TIGR and NMRC | 2002[11]
Tetrahymena thermophila | Ciliate | Model organism | 104 Mb | 27,000[12] | | 2006[12]
Theileria parva | Parasitic protozoan | Cattle pathogen (African east coast fever) | 8.3 Mb | 4,035[13] | TIGR and the International Livestock Research Institute | 2005[13]
Theileria annulata (Ankara clone C9) | Parasitic protozoan | Cattle pathogen | 8.3 Mb | 3,792 | Sanger | 2005[14]


Excavata is a group of related free-living and symbiotic protists; it includes the Metamonada, Loukozoa, Euglenozoa and Percolozoa. Members of this group are researched for their role in human disease.

Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion
Leishmania major | Parasitic protozoan | Human pathogen | 32.8 Mb | 8,272[15] | Sanger Institute | 2005[15]
Giardia lamblia | Parasitic protozoan | Human pathogen | 11.7 Mb | 6,470[16] | | 2007[16]
Trichomonas vaginalis | Parasitic protozoan | Human pathogen (Trichomoniasis) | 160 Mb | 59,681[17] | TIGR | 2007[17]
Trypanosoma brucei (Strain: TREU927/4 GUTat10.1) | Parasitic protozoan | Human pathogen (Sleeping sickness) | 26 Mb | 9,068[18] | Sanger Institute and TIGR | 2005[18]
Trypanosoma cruzi (Strain: CL Brener TC3) | Parasitic protozoan | Human pathogen (Chagas disease) | 34 Mb | 22,570[19] | TIGR, Seattle Biomedical Research Institute and Uppsala University | 2005[19]


Amoebozoa are a group of motile amoeboid protists. Members of this group move or feed by means of temporary projections called pseudopods. The best known member of this group is the slime mold, which has been studied for centuries; other members include the Archamoebae, Tubulinea and Flabellinea. Some Amoebozoa cause disease.

Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion
Dictyostelium discoideum | Slime mold | Model organism | 34 Mb | 12,500[20] | Consortium from University of Cologne, Baylor College of Medicine and the Sanger Centre | 2005[20]
Entamoeba histolytica | Parasitic protozoan | Human pathogen (amoebic dysentery) | 23.8 Mb | 9,938[21] | TIGR, Sanger Institute and the London School of Hygiene and Tropical Medicine | 2005[21]


Higher plants

Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion
Arabidopsis thaliana | Wild mustard | Model plant | 120 Mb | 25,498[22] | Arabidopsis Genome Initiative[23] | 2000[22]
Brassica napus | Rapeseed | Oil plant | 1,100 Mb | | Bayer CropScience | 2009[24]
Oryza sativa ssp indica | Rice | Crop and model organism | 420 Mb | 32-50,000[25] | Beijing Genomics Institute, Zhejiang University and the Chinese Academy of Sciences | 2002[25]
Oryza sativa ssp japonica | Rice | Crop and model organism | 466 Mb | 46,022-55,615[26] | Syngenta and Myriad Genetics | 2002[26]
Ostreococcus tauri | Green alga | Simple eukaryote | 12.6 Mb | | Laboratoire Arago | 2006[27]
Physcomitrella patens | Bryophyte | Model organism, early diverging land plant | 500 Mb | 39,458[28] | US Department of Energy Office of Science Joint Genome Institute | 2008[28]
Populus trichocarpa | Balsam poplar or Black Cottonwood | Carbon sequestration, model tree, commercial use (timber), and comparison to A. thaliana | 550 Mb | 45,555[29] | The International Poplar Genome Consortium | 2006[29]
Vitis vinifera | Grapevine PN40024 | Fruit crop | 490 Mb[30] | 30,434[30] | The French-Italian Public Consortium for Grapevine Genome Characterization | 2007[30]
Zea mays ssp mays | Corn (maize) | Fruit crop | 2,800 Mb | 50,000-60,000 | NSF | 2008[31]


Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion
Cyanidioschyzon merolae | Red alga | Simple eukaryote | 16.5 Mb | 5,331[32] | University of Tokyo, Rikkyo University, Saitama University and Kumamoto University | 2004[32]
Thalassiosira pseudonana[33] | Heterokont | | | | |
Chlamydomonas reinhardtii[34] | | Model organism | | | | 2007[34]
Ostreococcus tauri[33] | Chlorophyte | | | | |


Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion
Ashbya gossypii (Strain: ATCC 10895) | Fungus | Plant pathogen | 9.2 Mb | 4,718[35] | Syngenta AG and University of Basel | 2004[35]
Aspergillus fumigatus | Fungus | Human pathogen | 29.4 Mb | 9,926[36] | Sanger Institute, University of Manchester, TIGR, Institut Pasteur, Nagasaki University, University of Salamanca and OpGen | 2005[36]
Aspergillus nidulans (Strain: FGSC A4) | Fungus | Model organism | 30 Mb | 9,500[37] | | 2005[37]
Aspergillus niger (Strain: CBS 513.88) | Fungus | Biotechnology - fermentation | 33.9 Mb | 14,165[38] | | 2007[38]
Aspergillus oryzae | Fungus | Used to ferment soy | 37 Mb | 12,074[39] | National Institute of Technology and Evaluation | 2005[39]
Candida glabrata | Fungus | Human pathogen | 12.3 Mb | 5,283[40] | Génolevures Consortium[41] | 2004[40]
Cryptococcus (Filobasidiella) neoformans | Fungus | Human pathogen | 20 Mb | 6,500[42] | TIGR and Stanford University | 2005[42]
Debaryomyces hansenii | Yeast | Cheese ripening | 12.2 Mb | 6,906[40] | Génolevures Consortium | 2004[40]
Encephalitozoon cuniculi | Microsporidium | Human pathogen | 2.9 Mb | 1,997[43] | Genoscope and Université Blaise Pascal | 2001[43]
Kluyveromyces lactis | Yeast | | 10-12 Mb | 5,329[40] | Génolevures Consortium | 2004[40]
Magnaporthe grisea | Fungus | Plant pathogen | 37.8 Mb | 11,109[44] | | 2005[44]
Neurospora crassa | Fungus | Model eukaryote | 40 Mb | 10,082[37] | Broad Institute, Oregon Health and Science University, University of Kentucky, and the University of Kansas | 2003[37]
Saccharomyces cerevisiae | Baker's yeast | Model eukaryote | 12.1 Mb | 6,294[45] | International Collaboration for the Yeast Genome Sequencing[46] | 1996[45]
Schizosaccharomyces pombe | Yeast | Model eukaryote | 14 Mb | 4,824[47] | Sanger Institute and Cold Spring Harbor Laboratory | 2002[47]
Yarrowia lipolytica | Yeast | Industrial uses | 20 Mb | 6,703[40] | Génolevures Consortium | 2004[40]



Organism | Type | Shotgun coverage | Genome size | Number of genes predicted | Organization | Year of completion
Bos taurus | Cow | 6× | 3.0 Gb[48][49] | 22,000[50] | Cattle Genome Sequencing International Consortium | 2009
Canis lupus familiaris | Dog | 7.6× | 2.4 Gb[51] | 19,300[51] | Broad Institute and Agencourt Bioscience | 2005[51]
Cavia porcellus | Guinea Pig | 2× | 3.4 Gb | | The Genome Sequencing Platform, The Genome Assembly Team[49] |
Dasypus novemcinctus | Nine-banded Armadillo | 2×[52] | 3.0 Gb | | Broad Institute[49] |
Echinops telfairi | Hedgehog-Tenrec | 2×[52] | | | Broad Institute |
Equus caballus | Horse | 6.8× | 2.1 Gb[49] | | Broad Institute et al.[49] | 2007[53]
Erinaceus europaeus | Western European Hedgehog | 2×[52] | | | Broad Institute |
Felis catus | Cat | 2× | 3 Gb | 20,285 | The Genome Sequencing Platform, The Genome Assembly Team[49] | 2007[54]
Homo sapiens | Human | | 3.2 Gb[55] | 25,000[55] | Human Genome Project Consortium and Celera Genomics | Draft 2001[56][57]; complete 2006[58]
Loxodonta africana | African Elephant | 2×[52] | 3 Gb | | Broad Institute |
Macaca mulatta | Rhesus Macaque | 6× | | | Macaque Genome Sequencing Consortium[49] |
Microcebus murinus | Gray Mouse Lemur | 2×[52] | | | The Genome Sequencing Platform, The Genome Assembly Team[49] |
Monodelphis domestica | Gray Short-tailed Opossum | | 3.5 Gb | 18-20,000 | Broad Institute et al. | 2007[49][59]
Mus musculus (Strain: C57BL/6J) | Mouse | | 2.5 Gb | 24,174[60] | International Collaboration for the Mouse Genome Sequencing[61] | 2002[60]
Myotis lucifugus | Little Brown Bat | 2×[49] | | | Broad Institute |
Ochotona princeps | American Pika | 2×[52] | | | Broad Institute |
Ornithorhynchus anatinus[62] | Platypus | 6×[49] | | | Washington University |
Oryctolagus cuniculus | Rabbit | 2×[52] | 2.5 Gb | | Broad Institute et al.[49] |
Otolemur garnettii | Small-eared Galago, or Bushbaby | 2×[52] | | | Broad Institute |
Pan troglodytes | Chimpanzee | 6×[49] | 3.1 Gb | | Chimpanzee Sequencing and Analysis Consortium | 2005[63]
Pongo pygmaeus | Orangutan | | 3.0 Gb | | Institute for Molecular Biotechnology[49] |
Rattus norvegicus | Rat | 1.8× or better | 2.8 Gb[49] | 21,166[64] | Rat Genome Sequencing Project Consortium | 2004[64]
Sorex araneus | European Shrew | 2×[52] | 3.0 Gb[49] | | The Genome Sequencing Platform, The Genome Assembly Team[49] |
Spermophilus tridecemlineatus | Thirteen-lined Ground Squirrel | 2× | | | The Genome Sequencing Platform, The Genome Assembly Team[49] |
Tupaia belangeri | Northern Tree Shrew | 2× | | | Broad Institute[49] |


Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion
Anopheles gambiae (Strain: PEST) | Mosquito | Vector of malaria | 278 Mb | 13,683[65] | Celera Genomics and Genoscope | 2002[65]
Apis mellifera | Honey bee | Model for eusocial behavior | 1800 Mb | 10,157[66] | The Honeybee Genome Sequencing Consortium | 2006[66]
Bombyx mori | Moth (domestic silk worm) | Silk production | 530 Mb | | University of Tokyo and National Institute of Agrobiological Sciences | 2004[67]
Drosophila melanogaster | Fruit fly | Model animal | 165 Mb | 13,600[68] | Celera, UC Berkeley, Baylor College of Medicine, European DGP | 2000[68]


Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion
Caenorhabditis briggsae | Nematode worm | For comparison with C. elegans | 104 Mb | 19,500[69] | Washington University, Sanger Institute and Cold Spring Harbor Laboratory | 2003[69]
Caenorhabditis elegans (Strain: Bristol N2) | Nematode worm | Model animal | 100 Mb | 19,000[70] | Washington University and the Sanger Institute | 1998[70]
Meloidogyne hapla | Northern root-knot nematode | Vegetable pathogen | 54 Mb | 14,420[71] | | 2008[71]
Meloidogyne incognita | Southern root-knot nematode | Plant pathogen | 86 Mb | 19,212[72] | INRA, Genoscope and International M. incognita Genome Consortium[73] | 2008[72]
Pristionchus pacificus | Nematode worm | Model invertebrate | 169 Mb | 23,500[74] | Max-Planck Institute for Developmental Biology and Genome Sequencing Center, Washington University School of Medicine |


Other animals

Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion
Ciona intestinalis | Tunicate | Simple chordate | 116.7 Mb | 16,000[75] | Joint Genome Institute | 2003[75]
Ciona savignyi | Tunicate | | 174 Mb | | Broad Institute | 2007[76]
Gallus gallus | Chicken | | 1000 Mb | 20-23,000[77] | International Chicken Genome Sequencing Consortium | 2004[77]
Strongylocentrotus purpuratus | Sea urchin | Model eukaryote | 814 Mb | 23,300[78] | Sea Urchin Genome Sequencing Consortium | 2006[78]
Takifugu rubripes | Puffer fish | Vertebrate with small genome | 390 Mb | 22-29,000[79] | International Fugu Genome Consortium[80] | 2002[81]
Tetraodon nigroviridis | Puffer fish | Vertebrate with compact genome | 340 Mb[82] | 22,400[82] | Genoscope and the Broad Institute | 2004[82]

Sequenced Bacterial Genomes

Several techniques have made DNA sequencing faster and higher in volume, such as fluorescent dideoxynucleotide chain terminators and the "shotgun" method. The genome of the bacterium Haemophilus influenzae was determined in 1995 with the shotgun method: the genomic DNA is cut randomly into fragments, the fragments are sequenced, and computer programs then reconstruct the whole sequence by matching the overlapping regions between fragments. The H. influenzae genome consists of 1,830,137 base pairs and encodes approximately 1740 proteins. Using similar approaches, the genomes of more than 100 bacterial and archaeal species have been sequenced, including key model organisms such as E. coli, Salmonella typhimurium, and Archaeoglobus fulgidus, as well as pathogenic organisms such as Yersinia pestis (which causes bubonic plague) and Bacillus anthracis (anthrax) (ref. 1).
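The overlap-matching idea behind shotgun assembly can be sketched with a toy greedy assembler. This is only an illustration under strong assumptions (error-free reads, all from the same strand, no repeats); real assemblers are far more elaborate. The function names, the `min_len` parameter, and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if no overlap of at least min_len characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    # Drop exact duplicates and fragments fully contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Toy reads cut from the made-up sequence ATGCGTACGTTAGC.
reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

The greedy choice keeps the sketch short; it can mis-assemble repetitive sequence, which is one reason repeats make real genomes hard to finish.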


1. Berg, Jeremy M. 2007. Biochemistry. Sixth Ed. New York: W.H. Freeman. 68-69, 78.
2. Voet, Voet, Pratt. 2004. Fundamentals of Biochemistry.