Methods and Concepts in the Life Sciences/Genome Engineering
Genome engineering refers to the strategies and techniques for the targeted, specific modification of the genome of living organisms.
Early approaches to genome engineering relied solely on homologous recombination, a natural DNA maintenance mechanism that repairs a DNA strand using a homologous sequence located on another strand as a template. Homologous recombination can be induced between a cellular DNA strand and an exogenous DNA strand introduced into the cell by a vector, such as the modified genome of a retrovirus. The process is flexible enough to allow a certain degree of change (addition, deletion or modification of a portion of DNA) to be introduced into the targeted homologous region.
Modifying genomes using only homologous recombination remained a slow and unpredictable process until further developments increased the rate of homologous recombination in somatic cell types. These developments include the use of site-directed endonucleases, which achieve gene modification by causing double-stranded DNA breaks. The breaks trigger the cell's natural DNA repair mechanisms: predominantly non-homologous end joining (NHEJ), along with a low frequency of homologous recombination (HR).
Restriction enzymes commonly used in molecular biology to cut DNA recognize short sequences, typically 4 to 8 base pairs long. These sequences, which are generally palindromic, often occur at many sites in a genome (the diploid human genome comprises about 6.4 billion base pairs). Restriction enzymes are therefore likely to cut the DNA molecule many times. In their efforts to find a genome surgery approach offering a higher degree of accuracy and safety, scientists therefore turned to more precise tools.
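To make the idea of a short, palindromic recognition site concrete, the following minimal Python sketch checks whether a site reads the same on both strands and lists where it occurs in a sequence. EcoRI and its site GAATTC are used as a familiar example; the function names and the toy sequence are invented for illustration only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def is_palindromic(site: str) -> bool:
    """A site is palindromic when it equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def find_sites(sequence: str, site: str) -> list[int]:
    """Return the 0-based start of every (possibly overlapping) occurrence."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


ECORI_SITE = "GAATTC"  # recognition sequence of EcoRI
toy_sequence = "TTGAATTCAGGCTAGAATTCCA"  # made-up sequence, illustration only

ecori_positions = find_sites(toy_sequence, ECORI_SITE)
print(is_palindromic(ECORI_SITE), ecori_positions)
```

Even in this 22-base toy sequence the six-base site occurs twice, which hints at how often such a site would be expected in a genome of billions of bases.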
More targeted genome engineering can be performed by using enzymes which are able to recognize and interact with DNA sequences that are sufficiently long so as to occur only once, with high probability, in any given genome. The DNA modification therefore takes place precisely at the site of the target sequence. With recognition sites of over 12 base pairs, meganucleases and zinc finger nucleases offer this degree of precision.
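The link between site length and uniqueness can be estimated with a rough calculation. The sketch below assumes a random genome in which the four bases are equally frequent and independent, and it counts one strand only; real genomes are neither random nor uniform, so the numbers are order-of-magnitude guides rather than predictions. Under that assumption a specific site of length k is expected about N / 4^k times in N bases. The function names are illustrative.

```python
def expected_occurrences(genome_size: int, site_length: int) -> float:
    """Expected count of one specific site in a random, uniform sequence."""
    return genome_size / 4 ** site_length


def min_unique_site_length(genome_size: int) -> int:
    """Shortest site length whose expected count drops below one."""
    length = 1
    while expected_occurrences(genome_size, length) >= 1:
        length += 1
    return length


GENOME_SIZE = 6_400_000_000  # genome size quoted in the text above

for k in (6, 12, 18):
    print(k, expected_occurrences(GENOME_SIZE, k))

print(min_unique_site_length(GENOME_SIZE))
```

Under this simplified model a 6-base site is expected well over a million times, while a site needs to be somewhat longer than 12 bases before its expected count falls below one, which is consistent with the long recognition sites of the enzymes described here.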
Once the DNA has been cut, natural DNA repair mechanisms and homologous recombination enable the incorporation of a modified sequence or a new gene.
The success of these different stages (recognition, cleavage and recombination) depends on various factors, including the efficacy of the vector that introduces the enzyme into the cell, the enzyme cleavage activity, the cell’s capacity for homologous recombination and probably the state of the chromatin at the given locus.
Meganucleases, discovered in the late 1980s, are enzymes in the endonuclease family which are characterized by their capacity to recognize and cut long DNA sequences (from 12 to 40 base pairs). The most widespread and best known meganucleases are the proteins in the LAGLIDADG family, which owe their name to a conserved amino acid sequence.
These enzymes were identified in the 1990s as promising tools for genome engineering. However, even though they occur in nature and each one exhibits slight variations in its DNA recognition site, there is virtually no chance of finding the exact meganuclease required to act on a specific DNA sequence. Each new genome engineering target therefore requires an initial protein engineering stage to produce a custom meganuclease.
There are two methods for creating custom meganucleases:
- Mutagenesis involves generating collections of variants from a meganuclease with properties similar to those of the desired enzyme, then selecting among these variants using high-throughput screening. This procedure can be optimized by adopting what are known as “semi-rational” methods, in which structural data is analyzed computationally in order to focus the mutagenesis on the part of the enzyme that interacts with DNA and triggers the cleavage.
- Combinatorial assembly is a method whereby protein subunits from different enzymes can be associated or fused.
These two approaches can be combined. A large bank containing several tens of thousands of protein units has been created. These units can be combined to obtain chimeric meganucleases that recognize the target site. This technique has enabled the development of several meganucleases specific for sequences in the genomes of viruses, plants, etc.
Zinc finger nuclease-based engineering
Zinc finger motifs occur in several transcription factors. The zinc ion, found in 8% of all human proteins, plays an important role in the organization of their three-dimensional structure. In transcription factors, it is most often located at the protein-DNA interaction sites, where it stabilizes the motif. The C-terminal part of each finger is responsible for the specific recognition of the DNA sequence.
The recognized sequences are short, made up of around 3 base pairs, but by combining 6 to 8 zinc fingers whose recognition sites have been characterized, it is possible to obtain specific proteins for sequences of around 20 base pairs. It is therefore possible to control the expression of a specific gene. It is also possible to fuse a protein constructed in this way with the catalytic domain of an endonuclease in order to induce a targeted DNA break, and therefore to use these proteins as genome engineering tools.
The method generally adopted for this involves two proteins, each containing 3 to 6 specifically chosen zinc fingers fused to the catalytic domain of the FokI endonuclease. The two proteins recognize two DNA sequences that are a few nucleotides apart. When the two zinc finger proteins bind to their respective sequences, the two FokI domains attached to them are brought close together. The domains can then dimerize and cut the DNA molecule.
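The paired arrangement can be sketched as a simple search. The example below looks for a left half-site on the given strand, followed by a short spacer, followed by a right half-site on the opposite strand (which therefore appears as its reverse complement on the given strand). The spacer range of 5 to 7 bases, the half-site sequences and the function names are assumptions chosen for illustration; the text above only states that the two sites are a few nucleotides apart.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_zfn_targets(sequence, left_half_site, right_half_site,
                     spacer_range=(5, 7)):
    """Return (start, spacer_length) for each left/right half-site pair.

    right_half_site is written 5'->3' on the opposite strand, so it is
    searched for as its reverse complement on the given strand.
    spacer_range is an assumed, inclusive range of allowed spacer lengths.
    """
    sequence = sequence.upper()
    left = left_half_site.upper()
    right_on_given_strand = reverse_complement(right_half_site)
    hits = []
    start = sequence.find(left)
    while start != -1:
        left_end = start + len(left)
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            if sequence.startswith(right_on_given_strand, left_end + spacer):
                hits.append((start, spacer))
        start = sequence.find(left, start + 1)
    return hits


# Made-up half-sites: 9 bases each, as for two three-finger proteins.
LEFT = "GCTGGGGAC"
RIGHT = "GAGGTAGCA"
toy_sequence = "AAT" + LEFT + "TTCAGA" + reverse_complement(RIGHT) + "GG"

zfn_hits = find_zfn_targets(toy_sequence, LEFT, RIGHT)
print(zfn_hits)
```

Requiring both half-sites at the right spacing is what gives the pair its specificity: neither protein alone brings two FokI domains together.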
Several approaches are used to design zinc finger nucleases specific to the chosen sequences. The most widespread involves combining zinc finger units with known specificities (modular assembly). Various selection techniques, using bacteria, yeast or mammalian cells, have been developed to identify the combinations that offer the best specificity and the best cell tolerance.
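Modular assembly can be pictured as a lookup: the target is split into 3-base units and a finger with known specificity is chosen for each. The sketch below is deliberately simplified. The library contents and module names are hypothetical placeholders, and it ignores both the orientation of the protein relative to the DNA and any influence of neighbouring fingers on each other, which is one reason selection steps are used in practice.

```python
# Hypothetical library: DNA triplet -> name of a finger module assumed to
# bind it. The names are placeholders, not real reagents.
FINGER_LIBRARY = {
    "GAC": "ZF_module_01",
    "GGG": "ZF_module_02",
    "GCT": "ZF_module_03",
    "GAA": "ZF_module_04",
}


def split_into_triplets(target: str) -> list[str]:
    """Split a target sequence into consecutive 3-base units."""
    target = target.upper()
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


def assemble_fingers(target: str, library=FINGER_LIBRARY) -> list[str]:
    """Pick one module per triplet, failing if any triplet is not covered."""
    triplets = split_into_triplets(target)
    missing = [t for t in triplets if t not in library]
    if missing:
        raise KeyError(f"no module in the library for: {missing}")
    return [library[t] for t in triplets]


modules = assemble_fingers("GCTGGGGAC")
print(modules)
```

A target containing a triplet that the library does not cover raises an error, mirroring the practical limit that only sequences built from characterized units can be addressed this way.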
Transcription activator-like effector nucleases (TALENs) are artificial restriction enzymes generated by fusing a specific DNA-binding domain to a generic DNA-cleaving domain. The DNA-binding domains, which can be designed to bind any desired DNA sequence, come from TAL effectors, DNA-binding proteins secreted by certain bacteria that infect plants. TALENs are used in a similar way to designed zinc finger nucleases.
CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats) are genetic elements that bacteria use as a kind of acquired immunity to protect against viruses. They consist of short repeats separated by spacer sequences that originate from viral genomes and have been incorporated into the bacterial genome. Cas (CRISPR-associated) proteins process these sequences and cut matching viral DNA sequences. By introducing plasmids containing Cas genes and specifically constructed CRISPRs into eukaryotic cells, the genome can be cut at almost any desired position.
The CRISPR/Cas9 system has revolutionized the field of genome engineering since 2013. Editing genomes with CRISPR involves expressing the RNA-guided Cas9 endonuclease along with guide RNAs that direct it to cut the particular target sequences to be edited. When Cas9 cuts a target sequence, the cell can be induced to repair the damage by replacing the original sequence with an altered version supplied by the researcher. Because it is comparatively simple to design a guide RNA that directs Cas9 to a chosen gene, CRISPR greatly simplifies the process of deleting, adding, or modifying genes. As of 2014 it had been successfully tested in 20 species, including human cells. In many of these species, the edits were made in cells that give rise to sperm or eggs, allowing them to be inherited by future generations.
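Choosing where a guide RNA can direct Cas9 can be sketched as a pattern search. The example below relies on a detail not stated in the text above: the widely used Cas9 from Streptococcus pyogenes cuts only where the roughly 20-base target is immediately followed by a short motif of the form NGG (the protospacer adjacent motif, or PAM). The function name, the default length and the toy sequence are illustrative, and only the given strand is scanned; the opposite strand could be searched by passing in the reverse complement.

```python
import re


def find_cas9_targets(sequence: str, protospacer_length: int = 20):
    """Return (start, protospacer, pam) for each candidate on the given strand.

    Assumes an NGG PAM immediately 3' of the protospacer, as for
    Streptococcus pyogenes Cas9. A lookahead is used so that overlapping
    candidates are all reported.
    """
    sequence = sequence.upper()
    pattern = re.compile(
        r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_length
    )
    return [
        (match.start(), match.group(1), match.group(2))
        for match in pattern.finditer(sequence)
    ]


# Made-up sequence: 2 flanking bases, a 20-base target, a TGG PAM, 2 more bases.
toy_sequence = "TT" + "GACGTTACCGATTCAGCTAA" + "TGG" + "CA"

cas9_targets = find_cas9_targets(toy_sequence)
for start, protospacer, pam in cas9_targets:
    print(start, protospacer, pam)
```

A real design workflow would additionally screen each candidate against the rest of the genome for similar sequences, since near matches elsewhere can also be cut.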