Almost all human genetic variation is relatively insignificant biologically; that is, it has no apparent adaptive significance. Some variation, such as neutral mutations, alters the amino acid sequence of the resulting protein but produces no detectable change in its function. Other variation, for example silent mutations, does not even change the amino acid sequence of a polypeptide.
Furthermore, only a small percentage of the DNA sequences in the human genome are coding sequences (sequences that are ultimately translated into protein) or regulatory sequences (sequences that can influence the level, timing and tissue specificity of gene expression). However, these apparently silent variations may be useful in mapping specific genes in the human genome, and so allow the study of variation amongst individuals in a population to flourish. The coexistence of more than one variant (allele) at a locus in a population is called genetic polymorphism.
More precisely, a variant allele is usually regarded as polymorphic if it is present at a frequency of more than 1% in a population. Variation among individuals, however, need not occur only in DNA sequences that code for a polypeptide (genes) or in regulatory sequences such as promoters. It is therefore essential to determine the types of polymorphism present in individuals of a species before these DNA variations can be studied at the molecular level. Single-nucleotide polymorphisms (SNPs) are differences in the identity of a single nucleotide pair at a particular site in the genome.
For example, some individuals in a population may have the base pair T-A at a particular chromosomal site, whereas others have the pair C-G instead. SNPs are the most common form of genetic variation amongst individuals of a species and are found throughout all 46 chromosomes. In the human genome, any two randomly chosen DNA molecules are likely to differ at about one site in every 1000-3000 bp of coding DNA, compared with one SNP site in every 500-1000 bp of non-coding DNA [1].
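The >1% criterion can be illustrated with a minimal Python sketch. It assumes diploid genotypes recorded as pairs of alleles at a single SNP site (here the T and C alleles from the example above), and it treats any allele other than the most common one as a variant. Both choices are assumptions for illustration, and the genotype counts are invented.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from a list of diploid genotypes such as ("T", "C")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def polymorphic_variants(genotypes, threshold=0.01):
    """Variants (alleles other than the most common) whose frequency exceeds threshold."""
    freqs = allele_frequencies(genotypes)
    most_common = max(freqs, key=freqs.get)
    return sorted(a for a, f in freqs.items() if a != most_common and f > threshold)

# Hypothetical samples of 100 individuals typed at one SNP site.
sample_a = [("T", "T")] * 97 + [("T", "C")] * 3   # C allele: 3/200 = 1.5%
sample_b = [("T", "T")] * 99 + [("T", "C")] * 1   # C allele: 1/200 = 0.5%

print(polymorphic_variants(sample_a))  # ['C']
print(polymorphic_variants(sample_b))  # []
```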
One of the major achievements of the Human Genome Project has been to identify some 300,000 of these SNPs, with the ultimate goal of creating an SNP map of the entire human genome [2]. In contrast, simple tandem repeat polymorphisms (STRPs) consist of a short DNA sequence repeated many times in tandem at a particular locus in the genome. An STRP with a repeat unit of 2-9 bp is called a microsatellite [3] (e.g. the repetition of CAG as CAGCAGCAG in a length of DNA), whereas one with a repeat unit of 10-60 bp is termed a minisatellite or variable number tandem repeat (VNTR).
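The distinction between microsatellites and minisatellites depends only on the length of the repeat unit, so it can be sketched in a few lines of Python. The sequence below is hypothetical. The function counts only the longest uninterrupted run of perfect copies of a given unit, which is a simplification; real repeat-finding tools also tolerate imperfect copies.

```python
def longest_tandem_run(seq, unit):
    """Largest number of consecutive, perfect copies of `unit` anywhere in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best

def classify_repeat_unit(unit):
    """Classify an STRP by repeat-unit length, using the ranges given in the text."""
    n = len(unit)
    if 2 <= n <= 9:
        return "microsatellite"
    if 10 <= n <= 60:
        return "minisatellite (VNTR)"
    return "outside the 2-60 bp ranges"

seq = "TTACAGCAGCAGGT"  # hypothetical sequence carrying CAGCAGCAG
print(longest_tandem_run(seq, "CAG"), classify_repeat_unit("CAG"))  # 3 microsatellite
```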
Distinguishing between these two types of non-coding polymorphism allows us to consider how such variations can be detected among individuals in practice, and why doing so is useful in modern genetics. Restriction endonucleases are derived from bacteria, which use these enzymes to degrade the DNA of invading bacteriophages (bacterial parasites). Restriction endonucleases usually cut DNA at specific palindromic recognition sites, cutting a genome into fragments of various lengths depending on the number and positions of target sites present in the DNA.
If a restriction enzyme such as BamHI, EcoRI, HaeIII or HinfI is applied to a length of DNA containing an SNP site that lies within one of its restriction sites, DNA carrying the intact site will be cleaved into two fragments, as expected. However, if the variant at the polymorphic site eliminates the restriction site through a change in a single base pair, that DNA is not cut at this position. The resulting difference in fragment lengths between individuals is a restriction fragment length polymorphism (RFLP).
An example of such a situation is shown below, where an SNP consists of a T-A nucleotide pair in one DNA molecule and a C-G pair in another. The selected restriction endonuclease, EcoRI, cuts within the palindromic site 5′-GAATTC-3′. In this instance, molecules with the T-A base pair at the SNP site will be cleaved within the restriction site, cutting the original length of DNA into two distinct fragments.
Alternatively, DNA molecules with the C-G base pair will not be cleaved at this position, because the C-G pair destroys the palindromic restriction site for EcoRI. Thus, only one, larger fragment will be produced (Figure 1). If the same procedure is applied to DNA molecules containing different numbers of copies of a simple tandem repeat, the number of copies of the repeat unit lying between two flanking restriction sites determines the size of the fragment produced when the restriction endonuclease is added, as illustrated in the accompanying figure.
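The EcoRI example can be mimicked with a minimal Python sketch that "digests" a sequence at every GAATTC site, cutting after the G as EcoRI does. For simplicity it tracks only the top strand and reports fragment lengths in bp. The flanking sequences, the two SNP alleles and the CA repeat are all hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition site; the top strand is cut G^AATTC

def ecori_fragments(seq):
    """Fragment lengths (bp) after cutting after the G of every GAATTC site."""
    cuts = []
    pos = seq.find(ECORI_SITE)
    while pos != -1:
        cuts.append(pos + 1)
        pos = seq.find(ECORI_SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

LEFT, RIGHT = "C" * 10, "G" * 10      # hypothetical flanking DNA

allele_ta = LEFT + "GAATTC" + RIGHT   # T-A pair at the SNP: site intact
allele_cg = LEFT + "GAACTC" + RIGHT   # C-G pair at the SNP: site destroyed

def strp_allele(copies):
    """Hypothetical STRP allele: a CA repeat lying between two EcoRI sites."""
    return LEFT + "GAATTC" + "TT" + "CA" * copies + "TT" + "GAATTC" + RIGHT

print(ecori_fragments(allele_ta))       # [11, 15]
print(ecori_fragments(allele_cg))       # [26]
print(ecori_fragments(strp_allele(5)))  # [11, 20, 15]
print(ecori_fragments(strp_allele(8)))  # [11, 26, 15]
```

Only the middle fragment of the STRP alleles changes size (by 2 bp per extra CA copy), which is what makes the repeat number readable as a fragment length.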
RFLPs can be distinguished among individuals only after gel electrophoresis and Southern blot hybridisation have been carried out. In electrophoresis, the DNA fragments in solution (produced by the restriction endonuclease) are separated according to their lengths. The method exploits the fact that DNA carries an overall negative charge, owing to the many negatively charged phosphate groups in the sugar-phosphate backbones of the double helix.
If the terminals of an electric power source are connected to opposite ends of a horizontal tube or tray containing an individual's DNA solution, the DNA fragments will all move towards the anode. Their rate of movement depends on the strength of the electric field and on the shape and size of the molecules, and the distance they travel also depends on how long the current is applied. The most common form of electrophoresis is gel electrophoresis. A thin slab of agarose or polyacrylamide gel is prepared, containing numerous small wells into which the fragmented DNA samples of different individuals are placed.
When an electric field is applied, the DNA is drawn towards the positively charged anode. However, because the gel is a complex molecular network of convoluted passages, it acts as a sieve, allowing smaller DNA molecules to pass through more easily than larger fragments. Thus, the rate of movement towards the anode increases as the size of the DNA molecule decreases. When the current is switched off, discrete DNA-containing regions in the gel, called bands, can be visualised by staining with ethidium bromide and viewing under ultraviolet light.
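Because migration distance falls as fragment size rises, the sizes of unknown bands are usually read off against fragments of known size. This sketch assumes a common approximation: over the gel's useful range, log10(fragment size) decreases roughly linearly with migration distance. The minimal Python sketch below fits that line by least squares to a hypothetical marker lane and interpolates the size of an unknown band. The distances and sizes are invented for illustration.

```python
import math

def fit_size_marker(marker):
    """Least-squares fit of log10(size_bp) = slope * distance_mm + intercept.

    `marker` is a list of (distance_mm, size_bp) pairs for bands of known size."""
    xs = [d for d, _ in marker]
    ys = [math.log10(bp) for _, bp in marker]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_size_bp(distance_mm, slope, intercept):
    """Interpolate the size of an unknown band from its migration distance."""
    return 10 ** (slope * distance_mm + intercept)

marker = [(10, 10_000), (30, 1_000), (50, 100)]  # hypothetical marker lane
slope, intercept = fit_size_marker(marker)
print(round(estimate_size_bp(40, slope, intercept)))  # 316
```

A band that has migrated farther gets a smaller estimated size, consistent with the sieving behaviour described above. Extrapolating beyond the outermost marker bands is much less reliable than interpolating between them.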
The use of control DNA of known size is indispensable. It produces distinctive bands in the agarose gel for which the length of the DNA fragment in each band is known, in effect calibrating the measurement of the bands produced by the samples. Note that a minimum mass of 5 × 10⁻⁹ g (5 ng) of DNA is required to produce a visible band by this method. Following gel electrophoresis, the gel is treated with alkali to denature the DNA molecules and render them single-stranded. Next, a nitrocellulose filter is gently placed on top of the agarose gel and covered with a stack of absorbent paper.
This absorbent paper draws water out of the gel by capillary action, carrying the DNA fragments onto the nitrocellulose, where they adhere with their relative positions maintained. The filter is then treated so that the single-stranded DNA fragments become permanently fixed in position. Next, the filter is incubated with a solution containing a denatured probe. This is a short length of single-stranded DNA or RNA whose base sequence is identical to one strand of the target DNA fragment (e.g. a specific polymorphism) and therefore complementary to the other strand. The probe is often radioactively labelled, e.g. with the isotope ³²P, so that the band containing the DNA variant of interest can be located by placing the filter in contact with X-ray film. The result is an autoradiograph revealing a unique pattern of bands of different lengths of polymorphic DNA in each individual, after samples of their genetic material have been cleaved by restriction endonucleases.
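Which bands appear on the autoradiograph depends only on base pairing between the probe and the immobilised fragments. A minimal Python sketch can test whether a single-stranded probe is complementary to either strand of a fragment. It assumes each double-stranded fragment is written as its top strand 5′→3′, and it ignores partial matches and hybridisation stringency. The probe and fragment sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def hybridises(probe, fragment):
    """True if the probe can base-pair with either strand of the fragment.

    A probe identical to part of the top strand pairs with the bottom strand;
    a probe whose reverse complement is in the top strand pairs with the top strand."""
    return probe in fragment or reverse_complement(probe) in fragment

def detected_bands(probe, fragments):
    """Lengths (bp) of the fragments that the labelled probe would reveal."""
    return [len(f) for f in fragments if hybridises(probe, f)]

fragments = ["AAAAGAATTCCC", "GGGTTTCCCAAAGG"]  # hypothetical blotted fragments
print(detected_bands("TTTCCC", fragments))      # [14]
print(detected_bands("CTTTT", fragments))       # [12]
```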
In the above diagram (Fig. 4), the simple tandem repeat polymorphism has 5 different 'alleles', which can be distinguished by Southern blotting using a probe complementary to the tandem repeat sequence. This locus is therefore said to have multiple 'alleles' within the population, although one chromosome can carry only one 'allele' and an individual's genotype can include at most two different 'alleles'. Nevertheless, a large number of alleles in the population produces a large number of possible genotypes. Because of this broad variation within a population, STRPs are highly useful for establishing an individual's identity in DNA fingerprinting (or typing). Once variations at polymorphic sites have been distinguished among individuals, how can a particular polymorphism be shown to contribute to a particular genetic disease or trait?
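The claim that many alleles give many genotypes follows from simple combinatorics. With n alleles at a locus there are n(n + 1)/2 possible unordered diploid genotypes: n homozygous and n(n - 1)/2 heterozygous. For the 5-allele STRP in Fig. 4 this gives 15 genotypes. The Python sketch below enumerates them. The allele labels are hypothetical. The multi-locus count assumes the loci are inherited independently, which is an assumption for illustration and is not stated in the text.

```python
from itertools import combinations_with_replacement
from math import prod

def possible_genotypes(alleles):
    """All unordered diploid genotypes: an individual carries at most two alleles."""
    return list(combinations_with_replacement(alleles, 2))

def genotype_count_across_loci(alleles_per_locus):
    """Number of multi-locus genotypes, assuming independently inherited loci."""
    return prod(n * (n + 1) // 2 for n in alleles_per_locus)

alleles = ["A1", "A2", "A3", "A4", "A5"]     # hypothetical STRP allele labels
genotypes = possible_genotypes(alleles)
heterozygous = [g for g in genotypes if g[0] != g[1]]
print(len(genotypes), len(heterozygous))     # 15 10
print(genotype_count_across_loci([5, 5, 5])) # 3375
```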
When one seeks to identify a gene responsible for a disease, the disease-causing gene must be found among tens of thousands of gene loci, unless there are gross deletions or other obvious changes that identify the damaged gene in patients. Early estimates put this figure at approximately 80,000 loci, whereas current estimates are closer to 20,000 protein-coding genes. This problem typically arises when a mutation (polymorphism) has a known effect on the organism's phenotype and there is some idea of the gene's location on a chromosome, yet no knowledge of the actual gene or of the altered protein encoded by it.
In such cases, the value of DNA polymorphisms becomes evident. They help locate and identify disease genes through genetic linkage, the tendency for genes that are sufficiently close together on a chromosome to be inherited together. For instance, if restriction polymorphisms occur at random throughout the genome, some should lie near any particular target gene. The closer a marker lies to the disease-causing gene, the less likely it is that recombination will occur between the two. Any gene that is rarely or never separated from a particular polymorphic marker by recombination is therefore a candidate for the disease locus.
So, suppose the inheritance of a disease is traced through a family by observing individuals' phenotypes, together with the inheritance of particular polymorphic markers. One can then infer that markers which tend to be inherited together with the disease in the pedigree must lie close to the disease gene on the same chromosome. The closer the marker, the stronger its association with the disease gene. Once such a relationship is established, the location of the gene can be identified, allowing its mutation to be examined and its functions studied.
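The strength of co-inheritance between a marker and a disease is conventionally summarised by two quantities. The first is the recombination fraction θ, the proportion of meioses in which the marker and the disease allele are separated. The second is a LOD score, the base-10 logarithm of the likelihood of the data at θ relative to free recombination (θ = 0.5). The minimal Python sketch below assumes phase-known meioses, each scored as recombinant or non-recombinant, and the counts are hypothetical. By convention, a LOD score of 3 or more is usually taken as evidence of linkage.

```python
import math

def lod_score(recombinants, nonrecombinants, theta):
    """LOD = log10[theta^R * (1 - theta)^N / 0.5^(R + N)] for phase-known meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    lod = (recombinants + nonrecombinants) * math.log10(2)  # divides by 0.5^(R+N)
    if recombinants:
        lod += recombinants * math.log10(theta)
    lod += nonrecombinants * math.log10(1 - theta)
    return lod

def best_theta(recombinants, nonrecombinants):
    """Maximum-likelihood recombination fraction, capped at 0.5."""
    return min(recombinants / (recombinants + nonrecombinants), 0.5)

# Hypothetical pedigree: 10 scorable meioses, 1 recombinant between marker and disease.
r, n = 1, 9
theta = best_theta(r, n)
print(theta, round(lod_score(r, n, theta), 2))  # 0.1 1.6
print(round(lod_score(0, 10, 0.0), 2))          # 3.01
```

The second case shows why markers that are never separated from the disease (no recombinants) give the strongest support for a nearby disease gene.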
In this way, DNA polymorphism markers in different individuals, made visible through techniques such as gel electrophoresis and Southern blotting, allow the genetic mapping of DNA mutations that cause particular diseases or traits. Once the specific disease-causing gene and its exact position on a chromosome have been determined, investigating the nature and function of the gene requires the relevant length of mutated DNA to be isolated and propagated. Copies of the various polymorphisms can then be studied further. This is where the polymerase chain reaction (PCR) is valuable.
PCR exploits the natural function of enzymes known as DNA polymerases, whose cellular role is to copy genetic material, in many cases while proofreading it and correcting misincorporated bases. Carried out in the laboratory, the polymerase chain reaction allows vast quantities of a particular DNA sequence to be amplified in vitro. The target DNA may first be cut from a genome using restriction endonucleases, although this is not essential, because the primers themselves define the region to be amplified. A vast excess of oligonucleotide primers, usually 18-22 nucleotides in length and complementary to the ends of the DNA sequence to be amplified, is added to a solution containing the DNA segments.
Heat-stable DNA polymerases are obtained from thermophilic organisms such as the bacterium Thermus aquaticus (Taq), which inhabits hot springs whose temperatures can exceed 90 °C. The polymerase, together with a surplus of all four deoxyribonucleoside triphosphates (dNTPs), is also added to the solution of template DNA molecules. The temperature of the mixture is then raised to around 95 °C to disrupt the hydrogen bonding in the DNA duplex, so that the two strands separate. The temperature is then lowered to approximately 50-60 °C, permitting the primers, present in excess, to anneal to the separated template strands. Note that a region of duplex DNA in the reaction mixture can be amplified by PCR only if it is flanked by binding sites for these primers. It is also important to understand that the two primers, although different from each other, are complementary to sequences on opposite strands that flank the DNA region to be amplified. The primers are oriented with their 3′ ends pointing towards the region to be amplified, because DNA polymerase extends each strand only at its 3′ end.
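The orientation rule can be checked with a minimal Python sketch, writing the template as its top strand 5′→3′. The forward primer is identical to the top strand, so it anneals to the bottom strand and is extended rightwards. The reverse primer is identical to the bottom strand, so its reverse complement appears further along the top strand. The 6-nucleotide primers and the template are hypothetical and far shorter than the 18-22-nucleotide primers used in practice. Mismatches are not tolerated.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def pcr_product(template, forward, reverse):
    """Top-strand sequence of the amplicon, or None if the primers do not face each other."""
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse_site)]

template = "TTTT" + "ACGTAC" + "GGGCCCAAA" + "GTCCAT" + "TTTT"  # hypothetical
print(pcr_product(template, "ACGTAC", "ATGGAC"))  # ACGTACGGGCCCAAAGTCCAT
print(pcr_product(template, "ACGTAC", "GTCCAT"))  # None (reverse primer in wrong orientation)
```

The second call fails because a reverse primer copied directly from the top strand would point its 3′ end away from the target region.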
To complete the cycle, the temperature is raised slightly, to about 70 °C, for extension of the primers by DNA polymerase, using the original strands as templates. One cycle is typically completed in 1-3 minutes, and as a result the amount of target DNA in the mixture doubles. To begin the second cycle, the temperature is raised once again to denature all DNA duplexes present in solution. As successive cycles of denaturation, primer annealing and extension occur, the original parent strands are increasingly diluted by the proliferation of new daughter strands.
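Each cycle at best doubles the number of target molecules, so the yield after n cycles is N0 × 2^n. If only a fraction E of molecules is copied per cycle, the yield is N0 × (1 + E)^n. The efficiency parameter and the 30-cycle example below are assumptions for illustration; the 1-3 minutes per cycle comes from the text. Real reactions also slow down in later cycles as primers and dNTPs are used up, which this sketch ignores.

```python
def copies_after(cycles, initial_copies=1, efficiency=1.0):
    """Target copies after `cycles`; efficiency=1.0 means perfect doubling each cycle."""
    return initial_copies * (1 + efficiency) ** cycles

def parent_strand_fraction(cycles, efficiency=1.0):
    """Share of all target strands that are original parent strands."""
    return 1 / copies_after(cycles, 1, efficiency)

def run_time_minutes(cycles, minutes_per_cycle):
    """Total cycling time, ignoring any initial or final hold steps."""
    return cycles * minutes_per_cycle

print(copies_after(30))                                  # 1073741824.0
print(parent_strand_fraction(10))                        # 0.0009765625
print(run_time_minutes(30, 1), run_time_minutes(30, 3))  # 30 90
```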
Although it was invented only in 1983, by the Californian scientist Kary B. Mullis, who was subsequently awarded the 1993 Nobel Prize in Chemistry for this revolutionary contribution, polymerase chain reaction amplification has dramatically transformed genomic science. The extreme sensitivity of PCR has led to its use in DNA typing in criminal cases where only a trace amount of template DNA has been salvaged from sparse biological material. PCR is also used to investigate the molecular basis of each mutation and to study DNA sequence variation among polymorphisms of a gene present in natural populations.
PCR is also used to amplify genes that code for the same or very similar functions in different species. The study of polymorphisms together with PCR may yield predictive tests that enable doctors to determine whether a person is predisposed to common disorders not customarily considered genetic, such as cancers or coronary heart disease. For example, PCR analysis of cells shed in the faeces or circulating in the blood of patients may allow premalignant changes, or a newly developing tumour, to be detected earlier than would otherwise be possible.
A disadvantage of PCR is that contamination of a sample with extraneous genetic material generates numerous copies of irrelevant DNA, which may lead to erroneous conclusions. Preventing contamination is a particular challenge for laboratories in human applications, such as law or medicine, where lives are affected by the results. In cell-based cloning, cloned DNA sequences can approach two megabases in size. PCR, by contrast, can amplify only fragments at the lower end of the size range, typically about 0.1-5 kb long.
Amplification of longer fragments is usually accompanied by a decrease in Taq polymerase efficiency. The fidelity of DNA replication in vivo is, in contrast, extremely high: on average, only about one base in 3 × 10⁹ is incorrectly copied. This fidelity is due in part to proofreading by DNA polymerases. DNA polymerases strictly require the 3′-hydroxyl end of a correctly base-paired primer strand as the substrate for extension. A primer end with a mismatched terminal base does not present a properly positioned "free" 3′ hydroxyl and is not extended. Proofreading polymerases remove the mismatched nucleotide with their 3′→5′ exonuclease activity before synthesis continues.
Taq DNA polymerase, however, lacks the 3′→5′ exonuclease activity that confers this proofreading function, so its error rate in vitro is high: 40% of PCR-synthesised strands differ from an original 1 kb sequence after 20 cycles. Alternative heat-stable DNA polymerases, such as that of the archaeon Pyrococcus furiosus [4], do possess 3′→5′ exonuclease proofreading activity. With such an enzyme, the same 1 kb segment of DNA would show changes in only 3.5% of the amplified copies after 20 cycles [5].
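The two quoted figures can be converted into approximate per-base error rates using a simple model, which is an assumption here and is not taken from the cited sources. The model treats each of d cycles as a full doubling and assumes errors occur independently. The chance that a copy of an L-bp sequence carries at least one error is then 1 - (1 - f)^(L·d), where f is the error rate per base per duplication. Under these assumptions the Taq figure corresponds to f of roughly 2.6 × 10⁻⁵ and the P. furiosus (Pfu) figure to roughly 1.8 × 10⁻⁶, a difference of more than tenfold.

```python
def per_base_error_rate(fraction_mutated, length_bp, doublings):
    """Solve 1 - (1 - f)**(length_bp * doublings) = fraction_mutated for f."""
    return 1 - (1 - fraction_mutated) ** (1 / (length_bp * doublings))

def fraction_mutated(error_rate, length_bp, doublings):
    """Expected fraction of copies carrying at least one error under the same model."""
    return 1 - (1 - error_rate) ** (length_bp * doublings)

taq = per_base_error_rate(0.40, 1000, 20)   # Taq: 40% of 1 kb copies changed after 20 cycles
pfu = per_base_error_rate(0.035, 1000, 20)  # P. furiosus enzyme: 3.5%
print(f"{taq:.1e} {pfu:.1e} {taq / pfu:.0f}")  # 2.6e-05 1.8e-06 14
```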
One of the main disadvantages of PCR is that prior sequence information is required to design the primer oligonucleotides. The DNA region of interest should normally have been at least partially characterised beforehand, using restriction enzymes, electrophoresis and Southern blotting, and sometimes cell-based cloning in genetically engineered organisms. Southern blotting is itself limited by the need for known sequences from which to synthesise particular probes. This is not a severe handicap for well-studied organisms, e.g. domesticated, selectively bred animals, because research materials and sequence information are readily available. For less documented organisms, however, polymorphism-based genetic analysis of traits within a population can be carried out using a technique called random amplified polymorphic DNA (RAPD). RAPD uses a set of PCR primers 8-10 nucleotides long whose sequences are effectively random. The genomic DNA of an organism is extracted and amplified with these arbitrary primers, used either singly or in pairs.
Because the primers are short, they often anneal at multiple sites along an individual's DNA. Primers that anneal in the correct orientation and at a suitable distance from each other amplify the unknown DNA sequences between them. The set of sequences amplified from a genome varies from organism to organism within a species, revealing the presence or absence of polymorphic sites within a population. The RAPD products can then be separated by gel electrophoresis and analysed after staining with ethidium bromide.
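The logic of RAPD can be sketched in Python under several simplifying assumptions. The sketch uses a single hypothetical 10-nucleotide primer and accepts exact matches only. It records a product wherever the primer sits on the top strand with a reverse-complement site downstream within a maximum length, set here to an assumed 3,000 bp. Comparing the products from two hypothetical individuals then sorts bands into those shared by everyone (monomorphic) and those that vary (polymorphic). The example genomes differ by a single base in one primer site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_all(seq, sub):
    """Start positions of every (possibly overlapping) exact occurrence of sub."""
    hits, pos = [], seq.find(sub)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(sub, pos + 1)
    return hits

def rapd_products(genome, primer, max_len=3000):
    """Sizes (bp) of products between facing primer sites, genome given as top strand."""
    rc = reverse_complement(primer)
    starts = find_all(genome, primer)                    # primer extends rightwards
    ends = [p + len(rc) for p in find_all(genome, rc)]   # primer bound to the other strand
    return sorted({e - s for s in starts for e in ends
                   if s + len(primer) <= e - len(rc) and e - s <= max_len})

def score_bands(profiles):
    """Split bands into (monomorphic, polymorphic) across individuals' band sets."""
    all_bands = set().union(*profiles.values())
    shared = set.intersection(*profiles.values())
    return sorted(shared), sorted(all_bands - shared)

PRIMER = "GTTGCGATCC"                  # hypothetical arbitrary 10-mer
SITE_RC = reverse_complement(PRIMER)   # "GGATCGCAAC"
ind1 = "A" * 5 + PRIMER + "T" * 30 + SITE_RC + "T" * 20 + PRIMER + "T" * 40 + SITE_RC + "A" * 5
ind2 = ind1[:125] + "GGATCGCAAT" + ind1[135:]  # an SNP destroys the last site in ind2

profiles = {"ind1": set(rapd_products(ind1, PRIMER)),
            "ind2": set(rapd_products(ind2, PRIMER))}
print(rapd_products(ind1, PRIMER), rapd_products(ind2, PRIMER))  # [50, 60, 130] [50]
print(score_bands(profiles))  # ([50], [60, 130])
```

Real RAPD amplification favours shorter products and depends on reaction conditions, so this sketch shows only why a single-base change can make bands appear or disappear.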
When band patterns are compared between individuals of a species, bands amplified in all individuals are deemed monomorphic, whereas bands that differ between individuals reveal RAPD polymorphisms. Note that discussions of variation in such organisms very often refer to the band patterns observed after genetic analysis rather than to differing physiological or morphological characteristics. In conclusion, variation in single nucleotide polymorphisms and variable number tandem repeats among members of a population can be measured by applying restriction enzymes.
The resulting restriction fragment length polymorphisms can then be used as genetic markers to help identify hereditary predispositions to certain diseases. The polymerase chain reaction allows a small sample of a desired DNA sequence to be amplified so that many copies are available for further investigation of the gene. Variations in 'alleles' between individuals can be clearly visualised on an autoradiograph after gel electrophoresis and Southern blotting (devised by Edwin Southern) have been carried out on the desired DNA fragments produced by restriction endonucleases.