Genotyping with Molecular Markers: Scoring a Molecular Marker on an Agarose Gel

Authors:

Heather Merk, The Ohio State University; Deana Namuth-Covert, University of Nebraska-Lincoln; Matthew Robbins, The Ohio State University

This page teaches users how to genotype a molecular marker, how to organize genotypic data for analysis with JoinMap and MapMaker software, and how to test whether genotypic data fit an expected segregation pattern using the chi-square test. Sample data are provided.

Learning Objectives

At the end of this lesson you should:

  • Be familiar with the conventional layout of an agarose gel photo;
  • Be able to score genotypic data; and
  • Be able to organize genotypic data in a Microsoft Excel spreadsheet.

Introduction

The purpose of this article is to provide an example of how to genotype individual tomato plants with a molecular DNA marker. Several molecular marker systems are available to assist plant breeding programs. For this lesson, the example marker is a cleaved amplified polymorphism (CAP) marker, also widely known as a cleaved amplified polymorphic sequence (CAPS) marker, a type of marker that is often visualized by gel electrophoresis. Briefly, a CAP marker exploits differences in DNA sequence between two polymerase chain reaction (PCR) products based on the presence or absence of a restriction enzyme cutting site within that segment of DNA. To genotype a CAP marker, the segment of DNA is amplified using PCR and then cut with a restriction enzyme (a step referred to as digestion, or restriction enzyme digestion), which cuts only at a specific DNA sequence. After digestion, the DNA fragments are separated on an agarose gel. CAP markers are designed so that the restriction enzyme will cut the DNA of one genotype but not another.

Although different breeding schemes can be used, in this particular case the individual plants are from an F2 population that is segregating for the marker. In any breeding program, a marker is useful only if it segregates in the plant population being studied.

CAP markers are generally visualized using gel electrophoresis. When scoring any molecular DNA marker using gel electrophoresis, keep the following considerations in mind:

  1. Include a molecular weight ladder. This is like a DNA size ruler that contains DNA fragments of known molecular weight in base pair length (Fig. 1). Since many markers are scored based on their molecular weight in DNA base pairs (bp), this ladder is essential to determine the molecular weight of each band in a gel.
  2. Include controls. In addition to the individuals being genotyped, individuals of known genotype (often the parents of the population) should be included so that the correct bands can be identified and scored in the population.
  3. Know the characteristics of the molecular DNA marker in the germplasm you are using. Important attributes include the expected banding pattern (one band or multiple bands), the molecular weight of each segregating band, whether the marker is dominantly or codominantly inherited, and so forth.

All these considerations will make it easier to score a marker from a gel photo. Next we will follow a specific CAP marker example in a tomato breeding program.

Genotyping Example

The gel photo below (Fig. 1) shows a CAP marker, CosOH57, genotyped in 30 individuals that were part of a larger F2 population developed from the parents OH88119 and 06.8068. The population was developed as part of a breeding project to incorporate bacterial spot resistance into elite germplasm. To score the gel, the bands are evaluated based on the considerations listed above:

  1. Molecular weight ladder. The ladder is in lane 1 and is a 100 base pair ladder.
  2. Controls. The parents of the cross are included in the gel photo in lanes 2 and 3; they provide a reference for the F2 plants. Notice the difference in banding patterns between the two parents, with OH88119 showing a band at 216 bp and 06.8068 showing two bands, one at 145 bp and another at 71 bp. These are the bands we will follow in the 30 F2 progeny (Fig. 1).
  3. Marker characteristics. As described above, CAP markers must be amplified using PCR and then digested with a restriction enzyme. In this case, the PCR products for the parents OH88119 and 06.8068 have the same molecular weight (216 bp). However, after digestion with the restriction enzyme Tth111I, the PCR product from OH88119 is not cut (and remains 216 bp long), whereas the PCR product from 06.8068 is cut into two pieces of 145 and 71 bp. Like most CAP markers, CosOH57 is codominant. In heterozygous individuals, the OH88119 allele is not digested, producing the 216 bp band, while the 06.8068 allele produces the two smaller bands, so all three bands are present after digestion.
  4. Population. The individuals in lanes 4 through 33 are part of an F2 population derived from crossing OH88119 and 06.8068. The 30 F2 individuals genotyped with CosOH57 should segregate in a 1:2:1 ratio (homozygous for parent A allele : heterozygous : homozygous for parent B allele). Think of it as selfing an Aa F1 (Aa x Aa) to give 1 AA : 2 Aa : 1 aa in the F2 generation.
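
The 1:2:1 expectation can be reproduced by enumerating the gametes of a selfed F1. The sketch below is illustrative only: the function name `f2_ratio` and the allele labels "A" and "a" are assumptions made for this example and are not part of any mapping software.

```python
from collections import Counter
from itertools import product


def f2_ratio(f1_genotype=("A", "a")):
    """Enumerate all gamete pairings from selfing an F1 and count F2 genotypes.

    The allele labels are hypothetical placeholders for the two parental
    alleles of a codominant marker.
    """
    counts = Counter()
    # Each F1 parent contributes one of its two alleles with equal probability,
    # so the four egg/pollen pairings are equally likely.
    for egg, pollen in product(f1_genotype, repeat=2):
        # Sort so that "Aa" and "aA" are counted as the same heterozygote.
        genotype = "".join(sorted(egg + pollen))
        counts[genotype] += 1
    return dict(counts)


if __name__ == "__main__":
    print(f2_ratio())  # counts of AA, Aa, and aa among the four pairings
```

Of the four equally likely pairings, one is AA, two are Aa, and one is aa, which is the 1:2:1 ratio expected in the F2 generation.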


Figure 1. Example gel photo of CAP marker CosOH57. The gel includes a DNA ladder, the parental genotypes (OH88119 and 06.8068), and 30 F2 individuals. Photo credit: Matthew Robbins, The Ohio State University.

Scoring the Gel

Knowing the information outlined above, the gel can be scored. Most computer programs that use marker data in subsequent analyses have a specified data format. For segregating populations, many programs code the data in relation to the parents. For example, JoinMap and MapMaker, two programs that are commonly used for mapping, code genotypes from an F2 population as follows:

Table 1: Genotype codes for an F2 population.
Code    Genotype
A       homozygous for parent 1 allele
B       homozygous for parent 2 allele
H       heterozygous
C       not genotype A (dominant B allele, so could be a genotype like parent 2 or heterozygous)
D       not genotype B (dominant A allele, so could be a genotype like parent 1 or heterozygous)
“.”     genotype unknown (missing data)

Keep in mind the following when scoring the genotypes:

  1. The determination of which parent is “parent 1” or “parent 2” is arbitrary. BUT the parental designation MUST be consistent for all markers scored on the same population. In this example, OH88119 is parent 1 (coded as A) for CosOH57, so OH88119 MUST also be parent 1 for all other markers on this population.
  2. The A, B, and H codes are applied to codominant markers, while A and C (parent 2 allele is dominant) or B and D (parent 1 allele is dominant) codes are for dominant markers.
  3. It is also important to code for unknown or missing data—a period, in this example.

Using the genotypic codes, each individual tomato plant is scored (Fig. 1). In the example we are following, CosOH57 is a codominant marker, so the 30 F2 individuals are coded as “A” when only the 216 bp band is present, “B” when a plant has both the 145 and 71 bp bands present, or “H” when all three bands are showing for an individual tomato plant.
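
The scoring rule for CosOH57 can be written as a small lookup. This is a minimal sketch, assuming the band sizes in each lane have already been read from the gel. The function name `score_lane` is hypothetical, and recording any unexpected pattern as missing data is an assumption of this example, not a rule stated by the mapping software.

```python
# Band sizes (bp) for CosOH57 after Tth111I digestion, taken from the text.
UNCUT = {216}      # OH88119 allele (parent 1)
CUT = {145, 71}    # 06.8068 allele (parent 2)


def score_lane(bands):
    """Return the F2 genotype code for one lane given its observed band sizes."""
    bands = set(bands)
    if bands == UNCUT:
        return "A"   # only the 216 bp band: homozygous for parent 1 allele
    if bands == CUT:
        return "B"   # 145 and 71 bp bands: homozygous for parent 2 allele
    if bands == UNCUT | CUT:
        return "H"   # all three bands: heterozygous
    # Assumption: no bands or a partial pattern is recorded as missing data.
    return "."
```

For example, `score_lane([216, 145, 71])` returns "H", and a lane with no visible bands returns ".".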

Genotypic scores can also be coded by the molecular weight of the fragment. This is useful when genotyping a set of individuals without common parents, and especially if multiple alleles of the marker are present. In this simpler CosOH57 example, using the molecular weight scoring method, parent 1 would be scored as “216” and parent 2 could be scored as either “145” or “71.”

Organizing Genotypic Data

Once the molecular marker is scored, it is useful to organize the data in a spreadsheet or table format. This allows data from other markers genotyped in the same population to be combined in preparation for mapping or other analyses. The individual genotypes for CosOH57 have to be reorganized into a table with markers as rows and individual plant genotypes as columns (Table 2). It is important that “F2 Plant #1” is always the same plant, no matter the particular marker being genotyped. This is a common format for mapping software. The rows for Marker2 and Marker3 indicate that genotypic data can be added for additional markers. Although parental genotypes are not included in mapping analysis, it is useful to keep them with the data for reference.

Table 2. Table with genotypic data organized with markers as rows and individual genotypes as columns.
Marker    OH88119   06.8068   F2 Plant 1   F2 Plant 2   F2 Plant 3   …
CosOH57   A         B         A            A            H            …
Marker2
Marker3
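
The same layout can be produced programmatically. The sketch below is illustrative: it builds a markers-as-rows table and writes it as comma-separated text that a spreadsheet program can open. The names `to_marker_rows` and `to_csv_text` are hypothetical, only the scores shown in Table 2 are included, and the output is a generic table rather than the exact input format of JoinMap or MapMaker.

```python
import csv
import io

# Scores for the parents and the first three F2 plants are taken from Table 2;
# the remaining plants are omitted here rather than guessed.
scores = {
    "CosOH57": {"OH88119": "A", "06.8068": "B",
                "F2 Plant 1": "A", "F2 Plant 2": "A", "F2 Plant 3": "H"},
}
columns = ["OH88119", "06.8068", "F2 Plant 1", "F2 Plant 2", "F2 Plant 3"]


def to_marker_rows(scores, columns):
    """Build a header row plus one row per marker, filling gaps with '.'."""
    rows = [["Marker"] + columns]
    for marker, calls in scores.items():
        # Using the same column order for every marker keeps each plant in
        # the same column regardless of the marker being genotyped.
        rows.append([marker] + [calls.get(col, ".") for col in columns])
    return rows


def to_csv_text(rows):
    """Serialize the rows as comma-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv_text(to_marker_rows(scores, columns)))
```

Fixing the column order once and reusing it for every marker is what guarantees that “F2 Plant #1” always refers to the same plant.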
           

Data Verification by Chi-square Test

Data summaries are also useful for checking whether the data collected seem reasonable based on what you expect for a particular population, or whether something else may be going on. For example, the marker may be linked to a trait under selection, or forces such as natural selection may be distorting the expected segregation pattern. In our example, we may want to verify that the CosOH57 marker genotypes segregate as expected (1:2:1) using a chi-square goodness-of-fit test (note: for a refresher on how to use chi-square, you may want to take a look at the chi-square lesson). The data for the gel photo above, not including the parents, are summarized in Table 3. The observed column is determined simply by counting the number of individual plants with each genotype. The expected number of each genotype is calculated by multiplying the expected frequency of the genotype by the total number of plants being genotyped:

Expected = Expected Frequency x Total

The expected frequency is determined based on the segregation ratio of 1:2:1 for our F2 population, which corresponds to frequencies of 0.25 : 0.5 : 0.25. Thus, the expected number of plants with the “A” genotype for CosOH57 is:

Expected “A” Genotype = Expected Frequency of “A” Genotype x Total Number of F2 Plants Being Genotyped

or

Expected “A” Genotype = 0.25 x 30 = 7.5

The expected frequencies and number of each genotype are also presented in Table 3.

Table 3: Summary of the CosOH57 F2 gel data.
Genotype   Observed   Expected frequency   Expected
A          13         0.25                 7.5
B          7          0.25                 7.5
H          10         0.5                  15
Total      30         1                    30

When the observed and expected numbers are used in a chi-square goodness-of-fit test (chi-square = 5.73 with 2 degrees of freedom), the calculated p value is 0.057. Since this p value is slightly greater than 0.05, a common threshold for declaring significance, we do not reject the hypothesis that CosOH57 segregates in the expected 1:2:1 ratio, although the result is borderline. Closer inspection of the data indicates that the observed frequency of genotype “A” may be higher than expected, while that of the “H” genotype may be lower than expected. Additional caution should be exercised because the relatively small number of F2 individuals makes this chi-square test difficult to interpret. Ideally, statisticians recommend genotyping at least 50 individuals from an F2 population.
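
The calculation can be reproduced with a few lines of Python. This is a minimal sketch using the counts in Table 3; the function name `chi_square_121` is hypothetical. It relies on the fact that, for 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2), so no statistics library is needed. A test with a different number of classes would need a general chi-square routine instead.

```python
import math


def chi_square_121(obs_a, obs_h, obs_b):
    """Chi-square goodness-of-fit test of F2 counts against a 1:2:1 ratio.

    Returns (chi_square, p_value). Three genotype classes give 2 degrees of
    freedom, for which the p value is exactly exp(-chi_square / 2).
    """
    total = obs_a + obs_h + obs_b
    observed = [obs_a, obs_h, obs_b]
    # Expected = expected frequency x total number of plants.
    expected = [0.25 * total, 0.50 * total, 0.25 * total]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi_square / 2)
    return chi_square, p_value


if __name__ == "__main__":
    chi_square, p_value = chi_square_121(obs_a=13, obs_h=10, obs_b=7)
    print(f"chi-square = {chi_square:.3f}, p = {p_value:.3f}")
```

With the Table 3 counts, the largest contribution to the statistic comes from the excess of “A” genotypes (13 observed against 7.5 expected).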

Conclusion

In this tutorial we learned how to genotype a CAP marker that was scored in an F2 population. The principles we used apply to any other molecular marker that we may genotype, particularly molecular markers genotyped on a gel. These general principles also apply to other plant breeding schemes. We also learned how to organize data so that we can use it for genetic mapping. Finally, we learned how to perform a chi-square analysis as an additional test to help us determine the reliability of a specific marker in our breeding population.

Funding Statement

Development of this page was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

PBGworks 659

Gel Electrophoresis Principles and Applications

Author:

Matthew Robbins, The Ohio State University

This module introduces gel electrophoresis principles and applications for genetics and plant breeding in text, animation, and video formats.

Introduction

Gel electrophoresis is commonly used in plant breeding and genomics for genotyping with molecular markers, but it has several other applications as well (see below). In marker genotyping, for example, specific DNA fragments used as markers are amplified from the DNA of individual plants by the polymerase chain reaction (PCR), and the resulting DNA fragments are loaded on a gel. The gel is a solid, gelatin-like substance used to separate DNA fragments based on size. The gel is placed in a conductive salt buffer to which an electrical field is applied. As the negatively charged DNA fragments migrate toward the positive pole, the gel acts as a size filter, with smaller fragments migrating faster than larger fragments.
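
Because smaller fragments migrate farther, the size of an unknown band can be estimated by comparing its migration distance with bands of known size in a ladder. The sketch below is illustrative only. It assumes the commonly used approximation that migration distance is roughly linear in the logarithm of fragment size over the range of the ladder, and it interpolates between the two ladder bands that bracket the unknown band. The function name `estimate_size` and the migration distances are hypothetical, not measured data.

```python
import math


def estimate_size(distance_mm, ladder):
    """Estimate fragment size (bp) from migration distance.

    `ladder` is a list of (size_bp, distance_mm) pairs. The estimate
    interpolates linearly in log10(size) between the two ladder bands
    that bracket the unknown band.
    """
    # Sort by distance migrated: larger fragments travel shorter distances.
    points = sorted(ladder, key=lambda point: point[1])
    for (size_near, d_near), (size_far, d_far) in zip(points, points[1:]):
        if d_near <= distance_mm <= d_far:
            fraction = (distance_mm - d_near) / (d_far - d_near)
            log_size = math.log10(size_near) + fraction * (
                math.log10(size_far) - math.log10(size_near))
            return 10 ** log_size
    raise ValueError("band lies outside the range of the ladder")


# Hypothetical migration distances for three bands of a 100 bp ladder.
example_ladder = [(300, 20.0), (200, 30.0), (100, 45.0)]
```

A band that lies outside the ladder range raises an error instead of being extrapolated, since the log-linear approximation is least reliable at the extremes of a gel.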

Resources on Gel Electrophoresis

Several resources illustrate the process:

  • A video illustrates the basics of DNA extraction and gel electrophoresis in tomato.
  • The Plant and Soil Sciences eLibrary at the University of Nebraska-Lincoln has an informative lesson on gel electrophoresis, including an animation of the process.
  • The Dolan DNA Learning Center, part of Cold Spring Harbor Laboratory, has another animation on gel electrophoresis.
  • The Genetic Science Learning Center at the University of Utah also has an animation on gel electrophoresis.

Applications of Gel Electrophoresis

DNA can be separated by electrophoresis to:

  • Visualize bands of a molecular marker to genotype individual plants
  • Verify amplification by PCR or sequencing reactions
  • Check the quality and quantity of genomic DNA after DNA extraction
  • Separate DNA fragments to clone a specific band


PBGworks 654

Analysis of Variance for Plant Breeding

Authors:

David M. Francis, The Ohio State University; Heather L. Merk, The Ohio State University; Matthew Robbins, The Ohio State University

This page provides an introduction to the analysis of variance (ANOVA), creating and interpreting simple ANOVA tables, and common applications of ANOVA to plant breeding. ANOVA has two common types of application in a plant breeding context: (1) evaluating treatment differences and (2) partitioning variance for heritability estimates.

Introduction

The analysis of variance (ANOVA) is a statistical tool that has two common applications in a plant breeding context. First, ANOVA can be used to test for differences between treatments in an experiment. Common examples of treatments are genotype, location, and variety. Second, ANOVA can be used to aid in estimates of heritability by partitioning variances. This module focuses on simple ANOVA models to evaluate differences between treatments.

Assumptions of ANOVA

Like other statistical tests, ANOVA rests on certain assumptions. One of the principal assumptions is that the samples come from normally distributed populations, each with the same variance. In addition, it is assumed that the residuals come from a normally distributed population with equal variance (σ²). The Kruskal–Wallis test is a nonparametric alternative to ANOVA when these assumptions cannot be met.

Testing for Treatment Differences

ANOVA is a tool that can be used to test for differences among treatment means when the independent variable is categorical (e.g., genotypes could be AA, Aa, aa) and the dependent variable is continuous (e.g., yield measured in tons/acre). How does this work?

In ANOVA, the total variance of all samples is calculated. Portions of the total variance can be attributed to known causes (e.g., genotype). This leaves a residual portion of the variance that is uncontrolled or unexplained and is referred to as experimental error. The between-treatment variation (e.g., variation among the means of the AA, Aa, and aa genotypes) is then compared to the within-treatment variation, or experimental error (e.g., variation among plants within the aa genotype), to assess whether differences in mean value between treatments are due to treatment effects or to chance.

In the simplest case, linear equations can be developed to describe the relationship between a trait and treatment. The question can then be asked, “which linear equation best fits the data for each treatment?” These linear equations take the following form:

Y = µ + f(treatment) + error

where

  • Y is equal to the trait value
  • µ is the population mean
  • f(treatment) is a function of the treatment
  • error represents the residual
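
The comparison of between-treatment and within-treatment variation can be sketched as a one-way ANOVA F statistic. The example below is a minimal illustration, not the analysis used in the examples that follow: the function name `one_way_anova` is hypothetical and the yield values are invented for demonstration.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of treatment -> values."""
    all_values = [y for values in groups.values() for y in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # variation of treatment means around the grand mean
    ss_within = 0.0   # variation of observations around their treatment mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((y - group_mean) ** 2 for y in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # F is the ratio of the treatment mean square to the error mean square.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical yields (tons/acre) for three genotypes; not real data.
yields = {
    "AA": [4.0, 5.0, 6.0],
    "Aa": [5.0, 6.0, 7.0],
    "aa": [7.0, 8.0, 9.0],
}
```

A large F value indicates that the treatment means differ by more than would be expected from the within-treatment variation alone. The F value is then compared with the F distribution for the given degrees of freedom to obtain a p value.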

In this module we provide two examples of ANOVA and sample data sets to assess differences in treatment effect. In the first example, four methods of soybean transformation are evaluated to determine whether transformation method affects expression of a stress-response gene. In the second example, two molecular markers are evaluated to determine whether genotype of each molecular marker results in differences in disease severity in a BC1 population.

Conclusion

ANOVA is a statistical tool that has applications to experiments in which we want to assess whether there is a difference in a continuous variable between treatment groups. In a plant breeding context, this page demonstrated the utility of ANOVA in gene expression studies and molecular marker analysis.

Additional Resources

Many statistics textbooks provide a good discussion of theory and applications of ANOVA. A few examples are listed below.

  • Clewer, A. G., and D. H. Scarisbrick. 2001. Practical statistics and experimental design for plant and crop science. John Wiley & Sons Ltd., New York.
  • Steel, R. G. D., J. H. Torrie, and D. A. Dickey. 1997. Principles and procedures of statistics: A biometrical approach. The McGraw-Hill Companies, Inc., New York.


PBGworks 650

Cross-Pollinating Tomatoes for Hybrid Production and Population Development

Author:

David M. Francis, The Ohio State University

This video provides basic instructions for cross-pollinating tomato plants to make new hybrids or to begin the process of population development.


PBGworks 620

2010 Tomato Disease Workshop Presentations

Authors:

John McQueen, Oregon State University

Heather L. Merk, The Ohio State University

The talks at the Tomato Disease Workshop 2010 were recorded live and edited into instructional video clips. They cover various topics such as the Genome Browser, Illumina arrays, accessing sequencing resources, setting up data pipelines, and more.

All talks were recorded in front of a live in-person and online audience at the Tomato Disease Workshop 2010 in Florida.

Next Generation Sequencing. Allen Van Deynze, University of California, Davis.

Accessing Sequence Resources. David Francis, The Ohio State University.

Tomato Genome Browser (GBrowse) for Plant Breeders. Heather Merk, The Ohio State University.

BioInformatics 101. David Francis, The Ohio State University.

Working with Tomato Infinium Genotyping Data. Allen Van Deynze, University of California, Davis.

Downstream Analysis with SNP Markers. Sung-Chur Sim, The Ohio State University.


Mention of specific companies is not intended for promotional purposes.

PBGworks 995

Downstream Analysis of SNP Markers Using MSA, GGT, and STRUCTURE Software

Authors:

Sung-Chur Sim, The Ohio State University; Heather L. Merk, The Ohio State University

This webinar tutorial and the accompanying pdf and zip files were presented at the 2010 Tomato Disease Workshop. These materials outline the utility of MSA, GGT, and STRUCTURE software for downstream analysis with single nucleotide polymorphism (SNP) markers. In addition, a case study demonstrates the use of STRUCTURE for association mapping of bacterial spot resistance in tomato.

The pdf at the bottom of the page (Downstream_SNP_Analysis) is a copy of Dr. Sim’s presentation. The accompanying zip files (Downstream_Supplemental_Files) at the bottom of the page include a sample data set, an Excel file with best K analysis, and SAS code for association analysis that incorporates the Q matrix.

In the first video, Dr. Sung-Chur Sim, The Ohio State University, outlines the utility of MSA, GGT, and STRUCTURE software for downstream analysis with single nucleotide polymorphism (SNP) markers. Dr. Sim also explains where and how to download the software and how to format input data.

 

In the second video, Dr. Sim demonstrates the use of STRUCTURE for association mapping of bacterial spot resistance in tomato. Dr. Sim introduces association analysis models and emphasizes the use of the Q matrix to correct for population structure in association mapping. In addition, Dr. Sim explains the detailed steps in STRUCTURE to infer the best “K” (number of populations), which is required to obtain the Q matrix.

Find all the presentations from the 2010 Tomato Disease Workshop

References Cited

  • Evanno, G., S. Regnaut, and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Molecular Ecology 14: 2611–2620. (Available online at: http://dx.doi.org/10.1111/j.1365-294X.2005.02553.x) (verified 12 May 2012).


PBGworks 1005

Application of ANOVA for Plant Breeding: Single Marker Analysis Example using ANOVA in a Balanced Population (Sample SAS Program)

Author:

David M. Francis, The Ohio State University

This page provides a sample SAS program used to analyze molecular marker data using ANOVA for single marker analysis. Two molecular markers are evaluated to determine whether genotype of each molecular marker results in differences in disease severity in a BC1 population.

This module provides an example of using analysis of variance (ANOVA) to assess whether tomato bacterial spot resistance differs by molecular marker genotype (the treatment effect) in a BC1 population.

SAS Code

The SAS code is presented in two ways:

  • As a screenshot taken from SAS after the code has been entered (Fig. 1)
  • As plain text

Figure 1. SAS Screenshot taken after SAS code for analysis of variance in a balanced population was entered. Screenshot credit: David Francis, The Ohio State University.

Plain Text SAS Code

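/* Read the marker and phenotype data, then test each marker by one-way ANOVA. */
data map;
     /* Comma-delimited file; the data begin on line 4. */
     infile 'a:\lnkspt.csv' delimiter = ',' firstobs = 4;
     input gen vsc pop tg23 pto;
proc sort;
     by tg23;
/* Single marker analysis for marker tg23. */
proc glm;
     class tg23;
     model pop = tg23;
     means tg23 / lsd lines;
proc sort;
     by pto;
/* Single marker analysis for marker pto. */
proc glm;
     class pto;
     model pop = pto;
     means pto / lsd lines;
run;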


PBGworks 906

Equation to Estimate Sample Size Required for QTL Detection

Authors:

David M. Francis, The Ohio State University; Heather L. Merk, The Ohio State University

This page provides an equation to estimate the required sample size to detect quantitative trait loci (QTLs) of varying effect in F2 and BC1 populations.

Introduction

Population size, structure, and the number of molecular markers genotyped can have significant impacts on quantitative trait locus (QTL) detection. This page focuses on the effect population size can have on QTL detection. Prior knowledge of theories regarding the genetics controlling the trait can help guide breeders as to the number of plants required to detect QTLs of various sizes using the following equation. Solving the equation below can help breeders optimize resources while still gaining the desired outcome.

Equation to Calculate Minimum Sample Size

Using units of phenotypic standard deviations, sample sizes can be estimated using the following equations:

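For an F2 population,

$$N_{F2} = \frac{1 - r^2_{F2}}{r^2_{F2}} \times \left[ \frac{z_{(1-\alpha/2)}}{\sqrt{1 - r^2_{F2}}} + z_{(1-\beta)} \right]^2 \times \left( 1 + \frac{k^2}{2} \right)$$

For a BC1 population,

$$N_{BC1} = \frac{1 - r^2_{BC1}}{r^2_{BC1}} \times \left[ \frac{z_{(1-\alpha/2)}}{\sqrt{1 - r^2_{BC1}}} + z_{(1-\beta)} \right]^2 \times \left( 1 + \frac{k^2}{2} \right)$$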

In these equations,

  • r² is the fraction of phenotypic variance explained by the QTL. Note that r² is valid only for simple models. Mean squares provide an alternative method to estimate the fraction of phenotypic variation when models have replication and location.
  • k is the dominance coefficient (k = 0 for a completely additive trait, k = 1 for a completely dominant trait, and k = -1 for a completely recessive trait). This equation assumes that the marker and QTL are coincident, that is, located at the same position.
  • α is the type I error rate (the probability of incorrectly identifying an association that does not, in fact, exist)
  • β is the type II error rate (the probability of failing to identify a true association)
  • From normal distribution tables:
    • z(1 – (α / 2)) = 1.96 at α = 0.05
    • z(1 – β) = 1.28 at β = 0.10

For example, in an F2 population, the estimated sample size required to detect a dominant QTL that is responsible for 30% of the phenotypic variation is 46.

$$N_{F2} = \frac{1 - 0.30}{0.30} \times \left[ \frac{1.96}{\sqrt{1 - 0.30}} + 1.28 \right]^2 \times \left( 1 + \frac{1^2}{2} \right) \approx 46$$
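
The equation is easy to evaluate for other values of r² and k. The following is a minimal sketch: the function name `qtl_sample_size` and its parameter names are hypothetical, and the default z values are the ones quoted above for α = 0.05 and β = 0.10.

```python
import math


def qtl_sample_size(r2, k, z_alpha=1.96, z_beta=1.28):
    """Estimate the number of plants needed to detect a QTL.

    r2 is the fraction of phenotypic variance explained by the QTL and
    k is the dominance coefficient. The defaults correspond to
    alpha = 0.05 and beta = 0.10, as quoted in the text.
    """
    if not 0 < r2 < 1:
        raise ValueError("r2 must be between 0 and 1 (exclusive)")
    variance_term = (1 - r2) / r2
    z_term = (z_alpha / math.sqrt(1 - r2) + z_beta) ** 2
    dominance_term = 1 + k ** 2 / 2
    return variance_term * z_term * dominance_term


if __name__ == "__main__":
    # Dominant QTL (k = 1) explaining 30% of the phenotypic variation.
    print(round(qtl_sample_size(r2=0.30, k=1)))
```

Because r² appears in the denominator of the first term, the required sample size rises steeply as the effect of the QTL becomes smaller.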


PBGworks 862

Application of ANOVA for Plant Breeding: Single Marker Analysis Example using ANOVA in a Balanced Population (SAS Output)

Author:

David M. Francis, The Ohio State University

This page provides the output for the single molecular marker analysis using ANOVA example. Two molecular markers are evaluated to determine whether the genotype of each molecular marker results in differences in disease severity in a BC1 population.

This module provides an example of using analysis of variance (ANOVA) to assess differences in tomato bacterial spot resistance due to molecular marker genotype (treatment effect) in a BC1 population.


PBGworks 907

Traditional Molecular Markers

Authors:

David M. Francis, The Ohio State University

Heather L. Merk, The Ohio State University

Deana Namuth-Covert, University of Nebraska-Lincoln

This page provides a simple overview of traditional molecular markers based on DNA sequence variation and size polymorphism. Isozyme, RFLP, RAPD, AFLP, microsatellite/SSR, SCAR, and CAP markers are presented. These tools are still used in plant breeding programs, though newer molecular marker tools should also be considered when determining a particular program’s needs and resources.

Introduction

Molecular markers have great potential to assist breeders in developing improved varieties by complementing phenotypic selection. They work by measuring, either directly or indirectly, a specific DNA sequence difference between genotypes. When a marker is found to be linked to a trait of interest, it can help the breeder select more efficiently the plants or lines to move forward in the program (forward selection). Markers can also be used to improve all other trait combinations in the germplasm, either by crossing out unwanted alleles or by maintaining those of value (background selection). This module gives an overview of the traditional view of molecular markers: as bands on a gel. Traditional markers therefore measure DNA differences indirectly. With advances in DNA sequencing techniques, the cost of directly determining the sequence of a DNA fragment has dropped considerably. As a result, newer molecular marker tools are quickly becoming more common and are more effective in meeting plant breeding needs on a large scale. However, not all laboratories can currently afford to invest in the equipment required to take advantage of newer technologies, and the newer technologies may not be more efficient on a small scale. In the alternative view of molecular markers, marker data are typically presented to breeders as fluorescence intensity values; this view includes information about single nucleotide polymorphism (SNP) markers.

To understand the concepts behind any molecular marker system first requires an understanding of DNA and its structure, including the nucleotides and DNA base pairing.

Traditional View of Molecular Markers

The following list of traditional molecular markers, with their acronyms spelled out, gives some idea of the basis of each marker. It also describes the differences (polymorphisms) in DNA sequences that each marker targets.

Isozyme: By measuring variations in enzymes, isozyme analysis exploits differences in the genes that code for or regulate enzyme synthesis or activity.

RFLP (Restriction Fragment Length Polymorphism): Indirectly measures DNA sequence differences based on the varying lengths of the fragments produced when DNA is cut with restriction enzymes. These “fragment length polymorphisms” are visualized by hybridizing the cut DNA with labeled probes from DNA libraries.

RAPD (Random Amplified Polymorphic DNA): Using a large number of short DNA primers with varying sequences, this technique exploits differences in the primer binding sites, because those differences determine which DNA is amplified by the polymerase chain reaction (PCR).

AFLP (Amplified Fragment Length Polymorphism): Using restriction enzymes and a large number of short DNA primers with varying sequences, this technique exploits differences in the primer binding sites, because those differences determine which DNA is amplified by PCR.

SSR (Simple Sequence Repeat) or microsatellite: Using PCR, this technique exploits differences in short repetitive sequences (e.g., CAA vs. CAACAACAA) by using specifically designed DNA primers that bind on each side of repetitive DNA sequences.

SCAR (Sequence Characterized Amplified Region): Exploits length differences between two PCR products (not necessarily repeats) by using specifically designed DNA primers that bind on each side of a difference in DNA sequence. These markers are often created by sequencing RAPD marker PCR products and then designing primers that are more specific than those used for the original RAPD markers.

CAP (Cut/Cleaved Amplified Polymorphism): Exploits differences in DNA sequences between two PCR products based on the presence or absence of restriction enzyme cutting sites. These markers are often designed from RFLP markers.
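Two of the polymorphism types above, SSR length differences and CAP restriction site differences, can be sketched in a few lines of Python. This is a minimal illustration, not a genotyping tool: the flanking sequences and alleles are invented for the example, the function name digest is hypothetical, and EcoRI (which recognizes GAATTC and cuts after the first G) is chosen only because its recognition site is well known.

```python
# Illustrative sketch only: all sequences below are invented, and EcoRI is an
# arbitrary choice of enzyme, not one taken from a real marker.

def digest(seq, site="GAATTC", cut_offset=1):
    """Return the fragment lengths produced by cutting seq at every site.

    cut_offset is the position inside the recognition site where the enzyme
    cuts (EcoRI cuts G^AATTC, so the cut falls after the first base).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)
    return fragments

# SSR: the same primers flank a different number of CAA repeats,
# so the two PCR products differ in length.
left_flank, right_flank = "ATGCCGTTAG", "GGTACCTTGA"
ssr_allele_1 = left_flank + "CAA" * 1 + right_flank
ssr_allele_3 = left_flank + "CAA" * 3 + right_flank
ssr_lengths = (len(ssr_allele_1), len(ssr_allele_3))

# CAP: the two PCR products have the same length, but a single base change
# (GAATTC vs. GAGTTC) removes the restriction site from allele b.
cap_allele_a = "ATGCCGTTAG" + "GAATTC" + "GGTACCTTGATTCA"
cap_allele_b = "ATGCCGTTAG" + "GAGTTC" + "GGTACCTTGATTCA"
cap_a_fragments = digest(cap_allele_a)  # cut: two shorter fragments
cap_b_fragments = digest(cap_allele_b)  # uncut: one full-length fragment

print("SSR product lengths:", ssr_lengths)
print("CAP allele a fragments:", cap_a_fragments)
print("CAP allele b fragments:", cap_b_fragments)
```

On a gel, the SSR alleles would appear as bands of different sizes straight after PCR, while the CAP alleles would look identical until the digestion step separates them into cut and uncut patterns.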

Steps to Marker Detection

Each marker is detected differently, which allows us to look at the different types of variation listed above.

Isozyme: separate by starch gel and stain

RFLP: digest with restriction enzyme (RE), separate by agarose gel electrophoresis, transfer DNA to membrane, hybridize with labeled probe, visualize by autoradiography

RAPD: amplify by PCR, separate by agarose gel electrophoresis, visualize with ethidium bromide (EtBr) stain

AFLP: digest with RE, ligate to linker, amplify by PCR with labeled primer, separate by polyacrylamide gel electrophoresis, visualize by autoradiography

SSR: amplify by PCR, separate by agarose gel electrophoresis, visualize with EtBr stain

SCAR: amplify by PCR, separate by agarose gel electrophoresis, visualize with EtBr stain

CAP: amplify by PCR, digest with RE, separate by agarose gel electrophoresis, visualize with EtBr stain

Looking at the steps to marker detection can help us judge how easy or difficult it may be to genotype with a particular molecular marker. For example, RFLP markers require many steps, and they also require work that none of the other markers do, in particular creating a library of DNA or cDNA probes. This suggests that RFLPs take very specialized knowledge and laboratory skills to perform. People who have worked with RFLPs can tell you this is true! They likely prefer to work with PCR-based markers because once you have learned the PCR technique, you can work with many different types of molecular markers, and PCR does not take long. Table 1 provides more information to help us compare the different molecular markers. We know that PCR-based markers are advantageous, but the table includes information we would not necessarily know from the marker names or the detection steps alone. The additional resources listed at the end of this page provide more in-depth information about these molecular markers.
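The comparison of detection steps described above can be made concrete with a small Python sketch. The step lists are copied from the section above; the function name summarize and the idea of using the step count as a rough proxy for effort are assumptions made for illustration. The count does not capture everything (for example, it omits the probe library that RFLPs require).

```python
# Detection steps as listed above; counting them is only a rough proxy for
# laboratory effort and is an assumption made for this illustration.
detection_steps = {
    "Isozyme": ["separate by starch gel", "stain"],
    "RFLP": ["digest with RE", "separate by agarose gel electrophoresis",
             "transfer DNA to membrane", "hybridize with labeled probe",
             "visualize by autoradiography"],
    "RAPD": ["amplify by PCR", "separate by agarose gel electrophoresis",
             "visualize with EtBr stain"],
    "AFLP": ["digest with RE", "ligate to linker",
             "amplify by PCR with labeled primer",
             "separate by polyacrylamide gel electrophoresis",
             "visualize by autoradiography"],
    "SSR": ["amplify by PCR", "separate by agarose gel electrophoresis",
            "visualize with EtBr stain"],
    "SCAR": ["amplify by PCR", "separate by agarose gel electrophoresis",
             "visualize with EtBr stain"],
    "CAP": ["amplify by PCR", "digest with RE",
            "separate by agarose gel electrophoresis",
            "visualize with EtBr stain"],
}

def summarize(steps_by_marker):
    """Return (marker, number of steps, uses PCR?) tuples, fewest steps first."""
    rows = []
    for marker, steps in steps_by_marker.items():
        uses_pcr = any("PCR" in step for step in steps)
        rows.append((marker, len(steps), uses_pcr))
    return sorted(rows, key=lambda row: row[1])

summary = summarize(detection_steps)
for marker, n_steps, uses_pcr in summary:
    print(f"{marker:8s} steps={n_steps} PCR-based={'yes' if uses_pcr else 'no'}")
```

Sorting by step count puts RFLP and AFLP at the bottom of the list, which matches the point that multi-step markers carry higher technical requirements.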

Table 1. Marker systems, genetic properties, strengths, and limitations.

Isozyme
  • Type of inheritance: Co-dominant
  • PCR-based? No (based on enzyme activity)
  • Strengths: Fast relative to RFLP
  • Limitations: Limited number of loci; limited alleles per locus; protein is measured (therefore not an exact measure of genotype); tissue specificity and environmental regulation

RFLP
  • Type of inheritance: Co-dominant
  • PCR-based? No
  • Strengths: Fast; large number of loci
  • Limitations: Pre-screen for single copy sequences to be used as probes; slower than isozymes; assumption that when samples share a fragment, they share flanking cleavage sites

RAPD
  • Type of inheritance: Dominant
  • PCR-based? Yes
  • Strengths: Fast; measures phenotype in outcrossing species; multiple loci can be scored in a single reaction
  • Limitations: Sensitive to reaction conditions (reproducibility issues); assumption that when two samples share a fragment, it is the same locus

AFLP
  • Type of inheritance: Dominant
  • PCR-based? Yes
  • Strengths: Detects a large number of bands and therefore polymorphism
  • Limitations: Multi-step, therefore high technical requirements

SCAR
  • Type of inheritance: Co-dominant
  • PCR-based? Yes
  • Strengths: Fast
  • Limitations: Requires sequence data, therefore expensive to develop primers

CAP
  • Type of inheritance: Co-dominant
  • PCR-based? Yes
  • Strengths: None listed
  • Limitations: Requires restriction enzyme digestion of the PCR product, and enzymes can be expensive; requires sequence data, therefore expensive to develop primers

SSR
  • Type of inheritance: Co-dominant
  • PCR-based? Yes
  • Strengths: Fast; commercially available for some crops; detects multiple alleles
  • Limitations: Requires sequence data, therefore expensive to develop primers
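The "Type of Inheritance" column in Table 1 determines how much genotypic information a marker gives. The Python sketch below shows the difference under stated assumptions: a single locus with alleles A and a, an idealized F2 population in an exact 1:2:1 ratio (not real data), and a dominant marker that produces a band whenever at least one A allele is present. The function name score and the band labels are hypothetical.

```python
from collections import Counter

def score(genotype, inheritance):
    """Return the band pattern observed for a genotype ('AA', 'Aa', or 'aa')."""
    if inheritance == "co-dominant":
        # Each allele gives its own band, so heterozygotes show both bands.
        patterns = {"AA": "A band only", "Aa": "both bands", "aa": "a band only"}
        return patterns[genotype]
    if inheritance == "dominant":
        # Assumption: the band appears whenever at least one A allele is present,
        # so AA and Aa cannot be told apart.
        return "band present" if "A" in genotype else "band absent"
    raise ValueError(f"unknown inheritance type: {inheritance}")

# Idealized F2 population in an exact 1:2:1 ratio (illustrative, not real data).
f2_population = ["AA"] * 25 + ["Aa"] * 50 + ["aa"] * 25

codominant_classes = Counter(score(g, "co-dominant") for g in f2_population)
dominant_classes = Counter(score(g, "dominant") for g in f2_population)

print("Co-dominant marker classes:", dict(codominant_classes))
print("Dominant marker classes:", dict(dominant_classes))
```

With a co-dominant marker all three genotypes are distinguishable, whereas a dominant marker collapses the homozygous and heterozygous carriers of the band into one class, giving a 3:1 pattern instead of 1:2:1.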

Additional Resources

For a further introduction to molecular markers, see Chapter 3 (p. 45–83), Introduction to Genomics, in:

  • Liu, B. H. 1998. Statistical genomics: Linkage, mapping, and QTL analysis. CRC Press, Boca Raton, FL.

For an introduction to molecular markers, linkage mapping, QTL analysis, and marker-assisted selection written for professional plant breeders, see:

  • Collard, B.C.Y., M.Z.Z. Jahufer, J.B. Brouwer, and E.C.K. Pang. 2005. An introduction to markers, quantitative trait locus (QTL) mapping and marker-assisted selection for crop improvement: The basic concepts. Euphytica 142: 169–196. (Available online at: http://dx.doi.org/10.1007/s10681-005-1681-5) (verified 24 Mar 2012).

Funding Statement

Development of this page was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.
