Working with Infinium Genotype Data

Authors:

Allen Van Deynze, University of California, Davis; Kelly Zarka, Michigan State University

This page provides a video and an audio transcript of Dr. Allen Van Deynze’s workshop “Working with Infinium Genotype Data,” originally presented at the SolCAP workshop at the Potato Association of America meeting in August 2010. The session focuses on highly parallel genotyping tools as we move away from scoring sequence polymorphism as a “band on a gel.”

The session presents the design of the SolCAP/Illumina consortium tool, discusses the resulting data format, and covers quality control.

You can view the webinar below or at the SolCAP website.

View the webinar transcript.

 Presenter: Allen Van Deynze, University of California, Davis

Funding Statement

Development of this page was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

Mention of specific companies is not intended for promotional purposes.

PBGworks 884

Traditional Molecular Markers

Authors:

David M. Francis, The Ohio State University

Heather L. Merk, The Ohio State University

Deana Namuth-Covert, University of Nebraska-Lincoln

This page provides a simple overview of traditional molecular markers based on DNA sequence variation and size polymorphism. Isozyme, RFLP, RAPD, AFLP, microsatellite/SSR, SCAR, and CAP markers are presented. These tools are still used in plant breeding programs, though newer molecular marker tools should also be considered when determining a particular program’s needs and resources.

Introduction

Molecular markers have great potential to help breeders develop improved varieties by complementing phenotypic selection. They work by measuring, directly or indirectly, a specific DNA sequence difference between genotypes. When a marker is linked to a trait of interest, it can help the breeder select more efficiently the plants or lines to move forward in the program (foreground selection). Markers can also be used to improve other trait combinations in the germplasm, either by breeding out unwanted alleles or by maintaining those of value (background selection).

This module gives an overview of the traditional view of molecular markers, in which markers are scored as bands on a gel and therefore measure DNA differences indirectly. With advances in DNA sequencing techniques, the cost of directly determining the sequence of a DNA fragment has dropped considerably. As a result, newer molecular marker tools are quickly becoming more common and are more effective in meeting plant breeding needs on a large scale. However, not all laboratories can currently afford the equipment required to take advantage of newer technologies, and the newer technologies may not be more efficient on a small scale. In the alternative view of molecular markers, which includes single nucleotide polymorphism (SNP) markers, marker data are typically presented to breeders as fluorescence intensity values.

Understanding the concepts behind any molecular marker system first requires an understanding of DNA and its structure, including nucleotides and DNA base pairing.

Traditional View of Molecular Markers

The following list of traditional molecular markers, with their acronyms, gives some idea of the basis of each marker and describes the differences (polymorphisms) in DNA sequence that each one targets.

Isozyme: By measuring variations in enzymes, isozyme analysis exploits differences in the genes that code for or regulate enzyme synthesis or activity.

RFLP (Restriction Fragment Length Polymorphism): Indirectly measures DNA sequence differences based on the varying lengths of the DNA fragments produced by cutting DNA with restriction enzymes. These “fragment length polymorphisms” are visualized by hybridizing the cut DNA with labeled probes from DNA libraries.

RAPD (Random Amplified Polymorphic DNA): Using a large number of short DNA primers with varying sequences, this technique exploits differences in the primer binding sites, because different DNA fragments will be amplified by the polymerase chain reaction (PCR).

AFLP (Amplified Fragment Length Polymorphism): Using restriction enzymes and a large number of short DNA primers with varying sequences, this technique exploits differences in restriction sites and primer binding sites, because different DNA fragments will be amplified using PCR.

SSR (Simple Sequence Repeat) or microsatellite: Using PCR, this technique exploits differences in short repetitive sequences (e.g., CAA vs. CAACAACAA) by using specifically designed DNA primers that bind on each side of repetitive DNA sequences.

SCAR (Sequence Characterized Amplified Region): Exploits length differences between two PCR products (not necessarily repeats) by using specifically designed DNA primers that bind on each side of a difference in DNA sequence. These markers are often created by sequencing RAPD marker PCR products and then designing primers that are more specific than those used for the original RAPD markers.

CAP (Cut/Cleaved Amplified Polymorphism): Exploits differences in DNA sequence between two PCR products based on the presence or absence of restriction enzyme cutting sites. These markers are often designed from RFLP markers.
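
The SSR entry above can be illustrated with a minimal Python sketch. The flanking sequences and allele names below are invented for illustration and do not come from any real marker. The sketch builds toy templates that differ only in the number of CAA repeats and reports the length of the PCR product spanning both primer sites, which is the size difference that would be seen as bands on a gel.

```python
# Hypothetical flanking sequences; real primers are designed from sequence data.
LEFT_FLANK = "ATGGCTTACG"    # forward primer binding site (assumed)
RIGHT_FLANK = "TTGACCGTAA"   # region matched by the reverse primer (assumed)


def make_template(repeat_unit: str, n_repeats: int) -> str:
    """Build a toy template: upstream DNA, left flank, repeats, right flank, downstream DNA."""
    return "GGGG" + LEFT_FLANK + repeat_unit * n_repeats + RIGHT_FLANK + "CCCC"


def amplicon_length(template: str) -> int:
    """Length of the product spanning both primer sites, as it would be sized on a gel."""
    start = template.index(LEFT_FLANK)
    end = template.index(RIGHT_FLANK, start) + len(RIGHT_FLANK)
    return end - start


# Three hypothetical alleles carrying 1, 3, and 7 copies of the CAA repeat.
alleles = {"allele_1": 1, "allele_3": 3, "allele_7": 7}
sizes = {name: amplicon_length(make_template("CAA", n)) for name, n in alleles.items()}

for name, size in sizes.items():
    print(name, size, "bp")
```

Each additional repeat adds three base pairs to the product, so alleles with different repeat numbers separate by size.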

Steps to Marker Detection

Each marker is detected differently, which allows us to look at the different types of variation listed above.

Isozyme: separate by starch gel and stain

RFLP: digest with restriction enzyme (RE), separate by agarose gel electrophoresis, transfer DNA to membrane, hybridize with labeled probe, visualize by autoradiography

RAPD: amplify by PCR, separate by agarose gel electrophoresis, visualize with ethidium bromide (EtBr) stain

AFLP: digest with RE, ligate to linker, amplify by PCR with labeled primer, separate by polyacrylamide gel electrophoresis, visualize by autoradiography

SSR: amplify by PCR, separate by agarose gel electrophoresis, visualize with EtBr stain

SCAR: amplify by PCR, separate by agarose gel electrophoresis, visualize with EtBr stain

CAP: amplify by PCR, digest with RE, separate by agarose gel electrophoresis, visualize with EtBr stain
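
As a rough illustration of the CAP steps (amplify, digest, separate), the Python sketch below digests two toy PCR products in silico and lists the fragment sizes that would appear as bands. The amplicon sequences are invented for illustration. EcoRI, which recognizes GAATTC and cuts after the first G, is used only as an example enzyme.

```python
def digest(amplicon: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the position within the site where the enzyme cuts
    (for EcoRI, G^AATTC, the offset is 1).
    """
    fragments = []
    last_cut = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - last_cut)
        last_cut = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - last_cut)
    return fragments


def gel_bands(alleles: list[str]) -> list[int]:
    """Distinct band sizes (largest first) for a plant carrying the given alleles."""
    sizes = {size for allele in alleles for size in digest(allele, "GAATTC", 1)}
    return sorted(sizes, reverse=True)


# Hypothetical PCR products: allele_A carries the EcoRI site, allele_a has a
# single-base change (GAATTC -> GAGTTC) that removes it.
allele_A = "ACGT" * 5 + "GAATTC" + "TTAGC" * 6
allele_a = "ACGT" * 5 + "GAGTTC" + "TTAGC" * 6

print("AA:", gel_bands([allele_A, allele_A]))
print("aa:", gel_bands([allele_a, allele_a]))
print("Aa:", gel_bands([allele_A, allele_a]))
```

In this toy case the heterozygote shows both the uncut product and the two digestion fragments, so all three genotype classes can be told apart.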

Looking at the detection steps helps us judge how easy or difficult it may be to genotype with a particular molecular marker. For example, RFLP markers require many steps, including some that none of the other markers need, in particular creating a library of DNA or cDNA probes. RFLPs therefore demand specialized knowledge and laboratory skills, as people who have worked with them can confirm. Many prefer PCR-based markers because, once you have learned the PCR technique, you can work with many different types of molecular markers, and PCR does not take long. Table 1 provides more information for comparing the different molecular markers, including properties that are not apparent from the marker names or the detection steps alone. The additional resources listed at the end of this page provide more in-depth information about these molecular markers.

Table 1. Marker systems, genetic properties, strengths, and limitations.

Isozyme
  • Type of inheritance: Co-dominant
  • PCR-based? No (enzyme activity based)
  • Strengths: Fast relative to RFLP
  • Limitations: Limited number of loci; limited alleles per locus; protein is measured (therefore not an exact measure of genotype); tissue specificity/environmental regulation

RFLP
  • Type of inheritance: Co-dominant
  • PCR-based? No
  • Strengths: Fast; large number of loci
  • Limitations: Pre-screen for single copy sequences to be used as probes; slower than isozymes; assumption that when samples share a fragment, they share flanking cleavage sites

RAPD
  • Type of inheritance: Dominant
  • PCR-based? Yes
  • Strengths: Fast; measures phenotype in outcrossing species; multiple loci can be scored in a single reaction
  • Limitations: Sensitive to reaction conditions (reproducibility issues); assumption that when two samples share a fragment, it is the same locus

AFLP
  • Type of inheritance: Dominant
  • PCR-based? Yes
  • Strengths: Detects a large number of bands and therefore polymorphism
  • Limitations: Multi-step, therefore high technical requirements

SCAR
  • Type of inheritance: Co-dominant
  • PCR-based? Yes
  • Strengths: Fast
  • Limitations: Requires sequence data, therefore expensive to develop primers

CAP
  • Type of inheritance: Co-dominant
  • PCR-based? Yes
  • Strengths: (none listed)
  • Limitations: Requires restriction enzyme digestion of PCR product, and enzymes can be expensive; requires sequence data, therefore expensive to develop primers

SSR
  • Type of inheritance: Co-dominant
  • PCR-based? Yes
  • Strengths: Fast; commercially available for some crops; detects multiple alleles
  • Limitations: Requires sequence data, therefore expensive to develop primers
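
The "type of inheritance" property in Table 1 can be made concrete with a small Python sketch. It assumes an idealized F2 population of 100 plants segregating 1:2:1 at a single locus. The genotype labels and scoring functions are hypothetical and serve only to show that a dominant marker cannot separate heterozygotes from one homozygous class, whereas a co-dominant marker can.

```python
from collections import Counter

# Idealized F2 population segregating 1:2:1 at one locus (assumed for illustration).
F2_GENOTYPES = ["AA"] * 25 + ["Aa"] * 50 + ["aa"] * 25


def score_codominant(genotype: str) -> str:
    """A co-dominant marker distinguishes all three genotype classes."""
    return genotype


def score_dominant(genotype: str) -> str:
    """A dominant marker only shows whether the amplifying allele (A) is present."""
    return "band present" if "A" in genotype else "band absent"


codominant_counts = Counter(score_codominant(g) for g in F2_GENOTYPES)
dominant_counts = Counter(score_dominant(g) for g in F2_GENOTYPES)

print("Co-dominant scores:", dict(codominant_counts))
print("Dominant scores:", dict(dominant_counts))
```

With the dominant marker, the AA and Aa plants fall into the same "band present" class, so the scores follow a 3:1 ratio instead of 1:2:1.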

Additional Resources

For a further introduction to molecular markers, see Chapter 3 (p. 45–83), Introduction to Genomics, in:

  • Liu, B. H. 1998. Statistical genomics: Linkage, mapping, and QTL analysis. CRC Press, Boca Raton, FL.

For an introduction to molecular markers, linkage mapping, QTL analysis, and marker-assisted selection written for professional plant breeders, see:

  • Collard, B.C.Y., M.Z.Z. Jahufer, J. B. Brouwer, and E.C.K. Pang. 2005. An introduction to markers, quantitative trait locus (QTL) mapping and marker-assisted selection for crop improvement: The basic concepts. Euphytica 142: 169–196. (Available online at: http://dx.doi.org/10.1007/s10681-005-1681-5) (verified 24 Mar 2012).

PBGworks 856

Equation to Estimate Sample Size Required for QTL Detection

Authors:

David M. Francis, The Ohio State University; Heather L. Merk, The Ohio State University

This page provides an equation to estimate the required sample size to detect quantitative trait loci (QTLs) of varying effect in F2 and BC1 populations.

Introduction

Population size, population structure, and the number of molecular markers genotyped can all have significant impacts on quantitative trait locus (QTL) detection. This page focuses on the effect of population size. Prior knowledge or hypotheses about the genetic control of the trait can help breeders estimate, using the equations below, the number of plants required to detect QTLs of various effect sizes. Solving these equations can help breeders use resources efficiently while still achieving the desired outcome.

Equation to Calculate Minimum Sample Size

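Using units of phenotypic standard deviations, sample sizes can be estimated using the following equations:

For an F2 population,

$$N_{F2} = \frac{1 - r^2_{F2}}{r^2_{F2}} \times \left[\frac{z_{(1-\alpha/2)}}{\left(1 - r^2_{F2}\right)^{1/2}} + z_{(1-\beta)}\right]^2 \times \left(1 + \frac{k^2}{2}\right)$$

For a BC1 population,

$$N_{BC1} = \frac{1 - r^2_{BC1}}{r^2_{BC1}} \times \left[\frac{z_{(1-\alpha/2)}}{\left(1 - r^2_{BC1}\right)^{1/2}} + z_{(1-\beta)}\right]^2 \times \left(1 + \frac{k^2}{2}\right)$$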

In these equations,

  • r² is the fraction of phenotypic variance explained by the QTL. Note that r² is valid only for simple models. Mean squares provide an alternative way to estimate the fraction of phenotypic variation when models include replication and location.
  • k is the dominance coefficient (k = 0 for a completely additive trait, k = 1 for a completely dominant trait, and k = -1 for a completely recessive trait). These equations assume that the marker and the QTL are coincident, that is, at the same position.
  • α is the type I error rate (the probability of incorrectly identifying an association that does not, in fact, exist)
  • β is the type II error rate (the probability of failing to identify a true association)
  • From normal distribution tables:
    • z(1 – α/2) = 1.96 at α = 0.05
    • z(1 – β) = 1.28 at β = 0.10
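
Instead of reading the z values from a table, they can be computed. The short Python sketch below uses the standard library's `statistics.NormalDist` and should give the same two values after rounding. The variable names are chosen here for illustration.

```python
from statistics import NormalDist

alpha = 0.05  # type I error rate
beta = 0.10   # type II error rate

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Quantiles of the standard normal distribution used in the sample size equations.
z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
z_beta = standard_normal.inv_cdf(1 - beta)

print(round(z_alpha, 2), round(z_beta, 2))
```

Changing `alpha` or `beta` gives the z values for other error rates without consulting a table.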

For example, in an F2 population, the estimated sample size required to detect a dominant QTL that is responsible for 30% of the phenotypic variation is 46.

$$N_{F2} = \frac{1 - 0.30}{0.30} \times \left[\frac{1.96}{(1 - 0.30)^{1/2}} + 1.28\right]^2 \times \left(1 + \frac{1^2}{2}\right) \approx 46$$
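
The worked example can be checked with a short Python sketch. The function name and argument names are chosen here for illustration. The defaults are the rounded z values given above (1.96 and 1.28), and the result is rounded up to the next whole plant. Because the F2 and BC1 equations on this page have the same form, the same function can be used with the r² value for either population.

```python
import math


def qtl_sample_size(r2: float, k: float, z_alpha: float = 1.96, z_beta: float = 1.28) -> float:
    """Estimated number of plants needed to detect a QTL.

    r2: fraction of phenotypic variance explained by the QTL (0 < r2 < 1)
    k:  dominance coefficient (0 additive, 1 dominant, -1 recessive)
    """
    if not 0 < r2 < 1:
        raise ValueError("r2 must be between 0 and 1 (exclusive)")
    variance_ratio = (1 - r2) / r2
    z_term = (z_alpha / math.sqrt(1 - r2) + z_beta) ** 2
    dominance_term = 1 + k ** 2 / 2
    return variance_ratio * z_term * dominance_term


# Dominant QTL (k = 1) explaining 30% of the phenotypic variation.
n = qtl_sample_size(r2=0.30, k=1)
print(math.ceil(n))  # rounds up to 46, matching the worked example

# Smaller-effect QTLs require larger populations.
for r2 in (0.30, 0.20, 0.10, 0.05):
    print(r2, math.ceil(qtl_sample_size(r2, k=1)))
```

The loop shows how quickly the required population size grows as the fraction of variance explained by the QTL shrinks.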

PBGworks 862