Population Development

PBGworks T80

Inbred Backcross (IBC) Lines and Populations

Author:

Matthew Robbins, The Ohio State University

This module provides visual and written explanations of inbred backcross population development, characteristics of inbred backcross populations, and examples of the method’s use from the scientific literature. Inbred backcross populations can be used to identify genetic factors that underlie quantitative traits and are developed in a two-stage process of backcrossing then inbreeding.

Introduction

The inbred backcross (IBC) population was proposed by Wehrhahn and Allard (1965) as a way of identifying genes or quantitative trait loci (QTL) that contribute to a quantitatively inherited trait. This is accomplished by developing a population that collectively contains most of the genome of a donor parent, divided among the individual lines in the population. The majority of the genome of each line is from the recurrent parent, with a small portion from the donor parent. IBC breeding has also been employed for the introgression of exotic germplasm to improve quantitative traits in crop plants. This method has been utilized in bean (Bliss, 1981; Sullivan and Bliss, 1983), oilseed rape (Butruille et al., 1999), rice (Lin et al., 1998), cucumber (Robbins et al., 2008), and tomato (Hartman and St. Clair, 1999; Doganlar et al., 2002; Kabelka et al., 2002; Kabelka et al., 2004; Yang et al., 2005; Robbins et al., 2009) for classical breeding and QTL studies.

Development of an IBC Population

The first stage of generating an IBC population (Fig. 1, steps 1–3) is similar to generating a backcross breeding population. One distinction is that many individuals are backcrossed to the recurrent parent to generate an IBC population. The second stage (Fig. 1, step 4) is similar to single-seed descent to generate recombinant inbred lines (RILs).

  1. An inbred donor parent is crossed to an inbred recurrent parent to produce an F1, which is fully heterozygous.
  2. The F1 is backcrossed to the recurrent parent to generate the BC1.
  3. A large number of BC1 individuals are backcrossed to the recurrent parent to generate the BC2 generation. Seed is saved from each individual. Each line is backcrossed to the recurrent parent for several generations. The total number of backcross generations, including the BC1 generation, is called k.
  4. Individuals in the BCk population are self-pollinated until they reach homozygosity (usually five or more generations) using the single-seed descent method. The IBC population consists of all of the individual backcross-inbred lines.

Figure 1. Schematic illustrating the development of an inbred backcross (IBC) population. Figure credit: Matthew Robbins, The Ohio State University.

An important consideration in creating an IBC population is the number of backcrossing generations. More backcrossing ensures that the IBC lines will be more like the recurrent parent, since the percentage of the genome from the donor parent is reduced by half with each generation of backcrossing (see article on backcrossing). However, the probability of recovering the genes from the donor parent is also reduced by half each generation due to the backcrossing process. The probability of recovering the gene(s) from the donor parent in a given line is $(1/2)^{k+1}$ for a single gene and $(1/2)^{2k+2}$ for two unlinked genes.
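The trade-off described above can be explored numerically. The following is a minimal sketch, not part of the original module: it evaluates the recovery probability from the text for a given k, and a hypothetical helper, `lines_needed`, estimates how many independent lines would be required to recover the donor allele(s) at least once with a chosen probability. The assumption that lines are independent, and the 95% default, are ours.

```python
import math

def donor_recovery_probability(k: int, n_genes: int = 1) -> float:
    """Probability that one IBC line is fixed for the donor allele at
    n_genes unlinked genes, using (1/2)^(k+1) per gene as in the text."""
    return (0.5 ** (k + 1)) ** n_genes

def lines_needed(k: int, n_genes: int = 1, confidence: float = 0.95) -> int:
    """Hypothetical helper: smallest number of independent lines such that
    at least one carries the donor allele(s) with the given probability."""
    p = donor_recovery_probability(k, n_genes)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# Compare one, two, and three backcross generations.
for k in (1, 2, 3):
    print(k, donor_recovery_probability(k), donor_recovery_probability(k, 2), lines_needed(k))
```

With k = 2, for example, the single-gene probability is 1/8, so a population of a few dozen lines is already likely to include the donor allele at any one locus, whereas two unlinked genes together are much rarer.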

Advantages of an IBC Population

  • An immortal population. Each line in an IBC population is inbred and can be propagated simply by self-pollination.
  • The population can be replicated. Since each entry of the population is a line and not an individual, traits can be measured on a plot basis rather than an individual plant basis. This allows the population to be evaluated in multiple environments over years, which increases the precision of trait measurements.
  • A breeding friendly population. Since the majority of the genome of each entry in an IBC population is from the recurrent parent, which is typically an elite line, IBC lines can directly be used in crosses with minimal germplasm improvement.
  • Mapping quantitative traits. The structure of IBC populations makes them a good population for mapping quantitative traits using single factor analysis.
  • Simultaneous discovery and introgression. Quantitative traits can be mapped and introgressed in the same population.

Disadvantages of an IBC Population

  • Time. Developing an IBC population requires a minimum of eight generations (Fig. 1).
  • Limited ability to study epistatic interactions. Since only a small part of the donor genome is represented in each line, it is difficult to study the interaction of multiple, unlinked genes from the donor parent.
  • Not amenable to some QTL mapping methods. Because the structure of an IBC population is not a simple segregating population, the algorithms of the majority of QTL mapping software are not designed to work with this population type. It is not practical to use interval or composite interval mapping methods on an IBC population.

References Cited

  • Bliss, F. A. 1981. Utilization of vegetable germplasm. HortScience 16: 129–132.
  • Butruille, D. V., R. P. Guries, and T. C. Osborn. 1999. Linkage analysis of molecular markers and quantitative trait loci in populations of inbred backcross lines of Brassica napus L. Genetics 153: 949–964. (Available online at: http://www.genetics.org/cgi/content/full/153/2/949) (verified 23 Sept 2010).
  • Doganlar, S., A. Frary, H. M. Ku, and S. D. Tanksley. 2002. Mapping quantitative trait loci in inbred backcross lines of Lycopersicon pimpinellifolium (LA1589). Genome 45: 1189–1202. (Available online at: http://article.pubs.nrc-cnrc.gc.ca/ppv/RPViewDoc?issn=0831-2796&volume=45&issue=6&startPage=1189) (verified 23 Sept 2010).
  • Hartman, J. B., and D. A. St. Clair. 1999. Combining ability for beet armyworm (Spodoptera exigua) resistance and horticultural traits of selected Lycopersicon pennellii-derived inbred backcross lines of tomato. Plant Breeding 118: 523–530.
  • Kabelka, E., B. Franchino, and D. M. Francis. 2002. Two loci from Lycopersicon hirsutum LA407 confer resistance to strains of Clavibacter michiganensis subsp. michiganensis. Phytopathology 92: 504–510. (Available online at: http://apsjournals.apsnet.org/doi/abs/10.1094/PHYTO.2002.92.5.504) (verified 23 Sept 2010).
  • Kabelka, E., W. Yang, and D. M. Francis. 2004. Improved tomato fruit color within an inbred backcross line derived from Lycopersicon esculentum and L. hirsutum involves the interaction of loci. Journal of the American Society for Horticultural Science 129: 250–257.
  • Lin, S. Y., T. Sasaki, and M. Yano. 1998. Mapping quantitative trait loci controlling seed dormancy and heading date in rice, Oryza sativa L., using backcross inbred lines. Theoretical and Applied Genetics 96: 997–1003.
  • Robbins, M. D., M. D. Casler, and J. E. Staub. 2008. Pyramiding QTL for multiple lateral branching in cucumber using inbred backcross lines. Molecular Breeding 22: 131–139.
  • Robbins, M. D., A. Darrigues, S. Sim, M.A.T. Masud, and D. M. Francis. 2009. Characterization of hypersensitive resistance to bacterial spot race T3 (Xanthomonas perforans) from tomato accession PI 128216. Phytopathology 99: 1037–1044. (Available online at: http://apsjournals.apsnet.org/doi/abs/10.1094/PHYTO-99-9-1037) (verified 27 Sept 2010).
  • Sullivan, J. G., and F. A. Bliss. 1983. Expression of enhanced seed protein content in inbred backcross lines of common bean. Journal of the American Society for Horticultural Science 108: 787–791.
  • Wehrhahn, C., and R. W. Allard. 1965. Detection and measurement of the effects of individual genes involved in the inheritance of a quantitative character in wheat. Genetics 51: 109–119.
  • Yang, W., E. J. Sacks, M. L. Lewis-Ivey, S. A. Miller, and D. M. Francis. 2005. Resistance in Lycopersicon esculentum intraspecific crosses to race T1 strains of Xanthomonas campestris pv. vesicatoria causing bacterial spot of tomato. Phytopathology 95: 519–527. (Available online at: http://apsjournals.apsnet.org/doi/abs/10.1094/PHYTO-95-0519) (verified 27 Sept 2010).

Funding Statement

Development of this lesson was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

PBGworks 647

Analysis of Variance (ANOVA): Experimental Design for Fixed and Random Effects

Authors:

David M. Francis, The Ohio State University

Heather L. Merk, The Ohio State University

Matthew Robbins, The Ohio State University

This page is a continuation of the overview of Analysis of Variance (ANOVA) and is intended to help plant breeders consider fixed and random effects. The concepts of fixed and random effects are discussed in the context of experimental design and analysis. Reference ANOVA tables are provided.

Introduction

This page is a continuation of the Overview of Analysis of Variance page and is intended to help plant breeders consider the notions of fixed and random effects and the impacts these can have on ANOVA in the context of plant breeding. Briefly, ANOVA is a statistical test that takes the total variation and assigns it to known causes, leaving a residual portion allocated to uncontrolled or unexplained variation, called the experimental error. Because variability is measured as sums of squared deviations from the mean of all observations, the variation assigned to different controlled causes is additive. It is therefore important to completely define the statistical model. Otherwise, the experimental error may be unnecessarily inflated (McIntosh, 1983).

Fixed and Random Effects

In the Overview of Analysis of Variance page, we considered the following linear model:

Y = m + f(treatment) + error

where

  • Y is equal to the trait value
  • m is the population mean
  • f(treatment) is a function of the treatment
  • error represents the residual

Fixed Effects

Intuitively, we may think about the treatments as being under our control and as “fixed.” Usually we are interested in comparing the dependent variable among factors/levels of the fixed effect. For example, we may want to evaluate whether yield (the dependent variable) differs between field locations for some elite cultivars we’ve been developing. To conduct this experiment, we would select the cultivars we want to evaluate and find suitable locations for our trial. We could think of the cultivars and locations as being fixed; we purposely chose to study different cultivars and locations. In this case, we are only interested in the performance of the elite cultivars we’re testing in the specific locations we’re testing.

Random Effects

Random effects, in contrast to fixed effects, are typically used to account for variance in the dependent variable. Also, unlike fixed effects, we aren’t looking to compare one level of the random effect to another. In our example, we could also consider location as a random effect. In the case of random effects, levels are chosen randomly from an infinite population and we want to make inferences that can extend beyond the sample. If this were the case, the cultivars would still be fixed effects, but location would be random. If we felt our locations were representative of all possible locations, we could use the different locations to help us make an evaluation of how well cultivars perform across locations as a whole, not just at the locations we’ve tested. The classification of effects as fixed or random determines the appropriate F-test.

ANOVA tables

McIntosh (1983) provides a set of reference tables for use during experimental design and analysis. These tables are intended for field experiments conducted over two or more locations or years. Some of the tables are replicated below.

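Table 1. Expected mean squares for randomized complete block experiments combined over locations.

| Sources of variation | df | Mean squares | Expected mean squares¹ (RL-RT) | Expected mean squares¹ (RL-FT) | Expected mean squares¹ (FL-FT) |
| --- | --- | --- | --- | --- | --- |
| Locations (l) | l-1 | M1 | $\sigma^2_e + r\sigma^2_{TL} + t\sigma^2_{R(L)} + rt\sigma^2_L$ | $\sigma^2_e + t\sigma^2_{R(L)} + rt\sigma^2_L$ | $\sigma^2_e + t\sigma^2_{R(L)} + rt\sigma^2_L$ |
| Blocks(Location) (r) | l(r-1) | M2 | $\sigma^2_e + t\sigma^2_{R(L)}$ | $\sigma^2_e + t\sigma^2_{R(L)}$ | $\sigma^2_e + t\sigma^2_{R(L)}$ |
| Treatment (t) | t-1 | M3 | $\sigma^2_e + r\sigma^2_{TL} + rl\sigma^2_T$ | $\sigma^2_e + r\sigma^2_{TL} + rl\sigma^2_T$ | $\sigma^2_e + rl\sigma^2_T$ |
| Location x treatment | (l-1)(t-1) | M4 | $\sigma^2_e + r\sigma^2_{TL}$ | $\sigma^2_e + r\sigma^2_{TL}$ | $\sigma^2_e + r\sigma^2_{TL}$ |
| Pooled error | l(r-1)(t-1) | M5 | $\sigma^2_e$ | $\sigma^2_e$ | $\sigma^2_e$ |

¹ R = random, F = fixed, L = location, T = treatment; for example, RL-FT denotes random locations with fixed treatments.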
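One use of the expected mean squares in Table 1 is to estimate variance components by equating observed mean squares to their expectations. The sketch below does this for the all-random (RL-RT) column only. It is an illustration rather than a procedure taken from McIntosh (1983), the function name is ours, and the mean squares in the example are made-up values chosen so that the arithmetic is easy to follow.

```python
def variance_components_rl_rt(ms, r, t, l):
    """Method-of-moments estimates from the RL-RT column of Table 1.

    ms: dict with mean squares M1..M5; r = blocks per location,
    t = number of treatments, l = number of locations.
    """
    m1, m2, m3, m4, m5 = (ms[key] for key in ("M1", "M2", "M3", "M4", "M5"))
    return {
        "error": m5,                                  # sigma^2_e
        "location_x_treatment": (m4 - m5) / r,        # sigma^2_TL
        "blocks_in_location": (m2 - m5) / t,          # sigma^2_R(L)
        "treatment": (m3 - m4) / (r * l),             # sigma^2_T
        "location": (m1 - m2 - m4 + m5) / (r * t),    # sigma^2_L
    }

# Hypothetical mean squares for r = 3 blocks, t = 10 treatments, l = 4 locations.
example_ms = {"M1": 130.0, "M2": 7.0, "M3": 41.0, "M4": 5.0, "M5": 2.0}
components = variance_components_rl_rt(example_ms, r=3, t=10, l=4)
print(components)
```

Note that the location component uses M1 - M2 - M4 + M5, which mirrors the composite F-ratio needed to test locations when both effects are random.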

Table 2. F-ratios used to test effects for randomized complete block experiments combined over locations.

| Sources of variation | Mean squares | F-ratio¹ (RL-RT) | F-ratio¹ (RL-FT) | F-ratio¹ (FL-FT) |
| --- | --- | --- | --- | --- |
| Locations (l) | M1 | (M1+M5)/(M2+M4) | M1/M2 | M1/M2 |
| Blocks(Location) (r) | M2 | | | |
| Treatment (t) | M3 | M3/M4 | M3/M4 | M3/M5 |
| Location x treatment | M4 | M4/M5 | M4/M5 | M4/M5 |
| Pooled error | M5 | | | |

¹ R = random, F = fixed, L = location, T = treatment; for example, RL-FT denotes random locations with fixed treatments.

In a genetic/breeding experiment, treatments would likely be genotypes or varieties.
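To make the dependence of the F-test on the model explicit, the ratios in Table 2 can be written as a small lookup. This is a sketch under our own naming (`f_ratios` and the model labels as strings are assumptions); the mean squares are hypothetical, and the degrees of freedom and p values for each test are not computed here.

```python
def f_ratios(ms, model):
    """Return the F-ratios from Table 2 for model 'RL-RT', 'RL-FT', or 'FL-FT'."""
    m1, m2, m3, m4, m5 = (ms[key] for key in ("M1", "M2", "M3", "M4", "M5"))
    if model == "RL-RT":
        locations = (m1 + m5) / (m2 + m4)   # composite ratio for random locations and treatments
        treatment = m3 / m4
    elif model == "RL-FT":
        locations = m1 / m2
        treatment = m3 / m4                 # treatments tested against the interaction
    elif model == "FL-FT":
        locations = m1 / m2
        treatment = m3 / m5                 # treatments tested against pooled error
    else:
        raise ValueError(f"unknown model: {model}")
    return {"locations": locations, "treatment": treatment, "location_x_treatment": m4 / m5}

# Hypothetical mean squares, for illustration only.
example_ms = {"M1": 130.0, "M2": 7.0, "M3": 41.0, "M4": 5.0, "M5": 2.0}
for model in ("RL-RT", "RL-FT", "FL-FT"):
    print(model, f_ratios(example_ms, model))
```

With these values the treatment F-ratio is 8.2 when locations are random but 20.5 when both effects are fixed, which shows how much the classification of effects can change the test.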

Conclusion

When designing experiments, plant breeders must consider the question they want to answer. Consequently, plant breeders must consider what type of statistical analyses are appropriate to answer the desired question. With regard to ANOVA, three important points should be considered in this context.

  1. Unaccounted sources of variation will be pooled into the error term resulting in an inflated error.
  2. The appropriate F-tests differ depending on whether the effects are fixed or random.
  3. Fixed effects influence mean and random effects influence variance.

References Cited

  • McIntosh, M. S. 1983. Analysis of combined experiments. Agronomy Journal 75: 153–155.

Additional Information

Many statistics textbooks provide a good discussion of theory and applications of ANOVA. Two examples are listed below.

  • Clewer, A. G., and D. H. Scarisbrick. 2001. Practical statistics and experimental design for plant and crop science. John Wiley & Sons, New York.
  • Steel, R.G.D., J. H. Torrie, and D. A. Dickey. 1997. Principles and procedures of statistics: A biometrical approach. McGraw–Hill, New York.

Funding Statement

Development of this page was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

PBGworks 865

Overview of Michigan State’s Tart Cherry Breeding and Genetics Program

Author:

Audrey Sebolt, Michigan State University

This page provides an overview of Michigan State University’s tart cherry breeding and genetics program, led by Dr. Amy Iezzoni.

Introduction

Michigan State University’s tart cherry breeding and genetics program, led by Dr. Amy Iezzoni (Fig. 1), is the only one of its kind in the United States. After being hired in 1981, Dr. Iezzoni set out to determine industry needs for tart cherry and to access available germplasm to develop improved tart cherry varieties.

Figure 1. Dr. Amy Iezzoni, lead of Michigan State University’s tart cherry breeding and genetics program. Photo credit: Michigan State University Tart Cherry Breeding and Genetics Program.

Tart Cherry Industry Needs

Virtually all tart cherries are processed. Processed tart cherry products include jams, dried cherries, individually quick frozen cherries, cherry juice, and—making up the largest portion—pie filling. The tart cherry industry is a monoculture, consisting essentially of one cultivar: ‘Montmorency’, a 400-year-old cultivar from France (Fig. 2).

Figure 2. Montmorency tart cherry branch with fruit. Photo credit: Michigan State University Tart Cherry Breeding and Genetics Program.

Why is Tart Cherry Production in the U.S. Essentially a Monoculture?

First, most of the cherry germplasm and excellent varieties that would have provided alternatives to ‘Montmorency’ evolved or were bred in Eastern Europe. Until the end of the Cold War, they were essentially unavailable to the U.S. Second, ‘Montmorency’ is extremely productive. The trees flourish in the sandy soils and harsh winters of Western Michigan, which produces 75% of the nation’s tart cherries. ‘Montmorency’ requires very little horticultural management and can withstand trunk damage inflicted by mechanical harvesting. Fruit produced from this cultivar are generally uniform in size and have clear flesh and bright red skin, characteristics which have become the standard for ‘American cherry pie’. ‘Montmorency’ does have limitations: the fruit can be soft and the trees are highly susceptible to cherry leaf spot (Blumeriella jaapii) (Fig. 3), which is a major financial cost to the ~$39 million tart cherry industry.

Figure 3. Leaves susceptible (left) and resistant (right) to cherry leaf spot. Photo credit: Michigan State University Tart Cherry Breeding and Genetics Program.

Tart Cherry Germplasm in the U.S.

When Dr. Iezzoni joined Michigan State, only a small collection of ‘Montmorency’ sports and varieties from Western Europe were available. Over a 15-year period, Dr. Iezzoni collected cherry accessions from Eastern Europe, the center of diversity for tart cherry, to expand her germplasm base. That effort led to the establishment of the world’s largest tart cherry germplasm collection, located at Michigan State University’s Clarksville Horticultural Research Station.

To incorporate germplasm from Eastern Europe, Dr. Iezzoni overcame genetically controlled self-incompatibility (Fig. 4). Today, Dr. Iezzoni’s tart cherry breeding program focuses on increased firmness, pit size and shape, late bloom time, disease resistance, processing savings due to less use of colorants and/or sugar, freestone or “air free” fruit (Fig. 5), and high yield.

Figure 4. Cherry pollination. Photo credit: Michigan State University Tart Cherry Breeding and Genetics Program.

Figure 5. Freestone or “air free” tart cherry. Photo credit: Cameron Peace, Washington State University.

For more information, visit Dr. Iezzoni’s website.

Funding Statement

Development of this page was supported in part by the Michigan Cherry Committee and by the USDA’s National Institute of Food and Agriculture (NIFA) Specialty Crops Research Initiative Competitive Grant 2009-51181-05808, project title “RosBREED: Enabling marker-assisted breeding in Rosaceae.” Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

PBGworks 937

The Polymerase Chain Reaction (PCR)

Author:

Matthew Robbins, The Ohio State University

This module provides an overview of the polymerase chain reaction (PCR), describes PCR using an analogy to photocopying a book, provides links to animations describing PCR, and provides examples of analysis of PCR products.

Introduction

The polymerase chain reaction (PCR) is a procedure that mimics the cellular process of DNA replication using the machinery of heat-resistant bacteria in a cyclic manner, resulting in several million copies of a specific DNA sequence that can then be visualized through electrophoresis and staining with a dye. PCR is commonly used in plant genetics and molecular breeding to copy a specific DNA fragment from the genome of an individual as a step in the process of molecular marker-assisted selection. The use of PCR to copy a specific portion of a genome is analogous to photocopying a specific page of a book. Table 1 illustrates this analogy by comparing the components required to copy DNA by PCR to those needed to photocopy a page of a book.

Table 1. Comparing components in PCR to photocopying a page in a book.

| Photocopier items | PCR components |
| --- | --- |
| The book | The entire genome (called the DNA template) |
| The page | A portion of the genome (fragment) we are interested in |
| A bookmark | Primers that “mark” the specific fragment |
| The copy machine | The enzyme that copies DNA (called a polymerase) |
| Paper and toner | The four bases that make up DNA (called nucleotides) |

In the same way that a bookmark identifies the specific page to photocopy out of a book, PCR primers identify the specific fragment to be copied from the entire genome. In order to copy a page, the photocopier uses the paper and toner to make the copy. Similarly, the polymerase requires nucleotides to produce a replicate of the original DNA fragment.
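The cyclic nature of PCR is what turns a few template molecules into millions of copies. As a rough sketch, assuming the idealized case in which every copy of the fragment is duplicated in each cycle, the number of copies grows as a power of two. The function name and the `efficiency` parameter below are our own illustrative assumptions; real reactions fall short of perfect doubling.

```python
def pcr_copies(initial_templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of copies after a given number of PCR cycles.

    efficiency = 1.0 means perfect doubling each cycle (an assumption);
    lower values model incomplete copying.
    """
    return initial_templates * (1.0 + efficiency) ** cycles

# Starting from a single template molecule.
for cycles in (10, 22, 30):
    print(cycles, pcr_copies(1, cycles))
```

Under perfect doubling, a single template exceeds four million copies after 22 cycles, which is consistent with the "several million copies" described in the introduction.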

Resources on PCR

To understand in more detail how these components function in PCR, the Plant and Soil Sciences eLibrary at the University of Nebraska-Lincoln has an informative lesson on PCR including an animation of the process:

Screenshot of the PCR animation from the Plant and Soil Sciences eLibrary
Photo credit: Plant and Soil Sciences eLibrary

Another animation on PCR can be found at the Dolan DNA Learning Center, part of The Cold Spring Harbor Laboratory.

Screenshot of the introduction to the PCR animation at the Dolan DNA Learning Center
Photo credit: The Dolan DNA Learning Center

The Genetic Science Learning Center at the University of Utah also has an animation on PCR.

Screenshot of the PCR tutorial at the Genetic Science Learning Center
Photo credit: The Genetic Science Learning Center

Analyzing PCR products

When using PCR for genotyping, the amplified DNA fragments can be analyzed several different ways. DNA amplified by PCR can be:

Additional Resources

For some PCR-related entertainment, we recommend “The PCR Song.” With lyrics such as “PCR, when you need to find out who’s your Daddy; PCR, when you need to solve a crime…” this video produced by Bio-Rad features characterizations of famous and not-so-famous folk singers. If you like the musical theme, the “GTCA Song” rocks to the tune of YMCA while reviewing the biochemistry of PCR.

Funding Statement

Development of this lesson was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

Mention of specific companies is not intended for promotion purposes.

PBGworks 653

Genotyping with Molecular Markers: Scoring a Molecular Marker on an Agarose Gel

Authors:

Heather Merk, The Ohio State University; Deana Namuth-Covert, University of Nebraska-Lincoln; Matthew Robbins, The Ohio State University

This page teaches users how to genotype a molecular marker, how to organize genotypic data for analysis with Joinmap and MapMaker software, and how to test whether genotypic data meets an expected segregation pattern using the chi-square test. Sample data is provided.

Learning Objectives

At the end of this lesson you should:

  • Be familiar with the conventional layout of an agarose gel photo;
  • Be able to score genotypic data; and
  • Be able to organize genotypic data in a Microsoft Excel spreadsheet.

Introduction

The purpose of this article is to provide an example of how to genotype individual tomato plants with a molecular DNA marker. There are several different molecular marker systems available to assist plant breeding programs. For the purposes of this lesson, the marker chosen as an example is a cleaved amplified polymorphism (CAP) marker, a type of marker that is often visualized by gel electrophoresis. Briefly, a CAP marker exploits differences in DNA sequences between two polymerase chain reaction (PCR) products based on the presence or absence of restriction enzyme cutting sites found within that segment of DNA. To genotype a CAP marker, the segment of DNA is amplified using PCR and then cut with a restriction enzyme (referred to as digestion, or restriction enzyme digestion), which only cuts at a specific DNA sequence. After digestion, the DNA is separated on an agarose gel. CAP markers are designed so that the restriction enzyme will cut the DNA of one genotype, but not another.

Although different breeding program schemes can be used, in this particular case, the individual plants are from an F2 population that is segregating for the marker. In all breeding programs, the specific marker being used must be segregating among the plant population being used in order to be useful.

CAP markers are generally visualized using gel electrophoresis. When scoring any molecular DNA marker using gel electrophoresis, keep the following considerations in mind:

  1. Include a molecular weight ladder. This is like a DNA size ruler that contains DNA fragments of known molecular weight, expressed as length in base pairs (Fig. 1). Since many markers are scored based on their molecular weight in DNA base pairs (bp), this ladder is essential to determine the molecular weight of each band in a gel.
  2. Include controls. In addition to the individuals being genotyped, individuals of known genotype (often the parents of the population) should be included to make sure to identify the correct bands in the gel to score in the population.
  3. Know the characteristics of the molecular DNA marker in the germplasm you are using. Important attributes include the expected banding pattern (one band or multiple bands), the molecular weight of each segregating band, if the marker is dominantly or codominantly inherited, and so forth.

All these considerations will make it easier to score a marker from a gel photo. Next we will follow a specific CAP marker example in a tomato breeding program.

Genotyping Example

The gel photo below (Fig. 1) shows a CAP marker, CosOH57, genotyped in 30 individuals that were part of a larger F2 population developed from the parents OH88119 and 06.8068. The population was developed as part of a breeding project to incorporate bacterial spot resistance into elite germplasm. In order to score the gel, the bands are evaluated based on the considerations listed above:

  1. Molecular weight ladder. The ladder is in lane 1 and is a 100 base pair ladder.
  2. Controls. The parents of the cross are included in the gel photo in lanes 2 and 3; they provide a reference for the F2 plants. Notice the difference in banding patterns between the two parents, with OH88119 showing a band at 216 bp and 06.8068 showing two bands, one at 145 bp and another at 71 bp. These are the bands we will follow in the 30 F2 progeny (Fig. 1).
  3. Marker characteristics. As we described above, CAP markers must be amplified using PCR and then digested with a restriction enzyme. In this case, the PCR products for the parents OH88119 and 06.8068 have the same molecular weight (216 bp). However, after digestion with the restriction enzyme Tth111I, the PCR product from OH88119 is not cut (and remains 216 bp long), whereas the PCR product from 06.8068 is cut into two pieces of 145 and 71 bp. Like most CAP markers, CosOH57 is codominant. In heterozygous individuals, the OH88119 allele will not be digested, producing the 216 bp band, but the 06.8068 allele will produce the two smaller bands, so all three bands are present after digestion.
  4. F2 individuals. The individuals in lanes 4 through 33 are part of an F2 population derived from crossing OH88119 and 06.8068. The 30 F2 individuals genotyped with CosOH57 should segregate in a 1:2:1 ratio (homozygous for the parent A allele : heterozygous : homozygous for the parent B allele). Think of it as a simple Aa x Aa cross (selfing the F1) that gives 1 AA : 2 Aa : 1 aa in the F2 generation.


Figure 1. Example gel photo of CAP marker CosOH57. The gel includes a DNA ladder, the parental genotypes (OH88119 and 06.8068), and 30 F2 individuals. Photo credit: Matthew Robbins, The Ohio State University.

Scoring the Gel

Knowing the information outlined above, the gel can be scored. Most computer programs that use marker data in subsequent analyses have a specified data format. For segregating populations, many programs code the data in relation to the parents. For example, Joinmap and MapMaker, two programs that are commonly used for mapping, code genotypes from an F2 population as follows:

Table 1. Genotype codes for an F2 population.

| Code | Genotype |
| --- | --- |
| A | homozygous for parent 1 allele |
| B | homozygous for parent 2 allele |
| H | heterozygous |
| C | not genotype A (dominant B allele, so could be a genotype like parent 2 or heterozygous) |
| D | not genotype B (dominant A allele, so could be a genotype like parent 1 or heterozygous) |
| “.” | genotype unknown (missing data) |

Keep in mind the following when scoring the genotypes:

  1. The determination of which parent is “parent 1” or “parent 2” is arbitrary. BUT the parental designation MUST be consistent for all markers scored on the same population. In this example, OH88119 is parent 1 (coded as A) for CosOH57, so OH88119 MUST also be parent 1 for all other markers on this population.
  2. The A, B, and H codes are applied to codominant markers, while A and C (parent 2 allele is dominant) or B and D (parent 1 allele is dominant) codes are for dominant markers.
  3. It is also important to code for unknown or missing data—a period, in this example.

Using the genotypic codes, each individual tomato plant is scored (Fig. 1). In the example we are following, CosOH57 is a codominant marker, so the 30 F2 individuals are coded as “A” when only the 216 bp band is present, “B” when a plant has both the 145 and 71 bp bands present, or “H” when all three bands are showing for an individual tomato plant.
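The scoring rule for CosOH57 can be written down explicitly. The sketch below assumes that band sizes have already been estimated from the gel by comparison with the ladder; the function names and the size tolerance of 5 bp are our own illustrative choices, and lanes with an incomplete or unexpected pattern are coded as missing rather than guessed.

```python
UNCUT_BAND = 216       # OH88119 (parent 1) allele, not digested
CUT_BANDS = (145, 71)  # 06.8068 (parent 2) allele after digestion

def has_band(bands, size, tolerance):
    """True if any observed band lies within tolerance (bp) of the expected size."""
    return any(abs(band - size) <= tolerance for band in bands)

def score_cosoh57(bands, tolerance=5):
    """Return 'A', 'B', 'H', or '.' for one lane given its band sizes in bp."""
    uncut = has_band(bands, UNCUT_BAND, tolerance)
    cut = [has_band(bands, size, tolerance) for size in CUT_BANDS]
    if uncut and all(cut):
        return "H"   # all three bands present: heterozygous
    if uncut and not any(cut):
        return "A"   # only the 216 bp band: like parent 1
    if all(cut) and not uncut:
        return "B"   # only the 145 and 71 bp bands: like parent 2
    return "."       # ambiguous or failed lane: missing data

print(score_cosoh57([216]), score_cosoh57([145, 71]), score_cosoh57([216, 145, 71]))
```

Treating partial patterns as missing data is a conservative choice; in practice such lanes would usually be re-run before a score is assigned.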

Genotypic scores can also be coded by the molecular weight of the fragment. This is useful when genotyping a set of individuals without common parents, and especially if multiple alleles of the marker are present. In this simpler CosOH57 example, using the molecular weight scoring method, parent 1 would be scored as “216” and parent 2 could be scored as either “145” or “71.”

Organizing Genotypic Data

Once the molecular marker is scored, it is useful to organize the data in a spreadsheet or table format. This allows data from other markers genotyped in the same population to be combined in preparation for mapping or other analyses. The individual genotypes for CosOH57 have to be reorganized into a table with markers as rows and individual plant genotypes as columns (Table 2). It is important that “F2 Plant #1” is always the same plant, no matter the particular marker being genotyped. This is a common format for mapping software. The rows for Marker2 and Marker3 indicate that genotypic data can be added for additional markers. Although parental genotypes are not included in mapping analysis, it is useful to keep them with the data for reference.

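Table 2. Table with genotypic data organized with markers as rows and individual genotypes as columns.

| Marker | OH88119 | 06.8068 | F2 Plant 1 | F2 Plant 2 | F2 Plant 3 | … |
| --- | --- | --- | --- | --- | --- | --- |
| CosOH57 | A | B | A | A | H | |
| Marker2 | | | | | | |
| Marker3 | | | | | | |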
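The same layout can be produced programmatically, which helps keep the order of individuals identical for every marker. This is a minimal sketch using only the Python standard library; the function name `markers_as_rows` is ours, the CosOH57 scores are the ones shown in Table 2, and the exact file format a given mapping program expects should be checked against that program's documentation.

```python
import csv
import io

def markers_as_rows(scores, individuals, missing="."):
    """Build a table with one row per marker and one column per individual.

    scores: dict mapping marker name -> dict of individual -> genotype code.
    Individuals without a score for a marker are coded as missing.
    """
    rows = [["Marker"] + list(individuals)]
    for marker, calls in scores.items():
        rows.append([marker] + [calls.get(ind, missing) for ind in individuals])
    return rows

individuals = ["OH88119", "06.8068", "F2 Plant 1", "F2 Plant 2", "F2 Plant 3"]
scores = {
    "CosOH57": dict(zip(individuals, "ABAAH")),  # scores from Table 2
    "Marker2": {},                               # not yet genotyped
}

rows = markers_as_rows(scores, individuals)
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Because the column order comes from a single list of individuals, “F2 Plant 1” is guaranteed to be the same plant in every row.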

Data Verification by Chi-square Test

Data summaries are also useful to check whether the data collected seem reasonable based on what you expect for a particular population, or whether something else may be going on, such as the marker being linked to a trait we are selecting for, or forces such as natural selection distorting the expected segregation pattern. In our example, we may want to verify that the CosOH57 marker genotypes segregate as expected (1:2:1) using a chi-square goodness-of-fit test (note: for a refresher on how to use chi-square, you may want to take a look at the chi-square lesson). The data for the gel photo above, not including the parents, are summarized in Table 3. The observed column is determined simply by counting the number of individual plants with each genotype. The expected number of each genotype is calculated by multiplying the expected frequency of the genotype by the total number of plants being genotyped:

Expected = Expected Frequency x Total

The expected frequency is determined based on the segregation ratio of 1:2:1 for our F2 population, which corresponds to frequencies of 0.25:0.5:0.25. Thus, the expected number of plants with the “A” genotype for CosOH57 is:

Expected “A” Genotype = Expected Frequency of “A” Genotype x Total Number of F2 Plants Being Genotyped

or

Expected “A” Genotype = 0.25 x 30 = 7.5

The expected frequencies and number of each genotype are also presented in Table 3.

Table 3: Summary of the CosOH57 F2 gel data.
Genotype   Observed   Expected frequency   Expected number
A          13         0.25                 7.5
B          7          0.25                 7.5
H          10         0.5                  15
Total      30         1                    30
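
The expected column of Table 3 can be reproduced with a few lines of Python. This is a minimal sketch using the observed counts from the table; the variable names are illustrative.

```python
observed = {"A": 13, "B": 7, "H": 10}               # counts from Table 3
expected_freq = {"A": 0.25, "B": 0.25, "H": 0.5}    # 1:2:1 ratio for an F2 (A:H:B)

total = sum(observed.values())                      # total number of F2 plants genotyped

# Expected = Expected Frequency x Total, applied to each genotype class
expected = {g: expected_freq[g] * total for g in observed}

for g in observed:
    print(g, observed[g], expected_freq[g], expected[g])
```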

When the observed and expected numbers are used in a chi-square goodness-of-fit test, the calculated p value is 0.057. Since this p value is slightly greater than 0.05, a common threshold for declaring significance, we cannot reject the hypothesis that CosOH57 segregates as expected, although the result is borderline. Closer inspection of the data indicates that the observed number of plants with genotype “A” is higher than expected, while the number with genotype “H” is lower than expected. Additional caution should be exercised because the relatively small number of F2 individuals makes this chi-square test difficult to interpret. Ideally, statisticians recommend genotyping at least 50 individuals in an F2 population.
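
The p value quoted above can be checked with a short Python sketch. It uses only the standard library and relies on the fact that, for 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2); for other degrees of freedom a statistics package would be needed instead.

```python
import math

observed = [13, 7, 10]      # A, B, H counts from Table 3
expected = [7.5, 7.5, 15]   # expected numbers from Table 3

# Chi-square statistic: sum over genotype classes of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1      # three genotype classes give 2 degrees of freedom
assert df == 2              # the closed form below is valid only for df = 2

p_value = math.exp(-chi_sq / 2)

print(round(chi_sq, 2), round(p_value, 3))   # 5.73 0.057
```

Most of the statistic comes from the “A” class, (13 - 7.5)^2 / 7.5 ≈ 4.03 out of a total of about 5.73, which matches the observation that “A” is over-represented.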

Conclusion

In this tutorial we learned how to genotype a CAPS marker that was scored in an F2 population. The principles we used apply to any other molecular marker that we may genotype, particularly molecular markers genotyped on a gel. These general principles also apply to other plant breeding schemes. We also learned how to organize data so that we can use it for genetic mapping. Finally, we learned how to perform a chi-square analysis as an additional test to help us determine the reliability of a specific marker in our breeding population.

Additional Resources

For additional practice scoring an agarose gel:

Funding Statement

Development of this page was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

PBGworks 659

Gel Electrophoresis Principles and Applications

Author:

Matthew Robbins, The Ohio State University

This module introduces gel electrophoresis principles and applications for genetics and plant breeding in text, animation, and video formats.

Introduction

Gel electrophoresis is commonly used in plant breeding and genomics for genotyping with molecular markers, but there are several other applications as well (see below). For example, specific DNA fragments used as markers are amplified from the DNA of individual plants by the polymerase chain reaction (PCR), and the resulting fragments are then loaded on a gel. The gel is a solid, gelatin-like substance used to separate DNA fragments based on size. The gel is placed in a conductive salt buffer to which an electrical field is applied. As the negatively charged DNA fragments migrate toward the positive pole, the gel acts as a size filter, with smaller fragments migrating faster than larger fragments.

Resources on Gel Electrophoresis

In addition, this video illustrates the basics of DNA extraction and gel electrophoresis in tomato:

The Plant and Soil Sciences eLibrary at the University of Nebraska-Lincoln has an informative lesson on gel electrophoresis, including an animation of the process:

Screenshot of the gel electrophoresis animation from the Plant and Soil Sciences eLibrary
Photo credit: Plant and Soil Sciences eLibrary

Another animation on gel electrophoresis can be found at the Dolan DNA Learning Center, part of The Cold Spring Harbor Laboratory:

Screenshot of the gel electrophoresis animation at the Dolan DNA Learning Center
Photo credit: The Dolan DNA Learning Center

The Genetics Science Learning Center at the University of Utah also has an animation on gel electrophoresis:


Photo credit: The Genetics Science Learning Center

Applications of Gel Electrophoresis

DNA can be separated by electrophoresis to:

  • Visualize bands of a molecular marker to genotype individual plants
  • Verify amplification by PCR or sequencing reactions
  • Check the quality and quantity of genomic DNA after DNA extraction
  • Separate DNA fragments to clone a specific band

Funding Statement

Development of this lesson was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

PBGworks 654

Analysis of Variance for Plant Breeding

Authors:

David M. Francis, The Ohio State University; Heather L. Merk, The Ohio State University; Matthew Robbins, The Ohio State University

This page provides an introduction to the analysis of variance (ANOVA), creating and interpreting simple ANOVA tables, and common applications of ANOVA to plant breeding. ANOVA has two common types of application in a plant breeding context: (1) evaluating treatment differences and (2) partitioning variance for heritability estimates.

Introduction

The analysis of variance (ANOVA) is a statistical tool that has two common applications in a plant breeding context. First, ANOVA can be used to test for differences between treatments in an experiment. Common examples of treatments are genotype, location, and variety. Second, ANOVA can be used to aid in estimates of heritability by partitioning variances. This module focuses on simple ANOVA models to evaluate differences between treatments.

Assumptions of ANOVA

Like other statistical tests, ANOVA relies on certain assumptions. One of the principal assumptions of ANOVA is that the samples come from normally distributed populations, each with the same variance. In addition, it is assumed that the residuals come from a normally distributed population with equal variances (σ²). The Kruskal–Wallis test is an alternative to ANOVA when the above assumptions cannot be met.

Testing for Treatment Differences

ANOVA is a tool that can be used to test for differences among treatment means when the independent variable is categorical (e.g., genotypes could be AA, Aa, aa) and the dependent variable is continuous (e.g., yield measured in tons/acre). How does this work?

In ANOVA, the total variance of all samples is calculated. Portions of the total variance can be attributed to known causes (e.g., genotype). This leaves a residual portion of the variance that is uncontrolled or unexplained and is referred to as experimental error. Then the between-treatment variation (e.g., variation among the means of the AA, Aa, and aa genotypes) is compared to the within-treatment variation, or experimental error (e.g., variation among plants within the aa genotype), to assess whether differences in mean value between treatments are due to treatment effects or chance.
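
The comparison of between-treatment and within-treatment variation can be sketched in Python as a one-way ANOVA F ratio. The yield values below are made up purely for illustration and are not data from this module; the variable names are likewise illustrative.

```python
# Made-up yields (tons/acre) for three genotype classes; illustrative values only.
data = {
    "AA": [10.0, 12.0, 11.0],
    "Aa": [14.0, 15.0, 16.0],
    "aa": [20.0, 19.0, 21.0],
}

def mean(values):
    return sum(values) / len(values)

all_values = [y for ys in data.values() for y in ys]
grand_mean = mean(all_values)

# Between-treatment sum of squares: treatment means around the grand mean
ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in data.values())
# Within-treatment sum of squares (experimental error): plants around their own treatment mean
ss_within = sum((y - mean(ys)) ** 2 for ys in data.values() for y in ys)

df_between = len(data) - 1                 # number of treatments - 1
df_within = len(all_values) - len(data)    # number of observations - number of treatments

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within           # large values suggest real treatment differences

print(round(ss_between, 2), round(ss_within, 2), round(f_ratio, 2))
```

The F ratio would then be compared against an F distribution with df_between and df_within degrees of freedom to obtain a p value, a step left out here to keep the sketch free of external libraries.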

In the simplest case, linear equations can be developed to describe the relationship between a trait and treatment. The question can then be asked, “which linear equation best fits the data for each treatment?” These linear equations take the following form:

Y = µ + f(treatment) + error

where

  • Y is equal to the trait value
  • µ is the population mean
  • f(treatment) is a function of the treatment
  • error represents the residual
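
To make the terms of this model concrete, the Python sketch below splits each observation into µ, a treatment effect, and a residual. It assumes the simplest form of f(treatment), namely the deviation of each treatment mean from the overall mean, and it uses made-up yield values that are not data from this module.

```python
# Made-up yields (tons/acre); illustrative values only, not data from the module.
data = {
    "AA": [10.0, 12.0, 11.0],
    "Aa": [14.0, 15.0, 16.0],
    "aa": [20.0, 19.0, 21.0],
}

all_values = [y for ys in data.values() for y in ys]
mu = sum(all_values) / len(all_values)     # estimate of the population mean

# Assumed form of f(treatment): deviation of each treatment mean from mu
effect = {g: sum(ys) / len(ys) - mu for g, ys in data.items()}

# Residual (error): what remains of each observation after mu and the treatment effect
residuals = {g: [y - (mu + effect[g]) for y in ys] for g, ys in data.items()}

# Each observation is recovered as Y = mu + f(treatment) + error
for g, ys in data.items():
    for y, e in zip(ys, residuals[g]):
        print(g, y, "=", round(mu, 2), "+", round(effect[g], 2), "+", round(e, 2))
```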

In this module we provide two examples of ANOVA and sample data sets to assess differences in treatment effect. In the first example, four methods of soybean transformation are evaluated to determine whether transformation method affects expression of a stress-response gene. In the second example, two molecular markers are evaluated to determine whether genotype of each molecular marker results in differences in disease severity in a BC1 population.

Conclusion

ANOVA is a statistical tool that has applications to experiments in which we want to assess whether there is a difference in a continuous variable between treatment groups. In a plant breeding context, this page demonstrated the utility of ANOVA in gene expression studies and molecular marker analysis.

Additional Resources

Many statistics textbooks provide a good discussion of theory and applications of ANOVA. A few examples are listed below.

  • Clewer, A. G., and D. H. Scarisbrick. 2001. Practical statistics and experimental design for plant and crop science. John Wiley & Sons Ltd., New York.
  • Steel, R. G. D., J. H. Torrie, and D. A. Dickey. 1997. Principles and procedures of statistics: a biometrical approach. The McGraw-Hill Companies, Inc., New York.

The following videos provide detailed instructions for calculating components of ANOVA tables (ANOVA1 and 2) and hypothesis testing (ANOVA3).

Funding Statement

Development of this page was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

PBGworks 650

Cross-Pollinating Tomatoes for Hybrid Production and Population Development

Author:

David M. Francis, The Ohio State University

This video provides basic instructions for cross-pollinating tomato plants to make new hybrids or to begin the process of population development.

Funding Statement

Development of this page was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

PBGworks 620