Association Analysis


Heather L. Merk, The Ohio State University; Nicholas Wheeler, Oregon State University; Sung-Chur Sim, The Ohio State University; M. Awais Khan, University of Illinois, Urbana-Champaign; David Harry, Oregon State University; Jennifer Kling, Oregon State University; Zhifen Zhang, The Ohio State University; Allen Van Deynze, University of California, Davis; David Francis, The Ohio State University

This curriculum page links to learning modules and tutorials on association analysis. The learning modules introduce linkage disequilibrium (LD) and association genetics. The tutorials cover preparation of phenotypic data (using an augmented experimental design and obtaining best linear unbiased predictors, BLUPs) and genotypic data (using MSA, Structure, and GGT2). Association analysis using TASSEL is also demonstrated.

Data Pipeline

Figure 1. Data pipeline for association analysis.

Learning Modules


The Unified Mixed Model

y = μ + Sα + Qv + Zu + e
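
To make the roles of the terms concrete, the following minimal sketch simulates phenotypes from the model above with NumPy. Everything numeric here is an assumption made for illustration only: the dimensions, the effect sizes, the identity kinship placeholder `K`, and the variance values are hypothetical and do not come from any dataset or from the linked tutorials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration
m = 6   # total number of observations
q = 3   # genotype classes at one marker locus (e.g. AA, AB, BB)
k = 2   # subpopulations described by Q
n = 6   # lines carrying a polygenic effect (one observation per line here)

mu = 10.0                                  # overall mean

# Marker term S*alpha: S is the m x q incidence matrix of genotype classes
genotype = np.array([0, 1, 2, 0, 1, 2])    # genotype class of each observation
S = np.eye(q)[genotype]
alpha = np.array([0.0, 0.5, 1.0])          # assumed genotype effects

# Population structure term Q*v: rows of Q are membership proportions
Q = rng.dirichlet(np.ones(k), size=m)      # m x k, each row sums to 1
v = np.array([-1.0, 1.0])                  # assumed subpopulation effects

# Polygenic term Z*u: covariance of u is taken proportional to a kinship matrix K
K = np.eye(n)                              # placeholder: unrelated lines (assumption)
Z = np.eye(m, n)                           # links each observation to its line
u = rng.multivariate_normal(np.zeros(n), 0.5 * K)

e = rng.normal(0.0, 0.3, size=m)           # residual error

y = mu + S @ alpha + Q @ v + Z @ u + e     # simulated phenotype vector, length m
print(y.shape)
```

In a real analysis the direction is reversed: y, S, Q, and K are supplied as data and the effects are estimated, for example with TASSEL.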

Phenotype Data (y)

Genotype Data

Marker Matrix (Sα)

  • S is an m by q matrix, where m is the total number of observations and q is the number of genotype classes at a marker locus
  • Analyzing SNP quality
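
As an illustration of the m by q marker matrix, the sketch below builds an incidence matrix from a list of genotype calls at a single locus. The helper name `marker_incidence` and the example calls are hypothetical, and missing calls are not handled; this is one plausible encoding, not necessarily the one used by the tools in the tutorials.

```python
import numpy as np

def marker_incidence(calls):
    """Return (S, classes) for one marker locus.

    S has one row per observation (m rows) and one column per
    genotype class observed at the locus (q columns); each row
    contains a single 1 marking that observation's genotype.
    """
    calls = np.asarray(calls)
    classes, index = np.unique(calls, return_inverse=True)
    S = np.zeros((calls.size, classes.size), dtype=int)
    S[np.arange(calls.size), index] = 1
    return S, classes

# Hypothetical SNP calls for five observations
calls = ["AA", "AG", "GG", "AG", "AA"]
S, classes = marker_incidence(calls)
print(classes)   # genotype classes in sorted order
print(S)         # 5 x 3 incidence matrix
```

Multiplying S by a length-q vector of genotype effects α gives the marker contribution to each observation.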

Population Structure (Q matrix – Qv)

Kinship Matrix (Polygene effect – Zu)

Marker Coverage

Combined Analysis

Additional Resources

The Unified Mixed Model

  • Unified Mixed Model [Online]. Buckler Lab for Maize Genetics and Diversity. Available at: (verified 29 March 2012).
  • Yu, J., G. Pressoir, W. H. Briggs, I. Vroh Bi, M. Yamasaki, J. F. Doebley, M. D. McMullen, et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics 38: 203–208. (Available online at: (verified 29 March 2012).

Accounting for Population Structure

  • Price, A. L., N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick, and D. Reich. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38: 904–909. (Available online at: (verified 29 March 2012).
  • Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155: 945–959. (Available online at: (verified 29 March 2012).

SAS Code for Association Analysis

Creating Matrix Equations Online

Funding Statement

Development of this page was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, agreement 2009-85606-05673, administered by Michigan State University. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.
