Bioinformatics

These bioinformatics and data management tutorials provide an introduction to sequence data, on-line genomic resources, and general data manipulation.

Introduction to Sequence Data

On-Line Resources

 

PBGworks 1555

Part One of Field Phenomics: Developing and Using a Sensor Array

Authors:

Pedro Andrade-Sanchez, The University of Arizona; John Heun, The University of Arizona

This webinar is the first in a two-part series on high-throughput field phenotyping. This presentation describes the development and use of a field-based sensor array.

Part 1 – Describing the tractor

Part 2 – Discussing the sensors

Part 3 – Explaining the electronics

Full recording

Original air date: Thursday, October 24, 2013, 1:00 PM EDT

Presenters

Pedro Andrade-Sanchez and John Heun from the University of Arizona

See Part 2: Data Analysis

Related Publications

Andrade-Sanchez Pedro, Gore Michael A., Heun John T., Thorp Kelly R., Carmo-Silva A. Elizabete, French Andrew N., Salvucci Michael E., White Jeffrey W. (2013) Development and evaluation of a field-based high-throughput phenotyping platform. Functional Plant Biology.

Andrade-Sanchez, P., & Heun, J. T. (2012). From GPS to GNSS: Enhanced Functionality of GPS-Integrated Systems in Agricultural Machines.

A. Elizabete Carmo-Silva, Michael A. Gore, Pedro Andrade-Sanchez, Andrew N. French, Doug J. Hunsaker, Michael E. Salvucci (2012) Decreased CO2 availability and inactivation of Rubisco limit photosynthesis in cotton plants under heat and drought stress in the field. Environmental and Experimental Botany.83:1-11. ISSN 0098-8472, 10.1016/j.envexpbot.2012.04.001.

Jeffrey W. White, Pedro Andrade-Sanchez, Michael A. Gore, Kevin F. Bronson, Terry A. Coffelt, Matthew M. Conley, Kenneth A. Feldmann, Andrew N. French, John T. Heun, Douglas J. Hunsaker, Matthew A. Jenks, Bruce A. Kimball, Robert L. Roth, Robert J. Strand, Kelly R. Thorp, Gerard W. Wall, Guangyao Wang (2012) Field-based phenomics for plant genetics research. Field Crops Research, Volume 133: 101-112, ISSN 0378-4290, 10.1016/j.fcr.2012.04.003.

Funding Statement

Development of this resource was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, and Dry Bean Root Health East Africa. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

 

Attachments:

Developing and Using a Sensor Array.pdf (14.34 MB)

PBGworks 1631

David Francis

David Francis is a Professor at The Ohio State University in the Department of Horticulture and Crop Science.  David is a tomato breeder, developing breeding lines, parents and hybrids for the processing tomato industry. His research group integrates field-based plant breeding with the discovery of sequence variation, and techniques derived from population genetics to identify novel traits and understand how human selection has shaped contemporary plant varieties. 

PBGworks 1677

Statistical Inference

These tutorials provide examples of statistical analyses that are commonly used by plant breeders and scientists in general.

ANOVA

Sample Size Estimation

Population Structure and Linkage

Association Analysis

Layout Specific Analyses

PBGworks 1557

Part Two of Field Phenomics: Data Analysis

This webinar is the second in a two-part series on high-throughput field phenotyping. This presentation describes how to handle data generated by a field-based sensor array. The video was recorded live as a webinar on October 31, 2013.

 

Part 1

Part 2

Part 3

Full Recording

Software Links

QGIS: A Free and Open Source Geographic Information System

HTP Geoprocessor: A plugin for QGIS

ASReml: Data analysis software designed for fitting linear mixed models

PROSAIL: The combined PROSPECT leaf optical properties model and SAIL canopy bidirectional reflectance model

Presenters

Michael Gore is an associate professor of molecular breeding and genetics for nutritional quality at Cornell University in Ithaca, NY, where he is a member of the faculty in the Department of Plant Breeding and Genetics. He holds a BS and MS from Virginia Tech in Blacksburg, Virginia, and a PhD from Cornell University. Before joining the faculty at Cornell, he worked as a Research Geneticist with the USDA-ARS at the Arid-Land Agricultural Research Center in Maricopa, AZ. His expertise is in the field of quantitative genetics and genomics, especially the genetic dissection of metabolic traits. He has also contributed to the development and application of field-based, high-throughput phenotyping tools for plant breeding and genetics research. He teaches two short courses at the Tucson Winter Plant Breeding Institute in Tucson, Arizona, and serves on the editorial boards of Crop Science and Theoretical and Applied Genetics. His career accomplishments in plant breeding and genetics earned him the National Association of Plant Breeders Early Career Scientist Award in 2012 and the American Society of Plant Biologists Early Career Award in 2013. 

 

Kelly Thorp is a Research Agricultural Engineer with USDA-ARS in Maricopa, Arizona. He holds a BS and MS from the University of Illinois at Urbana-Champaign and a PhD from Iowa State University. His research focuses primarily on the development and application of informational technologies for monitoring cropping systems and understanding cropping system processes. Areas of expertise include remote sensing, cropping system simulation modeling, and geographic information systems. Application areas for these technologies include crop water and nitrogen status assessment, precision agriculture, management of nitrogen fertilizer, irrigation and drainage water management, field-based plant phenomics, and development of new bioenergy crops. He serves as an associate editor for Transactions of the ASABE and Applied Engineering in Agriculture.

See Part 1: Developing and Using a Sensor Array

Related Publications

Andrade-Sanchez Pedro, Gore Michael A., Heun John T., Thorp Kelly R., Carmo-Silva A. Elizabete, French Andrew N., Salvucci Michael E., White Jeffrey W. (2013) Development and evaluation of a field-based high-throughput phenotyping platform. Functional Plant Biology.

A. Elizabete Carmo-Silva, Michael A. Gore, Pedro Andrade-Sanchez, Andrew N. French, Doug J. Hunsaker, Michael E. Salvucci (2012) Decreased CO2 availability and inactivation of Rubisco limit photosynthesis in cotton plants under heat and drought stress in the field. Environmental and Experimental Botany.83:1-11. ISSN 0098-8472, 10.1016/j.envexpbot.2012.04.001.

Jeffrey W. White, Pedro Andrade-Sanchez, Michael A. Gore, Kevin F. Bronson, Terry A. Coffelt, Matthew M. Conley, Kenneth A. Feldmann, Andrew N. French, John T. Heun, Douglas J. Hunsaker, Matthew A. Jenks, Bruce A. Kimball, Robert L. Roth, Robert J. Strand, Kelly R. Thorp, Gerard W. Wall, Guangyao Wang (2012) Field-based phenomics for plant genetics research. Field Crops Research, Volume 133: 101-112, ISSN 0378-4290, 10.1016/j.fcr.2012.04.003.

Thorp, K.R., Wang, G., West, A.L., Moran, M.S., Bronson, K.F., White, J.W., Mon, J.  2012.  Estimating crop biophysical properties from remote sensing data by inverting linked radiative transfer and ecophysiological models.  Remote Sensing of Environment. 124:224-233.

Recommended Reading

Barnett V, Lewis T (1994). Outliers in Statistical Data, 3rd edition, John Wiley, New York, NY, USA.

Belsley DA, Kuh E, Welsch RE (2004). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley-Interscience, Hoboken, NJ, USA. 

Box GEP, Cox DR (1964). An analysis of Transformations. J. Roy. Stat. Soc. B. Met. 26: 211–252.

Cook RD, and Weisberg S (1982) Residuals and Influence in Regression, Chapman & Hall, New York, NY, USA

Endelman JB (2011). Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Gen 4: 250-255.

Henderson CR (2004). Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, Ontario, Canada.

Holland JB, Nyquist WE, Cervantes-Martínez CT (2003). Estimating and interpreting heritability for plant breeding: an update. In Janick J (ed) Plant Breeding Reviews. John Wiley and Sons: Hoboken, New Jersey, USA.

Hung H-Y, Browne CJ, Guill KE, Coles N, Eller M, Garcia A et al (2011). The relationship between parental genetic or phenotypic divergence and progeny variation in the maize Nested Association Mapping population. Heredity 108:490-499. 

Jia Y, Jannink J-L (2012). Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics 192: 1513-1522.

Jiang C, Zeng ZB (1995). Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111-1127.

Korte A, Vilhjalmsson BJ, Segura V, Platt A, Long Q, Nordborg M (2012). A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat Genet 44: 1066-1071.

Kutner MH, Nachtsheim CJ, Neter J, Li W (2004). Applied Linear Statistical Models, 4th ed. McGraw-Hill, Boston, MA, USA.

Li H, Ye G, Wang J (2007). A modified algorithm for the improvement of composite interval mapping. Genetics 175: 361-374.

Lipka AE, Tian F, Wang Q, Peiffer J, Li M, Bradbury PJ et al (2012). GAPIT: genome association and prediction integrated tool. Bioinformatics 28: 2397-2399.

McCullagh P and Nelder JA (1989). Generalized Linear Models, Second Edition. Chapman & Hall, New York, NY, USA.

Osborne JW, Overbay A (2004). The power of outliers (and why researchers should always check for them). Practical Assessment, Research & Evaluation, 9(6).

Segura V, Vilhjalmsson BJ, Platt A, Korte A, Seren U, Long Q et al (2012). An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet 44: 825-830.

Wu W-R, Li W-M, Tang D-Z, Lu H-R, Worland AJ (1999). Time-related mapping of quantitative trait loci underlying tiller number in rice. Genetics 151: 297-303.

Funding Statement

Development of this resource was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, Dry Bean Root Health East Africa, Cotton Incorporated, and the United States Department of Agriculture – Agricultural Research Service (USDA-ARS). Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA. The USDA is an equal opportunity provider and employer.

PBGworks 1633

Soil Borne Diseases of Dry Bean Webinar

This webinar looks at morphological and molecular techniques for identifying soil-borne fungal and oomycete pathogens in dry bean production systems.

Original day/time: February 6, 2014 at 1:00pm Eastern Time (-05:00 GMT)

Resources from the Webinar

Databases

Field identification

Metagenomics

About the Presenter

Martin Chilvers is a visiting Assistant Professor at Michigan State University. He studies diseases of field crops, and the biology and genetics of fungal and oomycete organisms that cause disease. He uses both classic techniques, such as culturing causal agents, and the latest technologies, such as next-generation sequencing of metagenomes, to improve disease management by understanding the organisms, factors, and host-pathogen interactions that drive disease.

 

PBGworks 1681

Genetic Mapping and QTL Analysis

These tutorials provide an introduction to association genetics and QTL mapping as well as different analytical approaches.


(Figure 1) Flow of information from collecting genotypic and phenotypic data to mapping to graphical genotyping and quantitative trait locus analysis. Figure credit: Heather Merk. Image credits: Genotyping, Allen Van Deynze, UC Davis; Mapping, Scott Wolfe, The Ohio State University; Graphical Genotyping, Nancy Huarachi, The Ohio State University; Phenotyping, David Francis, The Ohio State University; QTL Analysis, Hamid Ashrafi, UC Davis.

Introduction to Association Genetics

Mapping and QTL Analysis

PBGworks 1563

Fast Semi-Parallel Linear and Logistic Regression for Genome-Wide Association Studies

Author:

Karolina Sikorska, Department of Biostatistics, Erasmus Medical Centre in Rotterdam

This tutorial demonstrates semi-parallel computing and SNP data reorganization using the statistical program R. Karolina Sikorska describes techniques for speeding up genome-wide association studies (GWAS) and making genome-wide association scans possible on a notebook computer using matrix operations and matrix-oriented binary files. The video is from a webinar recorded September 12, 2013.

Part 1: Semi-Parallel Linear Regression

Part one explains GWA analysis in a loop using the lm and lsfit functions, and semi-parallel computation of linear regression with covariates. It also explains how to handle missing phenotype and SNP data.
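The contrast between fitting one model per SNP and reusing shared quantities across all SNPs can be sketched roughly as follows. This is a plain-Python illustration, not the presenter's R code: the function names and toy data are hypothetical, and only the simplest case (an intercept and no other covariates) is shown. In that case the slope for each SNP is the covariance of genotype and phenotype divided by the genotype variance, so the phenotype is centered once and reused for every SNP. In R the same sums would typically be written as matrix products over the whole SNP matrix.

```python
def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slopes_all_snps(genotypes, phenotype):
    """Least-squares slope of the phenotype on each SNP (intercept-only model).

    genotypes: list of SNP columns, each a list of allele counts per individual.
    phenotype: list of trait values, one per individual.
    """
    y_c = center(phenotype)  # centered once, reused for every SNP
    slopes = []
    for snp in genotypes:
        x_c = center(snp)
        sxy = sum(x * y for x, y in zip(x_c, y_c))
        sxx = sum(x * x for x in x_c)
        # A monomorphic SNP has no variance, so its slope is undefined.
        slopes.append(sxy / sxx if sxx > 0 else float("nan"))
    return slopes


def slope_single_fit(snp, phenotype):
    """Reference: solve the 2x2 normal equations for one SNP at a time."""
    n = len(snp)
    sx, sy = sum(snp), sum(phenotype)
    sxx = sum(x * x for x in snp)
    sxy = sum(x * y for x, y in zip(snp, phenotype))
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)


# Toy data (hypothetical): 3 SNPs scored 0/1/2 in 6 individuals.
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 0, 1, 1, 2, 2],
    [2, 1, 0, 2, 1, 0],
]
phenotype = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]

fast = slopes_all_snps(genotypes, phenotype)
slow = [slope_single_fit(snp, phenotype) for snp in genotypes]
```

Both routes give the same slopes. The first avoids rebuilding and solving a full model for every SNP, which is where the time goes in a genome-wide scan.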

Part 2: Semi-Parallel Logistic Regression

Part two explains semi-parallel logistic regression in R based on iteratively reweighted least squares (equivalent to glm), with and without covariates.
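For readers unfamiliar with iteratively reweighted least squares (IRLS), the sketch below fits a logistic regression for a single SNP with an intercept. It is a plain IRLS loop in Python, not the semi-parallel variant covered in the webinar, and the function name and toy data are hypothetical. Each iteration computes fitted probabilities, forms weights and a working response, and solves a small weighted least-squares problem.

```python
import math


def irls_logistic(x, y, iterations=25, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Current linear predictor, fitted probabilities, and weights p(1-p).
        eta = [b0 + b1 * xi for xi in x]
        p = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [pi * (1.0 - pi) for pi in p]
        # Working response z = eta + (y - p) / w.
        z = [e + (yi - pi) / wi for e, yi, pi, wi in zip(eta, y, p, w)]
        # Weighted least squares of z on (1, x): solve the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Toy data (hypothetical): case proportions of 1/4, 2/4 and 3/4
# for genotypes 0, 1 and 2, so the log-odds are exactly linear in x.
x = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 1, 2]
y = [0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1]

b0, b1 = irls_logistic(x, y)
```

Because the toy proportions are exactly linear on the logit scale, the fitted slope equals log(3) and the intercept equals -log(3). This sketch does not guard against perfectly separated data, where the weights shrink toward zero.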

Part 3: Efficient Data Access

Part three explains how to convert the SNP matrix from a text file to an array-oriented binary file using the ncdf and ff packages. Array-oriented binary files allow efficient access to blocks of SNPs (columns), as opposed to access by individual or line (rows).
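The benefit of a SNP-oriented binary layout can be illustrated with a small sketch. It uses Python's standard `array` module and a raw binary file as a stand-in; it is not the netCDF or ff file format, and the function names, file name, and toy matrix are hypothetical. The genotypes are written so that each SNP's values are contiguous on disk, which lets a block of consecutive SNPs be read with a single seek instead of scanning every individual's row.

```python
import os
import tempfile
from array import array


def write_snp_major(path, rows):
    """Write an individuals-by-SNPs table so each SNP's values are contiguous."""
    n_ind, n_snp = len(rows), len(rows[0])
    with open(path, "wb") as handle:
        for j in range(n_snp):
            # One signed byte per genotype (0, 1 or 2).
            column = array("b", (rows[i][j] for i in range(n_ind)))
            column.tofile(handle)
    return n_ind, n_snp


def read_snp_block(path, n_ind, first_snp, n_block):
    """Read n_block consecutive SNPs starting at first_snp with one seek."""
    values = array("b")
    with open(path, "rb") as handle:
        handle.seek(first_snp * n_ind * values.itemsize)
        values.fromfile(handle, n_block * n_ind)
    return [list(values[k * n_ind:(k + 1) * n_ind]) for k in range(n_block)]


# Toy matrix (hypothetical): 3 individuals (rows) by 4 SNPs (columns).
rows = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snps.bin")
    n_ind, n_snp = write_snp_major(path, rows)
    block = read_snp_block(path, n_ind, first_snp=1, n_block=2)
```

Here `block` holds the genotype columns for the second and third SNPs. With a row-oriented text file, retrieving the same two SNPs would require reading and parsing every line.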

Full Recording

Download R and Individual R Packages

www.r-project.org

R Packages Specific to this Tutorial

ncdf: Interface to Unidata netCDF data files
ff: memory-efficient storage of large data on disk and fast access functions

R Code Available

https://bitbucket.org/ksikorska/gwasp


About the Presenter

Karolina Sikorska received a Master's degree in Mathematics from the Gdansk University of Technology, Poland, with a specialization in financial mathematics. In 2009 she started her PhD project in the Department of Biostatistics, Erasmus Medical Centre in Rotterdam. Her research is related to fast computations in genome-wide association studies. Her work is focused on developing new methodology and algorithms that significantly speed up computations in GWAS for simple models, such as linear and logistic regression, as well as mixed models for analyzing longitudinal data. She is also interested in improving tools for efficient data access in the GWAS framework.


Related Publications

Sikorska, K., Lesaffre, E., Groenen, P. F., & Eilers, P. H. (2013). GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics, 14(1), 166.

Sikorska, K., Rivadeneira, F., Groenen, P. J., Hofman, A., Uitterlinden, A. G., Eilers, P. H., & Lesaffre, E. (2013). Fast linear mixed model computations for genome‐wide association studies with longitudinal data. Statistics in Medicine, 32(1), 165-180.


 

Funding Statement

Development of this resource was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, Dry Bean Root Health East Africa, and the Erasmus Medical Center. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.

Attachments:

Slides.pdf (570.88 KB)

PBGworks 1641