Karolina Sikorska, Department of Biostatistics, Erasmus Medical Centre in Rotterdam
Part 1: Semi-Parallel Linear Regression
Part one explains GWA analysis in a loop using the lm and lsfit functions, and semi-parallel computation of linear regression with covariates. It also explains how to handle missing phenotype and SNP data.
Part 2: Semi-Parallel Logistic Regression
Part two explains semi-parallel logistic regression in R based on iteratively reweighted least squares (equivalent to glm), with and without covariates.
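The algebra behind the semi-parallel approach can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the tutorial's own code. It assumes the only covariate is an intercept, so removing the covariate reduces to centering, and it assumes there is no missing data. The function names center and semi_parallel_slopes are hypothetical. The phenotype is centered once, and each SNP then needs only a few sums instead of a full model fit. In R the same sums can be computed for all SNPs at once with matrix operations.

```python
def center(values):
    """Subtract the mean, i.e. remove the intercept-only covariate."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def semi_parallel_slopes(y, snps):
    """Return (beta, standard error) for each SNP column in `snps`.

    y    : phenotype values, one per individual
    snps : list of SNP columns, each a list with one genotype per individual
    Assumes an intercept-only covariate model and no missing values.
    """
    n = len(y)
    yc = center(y)                      # done once, reused for every SNP
    syy = sum(a * a for a in yc)
    results = []
    for snp in snps:
        sc = center(snp)
        sxx = sum(a * a for a in sc)
        sxy = sum(a * b for a, b in zip(sc, yc))
        beta = sxy / sxx                # closed-form slope
        rss = max(syy - beta * sxy, 0.0)  # guard against tiny negative rounding
        se = (rss / ((n - 2) * sxx)) ** 0.5  # n - 2: intercept plus SNP effect
        results.append((beta, se))
    return results
```

With more covariates, the centering step would be replaced by regressing both the phenotype and each SNP on the covariates and working with the residuals.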
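As an illustration of iteratively reweighted least squares, here is a minimal sketch for a single SNP with an intercept and no further covariates. It is written in plain Python rather than R and is not the tutorial's implementation. The function name irls_logistic, the iteration limit, and the tolerance are assumptions chosen for the example. Each iteration solves a weighted least-squares problem, which for two parameters is a 2x2 linear system with a closed-form solution.

```python
import math


def irls_logistic(x, y, n_iter=25, tol=1e-10):
    """Fit logit(P(y=1)) = b0 + b1 * x by iteratively reweighted least squares.

    x : predictor values (e.g. genotypes 0/1/2)
    y : binary outcomes (0 or 1)
    Assumes the data are not perfectly separable.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)            # IRLS weight
            wz = w * eta + (yi - p)      # weight times working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += wz
            s_wxz += xi * wz
        # Solve the 2x2 weighted normal equations in closed form.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1
```

On well-behaved data the estimates should agree with those from a standard logistic regression fit, since both maximize the same likelihood.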
Part 3: Efficient Data Access
Part three explains how to convert the SNP matrix from a text file to an array-oriented binary file using the ncdf and ff packages. Array-oriented binary files allow efficient access to blocks of SNPs (columns), rather than reading the data line by line, one individual (row) at a time.
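The benefit of an array-oriented layout can be illustrated with a minimal sketch. It uses only the Python standard library and does not use the ncdf or ff packages. It assumes genotypes are coded 0, 1, or 2 and stored as one byte each, SNP by SNP. The function names write_snp_major and read_snp_block are hypothetical. Because all values for one SNP sit next to each other on disk, a block of consecutive SNPs can be read with a single seek and a single read.

```python
def write_snp_major(path, rows):
    """Write a genotype matrix to `path` in SNP-major order, one byte per value.

    rows : list of individuals, each a list of genotypes (integers 0..255)
    Returns (number of individuals, number of SNPs).
    """
    n_ind = len(rows)
    n_snp = len(rows[0])
    with open(path, "wb") as f:
        for j in range(n_snp):
            # All individuals for SNP j are written contiguously.
            f.write(bytes(row[j] for row in rows))
    return n_ind, n_snp


def read_snp_block(path, n_ind, first, count):
    """Read `count` consecutive SNPs starting at SNP index `first`."""
    with open(path, "rb") as f:
        f.seek(first * n_ind)            # jump straight to the first SNP wanted
        buf = f.read(count * n_ind)
    return [list(buf[k * n_ind:(k + 1) * n_ind]) for k in range(count)]
```

With a text file that has one individual per line, the same block would require scanning every line of the file.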
Download R and Individual R Packages
R Packages Specific to this Tutorial
ncdf: Interface to Unidata netCDF data files
ff: memory-efficient storage of large data on disk and fast access functions
R Codes Available
About the Presenter
Karolina Sikorska received a Master’s degree in Mathematics from the Gdansk University of Technology, Poland, with a specialization in financial mathematics. In 2009 she started her PhD project in the Department of Biostatistics, Erasmus Medical Centre in Rotterdam. Her research is related to fast computations in genome-wide association studies. Her work focuses on developing new methodology and algorithms that significantly speed up computations in GWAS for simple models, such as linear and logistic regression, as well as for mixed models used to analyze longitudinal data. She is also interested in improving tools for efficient data access within the GWAS framework.
Sikorska, K., Lesaffre, E., Groenen, P. F., & Eilers, P. H. (2013). GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics, 14(1), 166.
Sikorska, K., Rivadeneira, F., Groenen, P. J., Hofman, A., Uitterlinden, A. G., Eilers, P. H., & Lesaffre, E. (2013). Fast linear mixed model computations for genome‐wide association studies with longitudinal data. Statistics in Medicine, 32(1), 165-180.
Development of this resource was supported in part by the National Institute of Food and Agriculture (NIFA) Solanaceae Coordinated Agricultural Project, Dry Bean Root Health East Africa, and the Erasmus Medical Center. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the United States Department of Agriculture.