SIMreg: gene-trait similarity regression for marker-set association analysis (SIMreg includes HSreg as part of the package)
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject
"SIMreg codes" before download.
SIMreg is a tool for marker-set association analysis. Association analysis at the gene,
pathway, or exon level (hereafter, marker-set analysis) holds great promise in evaluating
modest etiological effects of genes with GWAS or sequence data. However, currently
available methods target detection of either rare or common variants but not both,
assume additive and same-direction effects for loci within a marker set, use
test-based frameworks that cannot accommodate covariates such as population structure,
and do not have the capacity to assess interaction effects. SIMreg provides a flexible,
powerful and computationally efficient alternative for conducting marker-set analysis.
It has the following features that distinguish it from other methods.
- The method uses genetic similarity to aggregate information across markers,
  and incorporates adaptive weights depending on allele frequencies to accommodate
  rare and common variants.
- Collapsing information at the similarity level instead of the genotype level
  bypasses the worry of cancelling signals of opposite etiological effects,
  and is applicable to any class of genetic variant without having to dichotomize
  the allele types.
- It is regression-based, naturally incorporates covariates, and is applicable to both observed and imputed (dosage) genotypes.
- We use a rigorous analytical derivation to demonstrate that collapsing information through similarity status
  explicitly captures the locus-locus interactions among all markers in a set.
- It provides a series of test statistics that can be used to assess (a) marginal genetic main effects (G test),
  (b) gene-environment interaction effects (GxE test), or (c) the joint effects of both types simultaneously.
  These tests do not require permutations to assess significance, and are fast to compute.
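As a rough illustration of aggregating information at the similarity level, the sketch below computes a frequency-weighted identity-by-state (IBS) similarity between two individuals. It is not the SIMreg code: the function name, the 1/sqrt(MAF) weight, and the toy data are assumptions made for this example only, and SIMreg's actual weighting scheme may differ.

```python
from math import sqrt


def weighted_ibs_similarity(g_i, g_j, maf):
    """Weighted IBS similarity in [0, 1] between two individuals.

    g_i, g_j: per-marker minor-allele counts or imputed dosages in [0, 2].
    maf: per-marker minor allele frequencies, each > 0.
    """
    num = 0.0
    den = 0.0
    for a, b, q in zip(g_i, g_j, maf):
        w = 1.0 / sqrt(q)          # assumed weight: rarer variants count more
        shared = 2.0 - abs(a - b)  # alleles shared identical-by-state (0 to 2)
        num += w * shared
        den += w * 2.0
    return num / den


# Toy data (hypothetical): three markers, the last one rare.
maf = [0.40, 0.25, 0.01]
print(weighted_ibs_similarity([0, 1, 2], [0, 1, 2], maf))  # identical genotypes
print(weighted_ibs_similarity([0, 1, 0], [0, 1, 2], maf))  # differ at the rare marker
```

Because the score depends only on how alike two genotype vectors are, effects in opposite directions at different loci cannot cancel, and dosages work without any dichotomization.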
SIMreg is an extension of (incorporates all features and functions of) HSreg.
HSreg: Haplotype Similarity Regression for Multi-marker Association Analysis.
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "HSreg codes" before download.
HSreg implements a similarity-based regression method to detect associations
between traits and multimarker genotypes. The method uses genetic/haplotype
similarity to aggregate information from multiple polymorphic sites (e.g., SNPs
or a mixture of different polymorphisms), and regresses trait similarities for
pairs of "unrelated" individuals on their genetic similarities to assess the
gene-trait association. The similarity regression allows for covariates, uses
phase-independent similarity measures to bypass the need to impute phase
information, and is applicable to traits of general types (e.g., quantitative
and qualitative traits). It can be shown that the similarity model is
equivalent to the random effects haplotype analysis and explicitly models the
non-additive effects among markers. These features make it an ideal tool for
evaluating association between phenotype and marker sets defined by haplotypes,
genes, or pathways.
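The idea of regressing pairwise trait similarity on pairwise genetic similarity can be sketched as follows. This is a minimal illustration and not the HSreg implementation: it assumes trait similarity is the product of mean-centered traits, fits a no-intercept least-squares slope, and omits covariates and the significance test. The function name and toy data are hypothetical.

```python
def similarity_regression_slope(traits, genetic_sim):
    """Least-squares slope (through the origin) of trait similarity
    on genetic similarity over all pairs i < j.

    traits: list of quantitative trait values, one per individual.
    genetic_sim: n x n symmetric matrix (list of lists) of genetic similarities.
    """
    n = len(traits)
    mean = sum(traits) / n
    resid = [y - mean for y in traits]  # with covariates, use fitted values instead
    s_xz = 0.0
    s_xx = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            z = resid[i] * resid[j]     # assumed trait similarity for the pair
            s = genetic_sim[i][j]
            s_xz += s * z
            s_xx += s * s
    if s_xx == 0.0:
        raise ValueError("all pairwise genetic similarities are zero")
    return s_xz / s_xx


# Toy data (hypothetical): two genetically similar pairs with similar traits.
traits = [1.0, 1.0, -1.0, -1.0]
sim = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(similarity_regression_slope(traits, sim))
```

A positive slope indicates that individuals who are more alike genetically also tend to be more alike in trait, which is the signal the similarity regression is designed to detect.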
PLhap: Penalized-Likelihood Regression for Haplotype Specific Analysis.
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "PLhap codes" before download.
PLhap implements a penalized regression approach to systematically evaluate the
pattern and structure of haplotype effects. The method takes unphased genotype
data and outputs the haplotype group structure based on their effect size.
PLhap differs from the typical way of haplotype analysis, where haplotype
inference focuses on relative effects compared with an arbitrarily chosen
baseline haplotype. The typical analysis does not depict the effect structure
unless an additional inference procedure is used in a secondary post hoc
analysis, and such analysis tends to lack power. By putting an L1 penalty on
the pairwise difference of the haplotype effects, PLhap avoids the need to
choose a baseline haplotype, and simultaneously carries out effect estimation
and effect comparison of all haplotypes. It can serve as a tool to comprehend
candidate regions identified from a genome or chromosomal scan.
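To make the penalty concrete, the sketch below evaluates an L1 penalty on all pairwise differences of haplotype effects and then groups haplotypes whose estimated effects coincide. It is an illustration only and not the PLhap code: the function names, the tuning parameter `lam`, the tolerance, and the effect values are assumptions, and the penalized-likelihood fitting step itself is not shown.

```python
def pairwise_l1_penalty(effects, lam):
    """lam times the sum over pairs j < k of |effects[j] - effects[k]|."""
    total = 0.0
    for j in range(len(effects)):
        for k in range(j + 1, len(effects)):
            total += abs(effects[j] - effects[k])
    return lam * total


def group_by_effect(effects, tol=1e-8):
    """Group haplotype indices whose effects are equal within tol,
    ordered from smallest to largest effect."""
    order = sorted(range(len(effects)), key=lambda i: effects[i])
    groups = []
    for i in order:
        if groups and abs(effects[i] - effects[groups[-1][-1]]) <= tol:
            groups[-1].append(i)   # same effect as the previous haplotype
        else:
            groups.append([i])     # start a new effect group
    return groups


# Hypothetical estimated effects for four haplotypes.
effects = [0.5, 0.0, 0.5, 0.0]
print(pairwise_l1_penalty(effects, lam=2.0))
print(group_by_effect(effects))
```

The penalty is unchanged when the same constant is added to every effect, which is why no baseline haplotype has to be chosen; and because an L1 penalty can shrink differences exactly to zero, haplotypes with similar effects fall into the same group.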
MarkerQC: A Quality Control Algorithm for Filtering SNPs in Genomewide Association Studies.
Pongpanich, Sullivan, and Tzeng 2010. Submitted
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "Marker QC filter" before download.
MarkerQC implements an algorithm based on principal component analysis and
clustering analysis to identify genotyping outliers. The method minimizes
reliance on arbitrary cutoff values, allows collective
consideration of all QC features, and provides conditional thresholds
contingent on the values of other QC variables (such as different
missing-proportion thresholds for different minor allele frequencies).
Hap-clustering: R code for Evolutionary-based Haplotype Clustering.
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "Hap clustering codes" before download.
Hap-Clustering implements a regression-based approach using
clustered haplotypes to assess haplotype-phenotype association. It generalizes
the probabilistic clustering methods of Tzeng to the generalized linear model
(GLM) framework established by Schaid et al. (2002). Hap-clustering uses
unphased genotypes and accounts for both phase uncertainty and clustering
uncertainty when performing association tests. Its GLM framework allows
adjustment for covariates and can model qualitative and quantitative traits. It
is best used to evaluate the overall haplotype association.
QSHS: R Code for Quadratic Statistics of Haplotype Similarity (QSHS).
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "QSHS codes" before download.
QSHS implements a class of association tests based on
haplotype similarity. Specifically, many measures of haplotype similarity
can be expressed in the same quadratic form, and we give the general form
of the variance. These methods can be applied to either phase-known or
phase-unknown data.
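The shared quadratic form can be sketched as p'Ap, where p is a vector of haplotype frequencies and A is a matrix of pairwise haplotype similarities. The code below is an illustration under that assumption and not the QSHS implementation: the function names, the two example similarity measures, and the toy frequencies are hypothetical, and the variance formula is not reproduced.

```python
def quadratic_similarity(freqs, sim):
    """Quadratic form sum over k, l of freqs[k] * sim[k][l] * freqs[l]."""
    n = len(freqs)
    return sum(freqs[k] * sim[k][l] * freqs[l] for k in range(n) for l in range(n))


def matching_matrix(haps):
    """Similarity 1 if two haplotypes are identical, else 0."""
    return [[1.0 if a == b else 0.0 for b in haps] for a in haps]


def allele_sharing_matrix(haps):
    """Similarity = proportion of marker positions at which two haplotypes agree."""
    return [
        [sum(x == y for x, y in zip(a, b)) / len(a) for b in haps]
        for a in haps
    ]


# Hypothetical haplotypes (two markers each) and frequencies.
haps = ["00", "01", "11"]
freqs = [0.5, 0.25, 0.25]
print(quadratic_similarity(freqs, matching_matrix(haps)))
print(quadratic_similarity(freqs, allele_sharing_matrix(haps)))
```

Swapping the similarity matrix changes the measure while the statistic keeps the same quadratic form, which is what allows a single general treatment of different similarity measures.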