Jung-Ying Tzeng
Associate Professor

Software

SIMreg: gene-trait similarity regression for marker-set association analysis (SIMreg includes HSreg as part of the package)
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "SIMreg codes" before download.
SIMreg is a tool for marker-set association analysis. Association analysis at the gene, pathway, or exon level (hereafter, marker-set analysis) holds great promise for evaluating modest etiological effects of genes with GWAS or sequence data. However, currently available methods target detection of either rare or common variants but not both, assume additive and same-direction effects for loci within a marker set, use test-based frameworks that cannot accommodate covariates such as population structure, and do not have the capacity to assess interaction effects. SIMreg provides a flexible, powerful, and computationally efficient alternative for conducting marker-set analysis. It has the following features that distinguish it from other methods.
  1. The method uses genetic similarity to aggregate information across markers, and incorporates adaptive weights depending on allele frequencies to accommodate rare and common variants.
  2. Collapsing information at the similarity level instead of the genotype level avoids the cancellation of signals from opposite etiological effects, and is applicable to any class of genetic variant without having to dichotomize the allele types.
  3. It is regression-based, naturally incorporates covariates, and is applicable to both observed and imputed (dosage) genotypes.
  4. We use a rigorous analytical derivation to demonstrate that collapsing information through similarity status explicitly captures the locus-locus interactions among all markers in a set.
  5. It provides a series of test statistics that can be used to assess (a) marginal genetic main effect (G test), (b) gene-environment interaction effects (GxE test), or (c) the joint effects of both types simultaneously. These tests do not require permutations to assess significance, and are fast to compute.
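The similarity-based aggregation described in feature 1 can be illustrated with a minimal sketch. It is not the SIMreg code: the function names, the identity-by-state (IBS) similarity measure, and the inverse-square-root frequency weights are all assumptions chosen for illustration. Genotypes are assumed to be coded as counts (or dosages) of one allele per marker, and every marker is assumed to be polymorphic in the sample.

```python
from math import sqrt


def allele_frequencies(genotypes):
    """Estimate the frequency of the counted allele at each marker."""
    n = len(genotypes)
    n_markers = len(genotypes[0])
    return [sum(row[m] for row in genotypes) / (2.0 * n) for m in range(n_markers)]


def weighted_ibs_similarity(genotypes, weights):
    """Pairwise similarity: weighted alleles shared IBS, scaled to [0, 1]."""
    n = len(genotypes)
    total = 2.0 * sum(weights)  # maximum attainable weighted sharing
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(
                w * (2.0 - abs(a - b))  # IBS count at one marker, from 0 to 2
                for w, a, b in zip(weights, genotypes[i], genotypes[j])
            )
            sim[i][j] = shared / total
    return sim


# Toy data: 4 individuals (rows) by 3 markers (columns), coded 0/1/2.
genotypes = [
    [0, 1, 2],
    [0, 1, 0],
    [1, 0, 2],
    [0, 2, 2],
]
freqs = allele_frequencies(genotypes)
# Hypothetical weighting: rarer alleles receive larger weights.
# Assumes no marker has frequency 0 (division by zero otherwise).
weights = [1.0 / sqrt(q) for q in freqs]
similarity = weighted_ibs_similarity(genotypes, weights)
```

Because the absolute difference is defined for fractional values, the same function accepts imputed dosages in place of integer genotype counts.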
SIMreg is an extension of (incorporates all features and functions of) HSreg.
HSreg: Haplotype Similarity Regression for Multi-marker Association Analysis.
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "HSreg codes" before download.
HSreg implements a similarity-based regression method to detect associations between traits and multimarker genotypes. The method uses genetic/haplotype similarity to aggregate information from multiple polymorphic sites (e.g., SNPs or a mixture of different polymorphisms), and regresses trait similarities for pairs of "unrelated" individuals on their genetic similarities to assess the gene-trait association. The similarity regression allows for covariates, uses phase-independent similarity measures to bypass the need to impute phase information, and is applicable to traits of general types (e.g., quantitative and qualitative traits). It can be shown that the similarity model is equivalent to the random effects haplotype analysis and explicitly models the non-additive effects among markers. These features make it an ideal tool for evaluating association between phenotype and marker sets defined by haplotypes, genes, or pathways.
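The idea of regressing pairwise trait similarity on pairwise genetic similarity can be sketched as follows. This is an illustrative simplification, not the HSreg implementation: it assumes trait similarity is the product of mean-centered trait values, fits a no-intercept least-squares slope, and omits covariates and the significance test. All function names are hypothetical.

```python
def trait_similarity(traits):
    """Cross-products of mean-centered traits for all pairs i < j."""
    mean = sum(traits) / len(traits)
    centered = [y - mean for y in traits]
    n = len(traits)
    return {
        (i, j): centered[i] * centered[j]
        for i in range(n)
        for j in range(i + 1, n)
    }


def similarity_regression_slope(traits, genetic_similarity):
    """No-intercept least-squares slope of trait on genetic similarity."""
    z = trait_similarity(traits)
    numerator = sum(genetic_similarity[i][j] * zij for (i, j), zij in z.items())
    denominator = sum(genetic_similarity[i][j] ** 2 for (i, j) in z)
    return numerator / denominator


# Toy example: individuals 0 and 1 are genetically alike, as are 2 and 3,
# and the trait values follow the same grouping.
traits = [1.0, 1.2, 3.0, 3.1]
genetic_similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
slope = similarity_regression_slope(traits, genetic_similarity)
```

A positive slope indicates that genetically similar pairs tend to have similar trait values, which is the pattern an association would produce.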
PLhap: Penalized-Likelihood Regression for Haplotype Specific Analysis.
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "PLhap codes" before download.
PLhap implements a penalized regression approach to systematically evaluate the pattern and structure of haplotype effects. The method takes unphased genotype data and outputs the haplotype group structure based on effect size. PLhap differs from typical haplotype analysis, in which inference focuses on effects relative to an arbitrarily chosen baseline haplotype. The typical analysis does not depict the effect structure unless an additional inference procedure is used in a secondary post hoc analysis, and such analysis tends to lack power. By putting an L1 penalty on the pairwise differences of the haplotype effects, PLhap avoids the need to choose a baseline haplotype, and simultaneously carries out effect estimation and effect comparison of all haplotypes. It can serve as a tool to characterize candidate regions identified from a genome or chromosomal scan.
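The L1 penalty on pairwise differences of haplotype effects can be made concrete with a small sketch of the objective function. This is not the PLhap code: it assumes a squared-error loss, treats the haplotype counts per individual as known (the actual method works from unphased genotypes), and uses hypothetical function names and a hypothetical tuning parameter `lam`.

```python
from itertools import combinations


def pairwise_l1_penalty(effects):
    """Sum of absolute differences over all pairs of haplotype effects."""
    return sum(abs(a - b) for a, b in combinations(effects, 2))


def penalized_objective(effects, hap_counts, traits, lam):
    """Half the residual sum of squares plus lam times the pairwise penalty."""
    rss = 0.0
    for row, y in zip(hap_counts, traits):
        fitted = sum(count * effect for count, effect in zip(row, effects))
        rss += (y - fitted) ** 2
    return 0.5 * rss + lam * pairwise_l1_penalty(effects)


# Toy data: each row gives one individual's copies of haplotypes h1, h2, h3.
hap_counts = [
    [2, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 2],
]
traits = [2.0, 2.0, 3.0, 4.0]

# h1 and h2 share the same effect, so only pairs involving h3 are penalized.
grouped_effects = [1.0, 1.0, 2.0]
objective = penalized_objective(grouped_effects, hap_counts, traits, lam=0.5)
```

Every haplotype has its own coefficient and none is dropped as a reference, so no baseline has to be chosen. Pairs with equal effects contribute nothing to the penalty, which is what encourages haplotypes to fuse into groups as `lam` grows.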
MarkerQC: A Quality Control Algorithm for Filtering SNPs in Genomewide Association Studies.
Pongpanich, Sullivan, and Tzeng 2010. Submitted
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "Marker QC filter" before download.
MarkerQC implements an algorithm based on principal component analysis and clustering analysis to identify genotyping outliers. The method minimizes reliance on arbitrary cutoff values, allows a collective consideration of all QC features, and provides conditional thresholds contingent on the values of other QC variables (such as different missing-proportion thresholds for different minor allele frequencies).
Hap-clustering: R code for Evolutionary-based Haplotype Clustering.
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "Hap clustering codes" before download.
Hap-clustering implements a regression-based approach using clustered haplotypes to assess haplotype-phenotype association. It generalizes the probabilistic clustering methods of Tzeng to the generalized linear model (GLM) framework established by Schaid et al. (2002). Hap-clustering uses unphased genotypes and accounts for both phase uncertainty and clustering uncertainty when performing association tests. Its GLM framework allows adjustment for covariates and can model qualitative and quantitative traits. It is best used to evaluate the overall haplotype association.
QSHS: R Code for Quadratic Statistics of Haplotype Similarity (QSHS).
To facilitate updating, please send tzeng@stat.ncsu.edu a blank email with subject "QSHS codes" before download.
QSHS implements a class of association tests based on haplotype similarity. Specifically, many measures of haplotype similarity can be expressed in the same quadratic form, and we give the general form of the variance. These methods can be applied to either phase-known or phase-unknown data.
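The quadratic form mentioned above can be illustrated with a short sketch. It is not the QSHS code: it assumes phase-known haplotypes, uses the proportion of matching alleles as one example of a similarity measure, and does not compute the variance or the test statistic. Function names are hypothetical. The quantity computed is p'Ap, where p is a vector of haplotype frequencies and A is the haplotype similarity matrix, which equals the expected similarity of two haplotypes drawn at random from the group.

```python
def matching_similarity(hap_a, hap_b):
    """Proportion of loci at which two haplotypes carry the same allele."""
    matches = sum(1 for a, b in zip(hap_a, hap_b) if a == b)
    return matches / len(hap_a)


def similarity_matrix(haplotypes):
    """Similarity matrix A over the distinct haplotypes."""
    return [[matching_similarity(h, g) for g in haplotypes] for h in haplotypes]


def quadratic_form(freqs, sim):
    """Compute p'Ap: the expected similarity of two random haplotypes."""
    k = len(freqs)
    return sum(freqs[i] * sim[i][j] * freqs[j] for i in range(k) for j in range(k))


# Toy data: three distinct two-locus haplotypes, alleles coded '0' and '1'.
haplotypes = ["00", "01", "11"]
sim = similarity_matrix(haplotypes)

# Hypothetical haplotype frequencies in two groups.
freqs_group_a = [1.0, 0.0, 0.0]  # every haplotype identical
freqs_group_b = [0.5, 0.0, 0.5]  # two dissimilar haplotypes in equal share

sharing_a = quadratic_form(freqs_group_a, sim)
sharing_b = quadratic_form(freqs_group_b, sim)
```

Swapping in a different similarity measure changes only the entries of the matrix, not the form of the statistic, which is why a single general variance expression can cover the whole class.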