Why PolyGenius
Why PGS workflows matter
Polygenic scores (PGS) are now used well beyond prediction benchmarks: to study disease biology, stratify risk, and prioritize downstream analyses. As GWAS grow in scale, the bottleneck is less obtaining a score than running complete, reproducible score-to-inference workflows across real cohorts.
Where workflows usually fail
In practice, teams still spend most of their effort on glue code: harmonizing summary statistics from different sources, matching genome builds and alleles, handling large genotype formats, and keeping analysis settings traceable across cohorts. That patchwork is fragile and hard to audit, especially when analyses must run in separate secure environments and only summary results can be shared.
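To make the "glue code" concrete, here is a minimal base-R sketch of one such step: aligning effect alleles between a summary-statistics table and a cohort variant table. It is not PolyGenius code. The function `harmonize`, the helper `complement`, the column names, and the toy data are all hypothetical, and the sketch assumes single-nucleotide, biallelic variants already matched by identifier and genome build.

```r
# Toy inputs (hypothetical): GWAS summary statistics and cohort variant alleles.
sumstats <- data.frame(
  snp           = c("rs1", "rs2", "rs3", "rs4"),
  effect_allele = c("A", "C", "G", "A"),
  other_allele  = c("G", "T", "A", "T"),
  beta          = c(0.10, -0.20, 0.05, 0.30),
  stringsAsFactors = FALSE
)
cohort <- data.frame(
  snp = c("rs1", "rs2", "rs3", "rs4"),
  alt = c("A", "T", "C", "A"),
  ref = c("G", "C", "T", "T"),
  stringsAsFactors = FALSE
)

# Complement a single-nucleotide allele (strand flip). SNVs only: no reversal is done.
complement <- function(x) chartr("ACGT", "TGCA", x)

# Classify each variant and orient beta to the cohort's alt allele.
harmonize <- function(sumstats, cohort) {
  m  <- merge(sumstats, cohort, by = "snp")
  ea <- m$effect_allele
  oa <- m$other_allele

  same         <- ea == m$alt & oa == m$ref
  swapped      <- ea == m$ref & oa == m$alt
  flip_same    <- complement(ea) == m$alt & complement(oa) == m$ref
  flip_swapped <- complement(ea) == m$ref & complement(oa) == m$alt
  # A/T and C/G pairs look identical on both strands, so they cannot be resolved here.
  ambiguous    <- complement(ea) == oa

  m$status <- ifelse(ambiguous, "ambiguous",
              ifelse(same | flip_same, "match",
              ifelse(swapped | flip_swapped, "swapped", "mismatch")))

  # Keep beta for matches, negate it for swapped alleles, drop everything else.
  m$beta_aligned <- ifelse(m$status == "match", m$beta,
                    ifelse(m$status == "swapped", -m$beta, NA_real_))
  m
}

result <- harmonize(sumstats, cohort)
print(result[, c("snp", "status", "beta_aligned")])
```

Even this small case needs explicit decisions about strand flips and palindromic variants. Those are exactly the choices that become hard to trace when they are re-implemented per cohort.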
What PolyGenius changes
PolyGenius keeps the analysis declarative: users specify what they want (generate, compute, evaluate, associate, visualize) while the framework handles how steps are resolved and reused. The goal is not to replace state-of-the-art methods, but to make robust end-to-end polygenic workflows easier to run, easier to reproduce, and easier to extend.
Workflow snapshot
The core flow is:
generate: build one or more PolyGeniusModel objects from GWAS inputs
compute: apply models to cohort genotypes and derive score/population-structure layers
evaluate: compare predictive behavior across model candidates
associate: run inferential models against phenotypes
visualize: produce analysis-ready summaries and figures
Install and continue
remotes::install_github("holstegelab/PolyGenius")
library(PolyGenius)

Continue with Chapter 2 for the minimal concepts, then Chapter 3 for the first full run.