Why PolyGenius

Why PGS workflows matter

Polygenic scores (PGS) now serve purposes well beyond prediction benchmarks: they are used to study disease biology, stratify individuals by risk, and prioritize downstream analyses. As GWAS grow in size, the bottleneck has shifted from obtaining a score to running complete, reproducible score-to-inference workflows across real cohorts.

Where workflows usually fail

In practice, teams still spend most of their effort on glue code: harmonizing summary statistics from different sources, reconciling genome builds and allele coding, handling large genotype formats, and keeping analysis settings traceable across cohorts. This patchwork setup is fragile and hard to audit, especially when analyses must run in separate secure environments where only summary results can be shared.
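To make the "glue code" concrete, here is a minimal base-R sketch of one such step: aligning GWAS effect alleles to a cohort's allele coding. It illustrates the manual work described above and is not PolyGenius code; the helper `harmonize_alleles()` and the column names (`snp`, `a1` as the effect allele, `a2`, `beta`) are hypothetical. It assumes single-nucleotide, upper-case alleles, flips the sign of `beta` when the allele order is swapped (on either strand), and drops variants whose alleles do not match as well as strand-ambiguous A/T and C/G variants.

```r
# Hypothetical helper (not PolyGenius API): align GWAS effect alleles to the
# allele coding used in a cohort. Assumes columns snp, a1 (effect allele),
# a2 and beta in `gwas`, and snp, a1, a2 in `cohort`.
complement <- function(x) chartr("ACGT", "TGCA", x)

harmonize_alleles <- function(gwas, cohort) {
  # Inner join on variant ID: keeps only variants present in both tables
  m <- merge(gwas, cohort, by = "snp", suffixes = c(".gwas", ".cohort"))

  # A/T and C/G variants are strand-ambiguous, so they are excluded
  palindromic <- m$a1.gwas == complement(m$a2.gwas)

  # Same strand: identical or swapped allele order
  same    <- m$a1.gwas == m$a1.cohort & m$a2.gwas == m$a2.cohort
  swapped <- m$a1.gwas == m$a2.cohort & m$a2.gwas == m$a1.cohort

  # Opposite strand: alleles match after taking the complement
  a1c <- complement(m$a1.gwas)
  a2c <- complement(m$a2.gwas)
  same_f    <- a1c == m$a1.cohort & a2c == m$a2.cohort
  swapped_f <- a1c == m$a2.cohort & a2c == m$a1.cohort

  keep <- !palindromic & (same | swapped | same_f | swapped_f)

  # Express beta relative to the cohort's a1 allele
  m$beta <- ifelse(swapped | swapped_f, -m$beta, m$beta)

  out <- m[keep, c("snp", "a1.cohort", "a2.cohort", "beta")]
  names(out) <- c("snp", "effect_allele", "other_allele", "beta")
  rownames(out) <- NULL
  out
}

# Toy example: rs2 is swapped, rs3 is strand-flipped, rs4 is palindromic,
# rs5 is both strand-flipped and swapped
gwas <- data.frame(
  snp  = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  a1   = c("A", "G", "C", "A", "A"),
  a2   = c("G", "A", "T", "T", "C"),
  beta = c(0.1, 0.2, -0.3, 0.05, 0.4),
  stringsAsFactors = FALSE
)
cohort <- data.frame(
  snp = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  a1  = c("A", "A", "G", "A", "G"),
  a2  = c("G", "G", "A", "T", "T"),
  stringsAsFactors = FALSE
)
print(harmonize_alleles(gwas, cohort))
```

Even this toy step hides decisions (dropping palindromic variants rather than resolving them by allele frequency, ignoring genome build) that have to be made, and recorded, consistently across every cohort.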

What PolyGenius changes

PolyGenius keeps the analysis declarative: users specify what they want (generate, compute, evaluate, associate, visualize), while the framework handles how steps are resolved and reused. The goal is not to replace state-of-the-art PGS methods but to make robust end-to-end polygenic workflows easier to run, reproduce, and extend.

Workflow snapshot

The core workflow has five steps:

  • generate: build one or more PolyGeniusModel objects from GWAS inputs
  • compute: apply models to cohort genotypes and derive score and population-structure layers
  • evaluate: compare predictive performance across candidate models
  • associate: fit inferential models relating scores to phenotypes
  • visualize: produce analysis-ready summaries and figures

Install and continue

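# Install remotes first if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# Install PolyGenius from GitHub and load it
remotes::install_github("holstegelab/PolyGenius")
library(PolyGenius)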

Continue with Chapter 2 for the core concepts, then Chapter 3 for a first full run.