eur <- dataRetriever$referencePanels$get(name = "EUR")
referencePanels$add(
  name = eur$name,
  description = eur$description,
  genotype.info = eur,
  overwrite = TRUE
)
Algorithms
Algorithm declarations tell PolyGenius which PRS construction methods you want to run. They stay separate from GWAS source declarations, so one set of algorithms can be crossed with many GWAS sources in the same generate$models(...) call.
At the moment, PolyGenius supports four built-in algorithm declarations:
- ClumpingThresholding
- LDpred2
- lassosum2
- COJO (currently a stub implementation)
How declarations map to model generation
Each generate$algorithms$...() call produces one or more generate.algorithm resource specs. generate$models() combines those specs with the requested GWAS sources and asks the execution engine to resolve the corresponding polygenius.model outputs.
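As a rough illustration of that crossing step, the base-R sketch below pairs every algorithm spec with every GWAS source. It does not use PolyGenius objects: the labels, the `model_id` format, and the use of `expand.grid()` are assumptions made for the example, not a description of how the execution engine works internally.

```r
# Hypothetical stand-ins for algorithm specs and GWAS sources.
# These are plain labels, not PolyGenius objects.
algorithm_specs <- c("CT_p1e-4", "CT_p5e-8", "LDpred2_auto")
gwas_sources <- c("height_local", "bmi_local")

# One requested model per (source, algorithm spec) pair.
requested <- expand.grid(
  algorithm = algorithm_specs,
  source = gwas_sources,
  stringsAsFactors = FALSE
)

# Illustrative identifier only; the real naming scheme may differ.
requested$model_id <- paste(requested$source, requested$algorithm, sep = "::")

nrow(requested)  # 3 specs x 2 sources = 6 requested models
```

The point of the sketch is the multiplication: adding one more GWAS source adds one model per declared spec, with no change to the algorithm declarations.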
Reference panels for LD-based algorithms
ClumpingThresholding, LDpred2, and lassosum2 all depend on a registered reference panel.
Once the panel is registered, you refer to it by name, for example reference.panel = "EUR".
ClumpingThresholding
ClumpingThresholding is the sparse baseline method: it assumes the model can be represented by near-independent lead variants chosen after LD clumping and then filtered by a p-value threshold.
Declare it
alg_ct <- generate$algorithms$ClumpingThresholding(
  pval = c(1e-4, 1e-6, 5e-8),
  reference.panel = "EUR",
  clump.r2 = 0.1,
  clump.kb = 250,
  clump.p1 = 1e-4,
  eaf.threshold = 0
)
Practical notes
- Multiple pval values create multiple model specs in one call.
- PolyGenius clumps once at the loosest requested threshold and derives stricter thresholds by filtering that clumped result.
- If you use eaf.threshold > 0, the GWAS source needs an eaf column.
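The "clump once, filter many" idea in these notes can be sketched in base R. The table below is invented toy data standing in for the lead variants that survive clumping at the loosest threshold. The column names `rsid` and `p` and the inclusive `<=` comparison are assumptions for illustration, not confirmed PolyGenius behaviour.

```r
# Toy clumped result at the loosest requested threshold (1e-4).
# All values are made up for illustration.
clumped <- data.frame(
  rsid = c("rs1", "rs2", "rs3", "rs4"),
  p    = c(3e-5, 2e-7, 1e-9, 4e-8),
  eaf  = c(0.12, 0.40, 0.03, 0.25),
  stringsAsFactors = FALSE
)

pval_thresholds <- c(1e-4, 1e-6, 5e-8)

# Stricter models are subsets of the single clumped result,
# so clumping does not need to be repeated per threshold.
model_variants <- lapply(
  pval_thresholds,
  function(thr) clumped[clumped$p <= thr, , drop = FALSE]
)

# Number of variants kept at each threshold.
variant_counts <- sapply(model_variants, nrow)

# A non-zero EAF threshold only makes sense if the column exists.
eaf_threshold <- 0.05
if (eaf_threshold > 0) stopifnot("eaf" %in% names(clumped))
```

With this toy table, `variant_counts` is 4, 3, and 2 for the three thresholds: each stricter model keeps a subset of the variants from the looser one.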
LDpred2
LDpred2 is the LD-aware shrinkage option. It is appropriate when you want a model that can distribute signal across many correlated variants rather than only retaining clumped lead SNPs.
What it assumes in practice
PolyGenius needs:
- GWAS effect sizes and standard errors;
- an effective sample size;
- a reference panel that is a reasonable LD match for the GWAS ancestry/build.
For local GWAS sources, that usually means providing se plus n, or at least metadata$n_eff / metadata$sample_size.
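Before declaring LDpred2 it can help to check a local GWAS table against these requirements. The sketch below is a hypothetical pre-flight check in base R, not part of PolyGenius. The column name `beta` and the metadata fields `n_case` and `n_control` are assumptions. The effective sample size formula for a case-control trait, 4 / (1/n_case + 1/n_control), is the commonly used one; whether PolyGenius derives it for you is not stated here.

```r
# Toy local GWAS table; values are invented.
gwas <- data.frame(
  rsid = c("rs1", "rs2", "rs3"),
  beta = c(0.021, -0.015, 0.008),
  se   = c(0.004, 0.005, 0.004),
  stringsAsFactors = FALSE
)

# Hypothetical metadata for a case-control study.
metadata <- list(n_case = 5000, n_control = 15000)

# Effective sample size for a binary trait.
n_eff <- 4 / (1 / metadata$n_case + 1 / metadata$n_control)

# Use n_eff as the per-variant n when the table has no n column.
if (!"n" %in% names(gwas)) gwas$n <- n_eff

# Fail early if anything LD-aware methods need is still missing.
missing_cols <- setdiff(c("beta", "se", "n"), names(gwas))
stopifnot(length(missing_cols) == 0)
```

For a quantitative trait the total sample size plays the same role, so the `n_eff` step would be replaced by the study's reported n.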
Install prerequisites
install.packages(c("bigsnpr", "bigstatsr"))
Declare it
alg_ldpred2 <- generate$algorithms$LDpred2(
  reference.panel = "EUR",
  mode = "auto",
  pval = 0.01,
  ld.size = 3000,
  ld.thr = 0.002,
  ncores = 4
)
Important parameters to start with are reference.panel, mode, pval, ld.size, ld.thr, and ncores. More advanced tuning options such as h2.est, p.causal, use.MLE, alpha, allow.jump.sign, and shrink.corr are exposed in the reference page.
lassosum2
lassosum2 is another LD-aware whole-genome method. Conceptually it assumes that many variants may contribute, but it uses a penalized regression/shrinkage formulation instead of the LDpred2 family.
What it assumes in practice
The same data requirements as LDpred2 apply here:
- GWAS effects plus standard errors;
- an effective sample size;
- a matching reference panel for LD.
Install prerequisites
install.packages(c("bigsnpr", "bigstatsr"))
Declare it
alg_lassosum2 <- generate$algorithms$lassosum2(
  reference.panel = "EUR",
  pval = 0.01,
  delta = c(0.001, 0.01, 0.1),
  lambda = NULL,
  ncores = 4
)
The main declaration variables are reference.panel, pval, delta, lambda, and ncores.
COJO
Conceptually, COJO targets conditionally independent effects estimated from summary statistics plus LD information. In the current PolyGenius codebase, however, the built-in COJO support is still a stub.
Current implementation status
- generate$algorithms$COJO(...) is supported as a declaration.
- The current RunCojoRule does not call an external COJO backend yet.
- The stub returns the GWAS variants unchanged, after the requested pval filter has been applied upstream.
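To make the consequence of the stub concrete, here is a small base-R sketch of "filter by pval, then pass through unchanged". The helpers `filter_by_pval()` and `cojo_stub()` are hypothetical names written for this example. They are not PolyGenius functions and only mimic the behaviour described above.

```r
# Hypothetical helpers; neither is part of PolyGenius.
filter_by_pval <- function(gwas, pval) gwas[gwas$p <= pval, , drop = FALSE]
cojo_stub <- function(variants) variants  # identity: no conditional analysis

# Toy GWAS table; values are invented.
gwas <- data.frame(
  rsid = c("rs1", "rs2", "rs3"),
  beta = c(0.03, 0.05, -0.04),
  p    = c(2e-6, 1e-9, 3e-8),
  stringsAsFactors = FALSE
)

# Upstream p-value filter, then the pass-through stub.
selected <- cojo_stub(filter_by_pval(gwas, pval = 5e-8))
selected$rsid  # "rs2" "rs3"
```

Because nothing is conditioned on LD, two significant variants in strong LD with each other would both be kept, and their effect sizes are the marginal GWAS estimates rather than jointly fitted ones.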
Declare it
alg_cojo <- generate$algorithms$COJO(
  pval = 5e-8
)
Run several algorithms in one call
models <- generate$models(
  sources = src_local,
  algorithms = list(
    alg_ct,
    alg_ldpred2,
    alg_lassosum2,
    alg_cojo
  )
)
models
Reference pages
For full parameter documentation, see:
- generate$algorithm$ClumpingThresholding
- generate$algorithm$LDpred2
- generate$algorithm$lassosum2
- generate$algorithm$COJO
Engine internals for panel conversion, LD-matrix reuse, matching, and scheduling are covered in Chapter 17.
Extending algorithms
PolyGenius is intentionally open-ended here: algorithms are resolved through rules, so new methods can be added without changing the high-level generate$models() interface. For extension patterns, including how an algorithm can depend on intermediate resources produced by other rules, see Chapter 18.