Algorithms

Algorithm declarations tell PolyGenius which PRS construction methods you want to run. They stay separate from GWAS source declarations, so one set of algorithms can be crossed with many GWAS sources in the same generate$models(...) call.

At the moment, PolyGenius supports four built-in algorithm declarations:

  • ClumpingThresholding
  • LDpred2
  • lassosum2
  • COJO (currently a stub implementation)

How declarations map to model generation

Each generate$algorithms$...() call produces one or more generate.algorithm resource specs. generate$models() combines those specs with the requested GWAS sources and asks the execution engine to resolve the corresponding polygenius.model outputs.
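
One way to picture the crossing is as a Cartesian product of algorithm specs and GWAS sources. The following is a minimal base-R sketch of that idea only; the spec labels and source names are hypothetical, and it does not reflect how PolyGenius stores or resolves specs internally.

```r
# Hypothetical labels standing in for algorithm specs and GWAS sources;
# this is not how PolyGenius represents them internally.
algorithm_specs <- c("CT_p1e-4", "CT_p5e-8", "LDpred2_auto")
gwas_sources <- c("height_gwas", "bmi_gwas")

# Cross every algorithm spec with every GWAS source.
requested_models <- expand.grid(
  algorithm = algorithm_specs,
  source = gwas_sources,
  stringsAsFactors = FALSE
)

print(requested_models)
nrow(requested_models)  # 3 specs x 2 sources = 6 requested models
```

Under this picture, adding one more GWAS source adds one requested model per algorithm spec, which is why the two kinds of declaration are kept separate.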

Reference panels for LD-based algorithms

ClumpingThresholding, LDpred2, and lassosum2 all depend on a registered reference panel.

eur <- dataRetriever$referencePanels$get(name = "EUR")

referencePanels$add(
  name = eur$name,
  description = eur$description,
  genotype.info = eur,
  overwrite = TRUE
)

Once the panel is registered, you refer to it by name, for example reference.panel = "EUR".

ClumpingThresholding

ClumpingThresholding is the sparse baseline method. It assumes the model can be represented by near-independent lead variants, which are selected by LD clumping and then filtered by a p-value threshold.

Declare it

alg_ct <- generate$algorithms$ClumpingThresholding(
  pval = c(1e-4, 1e-6, 5e-8),
  reference.panel = "EUR",
  clump.r2 = 0.1,
  clump.kb = 250,
  clump.p1 = 1e-4,
  eaf.threshold = 0
)

Practical notes

  • Multiple pval values create multiple model specs in one call.
  • PolyGenius clumps once at the loosest requested threshold and derives stricter thresholds by filtering that clumped result.
  • If you use eaf.threshold > 0, the GWAS source needs an eaf column.
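
The "clump once, filter many times" behaviour described in these notes can be sketched in base R as follows. This is an illustration under stated assumptions, not the PolyGenius implementation: the toy table, its column names (rsid, p), and the use of <= for the comparison are all assumptions.

```r
# Toy stand-in for the lead variants left after clumping at the loosest
# threshold; the column names (rsid, p) are assumptions for illustration.
pval_thresholds <- c(1e-4, 1e-6, 5e-8)
loosest <- max(pval_thresholds)

clumped <- data.frame(
  rsid = c("rs1", "rs2", "rs3", "rs4"),
  p = c(3e-5, 2e-7, 1e-9, 4e-8),
  stringsAsFactors = FALSE
)
# Everything in the clumped set already passes the loosest threshold.
stopifnot(all(clumped$p <= loosest))

# Derive one variant set per threshold by filtering, without re-clumping.
variant_sets <- lapply(pval_thresholds, function(thr) {
  clumped[clumped$p <= thr, , drop = FALSE]
})
names(variant_sets) <- format(pval_thresholds, scientific = TRUE)

# Number of retained variants per threshold.
sapply(variant_sets, nrow)
```

The expensive LD step runs once, and each additional pval value costs only a filter.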

LDpred2

LDpred2 is the LD-aware shrinkage option. It is appropriate when you want a model that can distribute signal across many correlated variants rather than only retaining clumped lead SNPs.

What it assumes in practice

PolyGenius needs:

  • GWAS effect sizes and standard errors;
  • an effective sample size;
  • a reference panel that is a reasonable LD match for the GWAS ancestry/build.

For local GWAS sources, that usually means providing se plus n, or at least one of metadata$n_eff or metadata$sample_size.
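
As a rough illustration of that fallback, the sketch below picks a sample size for a local GWAS table. The helper resolve_n_eff() is hypothetical, the beta column name is an assumption, and the precedence shown (n column, then n_eff, then sample_size) is a guess at a sensible order, not documented PolyGenius behaviour.

```r
# Hypothetical helper: pick an effective sample size for a local GWAS table.
# The precedence (n column, then n_eff, then sample_size) is an assumption.
resolve_n_eff <- function(gwas, metadata = list()) {
  if (!"se" %in% names(gwas)) {
    stop("GWAS table needs an 'se' column")
  }
  if ("n" %in% names(gwas)) {
    return(gwas$n)
  }
  if (!is.null(metadata$n_eff)) {
    return(rep(metadata$n_eff, nrow(gwas)))
  }
  if (!is.null(metadata$sample_size)) {
    return(rep(metadata$sample_size, nrow(gwas)))
  }
  stop("No sample size found: supply an 'n' column or metadata")
}

# Toy table with effect sizes and standard errors but no n column.
gwas <- data.frame(beta = c(0.02, -0.01), se = c(0.005, 0.004))
resolve_n_eff(gwas, metadata = list(sample_size = 50000))
```

Running a check like this before declaring LDpred2 or lassosum2 surfaces missing inputs early, before any LD computation starts.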

Install prerequisites

install.packages(c("bigsnpr", "bigstatsr"))

Declare it

alg_ldpred2 <- generate$algorithms$LDpred2(
  reference.panel = "EUR",
  mode = "auto",
  pval = 0.01,
  ld.size = 3000,
  ld.thr = 0.002,
  ncores = 4
)

The parameters to start with are reference.panel, mode, pval, ld.size, ld.thr, and ncores. More advanced tuning options such as h2.est, p.causal, use.MLE, alpha, allow.jump.sign, and shrink.corr are documented on the reference page.

lassosum2

lassosum2 is another LD-aware whole-genome method. Like LDpred2, it assumes that many variants may contribute, but it uses a penalized-regression formulation rather than the Bayesian shrinkage model of the LDpred2 family.

What it assumes in practice

The same data requirements as LDpred2 apply here:

  • GWAS effects plus standard errors;
  • an effective sample size;
  • a matching reference panel for LD.

Install prerequisites

install.packages(c("bigsnpr", "bigstatsr"))

Declare it

alg_lassosum2 <- generate$algorithms$lassosum2(
  reference.panel = "EUR",
  pval = 0.01,
  delta = c(0.001, 0.01, 0.1),
  lambda = NULL,
  ncores = 4
)

The main declaration parameters are reference.panel, pval, delta, lambda, and ncores.

COJO

Conceptually, COJO targets conditionally independent effects estimated from summary statistics plus LD information. In the current PolyGenius codebase, however, the built-in COJO support is still a stub.

Current implementation status

  • generate$algorithms$COJO(...) is supported as a declaration.
  • The current RunCojoRule does not call an external COJO backend yet.
  • The stub returns the GWAS variants unchanged; the only filtering is the requested pval step, which is applied upstream before the stub runs.
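
In effect, the current behaviour amounts to a p-value filter followed by an identity step. The sketch below illustrates that with a hypothetical run_cojo_stub() function and a toy table; it is not the actual RunCojoRule code, and the column names and the <= comparison are assumptions.

```r
# Hypothetical stand-in for the stub: no COJO backend is called,
# so the input comes back untouched.
run_cojo_stub <- function(gwas) {
  gwas
}

gwas <- data.frame(
  rsid = c("rs1", "rs2", "rs3"),
  p = c(1e-9, 3e-8, 2e-5),
  stringsAsFactors = FALSE
)

# Upstream pval filter (here pval = 5e-8), applied before the stub runs.
filtered <- gwas[gwas$p <= 5e-8, , drop = FALSE]
result <- run_cojo_stub(filtered)

identical(result, filtered)
```

Because no conditional analysis happens, variants in LD with each other can all survive, so models from the stub should not be read as conditionally independent signals.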

Declare it

alg_cojo <- generate$algorithms$COJO(
  pval = 5e-8
)

Run several algorithms in one call

models <- generate$models(
  sources = src_local,
  algorithms = list(
    alg_ct,
    alg_ldpred2,
    alg_lassosum2,
    alg_cojo
  )
)

models

Reference pages

For full parameter documentation, see the reference page for each algorithm declaration.

Engine internals for panel conversion, LD-matrix reuse, matching, and scheduling are covered in Chapter 17.

Extending algorithms

PolyGenius is intentionally open-ended here: algorithms are resolved through rules, so new methods can be added without changing the high-level generate$models() interface. For extension patterns, including how an algorithm can depend on intermediate resources produced by other rules, see Chapter 18.
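
To make the idea of rule-based resolution concrete, here is a minimal base-R sketch of a name-to-function registry. Everything in it is hypothetical (register_rule(), resolve_model(), the "TopHits" method, and the toy table); PolyGenius's real rule classes and their dependency handling are not shown.

```r
# Hypothetical rule registry: maps an algorithm name to the function that
# builds its output. PolyGenius's real rule classes are not shown here.
rules <- new.env()

register_rule <- function(name, fn) {
  assign(name, fn, envir = rules)
  invisible(name)
}

resolve_model <- function(algorithm, gwas) {
  if (!exists(algorithm, envir = rules, inherits = FALSE)) {
    stop("No rule registered for algorithm: ", algorithm)
  }
  get(algorithm, envir = rules, inherits = FALSE)(gwas)
}

# Adding a method means registering one more rule; resolve_model() is unchanged.
register_rule("TopHits", function(gwas) {
  # Keep the (at most) two most significant variants.
  gwas[order(gwas$p)[seq_len(min(2, nrow(gwas)))], , drop = FALSE]
})

gwas <- data.frame(rsid = c("rs1", "rs2", "rs3"), p = c(0.2, 1e-6, 0.01))
resolve_model("TopHits", gwas)
```

The point of the pattern is that the caller-facing function stays fixed while the set of registered methods grows.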