Extending PolyGenius
Most PolyGenius extensions do not require changing the scheduler itself. The normal extension path is to add new resource declarations, new rules, or both, and let the existing execution engine discover and schedule them the same way it handles the built-in sources and algorithms.
Extension design principles
The existing engine is easiest to extend when you follow the same patterns as the built-in code:
- Keep cache identity in params, and keep runtime-only hints in .meta.
- Keep inputs(), bind(), and requirements() light; do heavy I/O in run().
- Reuse existing resource types when they already match the artifact you need.
- Normalize outputs into the same data shapes that downstream rules already expect.
The core extension building blocks are ResourceSpec, resources.factory, Rule, and ExecutionEngine.
Add a new GWAS source
In most cases, a new GWAS source does not need a new resource type. You usually keep the existing gwas.request -> gwas.sumstats pattern and add a new source label plus a rule that knows how to resolve it.
Step 1: add a user-facing source declaration
generate.sources.mySource <- function(ids, api.token = NULL) {
  ResourceSpecSet(lapply(ids, function(id) {
    resources.factory$gwas.request(
      source = "mySource",
      id = id,
      .meta = list(api.token = api.token)
    )
  }))
}
Step 2: add a rule that resolves that request into gwas.sumstats
FetchMySourceRule <- R6::R6Class(
  "FetchMySourceRule",
  inherit = Rule,
  public = list(
    initialize = function() {
      super$initialize("fetch.my.source")
    },
    # Claim only gwas.sumstats outputs labelled with this source.
    matches = function(output.spec) {
      identical(output.spec$type, resources.type$gwas.sumstats) &&
        identical(output.spec$params$source, "mySource")
    },
    # Depend on the matching gwas.request, passing runtime hints through .meta.
    inputs = function(output.spec) {
      list(
        request = resources.factory$gwas.request(
          source = "mySource",
          id = output.spec$params$id,
          .meta = output.spec$meta
        )
      )
    },
    requirements = function(output.spec, inputs) {
      list(cores = 1, memory = 2)
    },
    run = function(output.spec, inputs, logger) {
      request <- inputs$request$spec
      # fetch_somehow() and normalize_to_polygenius_columns() are placeholders
      # for your own download and column-mapping code.
      raw <- fetch_somehow(request$params$id, token = request$meta$api.token)
      normalized <- normalize_to_polygenius_columns(raw)
      list(
        data = PolyGeniusModel(
          variants = normalized,
          name = request$params$id,
          build = "GRCh38",
          gwas = list(id = request$params$id)
        ),
        meta = list(build = "GRCh38"),
        logs = list(log.entry(self$name, "completed"))
      )
    }
  )
)
Step 3: normalize the returned GWAS columns
At minimum, built-in generation expects:
- chr
- position
- ea
- nea
- beta
- pval
If the new source should support LDpred2 or lassosum2 cleanly, you should also provide:
- se
- n, or source metadata carrying n_eff / sample_size
If you want eaf.threshold support in ClumpingThresholding, also provide eaf.
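A helper such as the normalize_to_polygenius_columns() placeholder used in Step 2 might look like the sketch below. It is a minimal illustration in base R, not part of PolyGenius: the raw column names in the mapping (CHR, BP, A1, A2, BETA, P, SE, N, EAF) are assumed for the example and depend entirely on what your source returns.

```r
# Hypothetical helper: rename a source's raw columns to the names the
# built-in generation code expects, and fail early if any are missing.
normalize_to_polygenius_columns <- function(raw,
                                            mapping = c(CHR = "chr", BP = "position",
                                                        A1 = "ea", A2 = "nea",
                                                        BETA = "beta", P = "pval",
                                                        SE = "se", N = "n",
                                                        EAF = "eaf")) {
  required <- c("chr", "position", "ea", "nea", "beta", "pval")
  # Rename only the raw columns that are actually present.
  known <- intersect(names(raw), names(mapping))
  names(raw)[match(known, names(raw))] <- mapping[known]
  missing <- setdiff(required, names(raw))
  if (length(missing) > 0) {
    stop("missing required GWAS columns: ", paste(missing, collapse = ", "))
  }
  # Keep required columns first, then whichever optional ones exist.
  optional <- intersect(c("se", "n", "eaf"), names(raw))
  raw[, c(required, optional), drop = FALSE]
}

raw <- data.frame(CHR = c(1, 2), BP = c(1000, 2000), A1 = c("A", "C"),
                  A2 = c("G", "T"), BETA = c(0.1, -0.2), P = c(0.01, 0.5),
                  SE = c(0.05, 0.07))
normalized <- normalize_to_polygenius_columns(raw)
```

Failing inside the helper keeps a missing column from surfacing later as an obscure error in a downstream algorithm rule.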
Step 4: register the rule
Add the new rule to rules.sources so the core runtime can discover it.
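How rules.sources stores its rules is not shown in this chapter, so the following is only a toy model of registration and discovery. It assumes a registry is an ordered list of rule objects and that the runtime picks the first rule whose matches() returns TRUE. Plain lists of closures stand in for R6 rules, and toy.rule, toy.registry, find.rule, and the "builtinSource" label are hypothetical names.

```r
# Toy stand-in for a rule: a name plus a matches() predicate.
toy.rule <- function(name, type, source) {
  list(
    name = name,
    matches = function(output.spec) {
      identical(output.spec$type, type) &&
        identical(output.spec$params$source, source)
    }
  )
}

# Toy registry: an ordered list of rules (an assumption, not the real rules.sources).
toy.registry <- list(
  toy.rule("fetch.builtin", "gwas.sumstats", "builtinSource")
)

# "Registering" a new rule is then just appending it to the list.
toy.registry <- c(
  toy.registry,
  list(toy.rule("fetch.my.source", "gwas.sumstats", "mySource"))
)

# Discovery: return the first rule whose matches() accepts the spec.
find.rule <- function(registry, output.spec) {
  for (rule in registry) {
    if (rule$matches(output.spec)) return(rule)
  }
  stop("no rule can produce resource of type '", output.spec$type, "'")
}

spec <- list(type = "gwas.sumstats",
             params = list(source = "mySource", id = "study-1"))
found <- find.rule(toy.registry, spec)
```

Under this model, a matches() that is too broad would shadow rules registered after it, which is one reason to match on both the resource type and the source label.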
Add a new PRS algorithm
New algorithms usually keep the existing final output type, polygenius.model. That means you normally add:
- a user-facing algorithm declaration;
- one or more rules that resolve that algorithm’s output;
- optional new intermediate resources if the method needs them.
Step 1: add a user-facing algorithm declaration
generate.algorithm.MyMethod <- function(reference.panel, pval = 1, ncores = 1) {
  ResourceSpecSet(
    resources.factory$generate.algorithm(
      name = "MyMethod",
      reference.panel = reference.panel,
      pval = pval,
      ncores = ncores
    )
  )
}
Once this exists, generate$models() will automatically cross it with the requested GWAS sources and create the corresponding polygenius.model requests.
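The crossing step can be pictured as a Cartesian product of source declarations and algorithm declarations. The sketch below is a toy illustration in base R, not the generate$models() implementation: plain lists stand in for ResourceSpec objects, cross.requests is a hypothetical name, and the param names (gwas.source, gwas.id, algorithm, pval, ncores) follow the ones the algorithm rule in the next step reads.

```r
# Toy declarations: plain lists standing in for ResourceSpec objects.
sources <- list(
  list(source = "mySource", id = "study-1"),
  list(source = "mySource", id = "study-2")
)
algorithms <- list(
  list(name = "MyMethod", pval = 1, ncores = 1),
  list(name = "MyMethod", pval = 0.05, ncores = 1)
)

# Cross every source with every algorithm: one model request per pair.
cross.requests <- function(sources, algorithms) {
  pairs <- expand.grid(s = seq_along(sources), a = seq_along(algorithms))
  lapply(seq_len(nrow(pairs)), function(i) {
    src <- sources[[pairs$s[i]]]
    alg <- algorithms[[pairs$a[i]]]
    list(
      type = "polygenius.model",
      params = list(gwas.source = src$source, gwas.id = src$id,
                    algorithm = alg$name, pval = alg$pval,
                    ncores = alg$ncores)
    )
  })
}

requests <- cross.requests(sources, algorithms)
```

Two sources and two algorithm settings produce four model requests, each of which is then resolved independently by whichever rule matches it.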
Step 2: add a rule for the algorithm output
RunMyMethodRule <- R6::R6Class(
  "RunMyMethodRule",
  inherit = Rule,
  public = list(
    initialize = function() {
      super$initialize("run.my.method")
    },
    # Claim only polygenius.model outputs requested for this algorithm.
    matches = function(output.spec) {
      identical(output.spec$type, resources.type$polygenius.model) &&
        identical(output.spec$params$algorithm, "MyMethod")
    },
    # Depend on the GWAS summary statistics; the engine resolves them first.
    inputs = function(output.spec) {
      list(
        gwas = resources.factory$gwas.sumstats(
          source = output.spec$params$gwas.source,
          id = output.spec$params$gwas.id,
          .meta = list(pval.max = output.spec$params$pval)
        )
      )
    },
    requirements = function(output.spec, inputs) {
      # %||% is in base R from 4.4.0; older versions need rlang or a local definition.
      list(cores = output.spec$params$ncores %||% 1, memory = 8)
    },
    run = function(output.spec, inputs, logger) {
      gwas <- inputs$gwas$data
      # fit_my_method() is a placeholder for the algorithm itself.
      model.variants <- fit_my_method(gwas)
      list(
        data = PolyGeniusModel(
          variants = model.variants,
          name = gwas$name,
          build = gwas$build,
          gwas = gwas$gwas,
          generation = list(algorithm = "MyMethod")
        ),
        meta = list(build = gwas$build, algorithm = "MyMethod"),
        logs = list(log.entry(self$name, "completed"))
      )
    }
  )
)
Reusing other rules from an algorithm
An algorithm does not call other rules directly. Instead, it depends on the resources those rules know how to produce.
For example, an LD-based method can ask for:
- gwas.sumstats
- ld.matrix
- ld.matched
- reference.panel.bigsnp
- clumped.variants
That is exactly how the built-in LDpred2 and lassosum2 paths work: they request ld.matrix and ld.matched, and the engine resolves those resources through the existing support rules before the final algorithm rule runs.
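The idea that dependencies are resolved before the final rule runs can be illustrated with a toy recursive resolver. This is a simplified sketch, not the ExecutionEngine: it ignores scheduling and bind(), uses a bare in-memory cache, and the dependency graph in toy.rules is invented for illustration (the real inputs of each built-in rule are not shown in this chapter).

```r
# Toy rules keyed by the resource type they produce. Each declares the
# resource types it needs and a run() that receives the resolved inputs.
toy.rules <- list(
  gwas.sumstats = list(
    inputs = character(0),
    run = function(inputs) "sumstats"
  ),
  ld.matrix = list(
    inputs = "gwas.sumstats",
    run = function(inputs) paste0("ld(", inputs$gwas.sumstats, ")")
  ),
  polygenius.model = list(
    inputs = c("gwas.sumstats", "ld.matrix"),
    run = function(inputs) {
      paste0("model(", inputs$gwas.sumstats, ", ", inputs$ld.matrix, ")")
    }
  )
)

cache <- new.env()          # resolved resources, keyed by type
run.order <- character(0)   # records the order in which rules actually ran

resolve <- function(type) {
  # Reuse a resource that has already been produced.
  if (!is.null(cache[[type]])) return(cache[[type]])
  rule <- toy.rules[[type]]
  if (is.null(rule)) stop("no rule produces '", type, "'")
  # Resolve every declared input before running the rule itself.
  inputs <- lapply(setNames(rule$inputs, rule$inputs), resolve)
  result <- rule$run(inputs)
  cache[[type]] <- result
  run.order <<- c(run.order, type)
  result
}

model <- resolve("polygenius.model")
```

The algorithm rule never calls the LD rule; it only names ld.matrix as an input, and the resolver works out that gwas.sumstats and ld.matrix must be produced first.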
If your algorithm needs build-aware dependencies, use bind() so the exact input resources can be derived after upstream metadata becomes available.
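The exact bind() signature is not shown in this chapter, so the sketch below uses two free-standing functions to illustrate the two-phase idea only. declare.inputs, bind.inputs, the "panel-A" label, and the build parameter on the ld.matrix request are all hypothetical; the one grounded detail is that a rule's run() can return meta$build, as the source rule earlier in this chapter does.

```r
# Phase 1 (in the spirit of inputs()): dependencies known from the
# output params alone.
declare.inputs <- function(params) {
  list(
    gwas = list(type = "gwas.sumstats",
                params = list(source = params$gwas.source,
                              id = params$gwas.id))
  )
}

# Phase 2 (in the spirit of bind()): dependencies that need upstream
# metadata, here the genome build reported by the resolved GWAS resource.
bind.inputs <- function(params, resolved) {
  build <- resolved$gwas$meta$build
  if (is.null(build)) stop("upstream GWAS resource did not report a build")
  list(
    ld = list(type = "ld.matrix",
              params = list(reference.panel = params$reference.panel,
                            build = build))
  )
}

params <- list(gwas.source = "mySource", gwas.id = "study-1",
               reference.panel = "panel-A")
first <- declare.inputs(params)

# Pretend the engine resolved the GWAS input and its rule returned meta$build.
resolved <- list(gwas = list(data = NULL, meta = list(build = "GRCh38")))
second <- bind.inputs(params, resolved)
```

Deferring the LD request to the second phase means the build never has to be guessed or repeated by the user when the algorithm is declared.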
Step 3: register the rule
Add the new rule to rules.algorithms.
Add a new resource type or intermediate rule
Some extensions do need a new cached intermediate artifact. In that case, add a new resource type and then add rules that produce and consume it.
Step 1: add the resource type and factory constructor
resources.type$my.intermediate <- "my.intermediate"
resources.factory$my.intermediate <- function(gwas.source, gwas.id, setting, .meta = NULL,
                                              .serializer = resources.serializer$rds) {
  ResourceSpec$new(
    resources.type$my.intermediate,
    gwas.source = gwas.source,
    gwas.id = gwas.id,
    setting = setting,
    .meta = .meta,
    .serializer = .serializer
  )
}
Step 2: choose persistence
Use:
- resources.serializer$rds for general R objects;
- resources.serializer$data.fst for large tabular or model-like payloads that benefit from the faster serializer.
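To make the trade-off concrete, the sketch below round-trips a small table through base R's RDS format and, if the fst package happens to be installed, through fst as well. It calls saveRDS()/readRDS() and fst::write_fst()/fst::read_fst() directly; that resources.serializer$data.fst is backed by the fst package, and how either serializer wraps these calls, are assumptions rather than something this chapter shows.

```r
variants <- data.frame(chr = c(1L, 2L), position = c(1000L, 2000L),
                       beta = c(0.1, -0.2))

# RDS: works for any R object (lists, fitted models, data frames).
rds.path <- tempfile(fileext = ".rds")
saveRDS(variants, rds.path)
from.rds <- readRDS(rds.path)

# fst: column-oriented and fast, but only for data-frame-like payloads.
if (requireNamespace("fst", quietly = TRUE)) {
  fst.path <- tempfile(fileext = ".fst")
  fst::write_fst(variants, fst.path)
  from.fst <- fst::read_fst(fst.path)
}
```

A nested list or an R6 object can only take the RDS route, so the tabular serializer is worth choosing only when the artifact really is a table.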
Step 3: add a rule that produces the resource
Your new rule can now match on resources.type$my.intermediate in matches(), declare its inputs, and return the new artifact from run(). Downstream algorithm rules can then depend on that resource in exactly the same way the built-in rules depend on ld.matrix or clumped.variants.
Good built-in examples to copy
The easiest way to extend PolyGenius is to copy the closest built-in pattern:
- LoadLocalGWASRule for a simple local source.
- FetchOpenGWASRule for a remote authenticated source with retry logic.
- RunLdPred2Rule and RunLassosum2Rule for algorithms that reuse intermediate resources.
- BuildLDMatrixRule and MatchLDVariantsRule for intermediate artifacts shared across algorithms.
- RuleRegistry, rules.sources, and rules.algorithms for how the runtime discovers new rules.
For the scheduler and cache behavior that these extensions plug into, see Chapter 17.