GWAS Sources
PolyGenius asks you to declare the GWAS summary-statistics sources you want, not to hand-write a different retrieval script for every study. That separation lets one generate$models(...) call mix local files, OpenGWAS studies, and GWAS Catalog accessions while the execution engine takes care of loading, fetching, normalizing, and caching.
Under the hood, every source declaration creates one or more gwas.request resources. The engine then resolves those into normalized gwas.sumstats resources that the algorithms can consume in a uniform way.
Why source declarations matter
- You describe the desired GWAS inputs once and reuse the same algorithm declarations across source types.
- The engine can discover the correct retrieval rule automatically for each source.
- Retrieved or loaded GWAS datasets are cached in a common format, so repeated runs can reuse them.
- Mixed-source runs stay composable: local, OpenGWAS, and GWAS Catalog requests can all appear in the same generation call.
Local sources are the most explicit option: you provide the data yourself plus metadata that tells PolyGenius what the GWAS is and which build it uses.
For local inputs, a loader function is usually the best pattern, because the engine only evaluates it when that GWAS is actually needed. The implementation also accepts an already-loaded data frame, but lazy loader functions scale better for large files and keep declarations cheap.
Declare a local GWAS
load_ad_gwas <- function() {
  data.table::fread(
    "data/gwas/AD_local_grch37.tsv.gz",
    select = c("chr", "position", "ea", "nea", "beta", "se", "pval", "n", "eaf", "rsid")
  )
}
src_local <- generate$sources$local(
  gwas = list(AD_local = load_ad_gwas),
  metadata = data.frame(
    id = "AD_local",
    build = "GRCh37",
    trait = "Alzheimer's disease",
    sample_size = 245678,
    stringsAsFactors = FALSE
  )
)
The metadata table must contain at least:
- id
- build
Any extra metadata columns are carried forward into the normalized GWAS metadata, which is useful for fields such as trait, n_eff, or sample_size.
Local column requirements
Minimum columns required by the local-source rule:
- chr
- position
- ea
- nea
- beta
- pval
Optional columns that are often worth providing:
- se: required by LD-based algorithms such as LDpred2 and lassosum2.
- n: used as variant-level effective sample size for LD-based algorithms.
- eaf: needed when ClumpingThresholding uses eaf.threshold > 0.
- rsid: optional, but useful for provenance and inspection.
If n is not present, LD-based algorithms can still work when the metadata supplies a study-level n_eff or sample_size.
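The column rules above can be checked before a local table is handed to PolyGenius. The following is a minimal sketch in base R: check_local_columns() and its needs_ld argument are hypothetical names introduced for illustration and are not part of PolyGenius; the column lists are the ones stated above.

```r
# Hypothetical helper (not a PolyGenius function): verifies that a local GWAS
# table carries the minimum columns, plus `se` when an LD-based algorithm
# such as LDpred2 or lassosum2 will consume it.
check_local_columns <- function(gwas, needs_ld = FALSE) {
  required <- c("chr", "position", "ea", "nea", "beta", "pval")
  if (needs_ld) required <- c(required, "se")
  missing_cols <- setdiff(required, names(gwas))
  if (length(missing_cols) > 0) {
    stop("Missing GWAS columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy one-row table with only the minimum columns (made-up values).
toy <- data.frame(
  chr = 1L, position = 12345L, ea = "A", nea = "G",
  beta = 0.02, pval = 1e-4
)

check_local_columns(toy)  # passes silently

# With needs_ld = TRUE the check fails because `se` is absent.
ld_message <- tryCatch(
  check_local_columns(toy, needs_ld = TRUE),
  error = function(e) conditionMessage(e)
)
```

Running a check like this inside the loader function keeps the failure close to the file that caused it, instead of surfacing later during normalization.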
Multiple local GWAS inputs in one declaration
src_local_multi <- generate$sources$local(
  gwas = list(
    T2D_local = function() data.table::fread("data/gwas/T2D_local_grch37.tsv.gz"),
    CAD_local = function() data.table::fread("data/gwas/CAD_local_grch37.tsv.gz")
  ),
  metadata = data.frame(
    id = c("T2D_local", "CAD_local"),
    build = c("GRCh37", "GRCh37"),
    trait = c("Type 2 diabetes", "Coronary artery disease"),
    stringsAsFactors = FALSE
  )
)
Use OpenGWAS when you want IEU-hosted studies and are happy to let PolyGenius fetch them on demand through ieugwasr.
Declare OpenGWAS studies
src_open <- generate$sources$opengwas(
  ids = c("ieu-a-2", "ieu-b-5067"),
  wait.for.allowance = TRUE
)
You can also pass the token explicitly:
src_open <- generate$sources$opengwas(
  ids = "ieu-a-2",
  opengwas.jwt = Sys.getenv("OPENGWAS_JWT"),
  wait.for.allowance = TRUE
)
Authentication
OpenGWAS access requires a JWT token.
If you do not pass opengwas.jwt, PolyGenius calls ieugwasr::get_opengwas_jwt(). That keeps token handling aligned with ieugwasr; in practice this usually means the token is taken from the OPENGWAS_JWT environment variable or whatever authentication route ieugwasr is configured to use.
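Because a missing token only shows up once a request is made, it can help to confirm that the environment variable is populated before declaring OpenGWAS sources. This is a small base-R sketch and makes no assumption about how ieugwasr resolves the token internally; it only inspects OPENGWAS_JWT.

```r
# Read the token from the environment; an unset variable yields "".
jwt <- Sys.getenv("OPENGWAS_JWT", unset = "")
has_jwt <- nzchar(jwt)

if (!has_jwt) {
  # Deliberately a message, not an error: ieugwasr may be configured to
  # authenticate through another route.
  message("OPENGWAS_JWT is not set; OpenGWAS requests may fail to authenticate.")
}
```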
Retrieval mode depends on the requested p-value threshold
OpenGWAS retrieval is not implemented as one fixed download path. PolyGenius chooses the path from the effective pval threshold requested by the generation run.
- When pval < 0.1, PolyGenius uses ieugwasr::tophits(..., clump = 0). This is the lighter-weight path, and most filtering happens on the OpenGWAS side before the results are normalized locally.
- When pval >= 0.1, PolyGenius uses ieugwasr::gwasinfo_files() to find the GWAS VCF, downloads the VCF, parses it locally, and applies the p-value filter on the client side after normalization.
This threshold is determined by what the requested algorithms need from that source, not by a separate source-side argument.
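The branching rule described above can be written down in a few lines. This is an illustrative sketch only: choose_opengwas_path() and its return labels are hypothetical and do not reflect PolyGenius internals; the 0.1 cut-off is the one stated above.

```r
# Hypothetical illustration of the retrieval-path decision.
# Returns a label for the path instead of calling ieugwasr.
choose_opengwas_path <- function(pval) {
  stopifnot(is.numeric(pval), length(pval) == 1, pval > 0, pval <= 1)
  if (pval < 0.1) {
    "tophits"  # server-side filtering via ieugwasr::tophits(..., clump = 0)
  } else {
    "vcf"      # download the GWAS VCF, then filter locally
  }
}

path_strict  <- choose_opengwas_path(5e-8)  # genome-wide significance threshold
path_lenient <- choose_opengwas_path(1)     # all variants requested
```

In practice this means that a run needing only strongly associated variants avoids a full VCF download, while a run that needs genome-wide summary statistics always takes the VCF path.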
Allowance waiting and retry
wait.for.allowance = TRUE is the default. When OpenGWAS reports that the allowance has been exhausted, PolyGenius:
- reads the reset time from the API message when it can;
- sleeps until that time, using a 60-second fallback when no timestamp can be parsed;
- retries the call once.
If wait.for.allowance = FALSE, PolyGenius surfaces the allowance error immediately instead of waiting.
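The wait-then-retry-once behaviour can be sketched generically. Everything named here is hypothetical (with_allowance_retry(), parse_reset, sleep_fn), and the test for the word "allowance" in the error message is an assumption about the error text, not the actual OpenGWAS or PolyGenius format. The sleep function is injectable so the example runs without actually waiting.

```r
# Hypothetical sketch: run call_fn(); on an allowance error, sleep until the
# reset time (or a 60-second fallback) and retry exactly once.
with_allowance_retry <- function(call_fn,
                                 wait = TRUE,
                                 parse_reset = function(msg) NA,
                                 sleep_fn = Sys.sleep,
                                 fallback_secs = 60) {
  tryCatch(
    call_fn(),
    error = function(e) {
      msg <- conditionMessage(e)
      # Assumption: allowance errors mention "allowance" in their message.
      is_allowance <- grepl("allowance", msg, ignore.case = TRUE)
      if (!wait || !is_allowance) stop(e)  # surface the error immediately
      reset_at <- parse_reset(msg)         # POSIXct reset time, or NA if unparsed
      secs <- if (is.na(reset_at)) {
        fallback_secs
      } else {
        max(0, as.numeric(difftime(reset_at, Sys.time(), units = "secs")))
      }
      sleep_fn(secs)
      call_fn()  # single retry; a second failure propagates to the caller
    }
  )
}

# Demo with a fake request that fails once, and a fake sleep that records
# the requested duration instead of waiting.
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts == 1) stop("Allowance exhausted")
  "ok"
}
slept <- numeric(0)
result <- with_allowance_retry(
  flaky,
  sleep_fn = function(s) slept <<- c(slept, s)
)
```

Retrying only once keeps a run from stalling indefinitely when the allowance problem is not transient.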
Use the GWAS Catalog source when you want public studies whose summary statistics are already available in the catalog's harmonized layout.
Declare GWAS Catalog accessions
src_catalog <- generate$sources$gwascatalog(
  ids = c("GCST90027158", "GCST90304505")
)
Harmonized-only support
The current GWAS Catalog rule only supports studies that expose a harmonized payload. Internally, PolyGenius resolves the study metadata, follows the full_summary_stats location, looks for the harmonised directory, reads the harmonized metadata, and then downloads the single harmonized .h.tsv.gz summary-statistics file.
That means:
- harmonized GWAS Catalog resources are supported;
- non-harmonized summary-statistics layouts are not;
- the normalized output is treated as a harmonized GRCh38 resource.
No JWT token is required for GWAS Catalog access.
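The "single harmonized file" requirement can be illustrated with a small filter over a directory listing. This is a sketch under assumptions: pick_harmonized_file() is a hypothetical helper, and the file names in the example are made up for illustration; only the .h.tsv.gz suffix comes from the description above.

```r
# Hypothetical helper: select the one harmonized summary-statistics file
# from a listing, and fail when there is not exactly one candidate.
pick_harmonized_file <- function(files) {
  hits <- grep("\\.h\\.tsv\\.gz$", files, value = TRUE)
  if (length(hits) != 1) {
    stop("Expected exactly one .h.tsv.gz file, found ", length(hits))
  }
  hits
}

# Illustrative listing (made-up names).
listing <- c(
  "GCST90027158.h.tsv.gz",
  "GCST90027158.h.tsv.gz-meta.yaml",
  "md5sum.txt"
)
harmonized <- pick_harmonized_file(listing)
```

A study whose listing has no matching file corresponds to the unsupported non-harmonized case.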
Combine several source types in one run
Once the sources are declared, the generation call is the same regardless of where the GWAS data comes from.
alg_ct <- generate$algorithms$ClumpingThresholding(
  pval = c(5e-8, 1e-6),
  reference.panel = "EUR",
  clump.r2 = 0.1,
  clump.kb = 250
)
models_all <- generate$models(
  sources = list(src_local_multi, src_catalog, src_open),
  algorithms = alg_ct
)
models_all
Reference pages
For exact source arguments, see the reference pages for generate$sources$local, generate$sources$gwascatalog, and generate$sources$opengwas.
Next, Chapter 5 covers the supported PRS algorithms, their assumptions, and how to declare them.