Execution Engine Caching and Registries

PolyGenius helps users manage the large collection of resources required for real polygenic analyses. These include reference panels, LD matrices, GWAS summary statistics, and liftover chains, along with the metadata needed to connect them correctly across algorithms, genome builds, and repeated runs.

Some of these resources are declared directly by the user, while others are discovered only when an operation is being resolved. PolyGenius handles this through two complementary layers:

the settings registries, which keep track of known infrastructure resources such as genome builds, reference panels, and liftover chains;
the execution engine cache, which stores resolved resources and intermediate products so they can be reused in later operations.

Registries under `settings`

The settings environment is the catalog of long-lived infrastructure resources that the rest of the system relies on. Useful reference pages are settings, [settings$genomeBuilds](../reference/topics/GenomeBuilds.html), settings$referencePanels, and settings$liftoverChains.

Genome builds

settings$genomeBuilds defines the supported genome-build labels, aliases, and normalization helpers used throughout PolyGenius. Its job is not to store data files, but to make sure that resources declared with slightly different user-facing names can still be resolved consistently.

In practice, this registry provides a shared vocabulary for build-aware operations. Whenever PolyGenius needs to compare a GWAS build with a reference-panel build, decide whether liftover is needed, or label a resource consistently in logs and metadata, it relies on settings$genomeBuilds.
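
The idea of alias normalization can be illustrated with a small base-R sketch. It is not the implementation behind settings$genomeBuilds: the alias table and the function names normalize_build and needs_liftover are hypothetical, and the only assumption about the data is the common convention that hg19 and GRCh37, and hg38 and GRCh38, name the same coordinate systems.

```r
# Hypothetical alias table; not the actual contents of settings$genomeBuilds.
build_aliases <- c(
  grch37 = "GRCh37", hg19 = "GRCh37",
  grch38 = "GRCh38", hg38 = "GRCh38"
)

# Map a single user-facing label to its canonical build name.
normalize_build <- function(label) {
  key <- tolower(trimws(label))
  if (!key %in% names(build_aliases)) {
    stop("Unknown genome build: ", label)
  }
  unname(build_aliases[key])
}

# Liftover is needed only when the canonical builds differ.
needs_liftover <- function(from, to) {
  normalize_build(from) != normalize_build(to)
}

needs_liftover("hg19", "GRCh37")  # same build under two names
needs_liftover("hg19", "hg38")    # two different builds
```

Comparing canonical names rather than raw labels is what keeps a GWAS declared as "hg19" from being lifted over unnecessarily against a panel declared as "GRCh37".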

Liftover chains

settings$liftoverChains tracks available chain files for build-to-build conversions. These resources are needed when a requested operation depends on assets that are available in one build but must be used in another.

This registry can be managed directly by the user, but it is also used by the execution engine. If a required chain is not already available locally, PolyGenius can resolve it through its built-in liftover-chain rules, download it automatically, and make it available for subsequent operations. Those rules are documented under settings-liftover-chains-rules.
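
As a rough sketch of the "check locally first" step, the following base-R code looks for a chain file in a local directory and reports whether it would still need to be fetched. The functions chain_file_name and resolve_chain are hypothetical and are not part of settings$liftoverChains. The file name follows the UCSC convention (for example hg19ToHg38.over.chain.gz), and the PolyGenius rules may name or locate files differently.

```r
# Build a UCSC-style chain file name from two UCSC assembly identifiers,
# e.g. "hg19" and "hg38" give "hg19ToHg38.over.chain.gz".
chain_file_name <- function(from, to) {
  to_cap <- paste0(toupper(substring(to, 1, 1)), substring(to, 2))
  paste0(from, "To", to_cap, ".over.chain.gz")
}

# Hypothetical resolver: report the expected local path and whether the
# file is already there. It does not download anything.
resolve_chain <- function(from, to, chain_dir) {
  path <- file.path(chain_dir, chain_file_name(from, to))
  list(path = path, available = file.exists(path))
}

# Demonstration against an empty temporary directory.
chain_dir <- tempfile("chains")
dir.create(chain_dir)

before <- resolve_chain("hg19", "hg38", chain_dir)
file.create(before$path)  # stand-in for a completed download
after <- resolve_chain("hg19", "hg38", chain_dir)

c(before = before$available, after = after$available)
```

A real resolver would replace the file.create call with a download step and would typically record the new file in the registry so later operations can find it.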

Reference panels

settings$referencePanels tracks the reference panels that can be used for clumping, LD construction, and LD-aware algorithms. At a high level, it provides an inventory of what panel assets are available, what genome build they belong to, and which formats can be resolved.

As with liftover chains, this registry can be managed by the user but is also actively used by the execution engine. If a requested panel is missing, PolyGenius can download it automatically. If a downstream method needs a different build or format, the engine can trigger the required liftover or conversion steps before the panel is used. The low-level rules behind that behaviour are documented under settings-reference-panels-rules.
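
The decision logic described above can be pictured as a small planner that lists the steps a panel needs before it can be used. This is only a sketch: plan_panel_steps, the panel fields (local, build, format), and the step labels are hypothetical and do not reflect the actual rules documented under settings-reference-panels-rules.

```r
# Hypothetical planner: list the steps needed before a panel can be used.
plan_panel_steps <- function(panel, wanted_build, wanted_format) {
  steps <- character(0)
  if (!panel$local) {
    steps <- c(steps, "download")
  }
  if (panel$build != wanted_build) {
    steps <- c(steps, paste0("liftover:", panel$build, "->", wanted_build))
  }
  if (panel$format != wanted_format) {
    steps <- c(steps, paste0("convert:", panel$format, "->", wanted_format))
  }
  steps
}

# A made-up panel that is not yet local and is in the wrong build and format.
panel <- list(name = "example_panel", build = "GRCh37",
              format = "vcf", local = FALSE)
plan <- plan_panel_steps(panel, wanted_build = "GRCh38", wanted_format = "plink")
print(plan)
```

A panel that is already local and already in the requested build and format yields an empty plan, which corresponds to the case where the engine can use the panel directly.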

The important idea is that these are not just static configuration entries. Liftover chains and reference panels are operational resources: they may be retrieved, transformed, and reused as part of future computations, and the registries are where PolyGenius keeps track of that managed infrastructure.

Execution-engine caching and inspection

The registries above manage durable infrastructure resources. The execution engine manages a second layer: the file-backed cache of resolved resources created while PolyGenius runs.

That cache lives in the ResourceStore, which is exposed through core as core$store. Whenever the execution engine resolves a resource, it checks whether a compatible cached copy already exists. If so, the cached resource is reused. If not, the relevant rule is run and the resulting resource is written back to the store for future reuse.
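
The check-then-compute pattern can be sketched in base R with saveRDS and readRDS. This toy is not the ResourceStore: cache_path, resolve_resource, and toy_rule are hypothetical names, and the sketch treats "compatible" as an exact match on resource type and parameters, which is likely simpler than what the real store does.

```r
# Hypothetical file-backed cache in a fresh temporary directory.
cache_dir <- tempfile("toy_store")
dir.create(cache_dir)

# Derive a file path from the resource type and its identifying parameters.
cache_path <- function(type, params) {
  key <- paste(names(params), unlist(params), sep = "=", collapse = "_")
  key <- gsub("[^A-Za-z0-9_=.-]", "-", key)  # keep the key file-system safe
  file.path(cache_dir, paste0(type, "__", key, ".rds"))
}

# Reuse a cached copy when present; otherwise run the rule and save the result.
resolve_resource <- function(type, params, rule) {
  path <- cache_path(type, params)
  if (file.exists(path)) {
    return(list(value = readRDS(path), from_cache = TRUE))
  }
  value <- rule(params)
  saveRDS(value, path)
  list(value = value, from_cache = FALSE)
}

# Stand-in for an expensive computation: an n-by-n identity matrix.
toy_rule <- function(params) diag(params$n)

params <- list(panel = "example", n = 3)
first  <- resolve_resource("ld.matrix", params, toy_rule)  # runs the rule
second <- resolve_resource("ld.matrix", params, toy_rule)  # reads the cache

c(first = first$from_cache, second = second$from_cache)
```

Because the key is derived from the parameters, changing any of them (for example a different panel name) produces a different path and therefore a fresh computation.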

This applies not only to final outputs, but also to intermediate artifacts. Depending on the workflow, the cache may contain normalized GWAS summary statistics, converted reference panels, matched LD inputs, or full LD matrices. That is what allows later operations to reuse expensive intermediate work rather than recomputing it from scratch.

The main interfaces for inspecting this system are:

core, which exposes the runtime gateway;
ExecutionEngine as core$execution.engine, which resolves dependencies and schedules jobs;
ResourceStore as core$store, which provides cache paths, loading, saving, and per-type indexes.

A simple inspection workflow is:

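```r
# Inspect the runtime gateway and the resource store.
print(core)
print(core$store)

# View the per-type index of cached resources.
core$store$view.index("gwas.sumstats")
core$store$view.index("reference.panel")
core$store$view.index("ld.matrix")
```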

Those indexes give a direct view of which resources have already been materialized and are therefore candidates for reuse in later runs.

Chapter 17 covers the execution engine itself in more detail, including graph discovery, scheduling, and cache-aware execution.