compute$scores

Compute Polygenic Risk Scores (PRS)

Description

Calculates the PRS values for every model in a PolyGeniusData object.

Overview:

  • Collates the unique set of genetic variants across all PRS models.

  • Extracts allele dosages for each variant from the genotype data.

  • Matches each extracted variant to its model definition by chromosome and position, confirming which allele is the effect allele and flipping dosages when necessary.

  • Multiplies each model’s effect sizes (beta) by the per-sample dosages and sums the products across the model’s variants to yield each sample’s PRS.
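
In matrix form, the final step above is a product of a samples × variants dosage matrix with a variants × models effect-size matrix. The sketch below uses toy base-R matrices. All names and values are hypothetical, not package objects. It assumes the dosages already count the effect allele and that a model's beta is 0 for any variant it does not include.

```r
# Minimal sketch: PRS = sum over variants of beta * effect-allele dosage.
# Toy data; sample, variant, and model names are hypothetical.
dosages <- matrix(
  c(0, 1, 2,    # sample1
    2, 1, 0),   # sample2
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample1", "sample2"), c("v1", "v2", "v3"))
)  # samples x variants, effect-allele dosage on a 0-2 scale

# variants x models; a beta of 0 means the model does not use that variant
betas <- matrix(
  c(0.5, -0.2, 0.1,   # modelA (matrix() fills column by column)
    0.0,  0.3, 0.4),  # modelB
  nrow = 3,
  dimnames = list(c("v1", "v2", "v3"), c("modelA", "modelB"))
)

scores <- dosages %*% betas   # samples x models
print(scores)
```

Storing a zero beta for absent variants lets a single matrix product score every model at once.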

When genotype data is split into multiple files, extraction, variant matching, and effect-size-by-dosage multiplication are performed per file to reduce computation time and memory use. The resulting per-file PRS values are summed to compute the final scores.
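
Summing per-file results works because a PRS is additive over variants. If you split the variants into disjoint groups, score each group, and add the partial scores, you get the single-pass result. The toy check below assumes each variant appears in exactly one file, because a variant present in two files would be counted twice. The file grouping and all values are hypothetical.

```r
# Sketch: per-file partial scores add up to the all-variants score.
dosages <- matrix(c(0, 1, 2, 1,
                    2, 1, 0, 1), nrow = 2, byrow = TRUE,
                  dimnames = list(c("s1", "s2"), paste0("v", 1:4)))
beta <- c(v1 = 0.5, v2 = -0.2, v3 = 0.1, v4 = 0.3)

# Hypothetical split: v1-v2 live in one genotype file, v3-v4 in another.
files <- list(file1 = c("v1", "v2"), file2 = c("v3", "v4"))

# Score each file's variants separately (samples x 1 matrices).
partial <- lapply(files, function(vars) dosages[, vars, drop = FALSE] %*% beta[vars])

per_file_total <- Reduce(`+`, partial)   # sum of per-file PRS values
all_at_once    <- dosages %*% beta       # single pass over all variants

print(cbind(per_file_total, all_at_once))
```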

Usage

`compute$scores`(data, minor.allele.freq.threshold = 0, model.filter = function(variants) variants, models = NULL, dosages.fallback = NULL, simplify = FALSE)

Arguments

data

A PolyGeniusData object.

minor.allele.freq.threshold

(Optional) A numeric in [0, 0.5] (default: 0). Variants whose allele frequency is below this threshold or above 1 - threshold (equivalently, whose minor-allele frequency is below the threshold) are excluded.
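
The exclusion rule can be written in terms of either the allele frequency or the minor-allele frequency. This base-R sketch assumes a per-variant allele frequency has already been computed, for example as reported in a PLINK .afreq file. The frequencies and the threshold are made-up values.

```r
# Sketch of the minor.allele.freq.threshold rule on precomputed frequencies.
freq <- c(v1 = 0.005, v2 = 0.30, v3 = 0.995, v4 = 0.50)  # hypothetical allele frequencies
threshold <- 0.01

maf  <- pmin(freq, 1 - freq)   # minor-allele frequency, always in [0, 0.5]
keep <- maf >= threshold       # same as threshold <= freq <= 1 - threshold

print(names(freq)[keep])       # variants that survive the filter
```

With the default threshold of 0, keep is TRUE for every variant, so nothing is excluded.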

model.filter

(Optional) A function used to filter the variants of each model. It receives:

  • name: the PRS model name.

  • metadata: model metadata (from data$modm$metadata[name, ]).

  • variants: the model variant table (data$mod$models[[name]]).

It must return a data.frame with columns chr, position, ea, nea, and beta.

models

(Optional) A table or list of tables representing PRS models.

simplify

(Optional) Logical (default: FALSE). When a single PRS model is computed, indicates whether its scores are returned as a vector (simplify = TRUE) or as a single-column table (simplify = FALSE).
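
A model.filter is an ordinary R function that takes a model's variant table and returns a subset of it with the required columns. The sketch below is hypothetical. The 0.001 effect-size cutoff and the exclusion of chromosomes X and Y are illustrative choices, not package defaults. It uses only the variants argument, matching the default function(variants) variants shown under Usage.

```r
# Hypothetical filter: drop near-zero effects and sex-chromosome variants.
# The cutoff and chromosome labels are illustrative assumptions.
my_filter <- function(variants) {
  keep <- abs(variants$beta) >= 0.001 & !(variants$chr %in% c("X", "Y"))
  variants[keep, c("chr", "position", "ea", "nea", "beta")]
}

# Toy model variant table with the required columns.
variants <- data.frame(
  chr      = c("1", "1", "X"),
  position = c(1000L, 2000L, 3000L),
  ea       = c("A", "G", "T"),
  nea      = c("G", "A", "C"),
  beta     = c(0.02, 0.0001, 0.05)
)

filtered <- my_filter(variants)
print(filtered)
```

A function like this could then be supplied as model.filter = my_filter.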

Details

Variant matching: While PRS model variants are named as chr_pos_nea_ea (for effect allele ea and non-effect allele nea), genotype files may use other representations such as chr:pos:a0:a1_a2, where a0 and a1 are the two alleles, and a2 indicates the dosage allele. Matching is done by chromosome and position, accepting a match if (ea == a0 & nea == a1) or (ea == a1 & nea == a0).
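
The matching rule can be sketched as follows. The helpers parse_model_id(), parse_geno_id(), and variants_match() are hypothetical names written for this illustration. The sketch assumes:

  • variant IDs use exactly the two layouts described above;
  • alleles contain no "_" or ":" characters;
  • both sources label chromosomes the same way (for example, not "chr1" in one and "1" in the other).

```r
# Hypothetical helpers illustrating chromosome/position matching that
# tolerates either allele order.
parse_model_id <- function(id) {
  p <- strsplit(id, "_", fixed = TRUE)[[1]]          # chr_pos_nea_ea
  list(chr = p[1], pos = as.integer(p[2]), nea = p[3], ea = p[4])
}

parse_geno_id <- function(id) {
  p <- strsplit(id, ":", fixed = TRUE)[[1]]          # chr:pos:a0:a1_a2
  a1_a2 <- strsplit(p[4], "_", fixed = TRUE)[[1]]
  list(chr = p[1], pos = as.integer(p[2]), a0 = p[3], a1 = a1_a2[1], a2 = a1_a2[2])
}

variants_match <- function(model_id, geno_id) {
  m <- parse_model_id(model_id)
  g <- parse_geno_id(geno_id)
  m$chr == g$chr && m$pos == g$pos &&
    ((m$ea == g$a0 && m$nea == g$a1) || (m$ea == g$a1 && m$nea == g$a0))
}

variants_match("1_12345_A_G", "1:12345:G:A_A")   # alleles in either order match
```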

Correcting allele dosages: Once matched, the dosage allele a2 is compared to the model’s effect allele ea. If a2 != ea, the dosage used is 2 - dosage.
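
A per-variant version of this correction is sketched below. It assumes diploid dosages on a 0-2 scale that count copies of a2, and that matching has already guaranteed that a2 is either ea or nea, so a single comparison is enough. effect_allele_dosage() is a hypothetical helper, not a package function.

```r
# Convert dosages of the dosage allele a2 into dosages of the model's
# effect allele ea (assumes diploid, biallelic variants on a 0-2 scale).
effect_allele_dosage <- function(dosage, a2, ea) {
  if (a2 == ea) dosage else 2 - dosage   # flip when a2 is the non-effect allele
}

dosage  <- c(0, 1, 2, 0.3)   # one variant, four samples; imputed dosages may be fractional
same    <- effect_allele_dosage(dosage, a2 = "G", ea = "G")  # unchanged
flipped <- effect_allele_dosage(dosage, a2 = "A", ea = "G")  # (2, 1, 0, 1.7)
print(flipped)
```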

Debug information: When debug output is enabled by the caller, implementation-specific diagnostic files may be written alongside score outputs.

  1. PLINK .traw files of extracted variant dosages.

  2. PLINK .afreq and .log files with allele frequency calculations.

  3. Matched variant .traw.prescores files containing matched and flipped allele dosages with columns: variant, flipped, beta, model.id, and one column per sample.

Note: writing these files can be time-consuming and may produce large output. You can set the number of threads using setDTthreads() to improve performance.
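
The .traw.prescores rows already hold matched and flipped dosages. Each model's score can therefore, in principle, be recovered from them by weighting each sample column by beta and summing within model.id. The sketch below builds a small in-memory table with the listed columns rather than reading a real file, because the file delimiter is not specified here. The flipped column is assumed to be a logical flag and is not needed for the sum. All values are hypothetical.

```r
# Toy table in the documented .traw.prescores column layout.
prescores <- data.frame(
  variant  = c("1_100_A_G", "1_200_C_T", "2_300_G_A"),
  flipped  = c(FALSE, TRUE, FALSE),
  beta     = c(0.5, -0.2, 0.1),
  model.id = c("modelA", "modelA", "modelB"),
  sample1  = c(1, 2, 0),
  sample2  = c(0, 1, 2)
)

# Every column that is not metadata holds one sample's dosages.
sample_cols <- setdiff(names(prescores), c("variant", "flipped", "beta", "model.id"))

# beta * dosage for every variant (row) and sample (column)
weighted <- as.matrix(prescores[, sample_cols]) * prescores$beta

# Sum within each model, then transpose to samples x models
scores <- t(rowsum(weighted, group = prescores$model.id))
print(scores)
```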

Value

Invisibly returns the input PolyGeniusData object with the following additions:

  • A matrix of PRS values (samples × models), stored in data$scores or data$layers[[key]].

  • A log entry under data$log$misc containing a variant-matching summary for diagnostics.