compute$similarity$obs

Compute similarity matrices between observations or models

Description

compute$similarity$obs() and compute$similarity$mod() compute pairwise similarity or distance matrices between observations (based on PRS score profiles) or between models (based on SNP overlap or score patterns).

Both functions always return full-sized matrices (n_obs × n_obs or n_mod × n_mod) preserving the original order from data$obs_names or data$mod_names, even when computation is performed on a subset.
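As a rough illustration of the full-size contract described above, the following base R sketch computes a correlation on a subset of observations and embeds the result in an NA-filled matrix ordered by all observation names. The `scores` matrix, the `keep` vector and the helper `embed_full()` are hypothetical names used only for this example; the package's internal implementation may differ.

```r
# Hypothetical score matrix: 4 observations (rows) x 3 models (columns)
scores <- matrix(
  c(0.1, 0.4, 0.3,
    0.9, 0.2, 0.5,
    0.3, 0.8, 0.1,
    0.6, 0.5, 0.7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("subj_", 1:4), c("m1", "m2", "m3"))
)

# Hypothetical helper: place a subset result into a full-sized NA matrix
embed_full <- function(sub, all_names) {
  full <- matrix(NA_real_, length(all_names), length(all_names),
                 dimnames = list(all_names, all_names))
  full[rownames(sub), colnames(sub)] <- sub
  full
}

keep <- c("subj_1", "subj_3", "subj_4")  # observations used in the computation
sub  <- cor(t(scores[keep, ]))           # observation-by-observation Pearson
full <- embed_full(sub, rownames(scores))

dim(full)         # 4 x 4, original order preserved
full["subj_2", ]  # all NA, because subj_2 was excluded
```

Keeping the full dimensions means results computed on different subsets can be compared or combined cell by cell without re-aligning names.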

Subsetting parameters (obs and models) follow the same resolution pattern as data$fetch(): they can be character vectors (names), logical/numeric vectors, or expressions that resolve via data$fetch() to filter the computation.

Usage

compute$similarity$obs(data, layer, method = c("pearson", "spearman", "euclidean", "manhattan", "cosine"), obs, use = "pairwise.complete.obs", ...)

compute$similarity$mod(data, layer, method = c("score.pearson", "score.spearman", "snp.jaccard", "snp.weighted.overlap"), models, use = "pairwise.complete.obs", ...)

Arguments

data

A PolyGeniusData object

layer

Unquoted name of scores layer (e.g., X, Y)

method

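Similarity or distance metric. For observations, one of “pearson”, “spearman”, “euclidean”, “manhattan” or “cosine” (see Details). For models, one of: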

  • “snp.jaccard” (Jaccard index on SNP overlap)

  • “snp.weighted.overlap” (Weighted by effect sizes)

  • “score.pearson” (Pearson correlation of score vectors)

  • “score.spearman” (Spearman correlation of score vectors)

obs

Optional. Unquoted expression to subset observations, following the same resolution pattern as data$fetch(). Can be:

  • Character vector of observation names

  • Logical or numeric index vector

  • Expression that resolves via data$fetch() (e.g., age > 65, case == 1)

If provided, only these observations are used in the computation, but the returned matrix is still full-sized, with NA for excluded observations.

use

For correlation methods, how to handle missing values (see stats::cor()). Default “pairwise.complete.obs”.
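Because this argument follows stats::cor(), a small base R sketch can show what the choice means in practice. The `scores` matrix below is made up for illustration; only stats::cor() itself is called, not the package functions.

```r
# Hypothetical scores: 5 observations x 3 models, with missing values
scores <- cbind(
  m1 = c(1.0, 2.0, 3.0, 4.0, 5.0),
  m2 = c(2.1, NA,  6.2, 7.9, 10.1),
  m3 = c(5.0, 4.1, NA,  2.2, 0.9)
)

# Pairwise deletion: each pair of columns uses every row complete for that pair
pairwise <- cor(scores, use = "pairwise.complete.obs")

# Listwise deletion: rows with any NA are dropped before all computations
listwise <- cor(scores, use = "complete.obs")

# stats::cor() default: an NA in either column makes that entry NA
strict <- cor(scores, use = "everything")
```

With pairwise deletion, different cells of the matrix can be based on different numbers of observations, which is worth keeping in mind when missingness is uneven across models.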

...

Additional arguments (reserved)

models

Optional. Unquoted expression to subset models, following the same resolution pattern as data$fetch(). Can be:

  • Character vector of model names

  • Logical or numeric index vector

  • Expression that resolves via data$fetch() (e.g., gwas$trait == "AD")

If provided, only these models are used in the computation, but the returned matrix is still full-sized, with NA for excluded models.

Details

Observation similarity (compute$similarity$obs): Computes pairwise relationships between observations based on their PRS score profiles from the specified layer. Each observation is represented as a vector of scores across models, and similarity is computed between these vectors.

When the coop package is installed, it is used automatically for Pearson correlation and cosine similarity, which can be considerably faster on large datasets. Otherwise, base R functions are used as a fallback.
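The optional-dependency pattern described here might look roughly like the sketch below. The helper name `fast_pcor()` is hypothetical, and how the package actually dispatches between coop and base R is an assumption; the sketch only shows the general technique of checking with requireNamespace() before calling coop::pcor().

```r
# Hypothetical helper: Pearson correlation between the columns of x,
# using coop when it is installed and base R otherwise
fast_pcor <- function(x) {
  if (requireNamespace("coop", quietly = TRUE)) {
    coop::pcor(x)   # accelerated routine from the coop package
  } else {
    stats::cor(x)   # base R fallback
  }
}

set.seed(1)
x <- matrix(rnorm(200), nrow = 50, ncol = 4)  # complete data, no NA
res <- fast_pcor(x)
```

Both branches correlate the columns of `x`, so a matrix with observations in rows would need to be transposed first to obtain observation-by-observation similarity.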

When to use each method:

  • Correlation: General similarity in score patterns (scale-invariant)

  • Euclidean/Manhattan: Absolute distance between score profiles

  • Cosine: Directional similarity (ignores magnitude)
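The differences between these families can be seen on a toy example in base R. The vectors `a`, `b` and `d` and the `cosine()` helper below are illustrative only and are not part of the package.

```r
a <- c(0.2, 0.5, 0.1, 0.9)  # a hypothetical score profile
b <- 2 * a                  # same pattern, twice the magnitude
d <- a + 1                  # same pattern, shifted by a constant

# Cosine similarity written out explicitly
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

pearson_ab <- cor(a, b)             # 1 (up to rounding): ignores scale
pearson_ad <- cor(a, d)             # 1 (up to rounding): ignores constant shifts
cosine_ab  <- cosine(a, b)          # 1 (up to rounding): ignores magnitude
cosine_ad  <- cosine(a, d)          # below 1: sensitive to shifts
eucl_ab    <- sqrt(sum((a - b)^2))  # above 0: absolute distance sees the scaling
manh_ab    <- sum(abs(a - b))       # above 0: sum of absolute differences
```

In short, correlation treats rescaled and shifted profiles as identical, cosine treats only rescaled profiles as identical, and the two distances treat neither as identical.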

Results should be stored in data$obsp for downstream visualization:

data$obsp$correlation <- compute$similarity$obs(data, X, method = "pearson")
data$obsp$euclidean <- compute$similarity$obs(data, X, method = "euclidean")

Model similarity (compute$similarity$mod): Computes pairwise relationships between PRS models using either SNP-based or score-based methods.

SNP-based methods (snp.jaccard, snp.weighted.overlap):

  • Use variant information from data$mod$models

  • SNPs are matched by chromosome:position

  • snp.jaccard: Jaccard index = |intersection| / |union|

  • snp.weighted.overlap: Weighted by absolute effect size products
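A minimal base R sketch of the SNP-based idea follows. The variant tables and their column names (`chr`, `pos`, `beta`) are hypothetical, and the weighted formula is only one plausible reading of "weighted by absolute effect size products"; the exact definition used by the package is not given here and may differ.

```r
# Hypothetical variant tables for two PRS models
model_a <- data.frame(chr = c(1, 1, 2, 5), pos = c(1000, 2500, 300, 4200),
                      beta = c(0.10, -0.20, 0.05, 0.30))
model_b <- data.frame(chr = c(1, 2, 5, 7), pos = c(2500, 300, 4200, 900),
                      beta = c(-0.15, 0.10, 0.25, 0.40))

# Match SNPs by a chromosome:position key
key_a <- paste(model_a$chr, model_a$pos, sep = ":")
key_b <- paste(model_b$chr, model_b$pos, sep = ":")

# Jaccard index: |intersection| / |union| (3 shared SNPs out of 5 distinct)
shared  <- intersect(key_a, key_b)
jaccard <- length(shared) / length(union(key_a, key_b))

# One possible weighted overlap (an assumption, not the package's formula):
# products of absolute effect sizes over shared SNPs, normalised so that
# two identical models would score 1
w_a <- setNames(abs(model_a$beta), key_a)
w_b <- setNames(abs(model_b$beta), key_b)
weighted <- sum(w_a[shared] * w_b[shared]) / sqrt(sum(w_a^2) * sum(w_b^2))
```

Unlike the Jaccard index, a weighted measure of this kind lets shared SNPs with large effects count for more than shared SNPs with negligible effects.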

Score-based methods (score.pearson, score.spearman):

  • Use computed PRS scores from data$scores[[layer]]

  • Correlation computed across observations (models as variables)

  • Uses coop package if available for performance
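The phrase "models as variables" can be made concrete with plain stats::cor(). In the sketch below the `scores` matrix is simulated and stands in for a score layer, and the model name `model_3` is made up; nothing here calls the package itself.

```r
set.seed(42)
# Hypothetical score layer: 100 observations (rows) x 3 models (columns)
scores <- matrix(rnorm(300), nrow = 100,
                 dimnames = list(paste0("subj_", 1:100),
                                 c("LDpred2", "CT_5e8", "model_3")))
# Make the second model track the first so the correlation is visible
scores[, "CT_5e8"] <- 0.8 * scores[, "LDpred2"] + 0.2 * scores[, "CT_5e8"]

# Models are columns, so cor() on the matrix gives a model-by-model result
mod_pearson  <- cor(scores, method = "pearson")
mod_spearman <- cor(scores, method = "spearman")

# Observation similarity treats rows as the vectors, so the matrix is transposed
obs_pearson <- cor(t(scores))

dim(mod_pearson)  # 3 x 3
dim(obs_pearson)  # 100 x 100
```

The same score matrix therefore yields an n_mod × n_mod matrix or an n_obs × n_obs matrix depending only on which dimension is treated as the variables.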

Results should be stored in data$modp:

data$modp$jaccard <- compute$similarity$mod(data, X, method = "snp.jaccard")
data$modp$score.cor <- compute$similarity$mod(data, X, method = "score.pearson")

Value

A symmetric numeric matrix with dimensions n_obs × n_obs (observations) or n_mod × n_mod (models), with row and column names from data$obs_names or data$mod_names. The matrix carries a log attribute accessible via slotLog().

Functions

  • compute$similarity$obs(): Compute observation-observation similarity

  • compute$similarity$mod(): Compute model-model similarity

Examples

## Not run: 
# Compute Pearson correlation between all observations
data$obsp$cor <- compute$similarity$obs(data, X, method = "pearson")

# Distance matrix
data$obsp$dist <- compute$similarity$obs(data, X, method = "euclidean")

# Subset by name
data$obsp$pair <- compute$similarity$obs(data, X, obs = c("subj_1", "subj_2"))

# Subset by expression (resolved via fetch)
data$obsp$elderly <- compute$similarity$obs(data, X, obs = age > 65)
data$obsp$cases <- compute$similarity$obs(data, X, obs = case == 1)

## End(Not run)

## Not run: 
# SNP overlap similarity
data$modp$jaccard <- compute$similarity$mod(data, X, method = "snp.jaccard")
data$modp$weighted <- compute$similarity$mod(data, X, method = "snp.weighted.overlap")

# Score correlation
data$modp$cor <- compute$similarity$mod(data, X, method = "score.pearson")

# Subset by name
data$modp$best <- compute$similarity$mod(data, X, models = c("LDpred2", "CT_5e8"))

# Subset by expression
data$modp$AD <- compute$similarity$mod(data, X, models = gwas$trait == "AD")

## End(Not run)