compute$similarity$obs
Compute similarity matrices between observations or models
Description
compute$similarity$obs() and compute$similarity$mod() compute pairwise similarity or distance matrices between observations (based on PRS score profiles) or between models (based on SNP overlap or score patterns).
Both functions always return full-sized matrices (n_obs × n_obs or n_mod × n_mod) preserving the original order from data$obs_names or data$mod_names, even when computation is performed on a subset.
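The full-size return shape can be pictured with a small base R sketch. Everything here is hypothetical (the `scores` matrix, the names, the subset), and it assumes, purely for illustration, that entries outside the subset are NA; the text above does not state what fills them.

```r
obs_names <- paste0("subj_", 1:5)        # stands in for data$obs_names
subset_names <- c("subj_4", "subj_2")    # hypothetical subset, given in any order

set.seed(7)
scores <- matrix(rnorm(15), nrow = 5, dimnames = list(obs_names, NULL))

# Compute similarity on the subset only, keeping the original ordering
keep <- obs_names[obs_names %in% subset_names]
sub_sim <- cor(t(scores[keep, , drop = FALSE]))

# Embed the result in a full n_obs x n_obs matrix.
# ASSUMPTION: cells outside the subset are NA (not confirmed by the source).
full <- matrix(NA_real_, length(obs_names), length(obs_names),
               dimnames = list(obs_names, obs_names))
full[keep, keep] <- sub_sim
```

The point of the sketch is only the shape and ordering: `full` has one row and column per observation in the original order, regardless of which subset was computed.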
Subsetting parameters (obs and models) follow the same resolution pattern as data$fetch(): they can be character vectors (names), logical/numeric vectors, or expressions that resolve via data$fetch() to filter the computation.
Usage
`compute$similarity$obs`(data, layer, method = c("pearson", "spearman", "euclidean", "manhattan", "cosine"), obs, use = "pairwise.complete.obs", ...)
`compute$similarity$mod`(data, layer, method = c("score.pearson", "score.spearman", "snp.jaccard", "snp.weighted.overlap"), models, use = "pairwise.complete.obs", ...)
Arguments
data: A …
layer: Unquoted name of scores layer (e.g., …)
method: For models, similarity metric: …
obs: Optional. Unquoted expression to subset observations, following the same resolution pattern as data$fetch().
use: For correlation methods, how to handle missing values (see …)
...: Additional arguments (reserved)
models: Optional. Unquoted expression to subset models, following the same resolution pattern as data$fetch().
Details
Observation similarity (compute$similarity$obs): Computes pairwise relationships between observations based on their PRS score profiles from the specified layer. Each observation is represented as a vector of scores across models, and similarity is computed between these vectors.
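As a rough illustration of the computation described above, the following base R sketch builds a small made-up score matrix (rows are observations, columns are models) and correlates the observation vectors. The `scores` matrix and its names are hypothetical, and this is not the package's internal code.

```r
# Hypothetical PRS score matrix: 4 observations (rows) x 3 models (columns)
scores <- matrix(
  c( 0.2, 1.1, -0.5,
     0.4, 2.0, -1.1,
    -0.3, 0.1,  0.9,
     1.0, NA,   0.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("subj_", 1:4), c("mod_a", "mod_b", "mod_c"))
)

# stats::cor() correlates columns, so transpose to compare observations.
# Missing scores are handled pair by pair, as with the default `use` value.
obs_cor <- cor(t(scores), method = "pearson", use = "pairwise.complete.obs")
```

The result is a 4 × 4 symmetric matrix with the observation names on both dimensions.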
The coop package, if installed, is used automatically for Pearson correlation and cosine similarity, which can speed up computation considerably on large datasets. Otherwise, base R functions are used as a fallback.
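The optional-dependency pattern described above might look roughly like the sketch below. `obs_pearson()` is a hypothetical helper rather than a package function, and the sketch ignores missing-value handling.

```r
# Hypothetical helper: Pearson correlation between observations (rows of x),
# using coop when it is installed and base R otherwise.
obs_pearson <- function(x) {
  xt <- t(x)  # coop::pcor() and stats::cor() both work column-wise
  if (requireNamespace("coop", quietly = TRUE)) {
    coop::pcor(xt)
  } else {
    stats::cor(xt)
  }
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("subj_", 1:5), paste0("mod_", 1:4)))
sim <- obs_pearson(x)
```

Checking with `requireNamespace()` keeps coop a soft dependency: the call succeeds either way, and only the speed differs.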
When to use each method:
- Correlation: General similarity in score patterns (scale-invariant)
- Euclidean/Manhattan: Absolute distance between score profiles
- Cosine: Directional similarity (ignores magnitude)
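The differences between these methods show up on a toy example. The three profiles below are invented for illustration, and `cosine()` is a small hypothetical helper defined inline.

```r
# Three hypothetical score profiles across four models
a <- c(1, 2, 3, 4)
b <- 2 * a     # same direction as a, twice the magnitude
d <- a + 10    # same shape as a, shifted by a constant

cosine <- function(u, v) sum(u * v) / sqrt(sum(u^2) * sum(v^2))

cor_ab <- cor(a, b)        # about 1: correlation ignores scale
cor_ad <- cor(a, d)        # about 1: correlation also ignores shifts
cos_ab <- cosine(a, b)     # about 1: same direction
cos_ad <- cosine(a, d)     # below 1: the shift changes the direction

# Distances respond to magnitude even when direction is identical
euc_ab <- as.numeric(dist(rbind(a, b), method = "euclidean"))
man_ab <- as.numeric(dist(rbind(a, b), method = "manhattan"))
```

Here `a` and `b` are perfectly similar under correlation and cosine, yet a nonzero distance apart under the Euclidean and Manhattan methods.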
Results should be stored in data$obsp for downstream visualization:
data$obsp$correlation <- compute$similarity$obs(data, X, method = "pearson")
data$obsp$euclidean <- compute$similarity$obs(data, X, method = "euclidean")
Model similarity (compute$similarity$mod): Computes pairwise relationships between PRS models using either SNP-based or score-based methods.
SNP-based methods (snp.jaccard, snp.weighted.overlap):
- Use variant information from data$mod$models
- SNPs are matched by chromosome:position
- snp.jaccard: Jaccard index = |intersection| / |union|
- snp.weighted.overlap: Weighted by absolute effect size products
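The Jaccard index on chromosome:position keys can be sketched in base R as follows. The variant sets, model names, and the `jaccard()` helper are hypothetical. The weighted overlap is not sketched, since its exact normalization is not given here.

```r
# Hypothetical variant sets per model, keyed as "chromosome:position"
models <- list(
  mod_a = c("1:1000", "1:2500", "2:300", "7:9100"),
  mod_b = c("1:2500", "2:300", "3:42", "7:9100", "9:77"),
  mod_c = c("3:42", "9:77", "11:5")
)

# Jaccard index = |intersection| / |union|
jaccard <- function(x, y) {
  x <- unique(x)
  y <- unique(y)
  length(intersect(x, y)) / length(union(x, y))
}

# Pairwise n_mod x n_mod matrix; sapply() carries the model names over
jac <- sapply(models, function(x) sapply(models, function(y) jaccard(x, y)))

jac["mod_a", "mod_b"]  # 3 shared variants / 6 distinct variants = 0.5
```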
Score-based methods (score.pearson, score.spearman):
- Use computed PRS scores from data$scores[[layer]]
- Correlation computed across observations (models as variables)
- Uses the coop package if available for performance
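For the score-based methods, "models as variables" means the models are the columns being correlated. A minimal base R sketch, using an invented score matrix in place of data$scores[[layer]]:

```r
set.seed(42)
# Hypothetical scores: 50 observations (rows) x 3 models (columns)
scores <- matrix(rnorm(150), nrow = 50,
                 dimnames = list(NULL, c("mod_a", "mod_b", "mod_c")))
scores[, "mod_b"] <- scores[, "mod_a"] + rnorm(50, sd = 0.1)  # near-duplicate of mod_a
scores[3, "mod_c"] <- NA                                      # one missing score

# Models are columns, so no transpose is needed
mod_pearson  <- cor(scores, method = "pearson",  use = "pairwise.complete.obs")
mod_spearman <- cor(scores, method = "spearman", use = "pairwise.complete.obs")
```

In this toy data `mod_a` and `mod_b` correlate strongly because one was constructed from the other, while `mod_c` is independent noise.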
Results should be stored in data$modp:
data$modp$jaccard <- compute$similarity$mod(data, X, method = "snp.jaccard")
data$modp$score.cor <- compute$similarity$mod(data, X, method = "score.pearson")
Value
A symmetric numeric matrix with dimensions n_obs × n_obs (observations) or n_mod × n_mod (models), with row and column names from data$obs_names or data$mod_names. The matrix carries a log attribute accessible via slotLog().
Functions
- compute$similarity$obs(): Compute observation-observation similarity
- compute$similarity$mod(): Compute model-model similarity
Examples
## Not run:
# Compute pearson correlation between all observations
data$obsp$cor <- compute$similarity$obs(data, X, method = "pearson")
# Distance matrix
data$obsp$dist <- compute$similarity$obs(data, X, method = "euclidean")
# Subset by name
data$obsp$cases <- compute$similarity$obs(data, X, obs = c("subj_1", "subj_2"))
# Subset by expression (resolved via fetch)
data$obsp$elderly <- compute$similarity$obs(data, X, obs = age > 65)
data$obsp$cases <- compute$similarity$obs(data, X, obs = case == 1)
## End(Not run)
## Not run:
# SNP overlap similarity
data$modp$jaccard <- compute$similarity$mod(data, X, method = "snp.jaccard")
data$modp$weighted <- compute$similarity$mod(data, X, method = "snp.weighted.overlap")
# Score correlation
data$modp$cor <- compute$similarity$mod(data, X, method = "score.pearson")
# Subset by name
data$modp$best <- compute$similarity$mod(data, X, models = c("LDpred2", "CT_5e8"))
# Subset by expression
data$modp$AD <- compute$similarity$mod(data, X, models = gwas$trait == "AD")
## End(Not run)