visualize$data$scores$distribution

Visualize PRS score distributions

Description

Creates distribution plots for polygenic risk scores across samples, optionally faceted by phenotypic variables.

Usage

visualize.scores.distribution(
  data,
  models,
  scores.layer = X,
  obs = NULL,
  split.by = NULL,
  group.by = NULL,
  label = NULL,
  type = c("violin", "density", "histogram", "boxplot"),
  show.outliers = TRUE,
  show.points = FALSE,
  raster = FALSE,
  color.by = c("model", "group"),
  pattern.args = list(),
  geom.args = list(),
  raster.args = list(),
  facet.args = list(),
  ...
)

Arguments

data

A PolyGeniusData object containing the data to visualize.

models

Unquoted expression selecting which models to include. Can be:

  • Indices: c(1, 3, 5) for specific positions

  • Names: c(AD_PRS, PD_PRS) for specific models

  • Expressions: starts_with("AD") for pattern matching
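
The three selection forms read like tidyselect-style selections. The sketch below is not the package's own resolution code: it assumes tidyselect semantics (the starts_with() helper suggests this, but the source does not confirm which engine is used) and a made-up table of scores with one column per model, purely to show what each form would pick.

```r
library(tidyselect)

# Hypothetical scores table: one column per PRS model (names are illustrative).
scores <- data.frame(
  AD_PRS    = c(0.12, -0.40),
  PD_PRS    = c(0.85, 0.10),
  AD_PRS_v2 = c(-0.33, 0.27),
  T2D_PRS   = c(1.02, -0.65),
  BMI_PRS   = c(0.05, 0.48)
)

# Each call returns a named integer vector of the selected column positions.
by_index   <- eval_select(quote(c(1, 3, 5)), scores)          # positions
by_name    <- eval_select(quote(c(AD_PRS, PD_PRS)), scores)   # bare names
by_pattern <- eval_select(quote(starts_with("AD")), scores)   # helper expression

names(by_index)
names(by_name)
names(by_pattern)
```

Because the selection is an unquoted expression, model names are written without quotes, as in c(AD_PRS, PD_PRS).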

scores.layer

Unquoted expression; name of the scores layer to access. Defaults to X. Use data$layers to see available layers.

obs

Optional unquoted expression selecting observations to keep. Resolved via data$fetch() on the observation side.

split.by

Unquoted expression selecting a faceting variable from phenotypes. Can be a simple variable or expression. Creates separate panels for each level.

group.by

Unquoted expression selecting a grouping variable from phenotypes. Creates multiple distributions per model within each panel. When combined with multiple models, use color.by to control which dimension uses color vs pattern.

label

Unquoted expression selecting labels for models from phenotypes or metadata. Passed to data$fetch() to replace default model names in the plot.

type

Character; type of distribution plot:

  • "violin" – Violin plots showing the full distribution (default)

  • "density" – Smoothed density curves

  • "histogram" – Binned histograms

  • "boxplot" – Box-and-whisker plots
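
To make the four options concrete, here is a minimal ggplot2 sketch that maps each type value to the corresponding geom. It uses simulated scores and a hypothetical helper, plot_distribution(), and is only an illustration of the idea, not the function's actual implementation. It also shows the usual match.arg() idiom, under which the first element of the type vector ("violin") acts as the default.

```r
library(ggplot2)

# Simulated scores for two models (illustrative data only).
set.seed(1)
scores <- data.frame(
  model = rep(c("AD_PRS", "PD_PRS"), each = 200),
  score = c(rnorm(200), rnorm(200, mean = 0.5))
)

# Hypothetical helper: pick a geom from the `type` argument.
plot_distribution <- function(df,
                              type = c("violin", "density", "histogram", "boxplot")) {
  type <- match.arg(type)  # with no `type` supplied, the first choice is used
  base <- ggplot(df, aes(fill = model))
  switch(
    type,
    violin    = base + geom_violin(aes(x = model, y = score)),
    density   = base + geom_density(aes(x = score), alpha = 0.5),
    histogram = base + geom_histogram(aes(x = score), bins = 30,
                                      alpha = 0.5, position = "identity"),
    boxplot   = base + geom_boxplot(aes(x = model, y = score))
  )
}

types <- c("violin", "density", "histogram", "boxplot")
plots <- lapply(types, function(t) plot_distribution(scores, t))
names(plots) <- types
```

Violin plots and boxplots place the model on the x axis and the score on the y axis, while density curves and histograms place the score on the x axis and overlay the models.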

show.outliers

Logical; if TRUE (default), shows outliers in boxplots. If FALSE, outliers are hidden. Only applies when type = "boxplot".
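
In plain ggplot2, outliers can be hidden by setting outlier.shape = NA on geom_boxplot(). The sketch below assumes show.outliers is implemented along those lines, which the source does not state; boxplot_layer() is a hypothetical helper and the data are simulated.

```r
library(ggplot2)

# Simulated scores with two extreme values appended (illustrative data only).
set.seed(42)
scores <- data.frame(model = "AD_PRS", score = c(rnorm(200), 6, -6))

# Hypothetical helper: a boxplot layer with or without outlier points.
boxplot_layer <- function(show.outliers = TRUE) {
  if (show.outliers) {
    geom_boxplot()
  } else {
    geom_boxplot(outlier.shape = NA)  # outliers are not drawn
  }
}

p_shown  <- ggplot(scores, aes(x = model, y = score)) + boxplot_layer(TRUE)
p_hidden <- ggplot(scores, aes(x = model, y = score)) + boxplot_layer(FALSE)
```

Hiding outliers this way affects only what is drawn; the box and whisker statistics are still computed from all values.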

show.points

Logical; if TRUE, shows individual data points: as a rug (tick marks) under the curve for density plots, and as jittered points for violin plots and boxplots. Ignored for histograms. Default is FALSE.
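
The two point styles can be reproduced with standard ggplot2 layers. This is a sketch with simulated data, assuming geom_rug() for density plots and geom_jitter() for violin plots; the package may configure these layers differently.

```r
library(ggplot2)

# Simulated scores for two models (illustrative data only).
set.seed(7)
scores <- data.frame(
  model = rep(c("AD_PRS", "PD_PRS"), each = 100),
  score = rnorm(200)
)

# Density plot: individual values drawn as rug ticks along the bottom axis.
p_density <- ggplot(scores, aes(x = score, fill = model)) +
  geom_density(alpha = 0.5) +
  geom_rug(aes(colour = model), sides = "b")

# Violin plot: individual values drawn as horizontally jittered points.
p_violin <- ggplot(scores, aes(x = model, y = score, fill = model)) +
  geom_violin() +
  geom_jitter(width = 0.15, height = 0, size = 0.8, alpha = 0.6)
```

Setting height = 0 in geom_jitter() spreads the points sideways only, so the plotted score values stay exact.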

raster

Logical; if TRUE and show.points = TRUE, uses rasterized points via ggrastr::geom_point_rast() for better performance with large datasets. Requires the ggrastr package to be installed (optional dependency). If not available, falls back to regular geom_point(). Default is FALSE.
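
The optional-dependency fallback described here is commonly written with requireNamespace(). The following is a minimal sketch of that pattern, not the package's code; point_layer() is a hypothetical helper, and only the raster.dpi argument of ggrastr::geom_point_rast() is used.

```r
library(ggplot2)

# Hypothetical helper: return a rasterized point layer when requested and
# ggrastr is installed; otherwise return a regular geom_point() layer.
point_layer <- function(raster = FALSE, raster.args = list(), ...) {
  if (raster && requireNamespace("ggrastr", quietly = TRUE)) {
    do.call(ggrastr::geom_point_rast, c(list(...), raster.args))
  } else {
    geom_point(...)
  }
}

layer_plain  <- point_layer(size = 0.5)
layer_raster <- point_layer(raster = TRUE,
                            raster.args = list(raster.dpi = 150),
                            size = 0.5)
```

Either way the caller gets a layer that can be added to a plot with +, so the rest of the plotting code does not need to know whether ggrastr is available.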

color.by

Character; when both multiple models and group.by are present, controls which dimension uses color/fill. Options: "model" (default) or "group". The other dimension will use pattern textures. Ignored if only one dimension is present.
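
The rule above can be summarized as a small decision function. This base R sketch uses a hypothetical helper, assign_aesthetics(), that is not part of the documented API. What happens when only one dimension is present is an assumption here: the single dimension is given the fill colour.

```r
# Hypothetical helper: decide which dimension is drawn with fill colour
# and which with a pattern texture.
assign_aesthetics <- function(n.models, has.group,
                              color.by = c("model", "group")) {
  color.by <- match.arg(color.by)
  if (n.models > 1 && has.group) {
    # Both dimensions present: color.by takes the fill, the other gets patterns.
    list(fill = color.by, pattern = setdiff(c("model", "group"), color.by))
  } else if (has.group) {
    # Assumption: with a single model, groups are distinguished by colour.
    list(fill = "group", pattern = NULL)
  } else {
    # Only models vary, so color.by has no effect.
    list(fill = "model", pattern = NULL)
  }
}

both_default <- assign_aesthetics(2, has.group = TRUE)
both_swapped <- assign_aesthetics(2, has.group = TRUE, color.by = "group")
models_only  <- assign_aesthetics(3, has.group = FALSE, color.by = "group")
```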

pattern.args

List; additional arguments passed to ggpattern geoms when patterns are used. Common options: pattern_density (0.1-0.5), pattern_spacing (0.01-0.05). Only used when both multiple models and group.by are present. Default is list().

geom.args

List; additional arguments passed to the main distribution geom (geom_violin, geom_density, geom_histogram, or geom_boxplot). Common options: alpha (transparency), bins (for histogram), bw (for density bandwidth). Default is list().
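
Argument lists like geom.args are typically forwarded with do.call(). The sketch below shows one way this could work, with user-supplied entries overriding defaults via utils::modifyList(). The build_layer() helper and its alpha default are hypothetical and are not taken from the package.

```r
library(ggplot2)

# Hypothetical helper: merge user arguments over defaults, then call the geom.
build_layer <- function(geom, geom.args = list(),
                        defaults = list(alpha = 0.6)) {
  do.call(geom, utils::modifyList(defaults, geom.args))
}

# User value for alpha overrides the default; bw is passed through unchanged.
density_layer <- build_layer(geom_density, list(bw = 0.3, alpha = 0.4))

# No alpha supplied, so the default applies; bins is passed through.
hist_layer <- build_layer(geom_histogram, list(bins = 20))
```

Options that only make sense for one geom, such as bins for histograms or bw for densities, should be supplied only with the matching type.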

raster.args

List; additional arguments passed to ggrastr::geom_point_rast() when raster = TRUE. Common options: raster.dpi (default 300), raster.width, raster.height. Default is list().

facet.args

List; additional arguments passed to facet_wrap() when split.by is used. Common options: ncol, nrow, scales ("fixed", "free", "free_x", "free_y"). Default is list().
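
As an illustration of how facet.args might reach facet_wrap(), the following sketch builds the facet specification with do.call(). It uses simulated data with a sex column and is an assumption about the mechanism rather than the package's actual code.

```r
library(ggplot2)

# Simulated scores with a phenotype column to facet on (illustrative data only).
set.seed(3)
scores <- data.frame(
  sex   = rep(c("F", "M"), each = 100),
  score = rnorm(200)
)

# User-supplied options: stack panels in one column, free the y axis per panel.
facet.args <- list(ncol = 1, scales = "free_y")

# Combine the faceting variable with the extra arguments.
facet_spec <- do.call(facet_wrap, c(list(facets = vars(sex)), facet.args))

p <- ggplot(scores, aes(x = score)) +
  geom_density() +
  facet_spec
```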

...

Additional arguments passed to data$fetch().

Details

With a single model, one distribution is shown. With multiple models, the distributions are overlaid and colored by model.

Value

A

Examples

## Not run: 
# Single model distribution
visualize$data$scores$distribution(data, models = 1)

# Multiple models overlaid
visualize$data$scores$distribution(data, models = c(1, 2, 3), type = "density")

# Distribution split by sex
visualize$data$scores$distribution(data, models = 1, split.by = sex)

# Single model grouped by diagnosis
visualize$data$scores$distribution(data, models = 1, group.by = diagnosis, type = "density")

# Multiple models grouped by sex (models colored, groups patterned)
visualize$data$scores$distribution(
  data,
  models = c(1, 2),
  group.by = sex,
  color.by = "model"
)

# Multiple models grouped by sex (groups colored, models patterned)
visualize$data$scores$distribution(
  data,
  models = c(1, 2),
  group.by = sex,
  color.by = "group"
)

# Multiple models, split by diagnosis
visualize$data$scores$distribution(
  data,
  models = c(AD_PRS, PD_PRS),
  split.by = diagnosis,
  type = "density"
)

# Show individual points with rasterization for large datasets
visualize$data$scores$distribution(
  data,
  models = 1:3,
  show.points = TRUE,
  raster = TRUE
)

## End(Not run)