visualize$data$scores$distribution

Visualize PRS score distributions

Description

Creates distribution plots for polygenic risk scores across samples, optionally faceted by phenotypic variables.

Usage

visualize.scores.distribution(
  data,
  models,
  scores.layer = X,
  obs = NULL,
  split.by = NULL,
  group.by = NULL,
  label = NULL,
  type = c("violin", "density", "histogram", "boxplot"),
  show.outliers = TRUE,
  show.points = FALSE,
  raster = FALSE,
  color.by = c("model", "group"),
  pattern.args = list(),
  geom.args = list(),
  raster.args = list(),
  facet.args = list(),
  ...
)

Arguments

`data`	A `PolyGeniusData` object containing the data to visualize.
`models`	Unquoted expression selecting which models to include. Can be indices (`c(1, 3, 5)`, for specific positions), names (`c(AD_PRS, PD_PRS)`, for specific models), or expressions (`starts_with("AD")`, for pattern matching).
`scores.layer`	Unquoted expression; name of the scores layer to access. Defaults to `X`. Use `data$layers` to see available layers.
`obs`	Optional unquoted expression selecting observations to keep. Resolved via `data$fetch()` on the observation side.
`split.by`	Unquoted expression selecting a faceting variable from phenotypes. Can be a simple variable or expression. Creates separate panels for each level.
`group.by`	Unquoted expression selecting a grouping variable from phenotypes. Creates multiple distributions per model within each panel. When combined with multiple models, use `color.by` to control which dimension uses color vs pattern.
`label`	Unquoted expression selecting labels for models from phenotypes or metadata. Passed to `data$fetch()` to replace default model names in the plot.
`type`	Character; type of distribution plot. One of `"violin"` (violin plots showing the full distribution; default), `"density"` (smoothed density curves), `"histogram"` (binned histograms), or `"boxplot"` (box-and-whisker plots).
`show.outliers`	Logical; if `TRUE` (default), outliers are shown in the boxplot. If `FALSE`, outliers are hidden. Only applies when `type = "boxplot"`.
`show.points`	Logical; if `TRUE`, shows individual data points. For density plots, shows as rug/ticks under the distribution. For violin/boxplot, shows as jittered points. Ignored for histogram. Default is `FALSE`.
`raster`	Logical; if `TRUE` and `show.points = TRUE`, uses rasterized points via `ggrastr::geom_point_rast()` for better performance with large datasets. Requires the `ggrastr` package to be installed (optional dependency). If not available, falls back to regular `geom_point()`. Default is `FALSE`.
`color.by`	Character; when both multiple models and `group.by` are present, controls which dimension uses color/fill. Options: `"model"` (default) or `"group"`. The other dimension will use pattern textures. Ignored if only one dimension is present.
`pattern.args`	List; additional arguments passed to ggpattern geoms when patterns are used. Common options: `pattern_density` (0.1-0.5), `pattern_spacing` (0.01-0.05). Only used when both models and groups are present. Default is `list()`.
`geom.args`	List; additional arguments passed to the main distribution geom (geom_violin, geom_density, geom_histogram, or geom_boxplot). Common options: `alpha` (transparency), `bins` (for histogram), `bw` (for density bandwidth). Default is `list()`.
`raster.args`	List; additional arguments passed to `ggrastr::geom_point_rast()` when `raster = TRUE`. Common options: `raster.dpi` (default 300), `raster.width`, `raster.height`. Default is `list()`.
`facet.args`	List; additional arguments passed to `facet_wrap()` when `split.by` is used. Common options: `ncol`, `nrow`, `scales` (`"fixed"`, `"free"`, `"free_x"`, `"free_y"`). Default is `list()`.
`...`	Additional arguments passed to `data$fetch()`.
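
The `raster` and `raster.args` behaviour described above (use `ggrastr::geom_point_rast()` when available, otherwise fall back to `geom_point()`) can be illustrated with plain ggplot2. This is only a sketch of the pattern, not the package's actual implementation: the helper `make_point_layer()`, the simulated data frame, and the jitter width are all hypothetical.

```r
library(ggplot2)

# Hypothetical helper (not part of the documented API): returns a point layer,
# rasterized when requested and when ggrastr is installed.
make_point_layer <- function(raster = FALSE, raster.args = list()) {
  jitter <- position_jitter(width = 0.1, height = 0)
  if (raster && requireNamespace("ggrastr", quietly = TRUE)) {
    # Forward user-supplied options (e.g. raster.dpi) to the rasterized geom.
    do.call(ggrastr::geom_point_rast, c(list(position = jitter), raster.args))
  } else {
    # Fallback: ordinary vector points.
    geom_point(position = jitter)
  }
}

# Simulated scores standing in for two PRS models (assumed data).
set.seed(42)
df <- data.frame(
  model = rep(c("A", "B"), each = 500),
  score = rnorm(1000)
)

p <- ggplot(df, aes(x = model, y = score)) +
  geom_violin() +
  make_point_layer(raster = TRUE, raster.args = list(raster.dpi = 150))
```

Because the check uses `requireNamespace()`, the same call produces a plot whether or not `ggrastr` is installed; only the point layer's rendering differs.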

Details

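With a single model, the function draws one distribution. With multiple models, the distributions are overlaid and colored by model. When multiple models are combined with `group.by`, `color.by` determines which dimension is mapped to color; the other dimension is drawn with pattern textures.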
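
As a rough picture of what overlaying by model means, the following plain-ggplot2 sketch draws two density curves on shared axes, colored by model. The data frame `scores`, its column names, and the simulated values are assumptions made for illustration; the function itself takes a `PolyGeniusData` object rather than a data frame.

```r
library(ggplot2)

# Simulated scores for two hypothetical models in long format (assumed data).
set.seed(1)
scores <- data.frame(
  model = rep(c("model_1", "model_2"), each = 200),
  score = c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 0.5, sd = 1.2))
)

# One density curve per model, overlaid and distinguished by color/fill.
p <- ggplot(scores, aes(x = score, fill = model, colour = model)) +
  geom_density(alpha = 0.4)
```

Semi-transparent fills (`alpha = 0.4` here) keep overlapping regions readable; in the documented function a comparable setting would be supplied through `geom.args`.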

Value

Examples

## Not run: 
# Single model distribution
visualize$data$scores$distribution(data, models = 1)

# Multiple models overlaid
visualize$data$scores$distribution(data, models = c(1, 2, 3), type = "density")

# Distribution split by sex
visualize$data$scores$distribution(data, models = 1, split.by = sex)

# Single model grouped by diagnosis
visualize$data$scores$distribution(data, models = 1, group.by = diagnosis, type = "density")

# Multiple models grouped by sex (models colored, groups patterned)
visualize$data$scores$distribution(
  data,
  models = c(1, 2),
  group.by = sex,
  color.by = "model"
)

# Multiple models grouped by sex (groups colored, models patterned)
visualize$data$scores$distribution(
  data,
  models = c(1, 2),
  group.by = sex,
  color.by = "group"
)

# Multiple models, split by diagnosis
visualize$data$scores$distribution(
  data,
  models = c(AD_PRS, PD_PRS),
  split.by = diagnosis,
  type = "density"
)

# Show individual points with rasterization for large datasets
visualize$data$scores$distribution(
  data,
  models = 1:3,
  show.points = TRUE,
  raster = TRUE
)

## End(Not run)