associate$regression

Regression-based associations for PolyGeniusData

Description

associate$regression() fits one or more association models between observation-side clinical or biological endpoints and predictors resolved from a PolyGeniusData object.

The function is intentionally declarative: users describe what to analyze, while the regression engine chooses and executes the appropriate family.

Summary mode returns a PolyGeniusAssociations object with:

  • a tidy inferential summary table

  • family-specific plot-support artifacts

  • diagnostics describing fit warnings and failures

Model mode returns raw fitted-model objects and minimal metadata for advanced inspection.

Supported regression families are summarized below:

Family   Meaning                                                Family documentation
“lm”     Continuous outcomes via Gaussian linear regression.    LinearRegression
“glm”    Binary outcomes via binomial logistic regression.      LogisticRegression
“cox”    Right-censored survival outcomes.                      CoxRegression
“crr”    Competing-risk outcomes via Fine-Gray regression.      CompetingRiskRegression
“km”     Grouped Kaplan-Meier / log-rank comparisons.           KaplanMeierRegression

Usage

`associate$regression`(
  data,
  outcomes = NULL,
  predictors = everything(),
  scores.layer = X,
  split.by = NULL,
  interactions = NULL,
  covariates = NULL,
  model = c("auto", "lm", "glm", "cox", "crr", "km"),
  time = NULL,
  event = NULL,
  competing = NULL,
  reference.level = NULL,
  weights = NULL,
  conf.level = 0.95,
  p.adjust.method = "BH",
  output = c("summary", "models"),
  ...
)

Arguments

data

A PolyGeniusData object.

outcomes

Optional unquoted expression specifying one or more clinical or biological endpoints. For Cox and Fine-Gray analyses this is the endpoint/event-of-interest indicator, not the time variable.

predictors

Unquoted expression specifying tested predictors. By default everything() expands to all PRS models from scores.layer.

scores.layer

Score layer used for predictors that match PRS model names.

split.by

Optional unquoted expression defining one or more strata. A separate fit is run per observed stratum.

interactions

Optional unquoted expression defining one or more interaction variables. Classical and survival families interpret interactions on their family-specific model surface.

covariates

Optional unquoted expression defining adjustment variables.

model

Regression family to use. “auto” infers the family from the inputs.

time

Optional unquoted expression defining follow-up time for survival-style analyses.

event

Optional unquoted expression defining the event indicator for survival-style analyses. When omitted, outcomes supplies the event indicator.

competing

Optional unquoted expression defining a competing-event indicator for Fine-Gray regression. This should be TRUE/1 for an event that prevents later observation of the endpoint of interest.

reference.level

Optional character scalar naming the reference level for categorical predictors. Cox, Fine-Gray, and Kaplan-Meier analyses use this to relevel the predictor before fitting or curve construction.

weights

Optional unquoted expression or numeric vector of regression weights.

conf.level

Confidence level for interval estimates.

p.adjust.method

P-value adjustment method passed to stats::p.adjust().

output

One of “summary” or “models”.

...

Additional arguments are not currently supported. Unknown arguments are rejected early, so a misspelled formal argument produces a direct error.

Details

With model = “auto”, the function chooses:

  • “crr” when time and competing are supplied

  • “cox” when time is supplied without competing

  • “glm” for factor outcomes

  • “lm” otherwise
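
The selection rules above can be sketched as a small decision function. This is only an illustration of the documented order of checks, not the package's implementation: choose_family() is a hypothetical helper, and it receives already-evaluated vectors rather than unquoted expressions. The list above names no rule that selects “km”, so the sketch never returns it.

```r
# Hypothetical helper mirroring the documented "auto" rules.
# It is NOT part of the package; it only restates the order of checks.
choose_family <- function(outcome, time = NULL, competing = NULL) {
  # time and competing both supplied -> Fine-Gray competing-risk regression
  if (!is.null(time) && !is.null(competing)) return("crr")
  # time supplied without competing -> Cox regression
  if (!is.null(time)) return("cox")
  # no time: factor outcomes -> logistic regression
  if (is.factor(outcome)) return("glm")
  # everything else -> Gaussian linear regression
  "lm"
}

fam_lm  <- choose_family(c(1.2, 3.4))
fam_glm <- choose_family(factor(c("case", "control")))
fam_cox <- choose_family(c(0, 1), time = c(70, 82))
fam_crr <- choose_family(c(0, 1), time = c(70, 82), competing = c(1, 0))
```

Under these rules a 0/1 numeric outcome without time falls through to “lm”, so a binary endpoint stored as a number would need to be converted to a factor, or fitted with model = “glm” set explicitly.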

For survival-style analyses, outcomes still names the endpoint being modeled. The endpoint is represented as an event indicator: TRUE/1 when the endpoint occurred and FALSE/0 otherwise. time gives the observed time or age at whichever comes first: the endpoint, a competing event, or censoring. competing gives a separate indicator for an event that prevents the endpoint from being observed later. For example, in an age-at-dementia-onset analysis with death before dementia as a competing risk, use:

Observation status                     outcomes = dementia   competing = death_without_dementia   time = age_observed
Dementia before death                  1                     0                                    age at dementia onset
Died non-demented                      0                     1                                    age at death
Alive and non-demented at last visit   0                     0                                    age at last dementia-free observation

This convention keeps outcomes aligned with linear and logistic models: it always identifies the endpoint of interest. Survival families add the observation time and optional competing-event indicator needed to model that endpoint correctly.
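
As a minimal sketch of this encoding, the three columns can be derived from raw ages in base R. The raw column names (dementia_age, death_age, last_visit_age) are assumptions made for illustration, with NA meaning the event was never observed. Nothing here calls the package.

```r
# Hypothetical raw cohort data; NA means the event was never observed.
raw <- data.frame(
  id             = c("A", "B", "C"),
  dementia_age   = c(78, NA, NA),  # A developed dementia at 78
  death_age      = c(85, 81, NA),  # A died later; B died non-demented
  last_visit_age = c(84, 80, 76)   # C is alive and dementia-free at 76
)

has_dementia <- !is.na(raw$dementia_age)
# A death counts as competing only when no dementia was observed first.
died_first <- !has_dementia & !is.na(raw$death_age)

encoded <- data.frame(
  id = raw$id,
  dementia = as.integer(has_dementia),
  death_without_dementia = as.integer(died_first),
  # Age at dementia onset, else age at death, else last dementia-free visit.
  age_observed = ifelse(
    has_dementia, raw$dementia_age,
    ifelse(died_first, raw$death_age, raw$last_visit_age)
  )
)
```

Each row carries at most one of the two indicators, which matches the three observation statuses in the table above.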

Predictors can be PRSs or any other observation-side variables resolvable by data$fetch(). When a requested predictor matches a model name in data$mod_names, it is read from the requested PRS score layer; otherwise it is resolved through data$fetch().
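
The lookup order described above can be illustrated with a mock object. Everything in this sketch is hypothetical: the list mock stands in for a PolyGeniusData object, its layers and obs fields are invented for the example, and plain column access stands in for data$fetch().

```r
# Mock stand-in for a PolyGeniusData object; all field names are hypothetical.
mock <- list(
  mod_names = c("PRS_AD", "PRS_PD"),
  layers = list(
    X = data.frame(PRS_AD = c(0.1, -0.4), PRS_PD = c(1.2, 0.3))
  ),
  obs = data.frame(age = c(71, 66), sex = c("F", "M"))
)

# Hypothetical resolver: PRS model names are read from the score layer,
# anything else goes through the generic lookup.
resolve_predictor <- function(obj, name, scores.layer = "X") {
  if (name %in% obj$mod_names) {
    obj$layers[[scores.layer]][[name]]
  } else {
    obj$obs[[name]]  # stands in for data$fetch()
  }
}

prs_values <- resolve_predictor(mock, "PRS_AD")  # taken from layer X
age_values <- resolve_predictor(mock, "age")     # taken from the fallback
```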

Survival support currently covers standard right-censored workflows. The following are not supported:

  • multi-state outcomes

  • counting-process / start-stop inputs

  • interval-censored outcomes

  • artifact merging during meta-analysis

Regression summary rows use a stable schema so downstream filtering, plotting, and meta-analysis can operate on the returned summary table. Core columns are:

Column                             Meaning
family                             Regression family used for the fit.
outcome                            Resolved outcome variable name.
predictor                          Resolved tested predictor name.
term                               Model term represented by the row.
term.type                          Row type: main, interaction, or omnibus.
stratum                            Observed analysis stratum, or “all”.
estimate, se, lower, upper         Estimate, standard error, and confidence interval on the family-specific effect scale.
statistic, p.value, adj.p.value    Inferential statistic and raw/adjusted p-values.
n                                  Effective sample size used for the fit.
formula                            Model formula or family-specific symbolic specification used for the fit.
fit.id                             Stable fit identifier shared with artifacts and diagnostics.

Additional columns are added where meaningful, including interaction, effect.scale, test, n.cases, n.controls, n.events, and n.competing.

term.type is especially important for downstream use:

  • main identifies a main-effect coefficient row

  • interaction identifies an interaction coefficient row

  • omnibus identifies a grouped test, such as the Kaplan-Meier log-rank row
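
A short sketch of how term.type might be used downstream, on a hand-built data.frame that borrows the documented column names. The rows and values are invented for illustration. How the package groups p-values for adjustment is not stated here, so re-adjusting within the filtered subset is a choice made by this sketch rather than documented behavior.

```r
# Hand-built table using the documented column names; values are invented.
summary_tbl <- data.frame(
  family    = c("glm", "glm", "glm", "km"),
  outcome   = "dementia",
  predictor = c("PRS_AD", "PRS_AD", "PRS_PD", "PRS_AD_tertile"),
  term      = c("PRS_AD", "PRS_AD:sex", "PRS_PD", "PRS_AD_tertile"),
  term.type = c("main", "interaction", "main", "omnibus"),
  p.value   = c(0.01, 0.04, 0.03, 0.20),
  stringsAsFactors = FALSE
)

# Keep only main-effect coefficient rows.
main_rows <- summary_tbl[summary_tbl$term.type == "main", ]

# Re-adjust within the subset using Benjamini-Hochberg.
main_rows$adj.p.value <- stats::p.adjust(main_rows$p.value, method = "BH")
```

Filtering first keeps interaction and omnibus rows, which test different hypotheses, out of the multiple-testing family for the main effects.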

Artifacts attached through slotArtifacts() hold the multi-row structures that do not belong in the inferential summary table, such as prediction grids, survival curves, risk tables, and profile tables. For survival families, the stored curves stay faithful to the analysis that produced them. Kaplan-Meier contributes the observed grouped curves requested by the grouped predictor. Cox and competing-risk fits contribute adjusted curves from prediction profiles built from the same fitted model.

Value

  • “summary” returns a PolyGeniusAssociations object.

  • “models” returns a data.frame with one row per attempted fit and list-columns for fitted objects and diagnostics.

See Also

associate, PolyGeniusData, PolyGeniusAssociations

Examples

## Not run: 
# Direct binary endpoint association.
assoc <- associate$regression(
  data,
  outcomes = dementia,
  predictors = PRS_AD,
  covariates = c(age, sex, PC1, PC2)
)

# Age-at-dementia-onset with death before dementia as a competing risk.
# dementia is 1 only for dementia onset; death_without_dementia is 1 only
# for people who died non-demented; age_observed is age at dementia onset,
# death, or last dementia-free observation.
ad_onset <- associate$regression(
  data,
  outcomes = dementia,
  predictors = PRS_AD,
  time = age_observed,
  competing = death_without_dementia,
  covariates = c(sex, PC1, PC2),
  model = "crr"
)
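
# Illustrative sketch, not one of the package's own examples: the same
# endpoint as a right-censored Cox analysis fitted separately per stratum.
# It assumes sex is a variable that split.by can resolve. Without a
# competing indicator, people who died non-demented are treated as censored
# at age_observed. output = "models" returns the per-fit data.frame
# described under Value instead of a PolyGeniusAssociations object.
cox_by_sex <- associate$regression(
  data,
  outcomes = dementia,
  predictors = PRS_AD,
  time = age_observed,
  split.by = sex,
  covariates = c(PC1, PC2),
  model = "cox",
  output = "models"
)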

## End(Not run)