associate$regression
Regression-based associations for PolyGeniusData
Description
associate$regression() fits one or more association models between observation-side clinical or biological endpoints and predictors resolved from a PolyGeniusData object.
The function is intentionally declarative: users describe what to analyze, while the regression engine chooses and executes the appropriate family.
Summary mode returns a PolyGeniusAssociations object with:
- a tidy inferential summary table
- family-specific plot-support artifacts
- diagnostics describing fit warnings and failures
Model mode returns raw fitted-model objects and minimal metadata for advanced inspection.
Supported regression families are summarized below:
| Family | Meaning | Family documentation |
| "lm" | Continuous outcomes via Gaussian linear regression. | LinearRegression |
| "glm" | Binary outcomes via binomial logistic regression. | LogisticRegression |
| "cox" | Right-censored survival outcomes. | CoxRegression |
| "crr" | Competing-risk outcomes via Fine-Gray regression. | CompetingRiskRegression |
| "km" | Grouped Kaplan-Meier / log-rank comparisons. | KaplanMeierRegression |
Usage
`associate$regression`(
  data,
  outcomes = NULL,
  predictors = everything(),
  scores.layer = X,
  split.by = NULL,
  interactions = NULL,
  covariates = NULL,
  model = c("auto", "lm", "glm", "cox", "crr", "km"),
  time = NULL,
  event = NULL,
  competing = NULL,
  reference.level = NULL,
  weights = NULL,
  conf.level = 0.95,
  p.adjust.method = "BH",
  output = c("summary", "models"),
  ...
)
Arguments
| data | A PolyGeniusData object. |
| outcomes | Optional unquoted expression specifying one or more clinical or biological endpoints. For Cox and Fine-Gray analyses this is the endpoint/event-of-interest indicator, not the time variable. |
| predictors | Unquoted expression specifying tested predictors. Defaults to everything(). |
| scores.layer | Score layer used for predictors that match PRS model names. |
| split.by | Optional unquoted expression defining one or more strata. A separate fit is run per observed stratum. |
| interactions | Optional unquoted expression defining one or more interaction variables. Classical and survival families interpret interactions on their family-specific model surface. |
| covariates | Optional unquoted expression defining adjustment variables. |
| model | Regression family to use. |
| time | Optional unquoted expression defining follow-up time for survival-style analyses. |
| event | Optional unquoted expression defining the event indicator for survival-style analyses. When omitted, |
| competing | Optional unquoted expression defining a competing-event indicator for Fine-Gray regression. This should be |
| reference.level | Optional character scalar naming the reference level for categorical predictors. Cox, Fine-Gray, and Kaplan-Meier analyses use this to relevel the predictor before fitting or curve construction. |
| weights | Optional unquoted expression or numeric vector of regression weights. |
| conf.level | Confidence level for interval estimates. |
| p.adjust.method | P-value adjustment method passed to |
| output | One of "summary" or "models". |
| ... | Additional arguments are currently not supported. Unknown arguments are rejected early so that misspelled formal arguments produce a direct error. |
Details
With model = "auto", the function chooses:
- "crr" when time and competing are supplied
- "cox" when time is supplied without competing
- "glm" for factor outcomes
- "lm" otherwise
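The selection rules above can be sketched in a few lines of base R. This is only an illustration of the documented decision order: choose_family() is a hypothetical helper, not a function exported by the package, and the real implementation may check more conditions than shown here.

```r
# Hypothetical helper mirroring the documented "auto" rules.
# It is NOT the package's implementation; names are illustrative only.
choose_family <- function(outcome, time = NULL, competing = NULL) {
  # Both a follow-up time and a competing-event indicator: Fine-Gray.
  if (!is.null(time) && !is.null(competing)) {
    return("crr")
  }
  # Follow-up time without a competing event: Cox.
  if (!is.null(time)) {
    return("cox")
  }
  # No time variable: factor outcomes go to logistic regression.
  if (is.factor(outcome)) {
    return("glm")
  }
  # Everything else is treated as a continuous outcome.
  "lm"
}

fam_glm <- choose_family(factor(c("case", "control")))
fam_lm  <- choose_family(c(1.2, 3.4))
fam_cox <- choose_family(c(0, 1), time = c(70, 82))
fam_crr <- choose_family(c(0, 1), time = c(70, 82), competing = c(1, 0))
```

Note that "km" never appears in this sketch: by the rules listed above it is not chosen automatically, so a Kaplan-Meier comparison has to be requested with model = "km".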
For survival-style analyses, outcomes still names the endpoint being modeled. The endpoint is represented as an event indicator: TRUE/1 when the endpoint occurred and FALSE/0 otherwise. time gives the observed time or age at endpoint, competing event, or censoring. competing gives a separate indicator for an event that prevents the endpoint from being observed later. For example, in an age-at-dementia-onset analysis with death before dementia as a competing risk, use:
| Observation status | outcomes = dementia | competing = death_without_dementia | time = age_observed |
| Dementia before death | 1 | 0 | age at dementia onset |
| Died non-demented | 0 | 1 | age at death |
| Alive and non-demented at last visit | 0 | 0 | age at last dementia-free observation |
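As a rough sketch of how the three columns in this table could be derived, the base R snippet below starts from hypothetical raw follow-up ages. The input column names (age_dementia, age_death, age_last_dementia_free) and the values are invented for illustration; only dementia, death_without_dementia, and age_observed correspond to the names used in this documentation.

```r
# Hypothetical raw follow-up data; column names and values are made up.
raw <- data.frame(
  id                     = c("A", "B", "C"),
  age_dementia           = c(78, NA, NA),  # NA if dementia was never observed
  age_death              = c(85, 81, NA),  # NA if still alive
  age_last_dementia_free = c(77, 80, 76)   # last visit without dementia
)

has_dementia <- !is.na(raw$age_dementia)
died         <- !is.na(raw$age_death)

coded <- data.frame(
  id = raw$id,
  # Endpoint of interest: 1 only when dementia onset was observed.
  dementia = as.integer(has_dementia),
  # Competing event: 1 only for people who died without dementia.
  death_without_dementia = as.integer(!has_dementia & died),
  # Observation time: onset age, else death age, else last dementia-free age.
  age_observed = ifelse(
    has_dementia, raw$age_dementia,
    ifelse(died, raw$age_death, raw$age_last_dementia_free)
  )
)
```

Person A corresponds to the first table row, B to the second, and C to the third. The two indicators are never both 1 for the same person.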
This convention keeps outcomes aligned with linear and logistic models: it always identifies the endpoint of interest. Survival families add the observation time and optional competing-event indicator needed to model that endpoint correctly.
Predictors can be PRSs or any other observation-side variables resolvable by data$fetch(). When a requested predictor matches a model name in data$mod_names, it is read from the requested PRS score layer; otherwise it is resolved through data$fetch().
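The lookup order described here can be illustrated with a small mock object. Everything below is an assumption made for the sketch: mock_data is a plain list standing in for a PolyGeniusData object, its scores field and resolve_predictor() are hypothetical, and only the mod_names and fetch names are taken from the text above.

```r
# Mock stand-in for a PolyGeniusData object (illustrative structure only).
mock_data <- list(
  mod_names = c("PRS_AD", "PRS_T2D"),
  # Hypothetical storage for score layers, keyed by layer name.
  scores = list(
    X = data.frame(PRS_AD = c(0.1, -0.4), PRS_T2D = c(1.2, 0.3))
  ),
  # Stand-in for data$fetch(): returns an observation-side variable by name.
  fetch = function(name) {
    obs <- data.frame(age = c(71, 66), sex = c("F", "M"))
    obs[[name]]
  }
)

# Hypothetical resolver: PRS model names come from the score layer,
# anything else falls back to fetch().
resolve_predictor <- function(data, name, scores.layer = "X") {
  if (name %in% data$mod_names) {
    data$scores[[scores.layer]][[name]]
  } else {
    data$fetch(name)
  }
}

prs_values <- resolve_predictor(mock_data, "PRS_AD")
age_values <- resolve_predictor(mock_data, "age")
```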
Survival support currently covers standard right-censored workflows. The following are not supported:
- multi-state outcomes
- counting-process / start-stop inputs
- interval-censored outcomes
- artifact merging during meta-analysis
Regression summary rows use a stable schema so downstream filtering, plotting, and meta-analysis can operate on the returned summary table. Core columns are:
| Column | Meaning |
| family | Regression family used for the fit. |
| outcome | Resolved outcome variable name. |
| predictor | Resolved tested predictor name. |
| term | Model term represented by the row. |
| term.type | Row type: main, interaction, or omnibus. |
| stratum | Observed analysis stratum, or "all". |
| estimate, se, lower, upper | Estimate, standard error, and confidence interval on the family-specific effect scale. |
| statistic, p.value, adj.p.value | Inferential statistic and raw/adjusted p-values. |
| n | Effective sample size used for the fit. |
| formula | Model formula or family-specific symbolic specification used for the fit. |
| fit.id | Stable fit identifier shared with artifacts and diagnostics. |
Additional columns are added where meaningful, including interaction, effect.scale, test, n.cases, n.controls, n.events, and n.competing.
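To give a feel for how p.value and adj.p.value relate under the default p.adjust.method = "BH", the sketch below applies base R's stats::p.adjust() to made-up p-values. It assumes the adjustment behaves like a plain Benjamini-Hochberg correction over a set of rows; which rows the package actually adjusts together is not stated here, so treat the grouping as an assumption.

```r
# Made-up raw p-values standing in for four summary rows.
p_raw <- c(0.01, 0.04, 0.03, 0.20)

# Benjamini-Hochberg adjustment with base R (stats::p.adjust).
p_adj <- p.adjust(p_raw, method = "BH")

# Side-by-side view, using the documented column names.
pvals <- data.frame(p.value = p_raw, adj.p.value = p_adj)
```

Adjusted values are never smaller than the raw ones, so filtering on adj.p.value is the more conservative choice.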
term.type is especially important for downstream use:
- main identifies a main-effect coefficient row
- interaction identifies an interaction coefficient row
- omnibus identifies a grouped test, such as the Kaplan-Meier log-rank row
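A typical downstream step is to keep only one row type before plotting or meta-analysis. The sketch below does this on a hand-built data frame that borrows a few of the documented column names. The rows and numbers are invented, and how the summary table is extracted from a PolyGeniusAssociations object is not shown because it is not described here.

```r
# Invented rows using documented column names; values are illustrative only.
mock_summary <- data.frame(
  family    = c("cox", "cox", "km"),
  predictor = c("PRS_AD", "PRS_AD", "PRS_AD_tertile"),
  term      = c("PRS_AD", "PRS_AD:sex", "PRS_AD_tertile"),
  term.type = c("main", "interaction", "omnibus"),
  estimate  = c(1.35, 1.08, NA),
  p.value   = c(0.001, 0.21, 0.004),
  stringsAsFactors = FALSE
)

# Main-effect coefficients only, e.g. for a forest plot.
main_rows <- mock_summary[mock_summary$term.type == "main", ]

# Grouped tests only, e.g. the Kaplan-Meier log-rank row.
omnibus_rows <- mock_summary[mock_summary$term.type == "omnibus", ]
```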
Artifacts attached through slotArtifacts() hold the multi-row structures that do not belong in the inferential summary table, such as prediction grids, survival curves, risk tables, and profile tables. For survival families, the stored curves are analysis-loyal: Kaplan-Meier contributes the observed grouped curves requested by the grouped predictor, while Cox and competing-risk fits contribute adjusted curves from prediction profiles built from the same fitted model.
Value
- "summary" returns a PolyGeniusAssociations object.
- "models" returns a data.frame with one row per attempted fit and list-columns for fitted objects and diagnostics.
See Also
associate, PolyGeniusData, PolyGeniusAssociations
Examples
## Not run:
# Direct binary endpoint association.
assoc <- associate$regression(
  data,
  outcomes = dementia,
  predictors = PRS_AD,
  covariates = c(age, sex, PC1, PC2)
)
# Age-at-dementia-onset with death before dementia as a competing risk.
# dementia is 1 only for dementia onset; death_without_dementia is 1 only
# for people who died non-demented; age_observed is age at dementia onset,
# death, or last dementia-free observation.
ad_onset <- associate$regression(
  data,
  outcomes = dementia,
  predictors = PRS_AD,
  time = age_observed,
  competing = death_without_dementia,
  covariates = c(sex, PC1, PC2),
  model = "crr"
)
## End(Not run)