Direct Regression Associations

Use associate$regression() when the question is:

Is X associated with Y, possibly after adjusting for Z?

where X is the focal predictor, Y is the outcome, and Z is the set of covariates. X defaults to the computed PRS values, but it can be any variable present in the PolyGeniusData object. In PolyGenius, outcomes refers to the clinical or biological endpoint being modeled. For linear and logistic models, that endpoint is the response itself. For survival models, the endpoint is represented by an event indicator, while time supplies the observed time or age.

Basic Pattern

assoc <- associate$regression(
  data,
  outcomes = demented,
  predictors = PRS,
  covariates = c(age, sex, PCA)
)

This fits an association model between the resolved predictor and outcome. The covariates are included as adjustment variables, not as focal tested terms. To test the predictor against multiple outcomes, supply them as unquoted expressions:

assoc <- associate$regression(
  data,
  outcomes = c(demented, log(amyloid.load)),
  predictors = PRS,
  covariates = c(age, sex, PCA)
)

When predictors is omitted, the default is everything(), which expands to all PRS models in the selected score layer:

assoc <- associate$regression(
  data,
  outcomes = demented,
  covariates = c(age, sex, PCA)
)

Use explicit predictors when the scientific question targets a particular score or observation-side variable:

assoc <- associate$regression(
  data,
  outcomes = c(demented, braaksc, ceradsc),
  predictors = c(PRS.AD, PRS.tau),
  covariates = c(age.death, sex, PCA)
)

The associate$regression function fits one model for every outcome-predictor pair and returns a single PolyGeniusAssociations object containing the results across all associations (see Returned Object below). It also corrects for multiple hypothesis testing, with tests grouped by outcome variable.
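The pairwise expansion and per-outcome correction described above can be sketched in base R. This is only an illustration under stated assumptions, not PolyGenius's implementation: the data are simulated, the column names are hypothetical, and the Benjamini-Hochberg method is assumed because the adjustment method is not specified here.

```r
# Simulated stand-in data; all column names are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(
  PRS.AD  = rnorm(n),
  PRS.tau = rnorm(n),
  age     = rnorm(n, 75, 8),
  sex     = factor(sample(c("F", "M"), n, replace = TRUE))
)
dat$amyloid  <- 0.3 * dat$PRS.AD + 0.02 * dat$age + rnorm(n)
dat$demented <- rbinom(n, 1, plogis(-0.5 + 0.5 * dat$PRS.AD))

outcomes   <- c("amyloid", "demented")
predictors <- c("PRS.AD", "PRS.tau")
covariates <- c("age", "sex")

# One model per outcome x predictor pair.
pairs <- expand.grid(outcome = outcomes, predictor = predictors,
                     stringsAsFactors = FALSE)

fit_pair <- function(outcome, predictor) {
  f <- reformulate(c(predictor, covariates), response = outcome)
  # Treat 0/1 outcomes as binary (logistic), everything else as continuous.
  binary <- all(dat[[outcome]] %in% c(0, 1))
  fit <- if (binary) glm(f, data = dat, family = binomial()) else lm(f, data = dat)
  co <- coef(summary(fit))[predictor, ]
  data.frame(outcome = outcome, predictor = predictor,
             family = if (binary) "glm" else "lm",
             estimate = co[["Estimate"]], se = co[["Std. Error"]],
             p.value = co[[4]])
}

res <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  fit_pair(pairs$outcome[i], pairs$predictor[i])
}))

# Adjust p-values within each outcome (BH is an assumption of this sketch).
res$adj.p.value <- ave(res$p.value, res$outcome,
                       FUN = function(p) p.adjust(p, method = "BH"))
res
```

Grouping the adjustment by outcome means each outcome's family of tests is corrected on its own, so adding another outcome does not change the adjusted p-values of the existing ones.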

Model Family Selection

With model = "auto", the function chooses a family from the input:

Outcome/input	Family	Native estimate
continuous outcome	`lm`	beta
binary/logical/factor outcome	`glm`	log-odds
`outcomes` plus `time`	`cox`	log-hazard
`outcomes` plus `time` plus `competing`	`crr`	log-subdistribution hazard
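
The table can be read as a small decision rule. The helper below is a hypothetical sketch of that rule in base R; choose_family is not a PolyGenius function, and the real resolution logic may inspect the inputs differently (for example, how it treats factors with more than two levels).

```r
# Hypothetical helper mirroring the table above; not part of PolyGenius.
choose_family <- function(outcome, time = NULL, competing = NULL) {
  # Survival inputs take precedence over the outcome type.
  if (!is.null(time) && !is.null(competing)) return("crr")
  if (!is.null(time)) return("cox")
  # Binary, logical, or factor outcomes map to logistic regression.
  is_discrete <- is.logical(outcome) || is.factor(outcome) ||
    length(unique(stats::na.omit(outcome))) == 2
  if (is_discrete) "glm" else "lm"
}

choose_family(c(21.4, 30.2, 27.8))                          # "lm"
choose_family(c(TRUE, FALSE, TRUE))                         # "glm"
choose_family(factor(c("case", "control")))                 # "glm"
choose_family(c(1, 0, 1), time = c(5, 8, 2))                # "cox"
choose_family(c(1, 0, 2), time = c(5, 8, 2), competing = 2) # "crr"
```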

You can set the family explicitly when needed:

associate$regression(data, outcomes = bmi, predictors = PRS_BMI, model = "lm")
associate$regression(data, outcomes = case, predictors = PRS_CAD, model = "glm")

Survival families are covered in Survival associations.

Returned Object

The returned object is a PolyGeniusAssociations table with one or more inferential rows. A simple single-predictor association usually produces one main-effect row. The exact schema depends on the selected family.

lm: Continuous Outcome

Column	Meaning
`family`	`"lm"`.
`outcome`, `predictor`	Resolved outcome and tested predictor.
`term`, `term.type`	Coefficient row and row type, usually `main` or `interaction`.
`estimate`	Linear-regression beta on the identity scale.
`se`, `lower`, `upper`	Standard error and confidence interval for beta.
`statistic`, `p.value`, `adj.p.value`	t statistic, raw p-value, and multiple-testing-adjusted p-value.
`n`	Complete-case sample size used by the fit.
`effect.scale`	`"identity"`.
`formula`, `fit.id`	Resolved model formula and artifact/diagnostic key.

Added Artifacts

Artifact	Meaning
`prediction.grid`	Fitted outcome values across a predictor grid with covariates held at reference values.
`profile.table`	Analysis-ready observations with predictor, outcome, fitted value, and residual.
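
To make the lm columns concrete, the sketch below derives the same quantities from a plain stats::lm fit. It assumes simulated data, hypothetical variable names, and a 95% confidence level; it shows where each number comes from in base R and is not a reproduction of the package's internals.

```r
# Simulated data with hypothetical names; five outcomes set to missing.
set.seed(42)
n <- 150
dat <- data.frame(PRS_BMI = rnorm(n), age = rnorm(n, 50, 10))
dat$bmi <- 25 + 0.8 * dat$PRS_BMI + 0.05 * dat$age + rnorm(n, sd = 2)
dat$bmi[1:5] <- NA

fit <- lm(bmi ~ PRS_BMI + age, data = dat)
co  <- coef(summary(fit))["PRS_BMI", ]
ci  <- confint(fit, "PRS_BMI", level = 0.95)  # 95% level is an assumption

row <- data.frame(
  family = "lm", outcome = "bmi", predictor = "PRS_BMI",
  term = "PRS_BMI", term.type = "main",
  estimate = co[["Estimate"]],       # beta on the identity scale
  se = co[["Std. Error"]],
  lower = ci[1, 1], upper = ci[1, 2],
  statistic = co[["t value"]],       # estimate / se
  p.value = co[["Pr(>|t|)"]],
  n = nobs(fit),                     # complete cases only
  effect.scale = "identity"
)
row
```

Because lm drops incomplete rows by default, n here is 145 rather than 150, which is the complete-case behaviour the n column describes.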

glm: Discrete Outcome

Column	Meaning
`family`	`"glm"`.
`outcome`, `predictor`	Resolved binary outcome and tested predictor.
`term`, `term.type`	Coefficient row and row type, usually `main` or `interaction`.
`estimate`	Logistic-regression coefficient on the log-odds scale.
`se`, `lower`, `upper`	Standard error and confidence interval on the log-odds scale.
`statistic`, `p.value`, `adj.p.value`	Wald z statistic, raw p-value, and multiple-testing-adjusted p-value.
`n`, `n.cases`, `n.controls`	Complete-case sample size and binary outcome counts.
`effect.scale`	`"log.odds"`; forest plots display odds ratios.
`formula`, `fit.id`	Resolved model formula and artifact/diagnostic key.

Added Artifacts

Artifact	Meaning
`prediction.grid`	Predicted probabilities across a predictor grid with covariates held at reference values.
`profile.table`	Analysis-ready observations with predictor, binary outcome, and fitted probability.
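
The idea behind prediction.grid for a logistic fit can be sketched with stats::glm and predict. The example assumes simulated data, hypothetical variable names, and a particular choice of reference values (mean age, first factor level); the reference values PolyGenius uses are not stated here.

```r
# Simulated case/control data; names are hypothetical.
set.seed(7)
n <- 300
dat <- data.frame(
  PRS_CAD = rnorm(n),
  age     = rnorm(n, 60, 9),
  sex     = factor(sample(c("F", "M"), n, replace = TRUE))
)
dat$case <- rbinom(n, 1, plogis(-1 + 0.6 * dat$PRS_CAD + 0.03 * (dat$age - 60)))

fit <- glm(case ~ PRS_CAD + age + sex, data = dat, family = binomial())

# Sample-size columns: complete cases and binary outcome counts.
counts <- c(n = nobs(fit),
            n.cases = sum(fit$y == 1),
            n.controls = sum(fit$y == 0))

# Vary the predictor; hold covariates at assumed reference values.
grid <- data.frame(
  PRS_CAD = seq(-2, 2, by = 0.5),
  age     = mean(dat$age),
  sex     = factor("F", levels = levels(dat$sex))
)
grid$probability <- predict(fit, newdata = grid, type = "response")
grid
```

Using type = "response" returns probabilities rather than log-odds, so the grid can be plotted directly as a risk curve across the score.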

Multiple Outcomes and Predictors

The function expands over outcomes and predictors:

assoc <- associate$regression(
  data,
  outcomes = c(demented, cognition, braaksc),
  predictors = c(PRS_AD, PRS_resilience),
  covariates = c(age, sex, PCA)
)

This returns one result object containing all requested fits. Multiple-testing adjustment is applied within result groups (tests that share an outcome), so adj.p.value can be used for screening.

Interactions In Classical Models

For lm and glm, interaction terms can be included directly:

assoc <- associate$regression(
  data,
  outcomes = demented,
  predictors = PRS,
  covariates = c(age, PCA),
  interactions = sex
)

This corresponds conceptually to:

demented ~ PRS * sex + age + PCA

Rows with term.type = "interaction" represent interaction coefficients. Use these rows when the question is whether a predictor effect differs by another variable in a linear or logistic model.
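
As a rough base R analogue of the conceptual formula, the sketch below fits a logistic model with a PRS-by-sex product term and labels the resulting coefficient rows. The data are simulated, the PCA covariates are left out for brevity, and the term.type labelling rule is an assumption made for illustration.

```r
# Simulated data in which the PRS effect is stronger in one sex.
set.seed(11)
n <- 400
dat <- data.frame(
  PRS = rnorm(n),
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  age = rnorm(n, 78, 7)
)
lp <- -0.4 + 0.3 * dat$PRS + 0.5 * dat$PRS * (dat$sex == "M") +
  0.02 * (dat$age - 78)
dat$demented <- rbinom(n, 1, plogis(lp))

# PRS * sex expands to PRS + sex + PRS:sex.
fit <- glm(demented ~ PRS * sex + age, data = dat, family = binomial())
co  <- coef(summary(fit))

terms <- data.frame(
  term     = rownames(co),
  estimate = co[, "Estimate"],
  se       = co[, "Std. Error"],
  p.value  = co[, "Pr(>|z|)"],
  row.names = NULL
)
# Product terms contain ":" in R's coefficient names.
terms$term.type <- ifelse(grepl(":", terms$term, fixed = TRUE),
                          "interaction", "main")
# Keep the focal predictor and its interaction row.
terms <- terms[terms$term %in% c("PRS", "PRS:sexM"), ]
terms
```

In this parameterisation the PRS row is the log-odds effect in the reference level of sex, and the PRS:sexM row is the difference in that effect for the other level.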

For a fuller question-driven treatment of subgroup reporting and formal heterogeneity tests, see Stratification and heterogeneity and Association comparisons.

Plotting

Forest plots are the default view for coefficient-like association rows:

visualize$associations$forest(assoc)

Use heatmaps when the object contains many outcomes, predictors, or cohorts:

visualize$associations$heatmap(assoc)

The plot functions read the summary schema directly. They use columns such as family, effect.scale, estimate, lower, upper, p.value, and adj.p.value to choose labels and transformations.
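
A schema-driven transformation of this kind might look like the following sketch. The to_display helper, its output column names, and the numeric values are all made up for illustration; only the input column names and the effect.scale values come from the schema tables above.

```r
# Toy summary rows; the numbers are illustrative, not real results.
rows <- data.frame(
  family       = c("lm", "glm"),
  effect.scale = c("identity", "log.odds"),
  estimate     = c(0.80, 0.405),
  lower        = c(0.50, 0.10),
  upper        = c(1.10, 0.71),
  stringsAsFactors = FALSE
)

# Hypothetical helper: exponentiate log-odds rows, leave identity rows alone.
to_display <- function(rows) {
  is_log <- rows$effect.scale == "log.odds"
  for (col in c("estimate", "lower", "upper")) {
    rows[[paste0("display.", col)]] <-
      ifelse(is_log, exp(rows[[col]]), rows[[col]])
  }
  rows$display.label <- ifelse(is_log, "Odds ratio", "Beta")
  # Reference line for "no effect": 1 for ratios, 0 for betas.
  rows$null.value <- ifelse(is_log, 1, 0)
  rows
}

out <- to_display(rows)
out[, c("family", "display.label", "display.estimate", "null.value")]
```

Exponentiating the interval endpoints keeps the confidence interval valid on the odds-ratio scale, which is why the transformation is applied to estimate, lower, and upper together.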

When This Is Not The Right Question

Do not use a plain direct association when the question is really:

Question	Better workflow
What is the association separately within groups?	`associate$regression(split.by = ...)`
Is the association different between groups?	`associate$compare(type = "heterogeneity")`
Do groups differ in survival curves?	`model = "km"` or `associate$compare(type = "group")`
Does the score improve prediction?	`evaluate$incremental()` or `evaluate$compare()`
Is the effect mediated through another variable?	`associate$mediation()`