Survival Associations

Survival questions include both association with event timing and group comparisons of survival curves.

Use this page when the endpoint has an event indicator and a follow-up time. The core PolyGenius convention is that the outcomes argument names the biological or clinical endpoint being modeled; for survival models, that endpoint is coded as an event indicator. The time argument gives the observed age or follow-up time at the endpoint, competing event, or censoring, whichever comes first. The competing argument is used only when a second event can prevent the endpoint from being observed later.

Cox Regression

Use Cox regression when the question is:

Is X associated with the hazard of an event?

cox.assoc <- associate$regression(
  data,
  outcomes = demented,
  time = age.observed,
  predictors = PRS,
  covariates = c(sex, PCA)
)

This corresponds to:

Surv(age.observed, demented) ~ PRS + sex + PCA

The summary table stores estimates on the log-hazard scale. Forest plots display the corresponding hazard-ratio interpretation.
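To illustrate the two scales, the sketch below fits the same formula directly with survival::coxph() on simulated data and exponentiates the coefficients to get hazard ratios. The simulated data frame, the single PC1 column standing in for PCA, and the use of coxph() as a stand-in are all assumptions made for illustration; PolyGenius may fit or post-process the model differently.

```r
library(survival)

set.seed(1)
n <- 500
sim <- data.frame(
  PRS = rnorm(n),
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  PC1 = rnorm(n)                      # hypothetical stand-in for the PCA covariates
)

# Simulated ages: a higher PRS raises the hazard (true log-hazard effect = 0.5)
onset  <- 60 + rexp(n, rate = 0.05 * exp(0.5 * sim$PRS))
censor <- 60 + runif(n, 5, 40)
sim$age.observed <- pmin(onset, censor)
sim$demented     <- as.integer(onset <= censor)

fit <- coxph(Surv(age.observed, demented) ~ PRS + sex + PC1, data = sim)

log.hazard   <- coef(fit)             # scale described for the summary table
hazard.ratio <- exp(log.hazard)       # scale described for forest plots
hr.ci        <- exp(confint(fit))     # Wald interval, exponentiated

round(cbind(log.hazard, hazard.ratio, hr.ci), 3)
```

Because the exponential is monotone, a confidence interval computed on the log-hazard scale maps directly to an interval for the hazard ratio.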

visualize$associations$forest(cox.assoc)
visualize$associations$survival(cox.assoc)

The survival plot for Cox results shows analysis-level curves derived from the analysis-ready data used for each fit. If the analysis was split with split.by, each stratum contributes its own curve artifact.

Competing-Risk Regression

Use Fine-Gray competing-risk regression when a second event can prevent the event of interest:

crr.assoc <- associate$regression(
  data,
  outcomes = dementia,
  time = age.observed,
  competing = non.dementia.death,
  predictors = PRS,
  covariates = c(sex, PCA),
  model = "crr"
)

Here outcomes = dementia is the event of interest, not the age-at-onset column. The age-at-onset information belongs in time, together with the corresponding age for competing events and censoring.

For an age-at-dementia-onset analysis with death before dementia as the competing risk, code observations as follows:

| Observation status | dementia | non.dementia.death | age.observed |
|---|---|---|---|
| Developed dementia, then may have died later | 1 | 0 | age at dementia onset |
| Died without dementia | 0 | 1 | age at death |
| Alive and non-demented at last visit | 0 | 0 | age at last dementia-free observation |

This is the PolyGenius translation of the Fine-Gray status coding: the endpoint of interest is event 1, the competing event is event 2, and non-events are censored at their observed age.
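The mapping from the two indicator columns to a single 0/1/2 status can be sketched in base R as follows. The three example records and the status column name are hypothetical; the sketch only illustrates the coding rule and is not PolyGenius's internal implementation.

```r
# Hypothetical records, one per observation status described above
obs <- data.frame(
  dementia           = c(1, 0, 0),
  non.dementia.death = c(0, 1, 0),
  age.observed       = c(78.2, 83.5, 90.1)
)

# A record should never carry both the endpoint and the competing event
stopifnot(all(obs$dementia + obs$non.dementia.death <= 1))

# 0 = censored, 1 = event of interest, 2 = competing event
status.code <- obs$dementia + 2 * obs$non.dementia.death
obs$status  <- factor(status.code, levels = 0:2,
                      labels = c("censored", "dementia", "death"))
obs
```

A participant who develops dementia and later dies keeps status 1 at the onset age, because the later death cannot prevent an endpoint that has already been observed.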

The summary table stores estimates on the log-subdistribution-hazard scale. Artifacts include cumulative-incidence curves and group summaries where available.
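For readers who want to see the log-subdistribution-hazard scale outside PolyGenius, one common route is survival::finegray() followed by a weighted coxph() fit. The sketch below assumes simulated data and this particular fitting route; which estimator PolyGenius uses for model = "crr" is not stated here, so treat it as an illustration of the scale rather than a reproduction of the package's fit.

```r
library(survival)

set.seed(2)
n   <- 400
PRS <- rnorm(n)

# Latent ages at dementia, death without dementia, and last visit
t.dementia <- 60 + rexp(n, rate = 0.03 * exp(0.5 * PRS))
t.death    <- 60 + rexp(n, rate = 0.03)
t.censor   <- 60 + runif(n, 5, 40)

age.observed <- pmin(t.dementia, t.death, t.censor)
status <- factor(
  ifelse(age.observed == t.dementia, 1, ifelse(age.observed == t.death, 2, 0)),
  levels = 0:2, labels = c("censored", "dementia", "death")
)
sim <- data.frame(age.observed, status, PRS)

# Expand to the weighted start-stop data set for the dementia endpoint
fg.data <- finegray(Surv(age.observed, status) ~ PRS, data = sim,
                    etype = "dementia")

# A weighted Cox fit on the expanded data yields Fine-Gray coefficients
fg.fit <- coxph(Surv(fgstart, fgstop, fgstatus) ~ PRS,
                weights = fgwt, data = fg.data)

log.sdh <- coef(fg.fit)   # log-subdistribution-hazard scale
exp(log.sdh)              # subdistribution hazard ratio
```

A subdistribution hazard ratio describes the effect on the cumulative incidence of the endpoint in the presence of the competing event, which is why it pairs naturally with cumulative-incidence curves.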

visualize$associations$forest(crr.assoc)
visualize$associations$survival(crr.assoc)

Kaplan-Meier Group Comparisons

Use Kaplan-Meier when the primary question is:

Do survival curves differ across groups?

Create meaningful groups first, then use them as the predictor:

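# Cut the first score column into tertiles at the 1/3 and 2/3 quantiles.
# unique() guards against duplicate breaks; if heavy ties collapse a break,
# the three labels no longer match and the grouping should be revisited.
data$obs$prs.tertile <- cut(
  data$scores$X[, 1],
  breaks = unique(quantile(data$scores$X[, 1], c(0, 1/3, 2/3, 1), na.rm = TRUE)),
  labels = c("Low", "Mid", "High"),
  include.lowest = TRUE,
  ordered_result = TRUE
)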

km.assoc <- associate$regression(
  data,
  outcomes = demented,
  time = age.observed,
  predictors = prs.tertile,
  model = "km"
)

The summary table contains an omnibus log-rank row with term.type = "omnibus". Group-specific curves, risk tables, and median survival summaries live in artifacts.
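The omnibus log-rank row can be illustrated with survival::survfit() and survival::survdiff() on simulated data. The simulated scores and the direct use of these functions are assumptions for illustration; the sketch shows what an omnibus test across three groups looks like, not how PolyGenius assembles its summary or artifacts.

```r
library(survival)

set.seed(3)
n     <- 600
score <- rnorm(n)                      # hypothetical PRS values
tertile <- cut(score,
               breaks = quantile(score, c(0, 1/3, 2/3, 1)),
               labels = c("Low", "Mid", "High"),
               include.lowest = TRUE)

onset  <- 60 + rexp(n, rate = 0.04 * exp(0.6 * score))
censor <- 60 + runif(n, 5, 40)
age.observed <- pmin(onset, censor)
demented     <- as.integer(onset <= censor)

# Group-specific Kaplan-Meier curves (records, events, median per group)
km <- survfit(Surv(age.observed, demented) ~ tertile)
print(km)

# Omnibus log-rank test: one chi-square statistic across all groups
lr    <- survdiff(Surv(age.observed, demented) ~ tertile)
lr.df <- length(lr$n) - 1
lr.p  <- pchisq(lr$chisq, df = lr.df, lower.tail = FALSE)

c(statistic = lr$chisq, df = lr.df, p.value = lr.p)
```

With three groups the test has two degrees of freedom, so a small p-value says only that at least one curve differs; it does not identify which groups differ.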

visualize$associations$survival(km.assoc, show.event.table = TRUE)

Kaplan-Meier is unadjusted. If the question concerns covariate-adjusted group differences, use Cox regression with the grouped variable as the predictor:

tertile.cox <- associate$regression(
  data,
  outcomes = demented,
  time = age.observed,
  predictors = prs.tertile,
  covariates = c(sex, PCA)
)

This asks whether Mid and High PRS tertiles differ from the reference tertile in adjusted hazard.
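One detail matters when reproducing this comparison in plain R: a factor created with ordered_result = TRUE is coded with polynomial contrasts by default, so a direct coxph() fit reports linear and quadratic trend terms rather than Mid-versus-Low and High-versus-Low. Whether PolyGenius recodes ordered predictors internally is not stated here. The sketch below assumes a direct survival::coxph() fit on simulated data and shows both codings.

```r
library(survival)

set.seed(4)
n     <- 600
score <- rnorm(n)                      # hypothetical PRS values
prs.tertile <- cut(score,
                   breaks = quantile(score, c(0, 1/3, 2/3, 1)),
                   labels = c("Low", "Mid", "High"),
                   include.lowest = TRUE, ordered_result = TRUE)
sex <- factor(sample(c("F", "M"), n, replace = TRUE))

onset  <- 60 + rexp(n, rate = 0.04 * exp(0.6 * score))
censor <- 60 + runif(n, 5, 40)
age.observed <- pmin(onset, censor)
demented     <- as.integer(onset <= censor)

# Ordered factor: default polynomial contrasts give trend terms (.L, .Q)
fit.trend <- coxph(Surv(age.observed, demented) ~ prs.tertile + sex)
names(coef(fit.trend))

# Unordered factor: treatment contrasts against the first level ("Low")
tertile.ref <- factor(prs.tertile, ordered = FALSE)
fit.ref     <- coxph(Surv(age.observed, demented) ~ tertile.ref + sex)
exp(coef(fit.ref))    # hazard ratios: Mid vs Low, High vs Low, and sex
```

Both fits describe the same model and have the same likelihood; only the parameterization of the tertile effect differs.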

Group Comparison Interface

The comparison workflow, associate$compare(), is the question-first interface for group and heterogeneity tests:

tertile.km <- associate$compare(
  data,
  outcome = demented,
  time = age.observed,
  by = prs.tertile,
  model = "km",
  type = "group"
)

tertile.cox <- associate$compare(
  data,
  outcome = demented,
  time = age.observed,
  by = prs.tertile,
  covariates = c(sex, PCA),
  model = "cox",
  type = "group"
)

The first call asks for a log-rank-style comparison of the curves. The second asks for a Cox group comparison adjusted for sex and PCA.

See Association comparisons.

Returned Object

Survival workflows return PolyGeniusAssociations objects. The summary schema and added artifacts depend on whether the analysis is Cox, Fine-Gray, or Kaplan-Meier.

cox: Cox Regression

| Column | Meaning |
|---|---|
| family | "cox". |
| outcome, time, predictor | Event indicator, follow-up time, and tested predictor as represented in formula. |
| term, term.type | Cox coefficient row, usually main. |
| estimate | Cox coefficient on the log-hazard scale. |
| se, lower, upper | Standard error and confidence interval on the log-hazard scale. |
| statistic, p.value, adj.p.value | Wald/z statistic and p-values. |
| n, n.events | Analysis sample size and number of events. |
| effect.scale | "log.hazard"; forest plots display hazard ratios. |
| formula, fit.id | Resolved survival formula and artifact/diagnostic key. |

Added Artifacts

| Artifact | Meaning |
|---|---|
| curves | Analysis-level survival curve for the fitted Cox analysis. |
| risk.table | Numbers at risk, events, and censoring over time. |
| group.summary | Analysis-level records, events, and median survival summaries where available. |
| profile.table | Analysis-ready observations with time, event, predictor, and linear predictor. |

crr: Competing-Risk Regression

| Column | Meaning |
|---|---|
| family | "crr". |
| outcome, predictor | Endpoint of interest and tested predictor as represented in formula. For Fine-Gray models, outcome is the event-of-interest indicator supplied through outcomes. |
| term, term.type | Fine-Gray coefficient row, usually main. |
| estimate | Coefficient on the log-subdistribution-hazard scale. |
| se, lower, upper | Standard error and confidence interval on the same scale. |
| statistic, p.value, adj.p.value | Wald/z statistic and p-values. |
| n, n.events, n.competing | Analysis sample size, endpoint-event count, and competing-event count. |
| effect.scale | "log.subdistribution.hazard"; forest plots display subdistribution hazard ratios. |
| formula, fit.id | Resolved Fine-Gray specification and artifact/diagnostic key. |

Added Artifacts

| Artifact | Meaning |
|---|---|
| curves | Analysis-level cumulative-incidence curve for the endpoint/event of interest. |
| group.summary | Analysis-level records, endpoint-event counts, and competing-event counts. |
| profile.table | Analysis-ready observations with time, event, competing status, and predictor. |

km: Kaplan-Meier Curves

| Column | Meaning |
|---|---|
| family | "km". |
| outcome, time, predictor | Event indicator, follow-up time, and grouped predictor. |
| term, term.type | Grouped comparison row with term.type = "omnibus". |
| estimate, se, lower, upper | Usually NA; the log-rank statistic is the inferential result. |
| statistic, p.value, adj.p.value | Log-rank chi-square statistic and p-values. |
| n, n.events | Analysis sample size and number of events. |
| effect.scale | "median.time" for group summaries and survival plotting. |
| formula, fit.id | Resolved Kaplan-Meier formula and artifact/diagnostic key. |

Added Artifacts

| Artifact | Meaning |
|---|---|
| curves | Survival curve steps for each group level. |
| risk.table | Numbers at risk, events, and censoring over time by group. |
| group.summary | Group-level records, events, median survival, and confidence intervals. |
| profile.table | Analysis-ready observations with time, event, and group level. |

Current Scope

Current survival regression support covers standard right-censored workflows. The following are outside the current regression surface:

  • multi-state outcomes;
  • counting-process or start-stop inputs;
  • interval-censored outcomes;
  • survival interactions in associate$regression();
  • artifact pooling during meta-analysis.

Formal survival heterogeneity and contrast questions are treated as comparison questions rather than as ordinary split analyses.