Stratification and Heterogeneity
Subgroup analyses are useful, but two different questions are often confused:
- What is the association within each subgroup?
- Is the association significantly different between subgroups?
PolyGenius treats these as different workflows.
Stratified Reporting
Use split.by when you want separate association estimates within each observed stratum:
stratified <- associate$regression(
data,
outcome = demented,
predictors = PRS,
covariates = c(age, PCA),
split.by = sex
)
This fits one model among males and another model among females. The returned summary table stores the subgroup in the stratum column.
visualize$associations$forest(stratified)
Use this when the goal is descriptive subgroup reporting:
- show the association in each sex;
- report associations within APOE4 carrier groups;
- inspect whether a signal is consistent across cohorts or ancestry groups;
- make split-specific survival curves from the stored artifacts.
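As a rough picture of what "separate association estimates within each observed stratum" means, the base R sketch below fits one logistic model per sex on simulated data and stacks the results with a stratum column. It is an illustration only, not PolyGenius's implementation; the data frame dat, the helper fit_one, and the simulated effect sizes are all assumptions made for the example.

```r
# Simulated data; every name here is illustrative, not part of PolyGenius.
set.seed(1)
n <- 400
dat <- data.frame(
  PRS = rnorm(n),
  age = rnorm(n, mean = 70, sd = 8),
  sex = factor(sample(c("female", "male"), n, replace = TRUE))
)
dat$demented <- rbinom(n, 1, plogis(-1 + 0.5 * dat$PRS + 0.03 * (dat$age - 70)))

# Fit one logistic model and return the PRS row as a one-row data frame.
fit_one <- function(d) {
  fit <- glm(demented ~ PRS + age, data = d, family = binomial)
  est <- coef(summary(fit))["PRS", ]
  data.frame(
    estimate = est[["Estimate"]],
    se       = est[["Std. Error"]],
    p.value  = est[["Pr(>|z|)"]],
    n        = nrow(d)
  )
}

# One fit per observed stratum, stacked with a stratum column.
by_sex <- lapply(split(dat, dat$sex), fit_one)
stratified_tbl <- data.frame(stratum = names(by_sex), do.call(rbind, by_sex))
rownames(stratified_tbl) <- NULL
print(stratified_tbl)
```

Each row is a within-stratum estimate. Nothing in this table compares the two rows with each other.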
What split.by Does Not Do
split.by does not formally compare the subgroup estimates. A small p-value in one subgroup and a larger one in another is not evidence that the effects differ, because each p-value also reflects that subgroup's sample size and precision.
This is not a valid heterogeneity test:
# Descriptive only:
associate$regression(
data,
outcome = demented,
predictors = PRS,
split.by = sex
)
The correct question is:
Is the PRS effect different by sex?
That is a heterogeneity or interaction question.
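A small worked example shows why subgroup p-values cannot stand in for a heterogeneity test. The numbers below are hypothetical log-odds estimates and standard errors, not output from any real analysis, and the comparison uses a simple Wald test for the difference between two independent estimates (a normal approximation).

```r
# Hypothetical subgroup summaries (log-odds estimates and standard errors).
b.male   <- 0.30; se.male   <- 0.10
b.female <- 0.25; se.female <- 0.20

# Two-sided Wald p-value for a single estimate.
wald.p <- function(b, se) 2 * pnorm(-abs(b / se))

p.male   <- wald.p(b.male, se.male)      # below 0.05
p.female <- wald.p(b.female, se.female)  # above 0.05

# Test of the difference between the two independent estimates.
b.diff  <- b.male - b.female
se.diff <- sqrt(se.male^2 + se.female^2)
p.diff  <- wald.p(b.diff, se.diff)

round(c(male = p.male, female = p.female, difference = p.diff), 3)
```

The two estimates are nearly identical. Only one subgroup crosses the 0.05 threshold because its standard error is smaller, and the test of the difference gives no indication that the effects differ.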
Heterogeneity In Classical Models
For linear and logistic regression, interaction terms can be fit directly:
interaction.assoc <- associate$regression(
data,
outcome = demented,
predictors = PRS,
covariates = c(age, PCA),
interactions = sex
)
Conceptually:
demented ~ PRS * sex + age + PCA
Rows with term.type = "interaction" describe interaction coefficients. For a two-level modifier, this row is the formal test that the predictor effect differs between the non-reference and reference group.
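For readers who want to see the conceptual formula as a plain model fit, here is a base R sketch on simulated data. It uses glm directly rather than PolyGenius, omits the PCA covariate for brevity, and the simulated slopes are assumptions chosen for illustration.

```r
set.seed(2)
n <- 600
dat <- data.frame(
  PRS = rnorm(n),
  age = rnorm(n, mean = 70, sd = 8),
  sex = factor(sample(c("female", "male"), n, replace = TRUE),
               levels = c("female", "male"))  # female is the reference level
)
# Simulated truth: the PRS log-odds slope is 0.3 in females and 0.8 in males.
slope <- ifelse(dat$sex == "male", 0.8, 0.3)
dat$demented <- rbinom(n, 1, plogis(-1 + slope * dat$PRS))

fit <- glm(demented ~ PRS * sex + age, data = dat, family = binomial)

# The interaction row estimates the male-minus-female difference in PRS slope.
coef(summary(fit))["PRS:sexmale", ]
```

The PRS row in this model is the slope in the reference group (females), not an overall effect, which is a common source of misreading.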
For multi-level modifiers, a full heterogeneity answer is often an omnibus test across several interaction coefficients. That belongs naturally in the comparison workflow.
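One common way to run such an omnibus test is to compare nested models with and without the interaction terms. The sketch below does this in base R for a hypothetical three-level cohort variable and a continuous outcome. It is one reasonable approach under these assumptions, not necessarily what the comparison workflow does internally.

```r
set.seed(3)
n <- 900
dat <- data.frame(
  PRS    = rnorm(n),
  cohort = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
dat$y <- 0.4 * dat$PRS + rnorm(n)  # same PRS slope in every cohort

reduced <- lm(y ~ PRS + cohort, data = dat)  # common PRS slope
full    <- lm(y ~ PRS * cohort, data = dat)  # cohort-specific PRS slopes

# F-test of the nested models: jointly tests both interaction coefficients.
omnibus <- anova(reduced, full)
omnibus
```

With three levels there are two interaction coefficients, so the test has two numerator degrees of freedom. Reading the two coefficient rows separately would answer a different, reference-dependent question.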
Heterogeneity As A Comparison Question
The question-first comparison interface is:
heterogeneity <- associate$compare(
data,
outcome = demented,
predictors = PRS,
covariates = c(age, PCA),
by = sex,
type = "heterogeneity"
)
This means:
Test whether the association between PRS and demented differs by sex.
The comparison fit should use the statistically appropriate pooled model rather than comparing subgroup p-values.
Use this pattern for:
- PRS effect differs by sex;
- PRS effect differs by APOE4 status;
- score association differs across ancestry groups;
- biomarker association differs across diagnostic groups.
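The relationship between stratified fits and a pooled model can be checked numerically. The base R sketch below uses a simulated continuous outcome (named score here purely for illustration) and shows that the pooled interaction coefficient equals the difference between the two stratum slopes, while also supplying a standard error and test for that difference.

```r
set.seed(4)
n <- 500
dat <- data.frame(
  PRS = rnorm(n),
  sex = factor(sample(c("female", "male"), n, replace = TRUE),
               levels = c("female", "male"))
)
dat$score <- 1 + 0.2 * dat$PRS + 0.3 * (dat$sex == "male") * dat$PRS + rnorm(n)

# Separate fits per stratum give two slopes but no test of their difference.
slope.f <- coef(lm(score ~ PRS, data = dat[dat$sex == "female", ]))[["PRS"]]
slope.m <- coef(lm(score ~ PRS, data = dat[dat$sex == "male", ]))[["PRS"]]

# The pooled model estimates the same difference and also tests it.
pooled <- lm(score ~ PRS * sex, data = dat)
coef(summary(pooled))["PRS:sexmale", c("Estimate", "Std. Error", "Pr(>|t|)")]

same.difference <- all.equal(coef(pooled)[["PRS:sexmale"]], slope.m - slope.f)
same.difference
```

The equality is exact here because every term in the model is allowed to differ by sex. When covariate effects are shared across groups, the pooled interaction estimate will generally differ somewhat from the simple difference of stratum slopes.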
Splitting A Comparison
split.by can still be useful inside a comparison workflow when the goal is to repeat a comparison within another grouping variable:
heterogeneity.by.apoe <- associate$compare(
data,
outcome = demented,
predictors = PRS,
by = sex,
split.by = APOE4.status,
type = "heterogeneity"
)
This means:
Within each APOE4 stratum, test whether the PRS association differs by sex.
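In base R terms, this is a heterogeneity test repeated inside each level of the splitting variable. The sketch below runs a likelihood-ratio test of the PRS-by-sex interaction within each simulated APOE4 stratum. The helper sex_heterogeneity and the simulated data are hypothetical, and the choice of a likelihood-ratio test is an assumption rather than a statement about what PolyGenius computes.

```r
set.seed(5)
n <- 1200
dat <- data.frame(
  PRS   = rnorm(n),
  sex   = factor(sample(c("female", "male"), n, replace = TRUE)),
  APOE4 = factor(sample(c("carrier", "noncarrier"), n, replace = TRUE))
)
dat$demented <- rbinom(n, 1, plogis(-1 + 0.5 * dat$PRS))

# Likelihood-ratio test of the PRS-by-sex interaction within one stratum.
sex_heterogeneity <- function(d) {
  reduced <- glm(demented ~ PRS + sex, data = d, family = binomial)
  full    <- glm(demented ~ PRS * sex, data = d, family = binomial)
  lrt <- anova(reduced, full, test = "Chisq")
  data.frame(n = nrow(d), df = lrt[2, "Df"], p.value = lrt[2, "Pr(>Chi)"])
}

# Repeat the same comparison inside each APOE4 stratum.
by_apoe <- lapply(split(dat, dat$APOE4), sex_heterogeneity)
result <- data.frame(stratum = names(by_apoe), do.call(rbind, by_apoe))
rownames(result) <- NULL
result
```

The output has one heterogeneity test per APOE4 stratum. It does not test whether the sex difference itself varies by APOE4 status; that would be a three-way interaction question.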
Tertiles: Predictor, Group, Or Split?
PRS tertiles can be used in different ways depending on the question.
| Question | Better specification |
|---|---|
| Do PRS tertile groups differ in outcome? | use prs.tertile as the predictor or by group |
| Do PRS tertile survival curves differ? | model = "km" or type = "group" with by = prs.tertile |
| Is another effect different within PRS tertiles? | split.by = prs.tertile for descriptive estimates, or heterogeneity comparison |
| Is continuous PRS associated with outcome? | use continuous PRS as predictors |
Avoid splitting by a variable derived from the predictor and then interpreting the continuous predictor effect inside each split without a clear scientific reason. Each split restricts the range of the predictor, so the within-split slope is estimated less precisely and is hard to compare with the overall effect.
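The sketch below shows one way to derive a tertile variable in base R and why splitting on it changes what the continuous predictor can tell you. The PRS vector is a deterministic, tie-free stand-in for a standardized score, and the labels are arbitrary; with real data, ties at the cut points may need separate handling.

```r
# Deterministic, tie-free stand-in for a standardized PRS (illustrative only).
PRS <- qnorm(ppoints(300))

# Cut at the empirical tertile boundaries; include.lowest keeps the minimum.
breaks <- quantile(PRS, probs = c(0, 1/3, 2/3, 1))
prs.tertile <- cut(PRS, breaks = breaks, include.lowest = TRUE,
                   labels = c("low", "middle", "high"))
table(prs.tertile)

# Spread of PRS within each tertile versus overall.
sd.within  <- tapply(PRS, prs.tertile, sd)
sd.overall <- sd(PRS)
round(c(sd.within, overall = sd.overall), 2)
```

Within each tertile the predictor varies much less than it does overall, which is the range restriction that makes a within-tertile PRS slope imprecise.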
Returned Object
Stratified regression results use the same family-specific regression schemas as ordinary association results, with the subgroup stored in stratum. Heterogeneity comparisons use the comparison schema.
lm: Stratified Linear Model
| Column | Meaning |
|---|---|
| family | "lm". |
| outcome, predictor | Resolved outcome and tested predictor. |
| term, term.type | Coefficient row and row type, usually main or interaction. |
| estimate | Linear-regression beta on the identity scale. |
| se, lower, upper | Standard error and confidence interval for beta. |
| statistic, p.value, adj.p.value | Wald/t-test statistic and p-values. |
| n | Complete-case sample size used by the fit. |
| effect.scale | "identity". |
| formula, fit.id | Resolved model formula and artifact/diagnostic key. |
Added Artifacts
| Artifact | Meaning |
|---|---|
| prediction.grid | Fitted outcome values across a predictor grid with covariates held at reference values. |
| profile.table | Analysis-ready observations with predictor, outcome, fitted value, and residual. |
glm: Stratified Logistic Model
| Column | Meaning |
|---|---|
| family | "glm". |
| outcome, predictor | Resolved binary outcome and tested predictor. |
| term, term.type | Coefficient row and row type, usually main or interaction. |
| estimate | Logistic-regression coefficient on the log-odds scale. |
| se, lower, upper | Standard error and confidence interval on the log-odds scale. |
| statistic, p.value, adj.p.value | Wald/z statistic and p-values. |
| n, n.cases, n.controls | Complete-case sample size and binary outcome counts. |
| effect.scale | "log.odds"; forest plots display odds ratios. |
| formula, fit.id | Resolved model formula and artifact/diagnostic key. |
Added Artifacts
| Artifact | Meaning |
|---|---|
| prediction.grid | Predicted probabilities across a predictor grid with covariates held at reference values. |
| profile.table | Analysis-ready observations with predictor, binary outcome, and fitted probability. |
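Because the glm summary table stores estimates on the log-odds scale, reporting odds ratios outside a forest plot requires exponentiating the estimate and its interval limits. The sketch below does this for hypothetical rows. The values are made up, and the Wald construction of lower and upper is an assumption, since the package may form its intervals differently.

```r
# Hypothetical rows shaped like the glm summary table (log-odds scale).
glm.rows <- data.frame(
  stratum  = c("female", "male"),
  estimate = c(0.25, 0.30),
  se       = c(0.20, 0.10)
)

# Wald 95% limits on the log-odds scale (an assumption about how
# lower/upper are formed; the package may use another interval type).
z <- qnorm(0.975)
glm.rows$lower <- glm.rows$estimate - z * glm.rows$se
glm.rows$upper <- glm.rows$estimate + z * glm.rows$se

# Exponentiate the estimate and limits; the standard error is not exponentiated.
or.tbl <- data.frame(
  stratum  = glm.rows$stratum,
  or       = exp(glm.rows$estimate),
  or.lower = exp(glm.rows$lower),
  or.upper = exp(glm.rows$upper)
)
or.tbl
```

The resulting interval is asymmetric around the odds ratio, which is expected: it is symmetric only on the log-odds scale.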
comparison: Heterogeneity Test
| Column | Meaning |
|---|---|
| comparison.type | Planned comparison type: heterogeneity, group, contrast, or nested. |
| outcome, predictor, by | Outcome, focal predictor where applicable, and comparison-defining variable. |
| contrast | Pairwise or named contrast label where applicable. |
| family / model | Model family used for the comparison. |
| estimate, se, lower, upper | Effect estimate and uncertainty where the comparison has a coefficient-like scale. |
| statistic, p.value, adj.p.value | Primary comparison test result. |
| n, n.events, n.competing | Sample-size and event counts where relevant. |
| fit.id | Artifact/diagnostic key. |
Added Artifacts
Comparison artifacts depend on the comparison type. Survival group comparisons reuse the survival-family artifacts (curves, risk.table, group.summary). Contrast and nested-model comparisons may attach model diagnostics or contrast tables when those details do not belong in the one-row summary table.
Plotting
Use forest plots for subgroup estimates:
visualize$associations$forest(stratified)
For survival results, split-specific curves are read from the survival artifacts:
visualize$associations$survival(stratified.survival)
For formal heterogeneity comparisons, use comparison-oriented forest or heatmap views once the comparison rows are returned.