compute$population.structure

Compute Population Structure via PCA

Description

Computes the population structure of the samples by performing PCA. PCA is computed either in-sample (in the samples’ own PC space) or by projecting the samples onto a known reference panel. This function uses GenotypeInfo$samples.PCA (in-sample) or GenotypeInfo$samples.PCAProject (projection) of data$genotype.

Usage

`compute$population.structure`(data, npcs = 5, variants = NULL, reference.panel = NULL)

Arguments

`data`	A `PolyGeniusData` object.
`npcs`	An integer scalar for the number of principal components to compute (or project).
`variants`	A specification of which genetic markers to use when summarizing your samples’ DNA into a "sample × marker" dosage matrix for PCA. These markers define the variants that capture the population structure. Provide them as either a path to a plain text file (BED1 format) listing one variant per row (chromosome, start, end), or an in-memory data frame with columns `chromosome`, `start`, and `end`. If `NULL` (the default), the `common20k` variant space is resolved via `workspace$catalogs$variantSpaces`.
`reference.panel`	(Optional) Name of a super-population to use as a reference panel. If `NULL`, PCA is run on your samples to produce the in-sample PC space; otherwise your samples are projected onto the reference panel. See `referencePanels` for supported names.
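
As an illustration of the two accepted forms of `variants`, the sketch below builds an in-memory data frame with the documented columns and writes the same rows to a plain text file. The coordinates, the tab separator, the absence of a header row, and the file name are assumptions made for this example. The final calls are left commented out because they need an existing `PolyGeniusData` object.

```r
# Hypothetical marker coordinates, for illustration only.
variants_df <- data.frame(
  chromosome = c("chr1", "chr1", "chr2"),
  start      = c(10000L, 25000L, 5000L),
  end        = c(10001L, 25001L, 5001L),
  stringsAsFactors = FALSE
)

# Write the same rows as a plain text file, one variant per row
# (chromosome, start, end). Tab-separated with no header is an assumption.
variants_path <- file.path(tempdir(), "variants.bed")
write.table(variants_df, variants_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Either form could then be passed as `variants` (requires a PolyGeniusData
# object named `data`, which this sketch does not create):
# data <- compute$population.structure(data, npcs = 5, variants = variants_df)
# data <- compute$population.structure(data, npcs = 5, variants = variants_path)
```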

Value

Invisibly returns the original data object, with the n_obs × npcs PCA embedding matrix stored under data$obsm[[key]]. commandLog(data) stores additional information for the operation:

In-sample PCA - eigenvalues of the corresponding PCs
Projected PCA - vector of variants that were used to produce the projection matrix
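
To make the difference between the two modes concrete, here is a minimal sketch using base R's `prcomp` on simulated dosage matrices. It is a stand-in for illustration, not the package's implementation: the simulated data, the variable names, and the choice to center without scaling are all assumptions.

```r
set.seed(1)

# Simulated "sample x marker" dosage matrices (values 0, 1, 2); not real data.
n_markers <- 50
reference <- matrix(rbinom(40 * n_markers, size = 2, prob = 0.3),
                    nrow = 40, ncol = n_markers)
samples   <- matrix(rbinom(10 * n_markers, size = 2, prob = 0.3),
                    nrow = 10, ncol = n_markers)
npcs <- 5

# In-sample PCA: the samples define their own PC space.
in_sample   <- prcomp(samples, center = TRUE, scale. = FALSE)
embedding   <- in_sample$x[, seq_len(npcs), drop = FALSE]  # n_obs x npcs
eigenvalues <- in_sample$sdev[seq_len(npcs)]^2             # variance per PC

# Projected PCA: fit on the reference panel, then project the samples
# using the reference centering and loadings.
ref_fit   <- prcomp(reference, center = TRUE, scale. = FALSE)
centered  <- sweep(samples, 2, ref_fit$center)
projected <- centered %*% ref_fit$rotation[, seq_len(npcs), drop = FALSE]

dim(embedding)
dim(projected)
```

Both results have one row per sample and `npcs` columns. In the projected case the axes come from the reference panel, so only markers present in both the samples and the reference can contribute to the projection.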