compute$population.structure

Compute Population Structure via PCA

Description

Computes the samples’ population structure by performing PCA. PCA is either computed in-sample (in the samples’ own PC space) or by projecting the samples onto a known reference panel. This function uses GenotypeInfo$samples.PCA or GenotypeInfo$samples.PCAProject of data$genotype, respectively.
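The two modes can be illustrated with base R's prcomp on simulated dosages. This is a generic sketch of in-sample PCA versus projection onto a reference panel. It is not PolyGenius's implementation, and every object name below is hypothetical. For simplicity it only centers each variant. Many genotype PCA pipelines also standardize each variant, and whether this function does so is not stated here.

```r
# Generic sketch on simulated dosages (0/1/2 allele counts); all objects are
# hypothetical stand-ins, not PolyGenius internals.
set.seed(1)
n_variants <- 200
npcs <- 5
simulate_dosages <- function(n) {
  matrix(rbinom(n * n_variants, size = 2, prob = 0.3), nrow = n)
}
cohort    <- simulate_dosages(30)   # "your samples" (sample x variant)
reference <- simulate_dosages(60)   # a reference panel on the same variants

# In-sample mode: the PC space is learned from the cohort itself.
in_sample_fit <- prcomp(cohort, center = TRUE, scale. = FALSE, rank. = npcs)
in_sample <- in_sample_fit$x                      # 30 x npcs embedding

# Projection mode: learn the PC space from the reference panel, then center
# the cohort with the reference means and rotate it into that space.
ref_fit <- prcomp(reference, center = TRUE, scale. = FALSE, rank. = npcs)
projected <- predict(ref_fit, newdata = cohort)   # 30 x npcs embedding
```

Both modes yield one row per sample and npcs columns. In projection mode, the axes are defined by the reference panel, so cohort samples can be compared directly against it.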

Usage

`compute$population.structure`(data, npcs = 5, variants = NULL, reference.panel = NULL)

Arguments

data

A PolyGeniusData object.

npcs

An integer scalar for the number of principal components to compute (or project).

variants

Specifies which genetic markers to use when summarizing your samples’ DNA into a "sample × marker" dosage matrix for PCA. Population structure is inferred from these markers (variants). You can provide them as:

  • a path to a plain-text file in BED3 format listing one variant per row (chromosome, start, end), or

  • an in-memory data frame with columns chromosome, start, and end.

If NULL (the default), the common20k variant space is resolved via workspace$catalogs$variantSpaces.
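As a sketch, both accepted forms can be built with base R. The coordinates below are made up. They follow the usual BED convention (0-based, half-open), so a single-base variant spans start to start + 1. Whether this function expects that exact convention is an assumption here.

```r
# Hypothetical variant coordinates (illustration only), BED-style 0-based, half-open.
variants_df <- data.frame(
  chromosome = c("chr1", "chr1", "chr2"),
  start      = c(1000L, 2500L, 4000L),
  end        = c(1001L, 2501L, 4001L),
  stringsAsFactors = FALSE
)

# Write the same regions as a headerless, tab-separated BED3 file.
bed_path <- tempfile(fileext = ".bed")
write.table(variants_df, bed_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Reading the file back gives the data-frame form with the expected columns.
from_file <- read.table(bed_path, sep = "\t", header = FALSE,
                        col.names = c("chromosome", "start", "end"),
                        colClasses = c("character", "integer", "integer"))

# Either form could then be supplied as `variants`, e.g.:
# compute$population.structure(data, variants = bed_path)
# compute$population.structure(data, variants = variants_df)
```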

reference.panel

(Optional) Name of a super-population to use as a reference panel. If NULL (the default), PCA is run on your samples to produce the in-sample PC space; otherwise your samples are projected onto the reference. See referencePanels for supported names.

Value

Invisibly returns the original data object, with the n_obs × npcs PCA embedding matrix stored under data$obsm[[key]]. The operation’s entry in commandLog(data) stores additional information:

  • In-sample PCA: the eigenvalues corresponding to the PCs

  • Projected PCA: the vector of variants used to produce the projection matrix
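Logged eigenvalues are commonly converted into the proportion of variance each PC explains. The sketch below assumes they are eigenvalues of the sample covariance matrix of the dosage matrix. That is what prcomp reports as sdev^2, but how this function scales its eigenvalues is not stated. The data are simulated and the names are hypothetical.

```r
# Hypothetical illustration on simulated dosages; not PolyGenius output.
set.seed(42)
dosages <- matrix(rbinom(40 * 100, size = 2, prob = 0.25), nrow = 40)
npcs <- 5

# Full in-sample PCA (centering only), keeping every component.
pca <- prcomp(dosages, center = TRUE, scale. = FALSE)

# prcomp reports component standard deviations; squaring them gives the
# eigenvalues of the sample covariance matrix.
eigenvalues <- pca$sdev^2

# Fraction of total variance captured by each PC; show the first npcs.
var_explained <- eigenvalues / sum(eigenvalues)
round(var_explained[seq_len(npcs)], 3)
```

Because all components are kept, the eigenvalues sum to the total variance across variants. Each fraction is therefore relative to the full dosage matrix, not just the first npcs PCs.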