as.bigsnpr

Convert GenotypeInfo to bigsnpr format

Description

Converts a GenotypeInfo object to bigsnpr’s file-backed matrix (FBM) format, stored as an .rds file plus a .bk backing file. This makes bigsnpr-based algorithms such as LDpred2 and lassosum2 available while maintaining compatibility with PolyGenius workflows.

Usage

as.bigsnpr(
  genotype.info,
  output = NULL,
  output.name = NULL,
  ind.row = NULL,
  ind.col = NULL,
  ncores = 1,
  nthreads = NULL,
  logger = NULL
)

Arguments

`genotype.info`	A `GenotypeInfo` object to convert.
`output`	Character or `NULL`; output location (directory) for the generated files.
`output.name`	Character; base name for output files (without extension). Default uses `genotype.info$name`.
`ind.row`	Integer vector or `NULL`; optional row indices (samples) to include in the conversion. If `NULL`, all samples in `genotype.info$samples` are used. This parameter is passed directly to bigsnpr’s `snp_readBed2()`.
`ind.col`	Integer vector or `NULL`; optional column indices (variants) to include. If `NULL`, all variants are included.
`ncores`	Integer; number of cores for parallel processing during the bigsnpr conversion step. Default is 1 (no parallelization).
`nthreads`	Integer or `NULL`; PLINK thread count used for preprocessing steps (format conversion, merging, and optional sample filtering). When `NULL`, no `--threads` flag is passed to PLINK.
`logger`	Optional logger object. If `NULL`, a fresh logger is created.
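
The index arguments can be built with ordinary base R subsetting. The following is a minimal sketch, assuming mock `fam` and `bim` data frames (hypothetical stand-ins for the sample and variant tables of a fileset) and assuming that indices are 1-based positions in file order, as in `snp_readBed2()`. Whether `ind.row` is counted against the original file or against the working sample set is not stated here, so verify this before relying on it. The final call is left commented out because it needs a real `GenotypeInfo` object.

```r
# Mock sample and variant tables (hypothetical data for illustration only).
fam <- data.frame(
  sample.ID = c("S1", "S2", "S3", "S4"),
  stringsAsFactors = FALSE
)
bim <- data.frame(
  chr = c("1", "1", "22", "22"),
  pos = c(1000L, 2000L, 500L, 900L),
  stringsAsFactors = FALSE
)

# Row indices: positions of the samples to keep, dropping IDs that are absent.
keep.samples <- c("S2", "S4")
ind.row <- match(keep.samples, fam$sample.ID)
ind.row <- sort(ind.row[!is.na(ind.row)])

# Column indices: all variants on chromosome 22.
ind.col <- which(bim$chr == "22")

# Assumed usage (not run; requires a GenotypeInfo object `genotype.info`):
# res <- as.bigsnpr(genotype.info, ind.row = ind.row, ind.col = ind.col, ncores = 2)
```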

Details

The conversion process:

1. Format conversion: If not already in "bfile" format, uses $tidy() to convert to PLINK .bed/.bim/.fam format.
2. File merging: If multiple genotype files exist, uses $merge() to combine them into a single fileset.
3. Sample subsetting: Respects the $samples field of the input GenotypeInfo object, converting only the specified working sample set.
4. bigsnpr conversion: Calls bigsnpr’s snp_readBed2() to create the file-backed matrix representation.
5. Variant IDs: Uses PolyGenius standardized variant IDs (chr:pos:a0:a1 in lexicographic allele order) as marker.ID, ensuring consistent matching with other PolyGenius operations like clumping.
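
The variant ID scheme can be illustrated with a small base R sketch. The helper `make_variant_id()` below is hypothetical and is not part of PolyGenius; it only shows one way to build a chr:pos:a0:a1 string with the two alleles in lexicographic order, so that the same variant gets the same ID regardless of which allele is listed first.

```r
# Hypothetical helper: build a chr:pos:a0:a1 ID with the alleles sorted.
make_variant_id <- function(chr, pos, a0, a1) {
  first  <- ifelse(a0 <= a1, a0, a1)
  second <- ifelse(a0 <= a1, a1, a0)
  # as.integer() avoids scientific notation such as "1e+05" in the ID.
  paste(chr, as.integer(pos), first, second, sep = ":")
}

# The same variant written in both allele orientations.
ids <- make_variant_id(
  chr = c("1", "1"),
  pos = c(100000, 100000),
  a0  = c("T", "A"),
  a1  = c("A", "T")
)
ids
```

Both orientations produce the same identifier, which is what makes ID-based matching independent of allele coding.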

The resulting bigsnpr object includes:

File-backed genotype matrix (memory-mapped for efficiency)
Variant map with chromosome, position, and alleles
Sample/family information
Standardized variant identifiers

Note on dependencies: This function requires the bigsnpr package to be installed but does not load it as a hard dependency. Install with install.packages("bigsnpr").
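
A soft dependency of this kind is commonly checked with `requireNamespace()`, which tests whether a package is installed without attaching it. The sketch below shows that general pattern; the helper `require_optional()` is hypothetical and is not necessarily how this function performs its check.

```r
# Hypothetical helper: stop with an install hint if an optional package is missing.
require_optional <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf(
        "Package '%s' is required. Install it with install.packages(\"%s\").",
        pkg, pkg
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Check for bigsnpr without attaching it; TRUE if installed, FALSE otherwise.
bigsnpr.available <- requireNamespace("bigsnpr", quietly = TRUE)
```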

Value

A named list with the following components:

obj: The bigSNP object returned by bigsnpr::snp_attach().
rds: Character; path to the .rds file.
map: Data frame with variant information: chr (character), pos (integer), a0 (reference allele), a1 (effect allele), marker.id (standardized PolyGenius ID).
fam: Data frame with sample information from the bigSNP object.
genotypes: File-backed matrix (FBM) object; direct accessor for the genotype data. Dimensions are samples × variants.
build: Character; genome build from the input GenotypeInfo.
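
The `map` component can be used to align external data with the columns of `genotypes`. The following is a minimal sketch using mock data frames (all values are hypothetical), assuming that the rows of `map` are in the same order as the columns of the genotype matrix and that the summary statistics carry IDs in the same chr:pos:a0:a1 scheme.

```r
# Mock variant map with the columns described above (hypothetical data).
map <- data.frame(
  chr = c("1", "1", "2"),
  pos = c(1000L, 2000L, 3000L),
  a0 = c("A", "C", "G"),
  a1 = c("G", "T", "T"),
  marker.id = c("1:1000:A:G", "1:2000:C:T", "2:3000:G:T"),
  stringsAsFactors = FALSE
)

# Mock summary statistics keyed by the same ID scheme (hypothetical data).
sumstats <- data.frame(
  marker.id = c("2:3000:G:T", "1:1000:A:G", "5:42:A:C"),
  beta = c(0.10, -0.05, 0.20),
  stringsAsFactors = FALSE
)

# Genotype column index for each summary-statistics row (NA if not in the map).
idx <- match(sumstats$marker.id, map$marker.id)
matched <- sumstats[!is.na(idx), ]
matched$col <- idx[!is.na(idx)]
matched
```

Because the ID sorts the alleles, a match on it does not show whether the effect allele agrees between the two sources; compare `a1` separately before using effect sizes.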

Examples

## Not run: 
# Convert a reference panel to bigsnpr format
ref.panel <- referencePanels$get("EUR", "GRCh37")
bigsnp.obj <- as.bigsnpr(ref.panel)

# Access the genotype matrix
G <- bigsnp.obj$genotypes
dim(G)  # samples × variants

# Use with bigsnpr functions
# (with infos.pos supplied, size is a window in kb, so 3000 means 3 Mb)
map <- bigsnp.obj$map
corr <- bigsnpr::snp_cor(
  G,
  infos.pos = map$pos,
  size = 3000,
  ncores = 4
)

# Convert with specific output location
bigsnp.perm <- as.bigsnpr(
  ref.panel,
  output = "/path/to/permanent/storage",
  output.name = "EUR_ref_GRCh37"
)

## End(Not run)