as.bigsnpr

Convert GenotypeInfo to bigsnpr format

Description

Converts a GenotypeInfo object to bigsnpr’s file-backed matrix (FBM) format, stored as an .rds file plus a .bk backing file. This lets bigsnpr-based algorithms such as LDpred2 and lassosum2 run on PolyGenius genotype data while staying compatible with PolyGenius workflows.

Usage

as.bigsnpr(
  genotype.info,
  output = NULL,
  output.name = NULL,
  ind.row = NULL,
  ind.col = NULL,
  ncores = 1,
  nthreads = NULL,
  logger = NULL
)

Arguments

genotype.info

A GenotypeInfo object to convert.

output

Character or NULL; directory in which to write the .rds and .bk files.

output.name

Character; base name for the output files (without extension). Defaults to genotype.info$name.

ind.row

Integer vector or NULL; optional row indices (samples) to include in the conversion. If NULL, all samples in genotype.info$samples are used. This parameter is passed directly to bigsnpr’s snp_readBed2().

ind.col

Integer vector or NULL; optional column indices (variants) to include. If NULL, all variants are included.

ncores

Integer; number of cores for parallel processing during the bigsnpr conversion step. Default is 1 (no parallelization).

nthreads

Integer or NULL; PLINK thread count used for preprocessing steps (format conversion, merging, and optional sample filtering). When NULL, no --threads flag is passed to PLINK.

logger

Optional logger object. If NULL, a fresh logger is created.
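The nthreads rule above (pass --threads to PLINK only when a value is supplied) can be illustrated with a small sketch. build_plink_args() is a hypothetical helper, not part of PolyGenius, and the --bfile/--make-bed/--out command is only an example preprocessing call; the exact commands that as.bigsnpr() issues are not documented here.

```r
# Hypothetical helper (not a PolyGenius function): assemble PLINK arguments
# for one preprocessing step, appending --threads only when nthreads is set.
build_plink_args <- function(bfile, out, nthreads = NULL) {
  args <- c("--bfile", bfile, "--make-bed", "--out", out)
  if (!is.null(nthreads)) {
    # PLINK expects an integer thread count after --threads
    args <- c(args, "--threads", as.character(as.integer(nthreads)))
  }
  args
}

build_plink_args("ref", "ref_tidy")                 # no --threads flag
build_plink_args("ref", "ref_tidy", nthreads = 4)   # ends with --threads 4
```

Leaving nthreads = NULL therefore defers to PLINK's own default thread count.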

Details

The conversion process:

  1. Format conversion: If not already in “bfile” format, uses $tidy() to convert to PLINK .bed/.bim/.fam format.

  2. File merging: If multiple genotype files exist, uses $merge() to combine them into a single fileset.

  3. Sample subsetting: Respects the $samples field of the input GenotypeInfo object, converting only the specified working sample set.

  4. bigsnpr conversion: Calls bigsnpr’s snp_readBed2() to create the file-backed matrix representation.

  5. Variant IDs: Uses PolyGenius standardized variant IDs (chr:pos:a0:a1 in lexicographic allele order) as marker.ID, ensuring consistent matching with other PolyGenius operations like clumping.
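Sorting the alleles before building the ID gives a variant the same identifier regardless of which allele a particular file lists first, which is what allows matching across datasets. The sketch below assumes that "lexicographic allele order" means the two alleles are sorted as strings before being joined; make_variant_id() is a hypothetical helper, not a PolyGenius export. R's string comparison follows the current locale's collation, which is stable for uppercase nucleotide codes.

```r
# Hypothetical helper: build chr:pos:allele:allele IDs with the two alleles
# placed in lexicographic order (assumed reading of the Details text).
make_variant_id <- function(chr, pos, a0, a1) {
  first  <- pmin(a0, a1)  # element-wise string minimum
  second <- pmax(a0, a1)  # element-wise string maximum
  # as.integer() avoids scientific notation such as "1e+05" in the position
  paste(chr, as.integer(pos), first, second, sep = ":")
}

make_variant_id("1", 1000, "T", "C")  # "1:1000:C:T"
make_variant_id("1", 1000, "C", "T")  # same ID with the alleles swapped
```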

The resulting bigsnpr object includes:

  • File-backed genotype matrix (memory-mapped for efficiency)

  • Variant map with chromosome, position, and alleles

  • Sample/family information

  • Standardized variant identifiers

Note on dependencies: This function requires the bigsnpr package, but bigsnpr is not a hard dependency of PolyGenius and is not loaded automatically. Install it with install.packages("bigsnpr").
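A common R idiom for an optional dependency of this kind is to check for the package with requireNamespace() before use and to call its functions with the pkg:: prefix instead of attaching it. The sketch below is a generic illustration of that idiom; check_suggested() is a hypothetical name, and this page does not say how as.bigsnpr() performs its own check.

```r
# Hypothetical helper: stop early with an informative message when a
# suggested (non-hard) dependency such as bigsnpr is not installed.
check_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf(
      "Package '%s' is required. Install it with install.packages(\"%s\").",
      pkg, pkg
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# e.g. at the top of a bigsnpr-based function:
# check_suggested("bigsnpr")
# obj <- bigsnpr::snp_attach(rds)
```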

Value

A named list with the following components:

obj

The bigSNP object returned by bigsnpr::snp_attach().

rds

Character; path to the .rds file.

map

Data frame with variant information: chr (character), pos (integer), a0 (reference allele), a1 (effect allele), marker.id (standardized PolyGenius ID).

fam

Data frame with sample information from the bigSNP object.

genotypes

File-backed matrix (FBM) object; direct accessor for the genotype data. Dimensions are samples × variants.

build

Character; genome build from the input GenotypeInfo.
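The map component is typically used to choose variant (column) indices of genotypes, for example to work one chromosome at a time. A minimal sketch, assuming the rows of map are in the same order as the columns of genotypes (as they are in a bigSNP object); the toy data frame is made up and only mirrors the documented column names.

```r
# Toy stand-in for the `map` component (made-up values; only the column
# names chr, pos, a0, a1 follow the Value section above).
map <- data.frame(
  chr = c("1", "1", "2", "2", "2"),
  pos = c(1000L, 2500L, 800L, 1200L, 5000L),
  a0  = c("A", "C", "G", "A", "C"),
  a1  = c("G", "T", "T", "C", "G"),
  stringsAsFactors = FALSE
)

# Variant (column) indices grouped by chromosome; row i of map is assumed
# to describe column i of the samples x variants genotype matrix.
ind.col.by.chr <- split(seq_len(nrow(map)), map$chr)
ind.col.by.chr
```

Each element can then be passed as ind.col to bigsnpr functions such as snp_cor(), together with the matching map$pos values as infos.pos.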

See Also

GenotypeInfo, LDpred2Algorithm, bigsnpr::snp_readBed2()

Examples

## Not run: 
# Convert a reference panel to bigsnpr format
ref.panel <- referencePanels$get("EUR", "GRCh37")
bigsnp.obj <- as.bigsnpr(ref.panel)

# Access the genotype matrix
G <- bigsnp.obj$genotypes
dim(G)  # samples × variants

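# Use with bigsnpr functions: compute LD one chromosome at a time,
# since physical positions restart on each chromosome
map <- bigsnp.obj$map
ind.chr <- which(map$chr == map$chr[1])
corr <- bigsnpr::snp_cor(
  G,
  ind.col = ind.chr,
  infos.pos = map$pos[ind.chr],
  size = 3000,  # window size; see ?bigsnpr::snp_cor for its units
  ncores = 4
)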

# Convert with specific output location
bigsnp.perm <- as.bigsnpr(
  ref.panel,
  output = "/path/to/permanent/storage",
  output.name = "EUR_ref_GRCh37"
)

## End(Not run)