sgkit.io.plink.plink_to_zarr

sgkit.io.plink.plink_to_zarr(*, path=None, bed_path=None, bim_path=None, fam_path=None, output, chunks='auto', fam_sep=' ', bim_sep='\t', bim_int_contig=False, lock=False, persist=True, storage_options=None)

Convert a PLINK file to a Zarr on-disk store.

A convenience for read_plink() followed by sgkit.save_dataset().

Refer to read_plink() for details and limitations.
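As a minimal usage sketch, the conversion can be wrapped in a small helper. The helper name convert_plink_prefix and the paths in the usage note are hypothetical; only plink_to_zarr and sgkit.load_dataset come from sgkit, and sgkit is imported lazily so that defining the helper does not require it to be installed.

```python
def convert_plink_prefix(prefix, output):
    """Convert the PLINK file set at `prefix` to Zarr, then reopen the store.

    `prefix` carries no suffix: "data" stands for data.bed, data.bim, data.fam.
    This is an illustrative wrapper, not part of sgkit.
    """
    # Imported lazily so that defining this helper does not require sgkit.
    import sgkit as sg
    from sgkit.io.plink import plink_to_zarr

    # All arguments are keyword-only, and `output` is required.
    plink_to_zarr(path=str(prefix), output=str(output))

    # plink_to_zarr returns None, so load the store to inspect the result.
    return sg.load_dataset(str(output))
```

Assuming a file set data.bed, data.bim and data.fam exists in the working directory, convert_plink_prefix("data", "data.zarr") would write the store and return it as a dataset.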

Parameters
path : str | Path | None (default: None)

Path to the PLINK file set, without a suffix. For example, if the files are at data.{bed,fam,bim}, provide only ‘data’ (suffixes are added internally). Either path or all three of bed_path, bim_path and fam_path must be provided.

bed_path : str | Path | None (default: None)

Path to PLINK bed file. This should be a full path including the .bed extension and cannot be specified in conjunction with path.

bim_path : str | Path | None (default: None)

Path to PLINK bim file. This should be a full path including the .bim extension and cannot be specified in conjunction with path.

fam_path : str | Path | None (default: None)

Path to PLINK fam file. This should be a full path including the .fam extension and cannot be specified in conjunction with path.
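The rule relating path to bed_path, bim_path and fam_path can be sketched in plain Python. The helper resolve_plink_paths below is hypothetical and is not sgkit's implementation; it only mirrors the documented behaviour (a prefix expands to three suffixed files, and the prefix and the explicit paths are mutually exclusive). The error messages are made up for illustration.

```python
from pathlib import Path


def resolve_plink_paths(path=None, bed_path=None, bim_path=None, fam_path=None):
    """Return (bed, bim, fam) paths following the documented argument rules."""
    explicit = (bed_path, bim_path, fam_path)
    if path is not None:
        if any(p is not None for p in explicit):
            raise ValueError(
                "path cannot be combined with bed_path, bim_path or fam_path"
            )
        # Append each suffix to the prefix: "data" -> data.bed, data.bim, data.fam.
        return tuple(Path(f"{path}{ext}") for ext in (".bed", ".bim", ".fam"))
    if any(p is None for p in explicit):
        raise ValueError(
            "provide either path or all of bed_path, bim_path and fam_path"
        )
    # Explicit paths are used as given, extensions included.
    return tuple(Path(p) for p in explicit)
```

The suffix is appended to the prefix string rather than set with Path.with_suffix, so a prefix that itself contains a dot (for example "cohort.v2") keeps its full name.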

output : str | Path | MutableMapping[str, bytes]

Zarr store or path to directory in file system.

chunks : str | int | tuple (default: 'auto')

Chunk size for genotype (i.e. .bed) data, by default “auto”.

fam_sep : str (default: ' ')

Delimiter for .fam file, by default " " (a single space).

bim_sep : str (default: '\t')

Delimiter for .bim file, by default "\t" (a tab).

bim_int_contig : bool (default: False)

Whether or not the contig/chromosome name in the .bim file should be interpreted as an integer, by default False. If False, then the variant/contig field in the resulting dataset will contain the indexes of corresponding strings encountered in the first .bim field.
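The two interpretations of the first .bim field can be illustrated with a small standalone sketch. The function index_contigs is hypothetical and does not reproduce sgkit's code; in particular, assigning indexes in order of first appearance is an assumption made here for illustration.

```python
def index_contigs(bim_lines, bim_sep="\t", bim_int_contig=False):
    """Return (contig value per variant, contig names) from .bim lines.

    Illustrative only; the index order (first appearance) is an assumption.
    """
    # The first delimited field of each .bim line is the contig/chromosome.
    contigs = [line.split(bim_sep)[0] for line in bim_lines if line.strip()]
    if bim_int_contig:
        # Interpret the contig field directly as an integer.
        return [int(c) for c in contigs], None
    names = []      # distinct contig strings, in order of first appearance
    positions = {}  # contig string -> its index in `names`
    indexes = []
    for contig in contigs:
        if contig not in positions:
            positions[contig] = len(names)
            names.append(contig)
        indexes.append(positions[contig])
    return indexes, names
```

With bim_int_contig=False, each variant stores an index into the list of contig strings, so names such as "chrX" are handled without any numeric conversion.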

lock : bool (default: False)

Whether or not to synchronize concurrent reads of .bed file blocks, by default False. This is passed through to [dask.array.from_array](https://docs.dask.org/en/latest/array-api.html#dask.array.from_array).

persist : bool (default: True)

Whether or not to persist .fam and .bim information in memory, by default True. This is an important performance consideration as the plain text files for this data will be read multiple times when False. This can lead to load times that are upwards of 10x slower.

storage_options : {str: str} | None (default: None)

Any additional parameters for the storage backend (see fsspec.open).

Return type

None

© Copyright 2020, sgkit developers.