sgkit.io.plink.read_plink

sgkit.io.plink.read_plink(*, path=None, bed_path=None, bim_path=None, fam_path=None, chunks='auto', fam_sep=' ', bim_sep='\t', bim_int_contig=False, count_a1=False, lock=False, persist=True)

Read PLINK dataset.

Loads a single PLINK dataset as dask arrays within a Dataset from bed, bim, and fam files.

Parameters:
path str | Path | None (default: None)

Path to PLINK file set. This should not include a suffix, i.e. if the files are at data.{bed,fam,bim} then only 'data' should be provided (suffixes are added internally). Either this path must be provided, or all three of bed_path, bim_path and fam_path must be provided.

bed_path str | Path | None (default: None)

Path to PLINK bed file. This should be a full path including the .bed extension and cannot be specified in conjunction with path.

bim_path str | Path | None (default: None)

Path to PLINK bim file. This should be a full path including the .bim extension and cannot be specified in conjunction with path.

fam_path str | Path | None (default: None)

Path to PLINK fam file. This should be a full path including the .fam extension and cannot be specified in conjunction with path.
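
The rule for the four path arguments can be sketched in plain Python. The helper resolve_plink_paths below is hypothetical and is not part of sgkit; it only mirrors the documented behaviour (a prefix alone, or all three explicit paths). Raising ValueError when neither form is complete is an assumption of this sketch. The function load_with_sgkit shows the prefix form of the real call and needs sgkit with PLINK support installed.

```python
from pathlib import Path
from typing import Dict, Optional, Union

PathType = Union[str, Path]


def resolve_plink_paths(
    path: Optional[PathType] = None,
    bed_path: Optional[PathType] = None,
    bim_path: Optional[PathType] = None,
    fam_path: Optional[PathType] = None,
) -> Dict[str, Path]:
    """Hypothetical helper (not sgkit API) mirroring the documented rule."""
    explicit = (bed_path, bim_path, fam_path)
    if path is not None:
        # `path` is a suffix-free prefix and excludes the explicit paths.
        if any(p is not None for p in explicit):
            raise ValueError(
                "Pass either `path` or bed_path/bim_path/fam_path, not both"
            )
        # Append the suffix as text so prefixes containing dots stay intact.
        return {ext: Path(f"{path}.{ext}") for ext in ("bed", "bim", "fam")}
    # Assumption of this sketch: an incomplete explicit set is also an error.
    if any(p is None for p in explicit):
        raise ValueError(
            "Without `path`, all of bed_path, bim_path and fam_path are required"
        )
    return {"bed": Path(bed_path), "bim": Path(bim_path), "fam": Path(fam_path)}


def load_with_sgkit(prefix: PathType):
    """Prefix form of the documented call; sgkit is imported lazily."""
    from sgkit.io.plink import read_plink

    return read_plink(path=prefix)
```

For example, resolve_plink_paths(path="data") yields data.bed, data.bim and data.fam, which is the file set that read_plink(path="data") is documented to read.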

chunks str | int | tuple (default: 'auto')

Chunk size for genotype (i.e. .bed) data, by default 'auto'.

fam_sep str (default: ' ')

Delimiter for .fam file, by default ' ' (a single space).

bim_sep str (default: '\t')

Delimiter for .bim file, by default '\t' (a tab).

bim_int_contig bool (default: False)

Whether or not the contig/chromosome name in the .bim file should be interpreted as an integer, by default False. If False, then the variant/contig field in the resulting dataset will contain the indexes of corresponding strings encountered in the first .bim field.
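
The effect of bim_int_contig=False can be illustrated with a small stand-alone sketch. The function encode_contigs is hypothetical and is not sgkit's implementation; it assumes that indexes are assigned in order of first appearance, which the text above does not state explicitly.

```python
from typing import Dict, Iterable, List, Tuple


def encode_contigs(names: Iterable[str]) -> Tuple[List[int], List[str]]:
    """Map contig name strings to integer indexes (illustrative only).

    Assumption: a name receives the next free index the first time it is
    seen, so indexes follow order of first appearance.
    """
    index: Dict[str, int] = {}
    codes: List[int] = []
    for name in names:
        if name not in index:
            index[name] = len(index)
        codes.append(index[name])
    # Dicts preserve insertion order, so the keys list the contig names
    # in index order.
    return codes, list(index)


# First .bim field of five variants (made-up values for illustration).
codes, contig_names = encode_contigs(["1", "1", "2", "X", "2"])
```

Here codes plays the role of the contig index stored per variant, and contig_names is the lookup from index back to the original string.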

count_a1 bool (default: False)

Whether allele counts should be for A1 (True) or A2 (False), by default False. Note that count_a1=True is not currently supported; please open an issue if this is something you need. See https://www.cog-genomics.org/plink/1.9/data#ax_allele for more details.

lock bool (default: False)

Whether or not to synchronize concurrent reads of .bed file blocks, by default False. This is passed through to [dask.array.from_array](https://docs.dask.org/en/latest/array-api.html#dask.array.from_array).

persist bool (default: True)

Whether or not to persist .fam and .bim information in memory, by default True. This is an important performance consideration: when False, the plain text files for this data are read multiple times, which can make load times upwards of 10x slower.

Return type:

Dataset

Returns:

A dataset containing genotypes as 3-dimensional calls along with all accompanying pedigree and variant information. The content of this dataset includes:

  • sgkit.variables.variant_id_spec (variants)

  • sgkit.variables.variant_contig_spec (variants)

  • sgkit.variables.variant_position_spec (variants)

  • sgkit.variables.variant_allele_spec (variants)

  • sgkit.variables.sample_id_spec (samples)

  • sgkit.variables.call_genotype_spec (variants, samples, ploidy)

  • sgkit.variables.call_genotype_mask_spec (variants, samples, ploidy)

The following pedigree-specific fields are also included:

  • sample_family_id: Family identifier commonly referred to as FID, "." for missing

  • sample_member_id and sample_id: Within-family identifier for sample

  • sample_paternal_id: Within-family identifier for father of sample, "." for missing

  • sample_maternal_id: Within-family identifier for mother of sample, "." for missing

  • sample_sex: Sex code equal to 1 for male, 2 for female, and -1 for missing

  • sample_phenotype: Phenotype code equal to 1 for control, 2 for case, and -1 for missing

See https://www.cog-genomics.org/plink/1.9/formats#fam for more details.
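
As an illustration of the pedigree fields listed above, the following sketch parses a single .fam line into those fields. FamRecord and parse_fam_line are hypothetical names and this is not sgkit's reader. It assumes the PLINK convention that "0" marks an unknown family or parent, and it treats any sex or phenotype value other than 1 or 2 as missing.

```python
from typing import NamedTuple


class FamRecord(NamedTuple):
    """One .fam row, using the field names from the list above."""

    sample_family_id: str
    sample_member_id: str
    sample_paternal_id: str
    sample_maternal_id: str
    sample_sex: int
    sample_phenotype: int


def _missing_id(value: str) -> str:
    # Assumption: PLINK writes "0" for an unknown ID; report it as ".".
    return "." if value == "0" else value


def _code(value: str) -> int:
    # Only the codes 1 and 2 are kept; anything else becomes -1 (missing).
    return int(value) if value in ("1", "2") else -1


def parse_fam_line(line: str, fam_sep: str = " ") -> FamRecord:
    """Split one .fam line on fam_sep and normalise missing values."""
    fields = line.rstrip("\r\n").split(fam_sep)
    if len(fields) != 6:
        raise ValueError(f"Expected 6 fields in .fam line, got {len(fields)}")
    fid, iid, pat, mat, sex, pheno = fields
    return FamRecord(
        sample_family_id=_missing_id(fid),
        sample_member_id=iid,
        sample_paternal_id=_missing_id(pat),
        sample_maternal_id=_missing_id(mat),
        sample_sex=_code(sex),
        sample_phenotype=_code(pheno),
    )


# Made-up rows: a founder (male, case) and a child with unknown
# mother, sex and phenotype.
founder = parse_fam_line("FAM1 S1 0 0 1 2")
child = parse_fam_line("FAM1 S2 S1 0 0 -9\n")
```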

Raises:

ValueError – If path and one of bed_path, bim_path or fam_path are provided.

© Copyright 2020, sgkit developers.
