sgkit.simulate_genotype_call_dataset

sgkit.simulate_genotype_call_dataset(n_variant, n_sample, n_ploidy=2, n_allele=2, n_contig=1, seed=0, missing_pct=None, phased=None, additional_variant_fields=None)

Simulate genotype calls and variant/sample data.

Note that the data simulated by this function has no biological interpretation, and that summary statistics or other methods applied to it will produce meaningless results. This function is primarily a convenience for generating xarray.Dataset containers, so quantities of interest should be overwritten, where appropriate, within the context of a more specific application.

Parameters:
n_variant int

Number of variants to simulate

n_sample int

Number of samples to simulate

n_ploidy int (default: 2)

Number of chromosome copies in each sample

n_allele int (default: 2)

Number of alleles to simulate

n_contig int (default: 1)

Number of contigs to partition variants with, controlling the values in variant_contig. With the default n_contig of 1, all values are 0. Optional.

seed Optional[int] (default: 0)

Seed for random number generation, optional

missing_pct Optional[float] (default: None)

The proportion of missing calls, expressed as a fraction within [0.0, 1.0], optional

phased Optional[bool] (default: None)

Whether genotypes are phased, default is unphased, optional

additional_variant_fields Optional[dict] (default: None)

Additional variant fields to add to the dataset as a dictionary of {field_name: field_dtype}, optional

Return type:

Dataset

Returns:

A dataset containing the following variables: