sgkit.simulate_genotype_call_dataset
sgkit.simulate_genotype_call_dataset(n_variant, n_sample, n_ploidy=2, n_allele=2, n_contig=1, seed=0, missing_pct=None, phased=None, additional_variant_fields=None)
Simulate genotype calls and variant/sample data.
Note that the data simulated by this function has no biological interpretation, and summary statistics or other methods applied to it will produce meaningless results. This function is primarily a convenience for generating xarray.Dataset containers, so quantities of interest should be overwritten, where appropriate, within the context of a more specific application.

Parameters:
- n_variant : int
  Number of variants to simulate.
- n_sample : int
  Number of samples to simulate.
- n_ploidy : int (default: 2)
  Number of chromosome copies in each sample.
- n_allele : int (default: 2)
  Number of alleles to simulate.
- n_contig : int (default: 1)
  Number of contigs to partition variants with, controlling the values in variant_contig. Values will all be 0 by default when n_contig is 1.
- seed : int | None (default: 0)
  Seed for random number generation. Optional.
- missing_pct : float | None (default: None)
  The proportion of missing calls, given as a fraction within [0.0, 1.0]. Optional.
- phased : bool | None (default: None)
  Whether genotypes are phased. Genotypes are unphased by default. Optional.
- additional_variant_fields : dict | None (default: None)
  Additional variant fields to add to the dataset, as a dictionary of {field_name: field_dtype}. Optional.
Return type:
xarray.Dataset
Returns:
A dataset containing the following variables:
- sgkit.variables.variant_contig_spec (variants)
- sgkit.variables.variant_position_spec (variants)
- sgkit.variables.variant_allele_spec (variants)
- sgkit.variables.sample_id_spec (samples)
- sgkit.variables.call_genotype_spec (variants, samples, ploidy)
- sgkit.variables.call_genotype_mask_spec (variants, samples, ploidy)
- sgkit.variables.call_genotype_phased_spec (variants, samples), if phased is not None
- Those specified in additional_variant_fields, if provided