sgkit.simulate_genotype_call_dataset
- sgkit.simulate_genotype_call_dataset(n_variant, n_sample, n_ploidy=2, n_allele=2, n_contig=1, seed=0, missing_pct=None)
Simulate genotype calls and variant/sample data.
Note that the data simulated by this function has no biological interpretation and that summary statistics or other methods applied to it will produce meaningless results. This function is primarily a convenience for generating
xarray.Dataset containers, so quantities of interest should be overwritten, where appropriate, within the context of a more specific application.
- Parameters
- n_variant :
int Number of variants to simulate
- n_sample :
int Number of samples to simulate
- n_ploidy :
int (default: 2) Number of chromosome copies in each sample
- n_allele :
int (default: 2) Number of alleles to simulate
- n_contig :
int (default: 1) Number of contigs to partition variants with, controlling values in
variant_contig. Values will all be 0 by default when n_contig is 1. Optional
- seed :
Optional[int] (default: 0) Seed for random number generation, optional
- missing_pct :
Optional[float] (default: None) Fraction of calls to mark as missing; must be within [0.0, 1.0], optional
- Return type
Dataset
- Returns
A dataset containing the following variables:
- sgkit.variables.variant_contig_spec (variants)
- sgkit.variables.variant_position_spec (variants)
- sgkit.variables.variant_allele_spec (variants)
- sgkit.variables.sample_id_spec (samples)
- sgkit.variables.call_genotype_spec (variants, samples, ploidy)
- sgkit.variables.call_genotype_mask_spec (variants, samples, ploidy)