sgkit.simulate_genotype_call_dataset
- sgkit.simulate_genotype_call_dataset(n_variant, n_sample, n_ploidy=2, n_allele=2, n_contig=1, seed=0, missing_pct=None)
Simulate genotype calls and variant/sample data.
Note that the data simulated by this function has no biological interpretation, and summary statistics or other methods applied to it will produce meaningless results. This function is primarily a convenience for generating xarray.Dataset containers, so quantities of interest should be overwritten, where appropriate, within the context of a more specific application.
- Parameters
- n_variant : int
Number of variants to simulate.
- n_sample : int
Number of samples to simulate.
- n_ploidy : int (default: 2)
Number of chromosome copies in each sample.
- n_allele : int (default: 2)
Number of alleles to simulate.
- n_contig : int (default: 1)
Optional. Number of contigs to partition variants with, controlling values in variant_contig. Values will all be 0 by default when n_contig is 1.
- seed : Optional[int] (default: 0)
Optional. Seed for random number generation.
- missing_pct : Optional[float] (default: None)
Optional. Denotes the fraction of calls that are missing; must be within [0.0, 1.0].
- Return type
xarray.Dataset
- Returns
A dataset containing the following variables:
- sgkit.variables.variant_contig_spec (variants)
- sgkit.variables.variant_position_spec (variants)
- sgkit.variables.variant_allele_spec (variants)
- sgkit.variables.sample_id_spec (samples)
- sgkit.variables.call_genotype_spec (variants, samples, ploidy)
- sgkit.variables.call_genotype_mask_spec (variants, samples, ploidy)
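
The layout described above can be illustrated with a minimal pure-Python sketch. It is not sgkit's implementation: the helper name `sketch_genotype_calls`, the near-equal contiguous contig partitioning, the encoding of missing calls as -1, and the rule that the mask is True wherever a call is negative are all assumptions made here for illustration. Plain nested lists stand in for the (variants, samples, ploidy) arrays that the real function returns inside an xarray.Dataset.

```python
import random


def sketch_genotype_calls(n_variant, n_sample, n_ploidy=2, n_allele=2,
                          n_contig=1, seed=0, missing_pct=None):
    """Hypothetical stand-in that mimics the shape of the simulated data."""
    rng = random.Random(seed)

    # Assumed scheme: split variants into n_contig contiguous groups of
    # near-equal size, so a single contig yields all zeros.
    base, extra = divmod(n_variant, n_contig)
    sizes = [base + (1 if i < extra else 0) for i in range(n_contig)]
    variant_contig = [c for c, size in enumerate(sizes) for _ in range(size)]

    # Nested lists indexed as [variant][sample][ploidy]; each entry is an
    # allele index drawn uniformly from 0 .. n_allele - 1.
    call_genotype = [
        [[rng.randrange(n_allele) for _ in range(n_ploidy)]
         for _ in range(n_sample)]
        for _ in range(n_variant)
    ]

    # Assumed encoding: each allele call is independently replaced by -1
    # with probability missing_pct.
    if missing_pct:
        for variant in call_genotype:
            for call in variant:
                for p in range(n_ploidy):
                    if rng.random() < missing_pct:
                        call[p] = -1

    # The mask has the same shape and flags the missing entries.
    call_genotype_mask = [
        [[allele < 0 for allele in call] for call in variant]
        for variant in call_genotype
    ]

    return {
        "variant_contig": variant_contig,
        "call_genotype": call_genotype,
        "call_genotype_mask": call_genotype_mask,
    }


ds = sketch_genotype_calls(n_variant=6, n_sample=3, n_contig=2, missing_pct=0.5)
print(ds["variant_contig"])
print(ds["call_genotype"][0])
print(ds["call_genotype_mask"][0])
```

With sgkit installed, the corresponding call under the signature documented above would be `sgkit.simulate_genotype_call_dataset(n_variant=6, n_sample=3, n_contig=2, missing_pct=0.5)`. As the note at the top of this page says, values produced this way carry no biological meaning and are only useful as placeholders.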