sgkit.pc_relate
- sgkit.pc_relate(ds, *, maf=0.01, call_genotype='call_genotype', call_genotype_mask='call_genotype_mask', sample_pc='sample_pca_projection', merge=True)
Compute PC-Relate as described in Conomos, et al. 2016 [1].
This method computes the kinship coefficient matrix. The kinship coefficient for a pair of individuals i and j is commonly defined to be the probability that a random allele selected from i and a random allele selected from j at a locus are IBD (identical by descent). Several of the most common family relationships and their corresponding kinship coefficients:

| Relationship | Kinship coefficient |
| --- | --- |
| Individual-self | 1/2 |
| full sister/full brother | 1/4 |
| mother/father/daughter/son | 1/4 |
| grandmother/grandfather/granddaughter/grandson | 1/8 |
| aunt/uncle/niece/nephew | 1/8 |
| first cousin | 1/16 |
| half-sister/half-brother | 1/8 |
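The values in the table can be reproduced with the classical path-counting rule for non-inbred pedigrees: each path linking the two individuals through a common ancestor contributes (1/2)^(n+1), where n is the number of parent-child steps on that path. The sketch below is illustrative only; the helper name `kinship_from_paths` and the path lists are assumptions made for this example and are not part of sgkit.

```python
from fractions import Fraction


def kinship_from_paths(path_lengths):
    """Sum (1/2)**(n + 1) over the paths connecting two non-inbred relatives.

    Each entry of path_lengths is the number of parent-child steps (meioses)
    on one path through one common ancestor. Hypothetical helper, not sgkit API.
    """
    return sum(Fraction(1, 2) ** (n + 1) for n in path_lengths)


# One list entry per common-ancestor path (assumed pedigrees, no inbreeding).
relationships = {
    "individual-self": [0],         # same individual, zero steps
    "full siblings": [2, 2],        # via mother and via father
    "parent/child": [1],            # the parent is the common ancestor
    "grandparent/grandchild": [2],
    "aunt or uncle/niece or nephew": [3, 3],
    "first cousins": [4, 4],        # via each shared grandparent
    "half siblings": [2],           # one shared parent
}

for name, paths in relationships.items():
    print(f"{name}: {kinship_from_paths(paths)}")
```

Using `Fraction` keeps the results exact, so they can be compared directly with the fractions in the table.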
- Parameters:
- ds
Dataset
Dataset containing (S = num samples, V = num variants, D = ploidy, PC = num PC)
genotype calls: (SxVxD)
genotype calls mask: (SxVxD)
sample PCs: (SxPC)
- maf
float
(default: 0.01) Individual minor allele frequency filter. If an individual’s estimated individual-specific minor allele frequency at a SNP is less than this value, that SNP will be excluded from the analysis for that individual. Must be in the open interval (0.0, 1.0).
- call_genotype
Hashable
(default: 'call_genotype') Input variable name holding call_genotype. Defined by sgkit.variables.call_genotype_spec.
- call_genotype_mask
Hashable
(default: 'call_genotype_mask') Input variable name holding call_genotype_mask. Defined by sgkit.variables.call_genotype_mask_spec.
- sample_pc
Hashable
(default: 'sample_pca_projection') Input variable name holding sample principal components. Defined by sgkit.variables.sample_pca_projection_spec.
- merge
bool
(default: True) If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables. See Dataset merge behavior for more details.
Warning
This function is only applicable to diploid, biallelic datasets. This version is compatible with the R implementation of the PC-Relate method from the GENESIS package, version 2.18.0.
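To make the roles of the genotype calls, the sample PCs, and the maf filter more concrete, here is a minimal NumPy sketch of a PC-Relate style estimator for diploid, biallelic data. It is not sgkit's implementation: the function name `pc_relate_sketch`, the collapsed (samples x variants) alternate-allele-count input, and the assumption of no missing calls are all simplifications made for this example.

```python
import numpy as np


def pc_relate_sketch(g, pcs, maf=0.01):
    """Illustrative PC-Relate style kinship estimate (hypothetical helper).

    g:   (S, V) alternate-allele counts in {0, 1, 2}, assumed complete.
    pcs: (S, PC) sample principal components.
    """
    n_samples = g.shape[0]
    # Design matrix: intercept plus the sample PCs.
    x = np.column_stack([np.ones(n_samples), pcs])
    # Regress g / 2 on the PCs to get individual-specific allele frequencies.
    beta, *_ = np.linalg.lstsq(x, g / 2.0, rcond=None)
    mu = x @ beta  # shape (S, V)
    # Drop a SNP for an individual when its individual-specific MAF is below maf.
    keep = np.minimum(mu, 1.0 - mu) >= maf
    resid = np.where(keep, g - 2.0 * mu, 0.0)
    scale = np.where(keep, np.sqrt(np.clip(mu * (1.0 - mu), 0.0, None)), 0.0)
    # Masked entries are zero, so each pair only uses SNPs kept for both samples.
    return (resid @ resid.T) / (4.0 * (scale @ scale.T))


# Simulated unrelated samples; the PCs here are random noise for illustration.
rng = np.random.default_rng(0)
n_samples, n_variants = 50, 2000
freqs = rng.uniform(0.1, 0.9, size=n_variants)
g = rng.binomial(2, freqs, size=(n_samples, n_variants))
pcs = rng.normal(size=(n_samples, 2))

phi = pc_relate_sketch(g, pcs)
print(phi.shape)
```

Because masked entries are set to zero in both the numerator and the denominator, a pair of samples with no SNP passing the filter for both would produce a division by zero; a real implementation needs to handle that case and missing calls.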
- Return type:
Dataset
- Returns:
Dataset containing (S = num samples):
sgkit.variables.pc_relate_phi_spec
: (S,S) ArrayLike pairwise recent kinship coefficient matrix as float in [-0.5, 0.5]. The dimensions are named samples_0 and samples_1.
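As an illustration of how entries of the kinship matrix are often interpreted, the sketch below bins a single coefficient into a relationship degree. The cutoffs (powers of two halfway between the expected values for successive degrees) follow a convention popularized by the KING software; they are an assumption of this example and not something sgkit applies or returns, and the function name `relationship_degree` is hypothetical.

```python
def relationship_degree(phi):
    """Bin a kinship coefficient into a coarse relationship class.

    Cutoffs are geometric midpoints between expected kinship values
    (1/2, 1/4, 1/8, 1/16); an assumed convention, not part of sgkit.
    """
    if phi > 2 ** -1.5:   # about 0.354
        return "duplicate or monozygotic twin"
    if phi > 2 ** -2.5:   # about 0.177
        return "first degree"
    if phi > 2 ** -3.5:   # about 0.088
        return "second degree"
    if phi > 2 ** -4.5:   # about 0.044
        return "third degree"
    return "unrelated"


for value in (0.5, 0.25, 0.125, 0.0625, 0.0):
    print(value, relationship_degree(value))
```

Estimated coefficients are noisy and can be slightly negative, so values near a cutoff should be read with caution.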
References
[1] - Conomos, Matthew P., Alexander P. Reiner, Bruce S. Weir, and Timothy A. Thornton. 2016. “Model-Free Estimation of Recent Genetic Relatedness.” American Journal of Human Genetics 98 (1): 127–48.
- Raises:
ValueError – If ploidy of provided dataset != 2
ValueError – If maximum number of alleles in provided dataset != 2
ValueError – If the input dataset is missing any of the required variables
ValueError – If maf is not in (0.0, 1.0)
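The checks listed above can be pictured with a small standalone validator. This is a sketch under stated assumptions, not sgkit's code: the function name `check_pc_relate_inputs` is hypothetical, the genotype array is assumed to have ploidy as its last axis, and the largest allele index is used as a stand-in for the number of alleles in the dataset.

```python
import numpy as np


def check_pc_relate_inputs(call_genotype, maf):
    """Raise ValueError for inputs the method would reject (hypothetical helper)."""
    # Ploidy is assumed to be the last axis of the genotype call array.
    if call_genotype.shape[-1] != 2:
        raise ValueError("PC-Relate is only applicable to diploid data")
    # Allele indices above 1 imply more than two alleles (negative = missing).
    if call_genotype.max() > 1:
        raise ValueError("PC-Relate is only applicable to biallelic data")
    # maf must lie strictly inside the open interval (0.0, 1.0).
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must be in (0.0, 1.0)")


diploid_biallelic = np.array([[[0, 1], [1, 1]], [[0, 0], [-1, -1]]])
check_pc_relate_inputs(diploid_biallelic, maf=0.01)  # passes silently
```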