sgkit.pc_relate

sgkit.pc_relate(ds, *, maf=0.01, call_genotype='call_genotype', call_genotype_mask='call_genotype_mask', sample_pc='sample_pca_projection', merge=True)

Compute PC-Relate as described in Conomos et al. 2016 [1].

This method computes the kinship coefficient matrix. The kinship coefficient for a pair of individuals i and j is commonly defined as the probability that an allele selected at random from i and an allele selected at random from j at the same locus are identical by descent (IBD). Several of the most common family relationships and their corresponding kinship coefficients are:

Relationship                                       Kinship coefficient
Individual-self                                    1/2
Full sister/full brother                           1/4
Mother/father/daughter/son                         1/4
Grandmother/grandfather/granddaughter/grandson     1/8
Aunt/uncle/niece/nephew                            1/8
First cousin                                       1/16
Half-sister/half-brother                           1/8
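The table values follow directly from the definition above. As an illustrative sketch (not part of sgkit), the snippet below assumes unrelated, non-inbred founders that each carry two distinct allele labels, enumerates every equally likely Mendelian transmission, and averages the fraction of allele draws that are IBD. The founder labels and helper functions are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical founders: every allele label is unique, so two alleles are IBD
# exactly when their labels match (assumes unrelated, non-inbred founders).
MOTHER = ("m1", "m2")
FATHER = ("f1", "f2")
OTHER_FATHER = ("o1", "o2")


def pair_kinship(a, b):
    """Fraction of (allele from a, allele from b) draws that are IBD."""
    draws = [(x, y) for x in a for y in b]
    return Fraction(sum(x == y for x, y in draws), len(draws))


def children(mother, father):
    """All four equally likely children: one allele from each parent."""
    return [(m, f) for m in mother for f in father]


def average(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


self_k = pair_kinship(MOTHER, MOTHER)
parent_child = average(pair_kinship(MOTHER, c) for c in children(MOTHER, FATHER))
full_sibs = average(
    pair_kinship(c1, c2)
    for c1, c2 in product(children(MOTHER, FATHER), repeat=2)
)
half_sibs = average(
    pair_kinship(c1, c2)
    for c1, c2 in product(children(MOTHER, FATHER), children(MOTHER, OTHER_FATHER))
)
print(self_k, parent_child, full_sibs, half_sibs)  # 1/2 1/4 1/4 1/8
```

Full siblings reach 1/4 because each parent's transmitted alleles match with probability 1/2, giving one expected IBD match out of four draws; half siblings share only one parent, which halves that to 1/8.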

Parameters:
ds Dataset

Dataset containing (S = number of samples, V = number of variants, D = ploidy, PC = number of principal components):

  • genotype calls: (VxSxD)

  • genotype calls mask: (VxSxD)

  • sample PCs: (SxPC)
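PC-Relate works with per-call alternate allele counts rather than with individual alleles. A minimal NumPy sketch of that reduction, assuming sgkit's (variants, samples, ploidy) layout with negative values marking missing alleles. The helper name `collapse_ploidy` is hypothetical.

```python
import numpy as np


def collapse_ploidy(call_genotype):
    """Reduce (variants, samples, ploidy) calls to alternate allele counts.

    Assumes 0 = reference, 1 = alternate, negative = missing allele.
    Returns (dosage, mask), both (variants, samples); a call is masked
    (and its dosage set to -1) if any of its alleles is missing.
    """
    call_genotype = np.asarray(call_genotype)
    mask = (call_genotype < 0).any(axis=-1)
    dosage = np.where(mask, -1, call_genotype.sum(axis=-1))
    return dosage, mask


# 2 variants x 3 samples x 2 alleles per call (diploid)
gt = np.array([
    [[0, 0], [0, 1], [1, 1]],
    [[1, 0], [-1, 1], [0, 0]],
])
dosage, mask = collapse_ploidy(gt)
# dosage -> [[0, 1, 2], [1, -1, 0]]
```

Treating a call as missing when any of its alleles is missing keeps the per-allele genotype calls mask and the per-call dosage consistent.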

maf float (default: 0.01)

Individual minor allele frequency filter. If an individual's estimated individual-specific minor allele frequency at a SNP is less than this value, that SNP is excluded from the analysis for that individual. The default value is 0.01. Must be in the open interval (0.0, 1.0).
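To make the role of maf concrete, here is a simplified, in-memory NumPy sketch of the two estimation steps in Conomos et al. First, individual-specific allele frequencies are fitted by regressing each variant's halved allele counts on an intercept plus the sample PCs. Next, SNPs whose fitted frequency is at or below maf (or at or above 1 - maf) are dropped per individual. Finally, kinship is computed as a ratio of standardized cross-products. This is an illustration under those assumptions, not sgkit's Dask implementation. The function name `pc_relate_phi_sketch` and the variant-mean imputation of missing calls are choices made for the example.

```python
import numpy as np


def pc_relate_phi_sketch(dosage, mask, pcs, maf=0.01):
    """Simplified PC-Relate kinship estimate (after Conomos et al. 2016).

    dosage: (variants, samples) alternate allele counts in {0, 1, 2}
    mask:   (variants, samples) True where the call is missing
    pcs:    (samples, components) sample principal components
    Returns a (samples, samples) kinship matrix.
    """
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must be in (0.0, 1.0)")
    n_samples = dosage.shape[1]
    # Impute missing calls with the per-variant mean so the regression is defined.
    g = np.where(mask, np.nan, dosage).astype(float)
    g = np.where(mask, np.nanmean(g, axis=1, keepdims=True), g)
    # Individual-specific allele frequencies: least-squares fit of g / 2
    # on an intercept plus the PCs, one fit per variant.
    design = np.column_stack([np.ones(n_samples), pcs])
    beta, *_ = np.linalg.lstsq(design, (g / 2).T, rcond=None)
    mu = (design @ beta).T
    # Per-individual MAF filter: drop a SNP for an individual when the fitted
    # frequency is <= maf or >= 1 - maf; missing calls are dropped too.
    drop = mask | (mu <= maf) | (mu >= 1.0 - maf)
    centered = np.where(drop, 0.0, dosage / 2 - mu)
    stddev = np.where(drop, 0.0, np.sqrt(np.clip(mu * (1.0 - mu), 0.0, None)))
    # Ratio of cross-products over the SNPs retained for both members of a pair.
    return (centered.T @ centered) / (stddev.T @ stddev)


# Toy data: 20 samples, where sample 1 is an exact duplicate of sample 0.
rng = np.random.default_rng(0)
dosage = rng.binomial(2, 0.4, size=(500, 20))
dosage[:, 1] = dosage[:, 0]
mask = np.zeros(dosage.shape, dtype=bool)
pcs = rng.normal(size=(20, 2))
pcs[1] = pcs[0]
phi = pc_relate_phi_sketch(dosage, mask, pcs)
```

Because a dropped SNP contributes zero to both the numerator and the denominator, each pair (j, k) is effectively estimated over the SNPs that pass the filter for both j and k.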

call_genotype Hashable (default: 'call_genotype')

Input variable name holding call_genotype. Defined by sgkit.variables.call_genotype_spec.

call_genotype_mask Hashable (default: 'call_genotype_mask')

Input variable name holding call_genotype_mask. Defined by sgkit.variables.call_genotype_mask_spec.

sample_pc Hashable (default: 'sample_pca_projection')

Input variable name holding sample principal components. Defined by sgkit.variables.sample_pca_projection_spec.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Warning

This function is only applicable to diploid, biallelic datasets. This version is compatible with the R implementation of the PC-Relate method in the GENESIS package, version 2.18.0.

Return type:

Dataset

Returns:

Dataset containing (S = number of samples):

sgkit.variables.pc_relate_phi_spec: (S, S) ArrayLike, the pairwise recent kinship coefficient matrix, with float values in [-0.5, 0.5]. The dimensions are named samples_0 and samples_1.
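One common way to read the returned matrix is to bin each pair into a degree of relationship. The expected kinship of a degree-d relative is 2^-(d+1): 1/2 for self, then 1/4, 1/8 and 1/16 for first- to third-degree relatives, matching the table above. Cutoffs at 2^-(d+1.5), which lie between consecutive expected values on a log scale, are a widely used convention (for example in the KING software). pc_relate itself does not apply these cutoffs, and the helpers below are hypothetical.

```python
def relationship_degree(phi):
    """Map a kinship estimate to an inferred degree of relationship.

    Returns 0 for duplicates/identical twins, 1-3 for first- to
    third-degree relatives, and None for more distant or unrelated pairs
    (including the slightly negative estimates unrelated pairs can get).
    """
    for degree in range(4):
        # Cutoff between expected values 2**-(degree + 1) and 2**-(degree + 2).
        if phi > 2 ** -(degree + 1.5):
            return degree
    return None


def related_pairs(phi_matrix):
    """List (i, j, degree) for every pair i < j with an inferred degree."""
    n = len(phi_matrix)
    return [
        (i, j, d)
        for i in range(n)
        for j in range(i + 1, n)
        if (d := relationship_degree(phi_matrix[i][j])) is not None
    ]


# Toy symmetric kinship matrix: 0-1 look first-degree, 1-2 second-degree.
example = [
    [0.50, 0.26, 0.00],
    [0.26, 0.50, 0.13],
    [0.00, 0.13, 0.50],
]
pairs = related_pairs(example)  # [(0, 1, 1), (1, 2, 2)]
```

Applied to real output, `phi_matrix` would be the materialized pc_relate_phi values; only the upper triangle is scanned because the matrix is symmetric.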

References

[1] - Conomos, Matthew P., Alexander P. Reiner, Bruce S. Weir, and Timothy A. Thornton. 2016. “Model-Free Estimation of Recent Genetic Relatedness.” American Journal of Human Genetics 98 (1): 127–48.

Raises:
  • ValueError – If ploidy of provided dataset != 2

  • ValueError – If maximum number of alleles in provided dataset != 2

  • ValueError – If the input dataset is missing any of the required variables

  • ValueError – If maf is not in (0.0, 1.0)
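The conditions above can also be checked before calling pc_relate. Below is a hypothetical pre-flight check. It assumes the caller passes the dataset's dimension sizes (for an xarray Dataset, `ds.sizes`) and its variable names. The helper and its messages are illustrative and not sgkit's own.

```python
REQUIRED_VARIABLES = ("call_genotype", "call_genotype_mask", "sample_pca_projection")


def check_pc_relate_inputs(sizes, variable_names, maf=0.01, required=REQUIRED_VARIABLES):
    """Raise ValueError for the input problems listed under Raises (hypothetical helper)."""
    if sizes.get("ploidy") != 2:
        raise ValueError("PC-Relate requires diploid genotypes (ploidy == 2)")
    if sizes.get("alleles") != 2:
        raise ValueError("PC-Relate requires biallelic variants (alleles == 2)")
    missing = [name for name in required if name not in variable_names]
    if missing:
        raise ValueError(f"Missing required variables: {missing}")
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must be in (0.0, 1.0)")


# A diploid, biallelic dataset description that passes every check.
sizes = {"variants": 1000, "samples": 50, "ploidy": 2, "alleles": 2}
names = set(REQUIRED_VARIABLES)
check_pc_relate_inputs(sizes, names, maf=0.01)
```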