### TIPS ###
# We first load all the libraries we will need for this practical. 
# Make sure you have those installed 
# If not then please visit https://www2.unil.ch/popgen/teaching/SISG22/
library(gaston)

## Warning: le package 'gaston' a été compilé avec la version R
## 4.2.3

## Warning: le package 'Rcpp' a été compilé avec la version R
## 4.2.3

## Warning: le package 'RcppParallel' a été compilé avec la
## version R 4.2.3

library(hierfstat)
library(JGTeach)

For this practical we will use both simulated pedigrees, simulated genetic data on these pedigrees and the small chunck of chromosome 22 from the 1000 genomes.

Kinship and inbreeding from pedigrees

load a pedigree from a monogamous population using read.table. Explore the pedigree: what are the first, second and third column? How many founders in this pedigree? Find at least two families with sibships of four (just inspect the pedigree).

Code

ped <- read.table("https://www2.unil.ch/popgen/teaching/SISGData/pedmono.txt",header=TRUE)
str(ped)
dim(ped)
#first column is the individual identifier
#second column is the female id 
#third column is the male id 


# number of founders:
# founders are individuals with missing sires and dams

sum(is.na(ped$sire) & is.na(ped$dam))

Answers

The first column is the individual identifier. The second column is the female id (dam is the term used in animal breeding for mother), while the third column is the male id (sire is the term used in animal breeding for father).

The number of founders is obtained from identifying how many members in the pedigree have unidentified parents. We find 20 founders.

individuals with IDs 28 to 32 share the same parents, and thus belong to the same family. Same for individuals 59 to 62.:

ped[28:32,]

##    ind dam sire
## 28  28  14   10
## 29  29  14   10
## 30  30  14   10
## 31  31  14   10
## 32  32  14   10

ped[59:62,]

##    ind dam sire
## 59  59  24   35
## 60  60  24   35
## 61  61  24   35
## 62  62  24   35

#first inds of gen 1:5 are   c(21,47,77,117,159)

Calculate the additive relationship matrix of individuals from this pedigree using JGTeach::pedARM. What is the pedigree relatedness among the sibship previously identified? and of these individuals with their parents? Examine sibship 92:95. What is special about it?

Code

# calculate relatedness matrix
ARM <- JGTeach::pedARM(ped$sire,ped$dam)

# a sibship 
ARM[28:32,28:32]

##      [,1] [,2] [,3] [,4] [,5]
## [1,]  1.0  0.5  0.5  0.5  0.5
## [2,]  0.5  1.0  0.5  0.5  0.5
## [3,]  0.5  0.5  1.0  0.5  0.5
## [4,]  0.5  0.5  0.5  1.0  0.5
## [5,]  0.5  0.5  0.5  0.5  1.0

ped[28:32,]

##    ind dam sire
## 28  28  14   10
## 29  29  14   10
## 30  30  14   10
## 31  31  14   10
## 32  32  14   10

# relatedness with parents
ARM[28:32,c(ped[28,2],ped[28,3])]

##      [,1] [,2]
## [1,]  0.5  0.5
## [2,]  0.5  0.5
## [3,]  0.5  0.5
## [4,]  0.5  0.5
## [5,]  0.5  0.5

# another example
ARM[59:62,59:62]

##      [,1] [,2] [,3] [,4]
## [1,]  1.0  0.5  0.5  0.5
## [2,]  0.5  1.0  0.5  0.5
## [3,]  0.5  0.5  1.0  0.5
## [4,]  0.5  0.5  0.5  1.0

ARM[59:62,c(ped[59,2],ped[59,3])]

##      [,1] [,2]
## [1,]  0.5  0.5
## [2,]  0.5  0.5
## [3,]  0.5  0.5
## [4,]  0.5  0.5

Answers

The pedigree relatedness among these sibship is 0.5, as is the relatedness of the sibships with their parents.

Sibship 92 to 95 is special in that the relatedness among sibs is larger than 0.5:

#relatedness higher than 0.5 with their siblings
ARM[92:95,92:95]

##         [,1]    [,2]    [,3]    [,4]
## [1,] 1.03125 0.56250 0.56250 0.56250
## [2,] 0.56250 1.03125 0.56250 0.56250
## [3,] 0.56250 0.56250 1.03125 0.56250
## [4,] 0.56250 0.56250 0.56250 1.03125

#and their parents
ARM[92:95,c(ped[92,2],ped[92,3])]

##         [,1]    [,2]
## [1,] 0.59375 0.53125
## [2,] 0.59375 0.53125
## [3,] 0.59375 0.53125
## [4,] 0.59375 0.53125

This is because their parents are related to each other:

#their parents are related
ARM[c(ped[92,2],ped[92,3]),c(ped[92,2],ped[92,3])]

##        [,1]   [,2]
## [1,] 1.1250 0.0625
## [2,] 0.0625 1.0000

Plot a histogram of the relatedness using only off diagonal elements. The function hierfstat::mat2vec will be useful. Comments? Transform the relatedness matrix into a kinship and inbreeding matrix using hierfstat::GRM2kinship. Plot a heatmap of resulting matrix using image and interpret it.

Code

hist(mat2vec(ARM), xlab='relatedness')

ped.kin <- hierfstat::grm2kinship(ARM)
# plot kinship with generation delimiters 
image(ped$ind,ped$ind,ped.kin)
genlim<-c(21,47,77,117,159)
#add vertical and horizontal lines between generations
abline(h=genlim-.5,v=genlim-.5)

### TIP
# a different way of plotting it using the corrplot function:
# library(corrplot) ; corrplot(ARM/2, method = 'shade', tl.pos='n', is.corr=F,diag=F)
# notice that the ARM/2 will translate relatedness to kinship but only for the off-diagonal elements

Answers

The histogram of relatedness shows a (small) majority of individuals are unrelated, and most have a relatedness coefficient \(\leq 0.25\). The heatmap shows relatedness to increase from left to right and bottom to top, that is, as pedigree depth increases. From the 1st generation following founders, we see redish blocks along the main diagonal corresponding to sibships, and, as we move right and top, these blocks become reder, meaning that relatedness among sibs increase due to the relatedness of their parents. We see that four generations after founding of the pedigree, most individuals are related. This is due to the small number of founders of this pedigree, and may not happen in larger ones, but it is a genaral observation that as the depth of pedigree increases, individuals of later generations are more related.

Kinship and inbreeding from markers, compared to pedigrees

We have generated genotypes dosage from these individuals. First download, then load this file using readRDS, and compute

Allele sharing kinship using hierfstat::beta.dosage
Standard GRM (ratio of averages) using JGTeach::Kc0
Standard GRM (average of ratios) as computed in e.g. GCTA using gaston::GRM Compare each to the expected values obtained from the pedigree and discuss.

Code

genoped <- readRDS("../geno_mono.RDS")
# calculate allele sharing kinship
genoped.Kas <- hierfstat::beta.dosage(genoped,inb=FALSE,Mb=TRUE)
# calculate ratio of averages
genoped.M <- with(genoped.Kas,betas*(1-MB)+MB)
genoped.Kc0 <- JGTeach::Kc0(genoped.M,matching=TRUE)
# calculate average of ratios
bed.genoped <- gaston::as.bed.matrix(genoped)
genoped.std.GRM <- gaston::GRM(bed.genoped,autosome.only=FALSE)

# Plotting
par(mfrow=c(1,3))
plot(mat2vec(ARM)/2,mat2vec(genoped.Kas$betas),
     cex=0.5,col="red",main=expression(K[AS]),
     xlab="ped kinship",ylab="est. kinship", pch=20)
abline(c(0,1))
plot(mat2vec(ARM)/2,mat2vec(genoped.Kc0),
     cex=0.5,col="red",main=expression(K[c0]),
     xlab="ped kinship",ylab="est. kinship", pch=20)
abline(c(0,1))

# gaston::GRM reports relatedness rather than kinship, hence the /2
plot(mat2vec(ARM)/2,mat2vec(genoped.std.GRM)/2,
     cex=0.5,col="red",main="standard kinship",
     xlab="ped kinship",ylab="est. kinship", pch=20)
abline(c(0,1))

par(mfrow=c(1,1))

Answers

We first notice that most genomic based estimates are lower than the corresponding pedigree values, as all points are below the \(y=x\) line. We see that genomic estimates are often negative, while all pedigree based one are \(\geq 0\). We also observe that the allele sharing (\(K_{AS}\), left) estimates are less variable than the standard estimates based on ratio of averages (\(K_{c0}\), middle) or average of ratios (standard, right), this later one also showing a lower slope than the other twos.

Redo these comparisons after having “rescaled” the pedigree kinship such that mean expected kinship is 0 and discuss

Code

#we divide by two because we want kinships rather than relatedness
mARM<-mean(mat2vec(ARM)/2)
# (kinship - mean) / (1 - mean)
ARMc<-(ARM/2-mARM)/(1-mARM)

# Plotting 
par(mfrow=c(1,3))
plot(mat2vec(ARMc),mat2vec(genoped.Kas$betas),
     cex=0.5,col="red",main=expression(K[AS]),
     xlab="ped kinship",ylab="est. kinship",pch=20)
abline(c(0,1))
plot(mat2vec(ARMc),mat2vec(genoped.Kc0),
     cex=0.5,col="red",main=expression(K[c0]),
     xlab="ped kinship",ylab="est. kinship",pch=20)
abline(c(0,1))
plot(mat2vec(ARMc),mat2vec(genoped.std.GRM)/2,
     cex=0.5,col="red",main="standard kinship",
     xlab="ped kinship",ylab="est. kinship",pch=20)
abline(c(0,1))

par(mfrow=c(1,1))

Answers

Once rescaled, we observe that the allele sharing estimates \(K_{AS}\) are all very close to the \(y=x\) line, meaning a good correspondence between pedigree and genomic inbreeding. The discrepancies between \(K_{c0}\) and pedigree kinship is reduced, but not completely eliminated, and large variation for specific pedigree values exist. Last, the standard estimate is worse than the two others, and showing a huge variance.

Note that genomic estimates and pedigree measures of kinship will not always closely match, as pedigree measures are an expectation, whereas genomic values are realization of the process, which depend on the vagaries of Mendelian sampling which will be exemplified in genomes with low recombination.

For a further discussion of theses issues, see Goudet etal., 2018 Molecular Ecology and Weir and Goudet, 2017 Genetics, Fig. 5 and followings

kinship and inbreeding in the 1000 genomes

We will now compare these three estimators of genomic kinship using the subset of Chromosome 22 we used in previous practical. Start by downloading this file and computing the three estimators.

Code

ch22 <- read.VCF("../chr22_Mb0_20.recode.vcf.gz")
ch22.M<-readRDS("../matching.ch22.RDS")
# for scaling 
Mb <- mean(mat2vec(ch22.M))

ch22.Kas<-(ch22.M-Mb)/(1-Mb)
# takes a while to run - hang in there 
ch22.Kc0<-JGTeach::Kc0(ch22.M,matching=TRUE) 
ch22.std.GRM<-gaston::GRM(ch22)

Produce a heat map for each (this may take some time, you might want to save the results to an external file for future inspection, hence the commented call to png), after having removed the main diagonal (which contains the self kinships), and arranging samples according to super_population first, and then population.

Code

#extract self kinship and convert it to inbreeding
ch22.Kas.inb<-diag(ch22.Kas)*2-1/2
diag(ch22.Kas)<-NA
ch22.Kc0.inb<-diag(ch22.Kc0)*2-1/2
diag(ch22.Kc0)<-NA
ch22.std.Inb<-diag(ch22.std.GRM)-1
diag(ch22.std.GRM)<-NA

# sample description file
samp.desc<-read.table("https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/integrated_call_samples_v3.20130502.ALL.panel",header=TRUE,stringsAsFactors = TRUE)

#to order samples 
ospp<-with(samp.desc,order(super_pop,pop))
#to plot superpop lines
lsp<-cumsum(table(samp.desc$super_pop))

#png("1kg_ch22chunck_Kas.png")
image(1:2504,1:2504,ch22.Kas[ospp,ospp], main = 'Kas')
abline(h=lsp,v=lsp)

#dev.off()
#png("1kg_ch22chunck_Kc0.png")
image(1:2504,1:2504,ch22.Kc0[ospp,ospp], main = 'Kc0')
abline(h=lsp,v=lsp)

#dev.off()
#png("1kg_ch22chunck_Kstdk.png")
image(1:2504,1:2504,ch22.std.GRM[ospp,ospp]/2, main = 'standard')
abline(h=lsp,v=lsp)

#dev.off()

Discuss the resulting heatmaps, in the light of what you know about human population demographic history.

Answer

The different blocks correspond to continent, organized in order Africa (AFR), America (AMR), East Asia (EAS), Europe (EUR) and South Asia (SAS) from left to right and bottom to top. lighter colors mean lower kinship, reder means higher kinship.

The first heatmap corresponds to \(K_{AS}\) kinship, and shows AFR samples to be lest related among themselves and with all others than samples from the other continents, who show higher kinship, in particular within the EAS, but also the EUR continent. Noteworthy is the high kinship shown for samples belonging to different, non African continents (the large red block at the top right)

The second heatmap corresponds to \(K_{c0}\) and shows a completely different picture: This time, African samples are the most related of all samples, with EAS and EUR showing also high kinship, but lower than among AFR samples. Samples from different continent but AFR do not show the high relatedness shown with \(K_{AS}\).

The third heatmap corresponds to the standard estimates of kinship and does not show any variation.

We will discuss these important results and their implications at the end of the practical session

[optional] Filter the dataset on a minor allele frequency of 5% and remake the standard GRM kinship plot. Does filtering improve the kinship inferred ?

Code

# index snps with maf > .05 
maf05.indx <- which(ch22@snps$maf > .05)
# make a new GRM with the subseted loci
ch22.std.maf05 <- GRM( ch22[, maf05.indx] )
# diagonal muted
diag(ch22.std.maf05) <- NA
# plot
image(ch22.std.maf05[ospp,ospp]/2, main = 'standard maf05')
abline(h=lsp,v=lsp)

Answer

Once filtered at maf \(\geq 0.05\), the standard GRM shows a similar pattern to \(K_{c0}\), demonstrating that the low MAF alleles are indeed affecting the estimates of kinship and should be removed if one wants to use this standard GRM. We advise not to use the standard GRM.

END OF PRACTICAL 3.

practical 3: Inbreeding and kinshhip

Jerome Goudet and Bruce Weir

2023-07-17

Kinship and inbreeding from pedigrees

Kinship and inbreeding from markers, compared to pedigrees

kinship and inbreeding in the 1000 genomes