### TIPS ###
# We first load all the libraries we will need for this practical.
# Make sure you have those installed
# If not then please visit https://www2.unil.ch/popgen/teaching/SISG22/
library(gaston)
## Warning: le package 'gaston' a été compilé avec la version R
## 4.2.3
## Warning: le package 'Rcpp' a été compilé avec la version R
## 4.2.3
## Warning: le package 'RcppParallel' a été compilé avec la
## version R 4.2.3
library(hierfstat)
library(JGTeach)
For this practical we will use both simulated pedigrees, simulated genetic data on these pedigrees and the small chunck of chromosome 22 from the 1000 genomes.
read.table
.
Explore the pedigree: what are the first, second and third column? How
many founders in this pedigree? Find at least two families with sibships
of four (just inspect the pedigree).ped <- read.table("https://www2.unil.ch/popgen/teaching/SISGData/pedmono.txt",header=TRUE)
str(ped)
dim(ped)
#first column is the individual identifier
#second column is the female id
#third column is the male id
# number of founders:
# founders are individuals with missing sires and dams
sum(is.na(ped$sire) & is.na(ped$dam))
The first column is the individual identifier. The second column is the female id (dam is the term used in animal breeding for mother), while the third column is the male id (sire is the term used in animal breeding for father).
The number of founders is obtained from identifying how many members in the pedigree have unidentified parents. We find 20 founders.
individuals with IDs 28 to 32 share the same parents, and thus belong to the same family. Same for individuals 59 to 62.:
ped[28:32,]
## ind dam sire
## 28 28 14 10
## 29 29 14 10
## 30 30 14 10
## 31 31 14 10
## 32 32 14 10
ped[59:62,]
## ind dam sire
## 59 59 24 35
## 60 60 24 35
## 61 61 24 35
## 62 62 24 35
#first inds of gen 1:5 are c(21,47,77,117,159)
JGTeach::pedARM
. What is the pedigree
relatedness among the sibship previously identified? and of these
individuals with their parents? Examine sibship 92:95
. What
is special about it?# calculate relatedness matrix
ARM <- JGTeach::pedARM(ped$sire,ped$dam)
# a sibship
ARM[28:32,28:32]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.0 0.5 0.5 0.5 0.5
## [2,] 0.5 1.0 0.5 0.5 0.5
## [3,] 0.5 0.5 1.0 0.5 0.5
## [4,] 0.5 0.5 0.5 1.0 0.5
## [5,] 0.5 0.5 0.5 0.5 1.0
ped[28:32,]
## ind dam sire
## 28 28 14 10
## 29 29 14 10
## 30 30 14 10
## 31 31 14 10
## 32 32 14 10
# relatedness with parents
ARM[28:32,c(ped[28,2],ped[28,3])]
## [,1] [,2]
## [1,] 0.5 0.5
## [2,] 0.5 0.5
## [3,] 0.5 0.5
## [4,] 0.5 0.5
## [5,] 0.5 0.5
# another example
ARM[59:62,59:62]
## [,1] [,2] [,3] [,4]
## [1,] 1.0 0.5 0.5 0.5
## [2,] 0.5 1.0 0.5 0.5
## [3,] 0.5 0.5 1.0 0.5
## [4,] 0.5 0.5 0.5 1.0
ARM[59:62,c(ped[59,2],ped[59,3])]
## [,1] [,2]
## [1,] 0.5 0.5
## [2,] 0.5 0.5
## [3,] 0.5 0.5
## [4,] 0.5 0.5
The pedigree relatedness among these sibship is 0.5, as is the relatedness of the sibships with their parents.
Sibship 92 to 95 is special in that the relatedness among sibs is larger than 0.5:
#relatedness higher than 0.5 with their siblings
ARM[92:95,92:95]
## [,1] [,2] [,3] [,4]
## [1,] 1.03125 0.56250 0.56250 0.56250
## [2,] 0.56250 1.03125 0.56250 0.56250
## [3,] 0.56250 0.56250 1.03125 0.56250
## [4,] 0.56250 0.56250 0.56250 1.03125
#and their parents
ARM[92:95,c(ped[92,2],ped[92,3])]
## [,1] [,2]
## [1,] 0.59375 0.53125
## [2,] 0.59375 0.53125
## [3,] 0.59375 0.53125
## [4,] 0.59375 0.53125
This is because their parents are related to each other:
#their parents are related
ARM[c(ped[92,2],ped[92,3]),c(ped[92,2],ped[92,3])]
## [,1] [,2]
## [1,] 1.1250 0.0625
## [2,] 0.0625 1.0000
hierfstat::mat2vec
will be useful.
Comments? Transform the relatedness matrix into a kinship and inbreeding
matrix using hierfstat::GRM2kinship
. Plot a heatmap of
resulting matrix using image
and interpret it.hist(mat2vec(ARM), xlab='relatedness')
ped.kin <- hierfstat::grm2kinship(ARM)
# plot kinship with generation delimiters
image(ped$ind,ped$ind,ped.kin)
genlim<-c(21,47,77,117,159)
#add vertical and horizontal lines between generations
abline(h=genlim-.5,v=genlim-.5)
### TIP
# a different way of plotting it using the corrplot function:
# library(corrplot) ; corrplot(ARM/2, method = 'shade', tl.pos='n', is.corr=F,diag=F)
# notice that the ARM/2 will translate relatedness to kinship but only for the off-diagonal elements
The histogram of relatedness shows a (small) majority of individuals are unrelated, and most have a relatedness coefficient \(\leq 0.25\). The heatmap shows relatedness to increase from left to right and bottom to top, that is, as pedigree depth increases. From the 1st generation following founders, we see redish blocks along the main diagonal corresponding to sibships, and, as we move right and top, these blocks become reder, meaning that relatedness among sibs increase due to the relatedness of their parents. We see that four generations after founding of the pedigree, most individuals are related. This is due to the small number of founders of this pedigree, and may not happen in larger ones, but it is a genaral observation that as the depth of pedigree increases, individuals of later generations are more related.
readRDS
, and computehierfstat::beta.dosage
JGTeach::Kc0
gaston::GRM
Compare each to the expected values obtained
from the pedigree and discuss.genoped <- readRDS("../geno_mono.RDS")
# calculate allele sharing kinship
genoped.Kas <- hierfstat::beta.dosage(genoped,inb=FALSE,Mb=TRUE)
# calculate ratio of averages
genoped.M <- with(genoped.Kas,betas*(1-MB)+MB)
genoped.Kc0 <- JGTeach::Kc0(genoped.M,matching=TRUE)
# calculate average of ratios
bed.genoped <- gaston::as.bed.matrix(genoped)
genoped.std.GRM <- gaston::GRM(bed.genoped,autosome.only=FALSE)
# Plotting
par(mfrow=c(1,3))
plot(mat2vec(ARM)/2,mat2vec(genoped.Kas$betas),
cex=0.5,col="red",main=expression(K[AS]),
xlab="ped kinship",ylab="est. kinship", pch=20)
abline(c(0,1))
plot(mat2vec(ARM)/2,mat2vec(genoped.Kc0),
cex=0.5,col="red",main=expression(K[c0]),
xlab="ped kinship",ylab="est. kinship", pch=20)
abline(c(0,1))
# gaston::GRM reports relatedness rather than kinship, hence the /2
plot(mat2vec(ARM)/2,mat2vec(genoped.std.GRM)/2,
cex=0.5,col="red",main="standard kinship",
xlab="ped kinship",ylab="est. kinship", pch=20)
abline(c(0,1))
par(mfrow=c(1,1))
#we divide by two because we want kinships rather than relatedness
mARM<-mean(mat2vec(ARM)/2)
# (kinship - mean) / (1 - mean)
ARMc<-(ARM/2-mARM)/(1-mARM)
# Plotting
par(mfrow=c(1,3))
plot(mat2vec(ARMc),mat2vec(genoped.Kas$betas),
cex=0.5,col="red",main=expression(K[AS]),
xlab="ped kinship",ylab="est. kinship",pch=20)
abline(c(0,1))
plot(mat2vec(ARMc),mat2vec(genoped.Kc0),
cex=0.5,col="red",main=expression(K[c0]),
xlab="ped kinship",ylab="est. kinship",pch=20)
abline(c(0,1))
plot(mat2vec(ARMc),mat2vec(genoped.std.GRM)/2,
cex=0.5,col="red",main="standard kinship",
xlab="ped kinship",ylab="est. kinship",pch=20)
abline(c(0,1))
par(mfrow=c(1,1))
Once rescaled, we observe that the allele sharing estimates \(K_{AS}\) are all very close to the \(y=x\) line, meaning a good correspondence between pedigree and genomic inbreeding. The discrepancies between \(K_{c0}\) and pedigree kinship is reduced, but not completely eliminated, and large variation for specific pedigree values exist. Last, the standard estimate is worse than the two others, and showing a huge variance.
Note that genomic estimates and pedigree measures of kinship will not always closely match, as pedigree measures are an expectation, whereas genomic values are realization of the process, which depend on the vagaries of Mendelian sampling which will be exemplified in genomes with low recombination.
For a further discussion of theses issues, see Goudet etal., 2018 Molecular Ecology and Weir and Goudet, 2017 Genetics, Fig. 5 and followings
ch22 <- read.VCF("../chr22_Mb0_20.recode.vcf.gz")
ch22.M<-readRDS("../matching.ch22.RDS")
# for scaling
Mb <- mean(mat2vec(ch22.M))
ch22.Kas<-(ch22.M-Mb)/(1-Mb)
# takes a while to run - hang in there
ch22.Kc0<-JGTeach::Kc0(ch22.M,matching=TRUE)
ch22.std.GRM<-gaston::GRM(ch22)
png
), after having removed the main
diagonal (which contains the self kinships), and arranging samples
according to super_population first, and then population.#extract self kinship and convert it to inbreeding
ch22.Kas.inb<-diag(ch22.Kas)*2-1/2
diag(ch22.Kas)<-NA
ch22.Kc0.inb<-diag(ch22.Kc0)*2-1/2
diag(ch22.Kc0)<-NA
ch22.std.Inb<-diag(ch22.std.GRM)-1
diag(ch22.std.GRM)<-NA
# sample description file
samp.desc<-read.table("https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/integrated_call_samples_v3.20130502.ALL.panel",header=TRUE,stringsAsFactors = TRUE)
#to order samples
ospp<-with(samp.desc,order(super_pop,pop))
#to plot superpop lines
lsp<-cumsum(table(samp.desc$super_pop))
#png("1kg_ch22chunck_Kas.png")
image(1:2504,1:2504,ch22.Kas[ospp,ospp], main = 'Kas')
abline(h=lsp,v=lsp)
#dev.off()
#png("1kg_ch22chunck_Kc0.png")
image(1:2504,1:2504,ch22.Kc0[ospp,ospp], main = 'Kc0')
abline(h=lsp,v=lsp)
#dev.off()
#png("1kg_ch22chunck_Kstdk.png")
image(1:2504,1:2504,ch22.std.GRM[ospp,ospp]/2, main = 'standard')
abline(h=lsp,v=lsp)
#dev.off()
The different blocks correspond to continent, organized in order Africa (AFR), America (AMR), East Asia (EAS), Europe (EUR) and South Asia (SAS) from left to right and bottom to top. lighter colors mean lower kinship, reder means higher kinship.
The first heatmap corresponds to \(K_{AS}\) kinship, and shows AFR samples to be lest related among themselves and with all others than samples from the other continents, who show higher kinship, in particular within the EAS, but also the EUR continent. Noteworthy is the high kinship shown for samples belonging to different, non African continents (the large red block at the top right)
The second heatmap corresponds to \(K_{c0}\) and shows a completely different picture: This time, African samples are the most related of all samples, with EAS and EUR showing also high kinship, but lower than among AFR samples. Samples from different continent but AFR do not show the high relatedness shown with \(K_{AS}\).
The third heatmap corresponds to the standard estimates of kinship and does not show any variation.
We will discuss these important results and their implications at the end of the practical session
# index snps with maf > .05
maf05.indx <- which(ch22@snps$maf > .05)
# make a new GRM with the subseted loci
ch22.std.maf05 <- GRM( ch22[, maf05.indx] )
# diagonal muted
diag(ch22.std.maf05) <- NA
# plot
image(ch22.std.maf05[ospp,ospp]/2, main = 'standard maf05')
abline(h=lsp,v=lsp)
END OF PRACTICAL 3.