Now some more about population specific betas, and their interpretation. We will look at slides 176-181 in some details, using equations from slides 150-154, but without the sample size correction since we have allele frequencies.
np<-4
#Bruce's example (slides 176-177)
paf1<-c(0.5,0.5)
paf2<-c(0.3,0.7)
pin1<-c(0.28,0.72)
pin2<-c(0.32,0.68)
mp<-cbind(paf1,paf2,pin1,pin2) # all pop all freq
hs<-1-t(mp)%*%mp #all diversities
(hb<-1/np/(np-1)*sum(hs-diag(4)*diag(hs))) # world Hb
## [1] 0.4601333
(hba<-1/2*sum(hs[1:2,1:2]-diag(2)*hs[1:2,1:2])) # afr Hb
## [1] 0.5
(hbi<-1/2*sum(hs[3:4,3:4]-diag(2)*hs[3:4,3:4])) # In Hb
## [1] 0.4208
(1-diag(hs)/hb) # betai world
## paf1 paf2 pin1 pin2
## -0.08664155 0.08722110 0.12373225 0.05418719
(1-hs[1,1]/hba) # beta af1 rel africa
## [1] 0
(1-hs[2,2]/hba) # beta af2 rel africa
## [1] 0.16
(1-hs[3,3]/hbi) # beta in1 rel in
## [1] 0.0418251
(1-hs[4,4]/hbi) # beta in2 rel in
## [1] -0.03422053
Now we can write a function to carry out the calculations, basically a wrapper around the code of teh previous section:
betas.freq<-function(freq){
np<-dim(freq)[2]
hs<-1-t(freq)%*%freq #all diversities
hb<-1/np/(np-1)*sum(hs-diag(np)*diag(hs)) # world Hb
betas<-1-diag(hs)/hb
betaw<-mean(betas)
return(list(betaw=betaw,betai=betas,hs=hs,hb=hb))
}
betas.freq(mp) #world
## $betaw
## [1] 0.04462475
##
## $betai
## paf1 paf2 pin1 pin2
## -0.08664155 0.08722110 0.12373225 0.05418719
##
## $hs
## paf1 paf2 pin1 pin2
## paf1 0.5 0.500 0.5000 0.5000
## paf2 0.5 0.420 0.4120 0.4280
## pin1 0.5 0.412 0.4032 0.4208
## pin2 0.5 0.428 0.4208 0.4352
##
## $hb
## [1] 0.4601333
betas.freq(mp[,1:2]) #africa only
## $betaw
## [1] 0.08
##
## $betai
## paf1 paf2
## 0.00 0.16
##
## $hs
## paf1 paf2
## paf1 0.5 0.50
## paf2 0.5 0.42
##
## $hb
## [1] 0.5
betas.freq(mp[,3:4]) #Inuit only
## $betaw
## [1] 0.003802281
##
## $betai
## pin1 pin2
## 0.04182510 -0.03422053
##
## $hs
## pin1 pin2
## pin1 0.4032 0.4208
## pin2 0.4208 0.4352
##
## $hb
## [1] 0.4208
Now, imagine that we are dealing with microsats (STRs) rather than snps. Each locus can have many alleles. For instance, 4 alleles. We are thinking of the first 2 populations being from Africa, with a lot of variation, while the other two are for instance inuit, and have been populated from a sample of the african variation. For instance:
paf1<-c(0.23,0.22,0.25,0.30)
paf2<-c(0.24,0.23,0.27,0.26)
pin1<-c(0.82,0.18,0,0)
pin2<-c(0.87,0.13,0,0)
mp<-cbind(paf1,paf2,pin1,pin2) # all pop all freq
We can now estimate beta worlwide, or by continent:
betas.freq(mp) #world
## $betaw
## [1] 0.2582575
##
## $betai
## paf1 paf2 pin1 pin2
## -0.0978642 -0.1019838 0.5656801 0.6671980
##
## $hs
## paf1 paf2 pin1 pin2
## paf1 0.7462 0.7487 0.7718 0.7713
## paf2 0.7487 0.7490 0.7618 0.7613
## pin1 0.7718 0.7618 0.2952 0.2632
## pin2 0.7713 0.7613 0.2632 0.2262
##
## $hb
## [1] 0.6796833
betas.freq(mp[,1:2]) #africa only
## $betaw
## [1] 0.001469213
##
## $betai
## paf1 paf2
## 0.0033391211 -0.0004006945
##
## $hs
## paf1 paf2
## paf1 0.7462 0.7487
## paf2 0.7487 0.7490
##
## $hb
## [1] 0.7487
betas.freq(mp[,3:4]) #Inuit only
## $betaw
## [1] 0.00949848
##
## $betai
## pin1 pin2
## -0.1215805 0.1405775
##
## $hs
## pin1 pin2
## pin1 0.2952 0.2632
## pin2 0.2632 0.2262
##
## $hb
## [1] 0.2632