Relatedness is defined really only for the comparison between two individuals. So in the case when COI = 2, we have relatedness between the two strains equal to r (relatedness).
The next way to define relatedness I suppose is the mean relatedness when we do all the pairwise comparisons as you have detailed for 4 strains. Let's look at the case with 3 strains (A, B, C) and the steps we take, and have r be 0.5 and lets say there are 12 loci.
Strain C will actually have 4.5 loci shared with A and B, because half of the loci it shares with B will be also shared with A due to step 3.
Therefore, the mean shared number of loci we would expect would be ((6 + 4.5 + 4.5)/3)/12 = 5/12
We can generalise this mathematically (something like this - haven't figured it out fully, it will collapse down I am sure), for k = COI, and r, mean relatedness:
Crucially though this decays away from r with increasing COI. So maybe we should change this so that each strain is equally related to each other to begin with. We may need to think about this a bit more though as coding that is quite hard. If we make it so that strain C above shared 6 loci with A and B we then end up increasing relatedness with increasing COI due to similar reasons.
dummy_rel <- function(COI = 3, nl = 12, r = 0.5) {
s <- matrix(data = seq_len(COI*nl), nrow = COI, ncol = nl, byrow = TRUE)
if (r > 0 && COI > 1) {
# If there is relatedness we iteratively step through each lineage
for (i in seq_len(COI - 1)) {
# For each loci we assume that it is related with probability relatedness
rel_i <- as.logical(rbinom(nl, size = 1, prob = r))
# And for those sites that related, we draw the other lineages
if (any(rel_i)) {
if (i == 1) {
s[i+1, rel_i] <- s[seq_len(i), rel_i]
} else {
s[i+1, rel_i] <- apply(s[seq_len(i), rel_i, drop = FALSE], 2, sample, size = 1)
}
}
}
}
return(s)
}
relatedness_2 <- function(s) {
sum(s[1,] == s[2,])/ncol(s)
}
relatedness <- function(s) {
Triangular <- function(n) choose(seq(n),2)
rs <- tail(Triangular(nrow(s)),1)
rs <- rep(0, rs)
count <- 1
for (i in seq_len(nrow(s)-1)) {
for(j in seq(i+1, nrow(s))) {
rs[count] <- relatedness_2(s[c(i, j),])
count <- count + 1
}
}
return(rs)
}
ks <- 2:10
r_mean <- vector("list",length(ks)+1)
for(c in ks) {
ss <- replicate(10000, dummy_rel(COI = c))
rs <- apply(ss, 3, relatedness)
if(c > 2) {
rs <- colMeans(rs)
}
r_mean[[c]] <- data.frame("COI" = c, "rel" = rs)
}
r_df <- do.call(rbind, r_mean)
ggplot(r_df, aes(as.factor(COI), rel)) +
geom_boxplot() +
geom_smooth(method = "loess", aes(group = 1)) +
xlab("COI") + ylab("Relatedness")