
R code for calculating Jost D for mtDNA sequences

23 Oct

2018: Updated link to the example files and code: DROPBOX.

Earlier I posted a piece of R code to calculate Jost D and Gst, with associated p-values from a permutation test, for mtDNA sequences. I repost it here in two versions: the first calculates pairwise values (between all pairs of populations in your data), and the second calculates a single global value. You may wonder how and why we calculate Jost D for sequence data. We reduce the sequences to alleles, so that two individuals carry either the same allele or different alleles, and we ignore any information about how much the different alleles differ from each other. The advantage of this approach is that we can compare mtDNA statistics directly with microsatellite statistics, and thus learn something about the differences between male and female dispersal. All of this is described in our 2011 JEB paper.

R code for global Jost D values

R code for pairwise Jost D values

Here is an example file that works with both of the R scripts.
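For readers who want to see the allele reduction spelled out, here is a minimal Python sketch (not a translation of the R scripts linked above). Every distinct sequence becomes one allele label, shared across populations, and D and Gst come from the uncorrected plug-in formulas D = [(H_T - H_S)/(1 - H_S)] * [n/(n - 1)] and Gst = (H_T - H_S)/H_T. Here H_S is the mean within-population expected heterozygosity, H_T is the expected heterozygosity of the pooled allele frequencies (equal population weights), and n is the number of populations. The function names and toy sequences are hypothetical. The R scripts may use sample-size-corrected estimators, so their numbers can differ.

```python
from collections import Counter
from itertools import combinations


def sequences_to_alleles(seqs_by_pop):
    """Give every distinct sequence an integer allele label, shared across
    populations, so identical sequences get the same label everywhere."""
    labels = {}
    return {pop: [labels.setdefault(s, len(labels)) for s in seqs]
            for pop, seqs in seqs_by_pop.items()}


def allele_freqs(alleles):
    """Relative frequency of each allele label in one population."""
    counts = Counter(alleles)
    return {a: c / len(alleles) for a, c in counts.items()}


def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def jost_d_and_gst(pops):
    """Uncorrected Jost D and Nei's Gst for a list of populations, each given
    as a list of allele labels (hypothetical helper, equal weights)."""
    n = len(pops)
    freqs = [allele_freqs(p) for p in pops]
    h_s = sum(heterozygosity(f) for f in freqs) / n  # mean within-population
    alleles = set().union(*freqs)
    pooled = {a: sum(f.get(a, 0.0) for f in freqs) / n for a in alleles}
    h_t = heterozygosity(pooled)  # heterozygosity of the pooled frequencies
    d = (h_t - h_s) / (1.0 - h_s) * n / (n - 1)
    gst = (h_t - h_s) / h_t if h_t > 0 else float("nan")
    return d, gst


def pairwise_d(alleles_by_pop):
    """Jost D for every pair of populations."""
    return {(a, b): jost_d_and_gst([alleles_by_pop[a], alleles_by_pop[b]])[0]
            for a, b in combinations(sorted(alleles_by_pop), 2)}


# Hypothetical toy data: three populations, four short sequences each.
seqs = {
    "popA": ["ACGT", "ACGT", "ACGA", "ACGT"],
    "popB": ["TCGT", "TCGT", "ACGA", "TCGT"],
    "popC": ["ACGT", "ACGA", "ACGT", "ACGA"],
}
alleles = sequences_to_alleles(seqs)
d, gst = jost_d_and_gst(list(alleles.values()))
print(f"global D = {d:.3f}, Gst = {gst:.3f}")
for pair, value in pairwise_d(alleles).items():
    print(pair, round(value, 3))
```

The global value treats all populations at once, while the pairwise version simply reruns the same calculation on every two-population subset.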

Both Jost D and Gst depend on the diversity of the markers. For sequence data, this means that sequence length affects the outcome: longer sequences contain more variable sites and therefore more distinct alleles, so Gst goes down whereas Jost D goes up. This mirrors what is seen with microsatellites, where more variable loci give a very small Gst and a high Jost D. In the extreme case where every individual carries a different allele, Jost D will be 1, but a simple permutation test shows that the associated p-value is 1 as well.
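The permutation argument can be checked with a small Python sketch, again assuming uncorrected estimates. Individuals are shuffled among populations with sample sizes kept fixed, and p is the share of permutations (counting the observed data as one of them) whose D is at least as large as the observed D. When every individual has its own allele, every shuffle also gives D = 1, so p = 1. The helper names, the number of permutations and the (hits + 1)/(n_perm + 1) convention are assumptions for illustration, not necessarily what the R scripts do.

```python
import random
from collections import Counter


def jost_d(pops):
    """Uncorrected Jost D; pops is a list of populations (lists of allele IDs)."""
    n = len(pops)
    freqs = [{a: c / len(p) for a, c in Counter(p).items()} for p in pops]
    h_s = sum(1.0 - sum(v * v for v in f.values()) for f in freqs) / n
    alleles = set().union(*freqs)
    h_t = 1.0 - sum((sum(f.get(a, 0.0) for f in freqs) / n) ** 2 for a in alleles)
    return (h_t - h_s) / (1.0 - h_s) * n / (n - 1)


def permutation_p_value(pops, n_perm=999, seed=1):
    """Shuffle individuals among populations (keeping sample sizes) and return
    (observed D, fraction of permutations with D at least as large)."""
    rng = random.Random(seed)
    observed = jost_d(pops)
    pooled = [a for p in pops for a in p]
    sizes = [len(p) for p in pops]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm, start = [], 0
        for size in sizes:
            perm.append(pooled[start:start + size])
            start += size
        # small tolerance so exact ties are not lost to floating-point rounding
        if jost_d(perm) >= observed - 1e-12:
            hits += 1
    # the observed arrangement counts as one of the permutations
    return observed, (hits + 1) / (n_perm + 1)


# Every individual carries its own allele: D is 1, yet p is 1.
unique = [["a1", "a2", "a3", "a4", "a5"], ["b1", "b2", "b3", "b4", "b5"]]
d_u, p_u = permutation_p_value(unique)

# Also D = 1, but alleles are shared within populations and never between them.
structured = [["x"] * 5 + ["y"] * 5, ["z"] * 5 + ["w"] * 5]
d_s, p_s = permutation_p_value(structured)
print(d_u, p_u, d_s, p_s)
```

Both toy data sets give D = 1, but only the second one, where identical alleles cluster within populations, yields a small p-value.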

This figure is similar to figure 4 of our 2011 JEB paper.

It shows how Jost D (indicated by D) and Gst (indicated by G) change with the length of the sequences analyzed. For each sequence length, I repeated the calculations and the permutation test 20 times; P shows the fraction of these repeats that led to a significant result. With longer sequences, significant population structure is more likely to be detected. Had the sequences been even longer, P would have gone down again, because with very long sequences every individual is different.
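The length experiment behind the figure can be imitated by truncating every sequence to its first L sites, collapsing identical truncated sequences into alleles, and recomputing D and Gst. The sketch below assumes uncorrected estimates; the helper names and the toy sequences are hypothetical and far too few to reproduce the figure, so they only show the mechanics.

```python
from collections import Counter


def d_and_gst(pops):
    """Uncorrected Jost D and Nei's Gst; pops is a list of lists of allele IDs."""
    n = len(pops)
    freqs = [{a: c / len(p) for a, c in Counter(p).items()} for p in pops]
    h_s = sum(1.0 - sum(v * v for v in f.values()) for f in freqs) / n
    alleles = set().union(*freqs)
    h_t = 1.0 - sum((sum(f.get(a, 0.0) for f in freqs) / n) ** 2 for a in alleles)
    d = (h_t - h_s) / (1.0 - h_s) * n / (n - 1)
    # Gst is undefined when there is no variation at all
    gst = (h_t - h_s) / h_t if h_t > 0 else float("nan")
    return d, gst


def length_series(seqs_by_pop, lengths):
    """For each length L, cut every sequence to its first L sites, treat each
    distinct truncated sequence as an allele, and recompute the statistics."""
    rows = []
    for L in lengths:
        pops = [[s[:L] for s in seqs] for seqs in seqs_by_pop.values()]
        n_haplotypes = len({s for p in pops for s in p})
        d, gst = d_and_gst(pops)
        rows.append((L, n_haplotypes, d, gst))
    return rows


# Hypothetical toy data: two populations, four 3-site sequences each.
seqs = {
    "popA": ["AAA", "ACA", "AGA", "TAA"],
    "popB": ["TAC", "TCC", "TGC", "AAC"],
}
for L, k, d, gst in length_series(seqs, [1, 2, 3]):
    print(f"L={L}  haplotypes={k}  D={d:.3f}  Gst={gst:.3f}")
```

In this toy data the number of haplotypes grows from 2 to 6 to 8 and D climbs from 0.4 to 0.5 to 1, reaching 1 once every individual is unique. Gst drops from 0.25 to about 0.08 and then rises slightly to about 0.14. That is a small-sample effect: when all N individuals in each of n populations are unique, uncorrected Gst equals (1/N)(1 - 1/n)/(1 - 1/(nN)), which shrinks toward zero only as sample sizes grow.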

There is now an R package, written by David Winter from the University of Otago, that lets you calculate Jost D and other useful statistics for microsatellite data: find the paper here and the package here.