Sample Processing and Sequencing
Ancient DNA was extracted from dental cementum of molar or premolar teeth from archaeological skeletal remains studied by the Baikal Archaeology Project. Sampling was conducted in dedicated clean facilities at the Lundbeck Foundation Centre for GeoGenetics (Copenhagen) and the Institute of Archaeology, University College London.
Cementum was isolated using a sterilized rotary saw, pulverized, demineralized, and digested enzymatically. Aliquots were 50–100 mg. Library preparation followed Allentoft et al. with double-stranded protocol (Margaryan et al.) or single-stranded protocol for low-template samples. Libraries were sequenced on Illumina NovaSeq 6000 S4 flowcells (100 bp paired-end reads) at GeoGenetics Sequencing Core (Copenhagen). Samples were screened without UDG treatment; some later libraries were built with UDG treatment.
Y. pestis Capture and Sequencing
Libraries with Y. pestis DNA detected during screening underwent hybridization capture using Arbor Sciences myBaits kit (single round of enrichment). Captured libraries were re-amplified (16 cycles) and sequenced on the same platform.
Preliminary Bioinformatics
Base calling used CASAVA v.1.8.2; adapter trimming with AdapterRemoval v.2.0. Reads aligned to human reference genome GRCh38 using bwa aln v.0.7.18. BAM files were merged, sorted, filtered, duplicates marked with Picard MarkDuplicates (optical distance 12000, tagging all). Duplicates and reads with mapping quality <30 removed. Summary statistics generated with BEDtools and pysam. Human contamination and damage patterns assessed at library level using contamMix, ANGSD, and mapDamage2.0.
Human DNA Analysis
Chromosomal sex inferred from Y/X read ratio. Mitochondrial haplogroups via haplogrep v.2.4.0 and mutserve v.1.3.0. Y haplogroups assigned as in ref. 11.
PCA: pseudohaploid genotypes from random variant selection, projected onto ancient Eurasian reference panel (2,086,279 SNPs, transversions only, MAF>0.1%). Diploid genotypes called with bcftools; imputation with GLIMPSE (coverage ≥0.1×). IBD segments called with IBDseq, genetic clustering by IBD. Runs of homozygosity detected using IBDseq and hapRoH. Biological kinship inferred with KIN35 and validated with IBDseq.
Pathogen Screening
Shotgun reads screened for known human pathogens using pathopipe workflow (KrakenUniq with custom database). For each genus, reads aligned to reference genomes via bowtie2. Taxa assigned based on thresholds: unique reads >30, k-mer rank=1, corrected coverage ratio >0.5, average nucleotide identity >0.97, average soft-clipped bases <8.
Key threshold: Unique reads >30, k-mer rank=1, corrected coverage ratio >0.5, average nucleotide identity >0.97.
Y. pestis DNA Analysis
Reads from plague-positive samples aligned to CO92 reference genome with bowtie2 (parameters: -D 20 -R 3 -N 1 -L 20 -i S,1,0.50 --end-to-end --no-unal). Duplicates and MQ<30 removed. Average coverage calculated with BEDtools genomecov. Plague cases classified: tentative (<0.01×), lower-coverage (0.01–1×), higher-coverage (>1×).
For higher-coverage genomes: variants called with GATK HaplotypeCaller and joint genotyping. Filters applied: GQ≥50, allele balance ≥0.9, depth 3–1000. Low-mapping-quality regions masked. Genotypes converted to multi-FASTA. Published Y. pseudotuberculosis and Y. similis genomes realigned to Y. pestis reference.
Phylogenetic tree reconstructed with RAxML-NG (GTR+G, Y. similis outgroup). Multiple sequence alignment converted to haploid VCF (faToVcf), then mutation-annotated tree built with UShER using Fitch-Sankoff algorithm.
Lower-coverage samples: SNP-only VCF called with bcftools (MQ≥30, baseQ≥30, depth≤1000). Variable sites in masked regions removed. Placed on mutation-annotated tree using UShER. All 8 lower-coverage samples placed at root of Y. pestis clade.
Gubbins v.3.4.3 run on multi-FASTA of LNBA and pre-LNBA genomes plus outgroups. BactDating (100,000 iterations, relaxed gamma model) dated phylogeny accounting for recombination. Median age estimates and 95% CI reported.
Variation Graph Analysis
Pan-genome variation graph of Y. pseudotuberculosis complex built using pggb on all available assemblies (chromosome or complete). Separate graphs for chromosome and plasmids merged with vg tools. Graph indexed; ancient and modern shotgun reads mapped with Giraffe.
Classification: Graph nodes present in Lake Baikal strains but absent in modern plague classified as ancestral, Y. pseudotuberculosis-derived, or Y. similis-derived based on presence/absence in those species.
Age-at-Death Estimation
Non-adults: dental formation and eruption, epiphyseal measurements and union. Adults: pubic symphysis, iliac auricular surface, suture closure. Multiple methods applied depending on preservation.
Radiocarbon Dating
Dates obtained at Oxford Radiocarbon Accelerator Unit. New determinations from Bratskii Kamen, Serovo, Shumilkha reported, with previously published dates. Human dates corrected for freshwater reservoir effect using regression equation for southwest Baikal/Angara. Red deer tooth pendant dates preferred where available.
Bayesian modelling in OxCal 4.4 (single-phase models, uniform boundaries). Summed dates visualized with kernel density estimation.
Reporting Summary
Further research design information available in Nature Portfolio Reporting Summary.