This builds upon the work of the tskit community - there is an advanced manuscript in the pipeline on how to build a relationship/relatedness matrix from the tree sequence encoding of an ARG!
After building the ARG, we showed that it captures biological signal using genealogical nearest neighbors (GNN): the GNN proportions clearly distinguished the indica and japonica rice subspecies and effectively represented population structure.
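To make the GNN idea concrete, here is a minimal pure-Python sketch on a single hand-made toy tree. The tree, the node IDs, the sample labels and the helper name `gnn_single_tree` are all hypothetical and are not taken from the rice ARG. For each focal sample it climbs to the closest ancestor that has other samples beneath it and reports what fraction of those neighbours fall in each reference group. On a real tree sequence this is averaged over all local trees weighted by their genomic span, which is what tskit's `TreeSequence.genealogical_nearest_neighbours` method does.

```python
def gnn_single_tree(parent, sample_sets, focal):
    """GNN proportions for each focal sample on ONE tree (toy illustration).

    parent      -- dict mapping child node -> parent node (root is absent)
    sample_sets -- dict mapping group name -> list of sample nodes
    focal       -- list of sample nodes to compute proportions for
    """
    all_samples = [s for group in sample_sets.values() for s in group]

    # Record, for every node, the set of samples found beneath it.
    below = {}
    for s in all_samples:
        node = s
        while node is not None:
            below.setdefault(node, set()).add(s)
            node = parent.get(node)

    result = {}
    for f in focal:
        # Climb until the clade contains at least one sample other than f.
        node = parent.get(f)
        while node is not None and below[node] == {f}:
            node = parent.get(node)
        neighbours = (below[node] - {f}) if node is not None else set()
        result[f] = {
            name: (len(neighbours & set(group)) / len(neighbours)) if neighbours else 0.0
            for name, group in sample_sets.items()
        }
    return result


# Hypothetical toy tree: samples 0, 1, 4 labelled "indica"; 2, 3 labelled "japonica".
toy_parent = {0: 5, 1: 5, 2: 6, 3: 6, 4: 7, 5: 7, 6: 8, 7: 8}
toy_sets = {"indica": [0, 1, 4], "japonica": [2, 3]}

gnn = gnn_single_tree(toy_parent, toy_sets, focal=[0, 2, 4])
for sample, proportions in gnn.items():
    print(sample, proportions)
```

In this toy tree every focal sample's nearest neighbours come from its own group, which is the kind of clean separation described above for indica and japonica.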
Local trees from two genomic regions showed distinct patterns. Region (A) revealed a deep separation between indica and japonica and was linked to the DST gene, which is associated with panicle length in japonica. Region (B) segregated within both subspecies and was linked to panicle traits in both.
The ARG encoded the genomic data more compactly than the standard VCF: the tree sequence file for all chromosomes was 62 MB, compared with 228 MB for the VCF, roughly 3.7 times smaller!
The age distributions for (A) nodes (ancestors), (B) mutations, and (C) SNP sites (i.e., the first mutation at each site) were heavily right-skewed, with most ages concentrated near the present (as expected).
(A) The standard site-based relationship matrix (SRM, VanRaden’s) and (B) the ARG-based branch relationship matrix (BRM) revealed similar population structure, with highly correlated (C) diagonal and (D) off-diagonal elements, though on different scales.
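For readers unfamiliar with the site-based side of this comparison, the following is a minimal sketch of VanRaden's first method on a made-up genotype matrix. It assumes diploid allele counts coded 0/1/2 and allele frequencies estimated from the sample itself. The function name `vanraden_srm` and the toy data are hypothetical. The BRM is not sketched here, since its construction is the subject of the manuscript mentioned in the thread.

```python
def vanraden_srm(genotypes):
    """Site-based relationship matrix, G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    genotypes -- list of rows (one per individual), each a list of
                 0/1/2 allele counts (one per site).
    """
    n_ind = len(genotypes)
    n_sites = len(genotypes[0])

    # Allele frequency at each site, estimated from the sample.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_ind) for j in range(n_sites)]

    # Centre each allele count by its expectation, 2p.
    centred = [[row[j] - 2 * freqs[j] for j in range(n_sites)] for row in genotypes]

    # Scale by the expected total genotype variance across sites.
    scale = 2 * sum(p * (1 - p) for p in freqs)

    return [
        [sum(a * b for a, b in zip(z_i, z_k)) / scale for z_k in centred]
        for z_i in centred
    ]


# Hypothetical toy data: individuals 0-1 resemble each other, as do 2-3.
toy_genotypes = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 1],
]

srm = vanraden_srm(toy_genotypes)
for row in srm:
    print([round(x, 3) for x in row])
```

Individuals from the same toy group get positive off-diagonal values and individuals from different groups get negative ones, which is the block pattern that shows up as population structure in a relationship matrix.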