sinamajidian.bsky.social
How are species compared to one another across different genomic regions? Postdoc at Langmead Lab, Johns Hopkins | Comparative #genomics | Phylogeny and indexing at scale | Formerly at UNIL/SIB/WUR | sinamajidian.github.io
47 posts
1,134 followers
1,960 following
Prolific Poster
comment in response to
post
@jimshaw.bsky.social
For more context: Logan is a collection of all public sequencing data (until the end of 2023) assembled into contigs. It is freely hosted on the cloud and contains hundreds of terabytes of valuable genomic data: github.com/IndexThePlan...
7/ To detect and avoid homology-based leakage, we created hashFrag, which uses BLAST to identify similar sequences and then offers three options: (1) filter the leaked sequences out of the test set, (2) stratify the test set into subgroups by distance, or (3) create leakage-free train-test splits.
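A minimal sketch of the idea behind option (3), not hashFrag's actual code or API: assuming we already have a list of sequence pairs flagged as similar (for example from BLAST hits), sequences can be grouped into connected clusters and each whole cluster assigned to either train or test. The function name `leakage_free_split`, the `similar_pairs` input, and the `test_fraction` parameter are all hypothetical, chosen for illustration.

```python
from collections import defaultdict


def leakage_free_split(seq_ids, similar_pairs, test_fraction=0.2):
    """Assign whole similarity clusters to train or test (illustrative only)."""
    # Union-find: every sequence starts as its own cluster root.
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every pair flagged as similar.
    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for s in seq_ids:
        clusters[find(s)].append(s)

    # Fill the test set with the smallest clusters first, never splitting one.
    target = test_fraction * len(seq_ids)
    train, test = [], []
    for members in sorted(clusters.values(), key=len):
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return train, test


# Hypothetical toy input: ten sequences, three similar pairs.
seq_ids = [f"s{i}" for i in range(1, 11)]
similar_pairs = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
train, test = leakage_free_split(seq_ids, similar_pairs, test_fraction=0.2)
```

Because clusters are never split, no flagged pair can end up with one member in train and the other in test. The price is that the test set may come out smaller than the requested fraction.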
The input to CASTER is a multiple genome alignment, so the sequencing reads should first be assembled, which needs fairly high coverage. Then all genomes should be aligned against each other using tools like Progressive Cactus, Mauve, or SibeliaZ. It wouldn't be possible to get a good assembly with 10x...
A core idea for a pan-genome is to figure out what is shared and what is unique across all these genomes. If you start with linear genomes and "glue together" any sequences that are the same (or highly similar), you quickly arrive at a "pan-genome graph", e.g. www.nature.com/articles/s41...
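A toy illustration of the "gluing" idea, heavily simplified: here identical k-mers from different genomes are merged into one node, de Bruijn style, whereas real pan-genome graph builders typically work from alignments. The function names, the two short sequences, and the choice of k=4 are all made up for the example.

```python
from collections import defaultdict


def build_kmer_graph(genomes, k=4):
    """Glue identical k-mers across genomes into shared nodes."""
    node_genomes = defaultdict(set)  # k-mer -> names of genomes containing it
    edges = set()                    # (k-mer, following k-mer) adjacencies
    for name, seq in genomes.items():
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for kmer in kmers:
            node_genomes[kmer].add(name)
        edges.update(zip(kmers, kmers[1:]))
    return node_genomes, edges


def core_and_unique(node_genomes, n_genomes):
    """Split nodes into those shared by all genomes and those seen in one."""
    core = {n for n, g in node_genomes.items() if len(g) == n_genomes}
    unique = {n for n, g in node_genomes.items() if len(g) == 1}
    return core, unique


# Two hypothetical genomes that differ by a single base in the middle.
genomes = {"A": "ACGTACGGA", "B": "ACGTTCGGA"}
node_genomes, edges = build_kmer_graph(genomes, k=4)
core, unique = core_and_unique(node_genomes, len(genomes))
```

In this toy case the flanking k-mers `ACGT` and `CGGA` are shared, and the k-mers covering the differing base form two genome-specific branches between them.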
Example 1 is an alternative splicing event: exon 10 (dark blue) can be entirely included in or excluded from the component traversal, depending on the chosen path. I confirmed this by mapping the alternative node to the references with BLAT.
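To make "included or excluded depending on the chosen path" concrete, here is a small sketch that lists every path through a tiny graph in which one node is optional. Only exon 10 comes from the post; the neighbouring node names `exon9` and `exon11` and the graph layout are assumptions for illustration, and the function assumes the graph has no cycles.

```python
def enumerate_paths(graph, start, end, path=None):
    """Return every path from start to end in an acyclic graph."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        paths.extend(enumerate_paths(graph, nxt, end, path))
    return paths


# Hypothetical component: exon10 can be traversed or skipped.
graph = {
    "exon9": ["exon10", "exon11"],
    "exon10": ["exon11"],
}
paths = enumerate_paths(graph, "exon9", "exon11")
with_exon10 = [p for p in paths if "exon10" in p]
without_exon10 = [p for p in paths if "exon10" not in p]
```

Each path corresponds to one candidate isoform: one traverses the optional node, the other skips it.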