Announcing myloasm, a new long-read (ONT R10/PacBio) metagenome assembler that I've been working on during my postdoc in the Heng Li lab (@lh3lh3.bsky.social).
https://myloasm-docs.github.io/
Comments
Split 1 circ contig but resolved the mess into individual circ contigs :D
I miss having a proper gfa with seqs though... having it, or a way to generate one, would be a good addition imho
Is there a reason people like gfas with sequences included? Can definitely make that a postprocessing option.
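In the meantime, a postprocessing step along these lines could be sketched in a few lines of Python. This is only an illustration, not a myloasm feature: it assumes the graph uses standard GFA1 `S` lines with `*` as the sequence placeholder and that segment names match the FASTA record names, which may not hold for myloasm's actual outputs. The function names `read_fasta` and `fill_gfa_sequences` are made up for this example.

```python
def read_fasta(lines):
    """Parse FASTA lines into {record_name: sequence}.

    The record name is the first whitespace-delimited token after '>'.
    """
    seqs = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def fill_gfa_sequences(gfa_lines, seqs):
    """Replace '*' placeholders on GFA1 S lines with sequences from `seqs`.

    Assumption: GFA segment names equal FASTA record names.
    Lines that are not S lines, or whose segment is missing from `seqs`,
    are passed through unchanged.
    """
    out = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        is_empty_segment = (
            fields[0] == "S" and len(fields) >= 3 and fields[2] == "*"
        )
        if is_empty_segment and fields[1] in seqs:
            fields[2] = seqs[fields[1]]
        out.append("\t".join(fields))
    return out
```

Usage would be roughly: read the contig FASTA with `read_fasta(open(fasta_path))`, pass the graph lines through `fill_gfa_sequences`, and write the result to a new file (both paths are whatever your run produced).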
FYI: we were very impressed with metaMDBG. On some datasets, it's kind of miraculous how metaMDBG can disentangle these assembly graphs.
Conceptually, I've always been very unsatisfied with the procedure of (a) assemble and then (b) map reads to assembly. Instead, I've always wished the assembler would show its work by showing where *it* thinks the reads should go.
And when checking with a read mapper, it becomes unclear whether issues are due to the assembler or to the mapper.
This also saves compute time (you don't have to do your within-sample mapping).
Any chance you can make it an option to output BAM?
1. Overlap assemblers like hifiasm/myloasm directly show which reads construct the contigs in their gfa files. This is a nice perk over DBG methods.
But this is a sparse set of reads because reads contained in other reads are removed
Myloasm has an internal aligner that is inspired by minimap2, hence the map_to_unitigs file. But this is done for polishing purposes, i.e. the reads are mapped to unpolished contigs. And like any mapper, this isn't 100%.
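To make the point about contained reads concrete, here is a toy sketch of why the read set in an overlap graph is sparse. It is not how myloasm or hifiasm implement it: real assemblers infer containment from pairwise overlaps, whereas this example assumes each read already has a known half-open interval on one coordinate axis. The function name `drop_contained` is hypothetical.

```python
def drop_contained(reads):
    """Return names of reads that are not contained in another read.

    `reads` maps read name -> (start, end), a half-open interval on a
    shared coordinate axis (a simplifying assumption for illustration).
    Of several identical intervals, only the first one seen is kept.
    """
    # Sort by start ascending, then end descending, so a containing read
    # is always visited before the reads it contains.
    order = sorted(reads, key=lambda n: (reads[n][0], -reads[n][1]))
    kept = []
    max_end = None  # furthest end among the reads kept so far
    for name in order:
        start, end = reads[name]
        if max_end is not None and end <= max_end:
            continue  # an earlier read starts no later and ends no earlier
        kept.append(name)
        max_end = end
    return kept
```

With reads at (0, 100), (10, 50) and (80, 150), only the first and third survive; the second is fully covered by the first, so it never appears in the graph even though it belongs to the same contig.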
We can utilize new algorithmic techniques that take advantage of higher baseline accuracies, allowing myloasm to produce high-resolution assemblies.
The main strength of myloasm: it seems like it can assemble more circularized complete genomes than before, and on diverse metagenomes.
We assembled 6 single-contig Prevotella copri genomes of > 97% ANI for one metagenome. 4 of them were circular.
(The largest metaFlye P. copri contig was 13.4% complete)
We try to be upfront about this and discuss it here https://myloasm-docs.github.io/qc/. We provide additional info for plotting and curation too.
https://www.biorxiv.org/content/10.1101/2025.04.22.649783v1
Chimeras are hard to quantify; we use checkm2 as a proxy. Based on results (see docs), we probably get more chimeras but also more complete contigs. Chimeras are the toughest part of assembly.
Metagenomic R10.4 data is scarce (for now), but we're grateful for:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4960739 (Minich et al.)
https://www.biorxiv.org/content/10.1101/2024.09.30.615745v1 (Kiguchi et al.)
https://www.biorxiv.org/content/10.1101/2024.12.19.629313v1 (Sereika et al.)
In case of interest, we generated a bunch of metagenomic R10.4 data (longitudinal patient gut samples) here https://www.biorxiv.org/content/10.1101/2025.03.16.643550v1 — would be interested to run myloasm on these