
jimshaw.bsky.social • 4 days ago

Announcing myloasm, a new long-read (ONT R10/PacBio) metagenome assembler that I've been working on during my postdoc in the Heng Li lab (@lh3lh3.bsky.social).

https://myloasm-docs.github.io/

Comments

zaminiqbal.bsky.social•4 days ago

congrats!

xabivc.bsky.social•3 days ago

tested on simple metaG with a pesky 34Mbp tangle with flye (could only partially resolve changing params):
Split 1 circ contig but resolved the mess into individual circ contigs :D

I miss having a proper gfa with seqs though... having it or a way to generate one would be good addition imho

jimshaw.bsky.social•3 days ago

Amazing to hear! :)

Is there a reason people like gfas with sequences included? Can definitely make that a postprocessing option.

xabivc.bsky.social•3 days ago

Helps when working with gfas for qc, e.g. with Bandage, comparing/matching MAGs...
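For anyone who wants sequences in the graph before such an option exists, here is a minimal sketch of one possible postprocessing step. It relies only on the GFA1 convention that an `S` line holds the segment name in column 2 and the sequence (or `*` when absent) in column 3. It assumes the segment names in the GFA match the record names in a FASTA of the same contigs; that matching, the function names, and the toy data are illustrative and not taken from myloasm.

```python
import io

def read_fasta(handle):
    """Parse FASTA into {name: sequence}; the name is the first word of the header."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs

def fill_gfa_sequences(gfa_lines, seqs):
    """Replace '*' placeholders on S lines with sequences looked up by segment name.

    Assumption: GFA segment names equal FASTA record names. Segments without a
    matching FASTA record, and all non-S lines, are passed through unchanged.
    """
    out = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        is_empty_segment = fields[0] == "S" and len(fields) >= 3 and fields[2] == "*"
        if is_empty_segment and fields[1] in seqs:
            fields[2] = seqs[fields[1]]
        out.append("\t".join(fields))
    return out

# Toy example with made-up names and sequences.
fasta = io.StringIO(">u1 len=4\nAC\nGT\n>u2\nTTTT\n")
gfa = ["H\tVN:Z:1.0", "S\tu1\t*\tLN:i:4", "S\tu3\t*", "L\tu1\t+\tu2\t+\t0M"]
filled = fill_gfa_sequences(gfa, read_fasta(fasta))
print("\n".join(filled))
```

The resulting file should load in tools that expect sequences on the `S` lines, though whether the names actually line up depends on the assembler's output.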

gaetanbenoit.bsky.social•4 days ago

Great work, and can't wait to read your preprint! I also had the idea of snp-mers when developing the minimizer-space correction method, it's really cool that you manage to use them :)

jimshaw.bsky.social•4 days ago

Thanks Gaetan!

FYI: we were very impressed with metaMDBG. On some datasets, it's kind of miraculous how metaMDBG can disentangle these assembly graphs.

acritschristoph.bsky.social•3 days ago

Something I'm very intrigued by is your map_to_unitigs output.

Conceptually, I've always been very unsatisfied with the procedure of (a) assemble and then (b) map reads to assembly. Instead, I've always wished the assembler shows its work by showing where *it* thinks the reads should go

acritschristoph.bsky.social•3 days ago

If the assembler thinks it, it should prove it!

And when checking with a read mapper, it becomes unclear, are issues due to the assembler or due to the mapper?

This also saves compute time (you don't have to do your within sample mapping)

Any chance you can make it an option to output BAM?

jimshaw.bsky.social•2 days ago

Hey Alex, this is something I thought about quite a bit too.

1. Overlap assemblers like hifiasm/myloasm directly show which reads construct the contigs in their gfa files. This is a nice perk over DBG methods.

But this is a sparse set of reads because reads contained in other reads are removed
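To illustrate why the read set in the graph is sparse, here is a toy sketch of containment removal. It is not myloasm's or hifiasm's code: real overlap assemblers infer containment from read-to-read overlaps, whereas this example cheats by giving each read a known (start, end) placement on an imaginary reference. All names and coordinates are made up.

```python
def drop_contained(reads):
    """Split reads into those kept for the graph and those contained in another read.

    reads: dict mapping read name -> (start, end) on a toy reference.
    A read is 'contained' if some other read spans its whole interval.
    """
    # Sort by start ascending, then longest first, so a container always
    # precedes the reads it contains.
    order = sorted(reads, key=lambda r: (reads[r][0], -reads[r][1]))
    kept, contained = [], []
    max_end = None  # furthest end among the reads seen so far
    for name in order:
        start, end = reads[name]
        if max_end is not None and end <= max_end:
            # An earlier read starts no later and ends no earlier: contained.
            contained.append(name)
        else:
            kept.append(name)
            max_end = end
    return kept, contained

toy_reads = {
    "a": (0, 100),
    "b": (10, 50),    # inside a
    "c": (80, 180),
    "d": (90, 170),   # inside c
    "e": (0, 100),    # duplicate span of a
}
kept, contained = drop_contained(toy_reads)
print("kept:", kept)
print("contained:", contained)
```

Only the kept reads would appear in the graph; the contained ones have to be placed back afterwards by an aligner.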

jimshaw.bsky.social•2 days ago

2. These contained reads must be mapped back by _some_ aligner.

Myloasm has an internal aligner inspired by minimap2, hence the map_to_unitigs file. But this is done for polishing purposes, i.e. the reads are mapped to unpolished contigs. And like any mapper, it isn't 100% accurate

jimshaw.bsky.social•2 days ago

3. It's not possible to turn it into a truthful BAM because the reads are mapped only to the unpolished contigs, not the polished ones, so the coords/CIGAR are off... but only slightly

jimshaw.bsky.social•4 days ago

The main idea behind this project is that even simplex nanopore reads are accurate nowadays.

We can utilize new algorithmic techniques that take advantage of higher baseline accuracies, allowing myloasm to produce high-resolution assemblies.

usadellab.bsky.social•3 days ago

Fantastic! Metagenomics really needs some new developments for long-read data. We will definitely try it out. Are you planning to allow mixing ONT and PacBio data? (Sorry if this is already in; I only looked at the usage and glanced at cli.rs.)

jimshaw.bsky.social•3 days ago

Thanks! We probably won't add special support for mixing. The design of the algo is pretty flexible, so it should work okay even if you do mix

usadellab.bsky.social•3 days ago

Great thanks - we will give it a shot!

jimshaw.bsky.social•4 days ago

Preprint will come in a couple of months. For a brief algorithmic overview and preliminary results, see https://myloasm-docs.github.io/results/

The main strength of myloasm: it seems like it can assemble more circularized complete genomes than before, and on diverse metagenomes.

jimshaw.bsky.social•4 days ago

It seems we can now assemble reasonably simple populations of co-existing strains with ONT data.

We assembled 6 single-contig Prevotella copri genomes of > 97% ANI for one metagenome. 4 of them were circular.

(The largest metaFlye P. copri contig was 13.4% complete)

jimshaw.bsky.social•4 days ago

Limitations of myloasm are that it takes slightly more memory and, like other assemblers, can occasionally produce errors.

We try to be upfront about this and discuss it here https://myloasm-docs.github.io/qc/. We provide additional info for plotting and curation too.

stevenjrobbins.bsky.social•4 days ago

Great to see this in the wild, Jim! I still have to ask (because I will be asked myself, not because I want to be pedantic): can you quantify how myloasm stacks up against metaFlye and hifiasm on the benchmarks discussed here, like clipped reads, chimeras, and SNPs?

https://www.biorxiv.org/content/10.1101/2025.04.22.649783v1

jimshaw.bsky.social•4 days ago

We're benchmarking stuff right now: clipped reads are lower probably because it's an overlap assembler.

Chimeras are hard to quantify; we use checkm2 as a proxy. Based on results (see docs), we probably get more chimeras but also more complete contigs. Chimeras are the toughest part of assembly.

jimshaw.bsky.social•4 days ago

Just want to end with a note thanking those who make sequencing data available.

Metagenomic R10.4 data is scarce (for now), but we're grateful for:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4960739 (Minich et al.)

https://www.biorxiv.org/content/10.1101/2024.09.30.615745v1 (Kiguchi et al.)

https://www.biorxiv.org/content/10.1101/2024.12.19.629313v1 (Sereika et al.)

eventhandler.bsky.social•4 days ago

This is very exciting, especially the demonstrated potential to resolve such complex population heterogeneity!
In case of interest, we generated a bunch of metagenomic R10.4 data (longitudinal patient gut samples) here https://www.biorxiv.org/content/10.1101/2025.03.16.643550v1 — would be interested to run myloasm on these

jjminich.bsky.social•3 days ago

Happy you could use the dataset -
