stevenjrobbins.bsky.social - Profile | ThreadSky | a Reddit-style client for Bluesky

comment in response to post

Only annoying thing is if you terminate at stop X and return via a different stop Y, it charges you the maximum, as if you rode all over the city. So if you’re commuting, cheaper to return via the same stop at which you terminated.

submitted 15 hours ago

comment in response to post

So a ticket is unnecessary. Can only speak directly to London, though.

submitted 15 hours ago

comment in response to post

In London you literally just tap on and off the platform with your credit card for normal commuter trains and it records where you tap on and off and charges accordingly. I thought it was like that in most of the UK.

submitted 15 hours ago

comment in response to post

It seems like today’s Resistance History talk from Tad Stoermer (Johns Hopkins) is instructive about how the Nazis first used laws about citizenship, then & deportation as the predicate steps to genocide in concentration camps. It’s about messaging & dehumanizing over time. Just like Trump

submitted 2 days ago

comment in response to post

CRISPR spacers are known to assemble poorly from metaG, so you need to look at the reads. Here we built a high-throughput tool (code.jgi.doe.gov/SRoux/spacerextractor) and mined ~800 million spacers from SRA metaG reads, all linked to a repeat and a sample (with taxonomy and metadata when possible).

submitted 6 days ago

comment in response to post

Is the other one Genomad? At this point, I feel like if one wanted to be the most confident in their viral set, one would just run Genomad and FDR correction and be done with it.

submitted 8 days ago

comment in response to post

You’re 100% right. 🙂 In that, I’m not criticizing the CheckV authors. It’s just a really hard problem for the reasons you point out. I’m a little surprised, though, that it doesn’t seem to filter out contigs that have a higher number of host than viral markers as a broad sanity check.
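A minimal sketch of the kind of sanity check meant here, applied after the fact to CheckV's output. It assumes the summary table is tab-separated and has `contig_id`, `viral_genes`, and `host_genes` columns; the function name is hypothetical, and this is not something CheckV does itself.

```python
import csv
import io


def flag_host_dominated(tsv_text):
    """Return IDs of contigs with more host markers than viral markers.

    Assumes a tab-separated table with contig_id, viral_genes and
    host_genes columns (an assumption about the CheckV summary layout).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = []
    for row in reader:
        # Strictly more host than viral markers is the "broad sanity check".
        if int(row["host_genes"]) > int(row["viral_genes"]):
            flagged.append(row["contig_id"])
    return flagged
```

Contigs with equal counts are kept here; whether to drop ties is a judgment call.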

submitted 8 days ago

comment in response to post

Sub-thread 4: In constructing the viral (vMAG) component of the GBR-MGD, we find that some commonly used tools for viral metagenomics are unsuitable for use on long-read metagenome assemblies. bsky.app/profile/stev...

submitted 9 days ago

comment in response to post

We saw this same issue with ML-based "deep" classifiers for plasmids. DeepPlasmid, PlasClass, and Mobile-OG-db showed similar results when plotting Genomad's plasmid, chromosomal, viral markers. Contigs predicted by these tools showed higher enrichment in chromosomal and viral markers than other tools.

submitted 9 days ago

comment in response to post

It surprised us how fragile this sort of viral pipeline is and how different CheckV is conceptually from CheckM. You can use CheckM to tell you whether something is a good pMAG; you really can't use CheckV like that. It can only give you a completeness estimate, trusting that the contig is viral.

submitted 9 days ago

comment in response to post

Interesting! Would you mind sharing your threshold? We've noticed that long reads break a few machine learning based tools in this same way, for plasmids as well. Seems like if these tools are to be useful, new benchmarking has to be done to establish parameters that make sense for long contigs.

submitted 9 days ago

comment in response to post

I've noticed the same with DVF and CheckV for long contigs. My way of dealing with it was to use a very high score threshold for DVF to remove non-viral contigs. Of course, shorter true viral contigs are also lost, but I'm using multiple predictors, so hopefully that compensates for their loss

submitted 9 days ago

comment in response to post

So we moved forward with the 5 remaining viral identification tools to create the GBR-MGD vMAGs and recommend avoiding DeepVirFinder for long reads. We advocate taking care to investigate machine learning-based tools when used on data types they're not benchmarked on--here, long read vs short read.

submitted 9 days ago

comment in response to post

What's also interesting is that if you look at the same plot for Illumina-only metagenomes, this issue becomes much less pronounced, simply because the contigs are much shorter and do not often reach the range of erroneous assignment. So on short-read metagenomes, DeepVirFinder/CheckV may be fine.

submitted 9 days ago

comment in response to post

but it shows that if you fed this high proportion of erroneously assigned "viral" contigs to CheckV, it would tell you that you have a lot of high quality viruses that aren't. This result surprised us and I'm happy to take feedback on it.

submitted 9 days ago

comment in response to post

You can see that, as contig length increases, DeepVirFinder's chance of designating a non-viral contig as viral goes up, as does the chance of CheckV assigning a high quality score to that non-viral long contig. CheckV does not assess contamination, only completeness, so this might be expected,

submitted 9 days ago

comment in response to post

Most meta-omic viral identification tools are tested on short-read metagenomes. We wanted to see if any looked wonky on long-read contigs. Most tools held up; DeepVirFinder didn't. Plot shows the ratio of CheckV host to viral markers vs contig length for ONT assemblies, colored by CheckV quality.
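For anyone wanting to look at the same trend numerically, a rough sketch: bin contigs by length and take the median host-marker share in each bin. It uses host / (host + viral) instead of the raw host:viral ratio in the plot, so contigs with zero viral markers don't divide by zero. That substitution, the function name, and the bin edges are all assumptions for illustration.

```python
from bisect import bisect_right
from statistics import median


def host_fraction_by_length_bin(contigs, bin_edges):
    """Median host-marker fraction per contig-length bin.

    contigs:   iterable of (length_bp, host_genes, viral_genes) tuples
    bin_edges: ascending lower bounds of the bins, e.g. [0, 10_000, 100_000]
    Returns {bin_lower_bound: median fraction} for bins that contain data.
    """
    per_bin = {}
    for length, host, viral in contigs:
        markers = host + viral
        if markers == 0:
            continue  # no marker genes at all, so no fraction is defined
        idx = bisect_right(bin_edges, length) - 1
        if idx < 0:
            continue  # contig is shorter than the first bin's lower bound
        per_bin.setdefault(bin_edges[idx], []).append(host / markers)
    return {edge: median(vals) for edge, vals in sorted(per_bin.items())}
```

A median that climbs with the bin lower bound would match what the plot shows for DeepVirFinder.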

submitted 9 days ago

comment in response to post

@xrefugee13.bsky.social

submitted 9 days ago

comment in response to post

Heck yeah! I’m gonna shamelessly plug our recent preprint here in the spirit of WYMM. Kind of the first third is everything you’re missing with short reads, at least in seawater. bsky.app/profile/stev...

submitted 10 days ago

comment in response to post

Because traditional short-read platforms struggle with low GC/high strain diversity, NTMR indicator taxa like Pelagibacter/SAR86 would have been invisible to genomic analysis without @nanoporetech.com long reads. The ability to identify interesting biological insights required a comprehensive DB.

submitted 10 days ago

comment in response to post

And indeed, we find that NTMR reefs show statistically less nitrate than Fished reefs. We don’t really understand why, but the establishment of No Take Marine Reserves appears to lead to lower dissolved nitrogen, selecting for streamlined taxa with low GC.

submitted 10 days ago

comment in response to post

But why do indicators of NTMRs seem to always show streamlined genomes? The Giovannoni model would say that streamlining (low GC, smaller genomes) occurs in large population size, low nutrient communities, specifically low nitrogen, because G and C bases require more nitrogen. Low GC is efficient.
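To put a number on the nitrogen argument: a G:C base pair carries 8 nitrogen atoms (guanine 5, cytosine 3) and an A:T pair carries 7 (adenine 5, thymine 2). A small sketch of the arithmetic, with hypothetical function names, counting only the nitrogen in the bases themselves:

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / total


def nitrogen_per_base_pair(seq):
    """Mean nitrogen atoms per base pair: 8 for G:C pairs, 7 for A:T pairs."""
    gc = gc_fraction(seq)
    return 8 * gc + 7 * (1 - gc)
```

So each base pair switched from G:C to A:T saves one nitrogen atom in DNA alone. That is a small per-site saving, which fits the idea that it only matters under strong selection in very large populations.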

submitted 10 days ago

comment in response to post

Sub-thread 3: using the GBR-MGD, we identify microbial indicators of reef management status, most of which are from low-GC taxa, “streamlined” taxa that could not be recovered using short-read sequencing. bsky.app/profile/stev...

submitted 11 days ago

comment in response to post

In contrast, indicators of fished reefs all had larger, higher GC genomes than the rest of their phylogenetic group. For example, the genera UBA11663, UBA8752, and species UBA10364 sp003445735 in the class Bacteroidia had genomes with 45-63 GC%, 9-27% higher than the rest of the Bacteroidia.

submitted 11 days ago

comment in response to post

The thing that stuck out about all microbes indicative of NTMRs is that they are exactly the microbes that could not be recovered using traditional short reads—e.g. Pelagibacteraceae, SAR86, HIMB59, the phylum Marinisomatota. All with low-GC, streamlined genomes.

submitted 11 days ago

comment in response to post

By mapping the metagenomic reads against the GBR-MGD MAGs to calculate relative abundance, we identify microbial taxa that can predict whether a reef is an NTMR or fished reef with 71% accuracy. Note that we didn’t have an extensive sample catalog, so this number would likely rise with more samples.
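One common way to turn mapped-read counts into relative abundance, sketched under assumptions: reads per MAG are divided by genome length so larger genomes are not over-counted, then scaled to sum to 1. This is not necessarily the exact normalization used for the GBR-MGD, and the function and argument names are hypothetical.

```python
def relative_abundance(read_counts, genome_lengths):
    """Length-normalized relative abundance per MAG.

    read_counts:    {mag_id: number of reads mapped to that MAG}
    genome_lengths: {mag_id: MAG length in base pairs}
    Returns {mag_id: fraction}, with fractions summing to 1.
    """
    # Reads per base pair as a simple proxy for coverage.
    coverage = {
        mag: reads / genome_lengths[mag] for mag, reads in read_counts.items()
    }
    total = sum(coverage.values())
    if total == 0:
        raise ValueError("no reads mapped to any MAG")
    return {mag: cov / total for mag, cov in coverage.items()}
```

The resulting per-sample abundance vectors are the kind of feature table a classifier of NTMR vs fished reefs would be trained on.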

submitted 11 days ago