pashadag.bsky.social
Algorithmic Bioinformatics Researcher and Teacher. Posts about research results and educational/mentorship topics (for details, see http://bit.ly/380vX22).
51 posts 1,773 followers 157 following

Slides from my talk (with @kamilsjaron.bsky.social) on a history of k-mers in bioinformatics: rayan.chikhi.name/pdf/2025-kme...

Happy to share that the CloseRead paper is out in Genome Biology! CloseRead reports intuitive metrics for assessing genome assembly quality based on read alignments. We benchmarked it on immunoglobulin (IG) loci, which are known hotspots for structural variants (SVs) and assembly errors. genomebiology.biomedcentral.com/articles/10....

A new preprint on indexing pangenome graphs using an FM-index of the haplotypes and a tag array. Joint work with Parsa Eskandar and @benedictpaten.bsky.social.
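For readers new to FM-indexes, here is a toy sketch of the core operation, counting pattern occurrences by backward search over the Burrows-Wheeler transform of a single string. It is a textbook illustration under simplifying assumptions (one text, `$` as sentinel, quadratic construction); it is not the preprint's haplotype FM-index or its tag array, and all function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform of text + '$' via sorted rotations (toy inputs only)."""
    assert "$" not in text, "'$' is reserved as the end-of-text sentinel"
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(text: str):
    """Return (C, occ, n) tables used by backward search."""
    last = bwt(text)
    alphabet = sorted(set(last))
    # C[c] = number of characters in text + '$' that are smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += last.count(c)
    # occ[c][i] = number of occurrences of c in last[:i].
    occ = {c: [0] * (len(last) + 1) for c in alphabet}
    for i, ch in enumerate(last):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(last)


def count_occurrences(pattern: str, C, occ, n: int) -> int:
    """Count occurrences of pattern by narrowing a suffix-array interval right to left."""
    lo, hi = 0, n  # current interval [lo, hi) of sorted suffixes
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


C, occ, n = build_fm_index("ACGTACGA")
print(count_occurrences("ACG", C, occ, n))  # 2
```

Each step of the loop only needs `C` and `occ`, which is why FM-indexes can answer queries without keeping the original text around.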

@alexanderjpetri.bsky.social's isONclust3 algorithm is now published doi.org/10.1093/bioi.... isONclust3 performs de novo clustering of long-read cDNA sequencing data. A key step in reference-free transcriptome analysis.

The deadline for WABI 2025 has been extended (but is still rapidly approaching): wabiconf.github.io/2025/
* abstract deadline: May 12 (AoE)
* paper deadline: May 15 (AoE)
Consider submitting your exciting algorithmic bioinformatics work to the WABI conference!

IG loci of widely used lymphoblastoid cell lines contain somatic VDJ recombinations. Our novel toolkit, IGLoo, detects somatic events and removes them from a library, thus enabling accurate reassembly of these regions. @maojanlin.bsky.social @benlangmead.bsky.social www.cell.com/cell-reports...

Nice article on (frustrating) life as a performance engineer. purplesyringa.moe/blog/why-per...

The list of proceedings papers for #ISMB2025 is up on the website www.iscb.org/ismbeccb2025... ! It's an exciting collection of papers, as always :).

We finally concluded the meeting. Thanks to all attendees for their scientific contributions and for traveling (near or far) to the meeting! Thanks to the local organizers for the infrastructure and catering, and thanks to the co-organizers @yaronorenstein.bsky.social @camillemrcht.bsky.social!

The primate T2T paper is out in Nature! Our team led a comparative analysis of adaptive immune loci across great apes and revealed that these rapidly evolving regions harbor various SVs and species-specific genes. Check out all the exciting stories in the peer-reviewed version.

We are hiring PhD students in Computational Mathematics and Mathematics at Stockholm University in various subjects: su.varbi.com/en/what:job/... Application deadline: April 22. (1/3)

PSA: if you are applying to a CS grad program & a faculty member is asking for a verbal commitment before an official offer letter, this is a HUGE 🚩! There is an April 15th resolution to avoid this behavior (cgsnet.org/resources/fo...). I'd urge you to avoid those departments!

Passionate about open science and FAIR data principles for microbiome data? Consider becoming an NMDC Ambassador next year! I was an Ambassador in 2023 and happy to answer any questions about the experience!

🎶 Last Christmas, I gave you m̶y̶ ̶h̶e̶a̶r̶t̶ 40 pages of delicious combinatorics 🎶 Choose any word W of size m. How many words of size k > m admit W as their lexicographically smallest subword of size m? Find out in my latest preprint!
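To make the counting question concrete, a brute-force sketch can enumerate every word of size k and check its smallest size-m subword. It assumes "subword" means a contiguous substring (factor), as in the lexicographic-minimizer setting, and uses a DNA alphabet by default; it only checks small cases and is not the preprint's closed-form combinatorics. The function names are hypothetical.

```python
from itertools import product


def smallest_subword(word: str, m: int) -> str:
    """Lexicographically smallest length-m substring (factor) of word."""
    return min(word[i:i + m] for i in range(len(word) - m + 1))


def count_words_with_min_subword(w: str, k: int, alphabet: str = "ACGT") -> int:
    """Brute-force count of length-k words whose smallest length-m factor is w."""
    m = len(w)
    if not 0 < m <= k:
        raise ValueError("need 0 < len(w) <= k")
    return sum(
        1
        for letters in product(alphabet, repeat=k)
        if smallest_subword("".join(letters), m) == w
    )


# Over {a, b} with k = 3: "aba", "abb" and "bab" have "ab" as smallest 2-factor.
print(count_words_with_min_subword("ab", 3, alphabet="ab"))  # 3
```

Summing the count over all W of size m must give |alphabet|^k, which is a handy sanity check for any formula.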

Over and over again, I come to the conclusion that writing comes down to finding the most intuitive topological order through a high-dimensional space of results. It could be: sort by time, sort by paper, sort by topic, sort by previous work vs. new stuff, first simple, then in-depth...

Since indirects are in the news again, and everybody and their dog has an opinion on how much "research overhead" should cost, here's an excellent book that explains where exactly the money goes. escholarship.org/uc/item/59p1...

Finally, the preprint on Cuttlefish 3 is available! This is the most recent in a long line of work led by Jamshed Khan, a recent PhD graduate from my lab. Cuttlefish 3 further improves the efficiency of Cuttlefish 2, while adding support for *colored* compacted de Bruijn graphs. 1/x
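As a reminder of what "compacted" means here, the sketch below merges a set of k-mers into unitigs (maximal non-branching paths). It is a simplified, single-stranded illustration: no canonical k-mers or reverse complements, no colors, and a naive set-based representation, so it is not how Cuttlefish works internally. All names are hypothetical.

```python
ALPHABET = "ACGT"


def successors(kmer: str, kmers: set) -> list:
    return [kmer[1:] + c for c in ALPHABET if kmer[1:] + c in kmers]


def predecessors(kmer: str, kmers: set) -> list:
    return [c + kmer[:-1] for c in ALPHABET if c + kmer[:-1] in kmers]


def compact_unitigs(kmers) -> list:
    """Merge k-mers into maximal non-branching paths (single strand only)."""
    kmers = set(kmers)
    visited = set()
    unitigs = []

    def walk(start: str) -> str:
        # Extend while the path is non-branching in both directions.
        visited.add(start)
        path, node = start, start
        while True:
            succs = successors(node, kmers)
            if len(succs) != 1:
                break
            nxt = succs[0]
            if len(predecessors(nxt, kmers)) != 1 or nxt in visited:
                break
            visited.add(nxt)
            path += nxt[-1]
            node = nxt
        return path

    def is_start(kmer: str) -> bool:
        # A unitig starts where the in-degree is not 1 or the only predecessor branches.
        preds = predecessors(kmer, kmers)
        return len(preds) != 1 or len(successors(preds[0], kmers)) != 1

    for kmer in sorted(kmers):
        if kmer not in visited and is_start(kmer):
            unitigs.append(walk(kmer))
    # Anything left lies on an isolated cycle; break it at an arbitrary k-mer.
    for kmer in sorted(kmers):
        if kmer not in visited:
            unitigs.append(walk(kmer))
    return unitigs


print(compact_unitigs({"ACG", "CGT", "GTT"}))  # ['ACGTT']
```

A real tool also has to treat a k-mer and its reverse complement as the same vertex, which is where most of the implementation difficulty lives.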

0/ Essential reading for anyone training or using sequence-function models on genomic sequences! 🚨 In our new preprint, we explore the ways homology within genomes can cause leakage when training sequence-based models, and ways to prevent it.
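One common safeguard against this kind of leakage is to split by homology group rather than by individual sequence. The sketch below is a crude stand-in, assuming that sharing at least one exact k-mer counts as "homologous": it groups sequences with union-find and places whole groups on one side of the split. It is not the preprint's method (proper homology detection would use alignment), and the function names and greedy fill are hypothetical.

```python
from collections import defaultdict


def find(parent: list, x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_by_shared_kmers(seqs: list, k: int) -> list:
    """Group sequence indices that are connected by shared exact k-mers."""
    parent = list(range(len(seqs)))
    first_owner = {}
    for idx, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_owner:
                ra, rb = find(parent, idx), find(parent, first_owner[kmer])
                if ra != rb:
                    parent[ra] = rb
            else:
                first_owner[kmer] = idx
    clusters = defaultdict(list)
    for idx in range(len(seqs)):
        clusters[find(parent, idx)].append(idx)
    return list(clusters.values())


def split_by_cluster(seqs: list, k: int, test_fraction: float):
    """Assign whole clusters to test until the target size is reached."""
    clusters = sorted(cluster_by_shared_kmers(seqs, k), key=len, reverse=True)
    target = test_fraction * len(seqs)
    train, test = [], []
    for cluster in clusters:
        if len(test) + len(cluster) <= target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return sorted(train), sorted(test)


seqs = ["ACGTACGTAA", "TTACGTACGT", "GGGGCCCCGG", "CCCCGGGGAA"]
print(split_by_cluster(seqs, 6, 0.5))  # ([2, 3], [0, 1])
```

A naive random split of these four sequences could put 0 and 1 on opposite sides even though they share most of their 6-mers; grouping first rules that out.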

I'm glad to announce that the simd-minimizers library is out! 🧬🖥️ @curiouscoding.nl and I have been optimizing the computation of minimizers down to the smallest detail. The result is an order of magnitude faster than existing methods; processing an entire human genome takes only 4s on my laptop! 🧵
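For context on what is being computed, here is a deliberately naive, scalar sketch of (w, k) window minimizers: in every window of w consecutive k-mers, pick the k-mer with the smallest key (leftmost on ties) and report the distinct positions. It assumes plain lexicographic order by default, whereas practical tools usually rank k-mers with a hash; it is nothing like the library's SIMD implementation, and the function name is hypothetical.

```python
def minimizer_positions(seq: str, k: int, w: int, key=lambda kmer: kmer) -> list:
    """Start positions of window minimizers, in increasing order.

    A window is w consecutive k-mers; its minimizer is the k-mer with the
    smallest key, ties broken by the leftmost position.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []
    kmers = [seq[i:i + k] for i in range(n_kmers)]
    picked = []
    for start in range(n_kmers - w + 1):
        best = min(range(start, start + w), key=lambda i: (key(kmers[i]), i))
        # Consecutive windows often share a minimizer; record it once.
        if not picked or picked[-1] != best:
            picked.append(best)
    return picked


print(minimizer_positions("ACGTACGT", k=3, w=2))  # [0, 1, 2, 4]
```

This version does O(w) work per window; fast implementations avoid rescanning each window and process many windows in parallel.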

🚨 Deadline Extended: Call for Papers - RECOMB-seq 2025 🚨 Great news! You now have an extra week to submit your work. 🎉 Updated Deadlines:
🔹 Abstract Registration: Jan 31, 2025
🔹 Submission: Feb 7, 2025
recomb-seq.github.io/papers/

How helpful is a de Bruijn graph for visualizing alternative RNA variants? Here I asked our graph visualization tool Vizitig to show me the human gene CIC in my data (3 RNA-seq samples). I connected sequences to known genes using an annotation and colored CIC's exons in different tints.

My PhD adviser Liliana Florea has developed a Coursera course, "Bioinformatics Methods for Transcriptomics". A great resource to learn cutting-edge short- and long-read RNA-seq data analysis techniques: www.coursera.org/learn/bioinf...

Yes, they can hallucinate papers that don't exist, discuss results that seem to be imaginary, and can be confusing and inconsistent. But talking to tenured professors may still be helpful.

I understand the reasons that people are codifying responsibilities and expectations for academic positions, but I can't help but feel like all this formalization risks extinguishing the things that make academia special to begin with.

I wonder if it is better to measure productivity techniques less by how much time they save and more by how much more time you spend doing things you want. I suspect many common productivity frameworks, like inbox zero et al., would not fare well by this metric.

62 years later, the book that changed everything is still a must-read. Kuhn distinguished between 'normal science' and 'revolutionary science': in the former, we work within the paradigm, but if anomalies add up, a new paradigm emerges in a period of revolutionary science.

Hi all, here's my passion project for December: a website for near-real-time DNA sample screening/composition analysis. Let me know what you think!

Check out the thread by @elisarosix.bsky.social describing the latest efforts of the TR-IG Nomenclature Review Committee. TR-IG was a huge part of my work in 2024, and I am proud to be on the team that develops robust and transparent policies for annotation and naming adaptive immune genes.

Merry lexicographic minimiz... Christmas arxiv.org/abs/2412.17492

Do you want to learn systematic ways in which you can revise your research papers? I've posted a short collection of 4 lectures youtube.com/playlist?lis... 1/n

We're thrilled to introduce LexicMap v0.5.0🎉 It's more accurate and slightly faster! LexicMap has helped some scientists align genes and plasmids in AllTheBacteria and GenBank, each of which has more than 2 million prokaryotic genomes! We'll provide an index for ATB on AWS later. github.com/shenwei356/L...

I wrote about the backstory of our recently published metagenomic profiler called sylph (www.nature.com/articles/s41...), partially to celebrate the new migration to bluesky Check out the blog here: jim-shaw-bluenote.github.io/blog/2024/de... -- and I apologize in advance for the lack of brevity :)

Jim notes in this blog that the first bioinformatics paper he ever read was Mash. I definitely had an agenda when writing that paper: I thought min-hashing was awesome and really wanted to teach other people about it. So, it's super gratifying to learn stars like Jim read it and were inspired!
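Since the point of that paper was teaching min-hashing, here is a small sketch of the bottom-s idea: keep only the s smallest k-mer hashes of each sequence, then estimate Jaccard similarity as the fraction of the union's s smallest hashes that occur in both sketches. Assumptions: blake2b stands in for Mash's actual hash function, k-mers are not canonicalized, and the function names are hypothetical.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash (blake2b); Mash itself uses a different hash."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def kmer_set(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bottom_sketch(seq: str, k: int, s: int) -> list:
    """Keep the s smallest distinct k-mer hashes (a bottom-s MinHash sketch)."""
    return sorted({kmer_hash(x) for x in kmer_set(seq, k)})[:s]


def estimate_jaccard(sketch_a: list, sketch_b: list, s: int) -> float:
    """Fraction of the union's s smallest hashes that appear in both sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(set_a | set_b)[:s]
    if not union_bottom:
        return 0.0
    shared = set_a & set_b
    return sum(h in shared for h in union_bottom) / len(union_bottom)


def exact_jaccard(seq_a: str, seq_b: str, k: int) -> float:
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    return len(a & b) / len(a | b) if a | b else 0.0


a, b = "ACGTACGTTTGCA", "ACGTACGAAAGCA"
print(estimate_jaccard(bottom_sketch(a, 4, 5), bottom_sketch(b, 4, 5), 5), exact_jaccard(a, b, 4))
```

With small s the estimate is noisy; once s covers every distinct k-mer, it coincides with the exact Jaccard index (up to hash collisions), which is why sketch size trades accuracy for memory.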

Very cool work from Yang Lu et al. demonstrating miscalibration of BLASTP’s E-values and generating well-calibrated values via a knockoff-based approach (cc @mikelove.bsky.social) - academic.oup.com/bioinformati...! More analyses could benefit from knockoff-based approaches.

PSA: when listing the CPU you used to run experiments for your paper, include *ALL* of the below:
- model name
- base clock speed
- L1, L2 cache size PER CORE
- L3 cache size (total)
- number of sockets (if >1), cores, and threads
- whether hyperthreading was on
- whether Turbo Boost was disabled