stephenturner.us
Data scientist, bioinformatician, #Rstats enthusiast, dad, runner, guitar noise-maker. Head of Genomic Strategy at Colossal Biosciences 🦣🧬🖥️ Views my own. Web: https://stephenturner.us/ Blog (Paired Ends): https://blog.stephenturner.us
1,490 posts 7,909 followers 350 following

New in @natrevbiodiv.bsky.social - Harnessing AI to fill global shortfalls in biodiversity knowledge https://www.nature.com/articles/s44358-025-00022-3 (read free: https://rdcu.be/ebqdD) 🧬🖥️🧪

A compendium of human gene functions derived from evolutionary modelling https://www.nature.com/articles/s41586-025-08592-0 🧬🖥️🧪

Oh, Slack is down. Time to do real work.

AlphaFold as a Prior: Experimental Structure Determination Conditioned on a Pretrained Neural Network https://www.biorxiv.org/content/10.1101/2025.02.18.638828v1 🧬🖥️🧪

Watch parties, real time support from Seqera and Nextflow ambassadors, oh my 🤩

An atlas of transcription initiation reveals regulatory principles of gene and transposable element expression in early mammalian development https://www.cell.com/cell/fulltext/S0092-8674(24)01426-0 🧬🖥️🧪 https://github.com/meoomen/Smartseq5

Do protein language models learn phylogeny? https://academic.oup.com/bib/article/26/1/bbaf047/8030578 🧬🖥️🧪 https://github.com/santule/pLMEvo

Hey, that’s me 🤩

gcSV: a unified framework for comprehensive structural variant detection https://www.biorxiv.org/content/10.1101/2025.02.10.637589v1 🧬🖥️🧪 https://github.com/hitbc/gcSV

Gotta pay for that AI nobody wants somehow

Applying the FAIR Principles to computational workflows https://www.nature.com/articles/s41597-025-04451-9 🧬🖥️🧪 Nextflow = batteries included 🔋

Sequencing by Expansion (SBX) — a novel, high-throughput single-molecule sequencing technology https://www.biorxiv.org/content/10.1101/2025.02.19.639056v1

Whole-genome sequencing analysis identifies rare, large-effect noncoding variants and regulatory regions associated with circulating protein levels https://www.nature.com/articles/s41588-025-02095-4 🧬🖥️🧪 https://github.com/ExeterGenetics/WGS_50k_Proteins_2024/

Tissue reassembly with generative AI https://www.biorxiv.org/content/10.1101/2025.02.13.638045v1 🧬🖥️🧪 https://github.com/mlbio-epfl/LUNA

Secure and federated GWAS for biobank-scale datasets https://www.nature.com/articles/s41588-025-02109-1 🧬🖥️🧪 https://github.com/hhcho/sfgwas

nf-core/variantbenchmarking v1.0.0: Nextflow pipeline to evaluate and validate the accuracy of variant calling methods https://github.com/nf-core/variantbenchmarking 🧬🖥️🧪

EvANI benchmarking workflow for evolutionary distance estimation https://www.biorxiv.org/content/10.1101/2025.02.23.639716v1 🧬🖥️🧪 https://github.com/sinamajidian/EvANI

Small, Open-Source Text-Embedding Models as Substitutes to OpenAI Models for Gene Analysis https://www.biorxiv.org/content/10.1101/2025.02.15.638462v1 🧬🖥️🧪 https://github.com/RavenGan/FinetuneEmbed

mettannotator: a comprehensive and scalable Nextflow annotation pipeline for prokaryotic assemblies https://academic.oup.com/bioinformatics/article/41/2/btaf037/7978911 🧬🖥️🧪 https://github.com/EBI-Metagenomics/mettannotator

Intrinsically disordered regions as facilitators of the transcription factor target search https://www.nature.com/articles/s41576-025-00816-3 (read free: https://rdcu.be/eaMBK)

We're looking for a next-gen sequencing engineer to help run our sequencing core 🧬🦣🦤 Position is onsite in Dallas, TX colossal.com/careers/?gh_...

@lionelhenry.bsky.social and I are so excited to finally announce Air - an extremely fast R code formatter! 🎉 With Air, you'll never need to worry about styling your #rstats code ever again. All you need to do is save, and Air takes care of the rest. www.tidyverse.org/blog/2025/02...

Results section 🍅

panHiTE: a comprehensive and accurate pipeline for TE detection in large-scale population genomes https://www.biorxiv.org/content/10.1101/2025.02.15.638472v1 🧬🖥️🧪 Nextflow: https://github.com/CSU-KangHu/HiTE Tutorial: https://github.com/CSU-KangHu/HiTE/wiki/panHiTE-tutorial

In silico generation of synthetic cancer genomes using generative AI https://www.biorxiv.org/content/10.1101/2024.10.17.618896v2 🧬🖥️🧪 https://github.com/LincolnSteinLab/oncoGAN

CHOPOFF: symbolic alignments enable fast and sensitive CRISPR off-target detection https://www.biorxiv.org/content/10.1101/2025.01.06.603201v2 🧬🖥️🧪 Julia code https://github.com/JokingHero/CHOPOFF.jl #Rstats package wrapper https://github.com/JokingHero/crisprCHOPOFF

The Practical Guide to Biotech Partnerships

Weekly Recap (Feb 2025, part 2): Predicting RNA-seq coverage from DNA sequence with Borzoi, doing it faster with Flashzoi, phylogenetic inference from structure, prioritization in GWAS vs rare variants, LLMs for patient interactions doi.org/10.59350/zga...

AI for modelling infectious disease epidemics https://www.nature.com/articles/s41586-024-08564-w (read free: https://rdcu.be/eaA6O) 🧬🖥️🧪

Why does @weare.rladies.org have fewer than 1,000 followers? We can do better than this, #rstats folks. It is a terrific account to follow because every week we not only get to know a new member of the #RLadies community but also get to read their R favorites, tips, and jokes.

Genomic data sharing: you don’t know what you’ve got (till it’s gone) https://www.nature.com/articles/s41576-025-00820-7 (read free: https://rdcu.be/d9DqG)

Accelerating scientific breakthroughs with an AI co-scientist https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/

MARTi: a real-time analysis and visualisation tool for nanopore metagenomics 🧬🖥️🧪 Preprint https://www.biorxiv.org/content/10.1101/2025.02.14.638261v1 Source https://github.com/richardmleggett/MARTi Docs https://marti.readthedocs.io Demo app https://marti.cyverseuk.org/

Automated Hypothesis Validation with Agentic Sequential Falsifications https://arxiv.org/abs/2502.09858 🧬🖥️🧪 https://github.com/snap-stanford/POPPER "Popper validates a hypothesis using LLM agents that design and execute falsification experiments targeting its measurable implications"

Evo 2 Can Design Entire Genomes https://www.asimov.press/p/evo-2 🧬🖥️🧪

A few years ago, we started asking if the adaptation of Iberian wolves to anthropogenic landscapes could be associated with ancient introgression from domestic dogs. In our recently accepted paper at @genomeresearch.bsky.social we provide new insights: genome.cshlp.org/cgi/content/... 🐺🧬🐶

Want to learn how to create a simple and delightful Quarto dashboard with Python? In this video, I show how to make one from scratch in Positron using water insecurity data from the #TidyTuesday project 🐍 Check it out here! youtu.be/uLGe9zuuNl0x #Quarto #Python

Perplexity's Deep Research is actually really good. And free (up to 3 deep research reports per day without paying). www.perplexity.ai/hub/blog/int...

Why is it so hard to rewrite a genome?

PGSXplorer: an integrated nextflow pipeline for comprehensive quality control and polygenic score model development https://peerj.com/articles/18973/ 🧬🖥️🧪 https://github.com/tutkuyaras/PGSXplorer

PyOrthoANI, https://github.com/althonos/orthoani; PyFastANI, https://github.com/althonos/pyfastani; Pyskani, https://github.com/althonos/pyskani: a suite of Python libraries for computation of average nucleotide identity https://www.biorxiv.org/content/10.1101/2025.02.13.638148v1 🧬🖥️🧪

In Brief: Real-life Jurassic Park startup Colossal BioSciences has raised $200 million to bring back three extinct animals: the woolly mammoth, the Tasmanian tiger and the dodo bird www.nature.com/articles/s41... rdcu.be/eaosC

Selective State Space Models Outperform Transformers at Predicting RNA-Seq Read Coverage https://www.biorxiv.org/content/10.1101/2025.02.13.638190v1 🧬🖥️🧪 https://github.com/ihh/bilby

Accurate Somatic SV detection via sequence graph model-based local pan-genome optimization https://www.biorxiv.org/content/10.1101/2025.02.11.636543v1 🧬🖥️🧪 https://github.com/Goatofmountain/TDScope

Beyond the Hype: The Complexity of Automated Cell Type Annotations with GPT-4 https://www.biorxiv.org/content/10.1101/2025.02.11.637659v2 🧬🖥️🧪 https://github.com/soulbio/cell_type_annotation

My thoughts on the destructive chaos targeting university research right now. Btw, I've left @forbes and moved to @SubstackInc, where all my content is free: stevensalzberg.substack.com/p/the-pointl...

ActSeek: Fast and accurate search algorithm of active sites in Alphafold database https://www.biorxiv.org/content/10.1101/2025.02.11.637678v1 🧬🖥️🧪 https://github.com/vttresearch/ActSeek (noncommercial)