vanlightly.bsky.social
Researcher, advisor, writer, formal verification eng @ Confluent. Everything data (dist sys, databases, messaging, data eng/analytics). https://jack-vanlightly.com, https://www.hotds.dev
228 posts 3,522 followers 111 following

A new log replication disaggregation survey post is out! The Kafka Replication Protocol: 🔹Separation of control plane from data plane. 🔹Role separation with minimal coupling. 🔹Kafka’s alignment with Paxos roles. jack-vanlightly.com/blog/2025/2/...

Spotify is so bad at recommendations, but ChatGPT is pretty good at it. I give it a song and it lists the different characteristics of the song, then makes a set of recommendations based on those different characteristics.

The first post in the survey of disaggregated log replication systems is out! It looks at Neon's serverless Postgres write-path, which weaves consensus from heterogeneous components, based on MultiPaxos. jack-vanlightly.com/blog/2025/2/...

I updated my post, How to Disaggregate a Log Replication Protocol, to include "separating ordering from IO". Basically I couldn't ignore CORFU as a way of separating responsibilities! So now we have six ways (A-F) of breaking apart the monolith. jack-vanlightly.com/blog/2025/2/...

Table virtualization and stream-table materialization redefine how we think about data sharing, composability, and interoperability. The open table formats (Iceberg, Delta Lake), metadata separation, and cloud storage are paving the way for modular, composable data platforms. jack-vanlightly.com/blog/2025/2/...

So many databases claim to be compatible with #Postgres these days, but what does that really mean? @pgdba.bsky.social tries to provide an objective answer with the Postgres Compatibility Index, which verifies compatibility against a wide range of criteria. Love to see this effort!

The latest Humans of the Data Sphere is out, with issue #8! Additional topics in this post include systems correctness practices at AWS, Datadog's Husky compaction, and solving for the distributed case first in ambitious systems projects. www.hotds.dev/p/humans-of-...

This reminds me of Paxos vs Raft. Paxos formalized the responsibilities of reaching consensus and acting on the agreed values into *distinct roles* (proposer, acceptor, learner). These roles can be put in a monolith or distributed. These roles are the basis for the Paxos family of protocols.
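
To make the role split concrete, here is a minimal sketch of single-decree Paxos with the three roles as separate classes. It is only an illustration under loud assumptions: everything runs in one process with direct method calls instead of messages, there are no failures or retries, and all class and method names (`Acceptor`, `Learner`, `Proposer`, `prepare`, `accept`, `on_accepted`, `propose`) are hypothetical, not taken from any real system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Acceptor:
    """Votes on proposals; remembers its promise and last accepted value."""
    promised: int = -1
    accepted_ballot: int = -1
    accepted_value: Optional[str] = None

    def prepare(self, ballot: int) -> Optional[Tuple[int, Optional[str]]]:
        # Phase 1: promise to ignore lower ballots, report any prior acceptance.
        if ballot > self.promised:
            self.promised = ballot
            return (self.accepted_ballot, self.accepted_value)
        return None

    def accept(self, ballot: int, value: str) -> bool:
        # Phase 2: accept unless a higher ballot has been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


class Learner:
    """Acts on the agreed value once a quorum of acceptors accepted it."""

    def __init__(self, quorum: int) -> None:
        self.quorum = quorum
        self.votes: Dict[Tuple[int, str], Set[int]] = {}
        self.chosen: Optional[str] = None

    def on_accepted(self, acceptor_id: int, ballot: int, value: str) -> None:
        voters = self.votes.setdefault((ballot, value), set())
        voters.add(acceptor_id)
        if len(voters) >= self.quorum:
            self.chosen = value


class Proposer:
    """Drives both phases for one ballot number."""

    def __init__(self, ballot: int, acceptors: List[Acceptor], learner: Learner) -> None:
        self.ballot = ballot
        self.acceptors = acceptors
        self.learner = learner

    def propose(self, value: str) -> Optional[str]:
        quorum = len(self.acceptors) // 2 + 1
        replies = [a.prepare(self.ballot) for a in self.acceptors]
        promises = [r for r in replies if r is not None]
        if len(promises) < quorum:
            return None  # no promise quorum; a real proposer would retry higher
        # Adopt the value accepted at the highest prior ballot, if there is one.
        _, prior_value = max(promises, key=lambda p: p[0])
        if prior_value is not None:
            value = prior_value
        for i, acceptor in enumerate(self.acceptors):
            if acceptor.accept(self.ballot, value):
                self.learner.on_accepted(i, self.ballot, value)
        return self.learner.chosen


acceptors = [Acceptor() for _ in range(3)]
learner = Learner(quorum=2)
first = Proposer(1, acceptors, learner).propose("a")
second = Proposer(2, acceptors, learner).propose("b")  # must adopt the chosen value
stale = Proposer(0, acceptors, learner).propose("c")   # ballot too low, rejected
```

Here the roles live in a monolith, but because each role only talks to the others through a narrow interface, the same classes could sit behind network calls on separate machines.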

Confluent + Databricks next-level partnership 💪 Bi-directional flow between Confluent and Databricks. Kafka topics appearing as Delta tables in Databricks. Delta tables appearing as Kafka topics in Confluent. Simply amazing.

New dist sys post on log replication! In this one, I classify five approaches to disaggregating log replication protocols—laying the groundwork for a survey of real-world systems through the lens of disaggregation. jack-vanlightly.com/blog/2025/2/...

Yesterday, I covered Virtual Consensus—today, let’s dive deeper into a crucial distinction: 👉 Failure-free ordering vs. Fault-tolerant consensus Decoupling these concepts can change how we think about consensus protocols. jack-vanlightly.com/blog/2025/2/...

New distributed systems protocol write-up! This write-up dives into the Virtual Consensus in Delos paper and why it makes sense as the default log replication protocol in the era of object storage and hybrid environments. jack-vanlightly.com/blog/2025/2/...

Speculation is growing that Snowflake is planning to acquire Redpanda—but why? What justifies the rumored high price tag? When you consider market trends, the Snowflake vs. Databricks rivalry, and the AI shift, the rationale starts to become clear. Here’s my take. jack-vanlightly.com/blog/2025/2/...

Humans of the Data Sphere issue #7 is out. Obviously, DeepSeek happened but there have been plenty of other conversations happening. In this issue we also look at well- and ill-conditioned APIs and the challenge of contextual data quality in AI-powered data pipelines. www.hotds.dev/p/humans-of-...

I read this blog post 3 times. I'm not interested in the investment aspect, but there are so many valuable insights in here on the current development of AI. Also the writing alone deserves its own separate praise. youtubetranscriptoptimizer.com/blog/05_the_...

I had RSI in my wrists so bad 20 years ago (too much Counterstrike) that I resorted to typing using pencils with rubbers on the end. I would basically jab the keys and mouse buttons, holding the pencils in my fists. It looked ridiculous but it allowed me to carry on working.

AI-related stocks are selling off because DeepSeek’s new model was cheaper to train and has lower inference costs. I suppose investors see that and see a threat to the AI industry. But they’ve got it backwards...

Regarding Restate and its distributed log, many people talk about Delos but Apache BookKeeper is also highly relevant/similar, so I like to remind people that it also exists! I've written extensively about how BookKeeper works if you're interested: 1\ medium.com/splunk-maas/...

Great stuff. I'm watching the durable execution space closely and personally I'm quite bullish on it. I'll be writing my own thoughts on durable execution soon.

Over the last week I've been working on finishing the formal verification of Kafka's KRaft with pre-vote and reconfiguration. The protocol is looking good from a design perspective. The team use deterministic simulation testing to catch implementation bugs. Defense in depth!

The pace of AI advancement is not slowing down, and now AI agents are breaking onto the scene, leveraging ever more powerful models to interact with the real world. In this post, I cover some of the voices talking about AI agents and the challenges ahead. jack-vanlightly.com/blog/2025/1/...

My favorite quotes in issue #6 of HOTDS are: 1) Marc Brooker's blog post: Snapshot Isolation vs Serializability.