petereliaskraft.net
Co-founder @ http://dbos.dev • Stanford PhD • Database Geek • Building https://github.com/dbos-inc/dbos-transact-py
Python docs: docs.dbos.dev/python/tutor...
TypeScript docs: docs.dbos.dev/typescript/t...
However, this saved state also enables replay debugging! The debugger uses the checkpointed information to re-execute the workflow, reconstructing its state at each step of execution. Then, you can single-step through the workflow and see exactly what it did.
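To make the replay idea concrete, here is a minimal sketch, not the DBOS debugger itself. It assumes a hypothetical checkpoint log (`recorded`) and a hypothetical `Replayer` class; during replay, each step returns its recorded output instead of running, so the workflow's state can be inspected step by step.

```python
# Hypothetical checkpoint log: step name -> recorded output,
# as it might be loaded from the database after the original run.
recorded = {"fetch_price": 120, "apply_discount": 108, "add_tax": 116.64}


def fetch_price():
    # A live call would hit an external service; replay must never reach it.
    raise RuntimeError("live call not allowed during replay")


def apply_discount(price):
    raise RuntimeError("live call not allowed during replay")


def add_tax(price):
    raise RuntimeError("live call not allowed during replay")


class Replayer:
    """Hypothetical replay context: serves recorded outputs instead of running steps."""

    def __init__(self, recorded_outputs):
        self.recorded_outputs = recorded_outputs
        self.trace = []  # (step name, inputs, output) for each step, in order

    def step(self, name, fn, *args):
        # fn is deliberately not called: the checkpointed output stands in for it.
        output = self.recorded_outputs[name]
        self.trace.append((name, args, output))
        return output


def pricing_workflow(ctx):
    price = ctx.step("fetch_price", fetch_price)
    discounted = ctx.step("apply_discount", apply_discount, price)
    return ctx.step("add_tax", add_tax, discounted)


replayer = Replayer(recorded)
final = pricing_workflow(replayer)
for name, args, output in replayer.trace:
    print(f"{name}{args} -> {output}")
```

Because the workflow code itself runs again, a debugger breakpoint inside `pricing_workflow` would see the same intermediate values the original execution saw.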
Under the hood, this works because DBOS checkpoints the execution state of your workflows (what steps have completed and what their outputs were) to Postgres. It does this for reliability, using the checkpoints to resume interrupted workflows from their last completed steps.
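A rough sketch of the checkpoint-and-resume pattern described here, under loud assumptions: it uses an in-memory SQLite database as a stand-in for Postgres, and the `step_outputs` table, `run_step` helper, and `checkout` workflow are all hypothetical names for illustration, not DBOS's actual schema or API.

```python
import json
import sqlite3

# Assumption: SQLite in memory stands in for Postgres; the table layout is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE step_outputs ("
    " workflow_id TEXT, step_index INTEGER, output TEXT,"
    " PRIMARY KEY (workflow_id, step_index))"
)

calls = []  # names of steps that actually executed, for illustration


def run_step(workflow_id, step_index, fn, *args):
    """Return the checkpointed output if the step completed; otherwise run and checkpoint it."""
    row = conn.execute(
        "SELECT output FROM step_outputs WHERE workflow_id = ? AND step_index = ?",
        (workflow_id, step_index),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # step already completed: skip re-execution
    result = fn(*args)
    calls.append(fn.__name__)
    conn.execute(
        "INSERT INTO step_outputs VALUES (?, ?, ?)",
        (workflow_id, step_index, json.dumps(result)),
    )
    conn.commit()
    return result


def reserve(order):
    return {"order": order, "reserved": True}


def charge(reservation, fail):
    if fail:
        raise RuntimeError("simulated crash")
    return {"order": reservation["order"], "charged": True}


def checkout(workflow_id, order, fail=False):
    reservation = run_step(workflow_id, 0, reserve, order)
    return run_step(workflow_id, 1, charge, reservation, fail)


# First run is interrupted during the second step.
try:
    checkout("wf-1", "order-42", fail=True)
except RuntimeError:
    pass

# Re-running with the same workflow ID resumes from the last completed step.
result = checkout("wf-1", "order-42")
print(calls)
```

On the second run, `reserve` is served from its checkpoint and only `charge` executes.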
There’s a common tendency when building new infra/devtools to want to rebuild everything from scratch, but leveraging a popular and proven foundation often makes everything work better.
- Extensibility: We’ve seen some users build additional tooling for their app that accesses the DBOS system tables directly, usually for a very specific use-case that our APIs don’t cover. They can do that because it’s just Postgres, and they know Postgres.
- A huge ecosystem: There are countless Postgres providers and tools, and by building on Postgres we support them all out-of-the-box. Just to give a few examples, we see users building on Supabase, RDS, or Timescale, managing vector data with pgvector, and viewing their data with pgAdmin or DBeaver.
- Incredible flexibility: We use Postgres to checkpoint workflow state for reliability, but with a slightly different set of SQL queries, we also built observability tools for workflows so you can see what they’re doing in real time.
- Built-in reliability and trust: Everyone knows Postgres. It’s been around forever, serious bugs are rare, and it doesn’t lose data. Building on it means there’s a huge class of problems you just don’t need to worry about.
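As an illustration of the flexibility point above (the same checkpoint data serving observability through a different query), here is a small sketch. It assumes a hypothetical `step_outputs` table and uses in-memory SQLite in place of Postgres; the real DBOS system tables are not shown here and may look quite different.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres, and this table layout is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE step_outputs ("
    " workflow_id TEXT, step_index INTEGER, step_name TEXT, output TEXT)"
)
conn.executemany(
    "INSERT INTO step_outputs VALUES (?, ?, ?, ?)",
    [
        ("wf-1", 0, "reserve", "ok"),
        ("wf-1", 1, "charge", "ok"),
        ("wf-2", 0, "reserve", "ok"),
    ],
)

# The rows written for recovery also answer "how far has each workflow gotten?"
progress = conn.execute(
    "SELECT workflow_id, COUNT(*) AS steps_done, MAX(step_index) AS last_step"
    " FROM step_outputs GROUP BY workflow_id ORDER BY workflow_id"
).fetchall()

for workflow_id, steps_done, last_step in progress:
    print(f"{workflow_id}: {steps_done} step(s) done, last step index {last_step}")
```

The same kind of ad hoc query is what makes the extensibility point possible: anyone who knows SQL can build tooling on top of the stored state.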
Yeah, methods still have to be idempotent. To be precise, DBOS guarantees steps are tried at least once but are never re-executed after they complete. Some more detail in the docs: docs.dbos.dev/explanations...
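A small simulation of why idempotency still matters under this guarantee. Everything here is hypothetical (the in-memory `completed` checkpoint store, the `charges` payment stand-in, and the retry loop); it only illustrates the scenario where a step's side effect happens but the step is interrupted before its completion is recorded, so it is tried again.

```python
completed = {}  # stand-in for checkpoints: step name -> recorded output
charges = {}    # stand-in for an external payment service, keyed by idempotency key
attempts = 0    # how many times the step body actually ran


def charge_card(idempotency_key, amount):
    global attempts
    attempts += 1
    # Idempotent side effect: a repeated key does not create a second charge.
    charges.setdefault(idempotency_key, amount)
    if attempts == 1:
        # Simulated interruption after the side effect, before the checkpoint.
        raise ConnectionError("simulated crash")
    return charges[idempotency_key]


def run_step(name, fn, *args, max_attempts=3):
    """Try a step until it completes once; never re-execute it after that."""
    if name in completed:
        return completed[name]
    for _ in range(max_attempts):
        try:
            output = fn(*args)
        except ConnectionError:
            continue  # not yet completed, so it is tried again
        completed[name] = output  # checkpoint: from now on the step is skipped
        return output
    raise RuntimeError(f"step {name!r} did not complete in {max_attempts} attempts")


first = run_step("charge", charge_card, "order-42", 50)
second = run_step("charge", charge_card, "order-42", 50)
print(attempts, charges)
```

The step body runs twice, but the idempotency key keeps the customer from being charged twice, and the second `run_step` call returns the recorded output without running the body at all.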
The big difference is that DBOS is lightweight. It's a library you can install into any program. By contrast, Inngest, Temporal, etc. require you to set up an external workflow server to orchestrate your code. More detail in this post: www.dbos.dev/blog/what-is...
We wrote about this design pattern, how we implemented it in DBOS, and how you can use it in your applications.
www.dbos.dev/blog/what-is...
But how do you do this practically? How do you store a program’s execution state in a database in a way that’s both performant and easy to use, given that neither programming languages nor databases are built for this?
If your programs could be as durable as your data already is, a ton of reliability and fault-tolerance problems in apps and microservices would disappear!
The core idea is: every serious program makes its data durable, usually by storing it in a database (like Postgres). But no one ever thinks about making programs durable. When you restart your server, your data is safe in the database, but the programs you were running are gone forever.
Python guide: docs.dbos.dev/python/progr...
TypeScript guide: docs.dbos.dev/typescript/p...
Please comment, DM me, file an issue, or whatever is best for you–thanks a lot!
I think serverless is a great fit for agents (as we've been exploring at @dbos.dev). However, AWS Lambda itself is not great for agents, because it's stateless and short-lived. To support long-running agentic workflows, you'll need to stitch Lambdas together via Step Functions and other services.