dani-sola.com
Interested in people, distributed systems, sustainability, and all things data.

We are continuing our series of posts on some non-trivial use cases for XGBoost. In this latest post, we talk about using Shapley *interaction* values for feature engineering. 1/2

Just published a post about building smart services at CLARK. A pragmatic approach that worked very well for us, going from heuristics to ML. Thoughts and feedback welcome! #datasky #data #databs medium.com/clark-engine...

Despite patriarchy's persistence, growing numbers of men believe they have it worse off than women. And, new research shows this "male victimhood" ideology is most common among men who aren't facing hardship. Which means what they're really feeling is status loss. 1/ www.psypost.org/male-victimh...

DeepSeek-R1! ⚡ Performance on par with OpenAI-o1 📖 Fully open-weight model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Demo: chat.deepseek.com Models: huggingface.co/deepseek-ai

First post of the year! @andypavlo.bsky.social got me thinking about why Confluent didn't build WarpStream. My conclusion: legacy infrastructure companies are going to have a tough time against cloud native, AI-enabled, post-ZIRP competitors.

The MemoryDB paper shows the power of separating responsibilities through clever composition. I think this DB frontend/execution plus a distributed transaction log pattern can be promising for creating serverless variants of many popular databases. E.g., Aurora adopts a similar decoupling approach.

OLTP Through the Looking Glass 16 Years Later: Communication is the New Bottleneck www.cs.cit.tum.de/fi...

I love dbt, but sdf.com looks very promising: faster runtime, improved reports, column-level lineage, etc. Does anyone have experience running it in production? #databs #datasky

S3 (Iceberg) Tables is everything I dreamt of, and more. I blogged some long-form thoughts: meltware.com/2024/12/04/s... I think we're about to see an explosion of data tools (@materialize.com, @clickhouse.com, @duckdb.org, et al.) learn to write Iceberg tables via S3 table buckets. #databs

Seems like a safe bet that object storage as a foundation of data systems architecture is here to stay blog.colinbreck.com/predicting-t...

📕 A Portable Introduction to Data Analysis (open access) 2024. By Michael Bulmer 👉 (uq.pressbooks.pub/portable-int...) #Statistics #Datavisualization #MachineLearning #DataScience #Python #rstudio #PhD #bioinformatics #Rstudio #neuroscience #postdoc #research #stats #AI

New blog post! Big data isn’t dead; it’s just going incremental. But bad things happen when uncontrolled changes collide with incremental jobs. Reacting to changes is a losing strategy. jack-vanlightly.com/...