campbell.fi
Data enthusiast, Father, consultant
27 posts 89 followers 1,064 following

NovaSky AI's S*: Test-Time Scaling for Code Generation. S* enables (1) non-reasoning models to surpass reasoning models: GPT-4o-mini + S* > o1-preview; (2) open models to compete with SOTA: R1-Distilled-32B + S* ~= o1 (high).

Improve the performance of gradient-boosted decision trees like XGBoost by letting them read text column headers and benefit from massive pretraining: replace the first tree with an LLM or TabPFN!
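A minimal sketch of the "replace the first tree" idea, assuming squared-error boosting on a single feature: the boosting loop starts from a prior model's predictions and later stumps fit only what the prior leaves unexplained. The `prior` function here is a hypothetical stand-in for an LLM or TabPFN prediction, and the data is made up for illustration. It is not the method from the linked work. (XGBoost itself accepts precomputed starting scores through its `base_margin` input.)

```python
def fit_stump(xs, residuals):
    """Pick the 1-D threshold split that minimises squared error on the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost_from_prior(xs, ys, prior, n_trees=20, lr=0.5):
    """Gradient boosting for squared loss, initialised from prior(x) instead of a first tree."""
    preds = [prior(x) for x in xs]
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: prior(x) + lr * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: a linear trend plus a step that the prior does not know about.
xs = list(range(10))
ys = [2 * x + (3 if x >= 5 else 0) for x in xs]

prior = lambda x: 2 * x  # hypothetical pretrained model: captures the trend only
model = boost_from_prior(xs, ys, prior)

prior_mse = mse(prior, xs, ys)
boosted_mse = mse(model, xs, ys)
```

The stumps only have to learn the step, since the prior already explains the trend.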

Reasoning Datasets collections by @philschmid.bsky.social 1️⃣ ServiceNow-AI/R1-Distill-SFT 2️⃣ open-thoughts/OpenThoughts-114k 3️⃣ bespokelabs/Bespoke-Stratos-17k 4️⃣ EricLu/SCP-116K 5️⃣ cognitivecomputations/dolphin-r1 huggingface.co/collections/...

What industrial recsys papers have you enjoyed or found useful in the past year or two? Sharing my list: # 1. Integrating LLMs into recsys 1.1. LLM-augmented recommenders • Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations - arxiv.org/abs/2306.08121

Have you ever wondered how DeepSeek compares to OpenAI and Anthropic? I put a business test case to it to find out! #databs www.linkedin.com/posts/john-c...

Given the r1 furor, folks should really read this paper on policy-gradient based search: arxiv.org/abs/1904.03646

(1/3) Continuing from my previous thread on infrastructure as code for managing #Databricks: I have recently had the pleasure of working with an open source tool called Laktory, an abstraction that sits on top of Terraform/Pulumi to manage your Databricks workflows using YAML. #databs

One of the biggest developer productivity gains is learning how to efficiently navigate through a codebase. If you are using the file sidebar + search to navigate around, I've got 15 techniques that will reframe how you work and make you absolutely fly in VS Code and Cursor www.youtube.com/watch?v=c0HO...

Does anyone have a favorite open source tool for making ER Diagrams and one for making TADs? I played around a bit with the VS Code ERD extension, but haven’t jumped in too much. Wondering if any #databs folks have favorites #datamodeling #systemarchitecture #opensource

New year, new blog post: I had a random question: what happens when LLMs are prompted to write better code, again and again? Do they actually write better code? The answer is yes*! minimaxir.com/2025/01/writ...

So based on some earlier comments, I threw together a starter kit type program that will let you monitor the firehose for keywords and then add any accounts it picks up to a list or lists. This will work for both moderation lists and follow lists.

uv is really really really close to replacing about half a dozen tools (and making python the default scripting language) treyhunner.com/2024/12/lazy...

It turns out AI is very good at using AI. Yesterday, in my Tobiko SQLMesh advent series, I reached the point where I could generate a JSON representation of the relationships between models in an SQLMesh project. open.substack.com/pub/davidsj/...

I'm a man of simplicity. I don't know any other data stack that gets you from 0 to 1 as quickly... Except Excel. New vid 🎥: youtu.be/bbclf8ibIwM #dataengineering #databs

I’m releasing a series of experiments to enhance retrieval-augmented generation using attention scores. colab.research.google.com/drive/1HEUqy... The basic idea is to leverage the internal reading process, as the model goes back and forth to the sources to find information and potential quotes.

I'm thinking something like this. The "engine" is basically only transpiling from config to the actual data stack with multiple adapters, e.g. dbt, SDF, SQLMesh for `transform()`. I can't help but think about DWH Automation (DWA). Config = template, Engine = DWA, DDS = gen. SQLs. Any thoughts? 🤔
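A rough sketch of what such an engine could look like, assuming a tool-agnostic config and one adapter function per target. Everything here (`CONFIG`, the adapter names, the `upstream` field) is hypothetical, and the generated SQL is deliberately simplified: every upstream is treated as another model, and SDF is left out.

```python
from typing import Callable, Dict

# Hypothetical tool-agnostic config: one entry per model.
CONFIG = {
    "models": [
        {"name": "stg_orders", "upstream": "raw_orders"},
        {"name": "fct_orders", "upstream": "stg_orders"},
    ]
}


def dbt_adapter(model: dict) -> str:
    # dbt-style output: the upstream model is referenced through ref().
    return "select * from {{ ref('" + model["upstream"] + "') }}"


def sqlmesh_adapter(model: dict) -> str:
    # SQLMesh-style output: a MODEL header followed by the query.
    return "MODEL (name " + model["name"] + ");\nSELECT * FROM " + model["upstream"]


# The engine only knows adapters by name; adding a target means adding one entry.
ADAPTERS: Dict[str, Callable[[dict], str]] = {
    "dbt": dbt_adapter,
    "sqlmesh": sqlmesh_adapter,
}


def transform(config: dict, target: str) -> Dict[str, str]:
    """Transpile every model in the config into SQL text for the chosen target."""
    adapter = ADAPTERS[target]
    return {m["name"]: adapter(m) for m in config["models"]}


dbt_sql = transform(CONFIG, "dbt")
sqlmesh_sql = transform(CONFIG, "sqlmesh")
print(dbt_sql["fct_orders"])
print(sqlmesh_sql["fct_orders"])
```

The config plays the role of the template, and `transform()` is the only place that knows which tool is behind it.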

Great rant about dbt and `ref`. I'm currently trialing SDF, which auto-detects your tables and has a strong built-in compiler to check your SQL before running a single query. They even use DataFusion to run tests based on data types and definitions at build time. Has anyone else tried SDF?

✍️ "Hard truths about AI-assisted coding" tips & tricks in my latest article: bit.ly/ai-assisted While AI-assisted coding can get you 70% of the way there (great for prototypes or MVPs), the final 30% requires significant human intervention for quality and maintainability.

So MCP servers are really cool for giving your LLMs superpowers... but also pretty complex to build and debug. I created FastMCP to make it easy. Let me know what you think! github.com/jlowin/fastmcp

MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity Introduces an RL framework that dynamically selects optimal retrieval strategies based on query complexity. 📝 arxiv.org/abs/2412.01572 👨🏽‍💻 github.com/FUTUREEEEEE/...
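To make the bandit idea concrete, here is a generic epsilon-greedy sketch, not the MBA-RAG algorithm itself: one bandit per query-complexity bucket, where each arm is a retrieval strategy. The strategy names, the complexity buckets, and the reward table are all hypothetical stand-ins (a real system would derive rewards from answer quality and retrieval cost).

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit with incremental mean reward estimates."""

    def __init__(self, arms, epsilon=0.2, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # seeded so the run is reproducible
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Running average of the rewards observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    def best(self):
        return max(self.arms, key=lambda a: self.values[a])


STRATEGIES = ["no_retrieval", "single_step", "multi_step"]

# Hypothetical rewards (answer quality minus retrieval cost) per complexity bucket.
REWARDS = {
    "simple": {"no_retrieval": 0.9, "single_step": 0.6, "multi_step": 0.3},
    "complex": {"no_retrieval": 0.1, "single_step": 0.5, "multi_step": 0.8},
}

# One bandit per complexity bucket, so each bucket learns its own preferred strategy.
bandits = {bucket: EpsilonGreedyBandit(STRATEGIES) for bucket in REWARDS}

for bucket, bandit in bandits.items():
    for _ in range(500):
        arm = bandit.select()
        bandit.update(arm, REWARDS[bucket][arm])

chosen = {bucket: bandit.best() for bucket, bandit in bandits.items()}
print(chosen)
```

With these made-up rewards, cheap strategies win on simple queries and multi-step retrieval wins on complex ones, which is the trade-off the post describes.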

Was so into building I forgot to share this! I'm excited to work with @thedsp.bsky.social to bring FastMCP into the official SDK and make it as easy as possible to build MCP servers. More to come! www.jlowin.dev/blog/introdu...

Zero to MCP server in a couple of lines and two CLI commands. This one texts me using surgemsg.com (which satisfies the "omg twilio just let me text myself" need)

New Bluesky community answering everyone’s technical questions with flying colors. I’m gonna do a social media test myself: Recommend me something, anything. I’ll recommend you something back.

What's the best podcast app for Android? I use Pocket Casts and I'm still raging mad about Google Podcasts #databs #podcast

#databs I have a question for you: has anyone implemented a GUI-based business rules system lately? I've been looking through dead repos like pyke and failing to see anything compelling. It seems everyone stopped working on these during corona.

Great article here @joshtpm.bsky.social talkingpointsmemo.com/edblog/a-fol... I think one thing missing from the conversation is the OODA loop in the campaign context. Dems were briefly winning the loops up until the interview drumbeat started in Aug. Dems still haven't found an effective counter.