campbell.fi - Profile | ThreadSky | a Reddit-style client for Bluesky

NovaSky AI's S*: Test-Time Scaling for Code Generation. S* enables (1) non-reasoning models to surpass reasoning models: GPT-4o-mini + S* > o1-preview. (2) open models to compete with SOTA: R1-Distilled-32B + S* ~= o1 (high).

submitted 23 hours ago • 1 comment

Improve the performance of gradient-boosted decision trees like XGBoost by allowing them to read text column headers and to benefit from massive pretraining: replace the first tree with an LLM or TabPFN!

submitted 16 days ago • 2 comments
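
The post above does not spell out the mechanism, so the following is only a minimal sketch of one reading of "replace the first tree": boosting starts from another model's prediction instead of a constant, and the trees then fit whatever residual is left. The `prior` callable is a hypothetical stand-in for an LLM or TabPFN prediction; the single-feature stumps and squared-error loss are simplifying assumptions, not the post's method.

```python
from typing import Callable, List, Sequence, Tuple

# A stump is (threshold, value_if_x_le_threshold, value_if_x_gt_threshold).
Stump = Tuple[float, float, float]


def fit_stump(x: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Pick the single split on x that minimises squared error on the residuals."""
    best_err, best_stump = None, None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lv = sum(left) / len(left)  # left is never empty: t is one of the x values
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best_err is None or err < best_err:
            best_err, best_stump = err, (t, lv, rv)
    return best_stump


def boost(x: Sequence[float], y: Sequence[float],
          prior: Callable[[float], float],
          rounds: int = 20, lr: float = 0.5) -> List[Stump]:
    """Gradient boosting (squared loss) that starts from prior(x), not a constant."""
    preds = [prior(xi) for xi in x]  # hypothetical pretrained model's output
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        preds = [p + lr * (lv if xi <= t else rv) for xi, p in zip(x, preds)]
    return stumps


def predict(xi: float, prior: Callable[[float], float],
            stumps: List[Stump], lr: float = 0.5) -> float:
    """Prior prediction plus the shrunken sum of all stump corrections."""
    out = prior(xi)
    for t, lv, rv in stumps:
        out += lr * (lv if xi <= t else rv)
    return out


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2 * v + (1.0 if v > 2 else 0.0) for v in xs]  # prior misses the step
    prior = lambda v: 2.0 * v                            # stand-in "pretrained" model
    model = boost(xs, ys, prior)
    print(predict(4.0, prior, model))
```

In this toy setup the prior already captures the linear trend, so the stumps only need to learn the step it misses.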

Reasoning Datasets collections by @philschmid.bsky.social 1️⃣ ServiceNow-AI/R1-Distill-SFT 2️⃣ open-thoughts/OpenThoughts-114k 3️⃣ bespokelabs/Bespoke-Stratos-17k 4️⃣ EricLu/SCP-116K 5️⃣ cognitivecomputations/dolphin-r1 huggingface.co/collections/...

submitted 21 days ago • 1 comment

What industrial recsys papers have you enjoyed or found useful in the past year or two? Sharing my list:
# 1. Integrating LLMs into recsys
1.1. LLM-augmented recommenders
• Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations - arxiv.org/abs/2306.08121

submitted 23 days ago • 1 comment

Have you ever wondered how DeepSeek compares to OpenAI and Anthropic? I put a business test case to it to find out! #databs www.linkedin.com/posts/john-c...

submitted 26 days ago • 0 comments

Given the r1 furor, folks should really read this paper on policy-gradient based search: arxiv.org/abs/1904.03646

submitted 33 days ago • 2 comments

(1/3) Continuing from my previous thread on infrastructure as code for managing #Databricks. I have recently had the pleasure of working with an open source tool called Laktory, an abstraction that sits on top of Terraform/Pulumi to manage your Databricks workflow using YAML. #databs

submitted 40 days ago • 1 comment

One of the biggest developer productivity gains is learning how to efficiently navigate through a codebase. If you are using the file sidebar + search to navigate around, I've got 15 techniques that will reframe how you work and make you absolutely fly in VS Code and Cursor www.youtube.com/watch?v=c0HO...

submitted 40 days ago • 16 comments

Does anyone have a favorite open source tool for making ER Diagrams and one for making TADs? I played around a bit with the VS Code ERD extension, but haven’t jumped in too much. Wondering if any #databs folks have favorites #datamodeling #systemarchitecture #opensource

submitted 44 days ago • 0 comments

New year, new blog post: I had a random question: what happens when LLMs are prompted to write better code, again and again? Do they actually write better code? The answer is yes*! minimaxir.com/2025/01/writ...

submitted 52 days ago • 8 comments

So based on some earlier comments, I threw together a starter kit type program that will let you monitor the firehose for keywords and then add any accounts it picks up to a list or lists. This will work for both moderation lists and follow lists.

submitted 55 days ago • 3 comments
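
The starter kit itself is not shown in the post above, so this is just a rough sketch of the keyword-matching step it describes. The `did` and `text` field names and the sample records are assumptions for illustration; connecting to the actual firehose and writing to a moderation or follow list are left out entirely.

```python
from typing import Dict, Iterable, Set


def collect_accounts(posts: Iterable[Dict[str, str]],
                     keywords: Iterable[str]) -> Set[str]:
    """Return the account IDs of posts whose text contains any keyword.

    Matching is a case-insensitive substring test. Each post is assumed to be
    a dict with hypothetical 'did' (account ID) and 'text' fields.
    """
    wanted = [k.lower() for k in keywords]
    matched: Set[str] = set()
    for post in posts:
        text = post.get("text", "").lower()
        if any(k in text for k in wanted):
            matched.add(post["did"])
    return matched


if __name__ == "__main__":
    # Made-up sample records standing in for a stream of firehose events.
    sample = [
        {"did": "did:example:alice", "text": "Trying out SQLMesh today"},
        {"did": "did:example:bob", "text": "Lunch was great"},
        {"did": "did:example:carol", "text": "dbt vs sqlmesh, thoughts?"},
    ]
    print(sorted(collect_accounts(sample, ["sqlmesh"])))
```

The returned set could then be fed to whatever code adds accounts to a list; a set keeps an account from being added twice when it matches on several posts.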

uv is really really really close to replacing about half a dozen tools (and making python the default scripting language) treyhunner.com/2024/12/lazy...

submitted 65 days ago • 1 comment

It turns out AI is very good at using AI. Yesterday, in my Tobiko SQLMesh advent series, I reached the point where I could generate a JSON representation of the relationships between models in an SQLMesh project. open.substack.com/pub/davidsj/...

submitted 66 days ago • 1 comment

I'm a man of simplicity. I don't know any other data stack that gets you from 0 to 1 as quickly... Except Excel. New vid 🎥: youtu.be/bbclf8ibIwM #dataengineering #databs

submitted 69 days ago • 1 comment

I’m releasing a series of experiments to enhance retrieval-augmented generation using attention scores. colab.research.google.com/drive/1HEUqy... The basic idea is to leverage the internal reading process, as the model goes back and forth to the sources to find information and potential quotes.

submitted 70 days ago • 2 comments

I'm thinking something like this. The "engine" is basically only transpiling from config to the actual data stack with multiple adapters, e.g. dbt, SDF, SQLMesh for `transform()`. I can't help but think about DWH Automation (DWA).
Config = template
Engine = DWA
DDS = gen. SQLs
Any thoughts? 🤔

submitted 73 days ago • 3 comments

Great rant about dbt and `ref`. I'm currently trialing SDF, which auto-detects your tables and has a strong built-in compiler to check your SQL before running a single query. They even use DataFusion to run tests based on data types and definitions during build time. Has anyone else tried SDF?

submitted 76 days ago • 7 comments

✍️ "Hard truths about AI-assisted coding" tips & tricks in my latest article: bit.ly/ai-assisted While AI-Assisted coding can get you 70% of the way there (great for prototypes or MVPs), the final 30% requires significant human intervention for quality and maintainability.

submitted 80 days ago • 9 comments

So MCP servers are really cool for giving your LLMs superpowers... but also pretty complex to build and debug. I created FastMCP to make it easy. Let me know what you think! github.com/jlowin/fastmcp

submitted 84 days ago • 6 comments

MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity. Introduces an RL framework that dynamically selects optimal retrieval strategies based on query complexity. 📝 arxiv.org/abs/2412.01572 👨🏽‍💻 github.com/FUTUREEEEEE/...

submitted 82 days ago • 0 comments
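
To make the bandit idea in the post above concrete, here is a generic epsilon-greedy sketch that picks a retrieval strategy per query-complexity bucket. It is not the MBA-RAG paper's algorithm: the strategy names, the complexity buckets, and the reward values are all hypothetical, and how a query gets assigned to a bucket is left out.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class StrategyBandit:
    """Epsilon-greedy bandit keeping one value estimate per (bucket, strategy)."""

    def __init__(self, strategies: Iterable[str],
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.strategies: List[str] = list(strategies)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[Tuple[str, str], int] = defaultdict(int)
        self.values: Dict[Tuple[str, str], float] = defaultdict(float)

    def select(self, bucket: str) -> str:
        """Explore with probability epsilon, otherwise pick the best-known strategy."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.values[(bucket, s)])

    def update(self, bucket: str, strategy: str, reward: float) -> None:
        """Fold an observed reward into the running mean for (bucket, strategy)."""
        key = (bucket, strategy)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


if __name__ == "__main__":
    # Hypothetical strategies; reward might combine answer quality and cost.
    bandit = StrategyBandit(["no_retrieval", "single_step", "multi_step"])
    bandit.update("simple", "no_retrieval", 1.0)
    bandit.update("complex", "multi_step", 0.9)
    print(bandit.select("simple"), bandit.select("complex"))
```

Keeping separate estimates per bucket lets cheap strategies win on easy questions while costlier multi-step retrieval is reserved for the buckets where it actually pays off.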

Was so into building I forgot to share this! I'm excited to work with @thedsp.bsky.social to bring FastMCP into the official SDK and make it as easy as possible to build MCP servers. More to come! www.jlowin.dev/blog/introdu...

submitted 81 days ago • 3 comments

zero to MCP server in a couple lines and two CLI commands this one texts me using surgemsg.com (which satisfies the "omg twilio just let me text myself" need)

submitted 84 days ago • 1 comment

New Bluesky community answering everyone’s technical questions with flying colors. I’m gonna do a social media test myself: Recommend me something, anything. I’ll recommend you something back.

submitted 105 days ago • 49 comments

What's the best podcast app for Android? I use Pocket Casts and am still raging mad about Google Podcasts #databs #podcast

submitted 89 days ago • 1 comment

#databs I have a question for you: has anyone implemented a GUI-based business rules system lately? I've been looking through dead repos like pyke and failing to see anything compelling. It seems everyone stopped working on these during corona.

submitted 103 days ago • 1 comment

Great Article here @joshtpm.bsky.social talkingpointsmemo.com/edblog/a-fol... I think one thing missing from the conversation is the OODA loop in the campaign context. Dems were briefly winning the loops up until the interview drumbeat started in Aug. Dems still haven't found an effective counter

submitted 104 days ago • 1 comment