narphorium.bsky.social
Building AI powered tools to augment human creativity and problem solving in San Francisco. Previously @GitHub Copilot, @Google, 🇨🇦
89 posts 1,212 followers 510 following

Imagine trying to explain to a non-technical person that this has nothing to do with GitHub Copilot 🫤

MUTAGREP improves code generation efficiency by navigating repositories, breaking tasks into grounded plans, achieving notable performance with under 5% of context. This enhances accuracy and lets smaller models rival GPT-4o, marking a leap in AI programming. https://arxiv.org/abs/2502.15872

100 Best Coffee Shops in the World and the only one I've been to is Stumptown? I need to be more discerning about where I get my bean juice. theworlds100bestcoffeeshops.com/top-100-coff...

Neat visualization that came up in the ARBOR project: this shows DeepSeek "thinking" about a question, and color is the probability that, if it exited thinking, it would give the right answer. (Here yellow means correct.)

Claude Sonnet 3.7 with Claude Code has been a great help pairing with me on MCP SDK work. So stoked to see what people will build with it. www.anthropic.com/news/claude-...

Needed to display some code in my canvas and ended up setting up a full Rich Text editor with Tiptap inside @tldraw.com. Shapes as React components is a very powerful paradigm.

I really like this idea from the SWE-Lancer paper where they give the agent a tool which simulates a user clicking through the app to verify that it works correctly. arxiv.org/abs/2502.12115

Colorpad lets you analyze text by highlighting passages, defining what each color means, and then exporting to JSON so that it can be easily fed into an LLM as a prompt. open.substack.com/pub/shawnfro...

Love this idea of embedding R1 CoT to visualize its thought process github.com/dhealy05/fra...

We are still looking for AI's #durablemutation. My latest blog post at: ideaspaces.net/unifinishede... @hrheingold.bsky.social @narphorium.bsky.social @edufuturist.bsky.social @hegland.bsky.social

Graphista: Dynamic Graph-Based LLM-Powered Memory System
It integrates LLM-powered ingestion and querying tools to enable dynamic knowledge management.
- ingest() will use read tools to dedupe nodes/edges, and add/update edges and nodes
- ask() will use read tools to find answers in the graph
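The ingest/ask split described above can be illustrated with a toy in-memory graph. This is only a sketch under loud assumptions: no LLM is involved, and the class name `GraphMemory`, the method signatures, and the lowercase-name deduplication rule are all hypothetical, not Graphista's actual API.

```python
class GraphMemory:
    """Toy graph store: ingest() dedupes and upserts, ask() reads (hypothetical API)."""

    def __init__(self) -> None:
        # Nodes are keyed by a normalised name so near-duplicates collapse.
        self.nodes: dict[str, dict] = {}
        # Edges are keyed by (source, relation, target) so repeats update in place.
        self.edges: dict[tuple[str, str, str], dict] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Assumed dedupe rule: trim whitespace and ignore case.
        return name.strip().lower()

    def ingest(self, subject: str, relation: str, obj: str, **attrs) -> None:
        """Add nodes and an edge if missing; otherwise merge new attributes."""
        s, o = self._key(subject), self._key(obj)
        self.nodes.setdefault(s, {"name": subject})
        self.nodes.setdefault(o, {"name": obj})
        self.edges.setdefault((s, relation, o), {}).update(attrs)

    def ask(self, subject: str, relation: str) -> list[str]:
        """Return the names of nodes reachable from subject via relation."""
        s = self._key(subject)
        return [
            self.nodes[target]["name"]
            for (source, rel, target) in self.edges
            if source == s and rel == relation
        ]
```

In a real system the dedupe step and the lookup would presumably be delegated to LLM-driven read tools; here both are plain dictionary operations so the control flow is easy to follow.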

Forget Microsoftian, AI UI was getting downright Gordian. It's great to hear (if not yet *see*) that it's seemingly on the verge of some major simplification from an end-user perspective.

"Value creation will come from opening up new avenues for thinking about what AI applications could look like, what kinds of functionality could be useful, how users could interact with AI applications, and what kinds of business models could make sense." open.substack.com/pub/uncertai...

What if instead of searching the entire web you could just search your preferred pocket of it?

sharing an old exploration that's been top of mind recently
the future is...
✨ a shared artifact between humans and AI
✨ content at different levels of abstraction
✨ multimodal inputs and NL controls

We give reasoning models credit for showing their work, just like on school tests. But what if they're just working backward from memorized answers—like we did?

I used the new citations feature in the Anthropic API to identify a set of supporting facts for each thought in an R1 CoT. I'm surprised at how well it works.

Finally got this prototype to work after 1000 years of debugging. Sweet release 🥲 The idea is I dump a bunch of unstructured notes on a topic in, feed that into gpt-4o-mini, ask it to label sections with a set of types – claim, evidence, assumption, etc. – and display them as coloured highlights
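The display half of that idea can be sketched roughly as follows. It assumes the model has already returned labelled character ranges; the `Span` type, the colour table, and the non-overlapping-spans requirement are illustrative assumptions, not details of the actual prototype, and the model call itself is left out.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical label-to-colour table; unknown labels fall back to grey.
COLOURS = {"claim": "#ffe08a", "evidence": "#b5e8b0", "assumption": "#f7b2b2"}


@dataclass
class Span:
    start: int  # inclusive character offset into the notes
    end: int    # exclusive character offset
    label: str  # e.g. "claim", "evidence", "assumption"


def render_highlights(text: str, spans: list[Span]) -> str:
    """Wrap each labelled span in a coloured <mark>; spans must not overlap."""
    parts: list[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        # Emit the unlabelled text between the previous span and this one.
        parts.append(escape(text[cursor:span.start]))
        colour = COLOURS.get(span.label, "#dddddd")
        body = escape(text[span.start:span.end])
        parts.append(
            f'<mark style="background:{colour}" title="{escape(span.label)}">{body}</mark>'
        )
        cursor = span.end
    # Emit whatever follows the last span.
    parts.append(escape(text[cursor:]))
    return "".join(parts)
```

Asking the model for offsets rather than rewritten text keeps the original notes untouched, though in practice a model may return quoted snippets instead, which would then need to be located in the text first.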

What if reasoning models structured their thoughts as nested bullet points; branching and backtracking through different ways of solving a problem?

We need to rethink interfaces for reasoning models like R1. Instead of shoehorning reasoning to chat, what if we designed the entire experience around reasoning from the start?

One of my favorite new tips pairing with an LLM is having it document "Lessons Learned" in a file in the codebase itself. Actually helps me solidify what I learned in the process as well.

looking at community .cursorrules is an interesting view into what characteristics people value in their code. there's a lot of "avoiding classes" in ts/js. cursor.directory

My flight was so cold tonight that I opened my laptop and ran an R1 eval to stay warm

Enjoying this charming collection of stories about Ted Nelson and how his ideas have changed our world

This taxonomy of failure modes for collaborative agents is super helpful when designing new types of AI-powered tools

LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates? 🤖➕👤 Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I'm now getting used to agents proactively seeking confirmations or my deep thinking. (🧵 with video)

glazkov.com/2025/01/15/r...

"I love what I do and I get to work on stuff I want to work on. I wish everybody had that opportunity."—David Lynch