datageek-pl.bsky.social - Profile | ThreadSky | a Reddit-style client for Bluesky

Concordia is a library for generative agent-based modeling that works like a table-top role-playing game. It's open source and model agnostic. Try it today! github.com/google-deepm...

submitted 103 days ago • 2 comments

Still cooking my aibricks project. Simple example of what I mean by "configuration driven":

submitted 76 days ago • 0 comments

From the same people (lmarena.ai) who brought you Chatbot Arena, they are introducing WebDev Arena. Leaderboard: web.lmarena.ai/leaderboard

submitted 76 days ago • 0 comments

LEXICO defining the Pareto frontier of KV cache compression

submitted 78 days ago • 0 comments

Textual 1.0 has been released. 🥳 Three years in the making. A TUI framework that is bigger than the terminal. To celebrate, I want to give away some trade secrets. Because I am appalling at keeping secrets. Tell me what you think of the diagrams... textual.textualize.io/blog/2024/12...

submitted 78 days ago • 7 comments

1000x inference cost reduction by converting qwen/llama to rwkv architecture without retraining from scratch. Big if true/without-any-major-issues. Definitely worth checking. huggingface.co/recursal/QRW...

submitted 78 days ago • 0 comments

Q-RWKV-6 32B Instruct Preview substack.recursal.ai/p/q-rwkv-6-3...

submitted 79 days ago • 0 comments

developers.googleblog.com/en/the-next-...

submitted 79 days ago • 0 comments

In case you didn't know, Bluesky has built-in RSS support 👀 openrss.org/blog/bluesky...

submitted 81 days ago • 0 comments

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities arxiv.org/abs/2411.04986

submitted 80 days ago • 0 comments

entrapix: your LLM should raise a ConfusedAgentError when it doesn’t know. Why? Because as an application developer, I have a lot of things I can do to give the LLM more information. github.com/tkellogg/oll...

submitted 81 days ago • 1 comment
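
A minimal sketch of the pattern the entrapix post describes: the model signals confusion with an exception, and the application reacts by supplying more information. This is not entrapix's actual API. The exception class is defined locally for illustration, and `answer_with_fallback`, `fetch_context`, and `toy_generate` are hypothetical names.

```python
from typing import Callable, Optional


class ConfusedAgentError(Exception):
    """Hypothetical error: the model lacks the information to answer."""


def answer_with_fallback(
    generate: Callable[[str], str],
    prompt: str,
    fetch_context: Callable[[str], Optional[str]],
) -> str:
    """Call the model; if it reports confusion, retry once with extra context."""
    try:
        return generate(prompt)
    except ConfusedAgentError:
        extra = fetch_context(prompt)
        if extra is None:
            raise  # nothing more to offer, so let the caller decide
        return generate(f"Context: {extra}\n\n{prompt}")


def toy_generate(prompt: str) -> str:
    """Stand-in model (assumption): confused unless context is supplied."""
    if "Context:" not in prompt:
        raise ConfusedAgentError("not enough information")
    return "answer based on context"


if __name__ == "__main__":
    print(answer_with_fallback(toy_generate, "What is X?", lambda p: "X is 42"))
```

The point of the design is that confusion becomes a control-flow event the application can handle (look something up, ask the user, re-prompt) rather than a confident but wrong answer.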

Leaked Windsurf #codegen prompt: www.reddit.com/r/LocalLLaMA...

submitted 83 days ago • 0 comments

Please don’t mind any unfollow from my side - I’m experimenting with the atproto api.

submitted 85 days ago • 0 comments

This is what happiness looks like.

submitted 87 days ago • 0 comments

Now cooking the config layer and streaming. Middleware for streaming might require some trial and error but hey, that's the fun part! The results might be nice - imagine streaming line by line instead of by a random number of tokens - the latency is still low and you can easily act on the produced text!

submitted 87 days ago • 0 comments
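
The line-by-line streaming idea from the post above can be sketched as a small generator that buffers incoming fragments and emits only complete lines. This is an illustrative assumption about how such middleware might work, not the aibricks implementation; `stream_lines` and the fake token list are hypothetical.

```python
from typing import Iterable, Iterator


def stream_lines(chunks: Iterable[str]) -> Iterator[str]:
    """Re-chunk a stream of arbitrary text fragments into complete lines."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently sitting in the buffer.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            yield line
    # Flush the remainder once the stream ends (a last line without a newline).
    if buffer:
        yield buffer


if __name__ == "__main__":
    # Fragments arrive split at arbitrary points, as token streams usually do.
    fake_tokens = ["Hel", "lo wor", "ld\nsecond", " line\n", "tail"]
    for line in stream_lines(fake_tokens):
        print(line)
    # prints "Hello world", "second line", "tail" on separate lines
```

Each line is yielded as soon as its newline arrives, so downstream code can act on whole lines while the added latency stays bounded by the length of one line.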

This free 1h course is an excellent intro to MemGPT-like agent architecture. No frameworks - just doing fun things from scratch. I highly recommend this one even to people not interested in AI-powered games - agentic internals are very similar everywhere. www.deeplearning.ai/short-course...

submitted 89 days ago • 0 comments

This is crazy: run all generative AI workflows on a local GPU from one simple-to-use interface, with snap-ins for all the popular workflows. pinokio.computer #ai #aiTools #aiArtist

submitted 90 days ago • 0 comments

Twitter's algo has made us forget the value of links in short messages from people who share our interests. To say that I’m in tears would be an overstatement but I’m seriously touched.

submitted 90 days ago • 2 comments

New study shows LLMs outperform neuroscience experts at predicting experimental results in advance of experiments (86% vs 63% accuracy). They use a fine-tuned Mistral 7B but other models worked too. Suggests LLMs can integrate scientific knowledge at scale to support research.

submitted 91 days ago • 12 comments

My answer to AiSuite - focused even more on the GPU-poor. It's still in early development, but you can take a look here: github.com/mobarski/ai-... All feedback is more than welcome (comments, likes, github stars, reposts).

submitted 92 days ago • 1 comment

note to the gpu poor: you can still train

submitted 102 days ago • 0 comments

wow, fp8 training works well apparently pytorch.org/blog/trainin...

submitted 92 days ago • 0 comments

new Aider #codegen release

submitted 92 days ago • 0 comments

One of the best articles for understanding how Deepseek and Qwen are possible in Open-source LLMs. Deepseek: The Quiet Giant Leading China’s AI Race Annotated translation of its CEO's deepest interview @jordanschneider.bsky.social China AI context: www.chinatalk.media/p/deepseek-c...

submitted 92 days ago • 0 comments

The wave of reasoning models from the Chinese community has arrived 🌊 ✨ Marco-o1 by AIDC huggingface.co/AIDC-AI/Marc... ✨ QwQ by Alibaba huggingface.co/collections/... ✨ Skywork-o1 by Kunlun Tech huggingface.co/collections/...

submitted 92 days ago • 1 comment

smear campaign ideas against grammar-constrained LLM decoding: "structured generation is like aim assist for language models --- only cheaters use it"

submitted 92 days ago • 2 comments

On-Chip Implementation of Backpropagation for Spiking Neural Networks on Neuromorphic Hardware #DL #AI #ML #DeepLearning #ArtificialIntelligence #MachineLearning #ComputerVision #AutonomousVehicles #Robotics #LLM #VLM #LVLM https://buff.ly/3COjDY4

submitted 93 days ago • 0 comments

This new LLM API adapter looks really nice == very similar to what I was doing in my project XD Less code to maintain == good news. github.com/andrewyng/ai...

submitted 94 days ago • 0 comments

Here’s a nice reddit post showing how quantization affects benchmarking results: www.reddit.com/r/LocalLLaMA...

submitted 98 days ago • 0 comments