maxkannen.bsky.social
Machine learning research assistant and student. Futurist and interested in all STEM fields. I have been training neural nets since 2016.
122 posts 22 followers 24 following

If the US leaves NATO, will they also remove all their military bases around the globe? And if not, are they technically invading half the world?

If we ignore cost for a second, GPT-4.5 is a great model. Reasoning models trained on this base could get scary good. Cost will come down as hardware and software get better over time, so we should just treat it as a look at the future.

Watching American politics makes me sick. Satire is so dead.

Really excited for ARC-AGI-2. The first one was the most interesting benchmark to follow.

It is so funny. Kimi and DeepSeek seem to release papers at the same time, and each time with similar content. Are they working together, or are they just on the same development roadmap?

In the future we will have custom kernels for new models hours after they come out. This will make inference so much better. developer.nvidia.com/blog/automat...
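To make "custom kernels" concrete: this is roughly the kind of thing that gets hand-written today (or generated tomorrow) for every new model. A toy elementwise-add kernel in Triton, purely my own illustration, nothing from the linked post:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The real wins are obviously fused attention and MoE kernels, not adds, but the shape of the problem is the same: per-model, per-hardware tuning that today takes experts days.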

You can get small models to be really good at math benchmarks, but they are no longer language models at that point. They become math models. We had a point around a year ago where the models were very general, and now we are going back to specialized models. pretty-radio-b75.notion.site/DeepScaleR-S...

I might soon submit research to a conference for the first time. Not sure yet if I'll stay in research, but this feels good.

Mistral is single-handedly keeping Europe competitive. Their updates to Le Chat add all the features that have become standard in other services. mistral.ai/en/news/all-...

Karpathy is back with a new video. I am a huge fan of his educational content. I wish more experts in the field would make videos like this. youtu.be/7xTGNNLPyMI?...

I honestly do not understand how everyone is talking about R1 and o3-mini while no one is talking about Gemini. Their reasoning model is somewhere between o1-mini and o3-mini, and it has been available for free for weeks now.

New OpenAI strategy: Let o3-mini con investors out of their money.

I really want a model that can generate high-quality Manim code to explain stuff. I tried building this a year ago, but the models weren't good enough back then. Are they now? A reasoning model with video understanding could maybe do it in multiple iterations.
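For anyone who hasn't touched Manim: this is the kind of code I want generated, except for real concepts instead of shapes. A hand-written toy in Manim Community syntax, not model output:

```python
# Render with: manim -pql scene.py CircleToSquare
from manim import Scene, Circle, Square, Create, Transform

class CircleToSquare(Scene):
    def construct(self):
        circle = Circle()                      # start shape
        square = Square()                      # target shape
        self.play(Create(circle))              # draw the circle
        self.play(Transform(circle, square))   # morph it into a square
        self.wait()
```

The hard part is not this syntax; it is laying out a multi-minute explanation without objects overlapping, which is why I think the model needs to see its own rendered frames.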

You don’t need $500 billion to build AGI. But you need it to deploy AGI at scale. openai.com/index/announ...

DeepSeek R1 is open source now. It uses the same base model as DeepSeek v3 and is probably the biggest reasoning model that exists at the moment. I am really excited for the benchmarks. huggingface.co/deepseek-ai/...

I talked about materials science in my 2025 predictions. I think that outside of biology, materials science is the best field for AI. I expect a lot more progress here in the next 12 months. www.microsoft.com/en-us/resear...

More experts for faster inference is a general trend we are seeing, because generation speed is becoming more important. And it is becoming more important because these models are no longer just chatbots, where reading speed is enough. Reasoning and agents need speed t.co/IswiryeO0z

So now that ChatGPT has tasks... What are you supposed to do with them? Set a timer for cooking? Wake me up? Give me the daily news? I have a phone that does all of this natively. What is the use case if I have to open the app every time?

Mistral is finally back with a code model. Since Copilot is free, it will be hard to convince me, but I will give it a try. mistral.ai/news/codestr...

So what is the difference between that and what Putin is saying?

My recap of 2024 and predictions for 2025. I do this every year and had a really hard time this year. I hope you enjoy it anyway, and I am open to feedback. mkannen.tech/looking-back...

And here we see the advantage: cerebras.ai/blog/cepo

The DeepSeek v3 paper is out, and the training is very interesting. 1. They use multi-token prediction during training, which Meta released a paper about a few months ago. 2. They used their R1 reasoning models to distill reasoning into v3. github.com/deepseek-ai/...
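For context, multi-token prediction roughly means: extra heads predict tokens further than one step ahead, trained alongside the normal next-token loss. A toy sketch of the parallel-heads variant from Meta's paper (DeepSeek's version chains sequential modules instead; all names and shapes here are made up):

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    """Toy multi-token prediction: one output head per lookahead depth."""
    def __init__(self, d_model: int, vocab: int, n_future: int = 2):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(n_future))

    def forward(self, hidden, targets):
        # hidden:  (batch, seq, d_model) trunk outputs
        # targets: (batch, seq) token ids
        loss = 0.0
        for k, head in enumerate(self.heads, start=1):
            logits = head(hidden[:, :-k])   # head k predicts the token k steps ahead
            labels = targets[:, k:]         # targets shifted by k
            loss = loss + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), labels.reshape(-1)
            )
        return loss / len(self.heads)
```

The extra heads are dropped (or reused for speculative decoding) at inference time; the point is a denser training signal per sequence.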

DeepSeek released a giant MoE model. 685B parameters is massive, but it has 256 experts and uses the top 8 per token by default. As far as I know, this is the first time such a huge number of experts was used in an open-source production model. huggingface.co/deepseek-ai/...
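For anyone who hasn't looked inside an MoE: per token, a router scores all experts, keeps the top k, and only those run, so compute scales with k, not with the expert count. A naive sketch with made-up sizes, not DeepSeek's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy token-level router: pick top-k of n_experts per token."""
    def __init__(self, d_model=16, n_experts=256, k=8):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen k
        out = torch.zeros_like(x)
        for t in range(x.size(0)):             # naive loop; real kernels batch by expert
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])
        return out
```

With 256 experts and top 8, only ~3% of the FFN weights are touched per token, which is how 685B total parameters stays affordable at inference.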

When I was in middle school, I was obsessed with Conway's Game of Life and the idea of self-emerging life in a simulation. This feels like a Christmas present for my younger self. pub.sakana.ai/asal/
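Obligatory toy for anyone who never played with it. Plain NumPy, nothing to do with the Sakana paper itself:

```python
import numpy as np

def step(grid: np.ndarray) -> np.ndarray:
    """One Game of Life update on a 2D 0/1 grid with wraparound edges."""
    # Count the 8 neighbors of every cell by summing shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell lives if it has 3 neighbors, or 2 neighbors and is already alive.
    return ((neighbors == 3) | ((neighbors == 2) & (grid == 1))).astype(grid.dtype)

# A glider: run `grid = step(grid)` in a loop and watch it crawl.
grid = np.zeros((16, 16), dtype=np.uint8)
grid[1, 2] = grid[2, 3] = grid[3, 1] = grid[3, 2] = grid[3, 3] = 1
```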

Nvidia needs to dominate inference, otherwise they will lose their valuation soon. Nothing is more important in the next two years than faster and cheaper inference. We are going to need trillions of tokens per hour.
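Quick back-of-envelope on what that means, with completely made-up but plausible numbers:

```python
# Both inputs are assumptions, not measured figures.
tokens_per_hour = 2e12          # assumed global demand: 2 trillion tokens/hour
tokens_per_gpu_per_sec = 5_000  # assumed per-GPU decode throughput, batched

tokens_per_sec = tokens_per_hour / 3600  # ~5.6e8 tokens/s
gpus_needed = tokens_per_sec / tokens_per_gpu_per_sec
print(f"{tokens_per_sec:.2e} tokens/s -> ~{gpus_needed:,.0f} GPUs")
# ~111,111 GPUs running flat out, and that is before reasoning
# models multiply the tokens per answer.
```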

I do not see myself providing anything of value to the economy by the time I leave university that an AI will not be able to do better. I am just glad that we have such a stable global political system that will manage that. Oh wait ...

It is very expensive and not available yet, but this is at least some form of proto-AGI. arcprize.org/blog/oai-o3-...

How cool is that: genesis-embodied-ai.github.io

For the first time I found something useful. Doing research can actually be fun.

This was my first programming experience.

This is absolutely insane for this size

Whatever OpenAI announces today, DeepMind won. Gemini 2.0 has everything that OpenAI has been promising since the 4o release.

Gemini 2.0 looks very promising. The advantage of TPUs becomes very apparent. deepmind.google/technologies...