mjrun.bsky.social
Seeking AI and data-driven strategies to create personalized and impactful educational experiences, with a focus on breaking down data silos, operationalizing data, and empowering teams to make smarter, faster decisions.
50 posts 31 followers 32 following
Prolific Poster
Conversation Starter

Try to convince this #GPT that it is not conscious chatgpt.com/g/g-6755a224... #AI #CustomGPT #ChatGPT

I like to spend some time with new models, seeing how far I can push the #CLARID hypothesis forward; this is Gemini 2.5 and Claude 3.7 working together. pdfhost.io/v/5LdEXu9zfG...

The success of vaccines has become their greatest enemy. We forgot the scourges of polio and measles as vaccines nearly eradicated them. We oversold vaccines: there was no way they could eliminate Covid, but that’s how they were “sold” to people. This heaped more skepticism onto an already skeptical group.

#Sonnet 3.7 has arrived. #Anthropic has caught up in several of the reasoning-heavy benchmarks. I expect its coding ability to lead the pack.

Do robots dream of electric sheep, and why don't LLMs request calculators? www.mindprison.cc/p/why-llms-d...

Anatomy of a good #o1 prompt

What happens when you tell #ClaudeAI about recent events...

#Grok-3 lands at #1 in #LMArena

#Deepseek R1 just erased about half a trillion dollars of market cap from #nvidia. It remains to be seen how China did this; if they can extend this low-cost model to #o3 levels, it means a lasting change in the #AI landscape. I've experimented with #R1 a lot and will say it's suspiciously similar to #o1pro.

@dario_amodei says that in 2-3 years we will have "a country of geniuses in a datacenter". This is in reference to what he sees as the most likely path for #AI development.

#deepseek has dropped a bomb on the AI world. #R1 is an extremely impressive open source model that can be used at a much lower cost than #o1 with comparable performance. It can rival Claude 3.5 in coding. The distilled models can easily beat #4o even at 1.5B parameters (which could run on a phone).

Plotting #GPQA scores against release date indicates a curve that certainly looks exponential. #e/acc

#o3mini is on its way. Not to mention a tease of the GPT and o series being merged.

I feel like this happens when you assume Ex Machina was a documentary.

Mark Zuckerberg is claiming that #AIAgents will be advanced enough in 2025 to do the work of mid-level engineers at Meta. x.com/i/status/187...

Used #o1pro to create an entire synthetic database schema in #SQLite. I then worked with it to create an #agentic framework to run SQL selects and create Python code for analysis. #AiEDU I'd like to scale this to become an IPEDS and State reporting tool with documentation that provides real answers.
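A minimal sketch of that pattern, for anyone curious: a toy synthetic schema in an in-memory SQLite database plus a read-only SELECT "tool" an agent could call. The table names and columns below are hypothetical placeholders, not the actual schema or framework from the project above.

```python
# Sketch: synthetic SQLite schema + a read-only SQL "tool" for an agent loop.
# Schema and data here are illustrative placeholders.
import sqlite3


def build_synthetic_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database with a toy enrollment schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE students (
            student_id INTEGER PRIMARY KEY,
            cohort_year INTEGER,
            enrollment_status TEXT
        );
        CREATE TABLE awards (
            award_id INTEGER PRIMARY KEY,
            student_id INTEGER REFERENCES students(student_id),
            award_level TEXT,
            award_year INTEGER
        );
    """)
    conn.executemany(
        "INSERT INTO students VALUES (?, ?, ?)",
        [(1, 2021, "enrolled"), (2, 2021, "graduated"), (3, 2022, "withdrawn")],
    )
    conn.executemany(
        "INSERT INTO awards VALUES (?, ?, ?, ?)",
        [(10, 2, "Bachelor", 2024)],
    )
    conn.commit()
    return conn


def run_select(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Tool exposed to the agent: execute read-only SELECT statements only."""
    if not sql.strip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed.")
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    db = build_synthetic_db()
    # An agent loop would generate SQL like this, call run_select, and reason
    # over the returned rows (or write Python against them for analysis).
    print(run_select(db, "SELECT cohort_year, COUNT(*) FROM students GROUP BY cohort_year"))
```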

I got #o1pro, and because it's $200 I almost feel obligated to use it. The paradox here, for @samasama.bsky.social to solve, is that when you make the price fairly high, you make people feel like they *must* use it to get their money's worth. Had it been set to $50 I would not feel so motivated.

2025 will likely be the year of the #AIAgent. Pairing #o3 with a robust agentic architecture will make it a perfectly functional employee. Snip below from @samasama.bsky.social

#OpenAI staff throwing around the #ASI hype pretty freely these days...

This seems plausible. I'd say #o1pro can already do supervised ML research (assuming the human is in the loop to provide access to data and run the code).

@officiallogank.bsky.social thinks we are on the path to #ASI even without, apparently, any major new breakthroughs. I assume this means #TTC is going to have some legs.

Researchers at Stanford found #LLM performance on the #Putnam math benchmark worsened substantially when the problems used slightly different numbers. This suggests models are already trained on these public datasets. #o1-preview suffered almost a 30% decline in performance.
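For a sense of the perturbation idea behind that result, here is a minimal sketch: swap the numeric constants in a problem statement for nearby values, so a model that memorized the original can't rely on recall. This is an illustration of the general technique, not the Stanford authors' code.

```python
# Sketch: perturb the integers in a benchmark problem to test memorization.
import random
import re


def perturb_numbers(problem: str, seed: int = 0) -> str:
    """Replace each integer in the problem with a nearby, different integer."""
    rng = random.Random(seed)

    def replace(match: re.Match) -> str:
        n = int(match.group())
        delta = rng.choice([-2, -1, 1, 2])
        return str(max(1, n + delta))  # keep values positive

    return re.sub(r"\d+", replace, problem)


# Hypothetical example problem, not taken from the Putnam set.
original = "Find the number of ordered pairs (a, b) with a + b = 2021 and a <= 1000."
print(perturb_numbers(original))
```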

Here are the things @samasama.bsky.social heard most in a recent request for features. Apparently not that much overlap with what they're planning for 2025. Personally I'm quite interested in what a "grown up mode" would mean.

Why hallucinations in #AI models are sometimes great. archive.ph/0e3bV

At the end of 2024 what are some opinions you hold on #AI that diverge from consensus?

#google is ramping up for a big 2025 in AI. @demishassabis.bsky.social has virtually promised full AI agentic capability (just kidding).

Hi @microsoft.com did you forget to hit publish on #Phi-4? 😑

New scores on #AidanBench. Gemini Flash is doing some heavy lifting. Looking forward to the full thinking Gemini release.

I don't think you can give all the credit to #ChatGPT, but it certainly did help add $8 trillion in market cap to the #Mag7 (or Mag 6 in this case) over the two years since #OpenAI released it.

Ilya Sutskever has been quiet for a hot minute. I wonder what they are cooking up at #SSIInc.

@officiallogank.bsky.social There is an echo in here 🤫

#AI, and #o1 in particular, can now comfortably outperform human doctors on clinical reasoning tasks.

#Deepseek v3 is quite impressive on a number of benchmarks. Researchers in China have upped their game!

It was a wild couple of weeks for #OpenAI. Congratulations to them for shipping some truly amazing #AI tools as well as probably the best model in the world, currently, #o1.

For smaller grids on the #ARCAGI test you may call #o3 "superhuman" (this depends on how you define superhuman). For larger grids the performance falls very quickly to below human performance. This may be directly related to the number of tokens involved as grid size increases.
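Rough arithmetic behind that guess: if each grid cell costs on the order of one token (plus row separators), the prompt grows with the square of the grid's side length. The per-cell cost below is an assumption for illustration, not a measurement of any tokenizer.

```python
# Sketch: how an ARC-style grid's token footprint might scale with grid size.
def approx_tokens(rows: int, cols: int, tokens_per_cell: float = 1.0) -> int:
    """Estimate prompt tokens for one grid: cells plus one separator per row."""
    return int(rows * cols * tokens_per_cell + rows)


for side in (5, 10, 20, 30):
    print(f"{side}x{side} grid ~ {approx_tokens(side, side)} tokens")
```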

It just occurred to me that the fake presentation Google did (sorry but it's true) with an AI doing live recognition of objects is now actually very real on #AIStudio using #Gemini 2.0 Flash and #StreamRealtime.

The introduction of o3, even if progress stopped there, is a major step change in AI adoption. Poke around a little; there are already many tools with #4o, #o1, #Claude and #Llama built in. Smarter models increase the tool-space where AI can be viable. Genius-level intellect available via an API.

#LLMs are proving to be a powerful path on the drive to #AGI. These charts are suggestive of a continued, and perhaps accelerating, trend in model performance. o3 appears to have shrunk the development time to about 4-5 months.

#AI doesn't always get optical illusions correct, but #ChatGPT 4o will use image mapping tools to double-check itself.