gregory-marton.bsky.social
GenAI adjunct at Tufts, ft dad, cs tutor. https://www.seidellmarton.us/gremio https://www.linkedin.com/in/gregory-marton/
63 posts 515 followers 1,158 following

Removing the "gears" part results in better performance, which is surprising because it feels so different from how humans learn. Perhaps relatedly, though anecdotally: telling e.g. an image generator what you didn't like about the previous response yields more, not less, of what you didn't like.

you fucked up a perfectly good computer is what you did. look at it. it's got innumeracy

We straight white men have got to be collectively embarrassed about this. I mean, I'm mediocre enough to need this kind of leg up, I guess, but all y'all?

«Kim pointed to newer introductory offerings such as “Python for Humanities and Social Sciences,” “AI for Future Presidents” and “C Programming Language and Linux.”» And it's still available free online: www.edx.org/cs50. Love the homage to Richard Muller, too!

Are linguists paying a lot of attention to LLMs? Because this seems like a fascinating finding with large implications: LLMs share highly abstract grammatical concept representations, even across unrelated languages, so even models trained mostly on English do well in other languages.

Tech oligarchs made their fortunes thanks in large part to government funded research done by scientists based in universities. The tech industry’s complicity in dismantling these govt agencies and higher ed is not only immoral, it’s also shortsighted. Where will new science breakthroughs come from?

We launched a bunch of Gemini 2.0 models today. Each 2.0 model is generally better than the "one size up" model in the 1.5 series. 2.0 Flash & Flash-Lite set new standards on the quality/cost Pareto frontier. More details: blog.google/technology/g...

open-Deep-Research by huggingface, as posted by @aymeric-roucher.bsky.social: an entirely open agent that can navigate the web autonomously, scroll and search through pages, download and manipulate files, run calculations on data...

Exa & Deepseek R1 Chat App: a free and open-source chat app that uses Exa's API for web search and the Deepseek R1 LLM for reasoning. github.com/exa-labs/exa...

The Internet Archive has to date downloaded 500 terabytes of US government websites, which it crawls at the end of every presidential term. The whole archive is fully searchable. This effort's housed by a donation-funded nonprofit, not a branch of the US government. blog.archive.org/2024/05/08/e...

Researchers claim Linux kernel tweak could reduce data center energy use by 30% https://www.techspot.com/news/106501-linux-kernel-upgrade-promises-up-30-energy-savings.html #AI #climate

More info on the Open R1 initiative, as well as a nice explanation of DeepSeek's models and why they are so interesting huggingface.co/blog/open-r1

As someone who has reported on AI for 7 years and covered China tech as well, I think the biggest lesson to be drawn from DeepSeek is the huge cracks it illustrates with the current dominant paradigm of AI development. A long thread. 1/

Explainer: What's R1 and Everything Else. This is an attempt to consolidate the dizzying rate of AI developments since Christmas. If you're into AI but not deep enough, this should get you oriented again. timkellogg.me/blog/2025/01...

I'm not sure if people realize how quickly the Trumpzis can do enormous damage to US science, from basic research to translation. Really fast. REALLY fast. Labs with decades of irreplaceable domain and technique knowledge can break apart with a surprisingly short funding gap. When they're gone...1/

Next big thing for brands: knowing what sites agents prefer. If you ask for stock prices, Claude with Computer Use goes to Yahoo Finance while Operator does a Bing search. Operator loves buying from the top search result on Bing. Claude has direct preferences, like 1-800-Flowers. We don't know why.

Worth also pointing out that there are many "tests so easy no AI system can pass them". Moravec's paradox remains. E.g., arxiv.org/abs/2404.12390

The new ability of AI video creators to add real people and products to scenes with just an image is likely to increase the utility (& more worryingly, misuse) of AI video. Here I made Shakespeare at a cafe and the Girl with the Pearl Earring piloting a mech (just as Vermeer intended)

In December, I posted about our new paper on mastering board games using internal + external planning. 👇 Here's a talk about it, now on YouTube, given by my awesome colleague John Schultz! www.youtube.com/watch?v=JyxE...

Explainability focuses on finding *directions* in representation space that correspond to concepts, and the strong Linear Representation Hypothesis (LRH) posits that this may be the only kind of representation to look for. Not so: the paper gives a counterexample where magnitude carries information orthogonal to direction. aclanthology.org/2024.blackbo...
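A minimal sketch of why direction-only analysis can miss information, with toy vectors (numpy only; this is an illustration, not the paper's construction): two activations share a concept direction yet differ in magnitude, and a cosine-only probe cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)
concept_dir = rng.normal(size=8)
concept_dir /= np.linalg.norm(concept_dir)   # unit "concept direction"

# Hypothetical activations: same direction, different magnitudes.
act_weak = 0.5 * concept_dir
act_strong = 3.0 * concept_dir

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(act_weak, concept_dir), cos(act_strong, concept_dir))  # 1.0 and 1.0
print(act_weak @ concept_dir, act_strong @ concept_dir)          # 0.5 vs 3.0
# A direction-only (cosine) probe treats these as identical; reading the
# projection length recovers the extra, magnitude-encoded feature.
```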

"Titans", as opposed to Transformers, treat attention as short-term memory and extend the possible context window by using an additional neural memory that lives just as long as in-context document ingestion and query ("test time"), and controlled by surprise and decay. arxiv.org/pdf/2501.00663

Qwen released a 72B process reward model (PRM) built on their recent math model. A good chance it's the best PRM openly available for reasoning research. We like Qwen. https://buff.ly/4gQV9wt

For languages where data are scarce, they found it helpful to pretrain by masking only nouns, verbs, and named entities, and only one at a time, rather than a random set of tokens.
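A sketch of that selective-masking step, with assumed details (spaCy for POS/NER tagging, a BERT-style [MASK] token; the cited work's exact pipeline may differ):

```python
import random
import spacy

nlp = spacy.load("en_core_web_sm")  # swap in a model for the target language

def selective_mask(text, mask_token="[MASK]"):
    """Mask exactly one noun, verb, or named entity instead of random tokens."""
    doc = nlp(text)
    candidates = [t.i for t in doc
                  if t.pos_ in ("NOUN", "PROPN", "VERB") or t.ent_type_]
    if not candidates:
        return text, None
    i = random.choice(candidates)       # one content word at a time
    tokens = [t.text for t in doc]
    target, tokens[i] = tokens[i], mask_token
    return " ".join(tokens), target

masked, target = selective_mask("Marie Curie discovered radium in Paris.")
print(masked, "->", target)
```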

The disadvantage of writing one big review of 2024 is that individual sections get lost in the noise. This part, about both the improvements and the deteriorations in the environmental impact of LLMs, probably deserved its own separate post.

Google just released TimesFM-2.0 (Time Series Foundation Model; JAX & PyTorch) on Hugging Face, with a significant boost in accuracy and maximum context length. It's a pretrained foundation model from Google Research for time-series forecasting. huggingface.co/google/times...

25 AI Predictions for 2025 (and a review of my almost entirely correct predictions from 2024) open.substack.com/pub/garymarc...

My prediction in 2010 was that we would have more autonomous cars than human driven ones on the road by 2030, and I guess we'll see, but an important take is $ would be better spent on improving public transit and infrastructure. Better driver assistance is cool too, I guess. Yay LLMs, despite hype!

It's very fashionable to keep criticizing LLMs as "glorified autocorrect." I'm curious how one explains the ability to execute this prompt beautifully as an "autocorrect." (And yes, I've used many other language systems: Duolingo, Mango, Pimsleur, etc.)

Published version, here: van Rooij, I., Guest, O., et al. Reclaiming AI as a Theoretical Tool for Cognitive Science. Comput Brain Behav 7, 616–636 (2024). doi.org/10.1007/s421...

Genius! For medical device communication, do not use wireless, which is easy to snoop or jam, nor implant actual wires, ugh, but use the human body itself "as the communication medium for the devices in someone's body-area network." #IoBodies wow.

Basically think of the o3 results as validating Douglas Adams as the science fiction author most right about AI. When given longer to think, the AI can generate answers to very hard questions, but the cost is very high, it is hard to verify, & you have to make sure you ask the right question first.

"The Free Software Foundation announced they are pursuing freedom in machine learning while not being limited to just the software but also the training data as well": The Free Software Foundation Finally Has AI / Machine Learning Apps On Their Radar - Phoronix

A lot more encoding happens than generation, because e.g. to find query-relevant documents you encode them all and look for neighbors in the encoded space. Improvements in encoding are thus less visible but perhaps more impactful from both sustainability and quality viewpoints.
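A minimal sketch of that encode-then-compare pattern, assuming the sentence-transformers library (the model name is just an example):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["solar panel maintenance tips",
        "how transformer attention works",
        "keeping a sourdough starter alive"]
doc_vecs = model.encode(docs, normalize_embeddings=True)   # the bulk encoding cost

query_vec = model.encode(["explain attention in transformers"],
                         normalize_embeddings=True)
scores = doc_vecs @ query_vec.T                            # cosine similarities
print(docs[int(np.argmax(scores))])
```

The corpus side is encoded once per document, so most compute lives there rather than in any single query, which is the sustainability point above.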

Dear god, does this really all need to happen approximately 2 days before end of days! I'll come back to you on this one 🤣 @create-glasgow.bsky.social www.gov.uk/government/c...

People are right now slobbering pretty hard over this AI tutor demo over on LinkedIn. I think it's a mess—pedagogically, socially, and mathematically. What do you notice?

Amazing lineup of speakers this afternoon; glad I chose to attend. @suhr.bsky.social talked about interactive language use in games, specifically their latest project studying how people cooperate/talk in Portal 2. Tom Griffiths, among other insights, showed tasks where CoT *hurts* performance!

Used Gemini Live tonight to discuss CA’s new law AB 2013. The conversation flowed easily and could definitely help with brainstorming complex ideas. #EduSky #AIEdu

U.S. math scores drop on major international test | https://buff.ly/3VNRNSv

Google quietly updated their ngrams viewer again this year. The books used appear to be extremely different yet again: the rate of the phrase "she said" across the 20th century is about 60% of what it was in the 2019 release, and just 20% of the 2009 one. But there's a catch:

The new Deep Research feature from Google feels like one of the most appropriately "Google-y" uses of AI to date, and is quite impressive. I've had access for a bit, and it does very good initial reports on almost any topic. The paywalls around academic sources put some limits on it, though.

"Lovotics" may be a fun new term, but we've literally and literarily been discussing this since the term "robot" was coined in Karel Čapek's R.U.R.