gregory-marton.bsky.social
GenAI adjunct at Tufts, ft dad, cs tutor. https://www.seidellmarton.us/gremio https://www.linkedin.com/in/gregory-marton/
63 posts 515 followers 1,158 following

Removing the "gears" part results in better performance, which is surprising because it feels so different from how humans learn. Perhaps relatedly, though anecdotally: telling e.g. an image generator what you didn't like about the previous response yields more, not less, of what you didn't like.

you fucked up a perfectly good computer is what you did. look at it. it's got innumeracy

We straight white men have got to be collectively embarrassed about this. I mean, I'm mediocre enough to need this kind of leg up, I guess, but all y'all?

«Kim pointed to newer introductory offerings such as “Python for Humanities and Social Sciences,” “AI for Future Presidents” and “C Programming Language and Linux.”» And it's still available free online: www.edx.org/cs50. Love the homage to Richard Muller, too!

Are linguists paying a lot of attention to LLMs? Because this seems like a fascinating finding with large implications: LLMs share highly abstract grammatical concept representations, even across unrelated languages, so even models trained mostly on English do well in other languages.

Tech oligarchs made their fortunes thanks in large part to government funded research done by scientists based in universities. The tech industry’s complicity in dismantling these govt agencies and higher ed is not only immoral, it’s also shortsighted. Where will new science breakthroughs come from?

We launched a bunch of Gemini 2.0 models today. Each 2.0 model is generally better than the "one size up" model in the 1.5 series. 2.0 Flash & Flash-Lite set new standards on the quality/cost Pareto frontier. More details: blog.google/technology/g...

open-Deep-Research by huggingface, as posted by @aymeric-roucher.bsky.social: an entirely open agent that can navigate the web autonomously, scroll and search through pages, download and manipulate files, run calculations on data...

Exa & Deepseek R1 Chat App: a free and open-source chat app that uses Exa's API for web search and the Deepseek R1 LLM for reasoning. github.com/exa-labs/exa...

The Internet Archive has to date downloaded 500 terabytes of US government websites, which it crawls at the end of every presidential term. The whole archive is fully searchable. This effort's housed by a donation-funded nonprofit, not a branch of the US government. blog.archive.org/2024/05/08/e...

Researchers claim Linux kernel tweak could reduce data center energy use by 30% https://www.techspot.com/news/106501-linux-kernel-upgrade-promises-up-30-energy-savings.html #AI #climate

More info on the Open R1 initiative, as well as a nice explanation of DeepSeek's models and why they are so interesting huggingface.co/blog/open-r1

As someone who has reported on AI for 7 years and covered China tech as well, I think the biggest lesson to be drawn from DeepSeek is the huge cracks it illustrates with the current dominant paradigm of AI development. A long thread. 1/

Explainer: What's R1 and Everything Else. This is an attempt to consolidate the dizzying rate of AI developments since Christmas. If you're into AI but not deep enough, this should get you oriented again. timkellogg.me/blog/2025/01...

I'm not sure if people realize how quickly the Trumpzis can do enormous damage to US science, from basic research to translation. Really fast. REALLY fast. Labs with decades of irreplaceable domain and technique knowledge can break apart with a surprisingly short funding gap. When they're gone...1/

Next big thing for brands: knowing what sites agents prefer. If you ask for stock prices, Claude with Computer Use goes to Yahoo Finance while Operator does a Bing search. Operator loves buying from the top search result on Bing. Claude has direct preferences, like 1-800-Flowers. We don't know why.

Worth also pointing out that there are many "tests so easy no AI system can pass them". Moravec's paradox remains. E.g., arxiv.org/abs/2404.12390

The new ability of AI video creators to add real people and products to scenes with just an image is likely to increase the utility (& more worryingly, misuse) of AI video. Here I made Shakespeare at a cafe and the Girl with the Pearl Earring piloting a mech (just as Vermeer intended)

In December, I posted about our new paper on mastering board games using internal + external planning. 👇 Here's a talk about it, now on YouTube, given by my awesome colleague John Schultz! www.youtube.com/watch?v=JyxE...

Explainability focuses on finding *directions* in representation space that correspond to concepts, and the strong Linear Representation Hypothesis (LRH) posits that this may be the only kind of representation to look for. Not so: the paper gives a counterexample where magnitude carries information orthogonal to direction. aclanthology.org/2024.blackbo...
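A minimal sketch of why direction-only analysis can miss information, with toy vectors (numpy only; this is an illustration, not the paper's construction): two activations share a concept direction yet differ in magnitude, and a cosine-only probe cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)
concept_dir = rng.normal(size=8)
concept_dir /= np.linalg.norm(concept_dir)   # unit "concept direction"

# Hypothetical activations: same direction, different magnitudes.
act_weak = 0.5 * concept_dir
act_strong = 3.0 * concept_dir

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(act_weak, concept_dir), cos(act_strong, concept_dir))  # 1.0 and 1.0
print(act_weak @ concept_dir, act_strong @ concept_dir)          # 0.5 vs 3.0
# A direction-only (cosine) probe treats these as identical; reading the
# projection length recovers the extra, magnitude-encoded feature.
```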

"Titans", as opposed to Transformers, treat attention as short-term memory and extend the possible context window by using an additional neural memory that lives just as long as in-context document ingestion and query ("test time"), and controlled by surprise and decay. arxiv.org/pdf/2501.00663

Qwen released a 72B process reward model (PRM) built on their recent math model. A good chance it's the best PRM openly available for reasoning research. We like Qwen. https://buff.ly/4gQV9wt

For languages where data are scarce, they found it helpful to pretrain by masking only nouns, verbs, and named entities, and only one at a time, rather than a random set of tokens.
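A sketch of that selective-masking step, with assumed details (spaCy for POS/NER tagging, a BERT-style [MASK] token; the cited work's exact pipeline may differ):

```python
import random
import spacy

nlp = spacy.load("en_core_web_sm")  # swap in a model for the target language

def selective_mask(text, mask_token="[MASK]"):
    """Mask exactly one noun, verb, or named entity instead of random tokens."""
    doc = nlp(text)
    candidates = [t.i for t in doc
                  if t.pos_ in ("NOUN", "PROPN", "VERB") or t.ent_type_]
    if not candidates:
        return text, None
    i = random.choice(candidates)       # one content word at a time
    tokens = [t.text for t in doc]
    target, tokens[i] = tokens[i], mask_token
    return " ".join(tokens), target

masked, target = selective_mask("Marie Curie discovered radium in Paris.")
print(masked, "->", target)
```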

The disadvantage of writing one big review of 2024 is that individual sections get lost in the noise. This part, about both the improvements and the deteriorations in the environmental impact of LLMs, probably deserved its own separate post.

Google just released TimesFM-2.0 (Time Series Foundation Model; JAX & PyTorch) on Hugging Face, with a significant boost in accuracy and maximum context length. It's a pretrained foundation model from Google Research for time-series forecasting. huggingface.co/google/times...

25 AI Predictions for 2025 (and a review of my almost entirely correct predictions from 2024) open.substack.com/pub/garymarc...

My prediction in 2010 was that we would have more autonomous cars than human driven ones on the road by 2030, and I guess we'll see, but an important take is $ would be better spent on improving public transit and infrastructure. Better driver assistance is cool too, I guess. Yay LLMs, despite hype!

It's very fashionable to keep criticizing LLMs as "glorified autocorrect." I'm curious how one explains the ability to execute this prompt beautifully as an "autocorrect." (And yes, I've used many other language systems: Duolingo, Mango, Pimsleur, etc.)

Published version, here: van Rooij, I., Guest, O., et al. Reclaiming AI as a Theoretical Tool for Cognitive Science. Comput Brain Behav 7, 616–636 (2024). doi.org/10.1007/s421...

Genius! For medical device communication, do not use wireless, which is easy to snoop or jam, nor implant actual wires, ugh, but use the human body itself "as the communication medium for the devices in someone's body-area network." #IoBodies wow.

Basically think of the o3 results as validating Douglas Adams as the science fiction author most right about AI. When given longer to think, the AI can generate answers to very hard questions, but the cost is very high, it is hard to verify, & you have to make sure you ask the right question first.

"The Free Software Foundation announced they are pursuing freedom in machine learning while not being limited to just the software but also the training data as well": The Free Software Foundation Finally Has AI / Machine Learning Apps On Their Radar - Phoronix

A lot more encoding happens than generation, because e.g. to find query-relevant documents you encode them all and look for neighbors in the encoded space. Improvements in encoding are thus less visible but perhaps more impactful from both sustainability and quality viewpoints.
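A minimal sketch of that encode-then-compare pattern, assuming the sentence-transformers library (the model name is just an example):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["solar panel maintenance tips",
        "how transformer attention works",
        "keeping a sourdough starter alive"]
doc_vecs = model.encode(docs, normalize_embeddings=True)   # the bulk encoding cost

query_vec = model.encode(["explain attention in transformers"],
                         normalize_embeddings=True)
scores = doc_vecs @ query_vec.T                            # cosine similarities
print(docs[int(np.argmax(scores))])
```

The corpus side is encoded once per document, so most compute lives there rather than in any single query, which is the sustainability point above.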

Dear god, does this really all need to happen approximately 2 days before end of days! I'll come back to you on this one 🤣 @create-glasgow.bsky.social www.gov.uk/government/c...

People are right now slobbering pretty hard over this AI tutor demo over on LinkedIn. I think it's a mess—pedagogically, socially, and mathematically. What do you notice?

Amazing lineup of speakers this afternoon; glad I chose to attend. @suhr.bsky.social talked about interactive language use in games, specifically their latest project studying how people cooperate/talk in Portal 2. Tom Griffiths, among other insights, showed tasks where CoT *hurts* performance!

Used Gemini Live tonight to discuss CA’s new law AB 2013. The conversation flowed easily and could definitely help with brainstorming complex ideas. #EduSky #AIEdu

U.S. math scores drop on major international test | https://buff.ly/3VNRNSv

Google quietly updated their ngrams viewer again this year. The books used appear to be extremely different yet again: the rate of the phrase "she said" across the 20th century is about 60% of what it was in the 2019 release, and just 20% of the 2009 one. But there's a catch:

The new Deep Research feature from Google feels like one of the most appropriately "Google-y" uses of AI to date, and is quite impressive. I've had access for a bit, and it does very good initial reports on almost any topic. The paywalls around academic sources put some limits on it, though.

"Lovotics" may be a fun new term, but we've literally and literarily been discussing this since the term "robot" was coined in Karel Čapek's R.U.R.