meatlearner.bsky.social - Profile | ThreadSky | a Reddit-style client for Bluesky

meatlearner.bsky.social

Machine-learner, meat-learner, research scientist, AI Safety thinker. Model trainer, skeptical adorer of statistics. Co-author of: Malware Data Science

11 posts 39 followers 117 following

Posts 4 Comments 7

comment in response to post

I was surprised at how clear-cut and blatant it was. I mean, twice in a row, closed fingers, correct angle. Meanwhile, Musk has recently voiced public support for the far-right AfD party, often described as antisemitic / extremist. www.cnn.com/2024/12/20/m... That + no apology...

submitted 41 days ago

comment in response to post

Nice! Would love to be added (11 yrs in AI, co-author of Malware Data Science, love them NNs)

submitted 55 days ago

comment in response to post

Am I reading this right? Techniques to make the model safe again had almost no effect on non-small models :o.

submitted 90 days ago

comment in response to post

submitted 90 days ago

comment in response to post

submitted 90 days ago

comment in response to post

A response to X is (usually) written by someone socially and politically close to X's author, unlike some other random piece of content Y. So a model trained to predict what follows X learns to agree with X, and it's extremely hard to take sycophancy out of an LLM trained the way we train them.
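A toy sketch of that argument, in Python. Everything here is an illustrative assumption, not something from the post: a "stance" is a scalar in [-1, 1], a reply's stance is its parent post's stance plus small Gaussian noise (the reply author is "near" X's author), and a least-squares slope stands in for what a next-token predictor picks up. A slope near 1 means the best prediction simply mirrors the context, i.e. agreement; for unrelated content Y the slope is near 0.

```python
import random

# All function names below are hypothetical, chosen for this illustration.

def make_pairs(n, noise=0.2, seed=0):
    """Toy corpus of (stance of post X, stance of a reply to X).

    Assumption: replies come from people socially/politically close to
    X's author, so reply stance = X's stance + small Gaussian noise.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        pairs.append((x, x + rng.gauss(0.0, noise)))
    return pairs

def make_unrelated(n, seed=1):
    """Baseline: stance of some random content Y, independent of X."""
    rng = random.Random(seed)
    return [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(n)]

def fit_slope(pairs):
    """Least-squares slope of continuation stance on context stance."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

if __name__ == "__main__":
    print(f"slope on replies:   {fit_slope(make_pairs(10_000)):.2f}")
    print(f"slope on unrelated: {fit_slope(make_unrelated(10_000)):.2f}")
```

The reply corpus yields a slope close to 1 and the unrelated corpus a slope close to 0: fit on replies, the "model" learns to echo whatever stance it is shown.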

submitted 90 days ago

comment in response to post

Say a model learns strategy x to minimize training loss. Later, minimizing test loss requires strategy y, but the model sticks with strategy x regardless (inner misalignment). If we also assume outer misalignment (the loss itself rewards the wrong thing), x can be seen as safer than y. That being said, the better the model, the less this will happen.
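A minimal toy version of the first half of this scenario, in Python. It assumes two hypothetical strategies that both reach zero training loss because their input features are perfectly correlated during training; the learner's tie-breaking order stands in for its inductive bias (also an assumption), so it settles on x. At test time the features decorrelate, y becomes the loss-minimizing strategy, and the frozen model keeps using x. Whether that counts as "safer" hinges entirely on the outer-misalignment assumption (the test loss encoding the wrong goal), which this toy does not model, and neither does it model the claim about more capable models.

```python
# Toy illustration of a model sticking with a strategy learned in training.
# All names here (STRATEGIES, loss, train, TRAIN, TEST) are hypothetical.

STRATEGIES = {
    "x": lambda f0, f1: f0,  # strategy x: predict the label from feature 0
    "y": lambda f0, f1: f1,  # strategy y: predict the label from feature 1
}

def loss(strategy, data):
    """Mean 0-1 loss of a strategy on (f0, f1, label) triples."""
    wrong = sum(STRATEGIES[strategy](f0, f1) != label for f0, f1, label in data)
    return wrong / len(data)

def train(data, preference=("x", "y")):
    """Return the lowest-loss strategy on `data`.

    Ties go to the earlier entry in `preference` (min() keeps the first
    minimum it sees), a stand-in for the learner's inductive bias.
    """
    return min(preference, key=lambda s: loss(s, data))

# Training: both features equal the label, so x and y both reach zero loss.
TRAIN = [(b, b, b) for b in (0, 1)] * 50
# Test: feature 0 now contradicts the label; only y still minimizes loss.
TEST = [(1 - b, b, b) for b in (0, 1)] * 50

if __name__ == "__main__":
    learned = train(TRAIN)       # the model settles on x during training
    best_at_test = train(TEST)   # what min(test loss) would actually pick
    print("learned strategy:", learned, "test loss:", loss(learned, TEST))
    print("test-optimal strategy:", best_at_test, "test loss:", loss(best_at_test, TEST))
```

Training cannot distinguish x from y, so nothing pushes the model toward y; only the test distribution reveals the difference.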

submitted 99 days ago