Remember that whole story about how ChatGPT aced the bar exam? Oops! OpenAI 100% just lied about that. It didn't happen. Oopsie! https://www.nytimes.com/2024/05/15/opinion/artificial-intelligence-ai-openai-chatgpt-overrated-hype.html?unlocked_article_code=1.sE0.SV0g.r4iVMq0NT6z7&smid=nytcore-ios-share&referringSource=articleShare&sgrp=c-cb
Comments
Oopsie! https://www.tiktok.com/@boxoutmusic/video/7224179902394666282
This is the way the world ends
This is the way the world ends
My antivirus doesn't like NYT for some reason.
Turns out, the prompting, cutting, and editing it took to produce what was presented took longer and required much more effort than just regular shoots.
(The Louisiana bar exam is also a lot of regurgitation, but it throws some truly disgusting hypos at you to analyze and write essays on)
And because he didn't know the language, he was completely unaware. If I hadn't been watching, he would not have caught it.
2/2
A lot of the reporting in that New York Times article was covered by him. Good lil podcast about it.
THREAD:
Here's the paper that's the source of the reference. Read it.
https://link.springer.com/article/10.1007/s10506-024-09396-9#Sec11
Let's go through the problems with the claim that OpenAI made it up, one by one.
https://royalsocietypublishing.org/doi/10.1098/rsta.2023.0254
Katz is not an OpenAI employee. He's a Chicago-Kent Professor of Law, and cofounder of 273 Ventures, a legal AI company.
https://kentlaw.iit.edu/law/faculty-scholarship/faculty-directory/daniel-martin-katz
2) The source paper here CONFIRMS that the scaled MBE score was calculated correctly by Katz et al.
It does criticize the rigour of the essay grading, in that no rubric was used and the graders weren't NCBE-trained, but that's just as likely to understate GPT-4's performance as to overstate it.
4a) Maximally vs. minimally tailored questions:
There are two ways you can ask the AI questions. You can format them up all nice and neat (such as putting quotes around the question), tell it to give ranked choices and proper...
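Purely as an illustration (the question text and prompt wording below are hypothetical examples of mine, not the actual templates used by Katz et al. or the critique paper), the difference between the two approaches looks roughly like this:

```python
# Hypothetical sketch of minimally vs. maximally tailored prompting.
# The bar-exam question and prompt wording are invented for illustration only.

mbe_question = (
    "A landowner conveyed Blackacre 'to my niece for life, then to her heirs.' "
    "Which interest, if any, does the niece's oldest child hold? ..."
    # the "..." stands in for the omitted multiple-choice answer options
)

# Minimally tailored: hand the model the raw question with no framing.
minimal_prompt = mbe_question

# Maximally tailored: quote the question, state the task, and spell out
# the exact answer format you want back.
maximal_prompt = (
    "You are answering a Multistate Bar Examination question.\n"
    f'Question: "{mbe_question}"\n'
    "Rank the answer choices from most to least likely to be correct, "
    "then give your final answer as a single letter with a one-sentence rationale."
)

print(minimal_prompt)
print()
print(maximal_prompt)
```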
That was easily the worst job of my career, BTW. I swore off software development afterwards.
It isn't just always a guy in India at a computer; the datasets they train on are all taken from said low-paid humans.
Love to train the robots to replace my work with "good enough" content.
But it ends up that:
good structure + sometimes ok-ish facts + audience wanting to believe + tech corps wanting $$$ = giant hype bubble
https://techpolicy.press/us-senate-ai-working-group-releases-policy-roadmap
https://link.springer.com/article/10.1007/s10506-024-09396-9/tables/1
https://bsky.app/profile/sababausa.bsky.social/post/3kmnnqv5fwu2c
Damn that's crazy