This is what happens when you have idiot boys who think they know more than they possibly have the capability of knowing told they know everything. Btw has idiot boy heard of Google?
You would think that some Republicans would have some vague survival instinct, because when the full dimensions of the data breach that Musk and his flying monkeys have engineered come out, their constituents are going to be showing up with pitchforks.
Stuff I could do 25 years ago with Frame or LaTeX - tools that also do a ton of other things - now takes subscriptions, the cloud, and a fair amount of constant bandwidth…cool.
I have several friends who teach high school (both public and private) and they are regularly amazed by how computer illiterate their students are. If there isn't an app or an LLM, the kids can't figure it out.
Yeah, but thinking you need an LLM to do document conversion!? Yes, I would expect that from some middle manager who has a problem remembering how to print his documents, but these guys? Geez.
They are working on making planes fall out of the sky first, probably shutting down the electric grid, before they get an LLM to successfully write COBOL. So don't get your hopes up.
Hey, Luke. I think you meant to ask Stack Overflow, but you ended up Tweeting it instead.
I'm sorry you seem to be missing a DOGE slack channel in which you can ask someone else to do your homework for you, but this is the real world you entered and people are watching.
Translation from intern-lad:
"Is there a real software engineer who can do what Elon asked me to do? I have all the ill-intent, but none of the talent!"
Yeah he's the dude who got writing out of those scrolls in Pompeii that were turned into pure charcoal bricks by the eruption and now they can be read.
like many smart people he probably thinks he's smart in many areas he's not
Because he doesn't know how to write a script to convert the files. These "geniuses" have access to some of the most sensitive financial info on Americans. WTF could go wrong?
I use automated conversion stuff for text accessibility and it blows chunks. I very strongly doubt any experimental version of that "AI" is much better than consumer grade offerings. If something like PDF to text DID work, that would be a holy fuckin grail
Yeah, it was kind of a shock to me when my work provided me with access to Adobe Acrobat and it was... kinda shitty? I was thinking "I've known about this thing 25 years+ and assumed it could do all sort of clever stuff, but really it's 50-50 vs opening a PDF in Word and hoping for the best".
Yeah, as soon as you dip your toe in "it seems straightforward" goes to "document conversion is a competitive multimillion dollar industry" in about 10 minutes
It's more readable, but still rife with erroneous best guesses as to anything beyond spelling and sequence. Contextual formatting remains a nightmare: with minimal line spacing, it can make okay guesses as to where carriage returns are, but it still drops headers and footers into the body of texts
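One common mitigation for the header/footer problem described above is to drop lines that repeat verbatim across most pages before joining the text. A minimal sketch, assuming the PDF text has already been extracted into one list of lines per page (the function name and the 60% threshold are illustrative, not taken from any particular tool):

```python
from collections import Counter


def strip_repeated_lines(pages: list[list[str]], min_fraction: float = 0.6) -> list[list[str]]:
    """Remove lines that appear on at least `min_fraction` of the pages.

    Hypothetical helper: `pages` is a list of pages, each a list of text lines.
    """
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        for line in {raw.strip() for raw in page}:
            if line:
                counts[line] += 1

    # Require at least two occurrences so single-page documents are untouched.
    threshold = max(2, min_fraction * len(pages))
    repeated = {line for line, n in counts.items() if n >= threshold}

    return [[line for line in page if line.strip() not in repeated] for page in pages]
```

This only catches headers and footers that are identical from page to page; anything with a changing page number ("Page 3 of 12") would need a looser match, which is part of why the cleanup stays painful.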
i think current bleeding edge does actually do this correctly. like if you put a gun to my head and told me to get a terabyte of pdfs converted to txt with sub 1% error rate i think that is doable
This used to be a big part of the work I did in THE LATE 90S. Like, there was an API. We converted word docs to PDF or HTML en masse. It wasn't complicated.
I've also converted plenty of PDFs but you're right, you do lose stuff. But my understanding is that Adobe makes tools that you can automate that process as well. And that problem is pretty unique to PDFs (and PDF analogs) specifically.
This was posted during the presidential transition. He is trying to figure out how to use LLMs for his tiger team work at the Treasury / wherever they have him operating
I can't decide if this is good (because whatever harm they're trying to do that might rely on actually using the data is wildly unlikely to do what they want) or bad (because it's just going to continue to poison all data in the gov that they touch in hard to fix ways)
Bad. Very bad. As a former cocky 23 year old in the startup space, it's a fine question to ask if you are trying to parse data for your own projects. It's another thing entirely to do it with US Treasury transaction data, whatever it might be.
I mean, honestly, for a 23 year old early in their career... it's not actually a fine question to ask unless you want to suck at coding and actually understanding the data you're trying to work with.
A question like this is just an open admission of long term incompetence. There's no learning.
There are many many tools that can do what he's asking about but his first thought is to look for the most expensive and least reliable way of doing it.
Golden opportunity here for someone good at programming to screw with this kid...
Take a pre-existing workflow and have it parse PDFs and other documents to replace specific words...
(e.g. "Trump" with "giant sweaty ballsack" or "Musk" with "armpit odor" or "Republican" with "shit-eating wanker")
Is there an LLM made specifically for kicking this douchewad in the nads repeatedly until he quits fucking around with the American people's sensitive data?
I don't know whether I should be angry he's going to fsck it up with an LLM or relieved he's not using a spyware ridden online PDF to Excel converter...
The question is as dumb as asking how to convert a text file to a word document. He is a complete moron. There are 12 year olds with more ability than him.
This is the sort of problem I would need two days to solve and I had two years of college CS classes a decade ago and wrote zero other code before during or after.
No, ETL software is its own industry. Copilot can regurgitate helper functions, it might get the PDF library and tesseract for OCR, but you might as well write it from scratch yourself.
Agreed. I'm definitely not a fan of Copilot. But it does do basic first drafts of simple tasks well enough. I'd never trust a line of code it generated, though, so in most cases, it's faster to just write it myself.
Implying that a human may not be validating code run against the system which manages roughly 20% of the economy? Any other engineer would be immediately fired.
The technical term when under-experienced children are taking shortcuts in ripping apart data they don't fully understand with tooling they understand even less is "script kiddies" (usually misspelled horribly in the manner of an LLM regurgitating random data that is "close enough")
Yes true, but even more so I suspect they have a straightforward (but also very large and complex) data management problem. Like he's trying to improvise a massive all-purpose ETL and doesn't even know what words like data model mean.
This is what scares me about the DOGE initiatives pushing out code so quickly. From what we can tell none of them have any government/public policy/finance experience, but they think they can just stroll in and use ✨AI✨ to solve problems people have dedicated their whole lives to.
I genuinely think these kids are true believers in the AI cult, they've been raised for years in an environment of silicon valley tech hype and fart huffing that they think technology is literally magic, and they literally believe that *actually having expertise and knowing things* is obsolete
Sure, there are also nasty ideological motivations for some of them as some have already been exposed as not so secret white supremacists, but I think they really do believe that gutting everything in the federal government and replacing it all with techno-solutionist AI bullshit will actually work.
Yeah i highly suspect that these kids are nowhere near hacker material and instead Musk chose them because they'll simply do anything he asks and eventually front him with something that appears to work but really doesn't
They parse plaintext present in most of those formats, and for PDFs they read images of the PDFs using their vision capabilities. There are absolutely 100% LLMs focused on converting text between formats, especially from PDF to markdown/plaintext.
Once you have read the text in a PDF and want to get a more structured output you can use a LLM to extract the data and write it in a JSON for example. Can be relevant for some problems.
Holy crap, this guy's a moron. I thought at least Musk would bring in some bright tech guys; not really bright at anything else, mind you, or with any sense of what it takes to make a system manageable and maintainable, but at least someone who knows their shit. That is just plain stupid.
Ok but LLMs *do* convert PDFs to plaintext, that's a huge use case nowadays that's advancing by the day (see https://www.sergey.fyi/articles/gemini-flash-2). Basically all large PDFs don't have text annotations, so need AI to convert them.
Obviously fuck this guy, but we're picking up on nuthin here
I mean yeah, fuck this guy ofc. I guess he could google it. There's not really any good reason to ever use microblogging platforms, tbf; it's for fun/community
WOW TIL, sorry I was wrong! Still, OCR was like the quintessential use case for ML when I was in undergrad. I guess I can't speak to the whole industry's ongoing use cases, but I can confidently say there is a large contingent of people that find ML-based OCR (and, yes, LLM OCR) very helpful
"Man, I wrote on Stack Overflow, and they said my question wasn't specific enough, gave no examples, and I couldn't edit cos I had no reputation. And now my mom wants me to empty the trash..."
I think mine is closer. A guy whose only cooking gadget is an air fryer, faced with a requirement to turn an egg into an omelette, asks how to use the air fryer to do it. Thus demonstrating 1. He has no idea what an air fryer can and can't do. 2. Has never broken an egg and made an omelette.
I've been thinking about this, hard, for at least 20 minutes. Nothing I can come up with conveys the distinct combination of like, object reference to its own ingredients, conflation, irony, idiocy and like… the horrifying primal revulsion this combination evokes in a programmer.
It is crazy because document conversion is so bog standard and can be done anywhere all the time. The examples in the question were stuff I was doing in the 90s on 486s.
Like why would I push this to farms of GPUs that will try to do it with an algorithm that tries to select similar docs.
That's why I'm blanking on an adequate analogy. It's taking a *simple, well-understood, easily-available-solutions* problem and... doing something with a complex, poorly-understood, expensive tool that may not produce the right results.
Yes! All the time I thought I was terrible at math, it was just that I needed more thorough explanation. I think this happens to a LOT of people and if we get the resources we need, we don't need to struggle so much
I love breaking down things for my kid. I hated math too (well the work), as I just knew the answer. If I had more plain language concepts it would have been so much easier and I wouldn't have hated it so much
If you can explain complex concepts to children you are very intelligent. And shows you actually know the information. It's a hard thing to do for most
I did too! I avoided all math classes after failing my way through Algebra 2 in high school. But then I took an intro class in formal logic and things started clicking, and long story short I ended up doing a computer science degree
That's awesome. You know I actually do remember when I was in college (in my 30's) when stuff like that would kind of click for me too. My biggest problem was forgetting how I got there for the next problem.
It's super easy to convert from many formats to PDF. Converting from PDF to DOCX used to be essentially impossible but I see that's no longer the case.
It all depends on what his source formats are and what the target format is.
If you were hellbent on AI for this task the last model you'd want is a large language model, given that the relevant schemas are highly constrained, not part of a natural language. It's the equivalent of asking an image gen AI to interpret an SVG and output a PNG of that interpretation.
The only LLM mentioned there is a post-process vision based LLM after OCR, so it doesn't do anything in terms of conversion, and my only question is, why? Once you've run the OCR, why? This seems like straight up training data harvesting rather than LLM doing anything purposeful.
But isn't this a coding wizard who absolutely deserves access to things 99.9% of the government isn't allowed to touch for legal reasons? *eyes roll so hard they develop gravitational pull*
IDK why but this reminds me of the early 1990s when I was trying to train little old ladies in the office to give up their Selectrics because of these things called computers.
That basically stopped being true after millennials, once computer UX got good enough that you didn't pick up a meaningful amount of tech knowledge just trying to watch porn
Yeah I've heard stories about Zoomers at their first real jobs not knowing how to locate a file they've just downloaded because they previously did all their computing on an iPhone.
In hindsight millennials occupy a really privileged position with computers and the internet. When this stuff was new to us, it was also new to everyone else. We had to learn together. But we were young enough that we werenβt instinctively afraid of new tech, so we dove in head first.
aren't these kids like actually tech bros though? I am sure they used something besides an iphone to get through college, they're not the average zoomer
Of course there are outliers. I'm just saying that basic computer literacy is not as widespread among younger generations as many of us would assume.
Sure, they'll be better than the average zoomer/alpha, but knowing how to code in a few languages doesn't confer the lived experience of having to negotiate incompatible file types all the time for your hobbies. That wasn't even common for millennials, just not *rare*.
Specifically by abandoning the file manager the smartphone UI model made everything mediated through programs rather than the data itself being primary as a "file".
Which concerned people at the time [it's me, I'm people]
Consequence of this is that people stopped interacting with computers as integrated systems that could be learned, and started interacting with a bunch of programs, all of which were different and many of which were buggy as fuck. Of course understanding fell.
Ran into this a few days ago, one of the systems had hidden file extensions and didn't have acrobat installed, so a coworker didn't realise they were looking at the pdfs they needed.
They stopped teaching kids how to use computers assuming they knew how to do it all, and then we stopped having to use pcs and got phones and now they know nothing. I've seen dozens give up on how to open a task manager or properly search for a file type or understand what basic stuff is.
I don't blame them, it's just that they've been failed and they aren't being taught enough 'cause it's expected they should know things they do not. They have to go more out of their way to learn more so now.
For the record, yes. It's called "print to pdf". Speak not to me about converting PDFs to Excel. It takes so much clean up for the documents we're reviewing that I'm happier going line by line copy and paste.
The squawk I squawked when I saw this diagram. You know when you're totally unprepared for something to be as funny as it is and a laugh just jumps out of your throat before you even really know what you're seeing? @spilth.bsky.social 🤣
Yeah, he's talking about something called Natural Language Processing, which is a different kind of AI that unfortunately is not as advanced as LLMs. It's a lot easier for a program to generate bullshit than for it to understand it.
No worries, the 19 year old HR expert, fresh out of high school, crashing staff meetings is the brilliant one in the toddler team. Demanding from staffers to justify their jobs :d
I had expected he would have sent grown-ups, not staged a re-enactment of Four Lions. Presumably there is no-one left even at the Linda Yaccarino level who will do his bidding.
Because OF COURSE you need an LLM for, err, (checks notes,) document conversion? WTAF
This is why you don't hire 20 year olds for this. Just to be clear, there are 30-40 year old geniuses with decades of experience. They're harder to manipulate though.
This guy won a massive prize for participation in some celebrity science thingummy, a boutique research project at U. of Nebraska-Lincoln. Almost ¾ of a million! The nepotism is strong with this one.
There are only like a billion scripts for that. The point is that if he were really that badass, he'd already know that. Him asking for an LLM to do that is adorbs
I can't believe I'm saying this, but "in his defense" (threw up in my mouth a little) I think he's asking about automating a batch of those processes, not doing them once.
Worse than that, you don't *want* an LLM for that. This is like speed-running that Xerox image compression bug that caused photocopiers/scanners to mysteriously transpose digits in documents...
Not only do you not need an LLM, even if you really, really wanted to use an AI to do batch conversion instead of one of 10000 easily available regular programs, an LLM would be the exact last AI you'd want to use.
My guess is that he is looking for a model that can parse data in a large database composed of file of various formats. It does exist (I think ?), and it does use a LLM to handle unstructured data.
WTF is he passing it into an LLM and Who the FUCK thinks LLMs currently are secure or accurate enough for this data? (They are not, and may never be, and I'm bullish on LLMs!)
I just wanted to give some context, and say that using LLMs to handle PDFs is not totally irrelevant. But yes of course, at the moment I would not use it for anything essential.
I'm, well, not in my fifties anymore, and even without any sort of computing background other than basic pc use I understand just how screamingly not good this is.
"Hey Jim, I got the right row with my SELECT at last, then deleted it, now I can't find anything"
"What..."
"See - select from inventory where id='345-5646'; delete from table;"
"..."
"Do I code good Jim?"
How do I print PDF
I'm gen-X and was hand-assembling 6502 on paper at 12 years old.
Christ almighty.
We are getting fucked so hard.
Conversion_LLM_KGBPortal.exe
woke: teaching Hitler Youth how to copy-paste PII into notepad
There's one post that shows a short Python script that calls chatgpt to get ffmpeg arguments b/c reading the manual is too hard.
These aren't rocket geniuses.
DeepLLM: I see you asked for `rm -rf /`, processing...
https://news.unl.edu/article-2
Maybe he thinks it was a typical example of the problem
But it's probably easier to just write your own ten line program to do it.
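For a sense of scale, here is roughly what such a small program can look like. This is only a sketch under assumptions: the comment above does not say which conversion it means, so the example assumes a JSON array of flat records going to CSV (which Excel and Google Sheets both open), and the function name is made up for illustration. It uses only the Python standard library.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text (illustrative helper)."""
    records = json.loads(json_text)

    # Collect column names in first-seen order so the output is stable.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Keys missing from a record are written as empty cells (DictWriter's default).
    writer.writerows(records)
    return buffer.getvalue()
```

Nested objects, dates, and identifiers with leading zeros are where the real decisions start, which is the "what do you want the Excel to look like?" question raised elsewhere in the thread.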
1) often dont have a big enough context window for these file types
2) hallucinate when you try to extract data
I gave up after two hours of garbled results and went back to manually typing.
Like, you're converting JSON to Excel; what do you want the Excel to look like?
Like, kid's a genius, but the two are VERY different, and you *don't* send them on live patients on day one.
Elon: "Send em in, what's the worst that can happen!" 🤦‍♂️
On god"
Would he use DeepSeek?
"How do I use computer"
Yeah, no. Now I understand. I have nothing. Nothing at all.
# import and use the llm that will allow me to convert JSON to excel (I will settle for google sheets format) [TAB]
"What are all these launch control locks for? We should just put this on the blockchain."
Yeah I don't have any pedagogy in coding but I stick to my little signal processing and no one gets hurt. Don't lump me in with those folk.
I learned script kiddie as a normal developmental phase of programmers, a pupa, where one "codes" by jigsawing and reassembling others' code.
It can be derogatory, but it could be a friendly noogie to newbies.
Why learn how a system works when you can get an AI to fart out an approximation of it and go "good enough"?
Why care if any of it works when you expect a golden parachute when it all goes to shit?
Also that he's exposing government IP and data to AI which might or might not be soaking up the data like a sponge.
so I have no choice but to hold the US treasury hostage
take away format friction or the world economy gets it"
But
He doesn't know how to use a fucking pdf????
Kid might be a genius, but he's specific to one area, and just been thrown in the other.
Perhaps shuttle pilot vs undersea submarine? Both experts, but you'd not want one driving the other!
https://bsky.app/profile/yourkingmob.bsky.social/post/3lhk3rdyqrc26
This is so dumb.
I think the other option is going to be "how do I fold my paper in half so it's different? Is there AI for that?"
Ummm
Huh?
(i know this website looks like it's for kids, but it's good at showing everything with pics. i got through college math with it :)
And oh, this makes me sad.
what is a LMM?
(please don't answer! I do LLMs)
The deepest irony is that they are many times more efficient at the task than the huge probability database.
This kid needs a pack of screws and a hammer, it's about at his level.
When I was doing this often ten years ago, I, um, used Google to find them quickly.
Not seeing the need for an LLM here.
When all you have is a hammer, everything looks like a nail.
from uslaw import Arrest
def arrest_elon(free_elon, jailed_elon):
    cop = Arrest(free_elon)
    cop.convert(jailed_elon, start=0, end=None)
    cop.close()
arrest_elon("elon", "jail")
we in danger, girl
https://www.newsweek.com/elon-musk-hires-23-year-old-engineer-cut-federal-spending-2026007
Danger!
chatGPT : a horse has six legs
Source: teaching my university age children simple file management
Usually.
Technically we've lost an entire Scientific arm that could have been researching the MAR03 code and the JAN31 DNA...
(I made those up, but you can Google which names excel literally deleted on export...)
https://www.reddit.com/r/todayilearned/comments/1fglbck/til_that_20_of_scientific_genetics_research/
Left overlap: (someone help me with this one)
Right overlap: "sticky keys: on"
123-45-6789
result: -6,711
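The number in that result is what you get if the dashes are read as minus signs: 123 - 45 - 6789 = -6711. A tiny sketch of the failure mode, assuming a tool that eagerly guesses "this cell is arithmetic" (the helper name is hypothetical and does not model any specific spreadsheet's behavior):

```python
def naive_numeric_guess(cell: str) -> int:
    """Treat every dash as a minus sign, the way an over-eager type guesser might."""
    parts = cell.split("-")
    total = int(parts[0])
    for part in parts[1:]:
        total -= int(part)
    return total


ssn_like = "123-45-6789"

# Wrong: the identifier is silently turned into a number.
as_number = naive_numeric_guess(ssn_like)

# Safer: identifier-like fields stay strings end to end.
as_text = ssn_like
```

The usual defence is to declare column types explicitly during conversion rather than letting the tool infer them.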
*terrifying
#TeamToddlers
A DOGE "Engineer"*
*In some jurisdictions (ROFL, Canada, IIRC) you may not legally call yourself an Engineer if you merely have a computer science degree.
https://www.msn.com/en-us/politics/government/who-is-luke-farritor-23-year-old-engineer-hired-to-cut-federal-spending/ar-AA1ypr9s
No LLM outputs PDFs, that would be crazy (tho I guess there's people trying to get them to do SVGs so eh someone crazy is prolly working on it lol)
Is the age old and correct question.
(and i grew up with a beautiful, chunky compaq laptop running windows xp, lol. im not much older than these guys.)