guha-anderson.com
hacker / CS professor https://www.khoury.northeastern.edu/~arjunguha/
68 posts 182 followers 75 following
comment in response to post
Are you hiring new grads (BS) for this kind of work? I can suggest some people.
comment in response to post
I distinctly remember the moment in grad school when I realized I was not going to learn any more PL by taking classes. I felt bad for an instant, and then moved on.
comment in response to post
Yes. Still there. Also the pinball machine, the PDP, and @shriram.bsky.social .
comment in response to post
I think language devs can help in a few ways. Benchmarking is the easiest for us to do and necessary to guide LLM development. I’ve been meaning to write up my experience being the only PL person in the room for the StarCoder LLM development process. It was very informative.
comment in response to post
Or, ask these products to write a 2 page ICFP workshop paper in one’s area of expertise. OK if it’s incremental, just has to be novel for 2025 and clearly positioned wrt related work. I know PhD students who can do this.
comment in response to post
Our tech report has more fun examples of short prompts that make reasoning models crunch for several minutes or longer: khoury.northeastern.edu/~arjunguha/m...
comment in response to post
If you want to read some deranged thoughts from frustrated models (R1 and Gemini Thinking), check them out here: huggingface.co/spaces/nuprl...
comment in response to post
We believe our benchmark is out-of-domain for DeepSeek-style models: RL with verifiable rewards on math and programming. It’s remarkable that they generalize to this type of verbal reasoning. But, perhaps there are limits to what can be done with verifiable rewards exclusively.
comment in response to post
However, many problems are so hard that reasoning models “give up” – they output solutions that they know are wrong or argue that the problem is impossible to solve. In some cases, R1 gets stuck “thinking forever”. (See this example of R1 getting “frustrated.”)
comment in response to post
Our benchmark reveals capability gaps and failure modes that are not evident in existing benchmarks. E.g., we find that o1 is significantly better at these tasks than other reasoning models.
comment in response to post
In short, we turn the weekly puzzles from the NPR Sunday Puzzle Challenge into a machine-checkable benchmark. These are hard problems, typically solved by a few hundred people a week. But, the answers are obvious when revealed (to U.S. adults).
comment in response to post
Instead, students often don’t understand what details they need to specify in natural language to elicit the code they want. Students often add unnecessary detail, remove essential details, and even get stuck in loops where each prompt is syntactically distinct but semantically identical.
comment in response to post
We find that it is not about vocabulary: a causal analysis shows that LLMs have no trouble understanding the weirdest language in code comments.
comment in response to post
Although React is now very different from our old work in this space, there is a direct, acknowledged connection to some of our low-level techniques from that time: x.com/hupp/status/...
comment in response to post
Sorry to hear that Amazon now has a physical presence in Rhode Island. Stuff used to come sales-tax free from Massachusetts.
comment in response to post
No. I did not understand dynamic dispatch deeply enough to reinvent the visitor pattern. But, my parser is just a big mess of case analysis, and I'd mastered control flow well enough to do that.
comment in response to post
I did manage to hand-roll a bad parser, which was a small leap from regular expressions, which were in C# by that time. However, I failed to figure out how to represent sum types (i.e., for values). My final note from 25 years ago says:
comment in response to post
Does the professionalization achieve its goals for the average professional child?
comment in response to post
Good grief. Given that peer review is just a necessary evil, I can’t see how this is good. Kids should just do their own weird research that should not pass review. Notice that the committee-selected topic of social impact is already passé in today’s political climate.
comment in response to post
Many students and teachers have wrongly concluded that people need to learn less because of AI. Counterintuitively, students will actually need to work harder because their boss will assume they can be more productive with AI. So, singularity notwithstanding, learning is more important than ever.
comment in response to post
I think this is exactly the same problem. :)
comment in response to post
From arxiv.org/pdf/2501.00656
comment in response to post
Lots of people to be disappointed with. For example, the people invested in the old curriculum spent years belittling those who do not agree with them, and thus failed to convince everyone else that they may have good ideas.
comment in response to post
Important to keep in mind: there is no way this was not going to happen. We are lucky that Daniel was asked to do it, and lucky again that he said yes. The alternative was that it would be redesigned by someone you do not trust.
comment in response to post
@dbp.bsky.social should respond to specifics. Nobody knows what exactly he will do, other than that it will draw from dcic-world.org. So, a lot of the criticism is aimed at a strawman.
comment in response to post
My position is that instructors being able to change curriculum -- no matter how great others think they are -- is an important part of academic freedom. F1 either gets redesigned by someone you trust, or thrown out by someone you don't. The old curriculum designers no longer teach it, so it's over.
comment in response to post
Congratulations Yuiry!
comment in response to post
But read what? A high quality implementation of a grade book or simple game? Anything more requires significant domain knowledge.
comment in response to post
@dbp.bsky.social is driving the new first semester course. I'm confident he knows what he's doing. :)
comment in response to post
I see. Idk. I think we should design CS1 by working backwards from what the rest of the curriculum needs. We were already cramming way too much into the BS in CS. Now we have to also teach students to be better than ChatGPT at programming. It’s a lot to ask for.
comment in response to post
I’m concerned that this may put a ceiling on what one can achieve as a programmer. I’ll admit I am wrong when an exceptional freshman approaches me to do research and says they taught themselves to code with an LLM.
comment in response to post
Right. And almost by definition, the goal of any "work" assigned in school is to learn. If there is no learning objective, it is just pointless work. This makes student use of LLMs very tricky. There are counterexamples, e.g., when the goal is to learn how to accelerate work with an LLM.
comment in response to post
What may be missing: there are certain things that you may not do without an LLM to help you blast through it. I wrote several one-off programs to help with in-class activities last year. I also wrote iPhone apps for myself, e.g., apps.apple.com/us/app/calen... (not worth my time to learn Swift)
comment in response to post
Ignoring cheating, I think you need strong introspection to know when an LLM will help/hinder learning -- far beyond what most college students are capable of. In upper-level classes, I explicitly mark parts of HWs with "do not use LLM" and "use LLM".
comment in response to post
Python? Not yet. We used to teach Racket, so I used Racket: gist.github.com/arjunguha/59...
comment in response to post
Great question: may not be possible based on what we know how to teach right now. :) Things I have done in intro:
- Explained language modeling over a small finite vocabulary (just the model, not the training algorithm)
- Presented SGD without derivatives (i.e., `(/ (- (f (+ x h)) (f x)) h)`)
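The derivative-free trick above is the forward-difference quotient. As a hedged sketch of how gradient descent can be demoed without calculus (the original classroom version was in Racket; this Python translation and its function names are my own illustration, not the course material):

```python
def slope(f, x, h=1e-6):
    # Forward difference: (f(x + h) - f(x)) / h,
    # the same formula as the Racket expression (/ (- (f (+ x h)) (f x)) h).
    return (f(x + h) - f(x)) / h

def descend(f, x0, lr=0.1, steps=100):
    # Repeatedly step against the estimated slope to minimize f.
    # No symbolic derivatives needed, which is the point of the demo.
    x = x0
    for _ in range(steps):
        x = x - lr * slope(f, x)
    return x

# Minimizing (x - 3)^2 from x = 0 lands very close to x = 3.
x_min = descend(lambda x: (x - 3) ** 2, x0=0.0)
```

Because the quadratic's slope estimate is cheap and well-behaved, the iterate contracts toward the minimum by a constant factor each step, which makes the convergence visible even in an intro course.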