guha-anderson.com
hacker / CS professor https://www.khoury.northeastern.edu/~arjunguha/
68 posts 182 followers 75 following
comment in response to post
Are you hiring new grads (BS) for this kind of work? I can suggest some people.
comment in response to post
I distinctly remember the moment in grad school when I realized I was not going to learn any more PL by taking classes. I felt bad for an instant, and then moved on.
comment in response to post
Yes. Still there. Also the pinball machine, the PDP, and @shriram.bsky.social .
comment in response to post
I think language devs can help in a few ways. Benchmarking is the easiest for us to do and necessary to guide LLM development. I’ve been meaning to write up my experience being the only PL person in the room for the StarCoder LLM development process. It was very informative.
comment in response to post
Or, ask these products to write a 2 page ICFP workshop paper in one’s area of expertise. OK if it’s incremental, just has to be novel for 2025 and clearly positioned wrt related work. I know PhD students who can do this.
comment in response to post
Our tech report has more fun examples of short prompts that make reasoning models crunch for several minutes or longer: khoury.northeastern.edu/~arjunguha/m...
comment in response to post
If you want to read some deranged thoughts from frustrated models (R1 and Gemini Thinking), check them out here: huggingface.co/spaces/nuprl...
comment in response to post
We believe our benchmark is out-of-domain for DeepSeek-style models: RL with verifiable rewards on math and programming. It’s remarkable that they generalize to this type of verbal reasoning. But, perhaps there are limits to what can be done with verifiable rewards exclusively.
comment in response to post
However, many problems are so hard that reasoning models “give up” – they output solutions that they know are wrong or argue that the problem is impossible to solve. In some cases, R1 gets stuck “thinking forever”. (See this example of R1 getting “frustrated.”)
comment in response to post
Our benchmark reveals capability gaps and failure modes that are not evident in existing benchmarks. E.g., we find that o1 is significantly better at these tasks than other reasoning models.
comment in response to post
In short, we turn the weekly puzzles from the NPR Sunday Puzzle Challenge into a machine-checkable benchmark. These are hard problems, typically solved by a few hundred people a week. But, the answers are obvious when revealed (to U.S. adults).
comment in response to post
Instead, students often don’t understand what details they need to specify in natural language to elicit the code they want. Students often add unnecessary detail, remove essential details, and even get stuck in loops where each prompt is syntactically distinct but semantically identical.
comment in response to post
We find that it is not about vocabulary: a causal analysis shows that LLMs have no trouble understanding the weirdest language in code comments.
comment in response to post
Although React is now very different from our old work in this space, there is a direct, acknowledged connection to some of our low-level techniques from that time: x.com/hupp/status/...
comment in response to post
Sorry to hear that Amazon now has a physical presence in Rhode Island. Stuff used to come sales-tax free from Massachusetts.
comment in response to post
No. I did not understand dynamic dispatch deeply enough to reinvent the visitor pattern. But, my parser is just a big mess of case analysis, and I'd mastered control flow well enough to do that.
comment in response to post
I did manage to hand-roll a bad parser, which was a small leap from regular expressions, which were in C# by that time. However, I failed to figure out how to represent sum types (i.e., for values). My final note from 25 years ago says:
comment in response to post
Does the professionalization achieve its goals for the average professional child?
comment in response to post
Good grief. Given that peer review is just a necessary evil, I can’t see how this is good. Kids should just do their own weird research that should not pass review. Notice that the committee-selected topic of social impact is already passé in today’s political climate.
comment in response to post
Many students and teachers have wrongly concluded that people need to learn less because of AI. Counterintuitively, students will actually need to work harder because their boss will assume they can be more productive with AI. So, singularity notwithstanding, learning is more important than ever.
comment in response to post
I think this is exactly the same problem. :)
comment in response to post
From arxiv.org/pdf/2501.00656
comment in response to post
Lots of people to be disappointed with. For example, the people invested in the old curriculum spent years belittling those who do not agree with them, and thus failed to convince everyone else that they may have good ideas.
comment in response to post
Important to keep in mind: there is no way this was not going to happen. We are lucky that Daniel was asked to do it, and lucky again that he said yes. The alternative was that it would be redesigned by someone you do not trust.
comment in response to post
@dbp.bsky.social should respond to specifics. Nobody knows what exactly he will do, other than that it will draw from dcic-world.org. So, a lot of the criticism is aimed at a strawman.
comment in response to post
My position is that instructors being able to change curriculum -- no matter how great others think they are -- is an important part of academic freedom. F1 either gets redesigned by someone you trust, or thrown out by someone you don't. The old curriculum designers no longer teach it, so it's over.
comment in response to post
Congratulations Yuiry!
comment in response to post
But read what? A high quality implementation of a grade book or simple game? Anything more requires significant domain knowledge.
comment in response to post
@dbp.bsky.social is driving the new first semester course. I'm confident he knows what he's doing. :)
comment in response to post
I see. Idk. I think we should design CS1 by working backwards from what the rest of the curriculum needs. We were already cramming way too much into the BS in CS. Now we have to also teach students to be better than ChatGPT at programming. It’s a lot to ask for.
comment in response to post
I’m concerned that this may put a ceiling on what one can achieve as a programmer. I’ll admit I am wrong when an exceptional freshman approaches me to do research and says they taught themselves to code with an LLM.
comment in response to post
Right. And almost by definition, the goal of any "work" assigned in school is to learn. If there is no learning objective, it is just pointless work. This makes student use of LLMs very tricky. There are counterexamples, e.g., when the goal is to learn how to accelerate work with an LLM.
comment in response to post
What may be missing: there are certain things that you may not do without an LLM to help you blast through it. I wrote several one-off programs to help with in-class activities last year. I also wrote iPhone apps for myself, e.g., apps.apple.com/us/app/calen... (not worth my time to learn Swift)
comment in response to post
Ignoring cheating, I think you need strong introspection to know when an LLM will help/hinder learning -- far beyond what most college students are capable of. In upper-level classes, I explicitly mark parts of HWs with "do not use LLM" and "use LLM".
comment in response to post
Python? Not yet. We used to teach Racket, so I used Racket: gist.github.com/arjunguha/59...
comment in response to post
Great question: may not be possible based on what we know how to teach right now. :) Things I have done in intro:
- Explained language modeling over a small finite vocabulary (just the model, not the training algorithm)
- Presented SGD without derivatives (i.e., `(/ (- (f (+ x h)) (f x)) h)`)
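The derivative-free trick above is the forward-difference quotient. As a hedged sketch of how gradient descent can be demoed without calculus (the original classroom version was in Racket; this Python translation and its function names are my own illustration, not the course material):

```python
def slope(f, x, h=1e-6):
    # Forward difference: (f(x + h) - f(x)) / h,
    # the same formula as the Racket expression (/ (- (f (+ x h)) (f x)) h).
    return (f(x + h) - f(x)) / h

def descend(f, x0, lr=0.1, steps=100):
    # Repeatedly step against the estimated slope to minimize f.
    # No symbolic derivatives needed, which is the point of the demo.
    x = x0
    for _ in range(steps):
        x = x - lr * slope(f, x)
    return x

# Minimizing (x - 3)^2 from x = 0 lands very close to x = 3.
x_min = descend(lambda x: (x - 3) ** 2, x0=0.0)
```

Because the quadratic's slope estimate is cheap and well-behaved, the iterate contracts toward the minimum by a constant factor each step, which makes the convergence visible even in an intro course.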