federicovaggi.bsky.social
F_vaggi on Twitter. Senior staff scientist at Google X, previously Amazon.
122 posts
158 followers
167 following
Regular Contributor
Active Commenter
comment in response to
post
Can you imagine being the analyst who wrote the first three drafts of the report and kept getting them sent back, being told they needed to add at least 3 orders of magnitude and to keep tweaking the models until they got the right answer?
comment in response to
post
Going through the replies of this thread is a great way to identify exactly the people you don't want to interact with on social media.
comment in response to
post
You could also probably do this using Cuturi & Blondel's soft-DTW trick, right? I love that work on differentiable dynamic programming via smoothing the max. arxiv.org/abs/1802.03676
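For anyone reading along: the core of that trick is swapping the hard min in the DTW recursion for a log-sum-exp smoothed min. A minimal numpy sketch (my own illustration, assuming a precomputed pairwise cost matrix D; not the authors' code):

    import numpy as np

    def softmin(a, b, c, gamma):
        # Smoothed min via log-sum-exp; recovers the hard min as gamma -> 0,
        # but stays differentiable everywhere.
        vals = np.array([a, b, c]) / -gamma
        m = vals.max()
        return -gamma * (m + np.log(np.exp(vals - m).sum()))

    def soft_dtw(D, gamma=1.0):
        # D: (n, m) pairwise cost matrix between two sequences.
        n, m = D.shape
        R = np.full((n + 1, m + 1), np.inf)
        R[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                R[i, j] = D[i - 1, j - 1] + softmin(
                    R[i - 1, j], R[i, j - 1], R[i - 1, j - 1], gamma)
        return R[n, m]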
comment in response to
post
In a GP with an RBF kernel, is the bandwidth a parameter or a hyperparameter? :). It’s typically set by gradient descent, but the objective is the marginal likelihood, not the maximum likelihood!
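Concretely, scikit-learn does exactly this: GaussianProcessRegressor.fit tunes the RBF length scale (the "bandwidth") by maximizing the log marginal likelihood with L-BFGS. A quick sketch on toy data of my own:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    X = np.linspace(0, 10, 50)[:, None]
    y = np.sin(X).ravel() + 0.1 * rng.normal(size=50)

    # fit() maximizes the log marginal likelihood over the kernel
    # hyperparameters (here, the RBF length scale, starting from 1.0).
    # alpha adds the known noise variance to the diagonal.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                  alpha=0.1 ** 2).fit(X, y)
    print(gp.kernel_)                        # tuned length scale
    print(gp.log_marginal_likelihood_value_)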
comment in response to
post
Salammbô?
comment in response to
post
This is of course assuming the input datasets are small enough that optimizing the model is not actually a bottleneck - otherwise you have to use first-order optimizers (SGD, SAGA, etc.), which will be considerably more sensitive to numerical issues.
comment in response to
post
OLS is just the loss function, so it depends on how you actually optimize it: looking at R, it uses QR decomposition by default, which should be quite robust - in practice, I think SVD will be even more robust, since you can just drop the problematic singular values.
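A small numpy illustration of the SVD route (my own toy data, not R's internals): np.linalg.lstsq solves OLS via SVD and zeroes out singular values below rcond * s_max, which is exactly the "drop the problematic singular values" move:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X = np.column_stack([X, X[:, 0]])   # duplicated column -> rank-deficient design
    y = X @ np.array([1.0, 2.0, -1.0, 0.0]) + 0.1 * rng.normal(size=100)

    # SVD-based solve: singular values below rcond * s_max are treated as
    # zero, so the duplicated column doesn't blow up the solution.
    beta, _, rank, s = np.linalg.lstsq(X, y, rcond=1e-10)
    print(rank, s)   # rank 3 out of 4 columns; one singular value ~0 dropped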
comment in response to
post
The only thing is you might run into numerical issues, especially if you are doing logistic regression. Also, depending on which flavor of statistical testing you are doing after the model is fit, some tests adjust for the number of covariates.
comment in response to
post
For example - if you propose an improvement over the baseline transformer architecture - all reviewers at serious venues will ask you to show how your performance changes relative to the baseline, i.e., ablating your fancy modification.
comment in response to
post
These are commonly known as "ablations" - i.e., you want to see how performance improves if you add a given module to a neural network, so you compare the performance of a baseline model without the addition relative to the performance with the new module.
comment in response to
post
Sure, and discussions about IP etc. are necessary and wholly welcome, but instead I just see a sea of negativity that lacks any kind of nuance. Mind you, the other site is bad in all sorts of other ways; I just wish I could have the best of both worlds!
comment in response to
post
Bluesky is completely unusable for any discussion of gen AI
comment in response to
post
There was a quote I vaguely remember from a long time ago (I think from @wang.social but might have been Travis Oliphant) about how Scientific Python was built by PhD students slacking off on their thesis.
comment in response to
post
Not my field of econ, but: the size of these tariffs is enormous, and if they are held and imposed for years, I suspect this will make Brexit look like child's play.
comment in response to
post
I totally agree that the residual is huge though!
comment in response to
post
If there were a way to specify this in a way that was falsifiable (I don't think there is), I would be happy to have a friendly wager with a donation to GiveWell or what have you, though.
comment in response to
post
I think in the very short term, Dems will almost certainly get significant wins (thermostatic backlash vs. Trump + midterm elections). I don't see a way for Dems to win enough Senate seats in the short/medium term without significant moderation or greater tolerance for heterodoxy.
comment in response to
post
I mean, that depends on what you mean by *vibes*, right? Because in the current discourse, people sometimes invoke *vibes* to say that there's no need to moderate ideologically - and I think in the short term that's likely wrong; in the long term, I'm agnostic.
comment in response to
post
Maybe you can leverage the Athey/Imbens surrogate index work - but - it's genuinely a really hard problem, so people really should have a lot of epistemic humility on the topic... and yet, people have extremely strong convictions that are absolutely not justified by our current understanding.
comment in response to
post
The short-term causal effect of persuasion is already difficult to study (not my area, but Broockman did some work I thought was quite rigorous on canvassing) - but rigorous research on long-term persuasion is basically non-existent, so people mostly fall back on priors.
comment in response to
post
www.youtube.com/watch?v=VGKc... fits the mood
comment in response to
post
One thing you are in a unique position to write about is how to avoid sycophancy and not let it warp your thinking. The texts that were part of discovery in the Twitter v. Musk lawsuit were genuinely enlightening; his “peers” were all constantly sucking up to him.
comment in response to
post
Yeah, the political monoculture of Bluesky means that on any topic where views split along political lines, you basically only get the view of the 10% most left-leaning people in the US.
comment in response to
post
I think those examples are particularly challenging because you can usually predict with relatively high likelihood how people will respond, even though it’s ultimately their choice to respond in a particular way.
comment in response to
post
Interesting post. What do you think of the categories where action A can be predicted to cause agent B to act in a particular, undesirable way, even though agent B is a moral agent and could, theoretically, choose not to act that way?
comment in response to
post
Wonderful article, and what a great reason to share your pet pictures. Thank you!
comment in response to
post
USAID was the most moral part of the U.S. budget. We should significantly increase it, especially the public health interventions (some of the other initiatives were kinda silly, but even if you count those as waste, the program as a whole was incredibly effective).
comment in response to
post
You have to take Trump stupidly, but not literally
comment in response to
post
It’s not the primary factor, but it definitely did not help.
comment in response to
post
Another way to put it: what fraction of SAE vectors give rise to steerable outputs, relative to non-sparse ones?
comment in response to
post
What about the results that show that intervening on the activations leads to meaningfully different outputs? That’s pretty persuasive to me.
comment in response to
post
Gender polarization happening everywhere. Don’t love this trend at all.
comment in response to
post
It’s the fault of a junior employee who somehow snuck a PR altering something minor like the system prompt past the senior employees: x.com/ibab/status/...
comment in response to
post
Barring second order effects, the problem isn’t that the media isn’t conveying the seriousness of the current moment to its readers, it’s that it is not reaching a lot of people.
comment in response to
post
The kind of person that reads the news is not “typical”. Someone who regularly reads a mainstream newspaper is almost certainly a Dem voter.
comment in response to
post
Totally fine with an efficient approximate posterior (over the predictions, you can treat all parameters as nuisance parameters) - and - I have explored some ideas that seemingly work quite well in practice. I'm just curious if there's good theory on the topic or if this is a known solved problem!
comment in response to
post
The intuition is that the parts of theta that are hard to efficiently sample (because they are underconstrained by the data) won't affect predictions very much, as long as you are not trying to extrapolate too much (i.e., X_test somewhat close to X_train, for an appropriate notion of close).
comment in response to
post
This feels vaguely like PAC-Bayes. In practice, it feels that knowing what "X_test" is at inference time should give you the opportunity to do better sampling, rather than inferring theta in an agnostic way, and then pushing that forward through f.
comment in response to
post
On phone, so cleaned this up very slightly with Gemini:
Informally: are there instances where it's very difficult to get a posterior distribution over the parameters, but easy to get a posterior distribution over predictions? Trivial example: a linear model with perfectly collinear features.
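To make the trivial example concrete, a toy numpy sketch (my own illustration): with two perfectly collinear columns, any beta with the right sum fits equally well, so a flat-prior posterior over beta is improper along the null direction - while the posterior over predictions X @ beta is perfectly well behaved:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    X = np.column_stack([x, x])                 # perfectly collinear features
    y = 3.0 * x + 0.1 * rng.normal(size=50)

    beta_min_norm = np.linalg.pinv(X) @ y       # one solution, ~(1.5, 1.5)
    null_dir = np.array([1.0, -1.0])            # X @ null_dir == 0
    for t in (0.0, 10.0, -100.0):
        beta = beta_min_norm + t * null_dir
        # Wildly different parameter vectors, identical predictions:
        print(t, np.allclose(X @ beta, X @ beta_min_norm))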
comment in response to
post
Adaptive algorithms (I mean, broadly speaking, things in the family of Levenberg-Marquardt, NUTS, etc.) are really gnarly to implement efficiently on modern hardware, because if you try to parallelize them, you'll have different elements within a batch taking different numbers of adaptive steps.
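A toy numpy sketch of the usual masked-lane workaround (my own illustration, not any particular library): batched Newton iterations for sqrt, where every lane keeps "stepping" until the slowest one converges, so the whole batch pays for the worst case:

    import numpy as np

    def batched_newton_sqrt(a, tol=1e-12, max_iter=60):
        x = np.ones_like(a)
        for _ in range(max_iter):
            done = np.abs(x * x - a) < tol * np.maximum(a, 1.0)
            if done.all():            # the whole batch waits for the slowest lane
                break
            # Converged lanes are frozen via the mask rather than exiting early,
            # since SIMD/accelerator hardware can't retire lanes individually.
            x = np.where(done, x, 0.5 * (x + a / x))
        return x

    print(batched_newton_sqrt(np.array([1.0, 2.0, 1e6])))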