federicovaggi.bsky.social
F_vaggi on Twitter. Senior staff scientist at Google X, previously Amazon.
122 posts
158 followers
167 following
Regular Contributor
Active Commenter
comment in response to
post
Can you imagine being the analyst who wrote the first three drafts of the report and kept getting them sent back, being told they needed to add at least 3 orders of magnitude and to keep tweaking the models until they got the right answer?
comment in response to
post
Going through the replies of this thread is a great way to identify exactly the people you don't want to interact with on social media.
comment in response to
post
You could also probably do this using Cuturi & Blondel's soft-DTW trick, right? I love that work on differentiable dynamic programming via smoothing the max. arxiv.org/abs/1802.03676
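For anyone reading along: the core of that trick is swapping the hard min in the DTW recursion for a log-sum-exp smoothed min. A minimal numpy sketch (my own illustration, assuming a precomputed pairwise cost matrix D; not the authors' code):

    import numpy as np

    def softmin(a, b, c, gamma):
        # Smoothed min via log-sum-exp; recovers the hard min as gamma -> 0,
        # but stays differentiable everywhere.
        vals = np.array([a, b, c]) / -gamma
        m = vals.max()
        return -gamma * (m + np.log(np.exp(vals - m).sum()))

    def soft_dtw(D, gamma=1.0):
        # D: (n, m) pairwise cost matrix between two sequences.
        n, m = D.shape
        R = np.full((n + 1, m + 1), np.inf)
        R[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                R[i, j] = D[i - 1, j - 1] + softmin(
                    R[i - 1, j], R[i, j - 1], R[i - 1, j - 1], gamma)
        return R[n, m]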
comment in response to
post
In a GP with an RBF kernel, is the bandwidth a parameter or a hyperparameter? :). It’s typically set by gradient descent, but the objective is the marginal likelihood, not the maximum likelihood!
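Concretely, scikit-learn does exactly this: GaussianProcessRegressor.fit tunes the RBF length scale (the "bandwidth") by maximizing the log marginal likelihood with L-BFGS. A quick sketch on toy data of my own:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    X = np.linspace(0, 10, 50)[:, None]
    y = np.sin(X).ravel() + 0.1 * rng.normal(size=50)

    # fit() maximizes the log marginal likelihood over the kernel
    # hyperparameters (here, the RBF length scale, starting from 1.0).
    # alpha adds the known noise variance to the diagonal.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                  alpha=0.1 ** 2).fit(X, y)
    print(gp.kernel_)                        # tuned length scale
    print(gp.log_marginal_likelihood_value_)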
comment in response to
post
Salammbô?
comment in response to
post
This is of course assuming the input datasets are small enough that optimizing the model is not actually a bottleneck - otherwise you have to use first-order optimizers (SGD, SAGA, etc.), which will be considerably more sensitive to numerical issues.
comment in response to
post
OLS is just the loss function, so it depends on how you actually optimize it: looking at R, it uses QR decomposition by default, which should be quite robust - in practice, I think SVD will be even more robust, since you can just drop the problematic singular values.
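A small numpy illustration of the SVD route (my own toy data, not R's internals): np.linalg.lstsq solves OLS via SVD and zeroes out singular values below rcond * s_max, which is exactly the "drop the problematic singular values" move:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X = np.column_stack([X, X[:, 0]])   # duplicated column -> rank-deficient design
    y = X @ np.array([1.0, 2.0, -1.0, 0.0]) + 0.1 * rng.normal(size=100)

    # SVD-based solve: singular values below rcond * s_max are treated as
    # zero, so the duplicated column doesn't blow up the solution.
    beta, _, rank, s = np.linalg.lstsq(X, y, rcond=1e-10)
    print(rank, s)   # rank 3 out of 4 columns; one singular value ~0 dropped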
comment in response to
post
The only thing is you might run into numerical issues, especially if you are doing logistic regression. Also, depending on which flavor of statistical testing you are doing after the model is fit, some tests adjust for the number of covariates.
comment in response to
post
For example - if you propose an improvement over the baseline transformer architecture - all reviewers at serious venues will ask you to show how your performance changes relative to the baseline, i.e., ablating your fancy modification.
comment in response to
post
These are commonly known as "ablations" - i.e., you want to see how performance improves if you add a given module to a neural network, so you compare the performance of a baseline model without the addition relative to the performance with the new module.
comment in response to
post
Sure, and discussions about IP etc. are necessary and wholly welcome, but instead I just see a sea of negativity that lacks any kind of nuance. Mind you, the other site is bad in all sorts of other ways; I just wish I could have the best of both worlds!
comment in response to
post
Bluesky is completely unusable for any discussion of gen AI
comment in response to
post
There was a quote I vaguely remember from a long time ago (I think from @wang.social but might have been Travis Oliphant) about how Scientific Python was built by PhD students slacking off on their thesis.
comment in response to
post
Not my field of econ, but: the size of these tariffs is enormous, and if they are held and imposed for years, I suspect this will make Brexit look like child's play.
comment in response to
post
I totally agree that the residual is huge though!
comment in response to
post
If there were a way to specify this in a way that was falsifiable (I don't think there is), I would be happy to have a friendly wager with a donation to GiveWell or what have you, though.
comment in response to
post
I think in the very short term, Dems will almost certainly get significant wins (thermostatic backlash vs. Trump + midterm elections). I don't see a way for Dems to win enough Senate seats in the short/medium term without significant moderation or greater tolerance for heterodoxy.
comment in response to
post
I mean, that depends on what you mean by *vibes*, right? Because in the current discourse, people sometimes invoke *vibes* to say that there's no need to moderate ideologically - and I think in the short term that's likely wrong; in the long term, I'm agnostic.
comment in response to
post
Maybe you can leverage the Athey/Imbens surrogate index work - but - it's genuinely a really hard problem, so people really should have a lot of epistemic humility on the topic... and yet, people have extremely strong convictions that are absolutely not justified by our current understanding.
comment in response to
post
The short-term causal effect of persuasion is already difficult to study (not my area, but Broockman did some work I thought was quite rigorous on canvassing) - but rigorous research on long-term persuasion is basically non-existent, so people mostly fall back on priors.
comment in response to
post
www.youtube.com/watch?v=VGKc... fits the mood
comment in response to
post
One thing you are in a unique position to write about is how to avoid sycophancy and not let it warp your thinking. The texts that were part of discovery in the Twitter v. Musk lawsuit were genuinely enlightening; his “peers” were all constantly sucking up to him.
comment in response to
post
Yeah, the political monoculture of Bluesky means that on any topic where views split along political lines, you basically only get the view of the 10% most left-leaning people in the US.
comment in response to
post
I think those examples are particularly challenging because you can usually predict with relatively high likelihood how people will respond, even though it’s ultimately their choice to respond in a particular way.
comment in response to
post
Interesting post. What do you think of the categories where action A can be predicted to cause agent B to act in a particular, undesirable way, even though agent B is a moral agent and could, theoretically, choose not to act that way?
comment in response to
post
Wonderful article, and what a great reason to share your pet pictures. Thank you!
comment in response to
post
USAID was the most moral part of the U.S. budget. We should significantly increase it, especially the public health interventions (some of the other initiatives were kinda silly, but even if you count those as waste, the program as a whole was incredibly effective).
comment in response to
post
You have to take Trump stupidly, but not literally
comment in response to
post
It’s not the primary factor, but it definitely did not help.
comment in response to
post
Another way to put it: what fraction of SAE vectors give rise to steerable outputs, relative to non-sparse ones?
comment in response to
post
What about the results that show that intervening on the activations leads to meaningfully different outputs? That’s pretty persuasive to me.
comment in response to
post
Gender polarization happening everywhere. Don’t love this trend at all.
comment in response to
post
It’s the fault of a junior employee who somehow snuck a PR altering something minor like the system prompt past the senior employees: x.com/ibab/status/...
comment in response to
post
Barring second order effects, the problem isn’t that the media isn’t conveying the seriousness of the current moment to its readers, it’s that it is not reaching a lot of people.
comment in response to
post
The kind of person that reads the news is not “typical”. Someone who regularly reads a mainstream newspaper is almost certainly a Dem voter.
comment in response to
post
Totally fine with an efficient approximate posterior (over the predictions, you can treat all parameters as nuisance parameters) - and - I have explored some ideas that seemingly work quite well in practice. I'm just curious if there's good theory on the topic or if this is a known solved problem!
comment in response to
post
The intuition is that the parts of theta that are hard to efficiently sample (because they are underconstrained by the data) won't affect predictions very much, as long as you are not trying to extrapolate too much (i.e., X_test somewhat close to X_train, for an appropriate notion of close).
comment in response to
post
This feels vaguely like PAC-Bayes. In practice, it feels that knowing what "X_test" is at inference time should give you the opportunity to do better sampling, rather than inferring theta in an agnostic way, and then pushing that forward through f.
comment in response to
post
On phone, so cleaned this up very slightly with Gemini:
Informally: are there instances where it's very difficult to get a posterior distribution over the parameters, but easy to get a posterior distribution over predictions? Trivial example: a linear model with perfectly collinear features.
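To make the trivial example concrete, a toy numpy sketch (my own illustration): with two perfectly collinear columns, any beta with the right sum fits equally well, so a flat-prior posterior over beta is improper along the null direction - while the posterior over predictions X @ beta is perfectly well behaved:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    X = np.column_stack([x, x])                 # perfectly collinear features
    y = 3.0 * x + 0.1 * rng.normal(size=50)

    beta_min_norm = np.linalg.pinv(X) @ y       # one solution, ~(1.5, 1.5)
    null_dir = np.array([1.0, -1.0])            # X @ null_dir == 0
    for t in (0.0, 10.0, -100.0):
        beta = beta_min_norm + t * null_dir
        # Wildly different parameter vectors, identical predictions:
        print(t, np.allclose(X @ beta, X @ beta_min_norm))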
comment in response to
post
Adaptive algorithms (I mean, broadly speaking, things in the family of Levenberg-Marquardt, NUTS, etc.) are really gnarly to implement efficiently on modern hardware, because if you try to parallelize them, you'll have different elements within a batch taking different numbers of adaptive steps.
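A toy numpy sketch of the usual masked-lane workaround (my own illustration, not any particular library): batched Newton iterations for sqrt, where every lane keeps "stepping" until the slowest one converges, so the whole batch pays for the worst case:

    import numpy as np

    def batched_newton_sqrt(a, tol=1e-12, max_iter=60):
        x = np.ones_like(a)
        for _ in range(max_iter):
            done = np.abs(x * x - a) < tol * np.maximum(a, 1.0)
            if done.all():            # the whole batch waits for the slowest lane
                break
            # Converged lanes are frozen via the mask rather than exiting early,
            # since SIMD/accelerator hardware can't retire lanes individually.
            x = np.where(done, x, 0.5 * (x + a / x))
        return x

    print(batched_newton_sqrt(np.array([1.0, 2.0, 1e6])))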