giffmana.ai
Researcher (OpenAI. Ex: DeepMind, Brain, RWTH Aachen), Gamer, Hacker, Belgian. Anon feedback: https://admonymous.co/giffmana šŸ“ Zürich, Switzerland šŸ”— http://lucasb.eyer.be
301 posts 5,783 followers 200 following
Regular Contributor
Active Commenter
comment in response to post
Index the codebase first. Then ask o1 in "codebase chat" the same questions you would like to ask the author/owner of the codebase. Mostly useful when digging into unknown/new codebases and trying to understand them. Or asking about possible bugs :)
comment in response to post
yes, I noticed that over Christmas break and ever since, it's just... a lot more boring here. My first reason to open social media is to be entertained, the second is to entertain, and maybe a distant third is to learn something new. Not much of any of these here. I know, I know, be the change and all that.
comment in response to post
It reeeeally depends on what loss1 and loss2 are, both regarding what's standard and what's wasted. I honestly think you are confused: the three code snippets in your three different posts mean three different things. I don't mean it in a negative way, but clearing it up would take more time than I want :/
comment in response to post
Yeah, it seems either he made a mistake in the OP, or the subject of the discussion has drifted :)
comment in response to post
Until we get good enough AI-supported search, no, you can't realistically expect them to find anything and everything from the past 30 years when the vocab and everything changes.
comment in response to post
Well, no, two things: 1. In the OP indeed *both* formulations waste compute, so yeah :) 2. In the 2nd post, you are not doing the same thing as in your OP! In your 2nd, you are doing good old micro-batching, for which the second way is indeed the standard one. So what you say keeps changing O.o
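For the record, a minimal sketch of what I mean by good old micro-batching, assuming PyTorch (the model and data are made-up stand-ins, not the code from the thread): calling backward per micro-batch accumulates into .grad, so you get the full-batch gradient without ever holding the whole batch's activations at once.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # made-up stand-in for any model
micro_batches = [torch.randn(8, 16) for _ in range(4)]

model.zero_grad()
for xb in micro_batches:
    # Scale each loss so the accumulated gradient matches the full-batch mean.
    loss = model(xb).pow(2).mean() / len(micro_batches)
    loss.backward()  # gradients accumulate into .grad across micro-batches
# a single optimizer step here would apply the accumulated gradient
```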
comment in response to post
I would enjoy that meeting ;)
comment in response to post
Yeah, it's silly to expect the new generation to know everything the old generation did; expecting that shows a complete lack of empathy.
comment in response to post
If the two graphs are completely disjoint, then there is no point in this. If they have some commonality (like the model), then this does the common part twice.
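A minimal PyTorch sketch of that point (the linear model and the two losses are made-up stand-ins, not the code from the thread):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # stand-in for whatever subgraph the two losses share
x = torch.randn(8, 16)

# Wasteful: each loss re-runs the shared forward pass, so the common
# part of the graph is computed (and its activations stored) twice.
model(x).pow(2).mean().backward()
model(x).abs().mean().backward()

model.zero_grad()

# Standard: run the shared part once, sum the losses, backward once.
out = model(x)
(out.pow(2).mean() + out.abs().mean()).backward()
```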
comment in response to post
I'm somewhat confident both of these are sins lol, the second one wastes a ton of compute!
comment in response to post
Not quite, because this one is not stacked, so I give it a better chance to scale:
comment in response to post
That’s what the globe was for!
comment in response to post
One of the Physics of LLMs papers studied that and found you need a certain amount of repetitions of a factoid before it's memorized. The repetition can be either multiple epochs or just the same fact in another document. The number of needed repeats is also related to model size.
comment in response to post
OK OK I’ll admit it, I’m feeding off your fomo! There can’t be enough fomo! Mmmm fomo!
comment in response to post
Yeah, they compress videos to shit here, and are considering making good-quality videos a paid feature.
comment in response to post
No talk but two posters; I'm just a middle author but will try to be there (LocCa and NoFilter). That being said, my main occupation here will be meeting many of my new colleagues.
comment in response to post
lol exactly. I said CDG whenever possible. That being said, our flight is operated by Air Canada, including a layover in Canada, which is a lot worse.
comment in response to post
Oops! Thanks
comment in response to post
This afternoon flight? I'm taking that too.
comment in response to post
Aye, finally I can without it being weird lol
comment in response to post
Yep, and related to this: bsky.app/profile/giff...
comment in response to post
I'm confused, why not nbviewer.com, which has existed and been working for a decade?
comment in response to post
huh indeed got lost. nvm then, deleting.
comment in response to post
All the connections you 3 mention aren't wrong, but the OP figure is about origin/influence, and I'm telling you nonlocal had epsilon influence on ViT invention or development. The arrow should come from BERT, but that's not on the pic, so the next best arrow source is Transformer.
comment in response to post
Yes. ViT is not inspired by the nonlocal paper at all, and I may even have been the only person (or close to that) in the project who knew the nonlocal paper. I'm not saying this as a hater; I liked nonlocal:
comment in response to post
Yeah but the arrow from non-local to ViT is incorrect.
comment in response to post
I agree 😬
comment in response to post
Bought it, will frame or something 😁
comment in response to post
*As mentioned: Andreas, @asusanopinto.bsky.social and @mtschannen.bsky.social put in a heroic effort here and deserve all the credit for these models and the interesting report. I merely supported, advised, and handed things over. Keep a close eye on these guys!
comment in response to post
Thanks :)
comment in response to post
but anyway*
comment in response to post
Yes, I was a bit puzzled the first time I saw that… but at least they're consistent and really translate EVERYTHING lol
comment in response to post
Great, thanks FranƧois! But… is Ā« cadre d'apprentissage Ā» "learning framework" or what? O.o
comment in response to post
Thanks!
comment in response to post
Most CLI tools you can keep. Install Homebrew.
comment in response to post
Thanks :)
comment in response to post
Thanks Tobias! I owe you and Marco a lot :)
comment in response to post
Waiting for the core first authors thread :)