catforgetter.bsky.social
cellular, modular, interactivodular
62 posts
80 followers
527 following
Regular Contributor
Active Commenter
comment in response to
post
not a website, so it's of dubious linkability, but the Bluesky / atproto paper has the most concise introduction I've found in sections 3.1-3.2: arxiv.org/abs/2402.03239
comment in response to
post
is there any available information on what model you're using for topics? also, what does "ntpc" mean? I'm (I think?) working on something similar for a feed generator and am curious
comment in response to
post
but see also: andymatuschak.org/primer/
comment in response to
post
agree with this, but unfortunately I think bluesky is a bad example right now, as I'm not sure the algorithm they use for the Discover tab is open source (if it is, I haven't been able to find it)
comment in response to
post
honestly i'm not even saying they're wrong, just wondering which psychopath over there is sending out rejection emails on christmas eve
comment in response to
post
i'll admit to being a wallcell and surprised by this, but the x-axis is still log scale, and spending thousands of dollars per task for something that underperforms STEM grads on ARC-AGI means this is not there yet. regardless, definitely time for me to re-evaluate some things
comment in response to
post
Forgiveness meditation. I sobbed the first two times, third time there was less of an effect so I stopped doing it. probably the biggest happiness ROI of anything I've done www.youtube.com/watch?v=nz0a...
comment in response to
post
my motto? Become Unemployable
comment in response to
post
Looking more at the speech vs. reach docs here:
docs.bsky.app/docs/advance...
they talk about "reach" being determined by the indexing service, which I understand is the AppView. I'm still confused; probably going to have to read the atproto specs
comment in response to
post
I was more referring to evading the account ban rather than trying to scrape data. if bluesky moderation permanently suspends my account, is that just happening at the PDS level? so if I migrate to my own PDS, then would people be able to read my posts? or is the ban happening at the relay level?
comment in response to
post
Okay, so in theory it's designed to be decentralizable:
docs.bsky.app/blog/bluesky...
But I'm still confused about where exactly the permaban occurred. is it at the account host level? in the AppView? both? can you evade the ban by running your own PDS, or do you need a different AppView altogether?
comment in response to
post
I do wish I didn't have to convert to mp4 to post it here, though
comment in response to
post
full stack engineer implies full heap engineer
comment in response to
post
still, if you try to implement the pseudocode in the paper, it won't work. that first discrepancy is a pretty big problem in this case. I dream of a world where prose descriptions of algorithms are automatically generated from source code, though it's unclear to me how feasible that is at present
comment in response to
post
for the latest paper, it's a bunch of relatively minor stuff, like "actually the activation function is not applied to the input layer", "actually it's not 2 hidden layers of size 64, but one hidden layer of size 128", and "actually it's not Gaussian Xavier weight initialization, but uniform Kaiming"
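to make that concrete, here's roughly what the corrected setup looks like (a minimal PyTorch sketch; the 784 -> 10 MNIST-ish dims are my assumption, the rest follows the corrections above):

```python
import torch
import torch.nn as nn

class CorrectedMLP(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=128, out_dim=10):
        super().__init__()
        # one hidden layer of size 128, not two of size 64
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        for fc in (self.fc1, self.fc2):
            # uniform Kaiming init, not gaussian Xavier
            nn.init.kaiming_uniform_(fc.weight)
            nn.init.zeros_(fc.bias)

    def forward(self, x):
        # no activation applied to the input layer, only after the hidden layer
        return self.fc2(torch.relu(self.fc1(x)))
```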
comment in response to
post
big mood
comment in response to
post
it's horrifying. some real galaxy brain stuff going on here.
comment in response to
post
I haven't seen this variant of Q-learning before, where you only do minibatch updates at the end of an episode, and not during. that seems like a nifty way to stabilize learning without using a target network.
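something like this, as I understand it (a rough sketch, not the paper's code; the env API, epsilon-greedy choice, and hyperparameters are my assumptions):

```python
import random
import torch
import torch.nn.functional as F

def run_episode(env, q_net, optimizer, gamma=0.99, eps=0.1,
                batch_size=32, n_updates=10):
    """Collect transitions while acting, then do minibatch Q-updates
    only after the episode ends -- and bootstrap from the same network,
    i.e. no target network."""
    transitions = []
    obs, done = env.reset(), False
    while not done:
        if random.random() < eps:  # epsilon-greedy action selection
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()
        next_obs, reward, done = env.step(action)  # assumed 3-tuple env API
        transitions.append((obs, action, reward, next_obs, float(done)))
        obs = next_obs  # note: no learning step inside this loop

    for _ in range(n_updates):  # all updates happen here, at episode end
        batch = random.sample(transitions, min(batch_size, len(transitions)))
        s, a, r, s2, d = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
        q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():  # bootstrap target from the same q_net
            target = r + gamma * (1 - d) * q_net(s2).max(1).values
        loss = F.mse_loss(q, target)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
```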
comment in response to
post
MNIST-1d is a smaller dataset (4000 training examples, i.e. 400 per class), but that doesn't explain the performance, since NGC can still attain ~87% accuracy on MNIST restricted to 300 training samples per class. obviously NGC lacks inductive biases for translation equivariance, but so do DT / KNN
comment in response to
post
The biggest surprise so far is how bad NGC is at mnist1d (github.com/greydanus/mn...)
By comparison: for fairly untuned (DT, KNN, MLP) impls, I'm getting (87%, 96.5%, 97%) on MNIST, and (55%, 56%, and 61%) on MNIST-1d
For NGC? ~89% / ~20%. it's not just a little bad, it's catastrophically bad
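for reference, the baselines are roughly this (untuned sklearn classifiers; make_dataset / get_dataset_args is my reading of the mnist1d package's API, so treat it as a sketch):

```python
from mnist1d.data import make_dataset, get_dataset_args
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# default MNIST-1d dataset: 4000 train / 1000 test examples
data = make_dataset(get_dataset_args())
x, y, x_test, y_test = data['x'], data['y'], data['x_test'], data['y_test']

for name, clf in [('DT', DecisionTreeClassifier()),
                  ('KNN', KNeighborsClassifier()),
                  ('MLP', MLPClassifier(max_iter=500))]:
    clf.fit(x, y)
    print(name, clf.score(x_test, y_test))  # accuracy on the test split
```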
comment in response to
post
I have both unsupervised and supervised learning implemented for NGC: github.com/emptydiagram...
MNIST accuracy is mediocre, topping out at around 89% (validation); can probably improve a bit with better settling init. This also matches what I'm seeing with the official ngc-learn implementation
comment in response to
post
someday we're going to figure out how to build capable AIs without repeatedly slamming a massive, static dataset into a transformer, but today is not that day
comment in response to
post
i should be catching up on my calculus of variations coursework, but instead I'm trying to implement the base NGC model on MNIST. something is still wrong with my ANGC implementation, so I'm making sure I understand the simpler version first. also, first things first: a connectivity diagram