catforgetter.bsky.social
cellular, modular, interactivodular
62 posts
80 followers
527 following
Regular Contributor
Active Commenter
comment in response to
post
not a website, so it's of dubious linkability, but the Bluesky / atproto paper has the most concise introduction I've found in sections 3.1-3.2: arxiv.org/abs/2402.03239
comment in response to
post
is there any available information on what model you're using for topics? also, what does "ntpc" mean? I'm (I think?) working on something similar for a feed generator and am curious
comment in response to
post
but see also: andymatuschak.org/primer/
comment in response to
post
agree with this, but unfortunately I think bluesky is a bad example right now, as I'm not sure the algorithm they use for the Discover tab is open source (if it is, I haven't been able to find it)
comment in response to
post
honestly i'm not even saying they're wrong, just wondering which psychopath over there is sending out rejection emails on christmas eve
comment in response to
post
i'll admit to being a wallcell and surprised by this, but the x-axis is still log scale, and spending thousands of dollars per task for something that underperforms STEM grads on ARC-AGI means this is not there yet. regardless, definitely time for me to re-evaluate some things
comment in response to
post
Forgiveness meditation. I sobbed the first two times, third time there was less of an effect so I stopped doing it. probably the biggest happiness ROI of anything I've done www.youtube.com/watch?v=nz0a...
comment in response to
post
my motto? Become Unemployable
comment in response to
post
Looking more at the speech vs. reach docs here:
docs.bsky.app/docs/advance...
they talk about "reach" being determined by the indexing service, which I understand is the AppView. I'm still confused; probably going to have to read the atproto specs
comment in response to
post
I was more referring to evading the account ban rather than trying to scrape data. if bluesky moderation permanently suspends my account, is that just happening at the PDS level? so if I migrate to my own PDS, then would people be able to read my posts? or is the ban happening at the relay level?
comment in response to
post
Okay, so in theory it's designed to be decentralizable:
docs.bsky.app/blog/bluesky...
But I'm still confused about where exactly the permaban occurred. is it at the account host level? in the AppView? both? can you evade the ban by running your own PDS, or do you need a different AppView altogether?
comment in response to
post
I do wish I didn't have to convert to mp4 to post it here, though
comment in response to
post
full stack engineer implies full heap engineer
comment in response to
post
still, if you try to implement the pseudocode in the paper, it won't work. that first discrepancy is a pretty big problem in this case. I dream of a world where prose descriptions of algorithms are automatically generated from source code, though it's unclear to me how feasible that is at present
comment in response to
post
for the latest paper, it's a bunch of relatively minor stuff, like "actually the activation function is not applied to the input layer", "actually it's not 2 hidden layers of size 64, but one hidden layer of size 128", and "actually it's not Gaussian Xavier weight initialization, but uniform Kaiming"
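to make that concrete, here's roughly what the corrected setup looks like (a minimal PyTorch sketch; the 784 -> 10 MNIST-ish dims are my assumption, the rest follows the corrections above):

```python
import torch
import torch.nn as nn

class CorrectedMLP(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=128, out_dim=10):
        super().__init__()
        # one hidden layer of size 128, not two of size 64
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        for fc in (self.fc1, self.fc2):
            # uniform Kaiming init, not gaussian Xavier
            nn.init.kaiming_uniform_(fc.weight)
            nn.init.zeros_(fc.bias)

    def forward(self, x):
        # no activation applied to the input layer, only after the hidden layer
        return self.fc2(torch.relu(self.fc1(x)))
```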
comment in response to
post
big mood
comment in response to
post
it's horrifying. some real galaxy brain stuff going on here.
comment in response to
post
I haven't seen this variant of Q-learning before, where you only do minibatch updates at the end of an episode, and not during. that seems like a nifty way to stabilize learning without using a target network.
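something like this, as I understand it (a rough sketch, not the paper's code; the env API, epsilon-greedy choice, and hyperparameters are my assumptions):

```python
import random
import torch
import torch.nn.functional as F

def run_episode(env, q_net, optimizer, gamma=0.99, eps=0.1,
                batch_size=32, n_updates=10):
    """Collect transitions while acting, then do minibatch Q-updates
    only after the episode ends -- and bootstrap from the same network,
    i.e. no target network."""
    transitions = []
    obs, done = env.reset(), False
    while not done:
        if random.random() < eps:  # epsilon-greedy action selection
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()
        next_obs, reward, done = env.step(action)  # assumed 3-tuple env API
        transitions.append((obs, action, reward, next_obs, float(done)))
        obs = next_obs  # note: no learning step inside this loop

    for _ in range(n_updates):  # all updates happen here, at episode end
        batch = random.sample(transitions, min(batch_size, len(transitions)))
        s, a, r, s2, d = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
        q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():  # bootstrap target from the same q_net
            target = r + gamma * (1 - d) * q_net(s2).max(1).values
        loss = F.mse_loss(q, target)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
```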
comment in response to
post
MNIST-1d is a smaller dataset (4000 training examples, i.e. 400 per class), but that doesn't explain the performance, since NGC can still attain ~87% accuracy on MNIST restricted to 300 training samples per class. obviously NGC lacks inductive biases for translation equivariance, but so do DT / KNN
comment in response to
post
The biggest surprise so far is how bad NGC is at mnist1d (github.com/greydanus/mn...)
By comparison: for fairly untuned (DT, KNN, MLP) impls, I'm getting (87%, 96.5%, 97%) on MNIST, and (55%, 56%, and 61%) on MNIST-1d
For NGC? ~89% / ~20%. it's not just a little bad, it's catastrophically bad
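for reference, the baselines are roughly this (untuned sklearn classifiers; make_dataset / get_dataset_args is my reading of the mnist1d package's API, so treat it as a sketch):

```python
from mnist1d.data import make_dataset, get_dataset_args
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# default MNIST-1d dataset: 4000 train / 1000 test examples
data = make_dataset(get_dataset_args())
x, y, x_test, y_test = data['x'], data['y'], data['x_test'], data['y_test']

for name, clf in [('DT', DecisionTreeClassifier()),
                  ('KNN', KNeighborsClassifier()),
                  ('MLP', MLPClassifier(max_iter=500))]:
    clf.fit(x, y)
    print(name, clf.score(x_test, y_test))  # accuracy on the test split
```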
comment in response to
post
I have both unsupervised and supervised learning implemented for NGC: github.com/emptydiagram...
MNIST accuracy is mediocre, topping out at around 89% (validation); can probably improve a bit with better settling init. This also matches what I'm seeing with the official ngc-learn implementation
comment in response to
post
someday we're going to figure out how to build capable AIs without repeatedly slamming a massive, static dataset into a transformer, but today is not that day
comment in response to
post
i should be catching up on my calculus of variations coursework, but instead I'm trying to implement the base NGC model on MNIST. something is still wrong with my ANGC implementation, so I'm making sure I understand the simpler version first. also, first things first: a connectivity diagram