ryxcommar.bsky.social
Senior document signer @ Docusign, prev: lead risk manager @ Alameda Research
730 posts 6,711 followers 432 following
Regular Contributor
Active Commenter
comment in response to post
don't actually do this, you don't want to get charged for anything under the CFAA
comment in response to post
Nikola Tesla was a small inventor. Today, Tesla is one of the biggest companies in America by market cap. If you have good ideas, the market will reward you and you can become a large company.
comment in response to post
Siggi's goes insanely hard
comment in response to post
I'm on the low end, funny enough, but it's because I don't post engagement-baity things and I sometimes lock my account. It covers 2-week periods by default, but if you don't hit a certain engagement threshold it covers a month (which it did for me in this case)
comment in response to post
you got $112? wow I'm POOR
comment in response to post
why doesn't the gif play. it's this one. fuck you bsky.
comment in response to post
boo hiss this man has a blue check on the other site, read the room jackass!
comment in response to post
credit card interest rate cap would be good in part *because* it would get some people's credit cards taken away
comment in response to post
I don't think much at all tbh, I'm a blank slate right now.
comment in response to post
Hmmm that makes sense in an inchoate way but I need to wrestle with this more before I feel it in my bones. I appreciate your insights!
comment in response to post
fwiw I did already PCA the vectors to orthogonalize them instead of relying on the raw embedding vectors :) since the documents I am using are skewed toward a broad but particular type of textual information.
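(Roughly what I mean by "PCA the vectors", as a numpy/sklearn sketch -- the matrix X and the component count are made up here; whiten=True is what actually leaves the components uncorrelated and unit-variance:)

import numpy as np
from sklearn.decomposition import PCA

# X: (n_documents, 1024) matrix of raw embedding vectors (fake data for illustration)
X = np.random.default_rng(0).normal(size=(10_000, 1024))

# whiten=True rotates onto the principal axes and rescales each component to
# unit variance, so the transformed components come out uncorrelated
pca = PCA(n_components=256, whiten=True)
X_pca = pca.fit_transform(X)   # (n_documents, 256) decorrelated components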
comment in response to post
*perfectly correlated elements, not vectors
comment in response to post
I am out of my depth here though. Never thought about this stuff much until recently when it started mattering a lot for my job all of a sudden.
comment in response to post
because I do believe that Mahalanobis distance should be isomorphic to something like a 'weighted' L2 distance for what I am doing if the documents are all orthogonal. If they're not I think it's like, uhh, if there are 2 perfectly correlated vectors it splits the difference? Idk if I want that.
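(The diagonal case I'm picturing, as a tiny numpy/scipy check -- if the covariance matrix is diagonal, Mahalanobis distance is exactly an L2 distance with each element weighted by 1/variance; everything below is toy data:)

import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
x, y = rng.normal(size=8), rng.normal(size=8)
variances = rng.uniform(0.5, 2.0, size=8)

cov = np.diag(variances)                    # diagonal covariance: uncorrelated elements
d_mahal = mahalanobis(x, y, np.linalg.inv(cov))
d_weighted_l2 = np.sqrt(np.sum((x - y) ** 2 / variances))

assert np.isclose(d_mahal, d_weighted_l2)   # identical when the covariance is diagonal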
comment in response to post
thank you!
comment in response to post
The thing I may actually want is more like Mahalanobis distance? I don't know. Since the document elements are not orthogonal the covariance matrix is not a diagonal matrix, but I don't quite know how to think about how that impacts the results.
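(What I'd actually compute for the non-diagonal case, sketched in numpy -- covariance estimated from the document sample, pseudo-inverse because a sample covariance of embeddings can easily be singular; the function name and shapes are made up:)

import numpy as np

def mahalanobis_to_centroid(sample, query):
    # sample: (n, d) embeddings of the known documents; query: (d,) vector
    centroid = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)    # (d, d); off-diagonal terms capture correlated elements
    cov_inv = np.linalg.pinv(cov)         # pseudo-inverse in case cov is singular
    diff = query - centroid
    return float(np.sqrt(diff @ cov_inv @ diff))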
comment in response to post
And then I would use that to search for all the other fish documents that were not in the original sample of fish documents.
comment in response to post
So if I have a bunch of documents about everything and I am searching for documents about fish, I would take the average of all the fish documents and the fish index inside the embedding vector would have low variance. (Is this still a bad way to think about this?)
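(The whole idea as a numpy sketch -- embeddings, fish_idx, and the sizes are all made up; score every document by distance to the fish centroid, weighting each element by 1/variance so the dimensions the fish sample agrees on count the most:)

import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20_000, 1024))            # all documents (toy data)
fish_idx = rng.choice(20_000, size=500, replace=False)  # the known fish documents

fish = embeddings[fish_idx]
centroid = fish.mean(axis=0)   # the "average fish document"
variances = fish.var(axis=0)   # low where the fish documents agree

# inverse-variance-weighted squared distance to the centroid; lowest = most fish-like
scores = ((embeddings - centroid) ** 2 / variances).sum(axis=1)
candidates = np.argsort(scores)[:1_000]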
comment in response to post
The vector space is! But not necessarily the sample of records.
comment in response to post
Maybe I'm giving xAI's corporate structure and engineers too much credit but I feel like the people knowledgeable enough to fiddle around with that stuff are not the same people Elon Musk calls up to turn the racism dial.
comment in response to post
Pgvector is great but then I would need to set up a job to move this data to Postgres and then back to Snowflake. That's also annoying.
comment in response to post
Well, O(NM), where M is the number of groups and N is the number of vectors to search. The time it takes to run is the real annoyance of it all; even searching something like 1,000 * 10M vectors is just an insane amount of time.
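(For scale, the unindexed brute-force version is basically one giant matrix product -- toy sizes below, but the work grows as N * M * d, which is why ~10M vectors x ~1,000 groups takes forever without an ANN index:)

import numpy as np

rng = np.random.default_rng(0)
d = 1024
N, M = 10_000, 100   # toy; the real case is more like 10,000,000 x 1,000

vectors = rng.normal(size=(N, d))
centroids = rng.normal(size=(M, d))

# normalize rows so a plain dot product is cosine similarity
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

sims = vectors @ centroids.T   # (N, M): every vector against every group, O(N*M*d) work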
comment in response to post
Not really. I do everything in pure SQL, and building the UDF is annoying but not impossible; it'd also be annoying (possibly more so) in Pandas. And it's still unindexed, so O(N^2) regardless of Snowpark or SQL.
comment in response to post
Well, I mean I already did the average one. So net 1 and 1. And it's very stupid -- UDAFs in Snowflake do not accept vectors as types, so you need to cast the input with ::array and the output with ::vector(float, 1024).
comment in response to post
One to implement weighted cosine similarity, another 2 to implement both average and variance of elements in a vector.
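(What those three would compute, roughly, in numpy terms -- not the actual Snowflake UDF/UDAF code, and "weighted cosine similarity" here means the usual per-element-weight definition:)

import numpy as np

def weighted_cosine_similarity(a, b, w):
    # UDF: cosine similarity with a per-element weight vector w
    num = np.sum(w * a * b)
    return float(num / (np.sqrt(np.sum(w * a * a)) * np.sqrt(np.sum(w * b * b))))

def vector_avg(vectors):
    # UDAF 1: element-wise average over a group of vectors
    return np.asarray(vectors).mean(axis=0)

def vector_var(vectors):
    # UDAF 2: element-wise variance over a group of vectors
    return np.asarray(vectors).var(axis=0)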
comment in response to post
The issue is I am doing this in fucking Snowflake, which has a vector implementation so immature (no HNSW indices!!!!!) that I'm shocked they even released it to the world, so I'd need to implement 1 UDF and 2 UDAFs. Sigh.
comment in response to post
sum(bool) over (partition by id order by date) as fake_group
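-- cumulative sum of the flag per id (ordered by date): it only increments on flagged rows, so consecutive rows between flags share a group number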
comment in response to post
there are more people out there like me and i must find them
comment in response to post
you weren't kidding
comment in response to post
No I feel like I'm maybe 1/3 of the way through
comment in response to post
I sort of wasn't sure what the big deal was for the first 2 hours but then it kinda gets to you