ryxcommar.bsky.social
Senior document signer @ Docusign, prev: lead risk manager @ Alameda Research
730 posts 6,711 followers 432 following
Regular Contributor
Active Commenter
comment in response to post
don't actually do this, you don't want to get charged for anything under the CFAA
comment in response to post
Nikola Tesla was a small inventor. Today, Tesla is one of the biggest companies in America by market cap. If you have good ideas, the market will reward you and you can become a large company.
comment in response to post
Siggi's goes insanely hard
comment in response to post
I'm on the low end, funny enough, but it's because I don't post engagement-baity things and I sometimes lock my account. It covers 2-week periods by default, but if you don't hit a certain engagement threshold it covers a month (which it did for me in this case)
comment in response to post
you got $112? wow I'm POOR
comment in response to post
why doesn't the gif play. it's this one. fuck you bsky.
comment in response to post
boo hiss this man has a blue check on the other site, read the room jackass!
comment in response to post
credit card interest rate cap would be good in part *because* it would get some people's credit cards taken away
comment in response to post
I don't think much at all tbh, I'm a blank slate right now.
comment in response to post
Hmmm that makes sense in an inchoate way but I need to wrestle with this more before I feel it in my bones. I appreciate your insights!
comment in response to post
fwiw I did already PCA the vectors to orthogonalize them instead of relying on the raw embedding vectors :) since the documents I am using are skewed toward a broad but particular type of textual information.
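(Roughly what I mean by "PCA the vectors", as a numpy/sklearn sketch -- the matrix X and the component count are made up here; whiten=True is what actually leaves the components uncorrelated and unit-variance:)

import numpy as np
from sklearn.decomposition import PCA

# X: (n_documents, 1024) matrix of raw embedding vectors (fake data for illustration)
X = np.random.default_rng(0).normal(size=(10_000, 1024))

# whiten=True rotates onto the principal axes and rescales each component to
# unit variance, so the transformed components come out uncorrelated
pca = PCA(n_components=256, whiten=True)
X_pca = pca.fit_transform(X)   # (n_documents, 256) decorrelated components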
comment in response to post
*perfectly correlated elements, not vectors
comment in response to post
I am out of my depth here though. Never thought about this stuff much until recently when it started mattering a lot for my job all of a sudden.
comment in response to post
because I do believe that Mahalanobis distance should be isomorphic to something like a 'weighted' L2 distance for what I am doing if the documents are all orthogonal. If they're not I think it's like, uhh, if there are 2 perfectly correlated vectors it splits the difference? Idk if I want that.
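(The diagonal case I'm picturing, as a tiny numpy/scipy check -- if the covariance matrix is diagonal, Mahalanobis distance is exactly an L2 distance with each element weighted by 1/variance; everything below is toy data:)

import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
x, y = rng.normal(size=8), rng.normal(size=8)
variances = rng.uniform(0.5, 2.0, size=8)

cov = np.diag(variances)                    # diagonal covariance: uncorrelated elements
d_mahal = mahalanobis(x, y, np.linalg.inv(cov))
d_weighted_l2 = np.sqrt(np.sum((x - y) ** 2 / variances))

assert np.isclose(d_mahal, d_weighted_l2)   # identical when the covariance is diagonal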
comment in response to post
thank you!
comment in response to post
The thing I may actually want is more like Mahalanobis distance? I don't know. Since the document elements are not orthogonal the covariance matrix is not a diagonal matrix, but I don't quite know how to think about how that impacts the results.
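(What I'd actually compute for the non-diagonal case, sketched in numpy -- covariance estimated from the document sample, pseudo-inverse because a sample covariance of embeddings can easily be singular; the function name and shapes are made up:)

import numpy as np

def mahalanobis_to_centroid(sample, query):
    # sample: (n, d) embeddings of the known documents; query: (d,) vector
    centroid = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)    # (d, d); off-diagonal terms capture correlated elements
    cov_inv = np.linalg.pinv(cov)         # pseudo-inverse in case cov is singular
    diff = query - centroid
    return float(np.sqrt(diff @ cov_inv @ diff))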
comment in response to post
And then I would use that to search for all the other fish documents that were not in the original sample of fish documents.
comment in response to post
So if I have a bunch of documents about everything and I am searching for documents about fish, I would take the average of all the fish documents and the fish index inside the embedding vector would have low variance. (Is this still a bad way to think about this?)
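(The whole idea as a numpy sketch -- embeddings, fish_idx, and the sizes are all made up; score every document by distance to the fish centroid, weighting each element by 1/variance so the dimensions the fish sample agrees on count the most:)

import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20_000, 1024))            # all documents (toy data)
fish_idx = rng.choice(20_000, size=500, replace=False)  # the known fish documents

fish = embeddings[fish_idx]
centroid = fish.mean(axis=0)   # the "average fish document"
variances = fish.var(axis=0)   # low where the fish documents agree

# inverse-variance-weighted squared distance to the centroid; lowest = most fish-like
scores = ((embeddings - centroid) ** 2 / variances).sum(axis=1)
candidates = np.argsort(scores)[:1_000]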
comment in response to post
The vector space is! But not necessarily the sample of records.
comment in response to post
Maybe I'm giving xAI's corporate structure and engineers too much credit but I feel like the people knowledgeable enough to fiddle around with that stuff are not the same people Elon Musk calls up to turn the racism dial.
comment in response to post
Pgvector is great but then I would need to set up a job to move this data to Postgres and then back to Snowflake. That's also annoying.
comment in response to post
Well, O(NM), where M is the number of groups and N is the number of vectors to search. The time it takes to run is the real annoyance of it all; even searching something like 1,000 * 10M vectors is just an insane amount of time.
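(For scale, the unindexed brute-force version is basically one giant matrix product -- toy sizes below, but the work grows as N * M * d, which is why ~10M vectors x ~1,000 groups takes forever without an ANN index:)

import numpy as np

rng = np.random.default_rng(0)
d = 1024
N, M = 10_000, 100   # toy; the real case is more like 10,000,000 x 1,000

vectors = rng.normal(size=(N, d))
centroids = rng.normal(size=(M, d))

# normalize rows so a plain dot product is cosine similarity
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

sims = vectors @ centroids.T   # (N, M): every vector against every group, O(N*M*d) work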
comment in response to post
Not really. I do everything in pure SQL, and building the UDF is annoying but not impossible; it'd also be annoying (possibly more so) in Pandas. And it's still unindexed, so O(N^2) regardless of Snowpark or SQL.
comment in response to post
Well, I mean I already did the average one. So net 1 and 1. And it's very stupid -- UDAFs in Snowflake do not accept vectors as types, so you need to cast the input with ::array and the output with ::vector(float, 1024).
comment in response to post
One to implement weighted cosine similarity, another 2 to implement both average and variance of elements in a vector.
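(What those three would compute, roughly, in numpy terms -- not the actual Snowflake UDF/UDAF code, and "weighted cosine similarity" here means the usual per-element-weight definition:)

import numpy as np

def weighted_cosine_similarity(a, b, w):
    # UDF: cosine similarity with a per-element weight vector w
    num = np.sum(w * a * b)
    return float(num / (np.sqrt(np.sum(w * a * a)) * np.sqrt(np.sum(w * b * b))))

def vector_avg(vectors):
    # UDAF 1: element-wise average over a group of vectors
    return np.asarray(vectors).mean(axis=0)

def vector_var(vectors):
    # UDAF 2: element-wise variance over a group of vectors
    return np.asarray(vectors).var(axis=0)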
comment in response to post
The issue is I am doing this in fucking Snowflake, which has a vector implementation so immature (no HNSW indices!!!!!) that I'm shocked they even released it to the world, so I'd need to implement 1 UDF and 2 UDAFs. Sigh.
comment in response to post
sum(bool) over (partition by id order by date) as fake_group
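-- cumulative sum of the flag per id (ordered by date): it only increments on flagged rows, so consecutive rows between flags share a group number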
comment in response to post
there are more people out there like me and i must find them
comment in response to post
you weren't kidding
comment in response to post
No I feel like I'm maybe 1/3 of the way through
comment in response to post
I sort of wasn't sure what the big deal was for the first 2 hours but then it kinda gets to you