markptorres.bsky.social
ML Eng @Northwestern, building recommender algos and LLM apps. Building Bluesky feeds @ https://bsky.app/profile/mindtechnologylab.bsky.social. BS (Statistics) @Yale + MS (Computer Science) @UT Austin. Recovering startup tech bro.
170 posts 136 followers 381 following
Regular Contributor
Active Commenter

Claude 4 is the first LLM that has allowed me to actually "vibe code" a decently complicated app in Cursor purely through instructions and markdown files and without having to write a single line of code. Had to intervene a few times in the chat but otherwise really impressive!

I can't believe that in 2025, we can run reasoning models locally. I finally got to try Ollama and QwQ and it's really impressive. Next step is to set up Ollama + Cursor. Can't imagine where things will be in 2026 and beyond. ollama.com/library/qwq mem.ai/p/bf6ew6HSm1...
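The Ollama + Cursor step aside, talking to a local model through Ollama is just one HTTP call against its local REST API (default port 11434). A minimal sketch using only the standard library; the prompt is illustrative, and the actual call is commented out since it needs a running `ollama` server with the `qwq` model pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("qwq", "Why is the sky blue? Think step by step.")
# With `ollama serve` running and `ollama pull qwq` done, you would send it like:
# answer = json.loads(urllib.request.urlopen(req).read())["response"]
```

Setting `stream` to `False` returns one JSON object with the full reasoning trace in `response`, which is easier to inspect than the default token-by-token stream.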

I still think people should step back sometimes and just think about how far AI has come in the past 5 years. NLP went from "fine-tune BERT and hope it works" to "run one-shot inference, on any task, with GPT-4o-mini". Can't take it for granted that SOTA AI is an API call away...

The reasoning trace of OpenAI's o3-mini seems like them trying to strike a balance between "we want to keep our reasoning traces as IP" and "we want people to think we're being transparent". Still definitely prefer the depth of DeepSeek's traces, though it's still too early to tell.

I just read Stolen Focus and I really recommend it to anyone interested in a holistic systems overview of why it’s so hard to keep your attention on anything. Who could’ve guessed that the key for success is eating healthy, drinking water, sleeping 7-8 hours, exercising, and reading books 🤣

Heard this zinger take at a talk: “Most lay people shouldn’t read scientific papers, even if they think they can, because most people don’t understand that science is an iterative process. There’s no ‘right answer’, and people do disagree. Even laws are just ideas that we haven’t proven wrong yet.”

NEW: Meta has quietly dismantled the system that prevented misinformation from spreading in the United States. Machine-learning classifiers that once identified viral hoaxes and limited their reach have now been switched off, Platformer has learned www.platformer.news/meta-ends-mi...

I've been experimenting with NotebookLM to read papers in podcast form and it's been great at it! If I add more than 1-2 papers though, I find that the quality suffers. Plus it caps out at ~20 minutes, can ramble, and its adherence to system prompts is iffy. Great tool though!

I wonder if filtering spam in the age of LLMs is similar to designing good CAPTCHAs now, where it's hard to create a filter that catches the best LLMs but is also easy enough for the average person. Especially true since it's hard to reliably tell LLM-generated text from human text.

test post 6

another test post

test post 4

Oh wow, LG just released their own open source* LLM. If their published benchmarks are accurate, the 32B model is at least on par with Qwen2.5 (which is already an incredibly strong model), if not better. www.lgresearch.ai/blog/view?se... huggingface.co/LGAI-EXAONE * open weights

another test post

I'm not mad at a baseball player getting paid his money, but it's wild to me that MLB has teams that can shell out over $700 million for a player and teams that apparently can't build a stadium without taxpayer money.

I finally learned what Snowflake and Databricks actually do and I now question why I worked for 3 years building essentially an in-house, worse version of what someone with basic SQL knowledge could have done on Snowflake...

The news just came out about the arrest of the CEO's killer and Polymarket is wayyyyy too quick with releasing their latest betting odds 😂

I've never liked tools that try to be "AI writing assistants", but I do like asking ChatGPT to analyze what I've written, give me detailed critique, and then give me line-by-line suggestions for how to improve clarity. Hard to make a tool though that works for everyone's style and use case.
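That critique-then-line-edits workflow is easy to script against any chat-style LLM API. A minimal sketch of just the prompt structure; the system-prompt wording is my own paraphrase of the workflow, and the actual API call is left as a comment since it needs a key:

```python
def build_critique_messages(draft: str) -> list[dict]:
    """Chat messages asking a model to critique a draft, then suggest line edits."""
    system = (
        "You are an editor. First give a detailed critique of the draft below, "
        "then give line-by-line suggestions that improve clarity. "
        "Suggest edits; do not rewrite the whole piece."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": draft},
    ]

messages = build_critique_messages("My draft paragraph goes here.")
# With the OpenAI SDK, this would be sent as, e.g.:
# client.chat.completions.create(model="gpt-4o-mini", messages=messages)
```

Keeping the instructions in the system message and the draft untouched in the user message is the part that generalizes; the per-person style preferences the post mentions would have to be folded into that system prompt.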