pelayoarbues.com
Head of Data Science at idealista.com | Film photographer | Ph.D. in Economics | Sharing my digital garden at pelayoarbues.com | Working full remote from the North of Spain
751 posts 6,108 followers 1,188 following

To write something, you have to do things, collect ideas, process them, put words on the page, structure them, edit them, rewrite them, and publish them to your site, and that’s all for one article. Then, there are the emotional challenges: writer’s block, imposter syndrome (...)

You have to read this review of the Duran Duran concert directed by David Lynch (?!).

Software engineer job postings are at a 5-year low, as per data from Indeed (which is a pretty good data source). But how bad are things, really? I looked closer and found that it's... probably not as bad as circulated on social media (cont'd) Also: no, it's not the end of software engineering...

"Any given executive is almost always uncannily good at one way of consuming information. They feel most comfortable consuming data in that particular way, and the communication systems surrounding them are optimized to communicate with them in that one way." By @lethain.com

I would argue operational clarity is exactly the opportunity many data teams have been looking for. So many leaders we speak to say something like “We can do so much more.” Adding operational clarity is something undeniably valuable that data folks are uniquely positioned to do well.

Government and opposition agree to start a civil war so they can sit out the World War once again. www.elmundotoday.com/2025/02/gobi...

Better products, services and experiences are usually the result of the employees who invented, innovated or supplied them. As soon as people are put second on the priority list, differentiation gives way to commoditization. And when that happens, innovation declines (...)

I once read somewhere on the internet: deploying is easy if you ignore all the hard parts

Being able to set ego aside and be ruthlessly critical of yourself and your division is hard. It’s not just about where you have gaps today but also about what you’ll need in the next three to five years.

TFW you realize how many hundreds of billions have been made by first convincing people that structured data stores are too hard to set up and maintain, then 5-10 years later convincing the same people that unstructured data stores are not conducive to fast and reliable analysis. Rinse, Repeat.

“You can’t manage what you can’t measure” is a maxim that is taught and believed by many in both the business and education sectors. But in fact, the phrase is ridiculous. A large portion of what we manage can’t be measured, and not realizing this has unintended consequences.

The growth in building footprint area between 1950 and 2020 in Spain is impressive. The dataset, #HISDACSpain, was published in ESSD and is available for free use at 100 x 100 m resolution. More information can be found here 🧑‍💻 essd.copernicus.org/articles/15/... #OpenAccess #OpenData #geography

We need to be tough in the tech industry, but we don’t have to be cynical all the time. It's important to celebrate those who create ideas and shape the future by influencing the narrative in a healthy way. I'm grateful for people like Chip Huyen in this industry.

Just tested Mistral's Le Chat, works really well and super fast. This prompt did a great job: chat.mistral.ai/chat.

This paper is wild: a Stanford team shows the simplest way to make an open LLM into a reasoning model. They used just 1,000 carefully curated reasoning examples & a trick where if the model tries to stop thinking, they append "Wait" to force it to continue. Near o1 at math. arxiv.org/pdf/2501.19393
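The "Wait" trick described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `generate` callable, the `<end_of_thinking>` marker, and the `fake_model` stand-in are hypothetical and are not taken from the paper or from any real library.

```python
from typing import Callable

STOP = "<end_of_thinking>"  # hypothetical stop marker, not the paper's token


def force_longer_thinking(
    generate: Callable[[str], str],
    prompt: str,
    extra_rounds: int = 2,
    nudge: str = "Wait",
) -> str:
    """Suppress the model's first `extra_rounds` attempts to stop thinking."""
    text = prompt
    for _ in range(extra_rounds):
        chunk = generate(text)
        if not chunk.endswith(STOP):
            # The model did not try to stop, so there is nothing to override.
            return text + chunk
        # Drop the stop marker and append the nudge so generation continues.
        text += chunk[: -len(STOP)] + nudge
    # Let the final attempt to stop go through.
    return text + generate(text)


def fake_model(text: str) -> str:
    """Stand-in for an LLM: emits one numbered step, then tries to stop."""
    return f" step{text.count('Wait') + 1}." + STOP


result = force_longer_thinking(fake_model, "Q:")
print(result)
```

With a real model, `generate` would wrap whatever decoding call is available; the sketch only shows the control flow of overriding the stop and appending the nudge.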

@alphaxiv.org used Gemini 2 Flash to build Cursor for arXiv papers. Highlight any section of a paper to ask questions and “@” other papers to quickly add to context and compare results, benchmarks, etc.

Facing difficult problems starts by feeling incompetent. It happened to me with a recommendation system during my time as a consultant. This week I share how I managed the pressure through reading, introspection, and the help of my teammates www.pelayoarbues.com/notes/A-reco... #databs

Would you like to join the premier undergraduate summer internship in economics? It is time to apply! This is a unique opportunity to learn about research in economics by spending a month at CEMFI in Madrid. www.cemfi.es/programs/int...

If your customer support LLM thing says "I've sent this conversation to our team of specialists" and doesn't actually do that, that's not a hallucination, it's an unforgivable bug

Fascinating to watch the US do what most economists say is a terrible idea for the economy: introduce tariffs. They are doing it, the countries hit are responding with counter-tariffs, and we’ll see tariff wars play out. In months (or years?) we’ll see if the economists were right.

Reading William Zinsser’s “On Writing Well.” Maybe my 4th or 5th book specifically on the craft of writing. Common themes I’m noticing: 1. Adverbs and adjectives should be used sparingly. Don’t beef up a weak verb, use a better verb. 2. Use fewer words. Cut anything not necessary, not doing work.

We think we can easily see into the hearts of others based on the flimsiest of clues. We jump at the chance to judge strangers. We would never do that to ourselves, of course. We are nuanced and complex and enigmatic. But the stranger is easy.

Mistral Small 3 is out - Apache 2.0 licensed and it claims equivalent performance to Llama 3.3 70B and Qwen 2.5 32B (two of my favourite models for running on a laptop) but with faster performance. My notes, including running with LLM via both their API and Ollama: simonwillison.net/2025/Jan/30/...

2024 Obsidian Gems of the Year results are in! Explore the 33 winning projects across seven categories: – Best new plugins – Best new themes – Best existing plugins – Best tools – Best content – Best templates – Best integrations obsidian.md/blog/2024-go...

Priorities for my personal site: 1. I can write and publish directly from Obsidian 2. I can preview the site offline 3. I can switch hosts easily, all the data is in my control. CMSes like WordPress, Squarespace, Webflow add complexity and liability while being less customizable.

Staying late or arriving early or staying home to work in peace is a damning indictment of the office environment.

A while ago, I wrote about the rise of the Dataset Engineer (someone skilled at annotating data efficiently) www.pelayoarbues.com/notes/The-Ri.... Love this post by @danielvanstrien.bsky.social on combining DeepSeek for annotation and ModernBert for classification danielvanstrien.xyz/posts/2025/d...

In ML, weak learners like decision trees have been incredibly effective when combined. Algorithms like Random Forest and XGBoost turn these weak learners into strong models. What if we applied this same principle to LLMs?
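One way to read the question above is as simple majority voting over several model outputs. The sketch below is only an illustration under that assumption: the `models` are hypothetical stub callables rather than real LLM clients, and plain voting is closer to the bagging side of Random Forest than to the boosting used by XGBoost.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(models: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Return the most common normalized answer across all models."""
    answers = [model(prompt).strip().lower() for model in models]
    # most_common(1) yields the answer with the highest count.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-ins for three small, individually unreliable models.
stub_models = [
    lambda prompt: "positive",
    lambda prompt: "Positive ",
    lambda prompt: "negative",
]

label = majority_vote(stub_models, "Classify the sentiment: 'Great film!'")
print(label)
```

Voting like this only helps when the individual models make reasonably independent mistakes, which is the same condition that makes tree ensembles work.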

Europe could do the same

Hi!👋 Better late than never, so let me tell you... what can you find here if you read me? #InteligenciaArtificial without the hype, drawing on the knowledge and experience of working in this field for many years, and of researching dialogue systems. Topics: chatbots, voice, LLMs, HCI, UX, entrepreneurship, education.

Last few days to apply before the first round of admissions!

We are opening a Data Science position! After 8 years building projects for idealista/data and products like idealista/maps, we keep the mindset of always improving. Senior team, high impact and very low turnover. Will you join us? www.linkedin.com/jobs/view/41...

Stakeholders will sometimes pressure you to find data that supports a narrative they have already created in advance. While playing along with this might score you some points in the near term, what will help you in the long term is being a truth seeker.

The new ability of AI video creators to add real people and products to scenes with just an image is likely to increase the utility (& more worryingly, misuse) of AI video. Here I made Shakespeare at a cafe and the Girl with the Pearl Earring piloting a mech (just as Vermeer intended)

Fridays in my team are for sharing readings, a “digest” of links that started small but grew into a weekly ritual. To save time, I built a Python tool that classifies and summarizes articles. Inspired by @simonwillison.net, I am now sharing a Link Blog publicly www.pelayoarbues.com/notes/Buildi...

The raw chain of thought from DeepSeek is fascinating, really reads like a human thinking out loud. Charming and strange.

My main takeaway from the DeepSeek paper is not scientific but organizational: we need a European industrial plan in AI right now. No safety summit, no peppered compute grants, no funding processes that take two years.

Big Tech billionaires had a front row seat at Trump's inauguration. They were seated in front of Trump's own cabinet. Tells you everything you need to know.

Spatial prediction methods for geostatistical data such as disease prevalence 🪰 precipitation 🌧️ contaminants 🏭 household prices 🏡 🔗 www.paulamoraga.com/book-spatial... #rstats #rspatial #GISChat