inesmontani.bsky.social
💥 Founder & CEO @explosion-ai.bsky.social 👩‍💻 Developing spacy.io & prodi.gy 🐍 Python Software Foundation Fellow 🧠 AI, Machine Learning & NLP 💼 linkedin.com/in/inesmontani 🐘 sigmoid.social/@ines 💥 explosion.ai 🌎 ines.io
109 posts 5,918 followers 64 following
Regular Contributor
Active Commenter

Just published part 3 of my blog post series on making beautiful slides for your talks 🎨✨ This one is about presenting technical content and making dry & abstract topics more interesting. Featuring many examples, including talks by Vitaly Meursault & @oxykodit.bsky.social! ines.io/blog/beautif...

These are the kinds of NLP stories I love! TFW you "accidentally" train a great 2 MB (!) task-specific model 🤯 @strickvl.bsky.social

Look what arrived in the mail today! 🎉 This is the 2nd edition of "Mastering spaCy" by Duygu Altinok and Déborah Mesquita, featuring how to build structured NLP solutions with custom components, and updated content on using models powered by LLMs. You can get it here: www.amazon.com/dp/B0DVBTX2BL

Now that @honnibal.bsky.social is live-streaming spaCy development (check it out if you haven't!) I tidied up our YouTube and added our latest talks & interviews. I forgot how much content there is and it makes me want to get back into doing videos when I have time ✨ youtube.com/c/ExplosionAI

Enjoyed giving my keynote on "What the history of the web can teach us about the future of AI" at PyCon+Web. I wrote it up as a blog post because I think there are many interesting parallels and lessons we can learn: explosion.ai/blog/history... Here are the most important points 🧵

Will do another short (~2h or so) stream this afternoon, at 13:00 CET. Last week I was limited in what I could do because the main thing I was working on was build system stuff, and OBS was crashing when I tried to create a desktop view. This week we can look at a more interesting topic.

My video on spaCy layout is now out! This is probably my favorite update from @explosion-ai.bsky.social (and that's saying something!) This package makes it simple to do region detection, table detection, and OCR with just 1 line of Python. Video: youtu.be/quJtzVxoMtE #MachineLearning

Going live now! Join me for the first pilot stream on YouTube: www.youtube.com/live/kViiI5B...

This is such a cool idea! Get an inside look into the life and work of an open-source developer and chat about NLP and more. I'll probably hang out in the chat for a bit as well 💙

Writing a new talk on "What the history of the web can teach us about the future of AI" 🔮 I've wanted to do this for so long because I think there are some great lessons and analogies here. I'll be presenting it for my keynote at PyCon+Web in Berlin on Jan 25 – hope to see you there!

New plugin coming to Prodigy soon: a visual data dashboard! 📊✨ Manage, view and filter annotations and access data analytics and progress all in a neat web app. If you're using Prodigy and want to beta test it, check out this post for more details: support.prodi.gy/t/prodigy-da...

Happy Birthday to me! 🥳 It feels like a good opportunity to look back at 2024, an eventful but also difficult year for me. So here's my personal review, including travel, talks, writing and various things I did and enjoyed. ines.io/blog/year-in...

Looking forward to joining the panel on PyLadies entrepreneurs and career development at @pyladiescon.bsky.social on Saturday 💖 It's a fully online conference with many cool people and talks, and you can still register! pretalx.com/pyladiescon-...

New post: From PDFs to AI-ready structured data 📃✨ A deep dive into document processing, layout analysis and a modular workflow for building end-to-end document understanding and information extraction pipelines using PDFs, Word documents, scans and more. explosion.ai/blog/pdfs-nl...

PyData London 2024: A practical guide to human-in-the-loop distillation @inesmontani.bsky.social LLMs have enormous potential, but modularity, transparency & data privacy in industry are challenging. Ines shows us how to use the latest models in real world applications. youtu.be/pgLLgvjZ_FA?...

GLiREL is a zero-shot Relation Extraction (RE) model capable of classifying unseen relations given the entities within a text. Builds upon GLiNER which does 0-shot Named Entity Recognition (NER). Repo: github.com/jackboyla/GL... Model: huggingface.co/jackboyla/gl... h/t @pacoid.bsky.social

Many of you have been asking about PDF table extraction and I finally got around to experimenting with it 👀 Here's tabular data converted with Docling + TableFormer, anchored within the document text and accessible as a pandas.DataFrame:

The first version of my spaCy + Docling integration is here: 📚 process PDFs, Word documents & more 📝 structured text-based output via spaCy's Doc object 🏷 layout spans for sections, headings etc. 🔮 apply NLP pipelines to PDF contents ✂️ chunk your data for RAG github.com/explosion/sp...

NEW: Prodigy now supports multi-page documents in a single view, without losing the efficiency of its card-based design ✨ It's especially cool for images and PDFs, and for building fully custom interfaces. And it was quite fun to build! More details, examples and docs here: prodi.gy/docs/custom-...