niladridutt.bsky.social - Profile | ThreadSky | a Reddit-style client for Bluesky

comment in response to post

🧵10/10 Lastly, huge thanks to my co-advisors Niloy and Duygu! For more details, check out our paper below: 🌐 Project Website: monetgpt.github.io 📄 Arxiv: arxiv.org/abs/2505.06176

submitted 26 days ago

comment in response to post

🧵9/10 We quantitatively evaluate on the Adobe5k dataset and conduct user studies with expert and novice users. Our evaluations show that MonetGPT outperforms open-source alternatives and performs comparably to Google Photos AutoEnhance (closed-source).

submitted 26 days ago

comment in response to post

🧵8/10 Photo editing is subjective 🎨. Our framework adapts to user preference through natural language tags like ‘vibrant’ or ‘retro vibe’, producing personalized and stylistically distinct retouching plans from the same input image.

submitted 26 days ago

comment in response to post

🧵7/10 Our puzzle-based training with a 'reasoning as a pathway' approach allows MonetGPT to generate detailed justifications for each edit, delivering truly explainable image retouching

submitted 26 days ago

comment in response to post

🧵6/10 🧩 Puzzle C builds planning capabilities. The model learns to generate a complete, multi-step retouching plan to enhance a photo, structuring its reasoning as a sequence of discrete issues and solutions for clarity and control.

submitted 26 days ago

comment in response to post

🧵5/10 🧩 Puzzle B imparts aesthetic judgement. By ranking professionally edited photos against altered versions, the MLLM learns to recognize the visual characteristics of an optimally adjusted image for any given operation, building an internal aesthetic model.

submitted 26 days ago
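
A minimal sketch of how a Puzzle B ranking example might be built, assuming images are plain lists of brightness values in [0, 1] and a single hypothetical `brightness` operation. Neither the representation nor the function names come from the paper. The ranking target relies on the assumption stated in 2/10 that any change to an expert edit makes it less optimal, so candidates are ordered by how far they were pushed from the expert version.

```python
def clamp(x):
    """Keep a pixel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, x))

def brightness(pixels, value):
    """Hypothetical operation: shift every pixel by `value`."""
    return [clamp(p + value) for p in pixels]

def make_puzzle_b(expert_pixels, offsets):
    """Build altered candidates and the ranking the model should predict.

    An offset of 0.0 reproduces the expert edit. Larger absolute offsets
    are assumed to be worse, so the ranking sorts candidate indices by
    absolute offset, best first.
    """
    candidates = [
        {"offset": o, "pixels": brightness(expert_pixels, o)} for o in offsets
    ]
    ranking = sorted(
        range(len(candidates)), key=lambda i: abs(candidates[i]["offset"])
    )
    return candidates, ranking

candidates, ranking = make_puzzle_b([0.3, 0.6, 0.9], [0.3, 0.0, -0.1])
print(ranking)  # [1, 2, 0]: the unaltered expert edit ranks first
```

In this sketch the model would see only the candidate images, not the offsets, and be asked to recover the ordering.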

comment in response to post

🧵4/10 🧩 Puzzle A builds an understanding of individual operations. The MLLM learns to map visual changes in before/after images to a specific tool and its precise parameter value, effectively learning the semantics of our procedural library.

submitted 26 days ago
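
A minimal sketch of how a Puzzle A example might be generated procedurally, assuming a toy library with two hypothetical operations (`brightness`, `contrast`) acting on lists of pixel values in [0, 1]. The operation set, parameter range, and dictionary layout are illustrative assumptions, not the paper's actual procedural library.

```python
import random

def clamp(x):
    """Keep a pixel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, x))

def brightness(pixels, value):
    """Hypothetical operation: shift every pixel by `value`."""
    return [clamp(p + value) for p in pixels]

def contrast(pixels, value):
    """Hypothetical operation: scale distance from mid-gray by (1 + value)."""
    return [clamp(0.5 + (p - 0.5) * (1.0 + value)) for p in pixels]

# Toy stand-in for a procedural library of retouching tools.
OPERATIONS = {"brightness": brightness, "contrast": contrast}

def make_puzzle_a(pixels, rng):
    """Return a before/after pair plus the (tool, value) label to predict."""
    tool = rng.choice(sorted(OPERATIONS))
    value = round(rng.uniform(-0.5, 0.5), 2)
    after = OPERATIONS[tool](pixels, value)
    return {"before": pixels, "after": after, "tool": tool, "value": value}

puzzle = make_puzzle_a([0.2, 0.5, 0.8], random.Random(0))
print(puzzle["tool"], puzzle["value"])
```

The model's task in this sketch is to look at `before` and `after` and name the tool and its parameter value.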

comment in response to post

🧵3/10 Our key recipe: MLLMs struggle to predict edit values directly. We solve this by generating rich textual reasoning for each puzzle ✍️. We then fine-tune MonetGPT on this data, creating a 'reasoning pathway' that enables it to regress final adjustment values.

submitted 26 days ago

comment in response to post

🧵2/10 MLLMs lack the visual understanding to plan edits. 🧠 So, we use expert photos as our ground truth and work backward, procedurally creating puzzles by assuming any change to an expert edit makes it less optimal

submitted 26 days ago

comment in response to post

Amazon came pretty late to India, and we already had some homegrown companies like Flipkart, which now competes with Amazon and is valued at $40B. I think big tech's early access killed homegrown companies. China and, to an extent, South Korea (Naver) have some great tech companies because of barriers.

submitted 116 days ago

comment in response to post

Who will tell the Silicon Valley tech bros that it wasn't them alone?

submitted 135 days ago

comment in response to post

Haha exactly what I did today!

submitted 149 days ago

comment in response to post

Have a great time in Seattle!

submitted 202 days ago

comment in response to post

Added you!

submitted 210 days ago

comment in response to post

Added you!

submitted 210 days ago

comment in response to post

Added you!

submitted 210 days ago

comment in response to post

👋

submitted 211 days ago

comment in response to post

Hey Orion, thanks for the great list. Currently working on RAG, could I be added as well?

submitted 211 days ago

comment in response to post

While the following feed doesn't depend on likes since it follows a chronological order, I think the discover feed uses likes to tune your experience + recommend popular content

submitted 214 days ago

comment in response to post

I think keeping likes anonymous allows people to freely like whatever content they want. Not liking reduces post engagement. It also allows one to freely like content without it popping up on someone else's feed (I guess this is less of a problem here since only retweets are shown to your network).

submitted 214 days ago

comment in response to post

As an author, Twitter allows you to see who liked your posts/replies, but it's anonymous to others.

submitted 214 days ago

comment in response to post

Hi Kosta, Could you add me too?

submitted 215 days ago

comment in response to post

Exactly, not getting penalized for external links is a big win!

submitted 215 days ago

comment in response to post

Sure, added you :)

submitted 215 days ago

comment in response to post

Thanks for the list! I've created one for inverse graphics and 3D vision-- bsky.app/starter-pack...

submitted 216 days ago