cdminix.bsky.social
PhD Student @ University of Edinburgh. Working on Synthetic Speech Evaluation at the moment. 🇳🇴 Oslo 🏴󠁧󠁢󠁳󠁣󠁴󠁿 Edinburgh 🇦🇹 Graz
64 posts 88 followers 262 following

With the market & AI mythos reeling post-DeepSeek, it seems like a good time to reup this year-old paper, on the perils of the "bigger is better" approach to AI, coauthored with @sashamtl.bsky.social + @gaelvaroquaux.bsky.social arxiv.org/abs/2409.14160

Debunking can be a fun way to find an audience in science communication, but I think it can be over-relied on. We should be celebrating more than we are defending, not just because it’s less miserable, but also because it’s good marketing!

Fascinating that the latest text-to-speech models are still diverging in two (orthogonal?) directions. On one side we have efficient, non-autoregressive models like Kokoro that won’t be great at voice cloning (or won’t support it at all); on the other, huge billion-parameter, language-model-style ones like Llasa.

when the Bluetooth headphones lag behind the video

Planning to submit to @interspeech.bsky.social 2025!? Our author kit is available! Good luck with the 📝!

I’m told it is mandatory in Norway to leave the city and go to a hytte in the woods on the weekend, so I’m doing my best.

Can't wait for this to be the topic of choice at all my upcoming dinner parties 🫣 arxiv.org/abs/2412.04984

What pitch prediction would you use for in-the-wild speech data (e.g. from YouTube)? CREPE seems to be widely used, but algorithmic methods like DIO, Harvest or pYIN might be more robust for data with background noise, etc.
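For context on the algorithmic side of that question, here is a minimal, single-frame sketch of plain YIN, the algorithm pYIN builds on (pYIN adds probabilistic thresholds and HMM smoothing, which are not included here). It is not DIO, Harvest, pYIN or CREPE, and the `yin_pitch` function name, frequency range and threshold defaults are illustrative assumptions, not tuned values for noisy in-the-wild data.

```python
import math


def yin_pitch(frame, sample_rate, fmin=60.0, fmax=500.0, threshold=0.1):
    """Estimate f0 (Hz) of one frame with basic YIN; returns None if unvoiced.

    Hypothetical helper for illustration: no pYIN HMM, no multi-frame smoothing.
    """
    tau_min = int(sample_rate / fmax)
    tau_max = int(sample_rate / fmin)
    n = len(frame) - tau_max  # integration window length
    if n <= 0:
        raise ValueError("frame is too short for the requested fmin")

    # Step 1: squared difference function d(tau).
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(n))

    # Step 2: cumulative mean normalised difference d'(tau).
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Step 3: first lag below the threshold, then walk down to its local minimum.
    for tau in range(tau_min, tau_max + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            # Step 4: parabolic interpolation around the minimum for sub-sample lag.
            best = float(tau)
            if 1 <= tau - 1 and tau + 1 <= tau_max:
                a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
                denom = a - 2 * b + c
                if denom != 0:
                    best = tau + 0.5 * (a - c) / denom
            return sample_rate / best
    return None  # no dip below threshold: treat the frame as unvoiced


# Usage: a clean 200 Hz sine should come out close to 200 Hz.
sr = 16000
tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(1024)]
print(yin_pitch(tone, sr))
```

The threshold on the normalised difference is what makes YIN decide between voiced and unvoiced, and it is also where background noise tends to cause trouble, which is part of why pYIN replaces the single threshold with a distribution over thresholds. For real use, librosa ships a `librosa.pyin` implementation.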

So you know how we often have a mel spectrogram as the input in a system diagram… When it comes to this, I usually ask a friend if they want to be in my next paper or presentation.

It looks like deepfake detection and MOS prediction don’t generalise to new TTS systems released in 2023/24. Sure, if you stay within a class of models (e.g. NAR + diffusion) you have a chance, but even there I highly doubt that a new model can be detected if its hyperparameters are different. 🧵 on this soon
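One common way to probe this kind of generalisation is a leave-one-group-out split, where every utterance from one TTS system (or one whole model class) is held out of training. The sketch below assumes hypothetical utterance records with `system` and `family` fields; the post does not describe its evaluation protocol, so treat this as an illustration rather than the method used.

```python
from collections import defaultdict


def leave_one_group_out(utterances, key="system"):
    """Yield (held_out_group, train, test) so the test group is never seen in training.

    `utterances` are dicts; `key` is a hypothetical field such as "system"
    (one TTS system) or "family" (a model class, e.g. NAR + diffusion).
    """
    groups = defaultdict(list)
    for utt in utterances:
        groups[utt[key]].append(utt)
    for held_out in sorted(groups):
        train = [u for g, members in groups.items() if g != held_out for u in members]
        test = list(groups[held_out])
        yield held_out, train, test


# Usage with made-up records: hold out one system at a time, or one family at a time.
utts = [
    {"id": 0, "system": "A", "family": "nar"},
    {"id": 1, "system": "A", "family": "nar"},
    {"id": 2, "system": "B", "family": "nar"},
    {"id": 3, "system": "C", "family": "ar"},
]
for held_out, train, test in leave_one_group_out(utts, key="system"):
    print(held_out, [u["id"] for u in train], [u["id"] for u in test])
```

Splitting by `system` tests the "within a class" case (other systems of the same family can still appear in training), while splitting by `family` tests the harder case of an entirely new class of model.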

The starter pack I’ve been looking for!

Hey #linguistics, I've created a starter pack for phoneticians, speech people, and phriends. If you feel like I've left someone out or you have suggestions, just lmk! I want more spectrograms on my feed so let's make that happen. go.bsky.app/LPrwcbq

#AcademicSky #AI #ML #TTS