cdminix.bsky.social

PhD Student @ University of Edinburgh. Working on Synthetic Speech Evaluation at the moment. 🇳🇴 Oslo 🏴󠁧󠁢󠁳󠁣󠁴󠁿 Edinburgh 🇦🇹 Graz

64 posts 88 followers 262 following

With the market & AI mythos reeling post-DeepSeek, it seems like a good time to reup this year-old paper, on the perils of the "bigger is better" approach to AI, coauthored with @sashamtl.bsky.social + @gaelvaroquaux.bsky.social arxiv.org/abs/2409.14160

submitted 30 days ago • 9 comments

Debunking can be a fun way to find an audience in science communication, but I think it can be over-relied on. We should be celebrating more than we are defending, not just because it’s less miserable, but also because it’s good marketing!

submitted 31 days ago • 147 comments

Fascinating that the latest text-to-speech models are still going in two (orthogonal?) directions. On one side we have non-autoregressive, efficient models like Kokoro that won’t be great at voice cloning (or won’t support it at all), and on the other huge billion-parameter, language-model-style ones like Llasa.

submitted 30 days ago • 1 comment

when the Bluetooth headphones lag behind the video

submitted 73 days ago • 0 comments

Planning to submit to @interspeech.bsky.social 2025!? Our author kit is available! Good luck with the 📝!

submitted 72 days ago • 0 comments

I’m told it is mandatory in Norway to leave the city and go to a hytte in the woods on the weekend, so doing my best.

submitted 75 days ago • 0 comments

Can't wait for this to be the topic of choice at all my upcoming dinner parties 🫣 arxiv.org/abs/2412.04984

submitted 77 days ago • 0 comments

What pitch prediction would you use for in-the-wild speech data (e.g. from YouTube)? CREPE seems to be widely used, but algorithmic methods like DIO, Harvest or pYIN might be more robust for data with background noise etc.

submitted 77 days ago • 0 comments

So you know how we often have a mel spectrogram in a system diagram as the input… I usually ask a friend if they want to be in my next paper or presentation when it comes to this.

submitted 78 days ago • 0 comments

It looks like deepfake detection and MOS prediction don’t generalise to new TTS systems released in 2023/24. Sure, if you stay within a class of models you have a chance (e.g. NAR + diffusion), but even there I highly doubt that a new model can be detected if the hyperparameters are different. 🧵 on this soon

submitted 78 days ago • 0 comments

The starter pack I’ve been looking for!

submitted 78 days ago • 0 comments

Hey #linguistics, I've created a starter pack for phoneticians, speech people, and phriends. If you feel like I've left someone out or you have suggestions, just lmk! I want more spectrograms on my feed so let's make that happen. go.bsky.app/LPrwcbq

submitted 91 days ago • 26 comments

#AcademicSky #AI #ML #TTS

submitted 79 days ago • 0 comments