heikohotz.bsky.social
AI Engineer @ Google 👨‍💻 - Educator 👨‍🏫 - Traveller ✈️ - Hobby photographer 📷 - Foodie 🌮 - Film fan 🍿 - Boardgamer 🎲 - Londoner 💂‍♂️ Medium: https://heiko-hotz.medium.com/ GitHub: https://github.com/heiko-hotz LI: https://www.linkedin.com/in/heikohotz/
447 posts 458 followers 626 following

Introducing Gemini-Powered Slide Creation by Voice! In this quick demo, I've integrated a "Slide Creation Agent" into my personal project, Project Pastra. Watch how it effortlessly generates slides based on voice instructions.

Multimodal AI models have the potential to finally deliver on the dream of language being the ultimate human-computer interface 🎙️ youtu.be/0OEDHAjY6LM

Fade Out. Directed by Jason Zada. Created with Google's Veo 2. youtu.be/9yQXkdA3u8k?...

The Gemini Multimodal Live API Developer Guide is live!

Chapter 6 of the Gemini Multimodal Live Developer Guide: What is a video, anyway? After the hard-fought battle of implementing proper audio communication in chapter 5, adding video capabilities to the multimodal live app à la Project Astra was a breeze.

HELLO?? CAN YOU HEAR ME???? More times than I'm proud to admit did I utter these words into my laptop over the past few days 😅

A developer guide for Gemini's Multimodal Live API

Developers are loving the Gemini 2.0 Multimodal Live API - we see so many of you starting to build with it 🤗 To get started with the API, I wrote a small Python script (83 lines of code) that demonstrates how to set up real-time, two-way audio communication with a Gemini language model.

One more, because it's so much fun 🤩 #google #gemini #deepmind

Math puzzle contest! Gemini 2.0 Flash Thinking vs GPT-4o vs Claude 3.5 Sonnet. I was honestly surprised by the results. Would love it if someone could check with o1 (Pro) 🤗 (Credit to the Bluesky community, where I saw this puzzle a few days ago)

Gemini 2.0 Flash Thinking released! You thought we were done shipping, am I right? But the Google DeepMind folks had one more ace up their sleeves, and it's a big one! #google #gemini #gemini2.0 #deepmind

Our web console serves as a valuable tool for developers exploring the vast capabilities of Google's Multimodal Live API and its Gemini foundation. #google #gemini #multimodal #api #devtools

The combination of React, WebSockets, and audio worklets creates a powerful and flexible development environment for the Google Multimodal Live API. #google #gemini #react #websockets #audioworklets