Inside Nano Banana 🍌 and the Future of Vision-Language Models [Oliver Wang] - 748
Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image, better known by its code name, “Nano Banana.” We dive into the development and capabilities of this newly released frontier vision-language model, beginning with the broader shift from specialized image generators to general-purpose multimodal agents that can use both visual and textual data for a variety of tasks. Oliver explains how Nano Banana can generate and iteratively edit images while maintaining consistency, and how its integration with Gemini’s world knowledge expands creative and practical use cases. We discuss the tension between aesthetics and accuracy, the relative maturity of image models compared to text-based LLMs, and scaling as a driver of progress. Oliver also shares surprising emergent behaviors, the challenges of evaluating vision-language models, and the risks of training on AI-generated data. Finally, we look ahead to interactive world models and VLMs that may one day “think” and “reason” in images.

For the full list of resources for this episode, visit the show notes page: https://twimlai.com/go/748.

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confi...

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: / twimlai
Follow us on LinkedIn: / twimlai
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/

📖 CHAPTERS
===============================
00:00 - Introduction
4:39 - Nano banana
5:35 - Nano banana vs Imagen and trajectory of image generation models
7:01 - Integration of Nano banana in Gemini
9:52 - Nano banana: a general purpose model
13:42 - Model consistency and editing capabilities
15:41 - Data quality and model architecture
18:13 - Use cases
24:10 - One-shot models vs. node-based interfaces
28:33 - Fine-tuning
30:32 - Exciting trends in image generation and VLMs
32:40 - Overcoming the challenges of model quality
34:36 - Model evaluation challenges
36:32 - Nano banana pros and cons
38:58 - Prompt rewriting
40:36 - Papers
41:52 - Accessibility of the research
46:45 - Verifiable domains
49:49 - Tension between accuracy and aesthetics
52:50 - Narrow data distribution in image generation
55:15 - AI-generated images for training data
57:56 - Model scale versus data curation
58:55 - Maturity of text versus image domains

🔗 LINKS & RESOURCES
===============================
Nano Banana: Image editing in Google Gemini just got a major upgrade - https://blog.google/products/gemini/u...
Google Gemini’s AI image model gets a ‘bananas’ upgrade - https://techcrunch.com/2025/08/26/goo...
Gemini Flash - https://deepmind.google/models/gemini...
Genie 3: A New Frontier for World Models - 743 - https://twimlai.com/podcast/twimlai/g...
Google I/O 2025 Special Edition - 733 - https://twimlai.com/podcast/twimlai/g...

📸 Camera: https://amzn.to/3TQ3zsg
🎙️ Microphone: https://amzn.to/3t5zXeV
🚦 Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5