Leveraging GenAI for Synthetic Data Generation to Improve Spark Testing and Performance in Big Data
Testing Spark jobs in local environments is often difficult due to the lack of suitable datasets, especially under tight timelines. This creates challenges when jobs work in development clusters but fail in production, or when they run locally but encounter issues in staging clusters due to inadequate documentation or checks.

In this session, we’ll discuss how these challenges can be overcome by leveraging Generative AI to create custom synthetic datasets for local testing. By incorporating variations and sampling, a testing framework can be introduced to solve some of these challenges, allowing for the generation of realistic data to aid in performance and load testing. We’ll show how this approach helps identify performance bottlenecks early, optimize job performance, and recognize scalability issues while keeping costs low. This methodology fosters better deployment practices and enhances the reliability of Spark jobs across environments.

Talk by: Satej Kumar Sahu, Principal Data Engineer, Zalando SE

Here’s more to explore:
Production ready data pipelines for analytics and AI: https://www.databricks.com/solutions/...
The Big Book of Data Engineering: https://www.databricks.com/resources/...
See all the product announcements from Data + AI Summit: https://www.databricks.com/events/dat...

Connect with us:
Website: https://databricks.com
Twitter: / databricks
LinkedIn: / databricks
Instagram: / databricksinc
Facebook: / databricksinc
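
As a rough illustration of the "variations and sampling" idea in the abstract, here is a minimal Python sketch. It is not the speaker's framework. The generative model step is replaced by a hand-written column spec, and every name in it (`SPEC`, `make_value`, `generate_rows`, `sample_rows`, the columns and their ranges) is hypothetical. It only shows how a spec with skewed values and injected nulls can be turned into reproducible rows, and how a smaller sample can be drawn for quick local runs.

```python
import random

# Hypothetical column spec. In the approach the talk describes, a generative
# model might propose something like this; here it is written by hand.
SPEC = {
    "order_id": {"kind": "int", "min": 1, "max": 1_000_000},
    # Weighted choices give a skewed key distribution (a "variation").
    "country": {
        "kind": "choice",
        "values": ["DE", "FR", "IT", "PL"],
        "weights": [70, 15, 10, 5],
    },
    # null_rate injects missing values, another variation worth testing.
    "amount": {"kind": "float", "min": 0.0, "max": 500.0, "null_rate": 0.05},
}


def make_value(col, rng):
    """Produce one value for a column, or None at the column's null_rate."""
    if rng.random() < col.get("null_rate", 0.0):
        return None
    kind = col["kind"]
    if kind == "int":
        return rng.randint(col["min"], col["max"])
    if kind == "float":
        return round(rng.uniform(col["min"], col["max"]), 2)
    if kind == "choice":
        return rng.choices(col["values"], weights=col["weights"], k=1)[0]
    raise ValueError(f"unknown kind: {kind}")


def generate_rows(spec, n, seed=0):
    """Generate n rows as dicts; the seed makes the output reproducible."""
    rng = random.Random(seed)
    return [
        {name: make_value(col, rng) for name, col in spec.items()}
        for _ in range(n)
    ]


def sample_rows(rows, fraction, seed=0):
    """Draw a random subset (without replacement) for fast local tests."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


rows = generate_rows(SPEC, 1000, seed=42)
small = sample_rows(rows, 0.1, seed=42)
```

The resulting rows could then be loaded into a local SparkSession, for example with `spark.createDataFrame` and an explicit schema, so the job under test sees nulls and skew before it reaches a shared cluster.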