Fully Utilizing Spark for Data Validation скачать в хорошем качестве

Fully Utilizing Spark for Data Validation 4 года назад

Не удается загрузить Youtube-плеер. Проверьте блокировку Youtube в вашей сети.
Повторяем попытку...

Скачать видео с ютуб по ссылке или смотреть без блокировок на сайте: Fully Utilizing Spark for Data Validation в качестве 4k

У нас вы можете посмотреть бесплатно Fully Utilizing Spark for Data Validation или скачать в максимальном доступном качестве, видео которое было загружено на ютуб. Для загрузки выберите вариант из формы ниже:

Информация по загрузке:

Скачать mp3 с ютуба отдельным файлом. Бесплатный рингтон Fully Utilizing Spark for Data Validation в формате MP3:

Если кнопки скачивания не загрузились НАЖМИТЕ ЗДЕСЬ или обновите страницу
Если возникают проблемы со скачиванием видео, пожалуйста напишите в поддержку по адресу внизу страницы.
Спасибо за использование сервиса ClipSaver.ru

Fully Utilizing Spark for Data Validation

Data validation is becoming more important as companies have increasingly interconnected data pipelines. Validation serves as a safeguard to prevent existing pipelines from failing without notice. Currently, the most widely adopted data validation framework is Great Expectations. They have support for both Pandas and Spark workflows (with the same API). Great Expectations is a robust data validation library with a lot of features. For example, Great Expectations always keeps track of how many records are failing a validation, and stores examples for failing records. They also profile data after validations and output data documentation. These features can be very useful, but if a user does not need them, they are expensive to generate. What are the options if we need a more lightweight framework? Pandas has some data validation frameworks that are designed to be lightweight. Pandera is one example. Is it possible to use a lightweight Pandas-based framework on Spark? In this talk, we’ll show how this is possible with a library called Fugue. Fugue is an open-source framework that lets users port native Python code or Pandas code to Spark. We will show an interactive demo of how to extend Pandera (or any other Pandas-based data validation library) to a Spark workflow. There is also a deficiency in the current frameworks we will address in the demo. With big data, there is a need to apply different validation rules for each partition. For example, data that encompasses a lot of geographic regions may have different acceptable ranges of values (think of currency). Since the current frameworks are designed to apply a validation rule to the whole DataFrame, this can’t be done. Using Fugue and Pandera, we can apply different validation rules on each partition of data. Connect with us: Website: https://databricks.com Facebook: / databricksinc Twitter: / databricks LinkedIn: / databricks Instagram: / databricksinc Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here. https://databricks.com/databricks-nam...

Comments