Apache DataSketches for Big Data Analysis
Fast Data Theatre | Wednesday 20 September 2023, 16:00 - 16:30
Speaker: Charlie Dickens, Yahoo

Many businesses face queries such as counting unique identifiers, finding frequent items, and understanding data distributions. These tasks are extremely resource intensive at large scale, particularly on streaming data or for real-time analytics. Given the rapid growth in dataset sizes, this type of analysis is now crucial to organisations of all sizes, not just large enterprises.

We present Apache Software Foundation (ASF) DataSketches, a high-performance library for efficient large-scale data analysis. Using DataSketches, analysis can be performed orders of magnitude faster than brute force. The sketches are extremely small compared to the original data and can be easily integrated into data cubes for efficient aggregate analysis. Our library is distributed in both Java and C++ and also has bindings to Python. It is compatible with Druid, Cloudera, Hive, Impala, PostgreSQL, Pinot, and Iceberg, and is used by companies such as Yahoo. Our open-source library is free for any person or organisation to use.

We will introduce the audience to the notion of data sketching and detail the key wins they can expect from deploying these approaches. We will demonstrate how to use the sketches for OLAP-type queries using the Python API. Finally, we will showcase the key mergeability feature of our sketches. Using this feature, we will show how to include sketches in data cubes so that aggregate statistics can easily be found over varying time periods. This is an example of a type of analysis for which a brute-force approach simply would not scale.
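
To make the mergeability idea concrete, here is a minimal sketch in plain Python. It is not the DataSketches API and not the library's implementation: `KMVSketch` is a hypothetical toy class built on the textbook K-Minimum-Values technique, and the user IDs and `k=1024` are made up for illustration. It shows the pattern the abstract describes: keep one small sketch per day (one cell of a data cube), then merge the daily sketches to estimate distinct counts over a longer period without rescanning the raw data.

```python
import hashlib
import heapq


class KMVSketch:
    """Toy K-Minimum-Values distinct-count sketch (illustrative only)."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # negated hashes: a max-heap of the k smallest
        self._members = set()  # the same hashes, for duplicate detection

    @staticmethod
    def _hash(item):
        # Map the item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def _insert_hash(self, h):
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, item):
        self._insert_hash(self._hash(item))

    def merge(self, other):
        # The k smallest hashes of a union are always found among the
        # k smallest hashes of each part, so merging loses nothing.
        merged = KMVSketch(min(self.k, other.k))
        for h in self._members | other._members:
            merged._insert_hash(h)
        return merged

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k items: exact count
        return (self.k - 1) / (-self._heap[0])


# One sketch per day; consecutive days share half of their users.
daily = []
for day in range(3):
    sketch = KMVSketch(k=1024)
    for user_id in range(day * 10_000, day * 10_000 + 20_000):
        sketch.update(f"user-{user_id}")
    daily.append(sketch)

# Roll the daily sketches up into a 3-day total (true answer: 40000).
combined = daily[0].merge(daily[1]).merge(daily[2])
print(f"day 0 estimate: {daily[0].estimate():.0f}")
print(f"3-day estimate: {combined.estimate():.0f}")
```

Each sketch here retains at most 1024 hash values regardless of how many items it has seen, and the merged result is the same as if a single sketch had processed all three days of data. Simply adding the daily counts would overcount the users who appear on more than one day, which is why a mergeable sketch is stored in the cube instead of a plain number.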