У нас вы можете посмотреть бесплатно Accelerating Apache Spark Shuffle for Data Analytics on the Cloud w/ Remote Persistent Memory Pools или скачать в максимальном доступном качестве, видео которое было загружено на ютуб. Для загрузки выберите вариант из формы ниже:
Если кнопки скачивания не
загрузились
НАЖМИТЕ ЗДЕСЬ или обновите страницу
Если возникают проблемы со скачиванием видео, пожалуйста напишите в поддержку по адресу внизу
страницы.
Спасибо за использование сервиса ClipSaver.ru
The increasing challenge to serve ever-growing data driven by AI and analytics workloads makes disaggregated storage and compute more attractive as it enables companies to scale their storage and compute capacity independently to match data & compute growth rate. Cloud based big data services is gaining momentum as it provides simplified management, elasticity, and pay-as-you-go model. However, Spark shuffle brings performance, scalability and reliability issues in the disaggregated architecture. Shuffle is an I/O intensive operation, which will lead to performance issues if using a typical cloud provisioned volume as shuffle media. Meanwhile, the shuffle operation of different tasks may interfere with each other thus limits Spark’s scalability. Moreover, shuffle re-computation in case of compute node failure poses significant overhead for long running jobs. Disaggregating shuffle from compute node is becoming more and more important for cloud-native Spark application. In this session, we propose a new fully disaggregated shuffle solution that leverage state-of-art hardware technologies including persistent memory and RDMA. It includes: a new pluggable shuffle manager, a persistent memory based distributed storage system, a RDMA powered network library and an innovative approach to use persistent memory as both shuffle media as well as RDMA memory region to reduce additional memory copies and context switch. This new remote shuffle solution improved Spark scalability by disaggregating Spark shuffle from compute node to a high-performance distributed storage, improved spark shuffle performance with high speed persistent memory and low latency RDMA network, and improved reliability by providing shuffle data replication and fault-tolerant optimization. Experiment performance numbers will be also presented, which demonstrates up to 10x performance speedup over traditional shuffle solution and three orders of magnitude reduction in terms of shuffle read block time. About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business. Read more here: https://databricks.com/product/unifie... Connect with us: Website: https://databricks.com Facebook: / databricksinc Twitter: / databricks LinkedIn: / databricks Instagram: / databricksinc Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here. https://databricks.com/databricks-nam...