PDF Document Ingestion Accelerator for GenAI Applications
Databricks Financial Services customers in the GenAI space share a common use case: ingesting and processing unstructured documents (PDFs and images), then performing downstream GenAI tasks such as entity extraction and RAG-based knowledge Q&A. The pain points for customers with these use cases are:

- The quality of the PDF/image documents varies, since many older physical documents were scanned into electronic form.
- The complexity of the PDF/image documents varies, and many contain tables and images with embedded information, which require slower Tesseract OCR.
- Customers would like to streamline post-processing for downstream workloads.

In this talk we present an optimized Structured Streaming workflow for complex PDF ingestion. The key techniques include Apache Spark™ optimization, multi-threading, PDF object extraction, skew handling, and automatic retry logic.

Talk by: Qian Yu, Specialist Solution Architect, Databricks

Here's more to explore:
Production ready data pipelines for analytics and AI: https://www.databricks.com/solutions/...
The Big Book of Data Engineering: https://www.databricks.com/resources/...
See all the product announcements from Data + AI Summit: https://www.databricks.com/events/dat...

Connect with us:
Website: https://databricks.com
Twitter: / databricks
LinkedIn: / databricks
Instagram: / databricksinc
Facebook: / databricksinc