In this video @mehdio walks through a fun end-to-end data engineering project: getting usage insights for a Python library using Python, SQL, and DuckDB! This is the first part of the series; check the links below to learn about transformation and dashboarding with DuckDB.

🎥 Part 2 of the end-to-end data engineering project: • DuckDB & dbt | End-To-End Data Engineering...
🎥 Part 3: • DuckDB & dataviz | End-To-End Data Enginee...

☁️🦆 Start using DuckDB in the Cloud for FREE with MotherDuck: https://hubs.la/Q02QnFR40

📓 Resources
GitHub repo of the tutorial: https://github.com/mehd-io/pypi-duck-...
BigQuery performance issue with certain libraries: https://github.com/googleapis/python-...
DuckDB for beginners video: • DuckDB Tutorial For Beginners In 12 min

➡️ Follow Us
LinkedIn: / motherduck
Twitter: / motherduck
Blog: https://motherduck.com/blog/

0:00 Intro
1:06 Architecture
3:13 Ingestion Pipeline Python & DuckDB
41:08 Wrapping up & what's next

#duckdb #dataengineering #sql #python

Learn how to build a complete, end-to-end data engineering project using Python, SQL, and DuckDB. This video guides you through creating a robust Python data pipeline to ingest and analyze PyPI download statistics, providing valuable insights into any Python library's adoption. We cover the full architecture, from sourcing raw data in Google BigQuery to preparing it for transformation and visualization, making this a perfect tutorial for anyone looking to apply data engineering best practices in a real-world scenario.

We kick off the data ingestion phase by demonstrating how to query massive public datasets in BigQuery efficiently without incurring high costs, focusing on partition filtering for optimization (see the query sketch at the end of this description). You'll learn how to set up a professional development environment using Docker and VS Code dev containers, and we'll install all the necessary libraries, including the Google Cloud SDK, Pandas for data manipulation, and of course the DuckDB Python package. This setup keeps your data pipeline reproducible and isolated.

Discover Python data pipeline best practices as we structure our code for maintainability and robustness. We use Pydantic to define clear data models for our job parameters and, critically, for schema validation against the source data from BigQuery, which prevents data quality issues from breaking your pipeline downstream. We also leverage the Fire library to automatically generate a flexible command-line interface (CLI) from our Pydantic models, making the pipeline easy to parameterize and run (a sketch of this pattern follows below).

See how DuckDB acts as the powerful core of our ingestion logic. After fetching data into a Pandas DataFrame, we seamlessly load it into an in-memory DuckDB instance. This simplifies complex tasks like creating reliable test fixtures for schema validation and exporting the validated data to multiple destinations. A few simple SQL commands write the data locally, push it to a data lake on AWS S3 with efficient Hive partitioning, or load it directly into MotherDuck for a serverless cloud data warehouse experience (see the export sketch below).

By the end of this tutorial, you'll have built a fully functional raw data ingestion pipeline, ready for the next step. This video sets the foundation for our series, where we'll next use dbt and DuckDB to build the transformation layer. You'll gain practical skills in data engineering, schema management, and building efficient pipelines with modern developer tools.
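As a minimal sketch of the partition-filtered BigQuery query described above, assuming the public bigquery-public-data.pypi.file_downloads table and the google-cloud-bigquery client. The function name, selected columns, and date-range parameters are illustrative, not the repo's exact code:

```python
# Sketch: fetch PyPI download stats from BigQuery with partition filtering.
# Table/column names follow the public PyPI dataset; adjust to your project.
from google.cloud import bigquery

def fetch_pypi_downloads(pypi_project: str, start_date: str, end_date: str):
    client = bigquery.Client()
    query = """
        SELECT `timestamp`, country_code, project, file.version AS version
        FROM `bigquery-public-data.pypi.file_downloads`
        WHERE project = @pypi_project
          -- filtering on the partition column keeps scanned bytes (and cost) down
          AND `timestamp` >= TIMESTAMP(@start_date)
          AND `timestamp` < TIMESTAMP(@end_date)
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("pypi_project", "STRING", pypi_project),
            bigquery.ScalarQueryParameter("start_date", "STRING", start_date),
            bigquery.ScalarQueryParameter("end_date", "STRING", end_date),
        ]
    )
    # to_dataframe() needs the pandas/db-dtypes extras installed
    return client.query(query, job_config=job_config).to_dataframe()
```

Because the table is partitioned by day on the timestamp column, restricting the query to the requested date window is what keeps the bytes scanned (and therefore the cost) small.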
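Here is a sketch of the Pydantic + Fire pattern mentioned above: a model that validates the job parameters, exposed as a CLI. Field names, defaults, and the main entry point are illustrative rather than the repo's exact code:

```python
# Sketch: Pydantic model for job parameters, exposed as a CLI via Fire.
# Field names and defaults are illustrative.
from pydantic import BaseModel
import fire

class PypiJobParameters(BaseModel):
    pypi_project: str = "duckdb"
    start_date: str
    end_date: str
    table_name: str = "pypi_file_downloads"
    destination: str = "local"  # e.g. "local", "s3", "md"

def main(**kwargs):
    # Fire passes the CLI flags as kwargs; Pydantic validates and coerces them
    params = PypiJobParameters(**kwargs)
    print(f"Running ingestion with: {params}")
    # ... fetch from BigQuery, validate the schema, export with DuckDB ...

if __name__ == "__main__":
    fire.Fire(main)
```

You would then run the pipeline with flags derived from the model fields, for example: python pipeline.py --pypi_project=duckdb --start_date=2023-04-01 --end_date=2023-04-03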
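Finally, a sketch of the DuckDB export step: load the validated DataFrame into an in-memory instance and COPY it out to local Parquet, a Hive-partitioned S3 prefix, or MotherDuck. The bucket name, database name, and credential setup are placeholders:

```python
# Sketch: export a validated DataFrame with DuckDB.
# Bucket/database names are placeholders; S3 and MotherDuck credentials
# must be configured separately (e.g. CREATE SECRET / motherduck_token).
import duckdb
import pandas as pd

def export_data(df: pd.DataFrame, table_name: str, destinations: list[str]) -> None:
    conn = duckdb.connect()  # in-memory DuckDB instance
    # DuckDB's replacement scan lets SQL reference the DataFrame by its variable name
    conn.execute(f"CREATE TABLE {table_name} AS SELECT * FROM df")

    if "local" in destinations:
        conn.execute(f"COPY {table_name} TO '{table_name}.parquet' (FORMAT PARQUET)")

    if "s3" in destinations:
        conn.execute("INSTALL httpfs")
        conn.execute("LOAD httpfs")
        # Hive partitioning writes year=.../month=... folders under the prefix
        conn.execute(f"""
            COPY (SELECT *, year("timestamp") AS year, month("timestamp") AS month
                  FROM {table_name})
            TO 's3://my-bucket/{table_name}'
            (FORMAT PARQUET, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE 1)
        """)

    if "md" in destinations:
        # MotherDuck reads the token from the motherduck_token environment variable
        conn.execute("ATTACH 'md:my_db'")
        conn.execute(
            f"CREATE OR REPLACE TABLE my_db.{table_name} AS SELECT * FROM {table_name}"
        )
```

The same in-memory table feeds all three destinations, which is what makes it easy to fan the validated data out to local files, S3, and MotherDuck from a single pipeline run.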