In this video we fine-tune Hugging Face's SmolVLM2-500M Vision Language Model to do structured data extraction from images. Because SmolVLM2-500M is quite small in the world of LLMs/VLMs, we're able to do all of the training locally on an NVIDIA DGX Spark (see here for more: https://nvda.ws/4iQXZU4). The code should also run in Google Colab. If you have any issues, please let me know in a comment.

Links:
Google Colab Notebook - https://colab.research.google.com/dri...
GitHub - https://github.com/mrdbourke/learn-hu...
Learn Hugging Face Book Version - https://www.learnhuggingface.com/note...
Dataset - https://huggingface.co/datasets/mrdbo...
Base model (SmolVLM2-500M) - https://huggingface.co/HuggingFaceTB/...
Demo - https://huggingface.co/spaces/mrdbour...

Livestreams (where I build this project from scratch):
Part 1: Creating a VLM dataset - https://www.youtube.com/live/cZVU559B...
Part 2: Fine-tuning a VLM with LoRA and QLoRA and getting many errors (mostly my fault) - https://www.youtube.com/live/Lgcp9hBq...
Part 3: Switching from LoRA and QLoRA (we'll do these in a future video) to fine-tuning a smaller model (SmolVLM2) successfully, uploading it to the Hugging Face Hub and then creating and publishing a demo - https://youtube.com/live/cZVU559BLLM?...

Courses I teach:
Learn AI/ML (beginner-friendly course) - https://dbourke.link/ZTMMLcourse
Learn Hugging Face - https://dbourke.link/ZTMHuggingFace
Learn TensorFlow - https://dbourke.link/ZTMTFcourse
Learn PyTorch - https://dbourke.link/ZTMPyTorch

Connect elsewhere:
Download Nutrify (my startup) - https://apple.co/4ahM7Wc
My website - https://www.mrdbourke.com
X/Twitter - / mrdbourke
LinkedIn - www.linkedin.com/in/mrdbourke
Get email updates on my work - https://dbourke.link/newsletter
Read my novel Charlie Walks - https://www.charliewalks.com

Timestamps:
00:00:00 - Introduction
00:02:19 - What is a VLM?
00:03:45 - Why fine-tune your own model?
00:06:05 - LLM fine-tuning mindset
00:06:51 - Case study: Nutrify
00:09:16 - Case study: Invoice Extractor
00:11:06 - Ingredients and tools we're going to use
00:12:16 - Exploring the FoodExtract-1k-Vision dataset
00:15:52 - My setup
00:16:13 - Dataset formatting for VLMs
00:16:54 - Dataset creation for VLMs
00:17:20 - Getting a model to fine-tune
00:18:13 - Our task overview (structured data extraction from images)
00:20:11 - What we're going to end up with
00:22:38 - Code starts
00:23:31 - Viewing a single data sample
00:29:08 - Splitting our data into train and test sets
00:34:25 - Inspecting our model's architecture
00:40:03 - Reading the recipe of the SmolDocling paper
00:45:29 - Freezing the vision encoder in our model
00:47:34 - Discussing batch sizes
00:49:06 - Setting up SFTConfig
00:52:03 - Training our model with SFTTrainer
00:54:11 - Model training starts
00:54:19 - Model training finishes
00:56:13 - Inspecting our model's loss curves
00:57:10 - Uploading our trained model to Hugging Face
00:58:19 - Model uploading to Hugging Face begins
00:58:26 - Model uploading finishes
00:59:38 - Comparing the base model to the fine-tuned model
01:01:06 - Viewing our fine-tuned model's first predictions
01:03:35 - Creating a demo with Gradio
01:06:46 - Uploading our demo to the Hugging Face Hub
01:07:35 - Trying out our demo
01:08:27 - What's next and extensions
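A core step covered in the video ("Dataset formatting for VLMs") is wrapping each image/target pair in the chat-message format that VLM chat templates and TRL's SFTTrainer expect. A minimal sketch of that formatting step, assuming a made-up instruction string and example JSON target (the actual prompt and dataset fields come from the video's FoodExtract-1k-Vision dataset, not from here):

```python
def format_sample(image, target_json: str, instruction: str) -> dict:
    """Wrap one (image, structured-output) training pair in the
    chat "messages" format commonly used for VLM fine-tuning with
    Hugging Face chat templates and TRL's SFTTrainer.

    `image` would be a PIL.Image in practice; the user turn carries
    the image plus a text instruction, and the assistant turn carries
    the structured JSON the model should learn to produce.
    """
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image},
                    {"type": "text", "text": instruction},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": target_json}],
            },
        ]
    }


# Hypothetical example values, purely for illustration:
sample = format_sample(
    image=None,  # stand-in for a real PIL.Image
    target_json='{"food": "apple", "calories_per_100g": 52}',
    instruction="Extract the nutrition information from this image as JSON.",
)
print(sample["messages"][0]["role"])   # user
print(sample["messages"][1]["role"])   # assistant
```

Once every row is in this shape, a processor's chat template can render it into model-ready tokens, which is what lets SFTTrainer treat the VLM task like ordinary supervised fine-tuning.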