Mini Batch SGD | Loss Curve | Parameter Update in Neural Networks (Lecture 2)
Welcome to the second lecture of my Advanced Deep Learning Concepts series! 🧠🚀 In this lecture, we take the mathematical foundation of SGD we built last time and apply it to the real-world constraints of hardware and time. We tackle a question that confuses many beginners: why does my loss graph look so noisy, and is that a bad thing?

The goal of this video is to master **Mini-Batch Stochastic Gradient Descent**. While standard Batch Gradient Descent provides a smooth path to the minimum, it is computationally infeasible for large datasets. We bridge the gap between theory and practice by manually implementing batching in PyTorch and visualizing the resulting loss curves. We dive deep into the trade-off between memory efficiency and gradient precision, using animations to show exactly how the weights jitter their way toward convergence.

In this video, we cover:
✅ The Loss Curve: We move from the "Loss Landscape" to the "Loss Curve." We analyze the famous "hockey stick" shape and explain why a steep initial drop in loss is followed by a plateau.
✅ Batch vs. Mini-Batch: We visualize the difference between calculating gradients on the entire dataset (smooth, exact) and on a subset (noisy, approximate), and explain why GPU memory limits make the latter necessary.
✅ Visualizing the "Jitter": Using custom 2D and 3D animations, we demonstrate why Mini-Batch SGD produces a noisy, zig-zagging descent. You will understand that this noise is the approximation error of the gradient.
✅ Steps vs. Epochs: We clear up a major point of confusion: the distinction between a "step" (one parameter update) and an "epoch" (one full pass over the dataset). We explain why mini-batch training produces significantly more update steps per epoch.
✅ PyTorch Implementation: We don't reach for `DataLoader` magic yet. We manually implement batch selection using `torch.randperm` to truly understand how data is shuffled and fed into the network during training.
✅ Selecting Batch Size: We discuss why powers of 2 (32, 64, 128) are standard, reference the historic AlexNet (batch size 256), and explain the "fill your GPU memory" rule of thumb.

By the end of this lecture, you will understand that while Batch GD gives exact gradients, Mini-Batch SGD gives us the speed and efficiency Deep Learning requires. You will learn to embrace the noise!

Resources:
🔗 GitHub Repository (Code & Notes): https://github.com/gautamgoel962/Yout...
🔗 Follow me on Instagram: / gautamgoel978

Subscribe and hit the bell icon! 🔔 In the next video, we will explore the "heartbeat" of neural networks: **Learning Rate Schedules**. We will see what happens when the learning rate is too low, too high, or decays over time. Let's optimize! 📉⚡

#deeplearning #machinelearning #artificialintelligence #ai #datascience #python #pytorch #gradientdescent #stochasticgradientdescent #sgd #neuralnetworks #backpropagation #lossfunction #optimizer #minibatch #modeltraining #datascienceeducation #100daysofcode #learnai #buildinpublic #machinelearningengineer #aitrends #techeducation #stemeducation #codenewbie #coding #computerscience #gpucomputing #tech
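The manual batching described above can be sketched in plain Python. This is a minimal sketch, not the video's exact code: `random.shuffle` on an index list stands in for `torch.randperm`, the dataset size (10) and batch size (4) are toy numbers chosen for illustration, and the forward/backward/optimizer calls are elided as comments.

```python
import random

n_samples, batch_size = 10, 4          # toy numbers for illustration
data = list(range(n_samples))          # stand-in for a tensor of training samples

# Shuffle the indices once per epoch.
# PyTorch equivalent: indices = torch.randperm(n_samples)
indices = list(range(n_samples))
random.shuffle(indices)

steps = 0
for start in range(0, n_samples, batch_size):
    batch_idx = indices[start:start + batch_size]
    batch = [data[i] for i in batch_idx]   # PyTorch equivalent: data[batch_idx]
    # forward pass, loss, loss.backward(), optimizer.step() would go here
    steps += 1

print(steps)  # 3 update steps in one epoch (ceil(10 / 4)), vs. 1 for full-batch GD
```

Note that the last batch is smaller (2 samples here): shuffling indices and slicing guarantees every sample is seen exactly once per epoch, while each gradient is computed from only a subset, which is where the "jitter" comes from.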
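To make the "significantly more update steps" point concrete, here is a back-of-the-envelope calculation. The figure 1,281,167 is the standard ImageNet-1k training-set size (the dataset AlexNet was trained on); it is brought in here for illustration and is not a number from the video.

```python
import math

dataset_size = 1_281_167  # ImageNet-1k training images (illustrative)
for bs in (32, 64, 128, 256):
    # Each epoch needs ceil(dataset_size / bs) parameter updates
    print(bs, math.ceil(dataset_size / bs))

# Full-batch gradient descent: exactly 1 update per epoch.
# Mini-batch at AlexNet's batch size of 256: 5005 updates per epoch.
```

This is why a mini-batch loss curve has thousands of points per epoch while a full-batch curve has one, and why the former looks so much noisier when plotted per step.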