Read full article here: https://binaryverseai.com/self-distil...

Fine-tuning an LLM can feel like doing surgery with oven mitts. You ship a new skill, then discover you accidentally erased an old one. In this video, we break down Self-Distillation Fine-Tuning (SDFT), an on-policy approach that lets models keep learning without the usual catastrophic forgetting.

You'll learn:
- Why off-policy supervised fine-tuning (SFT) fails in sequential updates
- How Self-Distillation uses a demo-conditioned "teacher" to correct a "student" on its own trajectories
- What the results mean for continual learning, agent training, and real-world updates
- When to choose weight updates vs. retrieval, including LLM fine-tuning vs. RAG
- Practical engineering details: rollouts, teacher stability, logging, and failure modes (a minimal code sketch of the core update is included at the end of this description)

If you're building agents, shipping sequential model updates, or trying to add knowledge without regressions, this is the clean mental model and workflow to keep in your toolkit.

Chapters:
00:00 Intro: The On-Policy Cure
00:13 The Problem: Fine-Tuning with Oven Mitts
00:54 The Symptom: Catastrophic Forgetting
01:45 The Root Cause: Off-Policy Trajectories
03:10 The Solution: Self-Distillation Fine-Tuning (SDFT)
03:39 Methodology: Student vs. Teacher Roles
04:39 The Mechanism: Step-by-Step Correction
05:25 Analogy: The Golf Coach vs. Video
05:55 Safety Rails: Measuring Drift (Nats)
07:22 Sequential Learning: The Triple Threat Experiment
08:20 Injecting Knowledge: The 2025 Disasters Report
09:30 Comparison: SDFT vs. RAG Systems
10:35 Reasoning: Preserving the "Think" Trace
11:58 The Landscape: The Demo-Only Middle Ground
12:49 Engineering: The Three-Loop Architecture
14:02 Implementation: Teacher Stability & Logging
14:55 Philosophy: Detaching the Training Wheels
15:45 Vision: Recursive Self-Improvement
16:50 Diagnosis: When to Prescribe SDFT
17:18 Conclusion: Fix the Policy

If you found this useful, subscribe for more practical deep dives on LLM training, continual learning, and deployment tradeoffs. Drop a comment with your setup: are you doing SFT, RL, or experimenting with Self-Distillation in production?
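Since the video walks through the student/teacher correction mechanism step by step, here is a minimal sketch of what one on-policy SDFT update can look like: the student samples its own trajectory, a frozen snapshot of the same model conditioned on a demonstration scores that trajectory, and the student is pulled toward the demo-conditioned distribution token by token. This is an illustrative assumption, not the paper's reference implementation: `student`, `frozen_teacher`, `sdft_step`, and the simple demo-prefix conditioning are all stand-in names, and it assumes HuggingFace-style causal LMs with single-sequence batches and no padding.

```python
import torch
import torch.nn.functional as F

def sdft_step(student, frozen_teacher, optimizer,
              prompt_ids, demo_ids, max_new=64):
    """One hedged sketch of an on-policy self-distillation update.

    student: trainable causal LM; frozen_teacher: a frozen snapshot of
    the same weights, used only with the demonstration prepended.
    Both are assumed to be HuggingFace-style models returning .logits.
    """
    # 1. On-policy rollout: the student samples its own continuation,
    #    so the training targets live on trajectories it actually visits.
    with torch.no_grad():
        rollout = student.generate(prompt_ids,
                                   max_new_tokens=max_new,
                                   do_sample=True)
    gen = rollout[:, prompt_ids.shape[1]:]  # the student's own tokens
    n_gen = gen.shape[1]

    # 2. Teacher view: the frozen model sees demo + prompt + rollout,
    #    so its next-token distribution is conditioned on the demo.
    teacher_in = torch.cat([demo_ids, prompt_ids, gen], dim=1)
    with torch.no_grad():
        t_logits = frozen_teacher(teacher_in).logits[:, -n_gen - 1:-1, :]

    # 3. Student view: the same rollout, but without the demonstration.
    student_in = torch.cat([prompt_ids, gen], dim=1)
    s_logits = student(student_in).logits[:, -n_gen - 1:-1, :]

    # 4. Token-level KL(teacher || student). Summing over the vocab and
    #    averaging over positions gives mean nats per token.
    kl = F.kl_div(F.log_softmax(s_logits, dim=-1),
                  F.log_softmax(t_logits, dim=-1),
                  log_target=True, reduction="none").sum(-1).mean()
    optimizer.zero_grad()
    kl.backward()
    optimizer.step()
    return kl.item()  # mean nats per token: log this as the drift metric
```

In this sketch, the same KL term that drives the update doubles as the safety rail from the "Measuring Drift (Nats)" chapter: if the returned nats-per-token number spikes between updates, the student is wandering away from its demo-conditioned teacher and it's time to stop and inspect the rollouts.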