TLDR: LLMs hallucinate because we optimize them to answer, not to know when not to answer. Shift incentives toward calibrated confidence and abstention, and hallucinations drop.

We train models to imitate language, then grade them with pass/fail leaderboards where guessing beats “I don’t know.” If abstaining is punished, models learn to bluff: confident, fluent, and wrong. That’s not a mystery; it’s a measurement problem.

Make uncertainty first-class. Set a confidence target (e.g., only answer when ≥75% sure) and penalize being confidently wrong more than abstaining. Report correctness when the model chooses to answer (precision) alongside how often it answers (coverage). Reliability becomes the product, not just raw accuracy.

If you’re shipping LLM features (see the sketches below):
- Add a one-line confidence target to instructions.
- Build IDK/clarify paths into the UX.
- Track reliability at fixed precision and show answer/IDK curves.
- Favor abstention-aware training/evals so the model learns trust, not just test-taking.

Big idea: change the objective and the behavior follows. You don’t need a tamer model; you need a smarter scoreboard.

Original paper: https://cdn.openai.com/pdf/d04913be-3...

#LLMs #Evaluation #Reliability #AIAgents #MLOps #AIProduct
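A minimal scoring sketch for the confidence-target idea above, assuming exact-match grading, an IDK sentinel string, and a wrong-answer penalty of t/(1-t) at target t so that guessing only pays off when the model is more than t confident. Function and field names here are illustrative, not from the video or paper.

```python
# Sketch: abstention-aware scoring with a confidence target.
# Assumptions: IDK sentinel, exact-match correctness, penalty t/(1-t).

from dataclasses import dataclass

IDK = "IDK"  # abstention marker (hypothetical convention)

@dataclass
class EvalResult:
    precision: float   # fraction correct among answered questions
    coverage: float    # fraction of questions the model chose to answer
    mean_score: float  # average abstention-aware score per question

def score_run(predictions: list[str], labels: list[str], target: float = 0.75) -> EvalResult:
    penalty = target / (1.0 - target)   # e.g. 3.0 points at a 75% target
    answered, correct, total = 0, 0, 0.0
    for pred, gold in zip(predictions, labels):
        if pred == IDK:
            continue                    # abstain: 0 points, no penalty
        answered += 1
        if pred == gold:
            correct += 1
            total += 1.0                # correct answer: +1 point
        else:
            total -= penalty            # confident mistake: -t/(1-t) points
    n = len(labels)
    return EvalResult(
        precision=correct / answered if answered else 0.0,
        coverage=answered / n if n else 0.0,
        mean_score=total / n if n else 0.0,
    )

# Example: one abstention and one confident mistake at a 75% target.
print(score_run(["Paris", IDK, "1789", "42"], ["Paris", "Rome", "1789", "7"]))
```

With this rule, bluffing drags the score down faster than abstaining does, which is the incentive shift the talk argues for.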
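The answer/IDK curve can be tracked by sweeping an abstention threshold over per-answer confidences. A hedged sketch, assuming each answer comes with a confidence in [0, 1] (e.g., from logprobs or a verbalized estimate); the threshold grid and names are illustrative.

```python
# Sketch of an answer/IDK curve: sweep an abstention threshold tau and
# record coverage (how often the model answers) vs. precision (how often
# those answers are right). Assumes per-answer confidences are available.

def answer_idk_curve(confidences: list[float], is_correct: list[bool],
                     thresholds: list[float]) -> list[tuple[float, float, float]]:
    n = len(confidences)
    points = []
    for tau in thresholds:
        kept = [ok for c, ok in zip(confidences, is_correct) if c >= tau]
        coverage = len(kept) / n if n else 0.0
        precision = sum(kept) / len(kept) if kept else 1.0  # vacuous when fully abstaining
        points.append((tau, coverage, precision))
    return points

# Example: pick the lowest tau whose precision clears your reliability bar.
curve = answer_idk_curve(
    confidences=[0.95, 0.80, 0.60, 0.55, 0.30],
    is_correct=[True, True, False, True, False],
    thresholds=[0.0, 0.5, 0.75, 0.9],
)
for tau, cov, prec in curve:
    print(f"tau={tau:.2f}  coverage={cov:.2f}  precision={prec:.2f}")
```

Reporting the whole curve, rather than a single accuracy number, is what makes "reliability at fixed precision" a concrete product metric.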