Computer Science Seminar Series
January 15, 2026

“Making Robust AI Safeguards Run Deep”
Stephen Casper, Massachusetts Institute of Technology

In 2025, frontier AI developers started warning that their AI systems were beginning to cross risk thresholds for dangerous cyber, chemical, and biological capabilities. This is unfortunate given that closed-weight AI systems remain persistently vulnerable to prompt-injection attacks and open-weight systems remain persistently vulnerable to malicious fine-tuning. Reinforcement learning from human feedback and refusal training aren’t enough. This presentation will focus on adversarial attacks that target model internals and their uses for making frontier AI safeguards “run deep.” In particular, we will focus on what technical tools can help us make open-weight AI systems safer. Along the way, we will discuss what AI safety can learn from the design of lightbulbs and why you should keep a close eye on Arkansas Attorney General Tim Griffin in 2026.

Stephen "Cas" Casper is a final-year PhD student at the Massachusetts Institute of Technology in the Algorithmic Alignment Group, where he is advised by Dylan Hadfield-Menell. Casper leads a research stream for the MATS Program and mentors for ERA and GovAI. He is also a writer for the International AI Safety Report and the Singapore Consensus on Global AI Safety Research Priorities. Casper's research focuses on AI safeguards and governance, with publications in the Conference on Neural Information Processing Systems; the Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence; Nature; the ACM Conference on Fairness, Accountability, and Transparency; the Conference on Empirical Methods in Natural Language Processing; the Institute of Electrical and Electronics Engineers Conference on Secure and Trustworthy Machine Learning; Transactions on Machine Learning Research; and the annual conference of the International Association for Safe and Ethical AI, as well as in a number of workshops and over 20 press articles and newsletters.