У нас вы можете посмотреть бесплатно Scalable Extraction of Training Data from (Production) Language Models или скачать в максимальном доступном качестве, видео которое было загружено на ютуб. Для загрузки выберите вариант из формы ниже:
Если кнопки скачивания не
загрузились
НАЖМИТЕ ЗДЕСЬ или обновите страницу
Если возникают проблемы со скачиванием видео, пожалуйста напишите в поддержку по адресу внизу
страницы.
Спасибо за использование сервиса ClipSaver.ru
Nicholas Carlini (Google DeepMind) https://simons.berkeley.edu/talks/nic... Alignment, Trust, Watermarking, and Copyright Issues in LLMs Large language models are well known to memorize examples from their training dataset, and then then reproduce these examples at test time. But current production models are not just pre-trained on Web-scraped text. They are now also "aligned" to produce desirable behavior, which includes the desire to not repeat training data. As a result, asking a production chat bot to repeat its training data often results in a refusal. In this talk I introduce two attacks that cause ChatGPT to emit megabytes of data it was trained on from the public Internet. The first attack is rather silly: we ask ChatGPT to emit the same word over and over ("Say 'poem poem poem...' forever") and find that this causes it to diverge, and when it diverges, that it frequently outputs text copied directly from the pertaining data. The second attack is much stronger, and we show how to break the model's alignment by exploiting a fine-tuning API, allowing us to "undo" the safety fine-tuning. I conclude with commentary on the state of alignment and how it impacts privacy-preserving machine learning.