How to Train Data-Efficient LLMs
The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consumption. We seek to understand the tradeoffs associated with data selection routines based on (i) expensive-to-compute data-quality estimates, and (ii) maximization of coverage- and diversity-based measures in the feature space. Our first technique, ASK-LLM, leverages the zero-shot reasoning capabilities of instruction-tuned LLMs to directly assess the quality of a training example. To target coverage, we propose DENSITY sampling, which models the data distribution to select a diverse sample. In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that ASK-LLM and DENSITY are the best methods in their respective categories. Coverage sampling can recover the performance of the full data, while models trained on ASK-LLM data consistently outperform full-data training, even when we reject 90% of the original dataset, while converging up to 70% faster.
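
The abstract does not spell out the mechanics of either sampler, but the two ideas can be illustrated with minimal sketches. First, an ASK-LLM-style quality score: prompt an instruction-tuned proxy LLM with a training example plus a yes/no quality question, and use the softmax probability of the "yes" token as the score. The sketch below assumes a HuggingFace causal LM; the model name (`gpt2`, which is not actually instruction-tuned) and the prompt wording are illustrative placeholders, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the paper uses instruction-tuned proxy LLMs

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# Paraphrased quality prompt; the exact wording in the paper differs.
PROMPT_TEMPLATE = (
    "###\n{example}\n###\n"
    "Does the previous paragraph contain informative signal for "
    "pre-training a large language model? Answer yes or no.\nAnswer:"
)

@torch.no_grad()
def ask_llm_score(example: str) -> float:
    """Return P('yes') for the next token as the example's quality score."""
    prompt = PROMPT_TEMPLATE.format(example=example)
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    next_token_logits = model(**inputs).logits[0, -1]
    probs = torch.softmax(next_token_logits, dim=-1)
    yes_id = tokenizer.encode(" yes", add_special_tokens=False)[0]
    return probs[yes_id].item()

# Keep the highest-scoring 10% of the corpus (mirroring the 90% rejection rate).
corpus = ["Example document one ...", "Example document two ..."]
scores = [ask_llm_score(doc) for doc in corpus]
keep = sorted(range(len(corpus)), key=lambda i: -scores[i])[: max(1, len(corpus) // 10)]
```

Second, a DENSITY-style coverage sampler. One way to "model the data distribution to select a diverse sample" is inverse-propensity sampling: estimate each example's local density in embedding space with a kernel sum, then sample examples with probability inversely proportional to that density, so sparse regions of the feature space are covered. The brute-force O(n^2) kernel sum below is purely for illustration; scaling this to pre-training corpora requires sketching or approximate methods, and the bandwidth is an assumed hyperparameter.

```python
import numpy as np

def density_sample(embeddings: np.ndarray, k: int, bandwidth: float = 1.0) -> np.ndarray:
    """Sample k indices with probability inversely proportional to a
    Gaussian kernel density estimate, favoring under-represented regions."""
    rng = np.random.default_rng(0)
    # Pairwise squared distances; O(n^2) memory, illustration only.
    sq_dists = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    density = np.exp(-sq_dists / (2.0 * bandwidth**2)).sum(axis=1)
    weights = 1.0 / density
    weights /= weights.sum()
    return rng.choice(len(embeddings), size=k, replace=False, p=weights)
```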