Large Language Models Are State-of-the-Art Evaluators of Translation Quality

Uploaded 4 months ago


Title: Large Language Models Are State-of-the-Art Evaluators of Translation Quality
Source: https://arxiv.org/pdf/2302.14520

Summary: This paper introduces GEMBA (GPT Estimation Metric Based Assessment), a GPT-based metric for translation quality assessment that achieves state-of-the-art accuracy. GEMBA works both with and without a human reference translation, so it can serve as a reference-based quality metric or as a reference-less quality estimation method. The research demonstrates that Large Language Models (LLMs), specifically GPT-3.5 and larger models, can follow zero-shot prompts for this task, with GPT-4 performing best. Evaluated on the WMT22 Metrics shared task data, GEMBA with GPT-4 (GEMBA-GPT4-DA) set a new state of the art in system-level pairwise accuracy for reference-based assessment (89.8%) and strongly outperformed other reference-less metrics in quality estimation mode (87.6%).

The study experimented with four prompt variants: Direct Assessment (DA), Scalar Quality Metrics (SQM), Stars Ranking, and Quality Classes. The least constrained templates often yielded the best results. While GEMBA shows exceptional system-level performance, its segment-level results are slightly behind those of top-performing metrics, potentially because its scores are discrete and ties are frequent. The authors have publicly released their code and prompt templates to encourage external validation and reproducibility. The work shows the potential of pre-trained, generative LLMs for automated machine translation evaluation.

#LargeLanguageModels #LLMs #GPT #MachineTranslation #TranslationQualityAssessment #TranslationEvaluation #NLP #ArtificialIntelligence #AIResearch #StateOfTheArt #WMT22 #GEMBA #QualityEstimation #ZeroShotLearning #GPT4 #ChatGPT #MicrosoftTranslator #AutomatedEvaluation #LanguageModels #MachineLearning
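
To make the "system-level pairwise accuracy" figure more concrete: as the measure is commonly described, it is the share of system pairs that the metric ranks in the same order as the human judgments. The sketch below is a minimal illustration of that idea, not the authors' evaluation code. The system names and scores are made up, and treating a tie as a disagreement is an assumption of this sketch.

```python
from itertools import combinations


def pairwise_accuracy(metric_scores, human_scores):
    """Share of system pairs ordered the same way by the metric and by humans.

    Both arguments map a system name to its system-level score.
    Assumption of this sketch: a tie in either score counts as a disagreement.
    """
    systems = sorted(metric_scores)
    agree = 0
    total = 0
    for a, b in combinations(systems, 2):
        metric_delta = metric_scores[a] - metric_scores[b]
        human_delta = human_scores[a] - human_scores[b]
        total += 1
        # Same sign on both differences means both rank the pair the same way.
        if metric_delta * human_delta > 0:
            agree += 1
    if total == 0:
        raise ValueError("need at least two systems to form a pair")
    return agree / total


# Hypothetical scores for four MT systems (illustrative only).
toy_metric = {"sysA": 81.2, "sysB": 77.5, "sysC": 79.0, "sysD": 70.1}
toy_human = {"sysA": 85.0, "sysB": 80.0, "sysC": 78.0, "sysD": 60.0}

print(f"{pairwise_accuracy(toy_metric, toy_human):.3f}")
```

In this toy data the metric and the human scores disagree only on the sysB/sysC pair, so five of the six pairs agree. The tie rule also hints at why discrete scores can hurt: a metric that often assigns identical scores has fewer pairs it can rank correctly.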
