Empowering Low-Resource Languages Through Technology | Voices of the Industry Ep10 w/Felipe Sánchez
In this episode of “Voices of the Industry” by the AI Localization Think Tank, Belén interviews Felipe Sánchez Martínez, associate professor at the University of Alicante, about building machine translation (MT) for low-resource languages and about how the field has moved from rule-based to statistical, hybrid, and neural approaches.

Felipe explains how neural MT enables transfer learning and multilingual systems. He also highlights key data challenges: scarce parallel corpora, inconsistent orthography, and the difficulty of crawling usable web data. He describes work on predicting language and parallelism from URLs to guide crawling. He warns that much online text may be MT output, so synthetic data must be detected and handled carefully.

He also discusses community-driven data creation for Mayan languages in Guatemala, including terminology agreement, guidelines, review workflows, and scanning/OCR hurdles. Finally, he outlines a new project funded by the Spanish government that uses LLMs for low-resource translation. The project includes leveraging unstructured resources such as grammar books and releasing its outputs as open source.

00:00 Welcome and Guest Intro
01:05 Felipe Background in MT
03:30 Why Low Resource Matters
05:32 Crawling and Filtering Data
07:44 Mayan Languages Fieldwork
11:21 Finding Translators Partners
12:40 Detecting Machine Translations
16:24 LLMs and Creativity Gap
21:04 New Funded Research Project
24:54 Teaching LLMs with Grammars
27:42 Wrap Up and Thanks

➡️ Felipe Sánchez LinkedIn Profile: / felipe-s%c3%a1nchez-mart%c3%adnez-5817037a
➡️ Link to Felipe’s research: https://www.dlsi.ua.es/~fsanchez/
➡️ Link to Transducens Project website: https://transducens.github.io/ai-tralow/

👉 Subscribe to the AI Localization Think Tank channel and newsletter for more conversations like this.
📢 Join the discussion on LinkedIn and tell us: What do you think about the data challenge for low-resource languages?