A video that explores how the MuZero algorithm combines aspects of Reinforcement Learning and Monte Carlo Tree Search to play efficiently. The animations in the video make use of a spreadsheet that acts as a worked example of the calculations involved in the algorithm. Formulas discussed include the Upper Confidence Bound (for Trees, also called UCB or UCT).

MuZero Tree Search Spreadsheet:
https://bit.ly/MuZeroSheetCopy (if you want to mess with the values yourself)
https://bit.ly/MuZeroSheetView (if you just want to inspect)

Playlist implementing other neural networks in a spreadsheet:
• Neural Networks in Spreadsheets

Reinforcement Learning explained by MIT:
• MIT 6.S191 (2022): Reinforcement Lear...

MuZero Paper and pseudocode:
https://arxiv.org/pdf/1911.08265
https://arxiv.org/src/1911.08265v2/an...

MuZero talk from one of the authors, Julian Schrittwieser:
• MuZero - ICAPS 2020
Slides: https://drive.google.com/file/d/1nwRR...

Monte Carlo Tree Search Visualization by Vinícius Garcia:
• Monte Carlo Tree Search - Tic-Tac-Toe...
https://vgarciasc.github.io/mcts-viz/

Other Articles on Reinforcement Learning:
/ how-to-train-ai-agents-to-play-multiplayer...
https://dev.to/satwikkansal/a-gentle-...

Other Articles/Videos on Monte Carlo Tree Search (MCTS):
https://towardsdatascience.com/monte-...
http://jeffbradberry.com/posts/2015/0...
/ monte-carlo-tree-search-applied-to-letterp...

Animation by J O:
• Monte Carlo Tree Search animation - R...

Reinforcement Learning and RTS games by Edan Meyer:
• Reinforcement Learning in RTS Games

Image Credits:
/ 1
https://www.deepmind.com/research/hig...

Narration by James K.
Script and visualizations by Kaylee L.

0:00 - Why is MuZero important?
1:05 - Outline/Overview
1:50 - MuZero's three neural networks
3:11 - Key Vocabulary Terms - state, reward, policy, action, value
4:50 - Assumptions for Tree Search Example
5:18 - What will Tree Search find? (The best action)
5:38 - Tree Search Setup
6:55 - Tree Search Begins: Selection, Expansion & Evaluation, Tree Update
11:37 - Upper Confidence Bound (UCB) Formula Explained
14:12 - First Winning Move Discovered!
15:03 - Tree Search Value Explained
18:45 - Using Tree Search Results to Form Policy and Select Action
20:30 - Collecting Training Data
22:16 - Unrolling Representation, Dynamics, and Prediction Networks for Training
24:26 - Playing Better with Trained Neural Networks
26:13 - Recap

Special Thanks to Robbie Close and Pat Berard for excellent feedback on initial drafts.

P.S. I was wondering where the UCB constant 19652 came from. I emailed the primary author and got the following response: "19652 was found using automatic hyperparameter optimization. The exact value is not critical, i.e. 19000 or 20000 would work just as well." I looked more closely at Appendix C (Hyperparameters) and saw that they did this tuning for AlphaZero and just used the same UCB constants for MuZero.
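For readers who want to see where the 19652 constant sits in the formula, below is a minimal Python sketch of the pUCT selection score described in the MuZero paper's search appendix. The function name puct_score and its argument names are my own, and c_init = 1.25 is the companion constant reported alongside c_base = 19652 in the paper; treat this as an illustration, not the authors' reference implementation.

import math

# Constants from the paper's UCB formula; the P.S. above explains where
# 19652 (c_base) came from, and c_init = 1.25 is its paired constant.
C_BASE = 19652
C_INIT = 1.25

def puct_score(parent_visits, child_visits, child_value, child_prior):
    # Exploration weight grows slowly (logarithmically) with parent visits.
    c = C_INIT + math.log((parent_visits + C_BASE + 1) / C_BASE)
    # Prior-weighted exploration bonus that shrinks as this child is visited more.
    exploration = c * child_prior * math.sqrt(parent_visits) / (1 + child_visits)
    # The Selection step follows the child with the highest total score.
    return child_value + exploration

During Selection, the child maximizing this score is followed, which is why a rarely visited move with a high prior from the prediction network still gets explored.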