TinyRL: Can AI Learn to Swing Up a Real Pendulum? | DigiKey
Reinforcement learning (RL) is a form of machine learning in which an agent is trained to interact with an environment so as to maximize cumulative reward. In this video, we teach an AI to swing up a pendulum using real hardware and RL. A write-up of the project can be found here: https://www.digikey.com/en/maker/proj...

An RL agent learns to interact with its environment through trial and error. Shawn creates an interface in Arduino that can control a stepper motor and read the position of an encoder attached to the pendulum. The goal is to train an agent to learn to swing up the pendulum on its own.

Intro to Reinforcement Learning video: • Introduction to Reinforcement Learning | D...
Hyperparameter Optimization video: • Hyperparameter Optimization for Reinforcem...

To accomplish this, the Arduino is connected to a computer running the Farama Gymnasium and Stable Baselines3 frameworks. These frameworks take in the observations, have the agent choose an action, and tell the Arduino what action to take. The agent is updated using the proximal policy optimization (PPO) algorithm found in Stable Baselines3.

Initially, Shawn tried to perform a full swing-up and balance with a continuous action set. However, this proved too difficult for the agent: the round trip to and from the Arduino, together with model updates, took too long to successfully balance the pendulum. To reduce the scope, the action set was made discrete (+10 deg, 0 deg, -10 deg), and the episode ended when the pendulum reached the top below a particular speed. If the pendulum moved too fast near the top, it was considered to have “crashed,” and a penalty was applied.

Once the agent successfully learned how to perform the swing-up, it was deployed to the Arduino. To perform the deployment, the critic portion of the actor-critic model in the PPO agent was stripped away, and the remaining actor model (3-layer dense neural network) was optimized using Edge Impulse.
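As a rough illustration of what actor-only inference can look like once the critic is removed, here is a minimal sketch in plain Python. It assumes a small dense network with tanh hidden activations and three output logits, one per discrete action. The layer sizes, the two-value observation, and the random weights are placeholders and do not come from the project; real weights would come from the trained PPO policy, and the Edge Impulse optimization step is not shown.

```python
import math
import random


def dense(x, weights, biases):
    """One fully connected layer: returns W @ x + b as a list."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def actor_forward(obs, layers):
    """Run the observation through the actor; tanh on all but the last layer."""
    h = list(obs)
    for i, (weights, biases) in enumerate(layers):
        h = dense(h, weights, biases)
        if i < len(layers) - 1:  # hidden layers only (assumed activation)
            h = [math.tanh(v) for v in h]
    return h  # raw logits, one per discrete action


def greedy_action(logits):
    """Pick the index of the largest logit (no sampling at inference time)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def random_layer(n_in, n_out, rng):
    """Placeholder weights; a real deployment would load trained values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
OBS_SIZE = 2   # hypothetical: pendulum angle and angular velocity
HIDDEN = 16    # hypothetical hidden width
layers = [
    random_layer(OBS_SIZE, HIDDEN, rng),
    random_layer(HIDDEN, HIDDEN, rng),
    random_layer(HIDDEN, 3, rng),  # 3 logits: -10 deg, 0 deg, +10 deg
]

logits = actor_forward([0.1, -0.3], layers)
action = greedy_action(logits)
```

Taking the largest logit instead of sampling makes the deployed policy deterministic, which is a common choice for inference but is an assumption here.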
The model was then deployed to an ESP32S3 to perform the swing-up without any input from the computer.

Product Links:
STEVAL-EDUKIT01 - https://www.digikey.com/en/products/d...
Seeed Studio XIAO ESP32S3 - https://www.digikey.com/en/products/d...

Related Videos:
• Introduction to Reinforcement Learning | D...
• Exploring Reinforcement Learning: Can AI L...

Related Project Links:
https://www.digikey.com/en/maker/proj...
https://www.digikey.com/en/maker/proj...

Learn more:
Maker.io - https://www.digikey.com/en/maker
DigiKey’s Blog – TheCircuit https://www.digikey.com/en/blog
Connect with DigiKey on Facebook / digikey.electronics
And follow us on X (formerly Twitter) / digikey

00:00 - Introduction
01:10 - Hardware overview
03:00 - Modifying the pendulum tower
04:20 - Arduino communication interface
04:49 - Overview of reinforcement learning
06:17 - Reward function
08:32 - Agent actor-critic deep neural network
09:33 - Hyperparameter optimization overview
09:51 - Agent training with Python
14:57 - Troubleshooting an agent that does not learn
16:46 - Reduce scope to just swing up and use discrete action space
18:03 - Train simpler agent
18:22 - Deploy agent to ESP32
19:56 - Test agent on the pendulum
20:46 - Conclusion and further areas of research
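The reduced-scope task described above (three discrete actions, success on reaching the top below a speed limit, a penalty for arriving too fast) can be sketched as a small step-evaluation function. This is only a sketch: every threshold, the reward values, the distance-based shaping term, and the choice to end the episode on a crash are assumptions, since the description does not give them. The angle convention (0 degrees hanging down, 180 degrees upright) is also assumed.

```python
from dataclasses import dataclass

# Discrete action set from the description: action index -> stepper move (deg).
ACTION_TO_DEGREES = {0: -10.0, 1: 0.0, 2: 10.0}

# Hypothetical values, not taken from the project.
TOP_WINDOW_DEG = 10.0      # how close to upright counts as "at the top"
MAX_TOP_SPEED_DPS = 60.0   # max angular speed (deg/s) for a clean arrival
SUCCESS_REWARD = 100.0
CRASH_PENALTY = -100.0


@dataclass
class StepResult:
    reward: float
    terminated: bool
    crashed: bool


def angle_from_top(theta_deg: float) -> float:
    """Absolute angular distance from upright (180 deg), in [0, 180]."""
    return abs((theta_deg % 360.0) - 180.0)


def evaluate_step(theta_deg: float, speed_dps: float) -> StepResult:
    """Score one time step from the encoder angle and angular speed."""
    dist = angle_from_top(theta_deg)
    if dist <= TOP_WINDOW_DEG:
        if abs(speed_dps) <= MAX_TOP_SPEED_DPS:
            # Reached the top slowly enough: success, episode ends.
            return StepResult(SUCCESS_REWARD, True, False)
        # Too fast near the top: treated as a "crash" (ending is assumed).
        return StepResult(CRASH_PENALTY, True, True)
    # Assumed shaping: small reward that grows as the pendulum nears upright.
    return StepResult(1.0 - dist / 180.0, False, False)


# Example: hanging at rest, a clean arrival, and a too-fast arrival.
at_rest = evaluate_step(0.0, 0.0)
success = evaluate_step(175.0, 30.0)
crash = evaluate_step(-175.0, 300.0)
```

In a Gymnasium environment, logic like this would typically live inside the environment's `step` method, with the chosen action index mapped through `ACTION_TO_DEGREES` before being sent to the motor.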