What if I told you that not only is Pac-Man controlled by a computer, but that the computer is an AI agent that learned to play through trial and error? 1 When this technique was first demonstrated to me, I was thrown for a loop.

How is this sort of thing possible? Think about all the different choices that are involved. If a ghost is charging me, should I run? If a ghost is scared, how do I decide between pursuing it and going after the food pellets? Should I try to stay as far away from ghosts as possible, or should I risk approaching them in order to improve my score? At what threshold does this become too risky? And once all of this is decided and implemented, surely the resulting code applies only to this one game. A strategy with such specific cases couldn’t apply to other domains. Not only must the agent learn what’s good and what’s bad (at the start it doesn’t even realize that dying is bad), it must also learn to exploit these details and come up with a strategy.

So how is this possible? The algorithm used is called Q-learning, and the basic premise is this: there is some learning agent, and it is contained in a world with different possible states. For Pac-Man, his world is the game maze, and the states are the different configurations of pellets, where the ghosts are, whether they’re scared or not, and so on. Given one of these states, there are possible actions that can be taken. For example, Pac-Man can move left, right, up, or down. These actions result in more possible states. So, if Pac-Man chooses the “up” action, the new state is the same, except he is now one square up and the ghosts have chosen new positions.

Take a look at this video of a crawling machine learning to pull itself forward using the same algorithm: 2 Don’t worry, the clip is only 30 seconds long.
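To make the premise concrete, here is a minimal sketch of tabular Q-learning in Python. A toy one-dimensional corridor stands in for the maze (the real Pac-Man state space is far larger), and all names and parameter values here are illustrative choices, not details from the post. The agent tries actions, observes rewards, and gradually updates its estimate of how good each action is in each state.

```python
import random

# Toy stand-in for the maze: a corridor of 6 squares.
# Reaching square 5 ("the pellet") gives reward +1; every step costs 0.04.
# All names and parameters are illustrative assumptions.
N_STATES = 6
ACTIONS = [-1, +1]               # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# Q maps (state, action) to the agent's current value estimate.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """World dynamics: an action leads to a new state and a reward."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else -0.04
    return nxt, reward

random.seed(0)
for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward = step(state, action)
        # The core Q-learning update: nudge Q(s, a) toward
        # reward + gamma * (best value achievable from the next state).
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

# After training, the greedy policy heads right toward the reward.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # → [1, 1, 1, 1, 1]
```

Notice that nothing in the update rule mentions corridors, ghosts, or pellets; only states, actions, and rewards. That is what lets the same algorithm drive both Pac-Man and the crawling machine.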