In this assignment, you will guide the player agent in Minecraft through a maze to reach the goal block. For this exercise, we assume that the agent can see the whole maze, i.e. the maze is fully observable. Given this information, the agent must reach the goal in the minimum number of moves.
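Because the maze is fully observable and every move costs the same, one common way to find a minimum-move route is breadth-first search. The sketch below is only an illustration under assumptions that are not part of the assignment: the maze is a list of strings, `#` marks a wall, the agent moves in four directions, and the function name `shortest_path_length` is hypothetical. It does not use the Minecraft API.

```python
from collections import deque


def shortest_path_length(grid, start, goal):
    """Return the minimum number of moves from start to goal, or None if unreachable.

    Assumed representation (hypothetical): grid is a list of equal-length
    strings, '#' is a wall, and positions are (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])  # (position, moves taken so far)
    visited = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        # Try the four neighbouring cells: down, up, right, left.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


maze = [
    "S.#",
    ".##",
    "..G",
]
moves = shortest_path_length(maze, (0, 0), (2, 2))
```

Breadth-first search visits cells in order of distance from the start, so the first time it reaches the goal, the recorded move count is minimal.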
In this assignment, we have simulated a reinforcement-learning-based dog whose life's purpose, as it tends to be, is to please its owner. You will define the details of the Markov Decision Process (MDP) and use the provided implementation of Q-Learning to learn a policy. Then, you will extend the MDP with more states and actions and see how well you can get the agent to do.
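For orientation, the standard tabular Q-Learning update can be sketched as follows. This is not the provided implementation; the function name `q_update`, the state and action labels, and the values of `alpha` and `gamma` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning update and return the new Q-value.

    q maps (state, action) pairs to estimated returns. alpha is the learning
    rate and gamma the discount factor (both values here are assumptions).
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the current estimate toward the bootstrapped target.
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical states and actions for the simulated dog.
actions = ["sit", "fetch"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
first = q_update(q, "owner_waiting", "sit", 1.0, "owner_happy", actions)
```

Defining the MDP amounts to choosing what goes into `state`, which `actions` are available, and what `reward` each transition yields; the update rule itself stays the same when more states and actions are added.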