Exploration and Machine Learning with the Horde Architecture

Ravenna Thielstrom and Lisa Meeden

The Horde architecture, first developed by Sutton, Modayil, and colleagues in 2011, uses unsupervised reinforcement learning to make a complicated environment easier to understand by separating the action-state space of interaction into numerous action-state pairs, each assigned an independent learning agent (a "demon") within the Horde. However, training the Horde takes a considerable amount of time, even within a simple virtual environment, and by itself the Horde provides no means of applying the learned knowledge to performing a task in the robot's environment. It therefore becomes necessary to ask how the Horde can be made more efficient, both in its training period and in how the knowledge resulting from training can be put to use in a task-completing situation.
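The core idea of assigning each prediction its own independent learner can be sketched as follows. This is a minimal illustration, not the original Horde implementation: each hypothetical demon learns, via TD(0) over a shared binary feature vector, the discounted sum of its own sensor signal (its cumulant); class and parameter names are assumptions for illustration.

```python
class Demon:
    """One independent prediction learner ("demon").

    Learns the discounted sum of its own sensor signal (cumulant)
    with a TD(0) update over a sparse binary feature vector.
    Parameters are illustrative, not taken from the original Horde.
    """

    def __init__(self, n_features, gamma=0.9, alpha=0.1):
        self.w = [0.0] * n_features  # one weight per feature
        self.gamma = gamma           # discount factor
        self.alpha = alpha           # learning-rate step size

    def predict(self, active):
        # Prediction is the sum of weights for the active features.
        return sum(self.w[i] for i in active)

    def update(self, active, cumulant, next_active):
        # TD error: observed cumulant plus discounted next prediction,
        # minus the current prediction.
        delta = (cumulant
                 + self.gamma * self.predict(next_active)
                 - self.predict(active))
        step = self.alpha * delta / max(len(active), 1)
        for i in active:
            self.w[i] += step


# A "horde" is simply many demons sharing one feature stream,
# each with its own cumulant and update.
horde = [Demon(n_features=10) for _ in range(3)]
```

Because each demon's weights and update are independent, the Horde scales by adding demons rather than by enlarging any single learner, which is why the number and selection of demons becomes a tunable efficiency parameter.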

Efforts to refine the Horde's efficiency took the form of multiple experiments that varied the virtual environment, the number of sensors available to the robot, the number of actions available to the robot, the number of training steps, and which selection of demons was trained. To better represent the information being gathered, a tile-coding method was implemented to discretize the collected sensor data, and this tile coding was also tested with the Horde to yield the best performance from the shortest possible training period. These virtual experiments clarified the capabilities of the Horde as it currently exists in virtual simulation, and the results provide a guide to optimizing the Horde's performance in different situations.
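Tile coding discretizes a continuous sensor reading into several overlapping grids ("tilings"), each offset slightly, so that nearby readings share some active tiles and generalize to one another. A minimal one-dimensional sketch is below; the tile counts, offsets, and function name are assumptions for illustration and may differ from the configuration used in the experiments.

```python
def tile_indices(x, lo, hi, n_tiles=8, n_tilings=4):
    """Map a continuous reading x in [lo, hi] to one active tile per tiling.

    Each tiling is shifted by a fraction of the tile width, so two
    nearby readings activate overlapping but not identical tile sets.
    Returns a list of n_tilings global tile indices. (Illustrative
    parameters, not the thesis's actual tiling configuration.)
    """
    width = (hi - lo) / n_tiles
    indices = []
    for t in range(n_tilings):
        offset = t * width / n_tilings      # shift this tiling slightly
        idx = int((x - lo + offset) / width)
        idx = min(idx, n_tiles)             # shifted readings near hi spill
                                            # into one extra edge tile
        # Give each tiling its own block of (n_tiles + 1) indices.
        indices.append(t * (n_tiles + 1) + idx)
    return indices
```

The resulting sparse indices can serve directly as the binary feature vector a learner updates, which is what makes tile coding attractive for shortening training: each update touches only a handful of weights.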

Literature Cited

Sherstov, A. A., and Stone, P. (2005). Function Approximation via Tile Coding:
Automating Parameter Choice. In Proceedings of the Symposium on Abstraction,
Reformulation, and Approximation.

Sutton, R. S., Modayil, J., and Delp, M. (2011). Horde: A Scalable Real-Time
Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction.
University of Alberta.

White, A., Modayil, J., and Sutton, R. S. (2012). Scaling Life-long Off-policy Learning.
University of Alberta.