One general technique for discovering categories is to form clusters of points in a suitable space. This was the basis of Elman's work on learning lexical classes from word co-occurrence statistics [Elman90]. Elman first trained a recurrent neural network to predict successive words in a long input string. This then set the stage for hierarchical clustering of the hidden-unit activation space, where the result was groups of words that coincide with classes like noun-food or verb-percept. But how can we ground such syntactic classes in sensorimotor interaction with the environment?
We answer this question, not with recurrent neural networks, which require a great deal of training, but with the method of delays, which maps a snippet of time series data to a point in delay-coordinate space. Delay coordinates are just successive sensor readings taken at a suitable time interval [Takens81, Rosenstein94]. For instance, imagine that our mobile robot is two meters from an obstacle and its sonar reports the value 2015 mm. From this single reading there is no way to know if the robot will crash into the obstacle. But suppose the sonar reports 2140 mm a second later. We could plot the point (2140, 2015) in a two-dimensional space, and notice that all nearby points correspond to the situation where the robot moves away from an object. This example may seem trivial, yet it illustrates the intuition behind the method of the delays: if the state of the environment at time t is uncertain, then examination of sensor readings prior to time t will reduce the ambiguity.
Our first application of clustering with delay coordinates involved a simulated world where agents adopted one of several strategies for pursuit or avoidance of an opponent [Rosenstein97, Rosenstein98]. Whenever two agents interacted, the simulator recorded the distance between them as well as the final outcome (contact, escape, or perpetual chase). This distance time series was the input to an incremental algorithm that clustered points based on Euclidean distance in delay-coordinate space. (Figure 1 shows a two-dimensional projection of this space, constructed from three representative interactions.) At a fixed observation time, nearby points were merged to create an average time series, or prototype. A prototype is best understood as a representative class member, and our implementation is entirely consistent with Rosch's influential work on categories and prototypes [Rosch78].
Figure 2 shows the six prototypes derived from 100 agent interactions where the movement strategies were chosen randomly. These prototypes reflect actual differences in the simulated world, and one could describe the corresponding categories as ``chase,'' ``contact,'' contact after the agents first ``overshoot'' one another, and ``escape'' with short, medium, and long escape times. These categories not only were grounded in physical measurements, but also were formed in an entirely unsupervised manner. But where is meaning?
For this work the meaning associated with each category is the predictions it affords the agent; we call this a predictive semantics. So while a category may prove useful simply as an abstraction of sensory experience, the real benefit comes from its entailments, or consequences of category membership. In the pursuit/avoidance simulator, our agents like to predict outcomes, such as contact, escape, or chase. When the learning algorithm finds clusters of points in delay-coordinate space, it also keeps a count of the number of member interactions that finish in each of the three outcomes. The members themselves are forgotten, except for their influence on the prototype. For subsequent interactions, i.e., for new time series of distance measurements, the algorithm computes the prototype nearest the time series in delay-coordinate space and reports the majority outcome for the corresponding cluster. This straightforward predicition scheme performed quite well at recognizing new situations and predicting their outcomes.
Our present interest is the transfer of this work to Pioneer mobile robots (Figure 3) which come equipped with over 30 sensors including a real-time vision system and gripper. (For related work see [Schmill98].) In a preliminary experiment, the robot moved around an office environment, occasionally bumping into objects. One way for the robot to sense objects is by contact with the bump switches at the tips of its gripper paddles; such events cause the corresponding sensor readings to change from low to high. In this experiment, the clustering algorithm discovered several categories, including two that involved the bump sensors. Comparison of the category prototypes showed that the robot's stall sensors triggered for one cluster but not the other. Interestingly, the former cluster was due to events where the robot hit a wall, whereas the latter was the result of all times the robot pushed a waste basket several centimeters across the floor. These prototypes suggest that the robot discovered two categories with complementary semantics: immovable and movable.
Perhaps the most promising aspect of this work is the possibility of an agent discovering meaningful categories for itself. In particular, this work shows the value of delay coordinates for transforming time series into clusters of points. And since time series are grounded in measurements out in the world, the resulting categories provide the foundation on which to build a symbolic, grounded, conceptual structure.
Updated 26-May-99 mtr@cs.umass.edu