Saturday, 13 August 2016

Skill and Skill Learning - Machine Learning Perspective

Human skill is the ability to apply past knowledge and experience in performing given tasks. Skill can be gained incrementally through learning and practice. Acquiring, representing, modeling, and transferring human skill or knowledge has been a core objective for more than two decades in the fields of artificial intelligence, robotics, and intelligent control. The problem is not only important to the theory of machine intelligence, but also essential in practice for developing intelligent robotic systems. Skill learning is challenging because there is no suitable mathematical model to describe human skill. Consider skill as a mapping from stimuli onto responses. A human associates responses with stimuli, actions with scenarios, labels with patterns, and effects with causes. Once a human finds such a mapping, intuitively he gains a skill. Therefore, if we consider the “stimuli” as input and the “responses” as output, the skill can be viewed as a control system. This “control system” has the following characteristics. It is nonlinear: there is no linear relationship between the stimuli and the responses. It is time-variant: the skill depends on environmental conditions that change from time to time. It is non-deterministic: the skill is inherently stochastic, and thus it can only be measured in the statistical sense. For example, even the most skillful artist cannot draw identical lines without the aid of a ruler. It is generalizable: it can be generalized through a learning process. It is decomposable: it can be decomposed into a number of low-level subsystems.

The challenge of skill learning stems not only from the inherent nature of skill described above, but also from the difficulty of understanding the learning process and transferring human skill to robots. Consider the following. A human learns a skill through an incrementally improving process, and it is difficult to describe exactly and quantitatively how information is processed and how the control action is selected during such a process. A human possesses a variety of sensory organs such as eyes and ears, but a robot has limited sensors. This implies that not all human skills can be transferred to robots. For a robot, the environment and sensing are subject to noise and uncertainty. These characteristics make it difficult to describe human skill by general mathematical models or traditional AI methods.

Skill learning has been studied in different disciplines of science and engineering, with different emphases and under different names. The idea of learning control is based on the observation that learning machines operating in "playback control mode" repeat their motions over and over in cycles. In the research on learning control, for a repeatable task operated over a fixed duration, the system input and response are stored on each trial, and the learning controller computes a new input in a way that guarantees that the performance error will be reduced on the next trial. Under some assumptions, P-, PI-, and PD-type learning laws have been implemented. This approach is based on control theory, but the problem certainly extends beyond that domain. Given the characteristics discussed previously, it is insufficient to approach such a comprehensive problem from a control theory point of view alone. The concept of task-level learning can be found in related studies.
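
As a rough illustration of the trial-to-trial update described above, here is a minimal sketch of a P-type learning law, in which the next trial's input is the current input plus a gain times the stored tracking error. The first-order plant, its coefficients, the learning gain, and all function names are assumptions made for this example and are not taken from the learning control literature discussed here.

```python
import math

def run_trial(u, a=0.3, b=1.0):
    """Simulate one trial of a hypothetical plant x(t+1) = a*x(t) + b*u(t), x(0) = 0."""
    x = 0.0
    y = []
    for u_t in u:
        x = a * x + b * u_t
        y.append(x)  # y[t] is the output one step after u[t] is applied
    return y

def p_type_update(u, error, gain=0.8):
    """P-type learning law: new input = old input + gain * stored error."""
    return [u_t + gain * e_t for u_t, e_t in zip(u, error)]

def learn(desired, trials=20):
    """Repeat the same task, storing the error and updating the input each trial."""
    u = [0.0] * len(desired)
    history = []  # largest absolute tracking error observed on each trial
    for _ in range(trials):
        y = run_trial(u)
        error = [d - y_t for d, y_t in zip(desired, y)]
        history.append(max(abs(e) for e in error))
        u = p_type_update(u, error)
    return u, history

# Hypothetical desired output trajectory over a fixed duration of 20 steps.
desired = [math.sin(math.pi * t / 20) for t in range(1, 21)]
u_final, history = learn(desired)
```

With these assumed values the largest tracking error shrinks from one trial to the next. For other plants or gains the update may not converge, which is why such laws are stated "under some assumptions".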

The basic idea is that a given task can be viewed as an input/output system, driven by an input vector and responding with an output vector. There is a mapping from task commands onto task performance. To select the appropriate commands to achieve a desired task performance, an inverse task mapping is needed. Task-level learning has been studied a great deal for "trajectory learning", which provides an optimum trajectory through learning, and it has been successful in some simple cases. For more complicated cases, which are the realistic ones in practice, the inverse task mapping is too difficult to obtain. Both learning control and task-level learning emphasize achieving a certain goal by practice, and pay no attention to modeling and learning the skill itself. From a different angle, a research group at MIT has been working on representing human skill. The pattern recognition method and the process dynamics model method were used to represent the control behavior of human experts for a deburring process. In the pattern recognition approach, human skill was represented by IF-THEN relationships of the form IF (signal pattern), THEN (control action).
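
To make the IF (signal pattern), THEN (control action) form concrete, the following is a small sketch of a rule base that is queried by pattern similarity. The nearest-pattern matching, the two-component signal (force, vibration), and the action labels are all hypothetical choices for illustration; they are not the representation used by the MIT group.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two signal patterns."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def select_action(rules, signal):
    """Return the action of the stored rule whose pattern is closest to `signal`."""
    _, action = min(rules, key=lambda rule: squared_distance(rule[0], signal))
    return action

# Hypothetical rule base: IF (force, vibration) pattern, THEN control action.
rules = [
    ((0.2, 0.1), "increase_feed"),
    ((0.8, 0.3), "hold_feed"),
    ((0.9, 0.9), "reduce_feed"),
]

chosen = select_action(rules, (0.85, 0.8))
```

Because the rules are stored examples rather than fitted parameters, coverage of the task depends directly on how many patterns are recorded.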

The human skill pattern model is a non-parametric model, and a large database is needed to characterize the task features. The idea of the process dynamics model method is to correlate human motion with the task process state, to find out how humans change their movements and tool-holding compliance in relation to the task process characteristics. The problem with this approach is that human skill cannot always be represented by an explicit process dynamics model; if there is no such model, or if the model is incorrect, the method is not feasible. Considerable research effort has been directed toward learning control architectures using connectionist models, or neural networks (NNs). NN approaches are interesting because of their learning capacity. Most of the learning methods studied by connectionists are parameter estimation methods. To describe the input/output behavior of a dynamic system, an NN is trained on input/output data, on the assumption that the nonlinear static map generated by the NN can adequately represent the system behavior for certain applications. Although NNs have been successfully applied to various tasks, their behavior is difficult to analyze and interpret mathematically. Usually, the performance of the NN approach depends heavily on the architecture; however, it is hard to know how to modify the architecture to improve the performance.

Another issue is real-time learning, i.e., dynamically updating the model to achieve the most likely performance. In real-time circumstances, we need to compute the frequencies of occurrence of the new data and add them to the model. The procedure is the same as that used to cope with multiple independent sequences. In this study, we have shown the fundamental theory and method that are needed, together with preliminary experiments, for real-time learning. However, various issues in real-time learning have not been discussed extensively. For example, what happens if the measured data fed into the learning process represents poor skill, i.e., unskilled performance? With the current method, the model will be updated to best match the performance of the operator, not to best represent good skill. This is because we have a criterion to judge the model of the skill, but no criterion to judge the skill itself. In other words, it is possible for the model to become more unskilled through real-time learning. This is a common problem in other recognition fields such as speech recognition. One way to minimize the problem is to ensure that the data fed in always represents good performance. This again requires a criterion to describe how good the skill is. We will look at this issue in the future.
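
The idea of adding the frequencies of occurrence of new data to the model can be sketched as a running table of counts that is renormalized after each new sequence. This is a deliberately simplified illustration: it assumes the state sequence is directly observed, whereas for an HMM the accumulated quantities would be expected counts obtained from a forward-backward pass. The class and method names are hypothetical.

```python
class TransitionCounts:
    """Running transition counts that are renormalized into probabilities."""

    def __init__(self, n_states):
        self.counts = [[0.0] * n_states for _ in range(n_states)]

    def add_sequence(self, states):
        """Add the transition frequencies of one new, independent sequence."""
        for prev, nxt in zip(states, states[1:]):
            self.counts[prev][nxt] += 1.0

    def probabilities(self):
        """Normalize each row of counts; rows with no data stay uniform."""
        probs = []
        for row in self.counts:
            total = sum(row)
            n = len(row)
            probs.append([c / total for c in row] if total > 0 else [1.0 / n] * n)
        return probs

model = TransitionCounts(2)
model.add_sequence([0, 0, 1, 1])   # earlier measurement
before = model.probabilities()
model.add_sequence([0, 1, 0, 1])   # new measurement arriving later
after = model.probabilities()
```

Note that the update treats every new sequence alike, so the estimates drift toward whatever the operator actually did, whether or not it was skilled.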

In this article I presented a novel method for human skill learning using the Hidden Markov Model (HMM). The HMM is a powerful parametric model and is well suited to characterizing the two stochastic processes involved in skill learning: the measurable action process and the immeasurable mental states. Based on the “most likely performance” criterion, we can select the best action sequence from all previously measured action data by modeling the skill as an HMM. This selection process can be updated in real time by feeding in new action data and updating the HMM, so that learning proceeds through the selection process.
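
One way to read the selection step is: score each measured action sequence by its likelihood under the HMM and keep the highest-scoring one. The sketch below does this with the scaled forward algorithm for a discrete HMM. The two-state model parameters and the candidate sequences are made-up values for illustration only; in practice the parameters would come from training on measured data.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete HMM."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the hidden-state transitions, then weight
            # by the probability of emitting the observed symbol.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob

# Hypothetical HMM with two hidden states and two observation symbols.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

# Hypothetical measured action sequences, already quantized to symbols.
candidates = [[0, 0, 1, 1], [1, 0, 1, 0], [0, 0, 0, 0]]
best = max(candidates, key=lambda seq: log_likelihood(seq, start, trans, emit))
```

Under these assumed parameters the all-zero sequence scores highest, so it would be the one selected as the most likely performance.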

The method provides a feasible way to abstract human skill as a parametric model that is easily updated with new measurements. Besides tele-robotics, it should prove useful in various applications in the education space, such as human action recognition in man-machine interfaces, coordination in anthropomorphic master robot control, feedback learning in systems with uncertainty and time variation, and pilot skill learning for unmanned helicopters. By selecting different units for the measured data in different problems, the basic idea is applicable to a variety of skill learning problems.