An intriguing AI paradigm: interactive learning from unlabeled instructions

Last update: January 2019

Suppose you know neither the game of chess nor the French language: could you learn the rules of chess from a person speaking French? In machine learning, this problem is usually avoided by freezing one of the unknowns (e.g. chess or French) during a calibration phase. During my PhD, I tackled the full problem and proposed an innovative solution based on a measure of the consistency of the interaction. We applied our method to human-robot and brain-computer interaction, and studied how humans solve this problem. This work was awarded a PhD prize presented by Cédric Villani (2010 Fields Medalist).

Illustration of consistency. The same dataset is labelled according to 3 different tasks. The left labelling is the most consistent with the structure of the data. How do we measure this?

Approach

Can an agent learn a task from human instruction signals without knowing the meaning of those signals? Can we learn from unlabeled instructions? [1] The problem resembles a chicken-and-egg scenario: to learn the task you need to know the meaning of the instructions (interactive learning), and to learn the meaning of the instructions you need to know the task (supervised learning). However, a common assumption is made when tackling each of these problems independently: the user providing the instructions or labels acts consistently with respect to the task and to their own signal-to-meaning mapping. In short, the user is not acting randomly but is trying to guide the machine towards one goal and uses the same signal to mean the same thing. The user is consistent.
Illustration of consistency. The same dataset is labelled according to 3 different tasks. The left labelling is the most consistent with the structure of the user signals - indicating the user was probably teaching that task.
Hence, by measuring the consistency of the user's signal-to-meaning mapping with respect to different tasks, we are able to recover both the task and the signal-to-meaning mapping, solving the chicken-and-egg problem without the need for an explicit calibration phase. We do so while avoiding the combinatorial explosion that would occur if we generated hypotheses over the joint [task, signal-to-meaning] space, which is especially problematic when the signals are continuous. Our contribution is a variety of methods to measure the consistency of an interaction, as well as a planning algorithm based on the uncertainty of that measure. We further proposed methods to scale this work to continuous state domains, an infinite number of tasks, and multiple interaction frame hypotheses. We applied these methods to a human-robot learning task using speech as the modality of interaction, and to a brain-computer interface with real subjects. I am currently working on a simple and visual explanation of this research; in the meantime, please refer to the videos and documents in the additional resources section.
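To make the idea concrete, here is a minimal sketch of one way to score consistency; it is not the exact implementation from the thesis. It assumes a hypothetical helper `implied_label(task, state, action)` that returns the meaning ("correct"/"incorrect") the user would have intended if they were teaching that task, and it scores each candidate task by how well a classifier trained on those hypothetical labels can separate the recorded signals:

```python
# Minimal sketch of a consistency measure (illustrative, not the thesis implementation).
# Assumed inputs: `signals` is an (n, d) array of signal features, `interactions` is a
# list of (state, action) pairs aligned with `signals`, and `implied_label(task, state,
# action)` is a hypothetical helper returning the meaning implied by a candidate task.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def consistency(task, signals, interactions, implied_label):
    """Score how well the signals can be classified using the labels implied by `task`."""
    labels = np.array([implied_label(task, state, action)
                       for (state, action) in interactions])
    if len(set(labels)) < 2:   # degenerate labelling, cannot be assessed yet
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    # Cross-validated accuracy: high accuracy means the signal clusters line up with
    # this task's labelling, i.e. the user was plausibly teaching this task.
    return cross_val_score(clf, signals, labels, cv=3).mean()


def most_consistent_task(candidate_tasks, signals, interactions, implied_label):
    scores = {t: consistency(t, signals, interactions, implied_label)
              for t in candidate_tasks}
    return max(scores, key=scores.get), scores
```

The task whose implied labelling makes the signals most separable is the most consistent one, and that same labelling simultaneously gives us a signal-to-meaning decoder for free.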

Team

PhD advisors: Manuel Lopes and Pierre-Yves Oudeyer. BCI collaborators: Iñaki Iturrate and Luis Montesano.

Projects

Online Demo

I am currently designing a web application to demonstrate the concept of self-calibrating interfaces. A preliminary version is available online as a puzzle game: http://discourse.cri-paris.org/t/introduction-to-the-open-vault-challenge/201, and more details are available here: https://arxiv.org/pdf/1906.02485.pdf. The demo takes the form of a challenge to open a vault. The vault is secured by a 4-digit code that can be typed via a simple user interface on a screen. You have access to videos of a user entering the code into the interface and can watch them as many times as required. The challenge is to crack the code, open the vault, and collect its content.

Left: Staff and students trying to crack the code. Right: Example of demonstration videos available for cracking the code at level 1.

There are 5 levels of increasing complexity, and from level 3 onward the interface is built using the calibration-free interface paradigm presented on this page, which makes it hard for an observer to infer what the user is typing since the meaning associated with each button/action is decided on the fly by the human.

Resources

  1. The Open Vault Challenge–Learning how to build calibration-free interactive systems by cracking the code of a vault. Grizou, J. (2019). International Joint Conferences on Artificial Intelligence. [pdf] [project]

Application to Brain-Computer Interfaces

In brain-computer interfaces (BCI), an explicit calibration phase is typically required to build a decoder that translates raw electroencephalography (EEG) signals from the brain of a user into meaningful instructions. By applying our method to BCI, we remove the need for a calibration phase. The experimental setup is shown below. The user has to guide an agent towards a specific location of their choosing on the grid. The user informs the agent by thinking 'yes' or 'no' after each movement. The user's brain activity is recorded and used as our unlabeled feedback signal. Similar setups are used to help people with motor impairments to spell.
A grid is displayed on the screen. The user wants the green dot to move towards the red square. After each move of the green dot, the user thinks 'yes' if the dot moved towards the red target and 'no' otherwise. Their brain activity is recorded and used to learn where the red target is located.

A calibration procedure would run for a fixed period of interaction, say 400 movements of the green dot, before the user can actually start controlling the device. With our method, the system is able to estimate when sufficient evidence has accumulated and can solve a first task after only 85 iterations in the example shown below.

Figure 1: Diagram of the experimental setup
The method has been evaluated in closed-loop online experiments with 8 users. The results show that it is possible to have usable BCI control from the beginning of the experiment without any prior calibration. Furthermore, comparisons with simulations and with previous results obtained using standard calibration suggest that both the quality of the recorded signals and the performance of the system were comparable to those obtained with a standard calibration approach.
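As a rough illustration of how a system can decide that "sufficient evidence has accumulated", here is an assumed stopping rule (not taken verbatim from the paper): turn the per-task consistency scores into a normalized belief and commit to a task only once one hypothesis clearly dominates.

```python
# Sketch of a stopping rule (an assumption for illustration, not the exact procedure
# from the paper): normalise per-task consistency scores into a belief and act on a
# task only once one hypothesis clearly dominates.
import numpy as np


def task_belief(scores, temperature=0.05):
    """Softmax over consistency scores -> pseudo-probability per task."""
    tasks, values = zip(*scores.items())
    values = np.array(values, dtype=float) / temperature
    p = np.exp(values - values.max())
    p /= p.sum()
    return dict(zip(tasks, p))


def confident_task(scores, threshold=0.99):
    """Return the winning task once its belief exceeds `threshold`, else None."""
    belief = task_belief(scores)
    best = max(belief, key=belief.get)
    return best if belief[best] >= threshold else None
```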

Resources

  1. Exploiting task constraints for self-calibrated brain-machine interface control using error-related potentials. Iturrate, I., Grizou, J., Omedes, J., Oudeyer, P., Lopes, M., et al. (2015). PloS one. [pdf] [github]
  2. Calibration-Free BCI Based Control. Grizou, J., Iturrate, I., Montesano, L., Oudeyer, P.-Y., & Lopes, M. (2014). International AAAI Conference on Artificial Intelligence. [pdf] [poster]
  3. Interactive Learning from Unlabeled Instructions. Grizou, J., Iturrate, I., Montesano, L., Oudeyer, P.-Y., & Lopes, M. (2014). Conference on Uncertainty in Artificial Intelligence (UAI). [pdf] [poster]

Application to Human-Robot interaction

Our first demonstration of this work involved a pick-and-place robot that had to identify the cube configuration the user had in mind. The user provided feedback instructions (correct/incorrect) with their voice. Thanks to our approach, the user could use any words they desired to mean correct or incorrect; e.g. 'dog' for correct and 'cat' for incorrect would work as well as 'yes' and 'no'.

Example of interaction. The robot performs an action and I provide feedback on the action according to its optimality towards building a specific cube configuration I have in mind. I use my voice to provide the feedback and the robot does not know initially which sounds I use to mean correct or incorrect. Not my best video 🙂

We evaluated the performance of our algorithm in simulation using pre-recorded spoken words and a model of the robotic task. We showed that it is possible to learn the meaning of unknown and noisy teaching instructions and a new task at the same time, and that the performance varies depending on the classifier used to measure consistency and on the planning heuristic used to select the agent's actions.

Each plot shows the computed probability of the correct task as a function of the number of human-robot interactions. Left: Study of different classifiers. Right: Study of different planning heuristics.

We then showed that once a first task has been identified, it is possible to reuse the acquired knowledge about instructions to learn new tasks much faster. Indeed, once a first task is identified, we can revisit the history of interactions and assign the true label to each signal, which lets us build a signal decoder and removes one side of our problem, as sketched below.

Once a first task is learned using our method, the second task is much faster to learn as we can train an explicit signal-to-meaning classifier.
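Here is a minimal sketch of that bootstrapping step, reusing the hypothetical `implied_label` helper introduced earlier: once a task is identified, the interaction history can be relabeled and used to train an ordinary supervised signal-to-meaning decoder.

```python
# Sketch of the bootstrapping step (illustrative only). Once a first task has been
# identified, every past signal can be given its true label, which is enough to train
# a standard supervised decoder; subsequent tasks reduce to ordinary interactive learning.
import numpy as np
from sklearn.linear_model import LogisticRegression


def train_decoder(identified_task, signals, interactions, implied_label):
    labels = np.array([implied_label(identified_task, state, action)
                       for (state, action) in interactions])
    return LogisticRegression(max_iter=1000).fit(signals, labels)
```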

Resources

  1. Robot learning simultaneously a task and how to interpret human instructions. Grizou, J., Lopes, M., & Oudeyer, P.-Y. (2013). Development and Learning and Epigenetic Robotics (ICDL), 2013 IEEE Third Joint International Conference on. Student Travels Award. [pdf] [slides]


Planning

In later studies, we developed a new uncertainty measure that takes into account both the uncertainty over the task space and the uncertainty over the signal space. This measure can be used to select the agent's next actions in order to solve the problem faster. It is non-intuitive because it mixes trying to disambiguate the possible tasks with trying to disambiguate the possible signal-to-meaning mappings, two objectives that often call for opposite actions. By explicitly measuring the joint uncertainty, our agent was able to considerably reduce the time needed to identify a first task. Planning was key to the success of our BCI application.

Number of iterations needed to identify a first task from unlabeled instructions. The planning method has a significant impact on performance. Planning using our uncertainty measure outperformed all other methods. Using an uncertainty measure based only on the tasks performs poorly. A 50% greedy-on-task / 50% random policy showed good performance, highlighting the non-intuitive aspect of the planning problem.
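The sketch below illustrates the spirit of uncertainty-driven action selection with a simple stand-in heuristic (not the exact measure from the UAI paper): prefer the action for which the competing task hypotheses, weighted by the current belief, disagree the most about what the user's next signal should mean.

```python
# Illustrative action-selection heuristic (an assumption, not the published measure).
# `belief` maps each candidate task to its current probability; `implied_label` is the
# same hypothetical helper as above.
def expected_disagreement(action, state, belief, implied_label):
    """Probability that two tasks drawn from the belief imply different labels."""
    p_correct = sum(p for task, p in belief.items()
                    if implied_label(task, state, action) == "correct")
    # Disagreement is maximal when the belief is split 50/50 on the implied meaning.
    return p_correct * (1.0 - p_correct)


def select_action(actions, state, belief, implied_label):
    return max(actions,
               key=lambda a: expected_disagreement(a, state, belief, implied_label))
```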

Resources

  1. Interactive Learning from Unlabeled Instructions. Grizou, J., Iturrate, I., Montesano, L., Oudeyer, P.-Y., & Lopes, M. (2014). Conference on Uncertainty in Artificial Intelligence (UAI). [pdf] [poster]

Can humans solve this problem?

Inspired by the above results, we devised an experimental setup to investigate the processes used by humans to negotiate a protocol of interaction when they do not already share one. More information on the dedicated project page.

Awards

I received the Prix Le Monde de la Recherche Universitaire 2015 for this work. Each year, this prize is awarded to 5 young French scientists across all fields of 'hard' science. Cédric Villani (2010 Fields Medalist) chaired the jury. We co-authored a book featuring the work of each laureate. More: [Application letter] [INRIA article] [HuffPost article]

I also received the student travel award for our first paper on the topic, titled Robot Learning Simultaneously a Task and How to Interpret Human Instructions, at ICDL-EpiRob 2013.

Resources

PhD defense

Download the slides and the manuscript.

Videos

Tutorial:

[All related videos]

Publications

  1. The Open Vault Challenge–Learning how to build calibration-free interactive systems by cracking the code of a vault. Grizou, J. (2019). International Joint Conferences on Artificial Intelligence. [pdf] [project]
  2. Learning from Unlabeled Interaction Frames. Grizou, J. (2014). PhD Thesis. PhD Thesis Award. [pdf] [code] [latex]
  3. Exploiting task constraints for self-calibrated brain-machine interface control using error-related potentials. Iturrate, I., Grizou, J., Omedes, J., Oudeyer, P., Lopes, M., et al. (2015). PloS one. [pdf] [github]
  4. Calibration-Free BCI Based Control. Grizou, J., Iturrate, I., Montesano, L., Oudeyer, P.-Y., & Lopes, M. (2014). International AAAI Conference on Artificial Intelligence. [pdf] [poster]
  5. Interactive Learning from Unlabeled Instructions. Grizou, J., Iturrate, I., Montesano, L., Oudeyer, P.-Y., & Lopes, M. (2014). Conference on Uncertainty in Artificial Intelligence (UAI). [pdf] [poster]
  6. Robot learning simultaneously a task and how to interpret human instructions. Grizou, J., Lopes, M., & Oudeyer, P.-Y. (2013). Development and Learning and Epigenetic Robotics (ICDL), 2013 IEEE Third Joint International Conference on. Student Travels Award. [pdf] [slides]

Personal Notes

I am still looking for real-world applications that could be useful to a wider audience. Any ideas?

Footnotes

[1] Let me explain the title:

  • Learning means the agent should be able to perform something new at the end of the day, something that was not programmed by hand into the system.
  • Interactive learning means the agent is acting in the world and receives feedback from its environment.
  • Interactive learning from instructions means the agent learns by receiving instructions (e.g. "go left", "go right", "it was correct", "it was incorrect") in an interactive way. Such instructions are conveyed by a human (or another machine/agent) in the form of a discrete or continuous signal (speech, gesture, brain activity, etc.).
  • Interactive learning from unlabeled instructions means the instructions are known to be of a specific type, i.e. there exist hidden labels (e.g. only feedback instructions of type correct/incorrect). However, the agent does not know the mapping between the instruction signals (speech, gesture, brain activity, etc.) and their labels/meanings.