About

BabyBench is a multimodal benchmark of infant behaviors for developmental artificial intelligence. The BabyBench Competition, hosted at IEEE ICDL, invites participants to model infant-like learning in simulated environments using MIMo, the multimodal infant model.

Table of Contents

  1. Motivation
  2. Behaviors
    1. Self-touch
    2. Hand regard
  3. Modeling
    1. Reinforcement learning
    2. Intrinsic motivations
    3. Open-ended learning
    4. Other developmentally-inspired approaches
  4. MIMo
  5. Organizing team
  6. Acknowledgements

Motivation

How do infants develop a sense of self?

The early stages of human development are characterized by rich sensorimotor exploration, in which infants engage in self-touch, self-reach, and spontaneous movements that contribute to body awareness and motor control (Rochat, 1998). These behaviors are observed within the first months of life and are fundamental for the emergence of calibrated proprioception and motor coordination. While infant development has been extensively studied in psychology and neuroscience, replicating these behaviors on humanoid robotic platforms remains a major challenge.

This challenge invites participants to design and implement mechanisms that enable MIMo, a baby-sized humanoid agent equipped with a tactile skin and binocular vision, to autonomously generate self-touch and hand regard behaviors similar to those seen in human infants. The objective is to model developmental principles such as learning how to explore, building and exploiting sensorimotor loops, and the intrinsic motivations that drive action in human infants (Oudeyer & Smith, 2016). Participants will work within custom-designed MIMo environments. Successful solutions should demonstrate emergent behavior: through self-exploration, the agent discovers the affordances of its own movements and refines its motor skills without explicitly programmed trajectories.

Our aim is to encourage researchers from diverse backgrounds to contribute to the field of developmental artificial intelligence by developing innovative solutions to this challenging problem. We hope that this project will provide a unique opportunity to explore the intersection of machine learning, robotics, and developmental psychology.

Behaviors

The first BabyBench Competition focuses on two early behaviors that demonstrate the emergence of a sense of self in infant development: self-touch and hand regard. These behaviors are not only central to how infants begin to build sensorimotor representations of their own bodies, but also serve as a foundation for more complex cognitive and social capabilities.

Self-touch

Self-touch behaviors emerge within the first weeks of life and are a crucial form of sensorimotor exploration. Infants frequently touch their face, torso, and limbs, gradually refining their ability to reach and coordinate across the body.

Behavioral insights

Self-touch supports the development of body awareness, spatial proprioception, and the formation of a body schema – an internal model of one’s own body. Even in utero, fetuses demonstrate purposeful self-touch, suggesting an early developmental drive for self-directed exploration. Self-touch exploration is not random, but rather evolves throughout development, with infants gradually covering more of their body surface (Khoury et al., 2022).

Ideas to explore

Tactile feedback from the skin, combined with proprioceptive and motor signals, can help infants learn the consequences of their own movements. The development of a body schema can be facilitated through self-generated haptic goals, i.e., the expectation of perceiving a touch on a given body part, which can then serve as a goal for self-reach (Marcel et al., 2022).
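As a concrete illustration of this idea, a self-generated haptic goal can be as simple as sampling a target body part and rewarding the agent whenever its touch sensors register contact there. The sketch below is a minimal version of that loop; the body-part names and the structure of the touch observation are illustrative assumptions, not the BabyBench API.

```python
import random

# Hypothetical list of body parts covered by touch sensors (names are illustrative).
BODY_PARTS = ["head", "torso", "left_hand", "right_hand", "left_foot", "right_foot"]

def sample_haptic_goal():
    """Self-generate a goal: 'feel a touch on this body part'."""
    return random.choice(BODY_PARTS)

def haptic_goal_reward(touch_obs, goal):
    """Reward 1.0 if any sensor on the goal body part registers contact.

    Assumes `touch_obs` maps body-part names to NumPy arrays of sensor readings.
    """
    return 1.0 if (touch_obs[goal] > 0).any() else 0.0
```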

Hand regard

Typically appearing around 2–3 months of age, hand regard is characterized by infants visually tracking and fixating on their own hands. This behavior marks a shift toward coordinated visual-motor integration.

Behavioral insights

Hand regard enables infants to link visual and proprioceptive modalities, paving the way for visually guided reaching and object manipulation (Corbetta, 2021). Studies show that infants often engage in extended hand gazing, appearing mesmerized by their own movements (van der Meer, 1997). The behavior declines at around 4 months of age, when infants shift their visual attention to objects as they learn to reach for them, no longer needing to fixate on their own hands.

Ideas to explore

Hand regard can lead to redundancies in the visual and proprioceptive sensory modalities, resulting in observations that are easier to encode (López et al., 2023). This behavior may also support the development of internal forward models, allowing infants to anticipate the outcomes of self-initiated actions.
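One simple way to operationalize hand regard as a learning signal is to reward the agent in proportion to how much of its hand is visible in its own field of view. The sketch below assumes a per-pixel segmentation mask of the rendered image is available; this is an illustration of the idea, not the competition's evaluation metric.

```python
import numpy as np

def hand_regard_reward(segmentation, hand_ids):
    """Fraction of pixels in the agent's view that belong to its hands.

    Assumes `segmentation` is an (H, W) integer array of body-part IDs and
    `hand_ids` is the set of IDs corresponding to the hands (both hypothetical).
    """
    hand_pixels = np.isin(segmentation, list(hand_ids)).sum()
    return hand_pixels / segmentation.size
```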

Modeling

The BabyBench Competition invites participants to model behavior learning using a wide range of approaches. While there are no restrictions on the types of algorithms or architectures that can be used, we strongly encourage submissions that rely on biologically plausible and computationally interpretable approaches, including unsupervised, self-supervised, and reinforcement learning. In particular, models that draw inspiration from the fields of intrinsic motivations, open-ended learning, and other developmentally inspired approaches will be highly valued. The plausibility of the learning approach is one of the criteria on which submissions are evaluated. Next, we provide a brief overview of some of these modeling approaches and how they relate to the BabyBench competition.

Reinforcement learning

Reinforcement learning (RL) is one of the main branches of machine learning, inspired by how humans and animals learn through trial and error. RL agents learn from the feedback they receive when interacting with their environments, which can take the form of rewards or punishments. Over time, an agent refines its behavior to maximize the cumulative reward it receives. This framework is particularly well suited for learning behaviors in complex and dynamic environments such as those of the BabyBench competition. However, BabyBench presents an important challenge for RL methods: the environments do not provide any explicit feedback (extrinsic rewards), so the agents need to learn autonomously.

One of the central challenges in reinforcement learning is the balance between exploration—trying new things—and exploitation—leveraging known strategies to maximize reward. Too much exploitation leads to stagnation, while too much exploration can be inefficient. Intrinsic motivations help agents explore more wisely, especially when external feedback is limited or nonexistent.
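For readers new to the framework, the agent-environment loop at the core of RL is short enough to sketch in full. The example below uses a standard Gymnasium toy environment with random actions standing in for a learned policy; in BabyBench, the reward returned by the environment carries no task signal, so an intrinsic signal (see the next section) would be substituted at the marked line.

```python
import gymnasium as gym

env = gym.make("Pendulum-v1")               # stand-in for a BabyBench environment
obs, info = env.reset(seed=0)

for _ in range(200):
    action = env.action_space.sample()      # a learned policy would choose here
    obs, reward, terminated, truncated, info = env.step(action)
    # `reward` is the feedback the agent maximizes over time; in BabyBench it
    # would be replaced by a self-generated (intrinsic) reward.
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```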

Intrinsic motivations

Intrinsic motivations introduce a more self-driven approach to learning. Inspired by human curiosity, intrinsically motivated agents seek out novelty, surprise, learning progress, or autonomously calibrated skills. This helps them explore more broadly and develop general-purpose skills, even in the absence of extrinsic rewards. For example, curiosity-driven learning is a specific form of intrinsic motivation where an agent seeks out situations that yield new or unexpected information. Like a child poking around to see what happens, a curious agent explores the world not because it has to, but because it wants to reduce uncertainty or discover something novel.
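A common minimal implementation of curiosity uses the prediction error of a learned forward model as the intrinsic reward: transitions the model predicts poorly are "interesting", and as the model improves, the reward fades and pushes the agent toward new situations. The sketch below uses a linear forward model trained online with gradient descent; it illustrates the principle and is not a competition baseline.

```python
import numpy as np

class CuriosityModule:
    """Intrinsic reward = prediction error of a learned forward model."""

    def __init__(self, obs_dim, act_dim, lr=1e-2):
        self.W = np.zeros((obs_dim, obs_dim + act_dim))  # linear forward model
        self.lr = lr

    def intrinsic_reward(self, obs, action, next_obs):
        x = np.concatenate([obs, action])
        pred = self.W @ x                       # predicted next observation
        error = next_obs - pred
        self.W += self.lr * np.outer(error, x)  # online model update
        # Surprising transitions (large error) yield large intrinsic reward.
        return float(np.mean(error ** 2))
```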

Open-ended learning

Open-ended learning describes the ability of a system to continually develop new skills and behaviors without a predefined objective. Rather than being trained for one specific task, an open-ended learner grows over time by building on previous experiences and seeking out new challenges. One common approach to open-ended learning is the pursuit of self-generated goals: the agent sets its own objectives and strives to achieve them. This enables more directed behavior and can help guide exploration in meaningful ways. By learning how to attain different goals, agents can develop a flexible understanding of their environment, enabling them to generalize and adapt to future tasks more efficiently.
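In its simplest form, learning from self-generated goals needs only two ingredients: a way to sample goals and a way to reward progress toward them. The sketch below defines both for goals expressed as target observations; sampling goals from previously visited states is one illustrative choice among many.

```python
import numpy as np

def sample_goal(visited_obs, rng):
    """Self-generate a goal by picking a previously visited observation."""
    return visited_obs[rng.integers(len(visited_obs))]

def goal_reward(obs, goal, tol=0.1):
    """Self-generated reward: 1.0 once the observation is close enough to the goal."""
    return 1.0 if np.linalg.norm(obs - goal) < tol else 0.0
```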

Other developmentally-inspired approaches

Developmental learning takes inspiration from how human infants acquire knowledge and skills over time. Instead of jumping straight into complex tasks, a developmental learner progresses through stages, mastering simpler behaviors first and gradually tackling more complex ones. This scaffolding approach can lead to more robust and transferable learning, and is central to building truly adaptive, long-term learning systems.

Curriculum learning involves organizing training experiences in a meaningful sequence—starting with easier tasks and gradually increasing difficulty. Much like a school curriculum, this approach helps learning agents build a strong foundation before facing more challenging problems. When aligned with intrinsic motivation or open-ended goals, it can foster more efficient and stable learning.
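A minimal way to implement such a curriculum is to track the agent's recent success rate and advance the difficulty once performance is reliable. The sketch below is one illustrative scheme; the levels and threshold are arbitrary placeholders.

```python
from collections import deque

class Curriculum:
    """Advance to the next difficulty level once recent success is high enough."""

    def __init__(self, levels, threshold=0.8, window=50):
        self.levels = levels                 # e.g. ["easy", "medium", "hard"]
        self.idx = 0
        self.threshold = threshold
        self.results = deque(maxlen=window)  # rolling record of successes

    def report(self, success):
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) >= self.threshold:
            self.idx = min(self.idx + 1, len(self.levels) - 1)
            self.results.clear()             # re-measure at the new level

    @property
    def level(self):
        return self.levels[self.idx]
```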

Empowerment is a lesser-known but powerful concept in intrinsic motivation. It refers to an agent’s drive to place itself in states where it has many options and a high degree of control over the future. In essence, an empowered agent seeks to remain capable and influential—choosing situations where it can make a difference.
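Formally, empowerment is often defined as the channel capacity between an agent's actions and its subsequent sensory states (Klyubin et al., 2005):

$$\mathcal{E}(s) = \max_{p(a)} \, I(A;\, S' \mid s)$$

where $A$ is the action, $S'$ the resulting state, and $I$ the mutual information, maximized over the agent's choice of action distribution $p(a)$. States from which many distinguishable futures are reachable have high empowerment.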

MIMo

MIMo is a multimodal infant model built on the MuJoCo physics simulator and accessed through Gymnasium environments. His body is composed of simple geometric primitives adjusted to match the dimensions of an average 18-month-old child. Two versions of MIMo are available: one with mitten-like hands and 44 degrees of freedom, and a computationally more expensive one with five-fingered hands and a total of 88 degrees of freedom.

MIMo has access to four sensory modalities: proprioception, the vestibular system, vision, and touch. Proprioceptive observations contain the position, velocity, force, and limit values for each joint in MIMo’s body. He has binocular vision from cameras located in his eyes, which render RGB images with a user-defined resolution and field of view. Touch perception is implemented as a full-body virtual skin of uniformly distributed MuJoCo haptic sensors, with densities defined per body part (allowing a higher density on the hands and fingers), registering contacts with objects or with other parts of MIMo’s body.
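In practice, interacting with a multimodal agent like this typically means reading a dictionary of per-modality observations after each step. The sketch below shows what that could look like; the environment ID and observation keys are hypothetical placeholders, not BabyBench's actual interface (see the API page for the real one).

```python
import gymnasium as gym

# Hypothetical environment ID and observation keys, for illustration only.
env = gym.make("BabyBench/SelfTouch-v0")
obs, info = env.reset(seed=0)

proprio = obs["proprioception"]                          # per-joint position, velocity, force, limits
left_eye, right_eye = obs["eye_left"], obs["eye_right"]  # binocular RGB images
touch = obs["touch"]                                     # virtual-skin contact readings
vestibular = obs["vestibular"]                           # head accelerations
```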

MIMo can move using three different actuation models: positional controllers that directly set the target angle of each joint, torque controllers in which each joint is driven by a motor with a spring-damper system, and muscle controllers in which joints are actuated by differentially activating antagonistic muscle pairs.
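To illustrate the muscle actuation at its simplest: each joint receives two activations, one per antagonistic muscle, and the net torque depends on their difference. The toy function below captures only this antagonistic structure; MIMo's actual muscle model is more detailed, so treat this purely as a sketch of the idea.

```python
def muscle_torque(a_flexor, a_extensor, gain=1.0):
    """Toy antagonistic muscle pair: net joint torque from two activations in [0, 1].

    MIMo's real muscle model is more detailed; this only shows why the
    action space has two activation channels per joint.
    """
    return gain * (a_flexor - a_extensor)
```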

A full description of MIMo’s sensory and actuation modules used in BabyBench can be found on the API page. You can download and use MIMo for your own experiments here. To learn more, read the MIMo documentation.

Organizing team

Acknowledgements

BabyBench is supported by the cluster project “The Adaptive Mind”, funded by the Excellence Program of the Hessian Ministry of Higher Education, Research, Science and the Arts, Germany; by the Deutsche Forschungsgemeinschaft (DFG project “Abstract REpresentations in Neural Architectures (ARENA)”); by the Czech Science Foundation (GA CR), project no. 25-18113S; and by the Johanna Quandt Foundation.

