Nymeria Dataset
A large-scale multimodal egocentric dataset for full-body motion understanding
A massive dataset of multimodal egocentric daily motion in the wild
Nymeria is the world's largest dataset of human motion in the wild, capturing diverse people performing diverse activities across diverse locations. It is the first of its kind to record body motion with multiple egocentric multimodal devices, all accurately synchronized and localized in a single metric 3D world. Nymeria is also the world's largest dataset of motion-language descriptions, featuring hierarchical in-context narration.
The dataset is designed to accelerate research in egocentric human motion understanding and present exciting new challenges beyond body tracking, motion synthesis and action recognition. It aims to advance contextualized computing and pave the way for future AR/VR technology.
Dataset Highlights
300 hours of daily activities
Project Aria and Machine Perception Services
Project Aria glasses were used as a lightweight headset to record multimodal data, including 1 RGB video, 2 grayscale videos, 2 eye-tracking videos, 2 IMUs, 1 magnetometer, 1 barometer, and audio. The recordings were processed by Aria Machine Perception Services (MPS) to obtain an accurate 6 DoF device trajectory, semi-dense point clouds, and eye gaze estimation with depth.
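As a rough sketch of how these MPS outputs can be consumed, the open-source projectaria_tools readers can be used; note that the file names, reader functions, and attribute names below are assumptions based on that library's documentation and may differ between releases.

```python
# Minimal sketch (assumption): loading MPS outputs with projectaria_tools.
# File paths, reader names, and attributes may differ between releases;
# this is not the official Nymeria loading code.
from projectaria_tools.core import mps

# 6 DoF closed-loop device trajectory in the world frame.
trajectory = mps.read_closed_loop_trajectory("mps/closed_loop_trajectory.csv")
for pose in trajectory[:5]:
    # SE(3) pose of the Aria device in the world frame, plus its timestamp.
    print(pose.tracking_timestamp, pose.transform_world_device.translation())

# Semi-dense point cloud of the environment (compressed CSV in recent releases).
points = mps.read_global_point_cloud("mps/semidense_points.csv.gz")

# Eye gaze estimates, parameterized as yaw/pitch angles with a depth estimate.
gazes = mps.read_eyegaze("mps/generalized_eye_gaze.csv")
```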
Novel 'miniAria' wristbands to resemble future wearables
The miniAria wristbands were developed by repackaging the electronics and sensors of Project Aria into a wristband form factor to capture egocentric data from the wrist. This novel setup is motivated by the potential of future wearable devices and by the need for accurate wrist motion to improve body-tracking algorithms.
Inertial-based motion capture for full-body kinematics
The XSens MVN Link mocap suit was used to record high-quality body motion in the wild. The suit estimates full-body kinematics using 17 inertial trackers and a magnetometer. An optimization was developed to register the resulting global motion into the same coordinate frame as the Project Aria headset and miniAria wristbands.
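Purely as an illustration of this kind of trajectory registration (not the actual optimization used to produce the dataset), a least-squares rigid alignment between time-synchronized XSens and Aria positions can be sketched with the classic Kabsch method:

```python
# Illustrative sketch only: rigid (rotation + translation) alignment of two
# corresponding, time-synchronized 3D trajectories via the Kabsch method.
# The dataset's actual registration pipeline is more involved.
import numpy as np

def register_rigid(source_pts, target_pts):
    """Find R, t minimizing ||R @ source + t - target|| over all frames.

    source_pts, target_pts: (N, 3) arrays of matched positions, e.g. the
    XSens head-joint trajectory and the Aria device trajectory."""
    src_c, tgt_c = source_pts.mean(axis=0), target_pts.mean(axis=0)
    H = (source_pts - src_c).T @ (target_pts - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Fix a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t  # target ≈ R @ source + t
```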
Motion retargeting to a parametric human model powered by Momentum
Leveraging Meta's Momentum library, an optimization was developed to retarget the XSens skeleton motion onto a full parametric human model.
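Momentum's actual API is not reproduced here; as a conceptual sketch only, per-frame retargeting can be posed as a least-squares fit of the parametric model's pose parameters to the source skeleton's joint positions (the forward-kinematics function below is a toy placeholder, not the real body model):

```python
# Conceptual sketch only: retargeting as a per-frame least-squares fit.
# A real parametric model (e.g. one driven by Momentum) would replace the
# toy forward kinematics below with its full pose/shape parameterization.
import numpy as np
from scipy.optimize import least_squares

def toy_forward_kinematics(params, rest_joints):
    """Toy model: rest-pose joints under a small global rotation (axis-angle,
    small-angle approximation) and a translation."""
    w, t = params[:3], params[3:]
    wx = np.array([[0, -w[2], w[1]],
                   [w[2], 0, -w[0]],
                   [-w[1], w[0], 0]])
    return rest_joints @ (np.eye(3) + wx).T + t

def retarget_frame(source_joints, rest_joints):
    """Fit model parameters so the model's joints match the source joints."""
    def residual(params):
        return (toy_forward_kinematics(params, rest_joints) - source_joints).ravel()
    return least_squares(residual, x0=np.zeros(6)).x
```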
An observer with Project Aria for third-person perspective
An observer wearing Project Aria was added to each recording; they followed the participant as a moving camera and interacted with them as needed, providing a holistic view of the action. All recording devices are aligned in a single metric 3D world and accurately synchronized via hardware solutions.
Representing the rich diversity of everyday life
Diverse scenarios
To capture natural, authentic motion and interactions, 20 scenarios of common daily activities were designed with high-level descriptions. Each recording is 15 minutes long. The following demo shows 1 subject performing 6 different scenarios.
Diverse participants
A total of 264 participants were recruited to capture how different people perform the same activities in different ways. The demographics are balanced in terms of gender, ethnicity, weight, height, and age. The following demo shows 6 subjects performing 3 sets of activities: badminton, cooking, and party decorations.
Diverse locations
In total, 47 single-family houses with different layouts were rented, comprising 201 rooms, 45 gardens, and 37 multi-story houses. Each location contributed 4 to 15 hours of recordings. In the examples, all device trajectories are overlaid to show the density of actions. The clusters of head (red), left wrist (green), and right wrist (blue) trajectories merge naturally, as expected.
Additionally, 3 locations on an open-space campus were captured: a cafeteria with an outdoor patio, a multi-level office building, and a parking lot connected to multiple hiking/biking trails. The video shows different subjects performing various activities across these locations.
Connecting human motion with natural language
The Nymeria dataset provides motion-language descriptions, where annotators describe motion in context while watching a playback of the synchronized egocentric view, third-person view, and motion rendering. Motion is described with a coarse-to-fine schema comprising detail-oriented motion narration, simplified atomic actions, and high-level activity summarization; an illustrative sketch of such a layered record follows the statistics below. The following word cloud visualization provides insight into the narrations.
Motion narration
39 hours, 117.2 K sentences, 2.72 M words, 3739 vocabulary
Activity summarization
196 hours, 22.6 K sentences, 0.45 M words, 3168 vocabulary
Atomic action
207 hours, 170.6 K sentences, 5.47 M words, 5129 vocabulary
Combined
230 hours, 310.5 K sentences, 8.64 M words, 6545 vocabulary
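Purely as an illustration of this coarse-to-fine structure (not the released file format, whose schema may differ), each description can be thought of as a time-stamped record tagged with its layer:

```python
# Hypothetical record layout for the hierarchical motion-language annotations.
# The released Nymeria files may use a different schema; this only illustrates
# the coarse-to-fine structure described above.
from dataclasses import dataclass

@dataclass
class MotionDescription:
    start_s: float  # start of the described segment, in seconds
    end_s: float    # end of the described segment, in seconds
    layer: str      # "motion_narration" | "atomic_action" | "activity_summarization"
    text: str       # free-form natural-language description

example = [
    MotionDescription(12.0, 15.5, "motion_narration",
                      "The person reaches forward with the right hand and picks up a mug."),
    MotionDescription(12.0, 15.5, "atomic_action", "pick up mug"),
    MotionDescription(0.0, 900.0, "activity_summarization",
                      "The person prepares breakfast in the kitchen."),
]
```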
Created with privacy and ethics in mind
The Nymeria dataset was collected under a rigorous privacy and ethics policy, strictly following the Project Aria research guidelines. Prior to data collection, formal consent was obtained from participants and homeowners regarding data recording and usage. Data was collected and stored with de-identification, and EgoBlur was used to blur faces and license plates in all videos.
Access data and tools
The Nymeria dataset and tools are released by Meta under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). By submitting your email and accessing the Nymeria dataset, you agree to abide by the license and to receive emails in relation to the dataset.