Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, despite years of research in this area, no standardized and openly available dataset exists to support the development and evaluation of such systems. We therefore propose the KIT Motion-Language Dataset, which is large, open, and extensible.
Date | Number of Motions | Number of Annotations | Size | Download
---|---|---|---|---
June 22, 2017 (latest) | 3911 | 6353 | 3.9 GB | 2017-06-22.zip
Oct. 10, 2016 | 3911 | 6278 | 3.9 GB | 2016-10-10.zip
Sept. 20, 2016 | 3913 | 6030 | 3.9 GB | 2016-09-20.zip
June 14, 2016 | 3917 | 5486 | 3.9 GB | 2016-06-14.zip
If you use this dataset, please cite the following publication:
@article{Plappert2016,
  author    = {Matthias Plappert and Christian Mandery and Tamim Asfour},
  title     = {The {KIT} Motion-Language Dataset},
  journal   = {Big Data},
  publisher = {Mary Ann Liebert Inc},
  year      = 2016,
  month     = {dec},
  volume    = {4},
  number    = {4},
  pages     = {236--252},
  url       = {http://dx.doi.org/10.1089/big.2016.0028},
  doi       = {10.1089/big.2016.0028},
}
The content of the extracted dataset looks like this:
00001_annotations.json  00001_meta.json  00001_mmm.xml  00001_raw.c3d
00002_annotations.json  00002_meta.json  00002_mmm.xml  00002_raw.c3d
00003_annotations.json  00003_meta.json  00003_mmm.xml  00003_raw.c3d
...
03966_annotations.json  03966_meta.json  03966_mmm.xml  03966_raw.c3d
Each file is prefixed with an ID followed by its type: <id>_<type>.<ext>. Please note that the IDs are not necessarily consecutive. However, each ID is unique, and for each ID there are always exactly four types of files, as the sanity check below illustrates. The format and contents of the available files are explained below.
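As a quick sanity check, you can group the files by their ID prefix and verify that all four types are present for every motion; a minimal sketch (the dataset directory path is a placeholder):
import os
from collections import defaultdict

dataset_dir = 'path/to/dataset'  # placeholder: directory of the extracted dataset

# Group files by their ID prefix and collect the type suffixes per ID.
groups = defaultdict(set)
for fname in os.listdir(dataset_dir):
    if '_' in fname:
        motion_id, file_type = fname.split('_', 1)
        groups[motion_id].add(file_type)

# Every ID should come with exactly these four files.
expected = {'annotations.json', 'meta.json', 'mmm.xml', 'raw.c3d'}
incomplete = {k for k, v in groups.items() if v != expected}
print('{} motions, {} incomplete'.format(len(groups), len(incomplete)))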
<id>_raw.<ext>
The format of the raw motion data depends on the data source. Currently, the dataset only contains C3D files. However, since this might change in the future, we recommend that you use the unified MMM representation instead.
Currently, our dataset contains motion data from two sources: the KIT Whole-Body Human Motion Database and the CMU Graphics Lab Motion Capture Database.
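Should you nevertheless need the raw recordings, a third-party C3D reader is required, since the Python standard library cannot parse C3D. A minimal sketch, assuming the third-party c3d package (not part of this dataset) is installed and a file 00001_raw.c3d exists in the working directory:
import c3d  # assumed third-party package, e.g. installed via pip

with open('00001_raw.c3d', 'rb') as f:
    reader = c3d.Reader(f)
    for frame_no, points, analog in reader.read_frames():
        # points holds one row per marker: x, y, z plus residual/camera columns.
        print('frame {}: {} markers'.format(frame_no, points.shape[0]))
        break  # only inspect the first frame here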
<id>_mmm.xml
The motions are available as XML files that represent each motion using the Master Motor Map (MMM) framework. We recommend that you always use this file when working with the motion data, since it abstracts away the concrete motion capture system in use and allows you to use new data sources in the future without additional effort. Comprehensive documentation of the data format is available on the MMM website; please refer to it for details. We also provide a simple Python example (see below) that parses the most important parts of the XML file without any third-party dependencies.
<?xml version='1.0'?>
<MMM>
<Motion name='subject_id_4'>
<Model>
<File>mmm.xml</File>
</Model>
<ModelProcessorConfig type='Winter'>
<Height>1.82</Height>
<Mass>100</Mass>
</ModelProcessorConfig>
<JointOrder>
<Joint name='BLNx_joint'/>
<Joint name='BLNy_joint'/>
<Joint name='BLNz_joint'/>
<Joint name='BPx_joint'/>
<Joint name='BPy_joint'/>
<Joint name='BPz_joint'/>
<Joint name='BTx_joint'/>
<Joint name='BTy_joint'/>
<Joint name='BTz_joint'/>
<Joint name='BUNx_joint'/>
<Joint name='BUNy_joint'/>
<Joint name='BUNz_joint'/>
<Joint name='LAx_joint'/>
<Joint name='LAy_joint'/>
<Joint name='LAz_joint'/>
<Joint name='LEx_joint'/>
<Joint name='LEz_joint'/>
<Joint name='LHx_joint'/>
<Joint name='LHy_joint'/>
<Joint name='LHz_joint'/>
<Joint name='LKx_joint'/>
<Joint name='LSx_joint'/>
<Joint name='LSy_joint'/>
<Joint name='LSz_joint'/>
<Joint name='LWx_joint'/>
<Joint name='LWy_joint'/>
<Joint name='LFx_joint'/>
<Joint name='LMrot_joint'/>
<Joint name='RAx_joint'/>
<Joint name='RAy_joint'/>
<Joint name='RAz_joint'/>
<Joint name='REx_joint'/>
<Joint name='REz_joint'/>
<Joint name='RHx_joint'/>
<Joint name='RHy_joint'/>
<Joint name='RHz_joint'/>
<Joint name='RKx_joint'/>
<Joint name='RSx_joint'/>
<Joint name='RSy_joint'/>
<Joint name='RSz_joint'/>
<Joint name='RWx_joint'/>
<Joint name='RWy_joint'/>
<Joint name='RFx_joint'/>
<Joint name='RMrot_joint'/>
</JointOrder>
<MotionFrames>
<MotionFrame>
<Timestep>0</Timestep>
<RootPosition>1142.97 -252.854 973.526</RootPosition>
<RootRotation>-0.0735726 -0.0238353 1.64312</RootRotation>
<JointPosition>-0.612953 0.0476239 0.212333 0.161793 0.0028466 0.0190509 -0.0479624 -0.0228922 -0.0694337 0.387513 -0.0018089 0.135106 0.158669 -0.324085 -0.165832 0.35672 -0.0738947 0.103075 0.0545089 0.135437 -0.221003 -0.154341 0.179823 -0.279305 0.0844597 -0.254741 -0.122429 0.224831 0.0616969 0.324959 -0.483339 0.308075 0.857774 0.00596291 0.0430199 0.373412 -0.021832 -0.11818 -0.083582 -0.152354 0.120838 0.373251 -0.0313812 -0.261799 </JointPosition>
<JointVelocity>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 </JointVelocity>
<JointAcceleration>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 </JointAcceleration>
</MotionFrame>
<MotionFrame>
<Timestep>0.01</Timestep>
<RootPosition>1142.89 -252.863 973.795</RootPosition>
<RootRotation>-0.0752048 -0.0231932 1.64472</RootRotation>
<JointPosition>-0.651594 -0.142824 0.211258 0.17367 0.000509898 0.0173047 -0.0596829 -0.0227031 -0.0701363 0.471133 0.261797 0.116722 0.155988 -0.323779 -0.172119 0.356309 -0.0834862 0.106037 0.0545055 0.139367 -0.222049 -0.151358 0.178517 -0.269665 0.0925315 -0.257984 -0.113151 0.221422 0.0603637 0.326164 -0.478189 0.309305 0.879796 0.00786525 0.0430097 0.363636 -0.0238971 -0.115228 -0.0827109 -0.163222 0.134703 0.373612 -0.021385 -0.261799 </JointPosition>
<JointVelocity>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 </JointVelocity>
<JointAcceleration>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 </JointAcceleration>
</MotionFrame>
<!-- More <MotionFrame> elements follow -->
</MotionFrames>
</Motion>
</MMM>
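If you only need a specific part of such a file, the Python standard library is sufficient. The following sketch, for example, extracts the root position trajectory of a single motion (the file name is a placeholder):
import xml.etree.ElementTree as ET

xml_root = ET.parse('00001_mmm.xml').getroot()
root_positions = []
for xml_frame in xml_root.findall('Motion/MotionFrames/MotionFrame'):
    # Each RootPosition element holds three space-separated coordinates.
    values = xml_frame.find('RootPosition').text.split()
    root_positions.append([float(x) for x in values])
print('parsed {} frames'.format(len(root_positions)))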
<id>_annotations.json
The annotations for each motion are provided as strings in a very simple JSON array. One motion can have multiple annotations. Note also that we include motions that have not been annotated yet in the dataset, which means that the array may also contain zero elements.
[
"A person walks.",
"a person walks four steps forward"
]
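Loading the annotations therefore requires nothing beyond the standard library; a minimal sketch that also guards against unannotated motions (the file name is a placeholder):
import json

with open('00001_annotations.json') as f:
    annotations = json.load(f)  # a plain list of strings, possibly empty
if not annotations:
    print('this motion has not been annotated yet')
else:
    print('{} annotation(s)'.format(len(annotations)))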
<id>_meta.json
The metadata for each entry is provided as a simple JSON dictionary. It links each entry to the Motion Annotation Tool and to the source database of the motion. This information is useful if you need to look up additional data from the database, e.g. the subject's height or weight. It also contains the perplexity of each annotation under a simple 3-gram language model, which allows you to find and filter out outliers.
{
"motion_annotation_tool":{
"id":3966,
"annotation_ids":[
5524
]
},
"source":{
"mirror_database":{
"identifier":"kit",
"motion_file_id":53570,
"motion_id":1149
},
"institution":{
"identifier":"cmu",
"name":"Carnegie Mellon University"
},
"database":{
"identifier":"cmu",
"motion_file_id":38,
"motion_id":127
}
},
"nb_annotations":1,
"annotation_perplexities":[2.78172, 7.43163]
}
Note that the mirror_database entry is not always available.
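For example, annotations whose perplexity exceeds some threshold can be discarded as outliers; a minimal sketch (the file names and the threshold are placeholders, not recommendations from the dataset):
import json

with open('00001_meta.json') as f:
    meta = json.load(f)
with open('00001_annotations.json') as f:
    annotations = json.load(f)

# Keep only annotations whose 3-gram perplexity is below the threshold.
THRESHOLD = 100.0  # arbitrary placeholder value, tune for your application
kept = [annotation for annotation, perplexity
        in zip(annotations, meta['annotation_perplexities'])
        if perplexity <= THRESHOLD]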
You can use the Ice interface of the KIT Whole-Body Human Motion Database to obtain additional information on each motion. The Ice interface can be used with a wide variety of programming and scripting languages. You can use the motion_id entry from the metadata dictionary to obtain the corresponding ID for each motion. Additional information as well as the necessary Ice interface specification is available here.
We provide a simple example that illustrates how the entire dataset can be loaded using Python. The only third-party dependency is numpy.
import argparse
import os
import json
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
import logging
import numpy as np
def parse_motions(path):
xml_tree = ET.parse(path)
xml_root = xml_tree.getroot()
xml_motions = xml_root.findall('Motion')
motions = []
if len(xml_motions) > 1:
        logging.warning('more than one <Motion> tag in file "%s", only parsing the first one', path)
motions.append(_parse_motion(xml_motions[0], path))
return motions
def _parse_motion(xml_motion, path):
xml_joint_order = xml_motion.find('JointOrder')
if xml_joint_order is None:
raise RuntimeError('<JointOrder> not found')
joint_names = []
joint_indexes = []
for idx, xml_joint in enumerate(xml_joint_order.findall('Joint')):
name = xml_joint.get('name')
if name is None:
raise RuntimeError('<Joint> has no name')
joint_indexes.append(idx)
joint_names.append(name)
frames = []
xml_frames = xml_motion.find('MotionFrames')
if xml_frames is None:
raise RuntimeError('<MotionFrames> not found')
for xml_frame in xml_frames.findall('MotionFrame'):
frames.append(_parse_frame(xml_frame, joint_indexes))
return joint_names, frames
def _parse_frame(xml_frame, joint_indexes):
n_joints = len(joint_indexes)
xml_joint_pos = xml_frame.find('JointPosition')
if xml_joint_pos is None:
raise RuntimeError('<JointPosition> not found')
joint_pos = _parse_list(xml_joint_pos, n_joints, joint_indexes)
return joint_pos
def _parse_list(xml_elem, length, indexes=None):
if indexes is None:
indexes = range(length)
elems = [float(x) for idx, x in enumerate(xml_elem.text.rstrip().split(' ')) if idx in indexes]
if len(elems) != length:
raise RuntimeError('invalid number of elements')
return elems
def main(args):
input_path = args.input
print('Scanning files ...')
files = [f for f in os.listdir(input_path) if os.path.isfile(os.path.join(input_path, f)) and f[0] != '.']
basenames = list(set([os.path.splitext(f)[0].split('_')[0] for f in files]))
print('done, {} potential motions and their annotations found'.format(len(basenames)))
print('')
# Parse all files.
print('Processing data in "{}" ...'.format(input_path))
all_ids = []
all_motions = []
all_annotations = []
all_metadata = []
reference_joint_names = None
for idx, basename in enumerate(basenames):
        print(' {}/{} ...'.format(idx + 1, len(basenames)))
# Load motion.
mmm_path = os.path.join(input_path, basename + '_mmm.xml')
assert os.path.exists(mmm_path)
joint_names, frames = parse_motions(mmm_path)[0]
if reference_joint_names is None:
reference_joint_names = joint_names[:]
elif reference_joint_names != joint_names:
print('skipping, invalid joint_names {}'.format(joint_names))
continue
# Load annotation.
annotations_path = os.path.join(input_path, basename + '_annotations.json')
assert os.path.exists(annotations_path)
with open(annotations_path, 'r') as f:
annotations = json.load(f)
# Load metadata.
meta_path = os.path.join(input_path, basename + '_meta.json')
assert os.path.exists(meta_path)
with open(meta_path, 'r') as f:
meta = json.load(f)
assert len(annotations) == meta['nb_annotations']
all_ids.append(int(basename))
all_motions.append(np.array(frames, dtype='float32'))
all_annotations.append(annotations)
        all_metadata.append(meta)
print('done')
assert len(all_motions) == len(all_annotations)
assert len(all_motions) == len(all_ids)
print('done, successfully processed {} motions and their annotations'.format(len(all_motions)))
print('')
# At this point, you can do anything you want with the motion and annotation data.
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='')
parser.add_argument('input', type=str)
main(parser.parse_args())
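Assuming the script is saved as load_dataset.py (a placeholder name), it can be run with the directory of the extracted dataset as its only argument, e.g. python load_dataset.py path/to/dataset.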
Note that this example only parses the joint values of the MMM files. You can also use the MMM framework to do much more with the data; please refer to the MMM documentation for details.