The KIT Motion-Language Dataset

Linking human motion and natural language is of great interest for generating semantic representations of human activities as well as for generating robot activities from natural language input. However, despite years of research in this area, no standardized and openly available dataset exists to support the development and evaluation of such systems. We therefore propose the KIT Motion-Language Dataset, which is large, open, and extensible.

Downloads

Date Number of Motions Number of Annotations Size Download
Oct. 10, 2016 Latest 3911 6278 3.9 GB 2016-10-10.zip
Sept. 20, 2016 3913 6030 3.9 GB 2016-09-20.zip
June 14, 2016 3917 5486 3.9 GB 2016-06-14.zip

Citing

If you use the KIT Motion-Language dataset, please cite the following paper:
@article{Plappert2016,
    author = {Matthias Plappert and Christian Mandery and Tamim Asfour},
    title = {The {KIT} Motion-Language Dataset},
    journal = {Big Data},
    publisher = {Mary Ann Liebert Inc},
    year = 2016,
    month = {dec},
    volume = {4},
    number = {4},
    pages = {236--252},
    url = {http://dx.doi.org/10.1089/big.2016.0028},
    doi = {10.1089/big.2016.0028},
}

Dataset Content

The content of the extracted dataset looks like this:

00001_annotations.json
00001_meta.json
00001_mmm.xml
00001_raw.c3d
00002_annotations.json
00002_meta.json
00002_mmm.xml
00002_raw.c3d
00003_annotations.json
00003_meta.json
00003_mmm.xml
00003_raw.c3d
...
03966_annotations.json
03966_meta.json
03966_mmm.xml
03966_raw.c3d

Each file is prefixed with an ID followed by its type: <id>_<type>.<ext>. Please note that the IDs are not necessarily consecutive. However, each ID is unique, and for each ID there are always exactly four types of files. The format and contents of the available files are explained below.
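The naming scheme can be parsed with a few lines of standard-library Python (a minimal sketch; the filenames are taken from the listing above):

```python
import os

def split_dataset_filename(filename):
    """Split '<id>_<type>.<ext>' into its ID and type components."""
    basename, _ = os.path.splitext(filename)
    motion_id, file_type = basename.split('_', 1)
    return motion_id, file_type

print(split_dataset_filename('00001_annotations.json'))  # ('00001', 'annotations')
print(split_dataset_filename('03966_mmm.xml'))           # ('03966', 'mmm')
```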

Raw Motion Data (<id>_raw.<ext>)

The format of the raw motion data depends on the data source. Currently, the dataset only contains C3D files. However, since this might change in the future, we recommend that you use the unified MMM representation instead.

Currently, our dataset contains motion data from two sources, the KIT Whole-Body Human Motion Database and the Carnegie Mellon University motion capture database; the source entry in each motion's metadata (see below) identifies its origin.

MMM Motion Data (<id>_mmm.xml)

The motions are available as XML files that represent the motion using the Master Motor Map (MMM) framework. We recommend that you always use this file when working with the motion data, since it abstracts from the concrete motion capture system in use and allows you to work with new data sources in the future without effort. A comprehensive documentation of the data format is available on the MMM website; please refer to it for details. We also provide a simple Python example (see below) that parses the most important parts of the XML file without any third-party dependencies.

Example

<?xml version='1.0'?>
<MMM>
    <Motion name='subject_id_4'>
        <Model>
            <File>mmm.xml</File>
        </Model>
        <ModelProcessorConfig type='Winter'>
            <Height>1.82</Height>
            <Mass>100</Mass>
        </ModelProcessorConfig>
        <JointOrder>
            <Joint name='BLNx_joint'/>
            <Joint name='BLNy_joint'/>
            <Joint name='BLNz_joint'/>
            <Joint name='BPx_joint'/>
            <Joint name='BPy_joint'/>
            <Joint name='BPz_joint'/>
            <Joint name='BTx_joint'/>
            <Joint name='BTy_joint'/>
            <Joint name='BTz_joint'/>
            <Joint name='BUNx_joint'/>
            <Joint name='BUNy_joint'/>
            <Joint name='BUNz_joint'/>
            <Joint name='LAx_joint'/>
            <Joint name='LAy_joint'/>
            <Joint name='LAz_joint'/>
            <Joint name='LEx_joint'/>
            <Joint name='LEz_joint'/>
            <Joint name='LHx_joint'/>
            <Joint name='LHy_joint'/>
            <Joint name='LHz_joint'/>
            <Joint name='LKx_joint'/>
            <Joint name='LSx_joint'/>
            <Joint name='LSy_joint'/>
            <Joint name='LSz_joint'/>
            <Joint name='LWx_joint'/>
            <Joint name='LWy_joint'/>
            <Joint name='LFx_joint'/>
            <Joint name='LMrot_joint'/>
            <Joint name='RAx_joint'/>
            <Joint name='RAy_joint'/>
            <Joint name='RAz_joint'/>
            <Joint name='REx_joint'/>
            <Joint name='REz_joint'/>
            <Joint name='RHx_joint'/>
            <Joint name='RHy_joint'/>
            <Joint name='RHz_joint'/>
            <Joint name='RKx_joint'/>
            <Joint name='RSx_joint'/>
            <Joint name='RSy_joint'/>
            <Joint name='RSz_joint'/>
            <Joint name='RWx_joint'/>
            <Joint name='RWy_joint'/>
            <Joint name='RFx_joint'/>
            <Joint name='RMrot_joint'/>
        </JointOrder>
        <MotionFrames>
            <MotionFrame>
                <Timestep>0</Timestep>
                <RootPosition>1142.97 -252.854 973.526</RootPosition>
                <RootRotation>-0.0735726 -0.0238353 1.64312</RootRotation>
                <JointPosition>-0.612953 0.0476239 0.212333 0.161793 0.0028466 0.0190509 -0.0479624 -0.0228922 -0.0694337 0.387513 -0.0018089 0.135106 0.158669 -0.324085 -0.165832 0.35672 -0.0738947 0.103075 0.0545089 0.135437 -0.221003 -0.154341 0.179823 -0.279305 0.0844597 -0.254741 -0.122429 0.224831 0.0616969 0.324959 -0.483339 0.308075 0.857774 0.00596291 0.0430199 0.373412 -0.021832 -0.11818 -0.083582 -0.152354 0.120838 0.373251 -0.0313812 -0.261799 </JointPosition>
                <JointVelocity>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 </JointVelocity>
                <JointAcceleration>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 </JointAcceleration>
            </MotionFrame>
            <MotionFrame>
                <Timestep>0.01</Timestep>
                <RootPosition>1142.89 -252.863 973.795</RootPosition>
                <RootRotation>-0.0752048 -0.0231932 1.64472</RootRotation>
                <JointPosition>-0.651594 -0.142824 0.211258 0.17367 0.000509898 0.0173047 -0.0596829 -0.0227031 -0.0701363 0.471133 0.261797 0.116722 0.155988 -0.323779 -0.172119 0.356309 -0.0834862 0.106037 0.0545055 0.139367 -0.222049 -0.151358 0.178517 -0.269665 0.0925315 -0.257984 -0.113151 0.221422 0.0603637 0.326164 -0.478189 0.309305 0.879796 0.00786525 0.0430097 0.363636 -0.0238971 -0.115228 -0.0827109 -0.163222 0.134703 0.373612 -0.021385 -0.261799 </JointPosition>
                <JointVelocity>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 </JointVelocity>
                <JointAcceleration>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 </JointAcceleration>
            </MotionFrame>

            <!-- More <MotionFrame> elements follow -->

        </MotionFrames>
    </Motion>
</MMM>
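To give a feel for the format, the following snippet extracts the timestep and root position from a single <MotionFrame> element (a minimal sketch using only the standard library; the values are copied from the example above, and the full loader is shown further below):

```python
import xml.etree.ElementTree as ET

# A single <MotionFrame> as it appears in the MMM XML above.
frame_xml = """
<MotionFrame>
    <Timestep>0.01</Timestep>
    <RootPosition>1142.89 -252.863 973.795</RootPosition>
    <RootRotation>-0.0752048 -0.0231932 1.64472</RootRotation>
</MotionFrame>
"""

frame = ET.fromstring(frame_xml)
timestep = float(frame.find('Timestep').text)
root_position = [float(x) for x in frame.find('RootPosition').text.split()]
print(timestep, root_position)  # 0.01 [1142.89, -252.863, 973.795]
```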

Annotation Data (<id>_annotations.json)

The annotations for each motion are available as strings in a very simple JSON array. One motion can have multiple annotations. Note also that the dataset includes motions that have not been annotated yet, which means that the array may also contain zero elements.

Example

[
    "A person walks.",
    "a person walks four steps forward"
]
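Loading such a file requires only the json module; the empty-array case for unannotated motions should be handled explicitly (a minimal sketch that uses the example annotations above as an inline string instead of reading a file):

```python
import json

# Inline stand-in for the contents of an <id>_annotations.json file;
# real files are loaded with json.load(open(path)).
raw = '["A person walks.", "a person walks four steps forward"]'

annotations = json.loads(raw)
if not annotations:
    print('motion has not been annotated yet')
else:
    for annotation in annotations:
        print(annotation)
```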

Metadata (<id>_meta.json)

The metadata for each entry is provided as a simple JSON dictionary. It links each entry to the Motion Annotation Tool and to the source database of the motion. This information is useful if you need to look up additional data from the database, e.g. the subject's height or weight. It also contains the perplexity of each annotation under a simple 3-gram language model. This allows you to find and filter out outliers.

Example

{  
    "motion_annotation_tool":{  
        "id":3966,
        "annotation_ids":[  
            5524
        ]
    },
    "source":{  
        "mirror_database":{  
            "identifier":"kit",
            "motion_file_id":53570,
            "motion_id":1149
        },
        "institution":{  
            "identifier":"cmu",
            "name":"Carnegie Mellon University"
        },
        "database":{  
            "identifier":"cmu",
            "motion_file_id":38,
            "motion_id":127
        }
    },
    "nb_annotations":1,
    "annotation_perplexities":[2.78172, 7.43163]
}

Notice that the mirror_database entry is not always available.
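For example, the perplexity values can be used to filter out unusual annotations (a minimal sketch; the threshold of 100 is an arbitrary assumption for illustration, not a value recommended by the dataset):

```python
import json

# Inline stand-in for the relevant part of an <id>_meta.json file;
# real files are loaded with json.load(open(path)).
meta = json.loads('{"nb_annotations": 2, "annotation_perplexities": [2.78172, 743.163]}')

PERPLEXITY_THRESHOLD = 100.0  # assumed cut-off, tune for your application
keep = [idx for idx, perplexity in enumerate(meta['annotation_perplexities'])
        if perplexity <= PERPLEXITY_THRESHOLD]
print(keep)  # indexes of the annotations to keep
```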

You can use the Ice interface of the KIT Whole-Body Human Motion Database to obtain additional information on each motion. The Ice interface can be used with a wide variety of programming and scripting languages. You can use the motion_id entry from the metadata dictionary to obtain the corresponding ID for each motion. Additional information as well as the necessary Ice interface specification is available here.

Python Example

We provide a simple example that illustrates how the entire dataset can be loaded using Python. The only dependency that we use is numpy.

import argparse
import os
import json
import xml.etree.ElementTree as ET
import logging

import numpy as np


def parse_motions(path):
    xml_tree = ET.parse(path)
    xml_root = xml_tree.getroot()
    xml_motions = xml_root.findall('Motion')
    motions = []

    if len(xml_motions) > 1:
        logging.warning('more than one <Motion> tag in file "%s", only parsing the first one', path)
    motions.append(_parse_motion(xml_motions[0], path))
    return motions


def _parse_motion(xml_motion, path):
    xml_joint_order = xml_motion.find('JointOrder')
    if xml_joint_order is None:
        raise RuntimeError('<JointOrder> not found')

    joint_names = []
    joint_indexes = []
    for idx, xml_joint in enumerate(xml_joint_order.findall('Joint')):
        name = xml_joint.get('name')
        if name is None:
            raise RuntimeError('<Joint> has no name')
        joint_indexes.append(idx)
        joint_names.append(name)

    frames = []
    xml_frames = xml_motion.find('MotionFrames')
    if xml_frames is None:
        raise RuntimeError('<MotionFrames> not found')
    for xml_frame in xml_frames.findall('MotionFrame'):
        frames.append(_parse_frame(xml_frame, joint_indexes))

    return joint_names, frames


def _parse_frame(xml_frame, joint_indexes):
    n_joints = len(joint_indexes)
    xml_joint_pos = xml_frame.find('JointPosition')
    if xml_joint_pos is None:
        raise RuntimeError('<JointPosition> not found')
    joint_pos = _parse_list(xml_joint_pos, n_joints, joint_indexes)

    return joint_pos


def _parse_list(xml_elem, length, indexes=None):
    if indexes is None:
        indexes = range(length)
    elems = [float(x) for idx, x in enumerate(xml_elem.text.rstrip().split(' ')) if idx in indexes]
    if len(elems) != length:
        raise RuntimeError('invalid number of elements')
    return elems


def main(args):
    input_path = args.input
    
    print('Scanning files ...')
    files = [f for f in os.listdir(input_path) if os.path.isfile(os.path.join(input_path, f)) and f[0] != '.']
    basenames = list(set([os.path.splitext(f)[0].split('_')[0] for f in files]))
    print('done, {} potential motions and their annotations found'.format(len(basenames)))
    print('')

    # Parse all files.
    print('Processing data in "{}" ...'.format(input_path))
    all_ids = []
    all_motions = []
    all_annotations = []
    all_metadata = []
    reference_joint_names = None
    for idx, basename in enumerate(basenames):
        print('  {}/{} ...'.format(idx + 1, len(basenames)))

        # Load motion.
        mmm_path = os.path.join(input_path, basename + '_mmm.xml')
        assert os.path.exists(mmm_path)
        joint_names, frames = parse_motions(mmm_path)[0]
        if reference_joint_names is None:
            reference_joint_names = joint_names[:]
        elif reference_joint_names != joint_names:
            print('skipping, invalid joint_names {}'.format(joint_names))
            continue
        
        # Load annotation.
        annotations_path = os.path.join(input_path, basename + '_annotations.json')
        assert os.path.exists(annotations_path)
        with open(annotations_path, 'r') as f:
            annotations = json.load(f)

        # Load metadata.
        meta_path = os.path.join(input_path, basename + '_meta.json')
        assert os.path.exists(meta_path)
        with open(meta_path, 'r') as f:
            meta = json.load(f)

        assert len(annotations) == meta['nb_annotations']
        all_ids.append(int(basename))
        all_motions.append(np.array(frames, dtype='float32'))
        all_annotations.append(annotations)
        all_metadata.append(meta)
        print('done')
    assert len(all_motions) == len(all_annotations)
    assert len(all_motions) == len(all_ids)
    print('done, successfully processed {} motions and their annotations'.format(len(all_motions)))
    print('')

    # At this point, you can do anything you want with the motion and annotation data.


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='')
    parser.add_argument('input', type=str)
    main(parser.parse_args())

Note that this example only parses the joint values of the MMM file. You can also use the MMM framework itself to do much more with the data. Please refer to the MMM documentation for details.

Useful Software