Shutter Interaction Dataset

Plotly Shutter Visualization
Plotly Shutter Visualization

About the Dataset

The Shutter Interaction Dataset is a new dataset for predicting human intent with a public robot. This dataset provides a richer observation space than existing public datasets.

Collection

A Shutter robot was mounted to a movable cart and placed in two university lobbies to collect data. Data was collected from Shutter's sensors, including two forward-facing Microsoft Azure Kinect sensors that collect body tracking data. Additional data comes from various ways in which Shutter interacts with users, such as Shutter's motion, gaze, text-to-speech, and three buttons on the front side of the cart. Data collection took place across 13 days in August 2022.

The diagrams below illustrate the different layouts of the lobbies. The visualizations above depict interactions recorded in Lobby 1 and Lobby 2 respectively.

Lobby 1 Lobby 2

Interaction Walkthrough

During an interaction, the behavior of the robot was managed with a behavior tree. Shutter maintained a resting state until a person entered the field of view of either Kinect sensor. Shutter then randomly selected one of three greeting behaviors. The interaction continued beyond the greeting phase if the person pressed any of the cart's three buttons. Once an interaction had started, Shutter tracked the person with both body motion and gaze, explained its role as a photographer, took a photograph with a camera attached to its head, and showed the photograph on a monitor behind the robot. Then, Shutter verbally praised the image and displayed a QR code for the user to access the image. Interactions could be halted by users at any time by exiting the robot's field of view, after which the robot returned to a resting state.

Shutter with labeled sensors and buttons.

The Kinect sensors used to collect body tracking data and three buttons for a user to interact with Shutter.

Dataset Components

Here is a high-level overview of the Lobby 1 and Lobby 2 datasets:

Lobby 1 Lobby 2
Dataset size 131 MB 151 MB
Number of positive scenarios (interactions) 125 scenarios 189 scenarios
Number of negative scenarios (non-interactions) 221 scenarios 522 scenarios
Average scenario length 15.8 seconds 8.81 seconds
Average number of people in each frame 1.45 people 1.14 people
Average number of people per scenario 2.07 people 2.32 people
Maximum number of people visible in one frame 4 people 5 people

Skeleton Data

All of the skeleton and Shutter robot position information are in a fixed coordinate frame which is positioned at a static point in the environment.

For each of the joints listed below, the dataset includes x, y, z coordinates of that joint as well as sin and cos measurements for each dimension. This data comes from the Kinect cameras and was transformed to the environment coordinate frame.

  • ankle left
  • ankle right
  • clavicle left
  • clavicle right
  • ear left
  • ear right
  • elbow left
  • elbow right
  • eye left
  • eye right
  • foot left
  • foot right
  • hand left
  • hand right
  • handtip left
  • handtip right
  • head
  • hip left
  • hip right
  • knee left
  • knee right
  • neck
  • nose
  • pelvis
  • shoulder left
  • shoulder right
  • spine chest
  • spine navel
  • thumb left
  • thumb right
  • wrist left
  • wrist right

Robot Position

The dataset includes the x, y, and z positions of three places on the Shutter robot: cart, robot base, and robot head (see image below). The sin and cos measurements in each dimension for the three points are also recorded.

Shutter with three robot points labeled.

The three points on the Shutter robot recorded in the Shutter Interaction Dataset and the coordinate axes for each point.

Robot to Person Measurements

The dataset contains measurements from a user's head and pelvis to the robot head and cart. These measurements include both distance between the person joint and robot point and the cos and sin measurements for each combination.

Shutter Data

The dataset also includes information about Shutter's actions and a user's interaction with Shutter. This includes Shutter's current action, if each button on Shutter's cart is currently being pressed, Shutter's greeting, what fsm state Shutter is in (waiting, greeting, or interacting), and Shutter's current speech, if any.

Other Features

The dataset contains the following other features:

  • pid - person id
  • original frame id - original Kinect camera to record this data, either master or sub
  • original file name - original file to store the data
  • resampled timestamp - new timestamp taken from the three data sources
  • skel timestamp - original timestamp from the skeleton sata source
  • tf timestamp - original timestamp from the tf data source, which records robot position and robot to person measurements
  • shutter timestamp - original timestamp from the shutter data source
  • deployment location - location in which data was recorded, either location A or location B