ASL Finger Spelling Dataset

We propose two datasets for American Sign Language (ASL) fingerspelling recognition. Each dataset contains RGB and depth images for each letter of the alphabet, organized by subject so that generalization to new users can be estimated. You can see a demo of our fingerspelling system here.

Samples from dataset A

Dataset A: 5 users (easy)

The first dataset comprises 24 static signs (the letters j and z are excluded because they involve motion). It was captured in 5 different sessions, with similar lighting and background. This is the dataset used in [1]. download

Dataset B: 9 users (hard)

The second dataset (depth only) was captured from 9 different subjects in two very different environments and lighting conditions. download

Performance baseline

Performance baseline according to the method described in [1]. The evaluation methodology, for both datasets, was to train the forest on all but one of the subjects and test on the held-out one. This measures generalization to unseen users, arguably the most relevant performance criterion. The results are then averaged over all subjects.
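As an illustration, here is a minimal C++ sketch of this leave-one-subject-out protocol. trainForest() and accuracy() are hypothetical stubs standing in for the actual training and evaluation of [1], and are not part of the released code.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Placeholder types: in the real pipeline these would hold one subject's
    // images/labels and the trained random forest of [1].
    struct SubjectData {};
    struct Forest {};

    // Hypothetical stubs, not the actual implementation.
    Forest trainForest(const std::vector<SubjectData>&) { return Forest(); }
    double accuracy(const Forest&, const SubjectData&)  { return 0.0; }

    // Train on all subjects but one, test on the held-out one, and
    // average the per-subject accuracies.
    double leaveOneSubjectOut(const std::vector<SubjectData>& subjects) {
        double sum = 0.0;
        for (std::size_t held = 0; held < subjects.size(); ++held) {
            std::vector<SubjectData> train;
            for (std::size_t s = 0; s < subjects.size(); ++s)
                if (s != held) train.push_back(subjects[s]);
            sum += accuracy(trainForest(train), subjects[held]);
        }
        return sum / subjects.size();
    }

    int main() {
        std::vector<SubjectData> subjects(5);  // e.g. the 5 users of dataset A
        std::printf("mean accuracy: %.2f\n", leaveOneSubjectOut(subjects));
        return 0;
    }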
Generalization (average accuracy over held-out subjects)

Dataset    depth    intensity    combined
A          0.49     0.35         0.47
B          0.41     -            -

FAQ

Q: How do I decode the depth image?
A: The depth images are saved as single-channel 16-bit unsigned integer images - this is the format provided by the Kinect. You can read them with OpenCV using the function

cvLoadImage(filename.c_str(), CV_LOAD_IMAGE_UNCHANGED)

which will give you a pointer to an IplImage of format IPL_DEPTH_16U. How to do the same thing with the OpenCV 2.x C++ or Python APIs is left as an exercise :)
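For reference, here is a minimal sketch with the OpenCV 2.x C++ API; "depth.png" is a placeholder filename, not the dataset's actual layout, and the flag -1 is the numeric value of CV_LOAD_IMAGE_UNCHANGED.

    #include <opencv2/opencv.hpp>
    #include <cstdio>

    int main() {
        // Flag -1 (CV_LOAD_IMAGE_UNCHANGED) keeps the single-channel 16-bit
        // data instead of converting it to 8-bit BGR on load.
        cv::Mat depth = cv::imread("depth.png", -1);
        if (depth.empty() || depth.type() != CV_16UC1) {
            std::fprintf(stderr, "not a 16-bit single-channel depth image\n");
            return 1;
        }
        // Raw Kinect depth value at the image centre (sensor's native units).
        unsigned short d = depth.at<unsigned short>(depth.rows / 2, depth.cols / 2);
        std::printf("centre depth: %u\n", (unsigned)d);
        return 0;
    }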

References

  • [1] Pugeault, N. and Bowden, R. (2011). Spelling It Out: Real-Time ASL Fingerspelling Recognition. In Proceedings of the 1st IEEE Workshop on Consumer Depth Cameras for Computer Vision, jointly with ICCV 2011. (pdf)