Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection

We present Okutama-Action, a new video dataset for aerial-view concurrent human action detection. It consists of 43 minute-long, fully-annotated sequences with 12 action classes. Okutama-Action features many challenges missing from current datasets, including dynamic transitions of actions, significant changes in scale and aspect ratio, abrupt camera movement, and multi-labeled actors. As a result, our dataset is more challenging than existing ones and will help push the field forward toward real-world applications.
The training set (with labels) of Okutama-Action is available at the following link. The test set is available at the following link.
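As a minimal sketch of how one might consume the labeled data, the snippet below parses a single annotation line, assuming a vatic-style text format in which each line describes one bounding box: a track id, box coordinates, a frame number, three status flags, a quoted class label, and zero or more quoted action labels (supporting multi-labeled actors). The exact field layout and names here are illustrative assumptions, not a specification of the released files.

```python
# Sketch: parsing one line of an assumed vatic-style annotation file.
# Assumed layout (illustrative): track_id xmin ymin xmax ymax frame
#   lost occluded generated "label" "action1" "action2" ...
import shlex
from dataclasses import dataclass, field


@dataclass
class Box:
    track_id: int
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    frame: int
    lost: bool        # box is outside the frame
    occluded: bool    # actor is partially hidden
    generated: bool   # box was interpolated rather than hand-drawn
    label: str        # class label, e.g. "Person"
    actions: list = field(default_factory=list)  # concurrent action labels


def parse_line(line: str) -> Box:
    # shlex.split keeps the quoted label/action fields intact
    parts = shlex.split(line)
    nums = list(map(int, parts[:9]))
    return Box(*nums[:6], bool(nums[6]), bool(nums[7]), bool(nums[8]),
               parts[9], parts[10:])
```

For example, `parse_line('0 100 200 150 260 42 0 0 1 "Person" "Walking"')` yields a `Box` for track 0 at frame 42 with the single action label `"Walking"`.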
The creation of this dataset was supported by Prendinger Lab at the National Institute of Informatics, Tokyo, Japan.