Introduction

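Google has released YouTube-8M, a large-scale video dataset. It contains the URLs of 8 million YouTube videos, representing 500,000 hours of video, together with video annotations. The annotations are drawn from a diverse set of 4,800 Knowledge Graph entities. Compared with earlier video datasets, YouTube-8M is significantly larger and more diverse. The previous largest video dataset, Sports-1M, contains about 1 million YouTube videos and 500 sports-specific classes.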

Git repository: https://github.com/google/youtube-8m
Competition website: https://research.google.com/youtube8m/index.html
Official video feature extraction code: https://github.com/google/youtube-8m/tree/master/feature_extractor
Winning solution code: https://github.com/antoine77340/Youtube-8M-WILLOW

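Main content and usage

Dataset overview

4,716 label classes in a multi-label scheme, with an average of 3.4 labels per video. The label vocabulary can be downloaded from: https://research.google.com/youtube8m/csv/vocabulary.csv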
Each video must be public and have at least 1000 views
Each video must be between 120 and 500 seconds long
Each video must be associated with at least one entity from our target vocabulary
Adult & sensitive content is removed (as determined by automated classifiers)
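Features come in two kinds, frame-level and video-level; each includes RGB features and audio features. Both can be downloaded from the official website.
Visual features are derived from the Inception-v3 TensorFlow model followed by PCA
Audio features are derived from the paper "CNN Architectures for Large-Scale Audio Classification"
Files are stored in .tfrecord format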
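To make the .tfrecord container a little more concrete, here is a minimal sketch that walks the records of such a file using only the Python standard library. It relies on the TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload) and, as a simplification, skips CRC verification. The function name iter_tfrecord_payloads is made up for this illustration. Each payload is a serialized protocol buffer, so decoding the actual features still requires TensorFlow or protobuf.

```python
import struct


def iter_tfrecord_payloads(stream):
    """Yield the raw payload bytes of each record in a binary TFRecord stream.

    Record layout:
      uint64 length | uint32 CRC of length | payload | uint32 CRC of payload
    Assumption of this sketch: the CRC fields are read but not verified.
    """
    while True:
        header = stream.read(12)  # 8-byte length + 4-byte length CRC
        if not header:
            return  # clean end of stream
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # payload CRC, ignored here
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload


def count_records(path):
    """Count the records stored in the .tfrecord file at `path`."""
    with open(path, "rb") as f:
        return sum(1 for _ in iter_tfrecord_payloads(f))
```

Calling count_records on a downloaded shard is a quick way to check that the file is intact before feeding it to a training job.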

Local feature extraction

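The official release only provides a frame-level feature extraction tool. The better-scoring teams also used frame-level features (which carry more information); see the paper "YouTube-8M: A Large-Scale Video Classification Benchmark".
Checking the runtime environment
(1) Requirements: TensorFlow, OpenCV (linked with ffmpeg)
(2) Check command; it should return True: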

python -c 'import tensorflow; import cv2; print(cv2.VideoCapture().open("/[path]/[to]/[some]/video1.mp4"))'
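As a complement to the one-liner above, the following sketch reports which of the required Python modules cannot be found at all, which can make a failing check easier to diagnose. The helper name missing_modules is made up for this example. It only tests whether the modules are installed; it does not confirm that OpenCV was built with ffmpeg support, so the VideoCapture check is still needed.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both packages are installed in the current environment.
print(missing_modules(["tensorflow", "cv2"]))
```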

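Save the video names and class information in .csv format as /[path]/[to]/[some]/vid_dataset.csv. Here video1.mp4 and video2.mp4 are the names of local videos, and 52;3;10 are the (user-defined) class IDs the video belongs to; multiple labels are allowed, separated by semicolons. A single file can list multiple videos: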

/[path]/[to]/[some]/video1.mp4,52;3;10
/[path]/[to]/[some]/video2.mp4,1;2
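For more than a handful of videos it is easier to generate this file from a script. Below is a minimal sketch using Python's csv module that writes rows in the layout shown above (video path, then semicolon-separated class IDs) and reads them back. The function names write_video_list and read_video_list are made up for this example and are not part of the official tool.

```python
import csv


def write_video_list(csv_path, videos):
    """Write (video_path, labels) pairs as 'path,label;label;...' rows."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        for video_path, labels in videos:
            writer.writerow([video_path, ";".join(str(label) for label in labels)])


def read_video_list(csv_path):
    """Read the rows back as (video_path, [integer labels]) pairs."""
    with open(csv_path, newline="") as f:
        return [
            (row[0], [int(label) for label in row[1].split(";")])
            for row in csv.reader(f)
            if row  # skip blank lines
        ]
```

Reading the file back before starting a long extraction run is a cheap way to catch a malformed row, such as a missing label column.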
Feature extraction command; the features are saved to output.tfrecord:

python extract_tfrecords_main.py --input_videos_csv /[path]/[to]/[some]/vid_dataset.csv --output_tfrecords_file /[path]/[to]/[some]/output.tfrecord

Training & inference

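Once you have the .tfrecord files, see the winning solution code: https://github.com/antoine77340/Youtube-8M-WILLOW
The model is saved in the folder created at the location given by the --train_dir parameter; specify the same folder for both training and inference.
Reference: https://blog.csdn.net/yOung_One/article/det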