{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "\n", " \n", "# Speech Recognition with Transformer\n", "\n", "# 0. Video Understanding and Subtitles" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Download the demo video (skipped if the file already exists)\n", "!test -f work/source/subtitle_demo1.mp4 || wget -c https://paddlespeech.bj.bcebos.com/demos/asr_demos/subtitle_demo1.mp4 -P work/source/" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import IPython.display as dp\n", "from IPython.display import HTML\n", "\n", "# Embed the downloaded demo video in the notebook\n", "html_str = '''\n", "<video controls src=\"{}\"></video>\n", "'''.format(\"work/source/subtitle_demo1.mp4\")\n", "dp.display(HTML(html_str))\n", "# Transcript of the Mandarin speech in the video\n", "print(\"ASR result: 当我说我可以把三十年的经验变成一个准确的算法他们说不可能当我说我们十个人就能实现对十九个城市变电站七乘二十四小时的实时监管他们说不可能\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "> Demo implementation: [https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/automatic_video_subtitiles/](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/automatic_video_subtitiles/)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# 1. Introduction\n", "\n", "## 1.1 Background\n", "Automatic Speech Recognition (ASR) is the task of extracting the spoken text content from an audio clip.  \n", "The technology is now widely used in both work and daily life, for example voice-to-text transcription on mobile phones and automatic meeting transcription at work.\n", "\n", "