This document describes how to run the ESPnet (v1) ASR demo and how to quantize its model.
Test environment:

Ubuntu	CUDA	GCC
21.04	11.6	11.2

Installation

Note: Follow the original installation guide provided by ESPnet. The notes below cover only the points that need extra attention.

Requirements

sox	sndfile	ffmpeg	flac
installed	installed	not installed	not installed
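
The status of the command-line tools in the table above can be checked with a small helper. This is only a minimal sketch: the function name check_tools is made up for illustration, and it covers only tools that ship an executable (sndfile is a library, so it is not checked this way).

```shell
#!/usr/bin/env bash
# check_tools is a hypothetical helper, not part of ESPnet or Kaldi.
# It prints one status line per tool and returns non-zero if any is missing.
check_tools() {
    local status=0
    local tool
    for tool in "$@"; do
        # command -v succeeds only if the tool can be found on PATH.
        if command -v "${tool}" >/dev/null 2>&1; then
            echo "${tool}: installed"
        else
            echo "${tool}: not installed"
            status=1
        fi
    done
    return "${status}"
}
```

For the tools listed here, it could be called as check_tools sox ffmpeg flac.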

Kaldi: follow the installation guide exactly.
Notes:

The Kaldi installation has two parts: (1) the tools installation and (2) the src installation. Make sure to complete both, in that order.
Once installed, many compiled binaries (alongside .o object files) can be found in directories such as <kaldi-root>/src/{featbin,fgmmbin,fstbin,etc.}
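
A quick way to confirm that the src build finished is to look for executables in a few of those directories. The sketch below assumes the binary directories sit under <kaldi-root>/src; check_kaldi_build is a hypothetical name, and the three directories are just the ones mentioned above.

```shell
#!/usr/bin/env bash
# check_kaldi_build is a hypothetical helper, not part of Kaldi.
# Usage: check_kaldi_build <kaldi-root>
# Assumption: built binaries live under <kaldi-root>/src/<dir>.
check_kaldi_build() {
    local kaldi_root="$1"
    local missing=0
    local dir file found
    for dir in featbin fgmmbin fstbin; do
        found=0
        # A built directory contains at least one executable regular file.
        for file in "${kaldi_root}/src/${dir}"/*; do
            if [ -f "${file}" ] && [ -x "${file}" ]; then
                found=1
                break
            fi
        done
        if [ "${found}" -eq 1 ]; then
            echo "ok: ${dir}"
        else
            echo "missing: ${dir}"
            missing=1
        fi
    done
    return "${missing}"
}
```

A non-zero return value suggests that the src installation did not complete for at least one directory.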

ESPnet: follow the installation guide exactly.
Notes:

Kaldi should be linked into <espnet>/tools (see the guide).
This document uses "Option A) Setup Anaconda environment", so a virtual environment named espnet is created with python==3.8.
The installed CUDA version is 11.6, which is not compatible with pytorch 1.10.1, so espnet should be installed with $ make TH_VERSION=1.10.1 CUDA_VERSION=11.3, which specifies the pytorch and CUDA versions explicitly.
The custom tools listed under "[Optional] Custom tool installation" are not installed.
Install chainer in the espnet conda environment with pip install chainer==6.0.0 (cupy is not installed because of some errors).

This demo decodes (transcribes) a .wav audio file into words.

Notes: some

Prepare the audio file
e.g. the test.wav file in espnet/utils
Put the .wav file in espnet/egs/tedlium2/asr1
Perform decoding
a. cd espnet/egs/tedlium2/asr1 and source ./path.sh
b. recog_wav.sh --models <downloaded-model> test.wav
Notes: By default the model is downloaded with the gdown package, which can fail with a timeout error when the network connection drops. In that case, download the model file, e.g. model.streaming.v1.tar.gz, manually from Google Drive (see the ESPnet readme).
Then modify the download_from_google_drive.sh file in the espnet/utils directory as follows:
a. Create a variable manual_download_dir that specifies the path of the downloaded model file, e.g. manual_download_dir="/home/glinttsd/espnet/egs/tedlium2/asr1/model.streaming.v1.tar.gz"
b. Replace the code on lines 46-47 with
```
# Use the manually downloaded archive when it exists; otherwise fall back to gdown.
if [ -f "${manual_download_dir}" ]; then
    echo "Using locally downloaded file: ${manual_download_dir}"
    decompress "${manual_download_dir}" "${download_dir}"
else
    echo "Downloading file from url: ${share_url}"
    gdown --id "${file_id}" -O "${tmp}"
    decompress "${tmp}" "${download_dir}"
fi
```
When the local file exists, this skips the download step and decompresses the manually downloaded model file directly.
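
The choice between the local archive and a fresh download can also be isolated into a small function, which makes it easy to try outside the ESPnet script. This is a minimal sketch rather than ESPnet code: pick_model_archive and its two arguments are hypothetical names, and the function only reports which archive would be used.

```shell
#!/usr/bin/env bash
# pick_model_archive is a hypothetical helper, not part of ESPnet.
# Prints the archive path to decompress.
# Returns 0 if the local file is used, 1 if a download is still needed.
pick_model_archive() {
    local manual_path="$1"   # manually downloaded archive (may not exist)
    local tmp_path="$2"      # where a downloader such as gdown would save it
    if [ -n "${manual_path}" ] && [ -f "${manual_path}" ]; then
        echo "${manual_path}"
        return 0
    fi
    echo "${tmp_path}"
    return 1
}
```

A caller could run the download only when the function returns non-zero and then decompress whichever path was printed.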

To quantize the model from FP32 to INT8

ESPnet provides dynamic quantization through the PyTorch API.

To enable dynamic quantization, add the following lines to the espnet/utils/recog_wav.sh file at lines 248-249:

        --quantize-asr-model True \
        --quantize-dtype "qint8" \
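
Because the two options are added by hand, a quick check that the edit landed can be useful. The following is a minimal sketch; has_quantize_flags is a hypothetical helper, and the script path is passed in as an argument rather than assumed.

```shell
#!/usr/bin/env bash
# has_quantize_flags is a hypothetical helper, not part of ESPnet.
# Returns 0 only if both quantization options appear in the given script.
has_quantize_flags() {
    local script="$1"
    # "--" stops grep from parsing the pattern as one of its own options.
    grep -q -- '--quantize-asr-model' "${script}" &&
        grep -q -- '--quantize-dtype' "${script}"
}
```

For example, has_quantize_flags espnet/utils/recog_wav.sh && echo "quantization enabled" prints the message only when both options are present.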

Decoding can now be performed as described in the previous section.

More usage can be found here

Added: 2022-04-04 12:11:32 Updated: 2022-04-04 12:14:27
