Espnet ASR Demo & Quantization Document
- This document describes how to run the Espnet (v1) ASR demo and how to quantize its model.
- Test environment:
Ubuntu | CUDA | GCC |
---|---|---|
21.04 | 11.6 | 11.2 |
Installation
Note: Please follow the original installation guide provided by Espnet. The notes below only cover the points that need extra attention.
Requirements
sox | sndfile | ffmpeg | flac |
---|---|---|---|
installed | installed | not installed | not installed |
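The status of each command-line tool in the table can be checked from the shell. This is a minimal sketch, assuming the tools are exposed under the command names sox, ffmpeg and flac; `check_tool` is a hypothetical helper introduced here, and sndfile is a library rather than a command, so it is not covered:

```shell
#!/bin/sh
# check_tool NAME: report whether a command called NAME is on PATH.
# (Hypothetical helper for illustration; not part of Espnet.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: not installed"
    fi
}

# Command-line tools from the requirements table.
for tool in sox ffmpeg flac; do
    check_tool "$tool"
done
```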
Install Kaldi
Follow the installation guide exactly.
Notes:
- The Kaldi installation has two parts: 1. tools installation 2. src installation. Make sure to install both, in that order.
- Once installed, many .o binary files can be found in directories such as <kaldi-root>/{featbin,fgmmbin,fstbin,etc.}
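A quick way to confirm that the build produced output is to count the .o files in a few of those directories. The following is a sketch only: `count_objects` is a hypothetical helper, and `KALDI_DIR` is an assumed variable that should point at the directory containing featbin, fgmmbin and so on.

```shell
#!/bin/sh
# count_objects DIR: print how many .o files sit directly inside DIR
# (prints 0 if DIR does not exist).
count_objects() {
    if [ -d "$1" ]; then
        find "$1" -maxdepth 1 -type f -name '*.o' | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# KALDI_DIR is an assumption: set it to the directory that holds featbin, fgmmbin, ...
KALDI_DIR="${KALDI_DIR:-.}"
for d in featbin fgmmbin fstbin; do
    echo "$d: $(count_objects "${KALDI_DIR}/$d") object files"
done
```

A count of 0 for every directory suggests that the src part of the build did not run or that `KALDI_DIR` points at the wrong place.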
Install Espnet
Follow the installation guide exactly.
Notes:
- Kaldi should be linked into <espnet>/tools (check the guide).
- Option A) Setup Anaconda environment is chosen in this document, so a virtual environment named espnet is created with python==3.8.
- The current CUDA version is 11.6, which is not compatible with pytorch 1.10.1, so espnet should be installed with $ make TH_VERSION=1.10.1 CUDA_VERSION=11.3 , which specifies the pytorch and CUDA versions.
- The custom tools in [Optional] Custom tool installation are not installed.
- Install chainer in the espnet conda environment with pip install chainer==6.0.0 (cupy is not installed due to some errors).
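The two install commands from these notes can be collected in a small script that prints them before running anything. This is a sketch under stated assumptions: it is meant to be started from <espnet>/tools with the espnet conda environment active, and `run`, `install_espnet` and the `DRY_RUN` switch are hypothetical names introduced here. By default it only prints the commands.

```shell
#!/bin/sh
# Versions used in this document.
TH_VERSION=1.10.1     # pytorch version
CUDA_VERSION=11.3     # CUDA version passed to make (the system CUDA is 11.6)

# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# install_espnet: the make and pip commands from the notes above.
install_espnet() {
    run make TH_VERSION="${TH_VERSION}" CUDA_VERSION="${CUDA_VERSION}"
    run pip install chainer==6.0.0
}

install_espnet
```

Setting `DRY_RUN=0` in the environment executes the commands instead of only printing them.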
Run ASR Demo
This demo decodes (transcribes) a .wav audio file into words.
- Prepare the audio file
  e.g. the test.wav file in espnet/utils. Put the .wav file in espnet/egs/tedlium2/asr1.
- Perform decoding
  a. cd espnet/egs/tedlium2/asr1 and source ./path.sh
  b. recog_wav.sh --models <downloaded-model> test.wav
Notes: By default, the model is downloaded with the gdown package, which can fail with a timeout error if the network connection drops. In this case, the model file, e.g. model.streaming.v1.tar.gz, needs to be downloaded manually from Google Drive (see the Espnet readme). Then modify the download_from_google_drive.sh file in the espnet/utils directory as follows:
a. Create a variable manual_download_dir that specifies the path of the downloaded model file, e.g. manual_download_dir="/home/glinttsd/espnet/egs/tedlium2/asr1/model.streaming.v1.tar.gz"
b. Replace the code in lines 46-47 with
if [ -f "$manual_download_dir" ]
then
echo "File download locally"
decompress "${manual_download_dir}" "${download_dir}"
else
echo "File download from url: ${share_url}"
gdown --id "${file_id}" -O "${tmp}"
decompress "${tmp}" "${download_dir}"
fi
This skips the download step and decompresses the local model file directly.
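The same local-file fallback can be tried in isolation, without touching download_from_google_drive.sh. This is a minimal sketch, assuming a .tar.gz archive; `use_local_archive` and the `MANUAL_MODEL` and `DOWNLOAD_DIR` environment variables are hypothetical names introduced here, and the gdown branch is only reported, not executed:

```shell
#!/bin/sh
# use_local_archive ARCHIVE DEST
# Extract ARCHIVE (.tar.gz) into DEST and return 0;
# return 1 without doing anything if ARCHIVE does not exist.
use_local_archive() {
    [ -f "$1" ] || return 1
    mkdir -p "$2"
    tar xzf "$1" -C "$2"
}

# Hypothetical environment variables; an empty MANUAL_MODEL means "no local copy".
manual_download_dir="${MANUAL_MODEL:-}"
download_dir="${DOWNLOAD_DIR:-downloads}"

if use_local_archive "${manual_download_dir}" "${download_dir}"; then
    echo "Using local archive: ${manual_download_dir}"
else
    echo "No local archive found; this is where the script would call gdown."
fi
```

Reading the path from an environment variable rather than hard-coding it keeps the script unchanged when the archive moves; whether that is preferable to the edit described above is a matter of taste.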
Model Quantization
This section describes how to quantize the model from FP32 to INT8.
Espnet provides a dynamic quantization method through the PyTorch API.
To enable dynamic quantization, add the following options at lines 248-249 of the espnet/utils/recog_wav.sh file:
--quantize-asr-model True \
--quantize-dtype "qint8" \
Decoding can now be performed as described in the previous section.
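Because line numbers can differ between Espnet versions, it may help to confirm that both options actually ended up in the script before decoding. A small sketch, where `has_quantize_flags` is a hypothetical helper and the `SCRIPT` path is an assumption to adjust to your checkout:

```shell
#!/bin/sh
# has_quantize_flags FILE: succeed only if both quantization options appear in FILE.
has_quantize_flags() {
    grep -q -e '--quantize-asr-model' "$1" && grep -q -e '--quantize-dtype' "$1"
}

# SCRIPT is an assumption: adjust it to the location of recog_wav.sh in your checkout.
SCRIPT="${SCRIPT:-espnet/utils/recog_wav.sh}"
if [ -f "${SCRIPT}" ] && has_quantize_flags "${SCRIPT}"; then
    echo "quantization options found in ${SCRIPT}"
else
    echo "quantization options not found in ${SCRIPT}"
fi
```

This only checks that the option names are present; it does not verify that they sit inside the right command or that the trailing backslashes are intact.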
More usage can be found here