yong-it: Speech Recognition on the RPi

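I tried out speech recognition on a Raspberry Pi using a USB microphone and a software package called PocketSphinx. This post covers recording sound with the microphone, installing the software needed for speech recognition, testing whether speech is recognized, and then using the result to run other commands.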

USB Microphone

Purchase and Setup

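For buying and setting up a USB microphone, see the post "Sound Input and Recording on the Raspberry Pi". Once you can record and play back sound with the commands below, you can move on to the next step (this post uses the ALSA framework for recording).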

$ arecord -D plughw:1,0 -d 5 test.wav
$ aplay test.wav
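
The `plughw:1,0` value above assumes the USB microphone is card 1, device 0. If it shows up under a different number, the device string has to change. A minimal sketch for deriving it, assuming the English-locale output format of `arecord -l` (lines of the form `card N: ..., device M: ...`); `first_capture_device` is a hypothetical helper name, not part of ALSA:

```shell
# Read `arecord -l` output on stdin and print "plughw:CARD,DEVICE"
# for the first capture device listed (assumes English-locale output).
first_capture_device() {
    sed -n 's/^card \([0-9][0-9]*\):.*device \([0-9][0-9]*\):.*/plughw:\1,\2/p' | head -n 1
}
```

It could be used as `arecord -D "$(arecord -l | first_capture_device)" -d 5 test.wav`. With several capture devices attached, picking the first one may not be what you want.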

Setting the USB Microphone as the Default Recording Device

It is convenient to make the USB microphone the default recording device.

$ more ~/.asoundrc
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "hw:0,0"
    }
    capture.pcm {
        type plug
        slave.pcm "hw:1,0"
    } 
}

Create .asoundrc in your home directory as shown above. You can now omit the -D option when running arecord.

$ arecord -d 5 test.wav
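
If the card numbers on your system differ from the ones above, the same file can be generated from parameters. This is only a sketch: `make_asoundrc` is a hypothetical helper, and it assumes device 0 on both cards, as in the file above.

```shell
# Print an .asoundrc that plays back on one card and captures from another.
# Arguments: capture card number, playback card number (defaults: 1 and 0).
# Assumes device 0 on both cards, as in the hand-written file.
make_asoundrc() {
    capture_card=${1:-1}
    playback_card=${2:-0}
    cat <<EOF
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "hw:${playback_card},0"
    }
    capture.pcm {
        type plug
        slave.pcm "hw:${capture_card},0"
    }
}
EOF
}
```

For example, `make_asoundrc 2 > ~/.asoundrc` would write a file that captures from card 2 and keeps playback on card 0.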

Speech Recognition

Installing Dependencies

Before installing PocketSphinx, install the packages it needs.

sudo apt install libasound2-dev autoconf libtool bison swig python-dev python-pyaudio
sudo pip install gevent grequests

Installing SphinxBase and PocketSphinx

Install SphinxBase.

git clone https://github.com/cmusphinx/sphinxbase.git
cd sphinxbase
./autogen.sh
make
sudo make install
cd ..

Install PocketSphinx.

git clone https://github.com/cmusphinx/pocketsphinx.git
cd pocketsphinx
./autogen.sh
make
sudo make install
cd ..

Run ldconfig.

sudo ldconfig
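
Before moving on, it may help to confirm that the commands used in the rest of this post are actually on the PATH. A minimal sketch; `check_tools` is a hypothetical helper name:

```shell
# Report any of the named commands that cannot be found on PATH.
# Returns 0 when all are present, 1 otherwise.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}
```

For this post, the call would be `check_tools arecord pocketsphinx_continuous xdotool`.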

Building the Dictionary

Write the corpus as a text file. The corpus below contains commands for controlling Google Calendar.

$ more corpus.txt 
day
week
month
4
agenda
refresh
next
previous
today

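Next, build the language model and the dictionary. On the page http://www.speech.cs.cmu.edu/tools/lmtool-new.html, select the corpus file created above under "Upload a sentence corpus file:" and click the "COMPILE KNOWLEDGE BASE" button; after a moment you are taken to a download page. Download the .lm and .dic files, or download the .tgz archive and extract it. Move the files to a suitable location and rename them if needed.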
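
After downloading, a quick sanity check is to see whether every corpus word made it into the dictionary. The sketch below assumes the .dic file lists one entry per line with the uppercase word as the first whitespace-separated field, which is how the recognized output (NEXT) suggests the words are stored; `missing_words` is a hypothetical helper name.

```shell
# Print each corpus word (uppercased) that has no entry in the dictionary.
# Arguments: corpus text file, .dic file.
# Assumes dictionary lines start with the uppercase word, then whitespace.
missing_words() {
    corpus=$1
    dic=$2
    tr 'a-z' 'A-Z' < "$corpus" | tr -s ' \t' '\n\n' | grep -v '^$' | sort -u |
    while read -r word; do
        # Compare as strings so that numeric-looking words are not coerced.
        awk -v w="$word" '($1 "") == (w "") { found = 1 } END { exit !found }' "$dic" ||
            echo "$word"
    done
}
```

With the files from this post, it would be run as `missing_words corpus.txt cal.dic`; no output means every word has an entry.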

Testing Speech Recognition

Run pocketsphinx_continuous as follows to test whether speech is recognized. Pass the .lm file to -lm and the .dic file to -dict. If the USB microphone is not set as the default recording device, add the option -adcdev plughw:1,0.

$ pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -lm cal.lm -dict cal.dic -samprate 16000/8000/48000 -inmic yes
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/en-us/en-us/feat.params
Current configuration:
[NAME]   [DEFLT]  [VALUE]
-agc   none  none
-agcthresh  2.0  2.000000e+00
-allphone    
-allphone_ci  no  no
-alpha   0.97  9.700000e-01

...

INFO: ps_lattice.c(1441): Joint P(O,S) = -144424 P(S|O) = -24085
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.001 xRT
NEXT
INFO: continuous.c(275): Ready....
^C

In the output above, the speech was recognized as NEXT.

Running Other Commands from the Recognition Result

Using pocketsphinx_continuous, I wrote the following shell script to control Google Calendar (see the previous post).

#!/bin/sh

cd /home/pi/calendar

pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us \
    -lm cal.lm -dict cal.dic -samprate 16000/8000/48000 -inmic yes | \
    xargs -L1 sh -c 'echo "$*" | head -c1 | tr A-Z a-z | \
    xargs xdotool search -onlyvisible -class "chromium" windowactivate type' -

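Here pocketsphinx_continuous repeatedly analyzes the speech coming in from the microphone and prints the matching word. Each time a line is printed, the pipe and xargs run head, tr, and xdotool. If 'NEXT' is recognized, the first letter 'N' is taken, converted to lowercase 'n', and typed into the Chromium window. Google Calendar moves to the next period when it receives 'n' from the keyboard.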
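
The script types the first character of whatever line comes out of the recognizer. As a variation, the mapping step can be pulled into a function that only answers for words in the corpus, so unexpected output is ignored. This is a sketch, not the script the post uses; `word_to_key` is a hypothetical name, and the word list is copied from corpus.txt above.

```shell
# Map one recognized line (e.g. "NEXT") to the key to type.
# Prints nothing for anything outside the corpus.
word_to_key() {
    word=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    case $word in
        day|week|month|4|agenda|refresh|next|previous|today)
            # Same rule as the script: the first character is the key.
            printf '%s\n' "$word" | cut -c1 ;;
    esac
}
```

In the script, this could stand in for the `head -c1 | tr A-Z a-z` stage, for example inside a `while read -r line` loop that calls xdotool only when `word_to_key "$line"` prints something.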

To start it automatically, I wrote the following systemd unit file at ~/.config/systemd/user/pocketsphinx.service.

[Unit]
Description=PocketSphinx
After=graphical.target

[Service]
ExecStart=/home/pi/calendar/start-pocketsphinx.sh
Environment=DISPLAY=:0
Restart=no

[Install]
WantedBy=default.target
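
The post does not show how the unit is put in place and enabled. A minimal sketch of the copy step, assuming the unit file was first written somewhere else; `install_user_unit` is a hypothetical helper name:

```shell
# Copy a unit file into the per-user systemd directory (created if missing).
# Arguments: unit file path, optional destination directory.
install_user_unit() {
    unit_file=$1
    dest_dir=${2:-"$HOME/.config/systemd/user"}
    mkdir -p "$dest_dir" && cp "$unit_file" "$dest_dir/"
}
```

After `install_user_unit pocketsphinx.service`, the usual systemd steps would be `systemctl --user daemon-reload` followed by `systemctl --user enable --now pocketsphinx.service`.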


7 comments:

Unknown, November 4, 2016, 1:54 PM
pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -lm cal.lm -dict cal.dic -samprate 16000/8000/48000 -inmic yes

When I run this command, I get the following error:

error: "dict.c" line 275: failed to open dictionary file 'cal.dic' for reading: no such file or directory

Unknown, November 4, 2016, 1:55 PM
pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -inmic yes
This command does run, but recognition does not seem to work well.

Unknown, November 6, 2016, 7:14 PM
It works now, thank you.
By the way, how is Korean speech recognition coming along?

Monday, October 31, 2016