boostcampaitech3 / final-project-level3-nlp-02
NLP Team 2: Lecture Audio Summarization and Keyword Extraction
License: Apache License 2.0
Problem: Python code does not run in Jupyter on the server.
Selecting Python 3 under Select Kernel in the upper right switches back to No Kernel shortly afterwards.
Shell commands do not run either.
Fix: enter the following in a terminal:

```shell
conda install -n base ipykernel --update-deps --force-reinstall
```
Command executed

```shell
python ./bin/main.py model=ds2 train=ds2_train train.dataset_path=/opt/ml/input/kspon_dataset/train train.transcripts_path=/opt/ml/input/kospeech/dataset/kspon/transcripts.txt
```
Error

```
OSError: /opt/conda/lib/python3.8/site-packages/torchaudio/_torchaudio.so: undefined symbol: _ZN3c104impl23ExcludeDispatchKeyGuardC1ENS_14DispatchKeySetE
```
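An undefined C++ symbol in `_torchaudio.so` usually means torchaudio was built against a different torch version than the one installed, so the two need to be reinstalled as a matching pair. A quick diagnostic sketch that reads the installed versions from package metadata without importing the broken extension itself:

```python
# Read installed versions from package metadata instead of importing
# torchaudio, whose C extension is exactly what fails to load here.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "torchaudio"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed")
```

If the reported versions do not correspond to the same release pair, reinstalling torchaudio to match torch is the usual remedy.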
Error when running the get_model_binary.py model

```
TypeError: load() missing 1 required positional argument: 'Loader'
```
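This error message matches PyYAML 6.0, where `yaml.load` no longer defaults to a loader and requires one to be passed explicitly. Assuming the script is calling PyYAML, a minimal sketch of the fix (the config text here is made up for illustration):

```python
import yaml

text = "model: ds2\nbatch_size: 32\n"

# Under PyYAML 6.0, yaml.load(text) alone raises:
#   TypeError: load() missing 1 required positional argument: 'Loader'
# Passing a Loader (or using yaml.safe_load for plain data) resolves it.
config = yaml.load(text, Loader=yaml.SafeLoader)
print(config)  # {'model': 'ds2', 'batch_size': 32}
```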
`recognize` should take 3 arguments, but 4 are being passed. Since `opt.device` is not used, remove it.
/opt/ml/input/kospeech/bin/inference.py

Before

```python
if isinstance(model, ListenAttendSpell):
    model.encoder.device = opt.device
    model.decoder.device = opt.device
    y_hats = model.recognize(feature.unsqueeze(0), input_length, opt.device)
elif isinstance(model, DeepSpeech2):
    model.device = opt.device
    y_hats = model.recognize(feature.unsqueeze(0), input_length, opt.device)
elif isinstance(model, SpeechTransformer) or isinstance(model, Jasper) or isinstance(model, Conformer):
    y_hats = model.recognize(feature.unsqueeze(0), input_length, opt.device)
```

After

```python
if isinstance(model, ListenAttendSpell):
    model.encoder.device = opt.device
    model.decoder.device = opt.device
    y_hats = model.recognize(feature.unsqueeze(0), input_length)
elif isinstance(model, DeepSpeech2):
    model.device = opt.device
    y_hats = model.recognize(feature.unsqueeze(0), input_length)
elif isinstance(model, SpeechTransformer) or isinstance(model, Jasper) or isinstance(model, Conformer):
    y_hats = model.recognize(feature.unsqueeze(0), input_length)
```
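Mismatches like "takes 3 arguments but 4 were given" can be confirmed up front by inspecting the method's signature instead of waiting for the call to fail. A sketch using a hypothetical `DummyModel` standing in for the kospeech models:

```python
import inspect

class DummyModel:
    """Hypothetical stand-in for a kospeech model whose recognize()
    takes only (inputs, input_lengths) -- no device argument."""
    def recognize(self, inputs, input_lengths):
        return [inputs, input_lengths]

model = DummyModel()
sig = inspect.signature(model.recognize)

# On a bound method, 'self' is already excluded, so two parameters remain
# and 'device' is not among them -- passing opt.device would fail.
print(len(sig.parameters))  # 2
print("device" in sig.parameters)  # False
```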
Fixed by changing `feature`

/opt/ml/input/kospeech/bin/inference.py

Before

```python
elif isinstance(model, DeepSpeech2):
    model.device = opt.device
    y_hats = model.greedy_search(feature.unsqueeze(0), input_length, opt.device)
```

After

```python
elif isinstance(model, DeepSpeech2):
    model.device = opt.device
    feature = feature.unsqueeze(0).to(torch.device("cuda"))
    y_hats = model.recognize(feature, input_length)
```
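Hardcoding `torch.device("cuda")` as in the fix above fails on a CPU-only machine. A device-agnostic variant of the same move (a sketch assuming only that torch is installed; the `(80, 100)` feature shape is invented for illustration):

```python
import torch

# Fall back to CPU when CUDA is unavailable instead of hardcoding "cuda".
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

feature = torch.randn(80, 100)             # hypothetical (freq, time) feature
feature = feature.unsqueeze(0).to(device)  # add batch dim, move to device
print(feature.shape)  # torch.Size([1, 80, 100])
```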
```shell
pip install numpy
pip install torch
pip install pandas
pip install matplotlib
conda install -c conda-forge librosa
pip install torchaudio==0.6.0
pip install tqdm
pip install warp_rnnt
pip install sentencepiece
pip install hydra-core --upgrade
```

`pip install warp_rnnt` fails with a CUDA error.
```
ModuleNotFoundError: No module named 'openspeech.data.text'
```

Created train.sh and ran `sh train.sh`, but it did not work.

Fix

```shell
pip install -e .
```

```python
@hydra.main(config_path=os.path.join("..", "openspeech", "configs"), config_name="train")
```
```
LexerNoViableAltException:
^
See https://hydra.cc/docs/next/advanced/override_grammar/basic for details
```

Running train example 1 raises the error above. Hydra raises this when a command-line override contains characters its override grammar cannot parse; quoting the offending value usually helps.

Command:

```shell
python ./bin/main.py model=ds2 train=ds2_train train.dataset_path=/opt/ml/input/kspon_dataset/train train.transcripts_path=/opt/ml/input/kospeech/dataset/kspon/transcripts.txt
```
Error:

```
ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /opt/conda/envs/kospeech/lib/python3.8/site-packages/scipy/linalg/_matfuncs_sqrtm_triu.cpython-38-x86_64-linux-gnu.so)
```
Command:

```shell
python ./openspeech_cli/hydra_train.py \ .....
```

Error:

```
OSError: /opt/ml/input/env/lib/python3.8/site-packages/torchaudio/_torchaudio.so: undefined symbol: _ZNK2at6Tensor6deviceEv
```
```
Traceback (most recent call last):
  File "./bin/inference.py", line 26, in <module>
    from kospeech.data.audio.core import load_audio
  File "/opt/ml/input/kospeech/bin/kospeech/data/__init__.py", line 15, in <module>
    from kospeech.data.audio.parser import SpectrogramParser
  File "/opt/ml/input/kospeech/bin/kospeech/data/audio/parser.py", line 18, in <module>
    from kospeech.data.audio.core import load_audio
  File "/opt/ml/input/kospeech/bin/kospeech/data/audio/core.py", line 17, in <module>
    import librosa
  File "/opt/conda/envs/kospeech/lib/python3.8/site-packages/librosa/__init__.py", line 209, in <module>
    from . import core
  File "/opt/conda/envs/kospeech/lib/python3.8/site-packages/librosa/core/__init__.py", line 6, in <module>
    from .audio import *  # pylint: disable=wildcard-import
  File "/opt/conda/envs/kospeech/lib/python3.8/site-packages/librosa/core/audio.py", line 8, in <module>
    import soundfile as sf
  File "/opt/conda/envs/kospeech/lib/python3.8/site-packages/soundfile.py", line 17, in <module>
    from _soundfile import ffi as _ffi
  File "/opt/conda/envs/kospeech/lib/python3.8/site-packages/_soundfile.py", line 2, in <module>
    import _cffi_backend
ImportError: libffi.so.7: cannot open shared object file: No such file or directory
```

The above error occurs when running inference.
The error occurs because `require` is not a valid keyword argument; it should be `required`.

/opt/ml/input/kospeech/bin/inference.py

Before

```python
parser.add_argument('--model_path', type=str, require=True)
parser.add_argument('--audio_path', type=str, require=True)
parser.add_argument('--device', type=str, require=False, default='cpu')
```

After

```python
parser.add_argument('--model_path', type=str, required=True)
parser.add_argument('--audio_path', type=str, required=True)
parser.add_argument('--device', type=str, required=False, default='cpu')
```
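A self-contained reproduction of the corrected arguments, showing that argparse accepts `required=` (while an unknown keyword like `require=` raises a TypeError at `add_argument` time):

```python
import argparse

parser = argparse.ArgumentParser()
# 'required=' is the correct keyword; 'require=' raises a TypeError
# because argparse does not recognize it as an argument option.
parser.add_argument('--model_path', type=str, required=True)
parser.add_argument('--audio_path', type=str, required=True)
parser.add_argument('--device', type=str, required=False, default='cpu')

# Parse a sample command line; --device falls back to its default.
opt = parser.parse_args(['--model_path', 'model.pt', '--audio_path', 'a.wav'])
print(opt.device)  # cpu
```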
Before

```python
from kospeech.models.las.decoder import DecoderRNN, BeamDecoderRNN
```

After

```python
from kospeech.models.las.decoder import DecoderRNN
```
Terminal input

```shell
python ./bin/main.py model=ds2 train=ds2_train train.dataset_path=/opt/ml/input/kspon_dataset/train train.transcripts_path=/opt/ml/input/kospeech/dataset/kspon/transcripts.txt
```

Error

```
  File "/opt/conda/envs/stt/lib/python3.8/site-packages/soundfile.py", line 142, in <module>
    raise OSError('sndfile library not found')
OSError: sndfile library not found
```
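soundfile raises this when the native libsndfile shared library is not visible to the dynamic loader. The stdlib can show whether that is the case (a diagnostic sketch only; installing libsndfile through the OS package manager or conda is the usual remedy, though the exact package name depends on the distribution):

```python
from ctypes.util import find_library

# soundfile needs the native libsndfile; find_library reports what the
# dynamic loader can resolve (None means it is not installed/visible).
path = find_library("sndfile")
print("libsndfile:", path if path else "not found")
```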
`greedy_search` is also reported as missing.

/opt/ml/input/kospeech/bin/inference.py

Before

```python
if isinstance(model, ListenAttendSpell):
    model.encoder.device = opt.device
    model.decoder.device = opt.device
    y_hats = model.greedy_search(feature.unsqueeze(0), input_length, opt.device)
elif isinstance(model, DeepSpeech2):
    model.device = opt.device
    y_hats = model.greedy_search(feature.unsqueeze(0), input_length, opt.device)
elif isinstance(model, SpeechTransformer) or isinstance(model, Jasper) or isinstance(model, Conformer):
    y_hats = model.greedy_search(feature.unsqueeze(0), input_length, opt.device)
```

After

```python
if isinstance(model, ListenAttendSpell):
    model.encoder.device = opt.device
    model.decoder.device = opt.device
    y_hats = model.recognize(feature.unsqueeze(0), input_length, opt.device)
elif isinstance(model, DeepSpeech2):
    model.device = opt.device
    y_hats = model.recognize(feature.unsqueeze(0), input_length, opt.device)
elif isinstance(model, SpeechTransformer) or isinstance(model, Jasper) or isinstance(model, Conformer):
    y_hats = model.recognize(feature.unsqueeze(0), input_length, opt.device)
```
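Instead of editing every call site when a method is missing, the call could dispatch to whichever method the model actually exposes. A sketch with a hypothetical `DummyModel` mirroring the situation above (this fallback pattern is an alternative, not what the repository does):

```python
class DummyModel:
    """Hypothetical model exposing recognize() but not greedy_search(),
    mirroring the AttributeError described above."""
    def recognize(self, inputs, input_lengths):
        return f"decoded({inputs}, {input_lengths})"

model = DummyModel()

# Prefer greedy_search when present, otherwise fall back to recognize.
decode = getattr(model, "greedy_search", None) or model.recognize
print(decode("feat", 3))  # decoded(feat, 3)
```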