AICUP_audio_2023

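Full program report (Link)

Multimodal Pathological Voice Classification Competition (Link)

This repository holds the code and the submitted answers for the AICUP-2023 Multimodal Pathological Voice Classification Competition. The organizers provided voice recordings from five groups of subjects: voice misuse, vocal cord paralysis, vocal cord tumor, incomplete vocal cord closure, and a normal group. They also provided physiological data for each subject. Participants train models to predict every unlabeled sample as accurately as possible, with the goal of maximizing Unweighted Average Recall (UAR).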
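UAR is the mean of the per-class recalls, so a rare class counts as much as a frequent one. The sketch below is a minimal illustration with made-up labels; the function name `unweighted_average_recall` is hypothetical and is not taken from the repository.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy labels: class 0 is frequent, class 1 is rare.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(uar, accuracy)  # recall is 1.0 for class 0 and 0.5 for class 1
```

Here accuracy looks better than UAR because the single missed minority sample costs half of that class's recall. scikit-learn's `recall_score(y_true, y_pred, average="macro")` computes the same quantity.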

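Key points and challenges

  • The class sizes are extremely imbalanced: the largest class has up to 17 times as many samples as the smallest
  • The organizers provide physiological data for each subject, so the relative weighting of audio and physiological data has to be chosen with care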

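Details

  • To keep all inputs the same length, only the first second of each recording is used, and the audio signal is preprocessed with MFCC
  • A dual-input deep learning model takes the audio features and the physiological data at the same time. The two branches output features in a 1:2 ratio, and the convolutional layers use a 3*10 kernel, an atypical rectangular shape
  • Class weights are added to the loss function to counter the large imbalance between classes
  • CELU is used as the activation function instead of ReLU
  • Stochastic gradient descent (SGD) is the optimizer
  • Five models that differ from one another are trained by varying the parameters, then combined with a voting ensemble to improve prediction accuracy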
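The dual-input design described above can be sketched roughly as follows. This is not the repository's `Network` class: the class name `DualInputNet`, the channel counts, the 26 physiological features, and the reading of the 1:2 ratio as audio:physiological are all assumptions made for illustration. Only the 3x10 kernel, the CELU activation, the two inputs, and the five output classes come from the text.

```python
import torch
import torch.nn as nn


class DualInputNet(nn.Module):
    """Hypothetical two-branch classifier; layer sizes are illustrative only."""

    def __init__(self, n_medical=26, n_classes=5, audio_feats=32):
        super().__init__()
        # Audio branch: MFCC "image" of shape (batch, 1, n_mfcc, frames).
        self.audio = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(3, 10)),  # rectangular 3x10 kernel
            nn.CELU(),
            nn.AdaptiveAvgPool2d(1),                # -> (batch, 16, 1, 1)
            nn.Flatten(),                           # -> (batch, 16)
            nn.Linear(16, audio_feats),
            nn.CELU(),
        )
        # Physiological branch: twice as many output features (assumed 1:2).
        self.medical = nn.Sequential(
            nn.Linear(n_medical, 2 * audio_feats),
            nn.CELU(),
        )
        self.head = nn.Linear(3 * audio_feats, n_classes)

    def forward(self, mfcc, medical):
        fused = torch.cat([self.audio(mfcc), self.medical(medical)], dim=1)
        return self.head(fused)


model = DualInputNet()
# A batch of 4 samples: 13 MFCCs x 126 frames, plus 26 made-up medical features.
logits = model(torch.randn(4, 1, 13, 126), torch.randn(4, 26))
print(logits.shape)
```

The adaptive pooling layer is a convenience in this sketch so that the same code accepts any `n_mfcc` value; the actual notebooks may flatten the convolution output differently.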

Leaderboard

Public Score   Public Rank   Private Score   Private Rank
0.600654       7 / 371       0.607568        6 / 371

Update

Final result:

4th place + Trend Micro Potential Award

Getting the code

git clone https://github.com/JulianLee310514065/AICUP_audio_2023.git

Repository structure

┌ submit┌ output_mfcc13.npy
│       ├ output_mfcc17.npy
│       ├ output_mfcc21.npy
│       ├ output_mfcc30.npy
│       └ output_mfcc50.npy
│
├ 1_DataPreprocessing.ipynb
├ 2_AI_CUP_mfcc13.ipynb
├ 2_AI_CUP_mfcc17.ipynb
├ 2_AI_CUP_mfcc21.ipynb
├ 2_AI_CUP_mfcc30.ipynb
├ 2_AI_CUP_mfcc50.ipynb
├ 3_Ensemble_pub_pri.ipynb
├ LICENSE
├ README.md
├ mfcc13_use_all.pth
├ mfcc17_use_all.pth
├ mfcc21_use_all.pth
├ mfcc30_use_all.pth
├ mfcc50_use_all.pth
└ submission.csv

Setting the environment

conda create -n pytorch-gpu python=3.9
conda activate pytorch-gpu

Install the pytorch and cudatoolkit versions that match your GPU

# CUDA 11.3
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

Install the required libraries

pip install librosa
pip install scikit-learn

If a newer release of a module causes problems, pin the versions

pip install librosa==0.10.0.post2
pip install scikit-learn==1.2.2

Prepared dataset

Training data: Google Drive Link

1_DataPreprocessing.ipynb defines the make_mfcc function. Pass it a DataFrame and n_mfcc to produce the .npy files used for training or validation.

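import librosa
import numpy as np
import pandas as pd


def make_mfcc(df: pd.DataFrame, n_mfcc=13):
    """Save an MFCC array next to every .wav file listed in df['wave_path']."""
    for file_path in df['wave_path'].to_list():
        # Load at 44.1 kHz and keep only the first second (44100 samples)
        signal_tem, sample_rate = librosa.load(file_path, sr=44100)
        signal = signal_tem[:44100]

        # 16 ms analysis window and 8 ms hop, expressed in samples
        n_fft = int(16/1000 * sample_rate)
        hop_length = int(8/1000 * sample_rate)

        # MFCCs, shape (n_mfcc, n_frames)
        MFCCs = librosa.feature.mfcc(y=signal, sr=sample_rate, n_fft=n_fft, hop_length=hop_length, n_mfcc=n_mfcc)

        np.save(file_path.replace('.wav', f'_mfcc_{n_mfcc}.npy'), MFCCs)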
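To see what array shape make_mfcc produces, the window and hop arithmetic can be worked through without loading any audio. The helper `mfcc_frame_params` below is hypothetical and is not part of the notebook. It assumes librosa's default centered framing, under which the frame count is `1 + n_samples // hop_length`.

```python
def mfcc_frame_params(sample_rate=44100, win_ms=16, hop_ms=8, n_samples=44100):
    """Return (n_fft, hop_length, n_frames) for a clip of n_samples samples."""
    n_fft = int(win_ms / 1000 * sample_rate)
    hop_length = int(hop_ms / 1000 * sample_rate)
    # Centered framing pads the signal so the frame count depends only on the hop.
    n_frames = 1 + n_samples // hop_length
    return n_fft, hop_length, n_frames


n_fft, hop_length, n_frames = mfcc_frame_params()
print(n_fft, hop_length, n_frames)  # 705 352 126

# A recording shorter than one second yields fewer frames.
print(mfcc_frame_params(n_samples=22050)[2])  # 63
```

Under these assumptions a one-second clip with n_mfcc=13 gives a (13, 126) array. A clip shorter than one second gives fewer frames, so it may need padding before it can be batched with the others.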

Download the best model

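The repository contains five mfcc??_use_all.pth files. Each one holds the best parameters of the model from the matching 2_AI_CUP_mfcc??.ipynb notebook and can be downloaded and used directly.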

Training

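The training code is in the Training section of 2_AI_CUP_mfcc??.ipynb, where ?? is the n_mfcc value. Because the class sizes in this competition differ so much, we pass a weight tensor to CrossEntropyLoss to set a per-class weight, which balances the influence of each class during training.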

# Calculate the count of each class.
numberlist = training_df['Disease category'].value_counts().sort_index().to_list()

# model 
model = Network().to(device)

# class-weighted loss (inverse class frequency) and optimizer
weight = torch.tensor([1 / n for n in numberlist]).to(device)
criterion = nn.CrossEntropyLoss(weight=weight)
optimizer = SGD(model.parameters(), lr=0.01, weight_decay= 0.0001)
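The effect of the weight tensor can be illustrated in plain Python. The sketch below follows the documented default ("mean") reduction of nn.CrossEntropyLoss, where each sample's loss is scaled by the weight of its target class and the sum is divided by the sum of those weights. The class counts, logits, and the function name `weighted_cross_entropy` are made up for illustration.

```python
import math


def weighted_cross_entropy(logits, targets, weights):
    """Weighted mean of per-sample cross-entropy over a batch."""
    total, weight_sum = 0.0, 0.0
    for row, target in zip(logits, targets):
        # log-sum-exp with the max subtracted for numerical stability
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += weights[target] * (log_norm - row[target])
        weight_sum += weights[target]
    return total / weight_sum


counts = [170, 10]                 # made-up class counts with a 17x imbalance
inverse = [1 / c for c in counts]  # inverse-frequency weights, as in the README
logits = [[2.0, 0.0], [2.0, 0.0]]  # the model favors class 0 for both samples
targets = [0, 1]                   # the second sample is a missed minority case

plain = weighted_cross_entropy(logits, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, targets, inverse)
print(round(plain, 4), round(weighted, 4))
```

With inverse-frequency weights, the misclassified minority sample dominates the batch loss, which is the balancing effect the README describes.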

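Inference (public and private data)

For inference, we first load the best model, then run it on the Public data and the Private data separately, and finally concatenate the two sets of results. 2_AI_CUP_mfcc13.ipynb is used as the example.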

# Load model
model.load_state_dict(torch.load("{}.pth".format("mfcc13_use_all")))

# Predict public data
data_df = pd.read_csv(r'..\Public_Testing_Dataset\test_datalist_public.csv')
...
y_pub = [x.numpy() for x in pub_save]
y_pub[:5]

# Predict private data
data_private_df = pd.read_csv(r'..\Private_Testing_Dataset\test_datalist_private.csv')
...
y_pri = [x.numpy() for x in pub_save_private]
y_pri[:5]

# Combine and save
y_all = y_pub + y_pri
mmffcc13 = np.array(y_all)
np.save('output_mfcc13.npy', mmffcc13)

Ensemble

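The ensemble works by summing probabilities: the class probabilities predicted by the five models are added together, and the class with the highest total becomes the prediction for that recording. The code is in 3_Ensemble_pub_pri.ipynb.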

# Load numpy file
out1 = np.load('output_mfcc13.npy')
out2 = np.load('output_mfcc17.npy')
out3 = np.load('output_mfcc21.npy')
out4 = np.load('output_mfcc30.npy')
out5 = np.load('output_mfcc50.npy')

# Ensemble
out_all = out1 + out2 + out3 + out4 + out5
predict_out = out_all.argmax(1)
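Summing scores before taking the arg-max (soft voting) can give a different answer from counting each model's top class (majority voting). The sketch below uses made-up scores from three hypothetical models; `soft_vote` is an illustrative helper, not a function from the notebook.

```python
import numpy as np


def soft_vote(prob_arrays):
    """Sum per-model class scores and return the arg-max class per sample."""
    stacked = np.stack(prob_arrays)  # (n_models, n_samples, n_classes)
    return stacked.sum(axis=0).argmax(axis=1)


# Made-up scores for two samples and three classes.
m1 = np.array([[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]])
m2 = np.array([[0.4, 0.6, 0.0], [0.1, 0.2, 0.7]])
m3 = np.array([[0.4, 0.6, 0.0], [0.3, 0.3, 0.4]])

predictions = soft_vote([m1, m2, m3])
print(predictions)  # [0 2]
```

For the first sample, two of the three models prefer class 1, yet the summed scores favor class 0 because the first model is far more confident. This is the behavior of the element-wise sum used in the ensemble code.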

Reproducing submission

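To reproduce the final submission:

  1. Run Prepared dataset in full
  2. Run the five 2_AI_CUP_mfcc??.ipynb notebooks in order, skipping the Training section
  3. Run Ensemble in full to get the final result.

The high-scoring final submission is also in the repository as submission.csv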
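Before running the ensemble step, it can help to confirm that all five per-model output files exist. The helper below is a hypothetical convenience check, not part of the repository. The file names follow the output_mfcc??.npy pattern used by the inference and ensemble code.

```python
from pathlib import Path

N_MFCC_VALUES = (13, 17, 21, 30, 50)


def missing_outputs(directory="."):
    """Return the expected output_mfcc??.npy names not found in directory."""
    root = Path(directory)
    expected = [f"output_mfcc{n}.npy" for n in N_MFCC_VALUES]
    return [name for name in expected if not (root / name).is_file()]


missing = missing_outputs()
if missing:
    print("Run the matching notebooks first:", ", ".join(missing))
else:
    print("All five model outputs are present.")
```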

Acknowledgement

Preprocessing:

Model architecture:

Citation

@misc{AICUP_audio_2023,
    title  = {AICUP_audio_2023},
    author = {Chang-Yi Lee},
    url    = {https://github.com/JulianLee310514065/AICUP_audio_2023},
    year   = {2023}
}

aicup_audio_2023's Issues

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object

I followed the SOP in readme.md, but training does not run. How should I handle the error below?

TypeError Traceback (most recent call last)
Cell In[12], line 15
12 losses = 0.
14 # training dataset
---> 15 for batch, (mfcc_img, medicals, label, img_path) in enumerate(train_dl):
16 model.train()
18 inputs, medicals, labels = mfcc_img.float().to(device), medicals.float().to(device), label.to(device)

File ~/anaconda3/envs/ai_cup_noise/lib/python3.8/site-packages/torch/utils/data/dataloader.py:681, in _BaseDataLoaderIter.__next__(self)
678 if self._sampler_iter is None:
679 # TODO(pytorch/pytorch#76750)
680 self._reset() # type: ignore[call-arg]
--> 681 data = self._next_data()
682 self._num_yielded += 1
683 if self._dataset_kind == _DatasetKind.Iterable and
684 self._IterableDataset_len_called is not None and
685 self._num_yielded > self._IterableDataset_len_called:

File ~/anaconda3/envs/ai_cup_noise/lib/python3.8/site-packages/torch/utils/data/dataloader.py:721, in _SingleProcessDataLoaderIter._next_data(self)
719 def _next_data(self):
720 index = self._next_index() # may raise StopIteration
--> 721 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
722 if self._pin_memory:
723 data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)

File ~/anaconda3/envs/ai_cup_noise/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py:52, in _MapDatasetFetcher.fetch(self, possibly_batched_index)
50 else:
51 data = self.dataset[possibly_batched_index]
---> 52 return self.collate_fn(data)

File ~/anaconda3/envs/ai_cup_noise/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py:175, in default_collate(batch)
172 transposed = list(zip(*batch)) # It may be accessed twice, so we use a list.
174 if isinstance(elem, tuple):
--> 175 return [default_collate(samples) for samples in transposed] # Backwards compatibility.
176 else:
177 try:

File ~/anaconda3/envs/ai_cup_noise/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py:175, in <listcomp>(.0)
172 transposed = list(zip(*batch)) # It may be accessed twice, so we use a list.
174 if isinstance(elem, tuple):
--> 175 return [default_collate(samples) for samples in transposed] # Backwards compatibility.
176 else:
177 try:

File ~/anaconda3/envs/ai_cup_noise/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py:147, in default_collate(batch)
144 if elem_type.__name__ == 'ndarray' or elem_type.__name__ == 'memmap':
145 # array of string classes and object
146 if np_str_obj_array_pattern.search(elem.dtype.str) is not None:
--> 147 raise TypeError(default_collate_err_msg_format.format(elem.dtype))
149 return default_collate([torch.as_tensor(b) for b in batch])
150 elif elem.shape == (): # scalars

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object
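The last frames of the traceback show default_collate rejecting a NumPy array whose dtype is object. One possible cause, which the issue does not confirm, is that the physiological features taken from a DataFrame row keep an object dtype because of mixed or non-numeric columns. The sketch below shows how such an array arises and how to convert it explicitly. `to_float32` is a hypothetical helper, and the dataset class itself is not shown in this README.

```python
import numpy as np


def to_float32(values):
    """Convert a row of numeric values to float32; raises on non-numeric entries."""
    return np.asarray(values).astype(np.float32)


# An object-dtype row, as a DataFrame with mixed column types can yield.
row = np.array([1, 0.5, 3], dtype=object)
print(row.dtype)  # object

clean = to_float32(row)
print(clean.dtype)  # float32
```

Applying a conversion like this inside the dataset's `__getitem__`, before the sample is returned, would give the collate function a numeric array. If the conversion raises, that points to the column holding the non-numeric value.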
