This study uses publicly available online question and answering dataset in Arabic language. There are almost 430K questions and answers for 20 diseases specific categories. GPT-3.5-turbo model was fine-tuned with a portion of this dataset.
- Download original file from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/Y2JBEZ source.
- In case you are unable to convert to csv file, you can download it from https://www.dropbox.com/scl/fi/y392p992605bd6kg11nse/ds5b.csv?rlkey=2beoqwhkcie36vhb07ix2887e&dl=0 (cite orignial author)
- The file is in db format.
- We converted them them to csv format.
- Then we converted them json format to train the model.
- gpt3.5turbo.ipynb train the model
- Test file are different test cases