vasanthengineer4949 / nlp-projects-nhv


Stars: 388 | Watchers: 10 | Forks: 154 | Size: 96.32 MB

NLP Projects playlist

Python 1.46% Jupyter Notebook 98.54%

nlp-projects-nhv's Introduction

NLP ADVANCED - NHV

NLP A-Z COURSE

Roadmap

Video Link

COURSE

Topic	Video Link
NLP A-Z Course Part - 1	Video Link
NLP A-Z Course Part - 2	Video Link

VIDEOS IN ORDER

Serial Number	Topic	Code Link	Video Link
1	Transformers From Scratch	Code Link	Video Link
2	BERT For Text Classification	Code Link	Video Link
3	BERT NER	Code Link	Video Link
4	T5 - All NLP Tasks	Code Link	Video Link
5	Llama2 Finetuning	Code Link	Video Link
6	LoRA Paper Explanation and Implementation	Code Link	Video Link
7	DPO Paper Explanation and Implementation	Code Link	Video Link
8	Mistral Architecture Explanation	Code Link	Video Link
9	Mistral Finetuning	Code Link	Video Link
10	Mistral DPO Finetuning	Code Link	Video Link
11	LLM Evaluation using Mistral	Code Link	Video Link
12	Mistral RAG	Code Link	Video Link
13	LLM Finetuning Crash Course	Code Link	Video Link
14	LLM For Information Extraction	Code Link	Video Link
15	Gemma Architecture Explained with Finetuning	Code Link	Video Link
16	My Best LLM using Model Merging	Code Link	Video Link
17	Mixture of Experts from Scratch	Code Link	Video Link
18	LoRA Merging	Code Link	Video Link
19	Deploy and Serve LLM using Ollama WebUI	No Code	Video Link
20	WhatsApp Chatbot using Twilio and Open Source LLMs	Code Link	Video Link
21	Edubot - Llama RAG Application	Code Link	Video Link
22	AI Girlfriend - Benefits of Prompting	Code Link	Video Link
23	YTBuddy - Chat with Videos	Code Link	Video Link
24	Cricbot - Chat with CSV	Code Link	Video Link
25	Codepal - Chat with Git Repo	Code Link	Video Link
26	Building your own Copilot in VSCode Realtime	Code Link	Video Link
27	Realtime Research Agent with Deployment	Code Link	Video Link
28	AI Database Administrator - Chat with Database	Code Link	Video Link
29	Building my own AI Startup	Code Link	Video Link
30	1 Bit LLM Pretraining - Era of 1 Bit LLM	Code Link	Video Link


nlp-projects-nhv's Issues

Passing Dataset

Hi all,

I'm currently fine-tuning Mistral on my own dataset, but I'm not sure how to load the data. I used dataset = load_dataset('json', data_files='/path to the dataset/.jsonl', split='train')

But this error appears when I run AutoModelForCausalLM.from_pretrained:
ValueError: You need to pass dataset in order to quantize your model

It looks like it can't see the dataset.

I'd appreciate any help.
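The error message likely has nothing to do with the fine-tuning data. As far as I know, it is raised on the GPTQ quantization path: when a GPTQConfig is passed to from_pretrained for a model that is not already quantized, Transformers tries to quantize it and expects a calibration dataset inside that config. The dataset returned by load_dataset is only used later by the trainer, so from_pretrained never sees it. Loading a checkpoint that is already GPTQ-quantized, or using bitsandbytes 4-bit loading instead of GPTQ, avoids the calibration step. Separately, it can help to sanity-check the JSONL file before handing it to load_dataset('json', ...). A minimal stdlib sketch, assuming each line is a JSON object with a "text" field (that field name is a hypothetical placeholder; use whatever column your training script reads):

```python
import json
from pathlib import Path


def parse_jsonl_lines(lines, required_keys=("text",)):
    """Parse JSON Lines, skip blank lines, and check each record has the required keys.

    required_keys is an assumption: replace it with the columns your trainer actually uses.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate empty lines (e.g. a trailing newline)
        record = json.loads(line)  # raises json.JSONDecodeError on malformed lines
        missing = [k for k in required_keys if k not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing keys {missing}")
        records.append(record)
    return records


def load_jsonl(path, required_keys=("text",)):
    """Read a .jsonl file from disk and validate it with parse_jsonl_lines."""
    with Path(path).open(encoding="utf-8") as f:
        return parse_jsonl_lines(f, required_keys)
```

If load_jsonl succeeds on your file, the data format is probably fine and the ValueError should be investigated on the quantization side.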

Can you provide a LoRA fine-tuning tutorial for the Mistral-7B-Instruct-v0.1 model?
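For context on what LoRA tuning does to each adapted layer: the pretrained weight W stays frozen and a low-rank update is learned, so for rank r the layer computes h = W x + (alpha / r) B A x, where A has shape r × d_in and B has shape d_out × r. B is typically initialized to zero, so training starts from the base model's behavior. A toy pure-Python sketch of that forward pass and of merging the update back into W (lists of lists instead of tensors; for an actual Mistral-7B-Instruct-v0.1 run one would normally use a library such as PEFT rather than this):

```python
def matvec(M, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); only A and B would be trained, W stays frozen."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, delta)]


def lora_merge(W, A, B, alpha, r):
    """Fold the adapter into one matrix: W' = W + (alpha / r) * B A."""
    scale = alpha / r
    n_cols = len(A[0])
    BA = [[sum(B[i][k] * A[k][j] for k in range(r)) for j in range(n_cols)]
          for i in range(len(B))]
    return [[w + scale * ba for w, ba in zip(w_row, ba_row)]
            for w_row, ba_row in zip(W, BA)]
```

After merging, matvec(lora_merge(W, A, B, alpha, r), x) gives the same result as lora_forward(W, A, B, x, alpha, r), which is why merged LoRA models add no inference overhead.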

ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
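This error usually means some compiled extension was built against the CUDA 12 runtime, but the dynamic loader cannot find libcudart.so.12, for example because the environment has a CUDA 11.x runtime or the CUDA 12 library directory is not on LD_LIBRARY_PATH. Which package raised it isn't stated here. A small stdlib sketch for checking what the loader can see (a False result does not prove the library is absent everywhere, since pip-installed NVIDIA runtime packages can live inside site-packages and be loaded by the framework itself):

```python
import ctypes
import ctypes.util


def cuda_runtime_loadable(soname="libcudart.so.12"):
    """Return True if the dynamic loader can open the given CUDA runtime library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def visible_cudart():
    """Return the name of whichever libcudart the system search finds (any version), or None."""
    return ctypes.util.find_library("cudart")
```

If cuda_runtime_loadable() is False but visible_cudart() reports an 11.x library, the installed package and the CUDA runtime versions probably don't match.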

Finetuning Zephyr:

You passed quantization_config to from_pretrained but the model you're loading already has a quantization_config attribute and has already quantized weights. However, loading attributes (e.g. use_exllama, exllama_config, use_cuda_fp16, max_input_length) will be overwritten with the one you passed to from_pretrained. The rest will be ignored.
WARNING:auto_gptq.nn_modules.qlinear.qlinear_cuda:CUDA extension not installed.
WARNING:auto_gptq.nn_modules.qlinear.qlinear_cuda_old:CUDA extension not installed.
ERROR:auto_gptq.nn_modules.qlinear.qlinear_exllama:exllama_kernels not installed.
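The first warning says the checkpoint is already quantized and carries its own quantization_config, so most of the quantization_config passed to from_pretrained is ignored. For a pre-quantized GPTQ checkpoint, one would typically drop that argument or pass only the loading options it mentions. The auto_gptq lines separately report that its compiled CUDA and exllama kernels are not installed in this environment. A stdlib sketch for checking whether a locally downloaded checkpoint already declares a quantization_config in its config.json (a standard local Hugging Face snapshot directory is assumed):

```python
import json
from pathlib import Path


def quantization_config_from_json(config_text):
    """Return the 'quantization_config' entry of a model config JSON string, or None."""
    return json.loads(config_text).get("quantization_config")


def read_quantization_config(model_dir):
    """Read config.json from a local model directory and return its quantization_config, if any."""
    config_text = (Path(model_dir) / "config.json").read_text(encoding="utf-8")
    return quantization_config_from_json(config_text)
```

If read_quantization_config returns a dict (for GPTQ checkpoints it usually contains "quant_method": "gptq"), the weights are already quantized and passing another quantization_config is redundant.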

Device error in DPO part 2 notebook

When running the notebooks out of the box, I get a device error when trying to train with DPO.
Somehow something is on the CPU while a CUDA device is expected. I am running on a T4 in Google Colab. Do you know what it could be and how to solve it? Thanks!
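A frequent cause of this kind of CPU/CUDA mismatch is a tensor that stayed on the CPU while the model sits on the GPU, for example a batch prepared by hand for an evaluation or generation cell, or a separately loaded reference model. The trainer normally moves its own batches, so the culprit is often something created outside it. Without the full traceback the exact cause in the notebook can't be pinned down; this is only a generic sketch of a helper that recursively moves every tensor in a nested batch to a target device, assuming torch-style objects that expose a .to(device) method:

```python
def move_to_device(batch, device):
    """Recursively move tensors (anything with a .to method) in a nested batch to `device`.

    Dicts, lists and tuples are rebuilt with their contents moved; other values
    (strings, ints, None) are returned unchanged.
    """
    if hasattr(batch, "to"):
        return batch.to(device)
    if isinstance(batch, dict):
        return {key: move_to_device(value, device) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(value, device) for value in batch)
    return batch
```

Typical use would be batch = move_to_device(batch, model.device) right before calling the model, and checking next(model.parameters()).device for any separately loaded model.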

Dataloader error while running DPO part 2 notebook.

Hi,

I keep getting the following dataloader error when I run the DPO code:
Traceback (most recent call last):
  File "/home/ml/users/---/research/learning_from_preferences/rlhf_starter_code/rlhf/dpo_falcon.py", line 114, in <module>
    dpo_trainer.train()
  File "/home/ml/users/---/anaconda3/envs/trl/lib/python3.9/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/home/ml/users/---/anaconda3/envs/trl/lib/python3.9/site-packages/transformers/trainer.py", line 2178, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/home/ml/users/----/anaconda3/envs/trl/lib/python3.9/site-packages/accelerate/data_loader.py", line 454, in __iter__
    current_batch = next(dataloader_iter)
  File "/home/ml/users/----/anaconda3/envs/trl/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/home/ml/users/----/anaconda3/envs/trl/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ml/users/----/anaconda3/envs/trl/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/ml/users/----/anaconda3/envs/trl/lib/python3.9/site-packages/trl/trainer/utils.py", line 338, in __call__
    to_pad = [torch.LongTensor(ex[k]) for ex in features]
  File "/home/ml/users/----/anaconda3/envs/trl/lib/python3.9/site-packages/trl/trainer/utils.py", line 338, in <listcomp>
    to_pad = [torch.LongTensor(ex[k]) for ex in features]
TypeError: an integer is required (got type NoneType)
0%| | 0/50 [00:00<?, ?it/s]
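The last frame shows torch.LongTensor receiving a list that contains None, so at least one tokenized field in the batch holds None instead of a token id. One plausible source is a special token (pad, bos or eos) whose id is None for the tokenizer in use; Falcon tokenizers, for example, ship without a pad token, and a common workaround is tokenizer.pad_token = tokenizer.eos_token. Another is a None value in the raw prompt/chosen/rejected columns. A small stdlib sketch for locating offending rows in already-tokenized features (field names such as chosen_input_ids are assumptions about how the collator's inputs are named):

```python
def find_none_ids(features):
    """Return (row_index, key) pairs where a list-valued field contains None.

    `features` is a list of dicts, like the examples a data collator receives.
    """
    problems = []
    for i, example in enumerate(features):
        for key, value in example.items():
            if isinstance(value, (list, tuple)) and any(v is None for v in value):
                problems.append((i, key))
    return problems
```

Running this over a handful of tokenized rows shows whether the None comes from one particular field (pointing at a missing special-token id) or from specific rows (pointing at bad data).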

NameError: name 'sft_config' is not defined in the Pretraining Llama3 code

NameError Traceback (most recent call last)
Cell In[12], line 1
----> 1 train_sft = Trainerr(data, sft_config)
2 train_sft.train_and_save_model()

NameError: name 'sft_config' is not defined

In the following lines, sft_config is not defined anywhere:

train_sft = Trainerr(data, sft_config)
train_sft.train_and_save_model()
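The NameError itself only means the variable was never assigned before the cell that uses it runs, so the fix is to define sft_config in an earlier cell. What Trainerr actually expects isn't visible here, so the following is purely a hypothetical placeholder: a dataclass with typical training hyperparameters (the class name, field names and values are all assumptions, loosely modeled on common Hugging Face training argument names), created before the call:

```python
from dataclasses import dataclass, asdict


@dataclass
class SFTConfigSketch:
    """Hypothetical stand-in for whatever config object Trainerr expects."""
    output_dir: str = "./llama3-sft"
    num_train_epochs: int = 1
    learning_rate: float = 2e-4
    per_device_train_batch_size: int = 4
    max_seq_length: int = 512


# Define the config in a cell that runs *before* the one that uses it.
sft_config = SFTConfigSketch(num_train_epochs=3)

# Then, in the notebook (Trainerr and data come from the original code):
# train_sft = Trainerr(data, sft_config)
# train_sft.train_and_save_model()
```

The real fields must match whatever attributes Trainerr reads from its config argument.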

phi-1.5 finetuning

Hi,
First of all, thanks for your contributions and for sharing them.
I'm having trouble getting inference to stop. After fine-tuning, when I run inference, the model keeps generating tokens until it reaches "max_tokens". What should I do (I'm a newbie 😃) to make it stop at the end of the answer?

Any insights?

Thanks for your feedback.
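A common reason a fine-tuned model runs all the way to the token limit is that the training examples never ended with an EOS token, so the model never learned to emit one; another is that generation isn't told which token id means "stop" (with Hugging Face generate(), that is the eos_token_id argument or generation config entry). Two minimal sketches under those assumptions: appending the EOS string when formatting each training example ("<|endoftext|>" is, as far as I know, phi-1.5's EOS string, but check tokenizer.eos_token), and a toy greedy loop showing how decoding halts once the EOS id appears. step_fn is a hypothetical stand-in for one model forward pass plus argmax:

```python
EOS_TOKEN = "<|endoftext|>"  # assumption: phi-1.5's EOS string; verify with tokenizer.eos_token


def format_example(prompt, answer, eos_token=EOS_TOKEN):
    """Build one training string that ends with EOS so the model learns where answers stop."""
    return f"{prompt}\n{answer}{eos_token}"


def generate_until_eos(step_fn, prompt_ids, eos_id, max_new_tokens):
    """Greedy decoding loop: stop at EOS or after max_new_tokens, whichever comes first.

    step_fn(ids) -> next token id is a hypothetical placeholder for the model.
    Returns only the newly generated ids (EOS excluded).
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = step_fn(ids)
        if next_id == eos_id:
            break
        ids.append(next_id)
    return ids[len(prompt_ids):]
```

Without the EOS in the training data, a model rarely produces eos_id, so a loop like this (or generate()) only stops at max_new_tokens, which matches the behavior described above.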