albert-text-classification

Text Classification, 🇰🇷 Korean version

ALBERT is "A Lite" version of BERT, a popular unsupervised language representation learning algorithm. ALBERT uses parameter-reduction techniques that allow for large-scale configurations, overcome previous memory limitations, and achieve better behavior with respect to model degradation.

For a technical description of the algorithm, see our paper: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

This project uses the ktrain library to perform text classification. A detailed description is available on the Blog.

👩🏻‍💻 System requirements

pip install -r requirements.txt

👨🏿‍💻 How to use

To train a text classifier on a dataset stored as a CSV file, run main.py with the file path, the label column, the text column, and the number of epochs:

python main.py \
	--csv data.csv \
	--label Category \
	--data Resume \
	--epoch 5
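As a rough illustration of the input that the command above expects, the sketch below writes a tiny CSV whose column names match the flags shown (Resume for the text, Category for the label). The rows and the helper name write_sample_csv are made up for illustration and are not part of the project; a real dataset would need many more examples per class.

```python
import csv

# Hypothetical rows for illustration only; not taken from the project's dataset.
SAMPLE_ROWS = [
    {"Category": "Data Science", "Resume": "Built classification models with Python and pandas."},
    {"Category": "Web Designing", "Resume": "Designed responsive layouts with HTML and CSS."},
    {"Category": "Data Science", "Resume": "Tuned gradient boosting models for churn prediction."},
]


def write_sample_csv(path="data.csv"):
    """Write a small CSV with a label column (Category) and a text column (Resume)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Category", "Resume"])
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return path
```

Calling write_sample_csv() produces a data.csv that can be passed to --csv, with --label Category and --data Resume.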

🎨 Parser details

import argparse

parser = argparse.ArgumentParser(description='Train a text classifier on a CSV dataset.')
parser.add_argument('--csv', help='path to the training CSV file')
parser.add_argument('--label', help='name of the label column')
parser.add_argument('--data', help='name of the text column')
parser.add_argument('--epoch', type=int, help='number of training epochs')
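The parser above can be exercised without a shell by passing an explicit argument list, which is handy for quick checks. This is only a sketch: the build_parser helper, the required=True flags, type=int, and the default of 5 epochs are assumptions added for illustration, not settings confirmed by main.py.

```python
import argparse


def build_parser():
    """Build a parser with the same four flags as main.py (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Train a text classifier on a CSV dataset.")
    parser.add_argument("--csv", required=True, help="path to the training CSV file")
    parser.add_argument("--label", required=True, help="name of the label column")
    parser.add_argument("--data", required=True, help="name of the text column")
    # type=int converts the value so it can be used directly as an epoch count.
    parser.add_argument("--epoch", type=int, default=5, help="number of training epochs")
    return parser


# Parse an explicit list instead of sys.argv, mirroring the command shown earlier.
args = build_parser().parse_args(
    ["--csv", "data.csv", "--label", "Category", "--data", "Resume", "--epoch", "5"]
)
print(args.csv, args.label, args.data, args.epoch)
```

Without type=int, argparse would leave args.epoch as the string "5", which would need converting before being used as a number of epochs.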

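read_dataset

import pandas as pd
from sklearn.model_selection import train_test_split

def read_dataset(dataset, data, label):
    # Load the CSV and collect the distinct class names from the label column.
    df = pd.read_csv(dataset)
    label_list = list(set(df[label]))
    # Shuffle the rows; sample() returns a new frame, so the result is reassigned.
    df = df.sample(frac=1, random_state=42)
    # Hold out 33% of the rows for evaluation.
    x_train, x_test, y_train, y_test = train_test_split(
        list(df[data]), list(df[label]), test_size=0.33, random_state=42)
    return x_train, x_test, y_train, y_test, label_list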

☄️ Available models

Replace the value below with the name of the model you want to use.

	MODEL_NAME = 'albert-base-v2'

Model types in detail:
BERT: bert-base-uncased, bert-large-uncased, bert-base-multilingual-uncased, and others
DistilBERT: distilbert-base-uncased, distilbert-base-multilingual-cased, distilbert-base-german-cased, and others
ALBERT: albert-base-v2, albert-large-v2, and others
RoBERTa: roberta-base, roberta-large, roberta-large-mnli
XLM: xlm-mlm-xnli15-1024, xlm-mlm-100-1280, and others
XLNet: xlnet-base-cased, xlnet-large-cased
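To show where the model name ends up, here is a minimal sketch of a ktrain training routine that takes the splits returned by read_dataset. It is not the project's main.py: the function name train_classifier and the maxlen, batch_size, and learning-rate defaults are assumptions chosen for illustration, and older ktrain releases spell the class_names argument as classes.

```python
def train_classifier(x_train, y_train, x_test, y_test, label_list,
                     model_name="albert-base-v2", epochs=5,
                     maxlen=500, batch_size=6, lr=5e-5):
    """Fine-tune a Hugging Face model through ktrain (hyperparameters are assumptions)."""
    # Imported lazily so this module can be loaded without TensorFlow/ktrain installed.
    import ktrain
    from ktrain import text

    # The Transformer wrapper handles tokenization for the chosen model.
    t = text.Transformer(model_name, maxlen=maxlen, class_names=label_list)
    trn = t.preprocess_train(x_train, y_train)
    val = t.preprocess_test(x_test, y_test)

    # Build the classifier and wrap it in a learner for training.
    model = t.get_classifier()
    learner = ktrain.get_learner(model, train_data=trn, val_data=val,
                                 batch_size=batch_size)
    learner.fit_onecycle(lr, epochs)

    # Return the preprocessor too, since ktrain.get_predictor needs it later.
    return learner, t
```

Switching to another model from the list should then only require a different model_name, for example train_classifier(..., model_name="distilbert-base-uncased").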

🏴‍☠️ Performance

97.16 📈

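predictor

Use the function below to print predictions from a trained learner.

import ktrain

def predictor(learner, test):
    # `t` is assumed to be the ktrain preprocessor (e.g. text.Transformer) defined at module level.
    pred = ktrain.get_predictor(learner.model, preproc=t)
    print(pred.predict(test))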

📊 TensorBoard

tensorboard \
	--logdir=training:your_log_dir \
	--host=127.0.0.1

🔬 Library

https://github.com/amaiya/ktrain

Contributors

gyunggyung
