
ka-ve_summarify

Team Members

Usage

Turkish Text Summarization and the Ka|Ve Stemmer

Required Libraries

import pandas as pd
import numpy as np
from nltk.corpus import stopwords
import heapq
from gensim.summarization import keywords
from nltk import sent_tokenize
from sklearn.metrics.pairwise import cosine_similarity
from gensim.models import KeyedVectors
import networkx as nx
import re
import json
import pickle
from keras.models import model_from_json
from keras.models import load_model
import Extraction_Based_Text_Summarization

After adding the libraries above to your program,

ex_sum = extraction_based_sum()

you can instantiate the class with this line. Then, with the text you want summarized,

ex_sum.get_sentences(text,k)

you can extract the top k sentences from the text. For keyword extraction,

ex_sum.get_keywords(text,ratio)

using the code above, you can get keywords in proportion to the number of words in the text. You can control this proportion with the ratio value.
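
For reference, here is a minimal usage sketch. It assumes that extraction_based_sum can be imported from the Extraction_Based_Text_Summarization module shown above; the sample text and parameter values are placeholders.

# Minimal usage sketch; assumes extraction_based_sum is exposed by the
# Extraction_Based_Text_Summarization module imported above.
from Extraction_Based_Text_Summarization import extraction_based_sum

text = "..."  # the Turkish text you want to summarize (placeholder)

ex_sum = extraction_based_sum()

top_sentences = ex_sum.get_sentences(text, 3)    # take the top 3 sentences
top_keywords = ex_sum.get_keywords(text, 0.2)    # keywords for ~20% of the words

print(top_sentences)
print(top_keywords)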

Doc2Vec Method

Required Libraries

import pandas as pd
import re
import numpy as np
import gensim

After importing the libraries, we run our document-similarity class with the pre-trained Word2Vec model[2].

ds = DocSim(w2v_model,stopwords)

Then, to rank the 10 documents most similar to the document at hand, we run the line below.

sim_scores = ds.calculate_similarity(source_doc, target_docs)
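
A hedged sketch of this workflow is shown below; the model path and the sample documents are placeholders, and DocSim itself is assumed to be defined in this repository.

# Document-similarity sketch; the model path and documents below are
# placeholders, and DocSim is assumed to be defined in this repository.
from gensim.models import KeyedVectors
from nltk.corpus import stopwords

w2v_model = KeyedVectors.load_word2vec_format("trmodel", binary=True)  # placeholder path
tr_stopwords = stopwords.words("turkish")

ds = DocSim(w2v_model, tr_stopwords)

source_doc = "..."             # the document to compare against the corpus
target_docs = ["...", "..."]   # candidate documents to rank

sim_scores = ds.calculate_similarity(source_doc, target_docs)
print(sim_scores[:10])         # assuming a ranked list, show the 10 most similar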

Flask API

To use the Flask API, you first need to create a virtual environment for Flask. After that, you need to install the libraries listed below:

import pandas as pd
import numpy as np
from bs4 import BeautifulSoup as bs
import requests 
import datetime
from nltk.corpus import stopwords
import heapq
from gensim.summarization import keywords
from nltk import sent_tokenize
from sklearn.metrics.pairwise import cosine_similarity
import networkx as nx
import re
import json
import pickle
from keras.models import model_from_json
from keras.models import load_model
from gensim.models import KeyedVectors
from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
import tensorflow as tf

After installing the libraries, you can try the product as a demo by running `python main.py`.

Data Scraping / Creation

Using the hurriyetScraper file in the Examples folder, you can scrape news articles from hurriyet.com.tr to build your own dataset, and by using it as a reference you can write a scraper that parses your own data from TsCorpus.
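
As a rough, hypothetical illustration of that idea (not the repository's actual scraper), a minimal requests/BeautifulSoup sketch could look like the following; the URL and selector are placeholders.

# Hypothetical scraping sketch in the spirit of hurriyetScraper; the URL and
# the selector below are placeholders, not the ones used in the repository.
import requests
from bs4 import BeautifulSoup as bs

url = "https://www.hurriyet.com.tr/..."  # a news article URL of your choice
response = requests.get(url, timeout=10)
soup = bs(response.text, "html.parser")

# Join the visible paragraph text as a crude approximation of the article body.
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
article_text = " ".join(paragraphs)
print(article_text[:500])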

References

Demo Video


----------------------------------------------------------------------------------------------

Team Members

How to use

Turkish Text Summarization and Ka|Ve Stemmer

Necessary Libraries

import pandas as pd
import numpy as np
from nltk.corpus import stopwords
import heapq
from gensim.summarization import keywords
from nltk import sent_tokenize
from sklearn.metrics.pairwise import cosine_similarity
from gensim.models import KeyedVectors
import networkx as nx
import re
import json
import pickle
from keras.models import model_from_json
from keras.models import load_model
import Extraction_Based_Text_Summarization

After you've added those libraries,

ex_sum = extraction_based_sum()

you can instantiate the class with that line, and then, with the text you want summarized,

ex_sum.get_sentences(text,k)

you can take the best k sentences from it. For keyword extraction,

ex_sum.get_keywords(text,ratio)

with the code above, you can get keywords in proportion to the word count in the text. You can control that by changing the ratio value.
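
For reference, a minimal usage sketch is given below. It assumes extraction_based_sum can be imported from the Extraction_Based_Text_Summarization module listed above; the sample text and parameter values are placeholders.

# Minimal usage sketch; assumes extraction_based_sum is exposed by the
# Extraction_Based_Text_Summarization module imported above.
from Extraction_Based_Text_Summarization import extraction_based_sum

text = "..."  # the text you want to summarize (placeholder)

ex_sum = extraction_based_sum()

top_sentences = ex_sum.get_sentences(text, 3)    # take the top 3 sentences
top_keywords = ex_sum.get_keywords(text, 0.2)    # keywords for ~20% of the words

print(top_sentences)
print(top_keywords)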

Doc2Vec Method

Necessary Libraries

import pandas as pd
import re
import numpy as np
import gensim

After importing the libraries, we initialize our document-similarity class with the pre-trained Word2Vec model for Turkish[2].

ds = DocSim(w2v_model,stopwords)

After that, using the line below, we list the 10 documents most similar to the document we've entered.

sim_scores = ds.calculate_similarity(source_doc, target_docs)
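
A hedged sketch of this workflow is shown below; the model path and the sample documents are placeholders, and DocSim itself is assumed to be defined in this repository.

# Document-similarity sketch; the model path and documents below are
# placeholders, and DocSim is assumed to be defined in this repository.
from gensim.models import KeyedVectors
from nltk.corpus import stopwords

w2v_model = KeyedVectors.load_word2vec_format("trmodel", binary=True)  # placeholder path
tr_stopwords = stopwords.words("turkish")

ds = DocSim(w2v_model, tr_stopwords)

source_doc = "..."             # the document to compare against the corpus
target_docs = ["...", "..."]   # candidate documents to rank

sim_scores = ds.calculate_similarity(source_doc, target_docs)
print(sim_scores[:10])         # assuming a ranked list, show the 10 most similar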

Flask API

In order to use the Flask API, you first need to create a virtual Flask environment to avoid dependency conflicts. The libraries used are as follows:

import pandas as pd
import numpy as np
from bs4 import BeautifulSoup as bs
import requests 
import datetime
from nltk.corpus import stopwords
import heapq
from gensim.summarization import keywords
from nltk import sent_tokenize
from sklearn.metrics.pairwise import cosine_similarity
import networkx as nx
import re
import json
import pickle
from keras.models import model_from_json
from keras.models import load_model
from gensim.models import KeyedVectors
from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
import tensorflow as tf

After installing the libraries, you can try the live demo by running `python main.py`.

Data Scraping

In the Examples folder, in the hurriyetScraper file, you can find our scraper for hurriyet.com.tr. With this file, you can scrape your own dataset, and by referencing it you can write your own TsCorpus scraper for further analysis.
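
As a rough, hypothetical illustration of that idea (not the repository's actual scraper), a minimal requests/BeautifulSoup sketch could look like the following; the URL and selector are placeholders.

# Hypothetical scraping sketch in the spirit of hurriyetScraper; the URL and
# the selector below are placeholders, not the ones used in the repository.
import requests
from bs4 import BeautifulSoup as bs

url = "https://www.hurriyet.com.tr/..."  # a news article URL of your choice
response = requests.get(url, timeout=10)
soup = bs(response.text, "html.parser")

# Join the visible paragraph text as a crude approximation of the article body.
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
article_text = " ".join(paragraphs)
print(article_text[:500])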

References

Demo Video

