
DeepNews

Automatic Headline Generation from News Articles in Hindi Language

DeepNews is a high-level headline generation tool, written in Python and built on Keras with either TensorFlow or Theano as the backend. It was developed for media organizations and writers who need to quickly come up with a headline that is short and informative.


Getting started

Installing

DeepNews is written in Python on top of Keras, with TensorFlow or Theano as the backend.

Installing Python:

Installing Keras

  • sudo pip install keras
  • Windows-based systems can follow the steps described on Stack Overflow

Installing TensorFlow

Amazon AWS (all libraries are preinstalled in the AMI)

Neural networks are computationally heavy, so a GPU configuration is recommended.
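To confirm that the installation steps worked, a small check like the one below can help. It is only a sketch: the helper name `check_dependencies` is hypothetical and not part of DeepNews, and it only tests whether each package can be found by Python, not whether a GPU is configured.

```python
import importlib.util


def check_dependencies(names=("keras", "tensorflow", "theano")):
    """Return {module_name: bool} telling whether each top-level module can be found.

    Hypothetical helper for illustration; it does not import the modules,
    it only asks Python whether they are installed.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Only one of TensorFlow or Theano needs to be present, since Keras uses a single backend at a time.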


Dataset

Word2Vec (Hindi Language)

[Image: link to the Hindi Word2Vec resource]

Neural Network Model

Input Model

[Figure: input neural network model]

Dataset Statistics

Length of Article histogram

[Figure: histogram of article lengths]

Length of Headline histogram

[Figure: histogram of headline lengths]
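The length histograms can be approximated with a few lines of Python. This is a minimal sketch, assuming length is measured in whitespace-separated tokens; the function name `length_histogram` and the `bin_width` parameter are hypothetical, and the actual DeepNews plots may have been produced differently.

```python
from collections import Counter


def length_histogram(texts, bin_width=50):
    """Count texts per length bin.

    Length is the number of whitespace-separated tokens (an assumption).
    Each key is the lower edge of a bin, e.g. 100 covers lengths 100..149
    when bin_width is 50.
    """
    counts = Counter((len(text.split()) // bin_width) * bin_width for text in texts)
    return dict(sorted(counts.items()))
```

For example, `length_histogram(headlines, bin_width=5)` groups headlines into bins of 0 to 4 words, 5 to 9 words, and so on, where `headlines` is any list of headline strings.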

FIRE Dataset stats

| Feature | Value |
|---|---|
| Number of articles | 2,97,965 |
| Number of tokens | 85,940,081 (85.94M) |
| Number of unique tokens in articles | 3,88,449 |
| Number of unique tokens in headlines | 58,448 |
| Average length of article | 272 |
| Average length of headline | 7 |
| Size of dataset | 1.06GB |
| Average of the ratio len(article)/len(headline) (about 43 article words for every headline word) | 43 |
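Statistics of this kind can be computed with a short script. The sketch below makes several assumptions that are not stated in this README: tokens are split on whitespace, the token count covers articles only, and the ratio is averaged per article. The function name `dataset_stats` is hypothetical.

```python
from statistics import mean


def dataset_stats(pairs):
    """Compute corpus statistics from (article_text, headline_text) pairs.

    Assumptions for illustration: whitespace tokenization, token counts
    taken over articles only, and pairs with an empty headline skipped
    when averaging the length ratio.
    """
    pairs = list(pairs)
    articles = [article.split() for article, _ in pairs]
    headlines = [headline.split() for _, headline in pairs]
    ratios = [len(a) / len(h) for a, h in zip(articles, headlines) if h]
    return {
        "articles": len(pairs),
        "tokens": sum(len(a) for a in articles),
        "unique_article_tokens": len({tok for a in articles for tok in a}),
        "unique_headline_tokens": len({tok for h in headlines for tok in h}),
        "avg_article_length": mean(len(a) for a in articles),
        "avg_headline_length": mean(len(h) for h in headlines),
        "avg_length_ratio": mean(ratios),
    }
```

A real Hindi tokenizer would likely handle punctuation such as the danda (।) differently, so counts from this sketch would not match the table exactly.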

Crawled Dataset stats

| Feature | Value |
|---|---|
| Number of articles | 5,95,847 |
| Number of tokens | 20,92,32,922 (209M) |
| Number of unique tokens in articles | 10,26,083 |
| Number of unique tokens in headlines | 1,24,965 |
| Average length of article | 316 |
| Average length of headline | 11 |
| Size of dataset | 3.70GB |
| Average of the ratio len(article)/len(headline) (about 34 article words for every headline word) | 34 |
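Note that the average ratio (34) is not the same as the average article length divided by the average headline length (316 / 11 ≈ 28.7). Assuming the ratio is computed per article and then averaged, as the row label suggests, the two quantities generally differ. The sketch below illustrates this with made-up length pairs; the function names are hypothetical.

```python
from statistics import mean


def mean_of_ratios(lengths):
    """Average the per-article ratio article_len / headline_len."""
    return mean(a / h for a, h in lengths)


def ratio_of_means(lengths):
    """Divide the average article length by the average headline length."""
    return mean(a for a, _ in lengths) / mean(h for _, h in lengths)


# Made-up (article_len, headline_len) pairs, for illustration only.
lengths = [(100, 2), (300, 20)]
print(mean_of_ratios(lengths))  # (50 + 15) / 2 = 32.5
print(ratio_of_means(lengths))  # 200 / 11, about 18.18
```

Articles with very short headlines pull the mean of ratios upward, which is one plausible reason the reported ratio exceeds 316 / 11.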

Number of Crawled Articles per source

| News Website | Number of Articles | URL |
|---|---|---|
| Aaj Tak | 92765 | http://www.aajtak.intoday.in |
| ABP News | 13654 | http://www.abpnews.abplive.in |
| Amar Ujala | 181 | http://www.amarujala.com |
| BBC Hindi | 28861 | http://bbc.com/hindi |
| Deshbandhu | 3174 | http://deshbandhu.co.in |
| Economic Times | 993 | http://hindi.economictimes.indiatimes.com |
| Jagran | 73290 | http://www.jagran.com |
| Navbharat Times | 10329 | http://www.navbharattimes.indiatimes.com |
| NDTV | 92942 | http://www.khabar.ndtv.com/news/ |
| News18 | 38833 | http://www.news18.com |
| Patrika | 68288 | http://www.patrika.com |
| Punjab Kesari | 15494 | http://www.punjabkesari.in |
| Rajasthan Patrika | 89038 | http://www.rajasthanpatrika.patrika.com |
| Zee News | 10463 | http://www.zeenews.india.com/hindi |

Contributors

kabrapratik28, vatsalgit, pranavghate94, ykedia
