
new-pointer-generator-networks-for-summarization-chinese's Introduction

zn

A pointer-generator network that generates summaries on a Chinese dataset. For details, see https://blog.csdn.net/weixin_46133588/article/details/104419213

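Changes

In the original paper, the pointer-generator network extracts features from the article and the summary with a single-layer (bidirectional) LSTM. I replaced this with a BERT-style embedding structure. The overall model framework is unchanged, but the engineering details were adjusted slightly. (BERT itself is not used.)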
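Since the overall framework is unchanged, the copy mechanism from the original pointer-generator paper (See et al., 2017) still applies: at each decoder step the final word distribution mixes the vocabulary distribution with the attention over source tokens. The following is a minimal sketch of that mixing step in plain Python, not this repository's code; the function name `final_distribution` and the toy values are assumptions made for illustration.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the generation and copy distributions for one decoder step.

    p_gen: probability of generating from the vocabulary (0..1)
    vocab_dist: dict mapping token -> probability from the decoder softmax
    attention: list of attention weights, one per source token
    source_tokens: list of source tokens aligned with `attention`
    """
    final = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for token, prob in vocab_dist.items():
        final[token] += p_gen * prob
    # Copy part: add the attention mass of each source position to its token.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Toy example (hypothetical values): "北京" is out of vocabulary
# but can still receive probability through copying.
vocab_dist = {"今天": 0.6, "天气": 0.4}
source_tokens = ["北京", "今天", "北京"]
attention = [0.5, 0.2, 0.3]
dist = final_distribution(0.7, vocab_dist, attention, source_tokens)
```

Because both input distributions sum to 1, the mixed distribution does too, and a source word that is missing from the vocabulary ends up with non-zero probability.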

Chinese data: https://github.com/brightmart/nlp_chinese_corpus, 2.5 million news articles (raw data 9 GB, compressed file 3.6 GB; the news content spans 2014-2016). Download from Google Drive or Baidu Cloud (password: k265).

tokenizer

Tokenization code for the news dataset.

new-point-generate-zh

The pointer-generator network applied to the news dataset.

Running

First, run the tokenizer:

    python main.py --original_data_dir E:\0000_python\point-genge\point-generate\zh\data --tokenized_dir ./tokenized_single

Here E:\0000_python\point-genge\point-generate\zh\data is where I store the news data. This step takes quite a long time.

Then go into new-point-generate-zh and run:

    python main.py --token_data xxx/tokenized --use_coverage --pointer_gen --do_train --do_decode

Here xxx/tokenized is the folder that holds the tokenized files.
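The `--use_coverage` flag presumably enables the coverage mechanism from the original pointer-generator paper, which keeps a running sum of past attention and penalises attending to the same source positions again. Below is a minimal sketch of that per-step computation, under the assumption that the repository follows the paper's definition; `coverage_step` and the toy attention values are hypothetical.

```python
def coverage_step(coverage, attention):
    """Return (coverage loss for this step, updated coverage vector)."""
    # The loss penalises attention placed on positions that are already covered.
    loss = sum(min(a, c) for a, c in zip(attention, coverage))
    # Coverage is the running sum of all previous attention distributions.
    new_coverage = [c + a for a, c in zip(attention, coverage)]
    return loss, new_coverage


# Toy example (hypothetical values): the second step attends to the
# first source position again, so it incurs a coverage loss.
coverage = [0.0, 0.0, 0.0]
steps = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]
losses = []
for attn in steps:
    loss, coverage = coverage_step(coverage, attn)
    losses.append(loss)
```

The first step costs nothing because nothing has been covered yet; repeated attention in later steps is what raises the loss.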

Results

ROUGE-1: 39%, ROUGE-2: 15%, ROUGE-L: 37%
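The README does not say which tool produced these scores. As a rough illustration of what the three metrics measure, here is a simplified sketch that computes F1-style ROUGE-N and ROUGE-L over token lists. The function names are hypothetical, and official ROUGE implementations differ in details such as stemming and the F-measure weighting.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall, 0.0 when there is no overlap."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


# Toy example (hypothetical sentences, already tokenized).
reference = "今天 北京 天气 晴朗".split()
candidate = "北京 今天 天气 晴朗".split()
r1 = rouge_n(candidate, reference, 1)
r2 = rouge_n(candidate, reference, 2)
rl = rouge_l(candidate, reference)
```

In this toy pair the two sentences share every word but in a different order, so the unigram score is perfect while the bigram and longest-common-subsequence scores drop.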

new-pointer-generator-networks-for-summarization-chinese's People

Contributors

hquzhuguofeng
