
nlp-practice-program's Introduction

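A roadmap of NLP practice projects

Foreword:

I'm sorry, I no longer have the energy to keep writing this repo. Seeing that the star count is still slowly climbing, I spend every day in guilt, sometimes even in tears.

If any of you have a finished demo of your own, please send me a pull request! I will merge it and credit you by name.

Any framework is fine, TensorFlow or PyTorch. The demo can be original or adapted from elsewhere (no copyright infringement). I hope it is commented in detail, like my existing code.

Let's build the most complete collection of NLP demos! Let's go!

Signed, an NLP grunt-work intern who is perpetually anxious about the stars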

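Introduction

A collection of NLP practice projects, written with more comments than code, which makes them more fun to study. Blog link

Version: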

TensorFlow 1.4.0

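Contents

1. word2vec word embeddings

Word embeddings: trains an embedding matrix with skip-gram. Each word is represented by a 300-dimensional vector, and words with similar meanings have similar vectors.
In NLP, words are usually represented by word embeddings.
--> Project entry
--> Code walkthrough (video)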
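Skip-gram learns embeddings by predicting the words around each center word. The snippet below is a minimal sketch of how (center, context) training pairs can be built from a token list. The function name `skipgram_pairs` and the `window_size` parameter are illustrative assumptions, not names taken from this repo, and the real project may sample the window differently.

```python
def skipgram_pairs(tokens, window_size=2):
    """Return (center, context) pairs for every token inside the window.

    Hypothetical helper for illustration only; not the repo's own code.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the boundaries of the token list.
        start = max(0, i - window_size)
        end = min(len(tokens), i + window_size + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    for center, context in skipgram_pairs(["the", "quick", "brown", "fox"], window_size=1):
        print(center, "->", context)
```

Each pair becomes one training example: the center word is the input and the context word is the label.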

Results (taking one word as an example and computing its nearest neighbors from the embedding matrix)

Before training
hemoglobin --> alden, vive, deviations, dlp, taj, beauvoir, pillow, allying
After training
hemoglobin --> ligand, molecules, ligands, photosynthesis, aerobic, enzyme, pancreatic, chlorophyll
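Nearest neighbors like the ones listed here are typically found by ranking every other word by cosine similarity to the query word's vector. The following is a rough sketch in plain Python. The 3-dimensional vectors in `toy_embeddings` are made up for illustration (the project uses 300 dimensions), and `nearest_words` is a hypothetical helper, not a function from the repo. It assumes no vector is all zeros.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(word, embeddings, top_k=8):
    """Return the top_k words whose vectors are most similar to `word`."""
    query = embeddings[word]
    scored = [
        (cosine_similarity(query, vector), other)
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:top_k]]


# Made-up toy vectors, only to show the ranking mechanics.
toy_embeddings = {
    "hemoglobin": [0.9, 0.1, 0.0],
    "enzyme": [0.8, 0.2, 0.1],
    "ligand": [0.7, 0.3, 0.0],
    "pillow": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(nearest_words("hemoglobin", toy_embeddings, top_k=2))
```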

 

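2. Text generation

Style imitation: the model learns from the full text of Harry Potter books 1-7. After training, given a starting string (in the result below it is 'Hi, '), the model generates Harry Potter-style sentences on its own.
--> Project entry
--> Code walkthrough (video)

Results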

Hi, he was nearly off at Harry to say the time that and she had been back to his staircase of the too the Hermione?
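Generation of this kind is usually a loop: feed the text so far to the model, get a probability for each possible next token, sample one, append it, and repeat. Below is a minimal sketch of that loop using top-n sampling. Here `next_probs` is a hypothetical stand-in for the trained LSTM, and `pick_top_n` and `generate` are illustrative names. Whether the repo samples this way is an assumption.

```python
import random


def pick_top_n(probs, top_n=5, rng=random):
    """Sample an index from the top_n most probable entries of `probs`."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_n]
    weights = [probs[i] for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]


def generate(start, next_probs, vocab, length, top_n=5, seed=0):
    """Extend `start` by `length` tokens.

    `next_probs(text)` is a placeholder for the trained model: it must return
    one probability per entry of `vocab` for the token that follows `text`.
    """
    rng = random.Random(seed)
    text = start
    for _ in range(length):
        probs = next_probs(text)
        text += vocab[pick_top_n(probs, top_n, rng)]
    return text


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    # Dummy "model" that always puts all probability on "b".
    print(generate("Hi, ", lambda text: [0.0, 1.0, 0.0], vocab, length=3))
```

Restricting sampling to the top few candidates tends to keep the output less noisy than sampling from the full distribution, at the cost of some variety.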

 

3. Letter sorting

The most basic seq2seq application: given a word such as bca, seq2seq sorts its letters into abc.
--> Project entry

Results

the input is: hello
the output is: ['e', 'h', 'l', 'l', 'o']
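Training data for a sorting task like this can be generated on the fly, because the target is just the sorted source. The sketch below shows one way to build and pad such pairs. The special tokens `<PAD>` and `<EOS>` and the helper names are assumptions for illustration, not identifiers confirmed from the repo.

```python
import random
import string

PAD, EOS = "<PAD>", "<EOS>"  # assumed special tokens


def make_pair(word):
    """Source is the letters as given; target is the sorted letters plus <EOS>."""
    source = list(word)
    target = sorted(source) + [EOS]
    return source, target


def random_word(rng, min_len=3, max_len=7):
    """Draw a random lowercase string to use as a training source."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def pad_batch(sequences, pad_token=PAD):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_token] * (max_len - len(seq)) for seq in sequences]


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [make_pair(random_word(rng)) for _ in range(3)]
    sources = pad_batch([source for source, _ in pairs])
    targets = pad_batch([target for _, target in pairs])
    for source, target in zip(sources, targets):
        print(source, "->", target)
```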

 

4. Summary generation

A seq2seq application: given a passage of text, it automatically generates a summary.
--> Project entry

Results

------------the text is:----------------
Use olive oil to cook this, salt it well, and it is the best, most tender popcorn I have ever eaten. I add a tiny bit of butter to mine, but don't need it. My nine year old daughter didn't like popcorn until she reluctantly tried this. After a few bites, she consumed half the bowl!
I bought mine at a specialty popcorn shop in Long Grove IL, so I didn't have to pay shipping costs, but when it's gone, I might have to bite the bullet and order it from here.",Spoiled me for other popcorn
------------the summary is:-------------
best tasting popcorn ever

nlp-practice-program's People

Contributors

dod-o

nlp-practice-program's Issues

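Data source for skip-gram

Hi, I'd like to ask where the data in the skip-gram project comes from. I looked through it and there is not a single punctuation mark, and there are no paragraphs either, so it looks like it has been preprocessed. Normally, training data for word vectors would come paragraph by paragraph, not as a single file of 100 million words like this.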

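A question about tf.concat on line 152 of harry_potter_lstm.py

Following the example, what we have is three-dimensional:
[n_seqs, n_sequencd_length, lstm_num_units]
and it needs to become two-dimensional:
[n_seqs * n_sequencd_length, lstm_num_units]
Shouldn't the concatenation be along dimension 0, that is, axis=0 rather than axis=1?

For example, suppose I have the following data:

    import tensorflow as tf

    t1 = [
        [[0, 1], [2, 3], [3, 4], [4, 5]],
        [[5, 6], [6, 7], [7, 8], [8, 9]],
        [[9, 10], [10, 11], [11, 12], [12, 13]],
    ]
    t2 = tf.concat(t1, axis=0)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(t2))
    # t2 is [[ 0 1] [ 2 3] [ 3 4] [ 4 5] [ 5 6] [ 6 7] [ 7 8] [ 8 9] [ 9 10] [10 11] [11 12] [12 13]]

If the concatenation uses axis=1 instead, the join happens along the second dimension and the result becomes:
[[ 0 1 5 6 9 10] [ 2 3 6 7 10 11] [ 3 4 7 8 11 12] [ 4 5 8 9 12 13]]

Am I misunderstanding something, or is there a problem here? It seems to me that the final output dimension (the columns) should simply be the number of LSTM units.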
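The two results quoted in this issue can be reproduced without a TensorFlow session. The sketch below imitates, in plain Python, what concatenating a list of 2-D blocks along axis 0 and along axis 1 does to the shape. `concat_axis0` and `concat_axis1` are hypothetical helpers written for this illustration; they are not TensorFlow APIs and say nothing about what line 152 of the script actually does.

```python
def concat_axis0(blocks):
    """Stack 2-D blocks vertically: k blocks of [rows, cols] -> [k * rows, cols]."""
    return [row for block in blocks for row in block]


def concat_axis1(blocks):
    """Join 2-D blocks side by side: k blocks of [rows, cols] -> [rows, k * cols]."""
    return [sum(rows, []) for rows in zip(*blocks)]


def shape(matrix):
    """Return (rows, cols) of a rectangular nested list."""
    return len(matrix), len(matrix[0])


t1 = [
    [[0, 1], [2, 3], [3, 4], [4, 5]],
    [[5, 6], [6, 7], [7, 8], [8, 9]],
    [[9, 10], [10, 11], [11, 12], [12, 13]],
]

if __name__ == "__main__":
    print(shape(concat_axis0(t1)), concat_axis0(t1))
    print(shape(concat_axis1(t1)), concat_axis1(t1))
```

With this data, axis 0 keeps two columns and multiplies the row count, while axis 1 keeps four rows and multiplies the column count, which matches the two outputs shown in the question.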

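About the Bilibili link

Hi author, while reading summary_burner.py in 4-seq2seq_summary_burner, I still don't understand why you drop the last word of the summary. Your comments mention a Bilibili video. Could you share the link? Thanks.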
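One possible explanation, offered as an assumption rather than a statement about summary_burner.py: in many seq2seq training setups the decoder input is the target shifted right by one step. The last token is dropped and a start token is prepended, so that at each step the decoder sees the previous word and predicts the next one. A minimal sketch of that convention, with assumed token names `<GO>` and `<EOS>`:

```python
GO, EOS = "<GO>", "<EOS>"  # assumed special tokens


def make_decoder_input(target):
    """Shift the target right: drop its last token and prepend <GO>."""
    return [GO] + target[:-1]


if __name__ == "__main__":
    target = ["best", "tasting", "popcorn", "ever", EOS]
    decoder_input = make_decoder_input(target)
    # At step t the decoder reads decoder_input[t] and should predict target[t].
    for seen, expected in zip(decoder_input, target):
        print(seen, "->", expected)
```

The last target token is never needed as an input because nothing has to be predicted after it, and dropping it keeps the input and target the same length.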

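Updates

Why did the updates stop, buddy? Your small projects are really good and very quick to get started with.

Blog URL

Hi, may I ask what you used to build the blog? Thanks!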
