cn-words

Get Similar Chinese Words and Sentences

Requirements:

  • Python >= 3.5
  • NumPy

To-do list:

  • Chinese words
  • Reimplement the GloVe model
  • Chinese sentences

Usage:

git clone https://github.com/HoratioJSY/cn-words.git
cd cn-words
python inference.py --word 机器学习

If you want to train a simplified model on part of a Chinese corpus, see sample-training, or test it in Colab.

Examples

  • Get the similar words

Set the 「word」 flag to choose the target word and search for similar words:

>>> python inference.py --word 机器学习
机器学习 is close to: 模式识别、数据挖掘、深度学习、图形学、人工智能、神经网络、信号处理、运筹学、信息论、地理信息系统、数字图像处理、微分方程、面向对象、并行计算、概率论、故障诊断、生物信息学、数理统计
Inference time: 0.7688698768615723
>>> python inference.py --word 谢霆锋
谢霆锋 is close to: 张学友、国语专辑、周杰伦、刘德华、王力宏、张惠妹、郭富城、陈奕迅、林俊杰、孙燕姿、梁咏琪、林忆莲、梅艳芳、任贤齐、容祖儿、谭咏麟、张韶涵、陈慧琳
Inference time: 0.7493560314178467

Set the 「top_k」 flag to choose the number of nearest words to return:

>>> python inference.py --word 聚精会神 --top_k 12
聚精会神 is close to: 全神贯注、专心致志、凝神、扎扎实实、一心一意、认认真真、伟大旗帜、认真、真抓实干、专心、解放**、集中精力
Inference time: 0.7596039772033691
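
The internals of inference.py are not shown here, so the following is only a minimal sketch of how a top-k nearest-word lookup can be done with NumPy, assuming the model is a matrix of word vectors ranked by cosine similarity. The function name `nearest_words`, the toy vocabulary, and the 3-dimensional vectors are all hypothetical and chosen for illustration.

```python
import numpy as np

def nearest_words(target, vocab, vectors, top_k=18):
    """Return the top_k words closest to `target` by cosine similarity.

    `vocab` is a list of words and `vectors` a 2-D array with one row per
    word; both are assumptions about how the model could be stored.
    """
    index = {word: i for i, word in enumerate(vocab)}
    # Normalise each row so that a dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = unit @ unit[index[target]]
    # Exclude the query word itself (an assumption of this sketch).
    scores[index[target]] = -np.inf
    best = np.argsort(-scores)[:top_k]
    return [vocab[i] for i in best]

# Toy, made-up vectors purely for illustration.
toy_vocab = ["机器学习", "深度学习", "数据挖掘", "谢霆锋"]
toy_vectors = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.1],
    [0.8, 0.5, 0.0],
    [0.0, 0.1, 1.0],
])
print(nearest_words("机器学习", toy_vocab, toy_vectors, top_k=2))
```

Normalising the whole matrix once keeps the search to a single matrix-vector product, which is usually the dominant cost for a large vocabulary.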

  • Get the similarity of two words

Separate two words with "/" to get the cosine similarity of their word vectors:

>>> python inference.py --word 微积分/概率论
Similarity is:  0.8017578125
Inference time: 0.0002090930938720703

>>> python inference.py --word 微积分/物理
Similarity is:  0.404052734375
Inference time: 0.00020599365234375

>>> python inference.py --word 微积分/文科生
Similarity is:  0.2337646484375
Inference time: 0.0002460479736328125
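
For reference, cosine similarity is the dot product of two vectors divided by the product of their norms. A minimal NumPy sketch, with a hypothetical helper name and toy inputs rather than the project's actual vectors:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors for illustration only.
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

Values close to 1 mean the vectors point in nearly the same direction, and values near 0 mean they are close to orthogonal, which matches the ordering of the three examples above.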

  • Word analogies

>>> python inference.py --word 卷积+深度学习
卷积 + 深度学习 is close to: 卷积神经网络、深度学习、循环神经网络、神经网络、微分方程、模式识别、自适应、傅里、数据挖掘、时域、差分、信号处理、滤波、频域、多项式、运算符、非线性、随机变量
Inference time: 0.7463030815124512

>>> python inference.py --word 摩托车-单车
摩托车 - 单车 is close to: 摩托车、汽车、轿车、客车、拖拉机、摩托、机动车、零部件、农用、三轮、卡车、变速器、电视机、跑车、小轿车、柴油、奥迪、汽油
Inference time: 0.7612090110778809

To run a word analogy such as "冬天-夏天=寒冷-炎热" (winter - summer = cold - hot), rearrange the equation so the unknown stands alone, as in "冬天-夏天+炎热=寒冷":

>>> python inference.py --word 冬季-夏季+炎热 --top_k 12
冬季-夏季+炎热 is close to: 寒冷、严寒、酷暑、多雨、酷热、冬季、少雨、季风气候、凉爽、湿润气候、气温、平均气温
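
One plausible way to evaluate such an expression is to split it into signed terms and sum the corresponding word vectors. The sketch below assumes a plain dictionary from word to vector; the function name `combine` and the toy 2-dimensional vectors are hypothetical and not taken from inference.py.

```python
import re
import numpy as np

def combine(expression, embeddings):
    """Evaluate an expression like 'a-b+c' as a signed sum of word vectors."""
    # Each match is a (sign, word) pair; the first word has an empty sign,
    # which is treated as '+'.
    tokens = re.findall(r"([+-]?)([^+-]+)", expression)
    total = None
    for sign, word in tokens:
        vec = embeddings[word.strip()]
        vec = -vec if sign == "-" else vec
        total = vec if total is None else total + vec
    return total

# Toy vectors for illustration only.
toy_embeddings = {
    "冬季": np.array([1.0, 0.0]),
    "夏季": np.array([-1.0, 0.0]),
    "炎热": np.array([-1.0, 1.0]),
}
query = combine("冬季-夏季+炎热", toy_embeddings)
print(query)
```

The resulting vector would then be ranked against the vocabulary by cosine similarity, in the same way as a single-word query.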
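
  • Adding New Words

There is a simple and intuitive way to add a new word to the vocabulary: define it as a weighted sum of existing words. If the 「add_vocabulary」 flag is omitted, only the test results are printed and the vocabulary is left unchanged: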

>>> python inference.py --add_word '残差网络=0.3*卷积神经网络+0.3*残差+0.3*图像识别+0.1*人工智能'
残差网络 is close to: 深度学习、循环神经网络、模式识别、神经网络、数据挖掘、自适应、信号处理、数字信号、时域、差分、微分方程、频域、图形学、人工智能、随机变量、线性规划、滤波、数字图像处理

If the 「add_vocabulary」 flag is True, the new word vector is generated and added to the vocabulary:

>>> python3 inference.py --add_word '残差网络=0.3*卷积神经网络+0.3*残差+0.4*图像识别' --add_vocabulary True
残差网络 is close to: 循环神经网络、深度学习、模式识别、神经网络、数字信号、时域、自适应、频域、数据挖掘、差分、信号处理
Successfully update vocabulary: 残差网络
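
The formula passed to --add_word reads as a weighted sum of existing word vectors. As a rough sketch of that idea, assuming the same "new=w1*a+w2*b" syntax shown above and a plain dictionary of vectors, the helpers `parse_new_word` and `build_vector` below are hypothetical names and not the project's own code.

```python
import numpy as np

def parse_new_word(spec):
    """Split 'new=w1*a+w2*b' into the new word and a list of (weight, word)."""
    name, formula = spec.split("=", 1)
    terms = []
    for term in formula.split("+"):
        weight, word = term.split("*", 1)
        terms.append((float(weight), word.strip()))
    return name.strip(), terms

def build_vector(terms, embeddings):
    """Weighted sum of the vectors named in `terms`."""
    return sum(weight * embeddings[word] for weight, word in terms)

# Toy vectors for illustration only.
toy_embeddings = {
    "卷积神经网络": np.array([1.0, 0.0]),
    "残差": np.array([0.0, 1.0]),
    "图像识别": np.array([1.0, 1.0]),
}
name, terms = parse_new_word("残差网络=0.3*卷积神经网络+0.3*残差+0.4*图像识别")
toy_embeddings[name] = build_vector(terms, toy_embeddings)
print(name, toy_embeddings[name])
```

Keeping the weights summing to 1, as in both commands above, keeps the new vector on a scale comparable to the vectors it is built from.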
