chinese-law-bert-similarity

How to use

Prediction

In this project I provide an already-trained model, so you can download it and use it directly for prediction.

  • This project only supports sentences of up to 45 characters each.
  • Download the model file (password: vv1k).
  • Usage:
    • First, create a BertSim instance:

      bs = BertSim(gpu_no=0, log_dir='log/', bert_sim_dir='bert_sim_model\\', verbose=True)
    • Then call predict on a list of sentence pairs.

      Similar sentences:

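      text_a = '技术侦查措施只能在立案后采取'  # "Technical investigation measures may only be taken after a case is filed."
      text_b = '未立案不可以进行技术侦查'  # "Technical investigation may not be conducted before a case is filed."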
      bs.predict([[text_a, text_b]])

      You will get a result like this: [[0.00942544 0.99057454]]

      Dissimilar sentences:

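      text_a = '华为还准备起诉美国政府'  # "Huawei is also preparing to sue the US government."
      text_b = '飞机出现后货舱火警信息'  # "The aircraft showed an aft cargo hold fire warning."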
      bs.predict([[text_a, text_b]])

      You will get a result like this: [[0.98687243 0.01312758]]
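Judging from the two examples above, `predict` seems to return one row per input pair holding two probabilities: column 0 for "dissimilar" and column 1 for "similar". The sketch below wraps that reading together with the 45-character limit. It does not call `BertSim`; the helper names `check_pair_lengths` and `label_pairs`, and the 0.5 threshold, are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

MAX_CHARS = 45  # per-sentence limit stated in this README


def check_pair_lengths(pairs: Sequence[Sequence[str]], max_chars: int = MAX_CHARS) -> List[int]:
    """Return the indices of pairs in which either sentence exceeds max_chars characters."""
    return [
        i for i, (text_a, text_b) in enumerate(pairs)
        if len(text_a) > max_chars or len(text_b) > max_chars
    ]


def label_pairs(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[Tuple[bool, float]]:
    """Turn rows of [p_dissimilar, p_similar] into (is_similar, p_similar).

    Assumes column 1 is the "similar" probability, as the README examples suggest.
    """
    return [(row[1] >= threshold, float(row[1])) for row in probs]


if __name__ == "__main__":
    pairs = [["技术侦查措施只能在立案后采取", "未立案不可以进行技术侦查"],
             ["华为还准备起诉美国政府", "飞机出现后货舱火警信息"]]
    print(check_pair_lengths(pairs))  # [] (no sentence exceeds the limit)
    # The two example outputs from this README, stacked as if returned by one predict call.
    probs = [[0.00942544, 0.99057454], [0.98687243, 0.01312758]]
    print(label_pairs(probs))  # [(True, 0.99057454), (False, 0.01312758)]
```

Checking lengths before calling `predict` makes it explicit which pairs fall outside what the model supports, instead of relying on whatever truncation happens internally.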

Parameters

name          type  detail
gpu_no        int   index of the GPU used to initialize the BERT graph
log_dir       str   log directory
verbose       bool  whether to show TensorFlow logs
bert_sim_dir  str   path to the BERT similarity model directory

Train

Code

In this project I simply fine-tune the pre-trained BERT model, so I use the original BERT fine-tuning code. I tried writing my own version, but it ended up essentially the same as the original, so I gave up.

Dataset

Because my work is in judicial examination education, I didn't use a public dataset. My dataset was labeled manually and contains 80,000+ sentence pairs: 50,000+ similar and 30,000+ dissimilar. For privacy reasons, I can't open-source it.
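The dataset is private, and this README does not say what file format the fine-tuning run reads. As a hedged sketch only, assuming a custom processor that reads tab-separated `label<TAB>text_a<TAB>text_b` lines with label `1` for similar and `0` for dissimilar (a hypothetical convention chosen to match the column order of `predict`'s output), labeled pairs could be shuffled, split, and written as below. `split_pairs`, `write_tsv`, and `majority_baseline` are hypothetical names. The baseline helper matters for imbalanced data like this: with roughly 50,000 similar pairs out of roughly 80,000, always answering "similar" already scores about 50,000 / 80,000 = 62.5%.

```python
import random
import sys
from typing import List, Sequence, TextIO, Tuple

# (text_a, text_b, label); label 1 = similar, 0 = dissimilar (assumed convention)
Pair = Tuple[str, str, int]


def split_pairs(pairs: Sequence[Pair], dev_ratio: float = 0.1, seed: int = 42) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle reproducibly, then split into (train, dev)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_ratio)
    return shuffled[n_dev:], shuffled[:n_dev]


def write_tsv(pairs: Sequence[Pair], f: TextIO) -> None:
    """Write one 'label<TAB>text_a<TAB>text_b' line per pair (hypothetical layout)."""
    for text_a, text_b, label in pairs:
        # A tab or newline inside a sentence would corrupt the column layout.
        if any(ch in text_a or ch in text_b for ch in "\t\r\n"):
            raise ValueError(f"tab or newline inside a sentence: {text_a!r} / {text_b!r}")
        f.write(f"{label}\t{text_a}\t{text_b}\n")


def majority_baseline(pairs: Sequence[Pair]) -> float:
    """Accuracy obtained by always predicting the more frequent label."""
    n_similar = sum(label for _, _, label in pairs)
    return max(n_similar, len(pairs) - n_similar) / len(pairs)


if __name__ == "__main__":
    # 5 similar vs 3 dissimilar mirrors the README's roughly 50,000 : 30,000 ratio.
    toy = [("甲", "乙", 1)] * 5 + [("丙", "丁", 0)] * 3
    print(majority_baseline(toy))  # 0.625
    train, dev = split_pairs(toy, dev_ratio=0.25)
    write_tsv(dev, sys.stdout)  # in practice, pass a file opened with encoding="utf-8"
```

A fixed seed keeps the train/dev split identical across runs, so accuracy numbers from different experiments stay comparable.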

Suggestion:

The original code uses only the model's pooled output. I suspected there were other ways to improve accuracy and tried several; the one that worked is to concatenate the [CLS] embeddings of the last four layers in the encoder output list (from the fourth-to-last layer through the last layer). To use this approach:

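  • Remove the following line:
output_layer = model.get_pooled_output()
  • Replace it with the following line, which increased accuracy by about 1%:
# Concatenate the [CLS] vector (position 0) of each of the last four encoder layers along the hidden axis.
output_layer = tf.concat([tf.squeeze(model.all_encoder_layers[i][:, 0:1, :], axis=1) for i in range(-4, 0, 1)], axis=-1)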
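To see what the replacement line computes without TensorFlow, here is a pure-Python sketch over nested lists shaped `[batch][seq_len][hidden]`. `model.all_encoder_layers` holds one such tensor per Transformer layer; `[:, 0:1, :]` followed by `tf.squeeze(..., axis=1)` keeps only position 0 (the [CLS] token); and `range(-4, 0, 1)` selects the last four layers. The function name `concat_last4_cls` is hypothetical. The result has `4 * hidden` features per example instead of `hidden`; for a BERT-Base model (hidden size 768) that is 4 × 768 = 3072. Assuming the classifier weights are sized from `output_layer.shape[-1]`, as the original `create_model` in `run_classifier.py` does, no other change should be needed.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # [batch][seq_len][hidden]


def concat_last4_cls(all_encoder_layers: List[Tensor3D]) -> List[List[float]]:
    """Pure-Python analogue of:
    tf.concat([tf.squeeze(layers[i][:, 0:1, :], axis=1) for i in range(-4, 0, 1)], axis=-1)
    """
    last4 = [all_encoder_layers[i] for i in range(-4, 0, 1)]  # 4th-to-last ... last layer
    batch = len(last4[0])
    # For each example, take the position-0 ([CLS]) vector of every selected layer
    # and join the vectors end to end along the hidden axis.
    return [[v for layer in last4 for v in layer[b][0]] for b in range(batch)]


if __name__ == "__main__":
    # Toy model: 12 layers, batch 2, seq_len 3, hidden 2; each value = 10 * layer_idx + position.
    layers = [[[[10.0 * layer_idx + pos] * 2 for pos in range(3)] for _ in range(2)]
              for layer_idx in range(12)]
    out = concat_last4_cls(layers)
    print(len(out), len(out[0]))  # 2 8
    print(out[0])  # [80.0, 80.0, 90.0, 90.0, 100.0, 100.0, 110.0, 110.0]
```

Only the layer-8 to layer-11 values at position 0 appear in the output, which confirms that the line picks [CLS] from the last four layers and ignores every other token position.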
