go-flashtext

This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm.

Compared with the standard FlashText algorithm, go-flashtext differs in a few ways that make it more powerful:

  • Chinese is fully supported, whereas the Python implementation handles Chinese poorly.
  • We relax the nonWordBoundaries rule of the FlashText algorithm, so a keyword may contain characters outside [_0-9a-zA-Z].
  • We allow the same keyword to exist with different cleanNames, so keywords need not be unique. We have found this very useful in industry settings.
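
One way to picture non-unique keywords is a dictionary in which each keyword maps to a list of clean names. The sketch below is only an illustration under that assumption: the `dictionary` type and its `add` and `lookup` methods are hypothetical and are not part of the go-flashtext API. The "|" separator mirrors the `love|ove` sample output in the Usage section.

```go
package main

import (
	"fmt"
	"strings"
)

// dictionary is a hypothetical structure (not the library's type):
// each keyword may carry several clean names.
type dictionary map[string][]string

// add appends cleanName to keyword unless that pair is already stored.
func (d dictionary) add(keyword, cleanName string) {
	for _, existing := range d[keyword] {
		if existing == cleanName {
			return
		}
	}
	d[keyword] = append(d[keyword], cleanName)
}

// lookup joins every clean name registered for keyword with "|".
// It returns an empty string for an unknown keyword.
func (d dictionary) lookup(keyword string) string {
	return strings.Join(d[keyword], "|")
}

func main() {
	d := dictionary{}
	d.add("love", "love")
	d.add("love", "ove")
	fmt.Println(d.lookup("love")) // love|ove
}
```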

Installation

To install the GoFlashText package, first install Go and set up your Go workspace.

  1. With Go installed, use the following command to install GoFlashText.
$ go get -u github.com/waltsmith88/go-flashtext
  2. Import it in your code:
import gf "github.com/waltsmith88/go-flashtext"

Usage

  • Extract keywords
package main

import (
	"fmt"
	gf "github.com/waltsmith88/go-flashtext"
)

func main() {
	// add keywords from Map
	keywordMap := map[string]string{
		"love": "love",
		"hello": "hello",
	}
	keywordProcessor := gf.NewKeywordProcessor()
	keywordProcessor.AddKeywordsFromMap(keywordMap)
	foundList := keywordProcessor.ExtractKeywords("I love coding.")
	fmt.Println(foundList)
}
// [love]
  • Extract keywords With Chinese Support
package main

import (
	"fmt"
	gf "github.com/waltsmith88/go-flashtext"
)

func main() {
	// add keywords from a map
	keywordMap := map[string]string{
		"love": "love",
		"中国": "中文",
	}
	keywordProcessor := gf.NewKeywordProcessor()
	keywordProcessor.AddKeywordsFromMap(keywordMap)
	keywordProcessor.AddKeyword("love", "ove")
	// "Love" is not matched here; compare the case-sensitivity example below
	foundList := keywordProcessor.ExtractKeywords("I Love 中国.")
	fmt.Println(foundList)
}
// [中文]
  • Case Sensitive example
package main

import (
	"fmt"
	gf "github.com/waltsmith88/go-flashtext"
)

func main() {
	// add keywords from a map
	keywordMap := map[string]string{
		"love": "love",
		"中国": "中文",
	}
	keywordProcessor := gf.NewKeywordProcessor()
	// turn off case sensitivity so that "Love" matches the keyword "love"
	keywordProcessor.SetCaseSensitive(false)
	keywordProcessor.AddKeywordsFromMap(keywordMap)
	keywordProcessor.AddKeyword("love", "ove")
	foundList := keywordProcessor.ExtractKeywords("I Love 中国.")
	fmt.Println(foundList)
}
// [love|ove 中文]
  • Unique Keywords example
func main() {
	// add keywords from Map
	keywordMap := map[string]string{
		"love": "love",
		"中国": "中文",
	}
	keywordProcessor := gf.NewKeywordProcessor()
	// keep only one cleanName per keyword
	keywordProcessor.SetUniqueKeyword(true)
	keywordProcessor.SetCaseSensitive(false)
	keywordProcessor.AddKeywordsFromMap(keywordMap)
	keywordProcessor.AddKeyword("love", "ove")
	foundList := keywordProcessor.ExtractKeywords("I Love 中国.")
	fmt.Println(foundList)
}
// [ove 中文]
  • Span of keywords extracted
func main() {
	// add keywords from Map
	keywordMap := map[string]string{
		"love": "love",
		"中国": "中文",
	}
	keywordProcessor := gf.NewKeywordProcessor()
	keywordProcessor.AddKeywordsFromMap(keywordMap)
	sentence := "I love 中国."
	cleanNameRes := keywordProcessor.ExtractKeywordsWithSpanInfo(sentence)
	sentence1 := []rune(sentence)
	for _, resSpan := range cleanNameRes {
		fmt.Println(resSpan.CleanName, resSpan.StartPos, resSpan.EndPos, fmt.Sprintf("%c", sentence1[resSpan.StartPos:resSpan.EndPos]))
	}
}
// love 2 6 [l o v e]
// 中文 7 9 [中 国]
  • Add Multiple Keywords simultaneously
// way 1: from a map
keywordMap := map[string]string{
	"abcd":    "abcd",
	"student": "stu",
}
keywordProcessor.AddKeywordsFromMap(keywordMap)
// way 2: from a slice
keywordProcessor.AddKeywordsFromList([]string{"student", "abcd", "abc", "中文"})
// way 3: from a file, one "keyword => cleanName" pair per line
keywordProcessor.AddKeywordsFromFile(filePath)
  • To Remove keywords
keywordProcessor.RemoveKeyword("abc")
keywordProcessor.RemoveKeywordFromList([]string{"student", "abcd", "abc", "中文"})
  • To Replace keywords
newSentence := keywordProcessor.ReplaceKeywords(sourceSentence)
  • To check Number of terms in KeywordProcessor
keywordProcessor.Len()
  • To check if term is present in KeywordProcessor
keywordProcessor.IsContains("abc")
  • Get all keywords in dictionary
keywordProcessor.GetAllKeywords()
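
The span example above converts the sentence to []rune before slicing, which suggests that StartPos and EndPos count characters (runes) rather than bytes. The standard-library-only sketch below shows why that matters for Chinese text. The helper `sliceByRunes` is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// sliceByRunes returns the text covering runes [start, end) of s.
// Hypothetical helper for illustration; not part of go-flashtext.
func sliceByRunes(s string, start, end int) string {
	return string([]rune(s)[start:end])
}

func main() {
	sentence := "I love 中国."
	fmt.Println(len(sentence))                // 14 (bytes: each Chinese character takes 3)
	fmt.Println(len([]rune(sentence)))        // 10 (characters)
	fmt.Println(sliceByRunes(sentence, 2, 6)) // love
	fmt.Println(sliceByRunes(sentence, 7, 9)) // 中国
}
```

Slicing the string directly with sentence[7:9] would instead cut into the middle of the first Chinese character, because string indexing in Go works on bytes.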

More usage examples are in go-flashtext/examples/examples.go. You can try them with the following command:

$ go run examples/examples.go
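
The file-based loader in the usage list expects one `keyword => cleanName` pair per line. As a rough sketch of how such lines could be parsed with the standard library alone, the hypothetical function `parseKeywordLines` below splits each line on "=>". Two details are assumptions rather than documented behavior of AddKeywordsFromFile: blank lines are skipped, and a line without "=>" uses the keyword as its own clean name.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeywordLines turns "keyword => cleanName" lines into a map.
// Hypothetical helper; the library's own file parser may behave differently.
func parseKeywordLines(text string) map[string]string {
	pairs := make(map[string]string)
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue // assumption: blank lines are ignored
		}
		keyword, cleanName, found := strings.Cut(line, "=>")
		keyword = strings.TrimSpace(keyword)
		if found {
			cleanName = strings.TrimSpace(cleanName)
		} else {
			cleanName = keyword // assumption: a bare keyword maps to itself
		}
		pairs[keyword] = cleanName
	}
	return pairs
}

func main() {
	pairs := parseKeywordLines("student => stu\nabcd\n\n中国 => 中文\n")
	fmt.Println(pairs["student"], pairs["abcd"], pairs["中国"])
}
```

The resulting map has the same shape as the one passed to AddKeywordsFromMap in the examples above.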

Test

$ git clone https://github.com/waltsmith88/go-flashtext
$ cd go-flashtext
$ go test -v

Why not Regex?

It's a custom algorithm based on the Aho-Corasick algorithm and a trie dictionary.

Benchmark

Time taken by FlashText to find terms in comparison to Regex.

https://thepracticaldev.s3.amazonaws.com/i/xruf50n6z1r37ti8rd89.png

Time taken by FlashText to replace terms in comparison to Regex.

https://thepracticaldev.s3.amazonaws.com/i/k44ghwp8o712dm58debj.png

Link to code for benchmarking the Find Feature and Replace Feature.

The idea for this library came from the following StackOverflow question.

Citation

The original paper on the FlashText algorithm:

@ARTICLE{2017arXiv171100046S,
   author = {{Singh}, V.},
    title = "{Replace or Retrieve Keywords In Documents at Scale}",
  journal = {ArXiv e-prints},
archivePrefix = "arXiv",
   eprint = {1711.00046},
 primaryClass = "cs.DS",
 keywords = {Computer Science - Data Structures and Algorithms},
     year = 2017,
    month = oct,
   adsurl = {http://adsabs.harvard.edu/abs/2017arXiv171100046S},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

The article published on Medium freeCodeCamp.

Contribute

License

The project is licensed under the MIT license.
