licensor's Issues

Add additional non-official training examples

Many projects ship an abbreviated form of certain licenses (Apache 2.0 in particular). Add further examples, possibly from outside SPDX, for training and classification.

The goal is to classify licenses correctly, not to enforce a perfect match against the canonical license body.

Move spdx.License --> Jen method

Having spdx.License.JenValue() next to the struct is nice for locality, but it is ugly to expose in the public API. Consider moving this helper elsewhere; the spdx.License struct won't change very often, so keeping the two side by side matters little.

Improve speed of word set generation

Using regex to split on whitespace is slow. Write benchmarks for current implementation and then optimize. Possible optimizations:

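  • Key the set on raw bytes instead of string. Note that map[[]byte]struct{} is not valid Go, because slices are not comparable, so this needs a fixed-size array or a hash as the key.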
  • Build set concurrently with multiple indices scanning text.
  • (For SPDX data) Create index and persist to source at go generate time
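A starting point for the benchmarking step might look like the sketch below. It assumes the current implementation splits on a `\s+` pattern (the issue does not show the actual regex), and the names wordSetRegexp and wordSetFields are hypothetical. It compares that approach against strings.Fields using testing.Benchmark, which can run outside of `go test`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// whitespace is compiled once; the pattern is an assumption about the
// current implementation, not taken from the project.
var whitespace = regexp.MustCompile(`\s+`)

// wordSetRegexp builds the word set by splitting on a whitespace regex.
func wordSetRegexp(text string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range whitespace.Split(text, -1) {
		if w != "" { // Split yields "" for leading or trailing whitespace
			set[w] = struct{}{}
		}
	}
	return set
}

// wordSetFields builds the word set with strings.Fields, avoiding the
// regex engine. Fields splits on Unicode whitespace, while \s in Go's
// regexp is ASCII-only, so results can differ on non-ASCII spaces.
func wordSetFields(text string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(text) {
		set[w] = struct{}{}
	}
	return set
}

func main() {
	text := strings.Repeat("Permission is hereby granted, free of charge,\n\tto any person ", 200)
	impls := []struct {
		name string
		fn   func(string) map[string]struct{}
	}{
		{"regexp", wordSetRegexp},
		{"fields", wordSetFields},
	}
	for _, impl := range impls {
		fn := impl.fn
		res := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				fn(text)
			}
		})
		fmt.Printf("%-8s %s\n", impl.name, res)
	}
}
```

Measuring first keeps the later optimizations honest: if strings.Fields already removes most of the cost, the concurrent or pre-generated index variants may not be worth the complexity.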

Move generated spdx index to a submodule

Raw data currently pollutes the spdx package namespace, in particular with auto-generated package comments. Move it into an internal sub-package:

license
 `- spdx
     `- internal
         `- index
             |- mit.gen.go
             |- apache.gen.go
             `- bsd.gen.go

Implement a LICENSE file finder

Other libraries do this. Given the root directory of some project, look for files that look like LICENSE, LICENSE.txt, COPYING.txt, etc.
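A minimal sketch of such a finder, assuming only the top level of the project root is scanned and names are matched case-insensitively. The stem and extension lists, and the names looksLikeLicense and findLicenseFiles, are illustrative assumptions rather than anything defined by the project.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// licenseStems lists base names commonly used for license files.
// The list is an assumption for illustration, not an exhaustive registry.
var licenseStems = []string{"LICENSE", "LICENCE", "COPYING", "UNLICENSE"}

// licenseExts lists extensions accepted after a stem ("" means none).
var licenseExts = []string{"", ".TXT", ".MD"}

// looksLikeLicense reports whether a file name matches stem+ext,
// ignoring case.
func looksLikeLicense(name string) bool {
	upper := strings.ToUpper(name)
	for _, stem := range licenseStems {
		for _, ext := range licenseExts {
			if upper == stem+ext {
				return true
			}
		}
	}
	return false
}

// findLicenseFiles returns the paths of likely license files located
// directly inside root. Subdirectories are not searched.
func findLicenseFiles(root string) ([]string, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, e := range entries {
		if !e.IsDir() && looksLikeLicense(e.Name()) {
			found = append(found, filepath.Join(root, e.Name()))
		}
	}
	return found, nil
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	files, err := findLicenseFiles(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Exact name matching keeps false positives low. A looser rule, such as a prefix match that would also accept LICENSE-MIT, is a possible extension.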

Split generated spdx index into multiple files

The current index file is large enough that GitHub refuses to display it (outside of raw mode). Split it into multiple files for clarity. This may also speed up incremental compilation. Benchmark the cost of the resulting increase in init() functions.

Implement a "good enough" matcher

For example, if you only consider matches with > 0.9 confidence "acceptable", you probably don't need to compare against additional licenses after finding the first one that meets the confidence bar.
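The early-exit idea could be sketched as follows. The scorer type and the firstAcceptable function are hypothetical stand-ins for whatever comparison the library actually performs, and the toy scorer in main exists only so the example runs.

```go
package main

import "fmt"

// scorer returns a confidence in [0, 1] that text is the given license.
// It is a hypothetical stand-in for the real comparison function.
type scorer func(text, license string) float64

// firstAcceptable walks the candidate licenses in order and stops at the
// first one whose confidence exceeds threshold. It reports false when no
// candidate clears the bar.
func firstAcceptable(text string, licenses []string, score scorer, threshold float64) (string, float64, bool) {
	for _, license := range licenses {
		if conf := score(text, license); conf > threshold {
			return license, conf, true // skip the remaining candidates
		}
	}
	return "", 0, false
}

func main() {
	// Toy scorer: pretends the text is clearly MIT.
	toy := func(_, license string) float64 {
		if license == "MIT" {
			return 0.97
		}
		return 0.30
	}
	candidates := []string{"Apache-2.0", "MIT", "BSD-3-Clause"}
	name, conf, ok := firstAcceptable("license text here", candidates, toy, 0.9)
	fmt.Println(name, conf, ok)
}
```

The first acceptable match is not necessarily the best one, so this trades accuracy for speed. Ordering candidates so that the most common licenses are compared first should make the shortcut pay off more often.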
