licensor's Introduction

Licensor

Detect what license a project is distributed under

Installing

You can fetch this library by running the following:

go get -u github.com/joshdk/licensor

Usage

package main

import (
	"fmt"

	"github.com/joshdk/licensor"
)

// Example content from https://github.com/golang/go/blob/master/LICENSE
const unknown = `
	Copyright (c) 2009 The Go Authors. All rights reserved.
	Redistribution and use in source and binary forms, with or without
	...
	(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
	OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
`

func main() {
	// Find the license that is the closest match.
	match := licensor.Best([]byte(unknown))

	fmt.Printf("License name:     %s\n", match.License.Name)
	fmt.Printf("SPDX identifier:  %s\n", match.License.Identifier)
	fmt.Printf("Match confidence: %.2f\n", match.Confidence)
	// License name:     BSD 3-clause "New" or "Revised" License
	// SPDX identifier:  BSD-3-Clause
	// Match confidence: 0.96
}

License

This library is distributed under the MIT License. See LICENSE.txt for more information.

licensor's Issues

Implement a LICENSE file finder

Other libraries do this. Given the root directory of some project, look for files that look like LICENSE, LICENSE.txt, COPYING.txt, etc.
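
One way such a finder could look is sketched below. It is only an illustration of the idea in this issue: FindLicenseFiles, looksLikeLicense, and the name and extension lists are hypothetical and are not part of licensor's API. The sketch only inspects the top level of the given directory and matches file names case-insensitively.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// candidateBases lists lowercase base names (without extension) that commonly
// hold license text. The list is illustrative, not exhaustive.
var candidateBases = map[string]bool{
	"license":   true,
	"licence":   true,
	"copying":   true,
	"unlicense": true,
}

// candidateExts lists the lowercase extensions accepted for license files.
// The empty string covers names without an extension, such as "LICENSE".
var candidateExts = map[string]bool{
	"":     true,
	".txt": true,
	".md":  true,
}

// looksLikeLicense reports whether a file name resembles a license file,
// ignoring case. Hypothetical helper, not part of licensor.
func looksLikeLicense(name string) bool {
	lower := strings.ToLower(name)
	ext := filepath.Ext(lower)
	base := strings.TrimSuffix(lower, ext)
	return candidateBases[base] && candidateExts[ext]
}

// FindLicenseFiles returns the paths of regular files directly inside root
// whose names look like license files. Hypothetical helper, not part of
// licensor; subdirectories are not searched.
func FindLicenseFiles(root string) ([]string, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, entry := range entries {
		if entry.Type().IsRegular() && looksLikeLicense(entry.Name()) {
			found = append(found, filepath.Join(root, entry.Name()))
		}
	}
	return found, nil
}

func main() {
	paths, err := FindLicenseFiles(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for _, path := range paths {
		fmt.Println(path)
	}
}
```

Each path found this way could then be read and passed to licensor.Best, as in the Usage section.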

Improve speed of word set generation

Using regex to split on whitespace is slow. Write benchmarks for current implementation and then optimize. Possible optimizations:

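  • Build the set from []byte tokens instead of strings to avoid allocations (note: Go does not accept slices as map keys, so map[[]byte]struct{} itself will not compile; a fixed-size hash or array key would be needed in place of map[string]struct{})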
  • Build set concurrently with multiple indices scanning text.
  • (For SPDX data) Create index and persist to source at go generate time
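
As a starting point for the benchmarks mentioned above, the sketch below compares a regex-based splitter with strings.Fields. Both functions are hypothetical stand-ins: wordSetRegexp is an assumed baseline and is not licensor's actual implementation. The two approaches disagree on what counts as whitespace (Go's `\s` is ASCII-only, while strings.Fields also splits on Unicode spaces), so results can differ on such input.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// whitespace matches runs of ASCII whitespace.
var whitespace = regexp.MustCompile(`\s+`)

// wordSetRegexp is an assumed regex-based baseline (not licensor's code).
// Split yields empty strings for leading or trailing whitespace, so those
// are skipped.
func wordSetRegexp(text string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, word := range whitespace.Split(text, -1) {
		if word != "" {
			set[word] = struct{}{}
		}
	}
	return set
}

// wordSetFields builds the word set with strings.Fields, which avoids the
// regex engine entirely.
func wordSetFields(text string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, word := range strings.Fields(text) {
		set[word] = struct{}{}
	}
	return set
}

func main() {
	sample := strings.Repeat("Redistribution and use in source and binary forms\n", 200)

	// testing.Benchmark runs a benchmark function outside of "go test".
	regexpResult := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			wordSetRegexp(sample)
		}
	})
	fieldsResult := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			wordSetFields(sample)
		}
	})

	fmt.Println("regexp:", regexpResult)
	fmt.Println("fields:", fieldsResult)
}
```

The same two functions can be moved into a _test.go file as ordinary BenchmarkXxx functions once a baseline is agreed on.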

Split generated spdx index into multiple files

The current index file is large enough that GitHub refuses to display it (except in raw mode). Split it into multiple files for clarity. This may also speed up incremental compilation. Benchmark the cost of the resulting increase in init() functions.

Add additional non-official training examples

Many projects use an abbreviated form of certain licenses (specifically Apache 2.0). Add additional (non-SPDX?) examples for training/classification.

The goal is to classify licenses correctly, not to insist on a word-perfect license body.

Move generated spdx index to a submodule

Raw data currently pollutes the SPDX namespace, specifically with auto-generated package comments. Move it into an internal sub-package.

license
 `- spdx
     `- internal
         `- index
             |- mit.gen.go
             |- apache.gen.go
             `- bsd.gen.go

Move spdx.License --> Jen method

It is nice to have spdx.License.JenValue() for locality reasons, but it's ugly to have in the public API. Consider moving this helper somewhere, as the spdx.License struct won't change very often.

Implement a "good enough" matcher

For example, if you only consider matches with > 0.9 confidence "acceptable", you probably don't need to compare against additional licenses after finding the first one that meets the confidence bar.
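
A minimal sketch of that early exit follows. Everything in it is hypothetical: Candidate, Match, setOf, similarity, and FirstAbove are illustrative names, and the Jaccard index over word sets is an assumed scoring function that may not be how licensor computes confidence.

```go
package main

import "fmt"

// Candidate is a hypothetical known license with a precomputed word set.
type Candidate struct {
	Identifier string
	Words      map[string]struct{}
}

// Match is a hypothetical result type for this sketch.
type Match struct {
	Identifier string
	Confidence float64
}

// setOf builds a word set from the given words.
func setOf(words ...string) map[string]struct{} {
	set := make(map[string]struct{}, len(words))
	for _, w := range words {
		set[w] = struct{}{}
	}
	return set
}

// similarity returns the Jaccard index of two word sets (an assumed metric).
func similarity(a, b map[string]struct{}) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	shared := 0
	for w := range a {
		if _, ok := b[w]; ok {
			shared++
		}
	}
	return float64(shared) / float64(len(a)+len(b)-shared)
}

// FirstAbove scans candidates in order and stops at the first one whose
// confidence exceeds threshold. If none does, it returns the best match
// seen and false.
func FirstAbove(unknown map[string]struct{}, candidates []Candidate, threshold float64) (Match, bool) {
	var best Match
	for _, c := range candidates {
		score := similarity(unknown, c.Words)
		if score > threshold {
			return Match{Identifier: c.Identifier, Confidence: score}, true
		}
		if score > best.Confidence {
			best = Match{Identifier: c.Identifier, Confidence: score}
		}
	}
	return best, false
}

func main() {
	unknown := setOf("permission", "is", "hereby", "granted")
	candidates := []Candidate{
		{Identifier: "first", Words: setOf("permission", "denied")},
		{Identifier: "second", Words: setOf("permission", "is", "hereby", "granted")},
		{Identifier: "third", Words: setOf("permission", "is", "hereby", "granted")},
	}

	match, ok := FirstAbove(unknown, candidates, 0.9)
	fmt.Println(match.Identifier, match.Confidence, ok)
}
```

The trade-off is that the result depends on candidate order: a later candidate with a higher score is never examined, so ordering candidates by how common each license is would matter.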
