scarfer

Source code scan report file reporter

Introduction

Scarfer outputs compliance-related information from a scan report.

A scan report contains a lot of information about each file; Scancode, for example, has 37 top-level entries per file. Opening such a report in an editor to extract the information you want is often cumbersome. Scarfer provides quick command-line access to scan reports.
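To give a feel for what extracting information by hand involves, here is a minimal Python sketch that lists the file paths in a report. It assumes the Scancode layout of a top-level "files" list whose entries carry "path" and "type" keys. The helper name list_file_paths and the sample data are made up for illustration; this is not scarfer's own code.

```python
import json


def list_file_paths(report: dict) -> list[str]:
    """Return the paths of all regular files in a Scancode-style report."""
    return [
        entry["path"]
        for entry in report.get("files", [])
        if entry.get("type") == "file"  # skip directory entries
    ]


# Hypothetical, heavily trimmed report; a real one has many more keys per file.
sample = json.loads("""
{"files": [
  {"path": "cairo", "type": "directory"},
  {"path": "cairo/src/cairo.c", "type": "file"},
  {"path": "cairo/COPYING", "type": "file"}
]}
""")

for path in list_file_paths(sample):
    print(path)
```

For a real report, the dictionary would come from json.load() on the opened report file instead of the inline sample.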

Features

Scarfer can output the following information per file:

  • copyright (using -c)

  • license (using -l)

  • text that caused the license detection (-m)

Scarfer can output the following summaries:

  • license summary (using -ls)

  • copyright summary (using -cs)
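Conceptually, a license summary is a count of files per license. The sketch below shows that idea only; the simplified "license" key, the sample entries and the function name license_summary are assumptions for illustration, and the actual output format of -ls may differ.

```python
from collections import Counter


def license_summary(files: list[dict]) -> Counter:
    """Count how many files carry each license expression."""
    # Files without a license are grouped under a placeholder label.
    return Counter(entry.get("license") or "(missing)" for entry in files)


# Hypothetical per-file data, reduced to path and license.
files = [
    {"path": "src/cairo.c", "license": "mpl-1.1 OR lgpl-2.1"},
    {"path": "src/cairo-drm.c", "license": "mpl-1.1 OR lgpl-2.1"},
    {"path": "util/helper.c", "license": "mit"},
    {"path": "README", "license": None},
]

for name, count in license_summary(files).most_common():
    print(f"{count:3d}  {name}")
```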

Filter

Scarfer can filter files:

  • include files with:

    • license name (-il) using Python's regular expressions

    • files (-if) using Python's regular expressions

    • files (-iff) by reading a file, containing file names, using Python's regular expressions

    • copyright (-ic) using Python's regular expressions

  • exclude files with:

    • license name (-el) using Python's regular expressions

    • files (-ef) using Python's regular expressions

    • files (-eff) by reading a file, containing file names, using Python's regular expressions

    • copyright (-ec) using Python's regular expressions

Note: if you use more than one filter, the filters are combined with a logical AND.
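The AND behaviour can be pictured as follows: a file is kept only if it satisfies every filter that is set. This is a sketch under assumptions, not scarfer's implementation. The function passes_filters, its parameter names and the simplified "license" key are hypothetical, and it assumes patterns are applied with re.search (a match anywhere in the string).

```python
import re


def passes_filters(entry, include_file=None, exclude_file=None,
                   include_license=None, exclude_license=None):
    """Return True only if the entry satisfies every filter that is set."""
    path = entry["path"]
    license_name = entry.get("license") or ""
    checks = [
        include_file is None or re.search(include_file, path) is not None,
        exclude_file is None or re.search(exclude_file, path) is None,
        include_license is None or re.search(include_license, license_name) is not None,
        exclude_license is None or re.search(exclude_license, license_name) is None,
    ]
    return all(checks)  # logical AND across all active filters


# Hypothetical per-file data.
files = [
    {"path": "src/drm/cairo-drm.c", "license": "mpl-1.1 OR lgpl-2.1"},
    {"path": "src/cairo.c", "license": "mpl-1.1 OR lgpl-2.1"},
    {"path": "src/drm/cairo-drm-gallium.c", "license": "mit"},
]

kept = [f["path"] for f in files
        if passes_filters(f, include_file="drm", include_license="mpl")]
print(kept)
```

Only the first entry matches both the path filter and the license filter, so it is the only one kept.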

Curate

Scarfer can curate (fix, amend) license identifications:

  • curate license (-cml) for all files with missing license

  • curate license (-cfl) for all files matching a Python regular expression
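The two kinds of curation can be sketched as below. Everything here is illustrative: the function curate_licenses, the (pattern, license) rule format and the choice to let a file rule take precedence over the missing-license default are assumptions, not scarfer's documented behaviour or option syntax.

```python
import re


def curate_licenses(files, missing_license=None, file_rules=()):
    """Return copies of the entries with curated license values."""
    curated = []
    for entry in files:
        entry = dict(entry)  # work on a copy, leave the input untouched
        for pattern, license_name in file_rules:
            if re.search(pattern, entry["path"]):
                entry["license"] = license_name  # file rule matched
                break
        else:
            # No file rule matched: fall back to the missing-license default.
            if not entry.get("license") and missing_license:
                entry["license"] = missing_license
        curated.append(entry)
    return curated


# Hypothetical per-file data.
files = [
    {"path": "src/monkey.c", "license": None},
    {"path": "src/dong.c", "license": "gpl-2.0"},
    {"path": "src/kong.c", "license": "mit"},
]

result = curate_licenses(files, missing_license="MIT",
                         file_rules=[(r"dong\.c$", "MPL-2.0")])
for entry in result:
    print(entry["path"], entry["license"])
```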

Configuration file

Scarfer can write and read configuration files:

  • output the current command line options as a configuration (-oc)

  • read configuration file (--config)

Example use

Output the file names (full path) of all the files in the Scancode report example-data/cairo-1.16.0-scan.json:

$ scarfer example-data/cairo-1.16.0-scan.json 

As above but output only files with path matching drm:

$ scarfer example-data/cairo-1.16.0-scan.json -if drm

Output the file names (full path) of all the files in the Scancode report example-data/cairo-1.16.0-scan.json with a license matching gpl-3:

$ scarfer example-data/cairo-1.16.0-scan.json -il gpl-3

Output the file names (full path) of all the files in the Scancode report example-data/cairo-1.16.0-scan.json with a license matching mpl and a path matching drm. The output also contains license and copyright information for each file:

$ scarfer example-data/cairo-1.16.0-scan.json -il mpl -if drm -c -l 

To include only files whose path contains a "/" followed later by "pdi" and ends with ".c":

$ scarfer example-data/cairo-1.16.0-scan.json -if "/.*pdi.*\.c$"

To exclude all files whose path contains a "/" followed later by "pdi" and ends with ".c":

$ scarfer example-data/cairo-1.16.0-scan.json -ef "/.*pdi.*\.c$"
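The pattern used in the two commands above can be tried out directly in Python. The paths below are invented for illustration, and the sketch assumes the pattern is applied with re.search, so that it may match anywhere in the path.

```python
import re

pattern = re.compile(r"/.*pdi.*\.c$")

# Hypothetical paths, not taken from the cairo report.
paths = [
    "cairo/src/pdi-surface.c",  # has "/", then "pdi", ends with ".c"
    "cairo/src/pdi-surface.h",  # wrong suffix
    "cairo/src/cairo.c",        # no "pdi"
    "pdi.c",                    # no "/" before "pdi"
]

included = [p for p in paths if pattern.search(p)]      # like -if
excluded = [p for p in paths if not pattern.search(p)]  # like -ef

print("included:", included)
print("remaining after exclude:", excluded)
```

Note that the "$" anchors the ".c" suffix at the end of the path, and the backslash keeps the dot from matching any character.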

Supported scan report formats

  • Scancode Toolkit, version 21 and upwards

  • Scancode Output Format version 1.0.0, 2.0.0

Hints on source code scanners

Scancode 32.0*

Assuming you want to scan a directory called cairo and store the output in cairo-scan.json:

scancode -clipe \
  --license-text   --license-text-diagnostics        \
  --classify       --license-clarity-score --summary \
  -n $(nproc)                                        \
  --json-pp cairo-scan.json cairo

scarfer's People

Contributors

hesa

scarfer's Issues

Usual excludes

Feature request

Add support for:

  • files containing list of files and folders that can be excluded when calculating the license (e.g. README, LICENSE, docs/)
  • add option to use the file to exclude, e.g. --common-excludes, -ce

Is your feature request related to a problem? Please describe.

When determining the license for a project, one usually adds exclude expressions such as -ef COPYING LICENSE README tests. It would be nice to provide a list of such common expressions to make this easier.

Make reuse compliant

Description

Code lacks (c) and license. Add this and use reuse (tool) to verify correctness.

Steps for reproduction

reuse lint

Expected behavior

exit code 0

Actual behavior

Environment

all

Additional logs

Add version option

Feature request

Add --version (-V) to output version information

Is your feature request related to a problem? Please describe.

Useful for obvious reasons

Describe a tool that might help here

Example data that can be used for tests

List filter and curation rule with: output fixes

Feature request

When listing filtered and curated files using --output-fixes, it would be nice to see which expression matched each file.

Example data that can be used for tests

Something like:

# File filters

## README* COPYING*

./README.md
./COPYING


## tests/

./tests/donkey.c
./tests/kong.c

# License curations

## Missing license

./src/monkey.c

## MPL-2.0

./src/dong.c



Add scan instructions for Scancode

Feature request

To get the most out of scancode (and scarfer) some command line options are needed. Specify these in some kind of doc.

Add filter to include files matching a (c) expression

Feature request

Add filter to include/exclude files matching a (c) expression.

This would be similar to:

  • -if src/runtime - that includes only files with name matching src/runtime
  • -il mit - that includes only files with license matching mit

Proposed syntax:

  • -ic EXPRESSION, --include-copyright EXPRESSION
  • -ec EXPRESSION, --exclude-copyright EXPRESSION

exclude file from config output

Description

Do not output the variable "file" in the configuration output, since it interferes with the command line option when a config file is reused for a new version of a piece of software.
