iotile / foss_finder

Finds Open Source dependencies for a GitHub Organization and/or Project

License: MIT License

Python 100.00%

foss python

foss_finder's Issues

Feature: Global User Defined Information

It should be possible to have a global configuration file for user-defined information, because some fields of .foss.json would be the same in every repository.

Of course, a local configuration should have a higher priority than a global configuration (if there is a conflict).
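A minimal sketch of that precedence rule, assuming both files have already been parsed into flat dictionaries. The function name merge_config and the sample keys are hypothetical, and a real .foss.json with nested lists would need a deeper merge than this:

```python
def merge_config(global_cfg, local_cfg):
    """Return a new dict in which local values win over global ones."""
    merged = dict(global_cfg)   # start from the global defaults
    merged.update(local_cfg)    # local keys overwrite conflicting global keys
    return merged


# Hypothetical sample data, for illustration only.
global_cfg = {"owner": "John Doe", "usage": "internal"}
local_cfg = {"usage": "external"}
merged = merge_config(global_cfg, local_cfg)
```

Neither input is modified, so the same global configuration can be reused for every repository.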

Which version do we need?

Which version of a given package do we need to add to the database / csv?

Currently, by running the script on iotile-mobile-ionic, I get the following output:

----> ionic-angular: =1.3.5
https://registry.npmjs.org/ionic-angular/latest/
----> {'registry': 'NPM', 'package': 'ionic-angular', 'license': 'MIT', 'version': '3.9.2', 'url': 'https://github.com/ionic-team/ionic#readme'}

So, the version in the output is 3.9.2 (latest) even though the version that is used in the repository must match 1.3.5 exactly.

Do we want the latest version of a package or the version that is actually used in the repo?
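If the answer turns out to be "the version actually used", one possible approach is sketched below. It relies on the NPM registry serving metadata for a specific version at <registry>/<package>/<version>; the function name npm_metadata_url and the decision to fall back to latest for ranges such as ^3.9.0 are assumptions, not current behavior of the script:

```python
import re

REGISTRY = "https://registry.npmjs.org"
# Matches exact pins such as "1.3.5", "=1.3.5" or "v1.3.5" (assumed formats).
EXACT_VERSION = re.compile(r"^=?\s*v?(\d+\.\d+\.\d+)$")


def npm_metadata_url(package, spec):
    """Build the metadata URL for an exact pin, or for 'latest' otherwise."""
    match = EXACT_VERSION.match(spec.strip())
    version = match.group(1) if match else "latest"
    return f"{REGISTRY}/{package}/{version}"


url = npm_metadata_url("ionic-angular", "=1.3.5")
```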

Open source script

We are approaching a good enough state for open source. This is a reminder to change the repository from private to public.

Improve the way foss_finder looks for Python requirements files

It currently looks for requirements.txt, base.txt, development.txt, docker.txt, production.txt. It should be improved so that it can look for more Python requirements files such as requirements-build.txt, requirements-test.txt, etc.
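One way to widen the search is to match file names against glob patterns instead of a fixed list. This is only a sketch: the pattern list and the helper name is_requirements_file are assumptions, and the real script may organize its lookup differently.

```python
import fnmatch

# The fixed names from the current behavior, plus a glob for variants such as
# requirements-build.txt or requirements-test.txt.
PATTERNS = (
    "requirements*.txt",
    "base.txt",
    "development.txt",
    "docker.txt",
    "production.txt",
)


def is_requirements_file(filename):
    """Return True if the file name looks like a Python requirements file."""
    return any(fnmatch.fnmatchcase(filename, pattern) for pattern in PATTERNS)
```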

Add optional -o/--outdir argument to control output file name

We do not want to generate all the output files under the current directory, especially since it is a version-controlled directory, so I recommend two changes:

By default, the CSV should be generated under out/<depo>.csv
Add a -o/--outdir argument to control the destination directory.

You can do both things with the new argument defaulting to out:

parser.add_argument('-o', '--outdir', dest='outdir', type=str, default='out', help='Output directory')
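A slightly fuller sketch of the same idea, showing how the default and the resulting CSV path fit together. The helper names build_parser and csv_path are hypothetical:

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(description="Find open source dependencies")
    parser.add_argument('-o', '--outdir', dest='outdir', type=str,
                        default='out', help='Output directory')
    return parser


def csv_path(outdir, depo):
    """Return the CSV destination for one depo, e.g. out/<depo>.csv."""
    return os.path.join(outdir, depo + '.csv')


# Passing an explicit list keeps this example independent of sys.argv.
args = build_parser().parse_args(['-o', 'reports'])
path = csv_path(args.outdir, 'archway')
```

The script would also need to create the directory before writing, for example with os.makedirs(args.outdir, exist_ok=True).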

Add option to process a specific depo

Very useful for us now (instead of hard-coding an if depo == archway), but it will also be useful to users.

Add an optional --project that will force the script to ONLY process that project within the Org:

foss_finder --project archway iotile
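A sketch of how that invocation could be parsed and applied. The positional argument name organization and the helper select_projects are assumptions made for illustration:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog='foss_finder')
    parser.add_argument('organization', help='GitHub organization to scan')
    parser.add_argument('--project', default=None,
                        help='Only process this project within the organization')
    return parser


def select_projects(all_projects, only=None):
    """Keep every project, or just the requested one when --project is given."""
    if only is None:
        return list(all_projects)
    return [name for name in all_projects if name == only]


args = build_parser().parse_args(['--project', 'archway', 'iotile'])
selected = select_projects(['archway', 'other-repo'], args.project)
```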

Optimize lookup for deep dependencies

The current recursive approach is naive, so the script can take some time to find all the deep dependencies. We should use memoization to optimize the recursive lookup.

See if lru_cache can be used.
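A minimal sketch of what that could look like with functools.lru_cache. The in-memory FAKE_REGISTRY stands in for the real HTTP lookup, and all names here are hypothetical. Note that lru_cache requires hashable arguments and is safest with immutable return values, hence the tuples:

```python
from functools import lru_cache

# Hypothetical stand-in for the registry: package -> direct dependencies.
FAKE_REGISTRY = {"a": ("b", "c"), "b": ("c",), "c": ()}
LOOKUPS = []  # records every uncached lookup, to make the effect visible


@lru_cache(maxsize=None)
def direct_dependencies(package):
    """Expensive lookup in the real script; cached after the first call."""
    LOOKUPS.append(package)
    return FAKE_REGISTRY.get(package, ())


def deep_dependencies(package, seen=None):
    """Collect all transitive dependencies of a package."""
    seen = set() if seen is None else seen
    for dep in direct_dependencies(package):
        if dep not in seen:
            seen.add(dep)
            deep_dependencies(dep, seen)
    return seen


first = deep_dependencies("a")
second = deep_dependencies("a")  # served entirely from the cache
```

The cache pays off most across projects, since popular packages are then looked up once per run rather than once per project.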

Need to fix deprecation warnings

DeprecationWarning: Call to deprecated method get_dir_contents. (
        Repository.get_dir_contents() is deprecated, use
        Repository.get_contents() instead.
        )
  files = repo.get_dir_contents(path)
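A possible replacement for that call, following the warning's own advice. In PyGithub, Repository.get_contents() returns a list for a directory and a single object for a file, so this sketch normalizes both cases; the wrapper name list_directory is hypothetical:

```python
def list_directory(repo, path):
    """List a directory with get_contents() instead of get_dir_contents()."""
    contents = repo.get_contents(path)
    # A file path yields a single object rather than a list; wrap it so that
    # callers can always iterate over the result.
    return contents if isinstance(contents, list) else [contents]
```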

Feature: checks/validators raising exceptions

It should be possible to add checks that raise exceptions if certain conditions are met.

For instance, we could have a check that raises an exception if a project has a GPL-licensed dependency. Or, we could have a check that raises an exception if a project has a multi-licensed dependency but no license was chosen.

It should be possible to choose which checks we want to use from the configuration file.
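One way this could be structured is a registry of named check functions, with the configuration file listing which names to enable. Everything below is a sketch: the check names, the FossCheckError class, the " OR " test for multi-licensed packages and the "multiuse-license" key are assumptions for illustration.

```python
class FossCheckError(Exception):
    """Raised when a dependency fails an enabled check."""


def check_no_gpl(dep):
    # Simplistic test: catches "GPL-2.0" but deliberately not "LGPL-3.0".
    if dep.get("license", "").upper().startswith("GPL"):
        raise FossCheckError(f"{dep['package']} is GPL-licensed")


def check_multi_license_chosen(dep):
    # Assumes multi-licensed packages are reported as "MIT OR Apache-2.0".
    if " OR " in dep.get("license", "") and "multiuse-license" not in dep:
        raise FossCheckError(f"{dep['package']} is multi-licensed, none chosen")


CHECKS = {
    "no-gpl": check_no_gpl,
    "multi-license-chosen": check_multi_license_chosen,
}


def run_checks(dependencies, enabled):
    """Run the enabled checks (names taken from the configuration file)."""
    for dep in dependencies:
        for name in enabled:
            CHECKS[name](dep)
```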

Feature: Python deep dependencies

The script currently looks for NPM deep dependencies but this is not the case for Python. However, it should be possible. The script currently uses the JSON endpoint of PyPi, which provides this information.

>>> import requests
>>> json = requests.get('https://pypi.org/pypi/js2py/json').json()
>>> json['info']['requires_dist']
['tzlocal (>=1.2)', 'six (>=1.10)', 'pyjsparser (>=2.5.1)']

So, it should be possible to provide a depth argument (would make sense if it was the same one as the NPM one) and look for dependencies as deep as this argument suggests.
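Before recursing, the entries of requires_dist have to be reduced to bare package names. A rough sketch of that step follows; the regular expression and the choice to skip dependencies that are only needed for optional extras are assumptions, and a dedicated requirement parser would be more robust:

```python
import re

# Leading package name, up to the first space, bracket or version operator.
NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(requires_dist):
    """Extract package names from a PyPI 'requires_dist' list."""
    names = []
    for entry in requires_dist or []:  # the field is null when there are none
        marker = entry.split(";", 1)[1] if ";" in entry else ""
        if "extra ==" in marker:
            continue  # only needed for an optional extra (assumed policy)
        match = NAME.match(entry)
        if match:
            names.append(match.group(1))
    return names


names = requirement_names(
    ['tzlocal (>=1.2)', 'six (>=1.10)', 'pyjsparser (>=2.5.1)']
)
```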

Handle scoped NPM packages

For the moment, the script doesn't support scoped NPM packages (e.g. @angular/common) since it doesn't use the right URL to get the information. It should be improved so that these packages are also saved in the .csv output.
For instance, the correct URL for @angular/common is https://registry.npmjs.org/@angular%2Fcommon
Unfortunately, adding /latest/ (or %2Flatest) at the end of the URL doesn't seem to work.
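A sketch of one possible workaround. The URL can be built by percent-encoding the slash but not the @, and because the full package document contains a "dist-tags" map and a "versions" map, the latest version can be read from it without a /latest suffix. The function names are hypothetical:

```python
from urllib.parse import quote

REGISTRY = "https://registry.npmjs.org/"


def npm_package_url(name):
    """Build the registry URL, encoding the '/' of scoped package names."""
    return REGISTRY + quote(name, safe="@")


def latest_version_info(document):
    """Pick the latest version's metadata out of the full package document."""
    latest = document["dist-tags"]["latest"]
    return document["versions"][latest]


url = npm_package_url("@angular/common")
```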

Overwrite CSV files when they already exist

The current behavior of the script is: if <some_repo_name>.csv already exists, then only add the rows that are different from the others. This is wrong, because if package Foo changed from version 2 to version 3, then the CSV will have one row for each version, although Foo version 2 is no longer used.

The expected behavior would be to simply delete the old CSV file and write a new one with the current dependencies.
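Opening the file in "w" mode gives this behavior without an explicit delete step, since it truncates any existing file. A minimal sketch, assuming the column names seen in the script's output; the function name write_project_csv and the sample rows are hypothetical:

```python
import csv
import os
import tempfile

FIELDS = ["registry", "package", "license", "version", "url"]


def write_project_csv(path, rows):
    """Write a fresh CSV, discarding whatever the file contained before."""
    with open(path, "w", newline="") as handle:  # "w" truncates the old file
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def row(version):
    # Hypothetical sample row for a package named Foo.
    return {"registry": "NPM", "package": "Foo", "license": "MIT",
            "version": version, "url": "https://example.com/foo"}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "archway.csv")
    write_project_csv(path, [row("2")])
    write_project_csv(path, [row("3")])  # Foo was upgraded from 2 to 3
    with open(path, newline="") as handle:
        remaining = list(csv.DictReader(handle))
```

After the second call, only the row for version 3 is left in the file.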

Handle deep NPM dependencies

I think there should be a way to handle deep NPM dependencies (just like fossa does), for instance with a new optional parameter --depth=2.
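A sketch of a depth-limited traversal that such a parameter could drive. The lookup argument is a hypothetical callable returning the direct dependency names of a package; in the real script it would query the registry:

```python
def collect_dependencies(package, lookup, depth):
    """Collect dependencies of a package, at most `depth` levels deep."""
    found = set()
    frontier = [package]
    for _ in range(depth):
        next_frontier = []
        for name in frontier:
            for dep in lookup(name):
                if dep not in found and dep != package:
                    found.add(dep)
                    next_frontier.append(dep)
        frontier = next_frontier  # descend one level
    return found


# Hypothetical dependency graph, for illustration only.
graph = {"app": ["a"], "a": ["b"], "b": ["c"], "c": []}
two_levels = collect_dependencies("app", lambda name: graph.get(name, []), 2)
```

With depth=1 this returns only the direct dependencies, which would match the behavior without the new parameter.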

[User-defined info] Differentiate two packages with 'registry' field

If there are two packages with the same name (from different registries, for instance NPM and PyPi), there is currently no way to differentiate them in the user-defined info file. We should add a way to add the 'registry' field, so that we can have both packages in the .foss.json.

For instance if we add:

{
  "additional-info": [
    {
      "package": "Foo",
      "registry": "PyPi",
      "extra information": "This is a Python package",
      "owner": "John Doe"
    }
  ]
}

It won't affect a package named 'Foo' in the NPM registry (which is the current behavior).
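The matching rule could be sketched as follows. The function name entry_matches is hypothetical, and the case-insensitive comparison and the "no registry field means any registry" fallback are assumptions:

```python
def entry_matches(entry, dependency):
    """Decide whether a user-defined entry applies to a found dependency."""
    if entry["package"] != dependency["package"]:
        return False
    registry = entry.get("registry")
    # Without a 'registry' field the entry applies to any registry.
    return registry is None or registry.lower() == dependency["registry"].lower()


entry = {"package": "Foo", "registry": "PyPi", "owner": "John Doe"}
pypi_foo = {"package": "Foo", "registry": "PyPi"}
npm_foo = {"package": "Foo", "registry": "NPM"}
```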

Consider releasing to PyPi and add command-line client

We may want to release this via pip and have it come with a convenient foo_finder_cli such that people can just do

pip install foss_finder
foo_finder_cli -h

Cleanup and add readme

As soon as the PR is merged, let's make sure the code does not have any Arch-specific code.

Let's also add a basic readme with basic instructions on how to use the script.

Add support for user defined overwrites

When tracking FOSS, we need to:

1.- Indicate which license we are using when the project is multi-license
2.- Be able to explain exceptions.
3.- Be able to indicate that a given project is only used internally.

Foss finder should look for a .foss file in the project being analyzed and, if found, parse it and use it to overwrite or append information related to the open source dependencies.

I am open to suggestions on which file format to use, but JSON is probably a good option (so the file should probably be called .foss.json). The file content could then be something like:

{
    "overwrites": [   
       {
          "project": "foo",
          "version": "2.1",
          "multiuse-license": "MIT"   
       },
       {
          "project": "bar",
          "version": "2.1",
          "license": "GPL-2.0",
          "linkage": "Dynamic, we use it as an imported python module",
          "justification": "This is a 3rd party python package that is dynamically linked against in one of our python products.  We do not modify the 'chardet' source code and since we dynamically link against it, it does not infect the rest of our product."   
       },
       {
          "project": "foobar",
          "version": "2.1",
          "usage": "internal"
       }
       ...
    ]
}

In the first example, we are simply specifying that for foo, which has multiple licenses, we are using "MIT".

For the second, we are documenting a justification for why it is OK to use a GPL license.

For the third, I am just documenting that the project is only used internally.

If "foo" or "bar" were not found, then foss-finder should report an error.
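A rough sketch of how the overwrites could be applied, including the error for unknown projects. It matches on the project name only and copies every other field, including "version", onto the dependency, which is a simplification; the function and exception names are hypothetical:

```python
class UnknownProjectError(Exception):
    """Raised when an overwrite refers to a dependency that was not found."""


def apply_overwrites(dependencies, overwrites):
    """Return copies of the dependencies with user-defined fields applied."""
    by_name = {dep["package"]: dict(dep) for dep in dependencies}
    for entry in overwrites:
        name = entry["project"]
        if name not in by_name:
            raise UnknownProjectError(f"'{name}' was not found by the scan")
        extra = {key: value for key, value in entry.items() if key != "project"}
        by_name[name].update(extra)
    return list(by_name.values())


found = [{"package": "foobar", "license": "MIT", "version": "2.1"}]
result = apply_overwrites(found, [{"project": "foobar", "usage": "internal"}])
```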

But we could also use this to add licenses that were not found by foss-finder but that we know we are using. For that, we could also add an additional "additions" section:

{
    "overwrites": [],
    "additions": [
          {
               "project": "foobar",
               .... <whatever fields are needed>
          }
    ]
}

Print out summary at the end

At the end of the script, we should print (logger.info) a set of messages as summary of the work done:

Total number of GitHub projects processed: ???
Total number of open source projects found: ???

and anything else you may think is important and is easy to keep track of.

I will let you do this any way you want, but one way is to add a new FossTracker Python class that can be used to track all these stats globally (instead of a bunch of global variables and global functions). You will need to refactor some code, but it may help you organize the top-level script. You want the class to be able to do:

tracker = FossTracker()
tracker.add_project('archway')
tracker.add_foss('django', ...)
tracker.write_project_csv()
tracker.report_project_summary('archway')
tracker.report_total_summary()

or something like that
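A minimal sketch of such a class, limited to the counting and reporting parts. The method signatures are guesses based on the calls listed above (for instance, add_foss attaches the package to the most recently added project), and write_project_csv is left out:

```python
import logging

logger = logging.getLogger(__name__)


class FossTracker:
    """Tracks projects and the open source packages found in each."""

    def __init__(self):
        self.projects = {}   # project name -> list of package dicts
        self.current = None  # project that add_foss() currently targets

    def add_project(self, name):
        self.projects.setdefault(name, [])
        self.current = name

    def add_foss(self, package, **info):
        if self.current is None:
            raise ValueError("add_project() must be called before add_foss()")
        self.projects[self.current].append(dict(info, package=package))

    def report_project_summary(self, name):
        count = len(self.projects[name])
        logger.info("%s: %d open source packages found", name, count)
        return count

    def report_total_summary(self):
        total = sum(len(found) for found in self.projects.values())
        logger.info("Total number of GitHub projects processed: %d",
                    len(self.projects))
        logger.info("Total number of open source projects found: %d", total)
        return len(self.projects), total


tracker = FossTracker()
tracker.add_project('archway')
tracker.add_foss('django', license='BSD-3-Clause')
```

Returning the counts in addition to logging them is a design choice that makes the class easier to test.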

Scanning a local project

I'm interested in using this tool, but our code is hosted on GitLab rather than GitHub.

My thought is that if foss-finder can scan a local project, it will work independently of where the code is hosted.

Nifty tool!
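A sketch of what scanning a local checkout could look like: walk the directory tree and collect the manifest files that the script would otherwise fetch from GitHub. The set of manifest names, the skipped directories and the function name find_manifests are assumptions:

```python
import os

MANIFESTS = {"package.json", "requirements.txt"}
SKIP_DIRS = {".git", "node_modules"}


def find_manifests(root):
    """Return the paths of dependency manifests under a local directory."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [name for name in dirnames if name not in SKIP_DIRS]
        for filename in filenames:
            if filename in MANIFESTS:
                found.append(os.path.join(dirpath, filename))
    return sorted(found)
```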