iotile / foss_finder
Finds Open Source dependencies for a GitHub Organization and/or Project
License: MIT License
It should be possible to have a global configuration file for user-defined information, because some fields of .foss.json would be the same in every repository.
Of course, a local configuration should have a higher priority than a global configuration (if there is a conflict).
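A minimal sketch of that priority rule, assuming both files are flat JSON objects. The helper names (`load_json`, `merge_config`) and the sample keys are hypothetical and not part of the current script:

```python
import json
from pathlib import Path


def load_json(path):
    """Return the parsed JSON file, or an empty dict if the file is absent."""
    path = Path(path)
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def merge_config(global_config, local_config):
    """Shallow merge: on conflict, the local value replaces the global one."""
    merged = dict(global_config)
    merged.update(local_config)
    return merged


# Hypothetical contents standing in for the global and the local file.
global_config = {"owner": "John Doe", "usage": "internal"}
local_config = {"usage": "external"}
print(merge_config(global_config, local_config))
```

A shallow merge is the simplest option; nested sections would need a per-key merge instead.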
Which version of a given package do we need to add to the database / csv?
Currently, by running the script on iotile-mobile-ionic, I get the following output:
----> ionic-angular: =1.3.5
https://registry.npmjs.org/ionic-angular/latest/
----> {'registry': 'NPM', 'package': 'ionic-angular', 'license': 'MIT', 'version': '3.9.2', 'url': 'https://github.com/ionic-team/ionic#readme'}
So, the version in the output is 3.9.2 (latest) even though the version that is used in the repository must match 1.3.5 exactly.
Do we want the latest version of a package or the version that is actually used in the repo?
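If the answer turns out to be "the version actually used", one possible approach is sketched below. It relies on the NPM registry serving a single version at `<registry>/<package>/<version>`; the function name and the exact-version regular expression are assumptions made for illustration, and range specifiers such as `^1.3.5` simply fall back to `latest` here:

```python
import re

NPM_REGISTRY = "https://registry.npmjs.org"

# Matches an exact version spec such as "1.3.5" or "=1.3.5" (assumed format).
EXACT_VERSION = re.compile(r"^=?\s*v?(\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?)$")


def npm_metadata_url(package, spec):
    """Build the registry URL for the pinned version, else for 'latest'."""
    match = EXACT_VERSION.match(spec.strip())
    version = match.group(1) if match else "latest"
    return f"{NPM_REGISTRY}/{package}/{version}"


print(npm_metadata_url("ionic-angular", "=1.3.5"))
```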
We are approaching a good enough state for open source. This is to remind me to change from private to public.
It currently looks for requirements.txt, base.txt, development.txt, docker.txt, and production.txt. It should be improved so that it can look for more Python requirements files such as requirements-build.txt, requirements-test.txt, etc.
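One way to widen the search is to match file names against patterns instead of a fixed list. This is only a sketch: the pattern tuple and the function name are hypothetical, and the wildcard is an assumption about which extra names should count:

```python
from fnmatch import fnmatch

# The five names the script already checks, plus a wildcard that covers
# requirements-build.txt, requirements-test.txt, and similar files.
REQUIREMENTS_PATTERNS = (
    "requirements.txt",
    "requirements-*.txt",
    "base.txt",
    "development.txt",
    "docker.txt",
    "production.txt",
)


def is_requirements_file(filename):
    """Return True if the file name matches any known requirements pattern."""
    return any(fnmatch(filename, pattern) for pattern in REQUIREMENTS_PATTERNS)


print(is_requirements_file("requirements-test.txt"))
```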
We do not want to generate all the output files under the current directory, especially as this is a version-controlled directory, so I recommend we change:
out/<depo>.csv
You can do both things with the new argument defaulting to out:
parser.add_argument('-o', '--outdir', dest='outdir', type=str, default='out', help='Output directory')
Very useful for us now (instead of hard-coding an if depo == archway), but it will also be useful to users.
Add an optional --project that will force the script to ONLY process that project within the Org:
foss_finder --project archway iotile
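A sketch of how the command line above could be parsed, assuming the organization is a positional argument (the name `org` and the `should_process` helper are made up for illustration):

```python
import argparse


def build_parser():
    """Build a parser with the organization, --outdir and --project."""
    parser = argparse.ArgumentParser(prog="foss_finder")
    parser.add_argument("org", help="GitHub organization to scan")
    parser.add_argument("-o", "--outdir", dest="outdir", type=str,
                        default="out", help="Output directory")
    parser.add_argument("--project", dest="project", type=str, default=None,
                        help="Only process this project within the organization")
    return parser


def should_process(repo_name, project):
    """Process every repository unless --project restricts the run to one."""
    return project is None or repo_name == project


# Same arguments as in the example invocation.
args = build_parser().parse_args(["--project", "archway", "iotile"])
print(args.org, args.project, args.outdir)
```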
The current recursive approach is naive, so the script can take some time to find all the deep dependencies. We should use memoization to optimize the recursive lookup.
See if lru_cache can be used.
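A rough illustration of the idea with `functools.lru_cache`. The in-memory `FAKE_REGISTRY` is a hypothetical stand-in for the real registry requests, and the function names are not the script's own:

```python
from functools import lru_cache

# Hypothetical stand-in for registry lookups; the real script would do HTTP.
FAKE_REGISTRY = {
    "app": ("left", "right"),
    "left": ("shared",),
    "right": ("shared",),
    "shared": (),
}
lookups = []  # records every uncached lookup, to show the effect of the cache


@lru_cache(maxsize=None)
def direct_dependencies(package):
    """Return the direct dependencies of a package (cached per package)."""
    lookups.append(package)
    return FAKE_REGISTRY.get(package, ())


def deep_dependencies(package, depth):
    """Collect dependencies up to `depth` levels below the package."""
    found = set()
    if depth <= 0:
        return found
    for dep in direct_dependencies(package):
        found.add(dep)
        found |= deep_dependencies(dep, depth - 1)
    return found


print(sorted(deep_dependencies("app", 3)))
print(lookups)
```

The cached function takes a hashable argument and returns an immutable tuple, so callers cannot accidentally modify a cached result. Only the expensive lookup is cached; the traversal itself stays uncached because its result depends on the depth.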
DeprecationWarning: Call to deprecated method get_dir_contents. (Repository.get_dir_contents() is deprecated, use Repository.get_contents() instead.)
files = repo.get_dir_contents(path)
It should be possible to add checks that raise exceptions if certain conditions are met.
For instance, we could have a check that raises an exception if a project has a GPL-licensed dependency. Or, we could have a check that raises an exception if a project has a multi-licensed dependency but no license was chosen.
It should be possible to choose which checks we want to use from the configuration file.
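As a sketch of what configurable checks might look like: the check names, the exception class, and the idea of listing enabled checks in the configuration are all assumptions. The multi-license check further assumes SPDX-style "A OR B" license strings and a "multiuse-license" field for the chosen license:

```python
class LicenseCheckError(Exception):
    """Raised when a dependency fails an enabled check."""


def check_no_gpl(dep):
    """Fail on any license string containing 'GPL' (this includes LGPL/AGPL)."""
    if "GPL" in (dep.get("license") or "").upper():
        raise LicenseCheckError(f"{dep['package']} is GPL-licensed")


def check_multi_license_chosen(dep):
    """Fail if a multi-licensed dependency has no license chosen."""
    if " OR " in (dep.get("license") or "") and not dep.get("multiuse-license"):
        raise LicenseCheckError(f"{dep['package']} needs a chosen license")


# Maps hypothetical configuration names to check functions.
CHECKS = {
    "no-gpl": check_no_gpl,
    "multi-license-chosen": check_multi_license_chosen,
}


def run_checks(dependencies, enabled):
    """Run only the checks named in `enabled` (e.g. read from the config)."""
    for dep in dependencies:
        for name in enabled:
            CHECKS[name](dep)


run_checks([{"package": "foo", "license": "MIT"}], ["no-gpl"])
```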
The script currently looks for deep NPM dependencies, but it does not do so for Python. It should be possible, though: the script already uses the JSON endpoint of PyPi, which provides this information.
>>> import requests
>>> json = requests.get('https://pypi.org/pypi/js2py/json').json()
>>> json['info']['requires_dist']
['tzlocal (>=1.2)', 'six (>=1.10)', 'pyjsparser (>=2.5.1)']
So, it should be possible to provide a depth argument (it would make sense if it was the same one as the NPM one) and look for dependencies as deep as this argument suggests.
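To recurse, the package names first have to be extracted from the `requires_dist` entries. A minimal sketch, assuming only the name at the start of each entry matters (version constraints and environment markers are ignored, and the function name is hypothetical):

```python
import re

# A distribution name at the start of the entry: letters, digits, '.', '_', '-'.
NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(requires_dist):
    """Extract package names from a PyPI 'requires_dist' list (may be None)."""
    names = []
    for entry in requires_dist or []:
        match = NAME.match(entry)
        if match:
            names.append(match.group(1))
    return names


print(requirement_names(['tzlocal (>=1.2)', 'six (>=1.10)', 'pyjsparser (>=2.5.1)']))
```

Each returned name could then be looked up through the same JSON endpoint until the requested depth is reached.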
For the moment, the script doesn't support scoped NPM packages (e.g. @angular/common) since it doesn't use the right URL to get the information. It should be improved so that these packages are also saved in the .csv output.
For instance, the correct URL for @angular/common is https://registry.npmjs.org/@angular%2Fcommon
Unfortunately, adding /latest/ (or %2Flatest) at the end of the URL doesn't seem to work.
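A possible workaround, sketched under the assumption that the script fetches the full package document from the URL above and reads the latest version from its `dist-tags` and `versions` fields. Both function names are hypothetical, and the sample document is invented:

```python
from urllib.parse import quote

NPM_REGISTRY = "https://registry.npmjs.org"


def npm_package_url(name):
    """Build the registry URL; '/' in a scoped name is encoded as %2F."""
    return f"{NPM_REGISTRY}/{quote(name, safe='@')}"


def latest_metadata(document):
    """Pick the metadata of the 'latest' version out of a full package document."""
    latest = document["dist-tags"]["latest"]
    return document["versions"][latest]


print(npm_package_url("@angular/common"))

# Invented document with the same shape as a registry response.
sample = {
    "dist-tags": {"latest": "2.0.0"},
    "versions": {
        "1.0.0": {"version": "1.0.0", "license": "MIT"},
        "2.0.0": {"version": "2.0.0", "license": "MIT"},
    },
}
print(latest_metadata(sample)["version"])
```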
The current behavior of the script is: if <some_repo_name>.csv already exists, then only add the rows that are different from the others. This is wrong, because if package Foo changed from version 2 to version 3, then the CSV will have one row for each version, although Foo version 2 is no longer used.
The expected behavior would be to simply delete the old CSV file and write a new one with the current dependencies.
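A minimal sketch of the expected behavior: opening the file in "w" mode truncates any previous content, so no explicit delete is needed. The column names are taken from the script output shown in another issue; the function name is hypothetical:

```python
import csv
import os

FIELDS = ["registry", "package", "license", "version", "url"]


def write_project_csv(path, rows):
    """Rewrite the CSV from scratch so that stale rows cannot survive."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # Mode "w" truncates an existing file instead of appending to it.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```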
I think there should be a way to handle deep NPM dependencies (just like fossa does), for instance with a new optional parameter --depth=2.
If there are two packages with the same name (from different registries, for instance NPM and PyPi), there is currently no way to differentiate them in the user-defined info file. We should add a way to add the 'registry' field, so that we can have both packages in the .foss.json.
For instance if we add:
{
"additional-info": [
{
"package": "Foo",
"registry": "PyPi",
"extra information": "This is a Python package",
"owner": "John Doe"
}
]
}
It won't affect a package named 'Foo' in the NPM registry (which is the current behavior).
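The matching rule could look roughly like the sketch below. The function names are hypothetical, and two behaviors are assumptions: registry names are compared case-insensitively, and an entry without a "registry" field keeps matching on the package name alone:

```python
def matches(entry, dep):
    """An entry applies if the name matches and the registry, if given, does too."""
    if entry["package"] != dep["package"]:
        return False
    registry = entry.get("registry")
    return registry is None or registry.lower() == dep["registry"].lower()


def apply_additional_info(dependencies, additional_info):
    """Return copies of the dependencies with matching extra fields merged in."""
    result = []
    for dep in dependencies:
        merged = dict(dep)
        for entry in additional_info:
            if matches(entry, dep):
                merged.update({key: value for key, value in entry.items()
                               if key not in ("package", "registry")})
        result.append(merged)
    return result


deps = [
    {"registry": "NPM", "package": "Foo"},
    {"registry": "PyPi", "package": "Foo"},
]
info = [{"package": "Foo", "registry": "PyPi", "owner": "John Doe"}]
print(apply_additional_info(deps, info))
```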
We may want to release this via pip and have it come with a convenient foo_finder_cli such that people can just do:
pip install foss_finder
foo_finder_cli -h
As soon as the PR is merged, let's make sure the code does not have any Arch-specific code.
Let's also add a basic readme with basic instructions for how to use the script.
When tracking FOSS, we need to:
1.- Indicate which license we are using when the project is multi-license.
2.- Be able to explain exceptions.
3.- Be able to indicate that a given project is only used internally.
Foss finder should look for a .foss file in the project being analyzed and, if found, parse it and use it to overwrite or append information related to the open source dependencies.
I am open to suggestions for what file format to use, but JSON is probably a good option (so the file should probably be called .foss.json). The file content could then be something like:
{
"overwrites": [
{
"project": "foo",
"version": "2.1",
"multiuse-license": "MIT"
},
{
"project": "bar",
"version": "2.1",
"license": "GPL-2.0",
"linkage": "Dynamic, we use it as an imported python module",
"justification": "This is a 3rd party python package that is dynamically linked against in one of our python products. We do not modify the 'chardet' source code and since we dynamically link against it, it does not infect the rest of our product."
},
{
"project": "foobar",
"version": "2.1",
"usage": "internal"
}
...
]
}
In the first example, we are simply specifying that for foo, which has multiple licenses, we are using "MIT".
For the second, we are documenting a justification for why it is ok to use a GPL license.
For the third, I am just documenting that the project is only used internally.
If "foo" or "bar" were not found, then foss-finder should report an error.
But we could also use this to add licenses that were not found by foss-finder but that we know we are using. For that, we could also add an additional "additions" section:
{
"overwrites": [],
"additions": [
{
"project": "foobar",
.... <whatever fields are needed>
}
]
}
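The overwrite and addition rules described above could be applied roughly as follows. This is only a sketch: the function and exception names are hypothetical, dependencies are assumed to be dicts keyed by "package", and entries are matched on the project name alone (the "version" field is not compared or merged):

```python
class FossConfigError(Exception):
    """Raised when an overwrite refers to a dependency that was not found."""


def apply_foss_config(dependencies, foss_config):
    """Merge 'overwrites' into found dependencies and append 'additions'."""
    by_name = {dep["package"]: dict(dep) for dep in dependencies}
    for entry in foss_config.get("overwrites", []):
        name = entry["project"]
        if name not in by_name:
            raise FossConfigError(f"overwrite for unknown project: {name}")
        by_name[name].update({key: value for key, value in entry.items()
                              if key not in ("project", "version")})
    for entry in foss_config.get("additions", []):
        added = {key: value for key, value in entry.items() if key != "project"}
        added["package"] = entry["project"]
        by_name[entry["project"]] = added
    return list(by_name.values())


found = [{"package": "foo", "license": "(MIT OR Apache-2.0)", "version": "2.1"}]
config = {
    "overwrites": [{"project": "foo", "version": "2.1", "multiuse-license": "MIT"}],
    "additions": [{"project": "foobar", "usage": "internal"}],
}
print(apply_foss_config(found, config))
```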
At the end of the script, we should print (logger.info) a set of messages as summary of the work done:
Total number of GitHub projects processed: ???
Total number of open source projects found: ???
and anything else you may think is important and is easy to keep track of.
I will let you do this any way you want, but one way is to add a new FossTracker Python class that can be used to track all these stats globally (instead of a bunch of global variables and global functions). You will need to refactor some, but it may help you organize the top-level script. You want the class to be able to do:
tracker = FossTracker()
tracker.add_project('archway')
tracker.add_foss('django', ...)
tracker.write_project_csv()
tracker.report_project_summary('archway')
tracker.report_total_summary()
or something like that
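One possible shape for such a class, as a sketch only. It assumes `add_foss` attaches the dependency to the most recently added project, counts open source projects as unique (registry, package) pairs, and leaves out `write_project_csv`; the extra `total_foss` helper is hypothetical:

```python
import logging

logger = logging.getLogger("foss_finder")


class FossTracker:
    """Collects per-project dependencies and reports summary statistics."""

    def __init__(self):
        self.projects = {}
        self.current = None

    def add_project(self, name):
        """Register a project and make it the target of later add_foss calls."""
        self.projects.setdefault(name, [])
        self.current = name

    def add_foss(self, package, **info):
        """Record a dependency of the current project."""
        if self.current is None:
            raise RuntimeError("call add_project() before add_foss()")
        self.projects[self.current].append(dict(info, package=package))

    def total_foss(self):
        """Count unique (registry, package) pairs across all projects."""
        return len({(dep.get("registry"), dep["package"])
                    for deps in self.projects.values() for dep in deps})

    def report_project_summary(self, name):
        logger.info("%s: %d open source dependencies",
                    name, len(self.projects[name]))

    def report_total_summary(self):
        logger.info("Total number of GitHub projects processed: %d",
                    len(self.projects))
        logger.info("Total number of open source projects found: %d",
                    self.total_foss())


logging.basicConfig(level=logging.INFO)
tracker = FossTracker()
tracker.add_project("archway")
tracker.add_foss("django", registry="PyPi", license="BSD")
tracker.report_project_summary("archway")
tracker.report_total_summary()
```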
I'm interested in using this tool, but our code is hosted on GitLab rather than GitHub.
My thought is that if foss-finder can scan a local project it'll work independent of where the code is hosted.
Nifty tool!