
Introduction

ClearlyDefined, defined.

This repo holds the docs, artwork, and other organizational content in support of ClearlyDefined.

Contributing

This project welcomes contributions and suggestions, and we've documented the details in how to get involved.

The Code of Conduct for this project details how the community interacts in an inclusive and respectful manner. Please keep it in mind as you engage here.

Website

This website is built using Docusaurus, a modern static website generator.

Installation

$ yarn

Local Development

$ yarn start

This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

Build

$ yarn build

This command generates static content into the build directory and can be served using any static content hosting service.

Deployment

Using SSH:

$ USE_SSH=true yarn deploy

Not using SSH:

$ GIT_USER=<Your GitHub username> yarn deploy

If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push to the gh-pages branch.

Contributors

adrian-sufaru, dabutvin, daniellandau, dependabot[bot], disulliv, fredrick-tam, geneh, grvillic, iamwillbar, ignacionr, jamiemagee, jeffmcaffer, jeffmendoza, jeffwilcox, jpeddicord, jsoref, larainema, michaeltsenglz, mkeating, moranthomas, mpcen, nellshamrell, pnibakuze-ms, qtomlinson, sebastianwolf-sap, snyk-bot, storrisi, teju-manchenella, thomashintz, tmarble


Issues

Separate license info in curation and summarization

Create a distinct place in the curation/summary for "licensed" info as separate from "described" data. The described data is info about the project itself (CLA? code of conduct?) and file groupings (e.g., tests, build tools, ...) that can be used for filtering.

Add schema validate to curations

Validate schema

  • of proposed curation when a PR is opened/modified (e.g., the status check)
  • of the resultant summary after applying the proposal (also in the status check)
  • in the web ui (separate issue for that)
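
As a rough illustration, validation in the status check could be as simple as compiling a curation schema with Ajv and reporting any errors. The schema fields below are placeholders, not the project's actual curation schema.

const Ajv = require('ajv')

// Illustrative stand-in for the real curation schema
const curationSchema = {
  type: 'object',
  properties: {
    described: { type: 'object' },
    licensed: {
      type: 'object',
      properties: { declared: { type: 'string' } }
    }
  },
  additionalProperties: false
}

const ajv = new Ajv({ allErrors: true })
const validateCuration = ajv.compile(curationSchema)

function checkCuration(curation) {
  if (validateCuration(curation)) return { valid: true }
  // Surface Ajv's error list so the status check (or the web UI) can display it
  return { valid: false, errors: validateCuration.errors }
}

console.log(checkCuration({ licensed: { declared: 'MIT' } }))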

write aggregator

The aggregator takes summarized information from multiple sources and aggregates it according to a precedence model. For now the precedence can be really simple; this is more about the plumbing.
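
A minimal sketch of that simple precedence model, assuming a fixed tool order and a flat field-level merge (both placeholders for the real model):

// Aggregate summarized results from multiple tools using a fixed precedence order
const precedence = ['scancode', 'clearlydefined', 'curation'] // lowest to highest priority

function aggregate(summaries) {
  // summaries: { toolName: summaryObject }
  return precedence.reduce((result, tool) => {
    const summary = summaries[tool]
    if (!summary) return result
    return { ...result, ...summary } // later (higher precedence) entries overwrite earlier ones
  }, {})
}

const aggregated = aggregate({
  scancode: { license: 'MIT', copyright: 'Copyright Foo' },
  curation: { license: 'Apache-2.0' }
})
console.log(aggregated) // { license: 'Apache-2.0', copyright: 'Copyright Foo' }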

Build updated architectural diagrams

The architectural design of ClearlyDefined has progressed significantly from when we were first thinking about this project. We want to make sure we have updated architectural diagrams to present to potential adopters, code contributors, and for including in presentations.

@jeffmcaffer @iamwillbar

Rename everything

We need to rename everything in the code and get the code hygiene up now that we've moved along conceptually and in development.

Add links to definition that point to source, tools, ...

Based on the _metadata links coming out of the crawler, add some links to the component definition as it is being summarized and curated. Note, links likely should not be curated. Examples are:

  • source
  • tool results
  • curation

The "links" here are not necessarily clickable (though they might be rendered that way). Rather they are a way to convey relationships. So, for example, someone looking to see what tools have been run for a component (or indeed what tools' output went into a definition) should be able to inspect the links and see what's gone on.

Need to think a little about the format of the data. Likely need

  • name
  • type
  • data

as a minimum.
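
For illustration only, a links array following that name/type/data minimum might look like the following; the type values, coordinates, and URLs are assumptions, not a settled format.

// All values below are illustrative
const links = [
  { name: 'source', type: 'git', data: 'https://github.com/jquery/jquery/tree/3.3.1' },
  { name: 'scancode', type: 'harvest', data: 'npm/npmjs/-/jquery/3.3.1/scancode/2.9.2' },
  { name: 'curation', type: 'pull-request', data: 'https://github.com/clearlydefined/curated-data/pull/1' }
]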

implement "no auth" path for offline

The new GitHub auth story is slick but requires you to be online for much of the system to work. For example, you can't call the API or use the website effectively without being online. Even when online, a slow connection (e.g., on an airplane over the Atlantic) can make things unpredictable.

Propose that for localhost setups we allow "no auth" (perhaps as the default?). That would trickle down into the teams, etc., that we'll use for permissions. Those operations that truly need auth will necessarily have to be online, so we should be good.

Add origins/maven endpoint

To support the user queuing Maven artifacts in the UI, the service needs to expose two origins/maven endpoints, one that aids in discovering components (group/artifact) and another that serves up the versions of a given group/artifact.

Depending on the Maven API, this may be a lot like the GitHub one where the org (i.e., group) search is served up by one GitHub API and the repo (i.e., artifact) search is served up by another.
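
A rough sketch of what the two endpoints could look like backed by Maven Central's search API; the route shapes, query parameters, and response handling are assumptions for illustration.

const express = require('express')
const app = express()

const SEARCH = 'https://search.maven.org/solrsearch/select'

// Suggest group/artifact pairs matching a partial name
app.get('/origins/maven/:name', async (request, response) => {
  const url = `${SEARCH}?q=${encodeURIComponent(request.params.name)}&rows=20&wt=json`
  const result = await (await fetch(url)).json()
  response.send(result.response.docs.map(doc => `${doc.g}/${doc.a}`))
})

// List the known versions of a given group/artifact
app.get('/origins/maven/:group/:artifact/revisions', async (request, response) => {
  const query = encodeURIComponent(`g:"${request.params.group}" AND a:"${request.params.artifact}"`)
  const url = `${SEARCH}?q=${query}&core=gav&rows=100&wt=json`
  const result = await (await fetch(url)).json()
  response.send(result.response.docs.map(doc => doc.v))
})

app.listen(3000)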

set up harvesting execution

Set up the OSS Review Toolkit to

  • run on some repo(s)
  • take the output and call the service API to store as harvested data
  • Kick off a PR?

Add webhook in service to track crawler changes

As the crawler produces output it will trigger webhooks (see clearlydefined/crawler#44). On the service we need a webhook implementation that takes the POST'd event and "does the right thing".

In the first instance the action will be to recompute the cached result for the newly scanned component. So, the webhook will say "wrote /npm/npmjs/foo/1.0.0/scancode" or some such. The webhook will call ComponentService.computeAndStore for the triggering NPM (in this case). That will recompute the fundamental result and store it for future retrieval.

Note: When this is done, we need to remove the call to computeAndStore in the get method of the ComponentService
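
A minimal sketch of that webhook, assuming the event body carries the written path (e.g., { path: 'npm/npmjs/-/foo/1.0.0/scancode' }) and that a ComponentService with computeAndStore is available; both shapes are assumptions.

const express = require('express')
const app = express()

// Stub standing in for the real ComponentService
const componentService = {
  computeAndStore: async coordinates => console.log('recomputing definition for', coordinates)
}

app.post('/webhook/crawler', express.json(), async (request, response) => {
  try {
    // Drop the trailing tool segment (e.g. '/scancode') to get the component coordinates
    const coordinates = request.body.path.split('/').slice(0, 5).join('/')
    await componentService.computeAndStore(coordinates)
    response.status(200).end()
  } catch (error) {
    response.status(500).send(error.message)
  }
})

app.listen(3000)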

sort out source `path` story

Some source locations need a path within a repo. We talked in theory about how this would be described, but that has fallen by the wayside in the implementation and in the pathing/spec work floating around.

Need to figure out the right data model but this should be driven from real scenarios.

story for curating arrays

Need a way to express changes to arrays of values like copyright holders. In particular,

  • addition
  • removal
  • update (if not expressed as a remove plus an add)
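
One possible shape for such edits, sketched against copyright holders; the operation names and application logic are illustrative, not a decided format.

// Apply a list of array-edit operations to an array of values
function applyArrayCuration(values, operations) {
  let result = [...values]
  for (const op of operations) {
    if (op.add) result.push(op.add)
    if (op.remove) result = result.filter(value => value !== op.remove)
    if (op.replace) result = result.map(value => (value === op.replace.from ? op.replace.to : value))
  }
  return result
}

const holders = applyArrayCuration(
  ['Copyright Foo Inc', 'Copyright Bar'],
  [
    { remove: 'Copyright Bar' },
    { replace: { from: 'Copyright Foo Inc', to: 'Copyright Foo, Inc.' } },
    { add: 'Copyright Baz' }
  ]
)
console.log(holders) // ['Copyright Foo, Inc.', 'Copyright Baz']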

Tests for the service

define the testing approach for different elements of the service

  • REST apis
  • plain function code
  • configuration
  • providers

Write status check to replace build

We need a richer experience for curators than we can get with a CI build log etc. Write a simple status check in the service that validates the PR and leaves a link that takes the user to a place on clearlydefined.io where they can see content.

We can incrementally make that content better.

Consider getting "source" data when getting definition data

Right now summarization/curation happens for the precise entity that has been requested, e.g., just the npm package or just the GitHub repo. In the case of a package where we know (and have processed) the source, we should allow for the source's data to also be included in the package summary.

This is one of the value points of ClearlyDefined. We build up a network of information about entities and can aggregate that info to be more informative.

In implementation terms this means altering the definitions route execution to

  • consider a param that indicates the mode (or some such) as to whether or not to include the source data
  • optionally overlay the package and source data in some order TBD

This may also benefit from allowing the definition to qualify its source location info with a set of filters (dimensions) that would then be applied to the summarization of the source. This way, the package can express what parts of the source it includes. This is akin to the path element of the sourceLocation information.
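
A rough sketch of the overlay idea, assuming a hypothetical expand=source option, a getDefinition lookup, and a sourceLocation with type/provider/namespace/name/revision; the merge order (package wins) is also just an assumption.

// Stub data standing in for the real definition store
const store = {
  'npm/npmjs/-/foo/1.0.0': {
    described: { sourceLocation: { type: 'git', provider: 'github', namespace: 'bar', name: 'foo', revision: 'abc123' } }
  },
  'git/github/bar/foo/abc123': { licensed: { declared: 'MIT' } }
}
const getDefinition = async coordinates => store[coordinates] || {}

async function getExpandedDefinition(coordinates, { expand } = {}) {
  const definition = await getDefinition(coordinates)
  if (expand !== 'source' || !definition.described || !definition.described.sourceLocation) return definition
  const source = definition.described.sourceLocation
  const sourceCoordinates = [source.type, source.provider, source.namespace || '-', source.name, source.revision].join('/')
  const sourceDefinition = await getDefinition(sourceCoordinates)
  // Overlay: package-level data takes precedence, source data fills in the rest
  return { ...sourceDefinition, ...definition }
}

getExpandedDefinition('npm/npmjs/-/foo/1.0.0', { expand: 'source' }).then(result => console.log(result))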

Summarize API should be able to produce valid SPDX files

Input:

  • Optional param in the summarize API request to request SPDX format.

Output:

  • Valid SPDX file with minimal data points (source URL, license, copyright) plus the minimal set of required fields to be a valid SPDX file.

Side-notes:

  • Thomas is working on minimizing the set of required fields
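
Purely as an illustration of scale, a minimal SPDX 2.x output might carry little more than the fields below; the exact required-field set is what's being minimized per the side note, and all values here are made up.

// Illustrative minimal SPDX document for a single package
const spdxDocument = {
  spdxVersion: 'SPDX-2.1',
  dataLicense: 'CC0-1.0',
  SPDXID: 'SPDXRef-DOCUMENT',
  name: 'jquery-3.3.1',
  documentNamespace: 'https://clearlydefined.io/spdx/npm/npmjs/-/jquery/3.3.1', // assumed namespace scheme
  creationInfo: { created: '2018-03-01T00:00:00Z', creators: ['Tool: ClearlyDefined'] },
  packages: [
    {
      SPDXID: 'SPDXRef-Package',
      name: 'jquery',
      versionInfo: '3.3.1',
      downloadLocation: 'https://registry.npmjs.org/jquery/-/jquery-3.3.1.tgz',
      licenseConcluded: 'MIT',
      licenseDeclared: 'MIT',
      copyrightText: 'Copyright JS Foundation and other contributors'
    }
  ]
}

console.log(JSON.stringify(spdxDocument, null, 2))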

Write a ClearlyDefined cache for review toolkit

When the review toolkit downloader runs it consults a cache to see if the desired component has already been scanned. Right now that works against Artifactory. Need one that looks at the ClearlyDefined GitHub harvested data repo

Permissions for who can queue a harvest request

Like the curator model, there should be protection on who can queue a harvest request. This can be managed by membership in a GitHub team or some such and require GH auth when the request is made

Create Press Release mockup

As a means of scoping the work for the MVP, write a fake press release that announces the project, extols its virtues, and has a call to action to both consume and contribute.

implement harvest PUT and GET

Need to move away from GitHub as the harvest store. There are too many limitations and it is not really needed as long as we have a way of browsing and exposing the harvested data to the community

API key and rate limiting design

Need something simple.

Consider CloudFlare as we are already using their infrastructure to front for *clearlydefined.io. @jpeddicord suggests using one of the Express rate-limiter middlewares; that might be easier to manage, at least in the first instance.
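
A minimal sketch of that middleware route, using the express-rate-limit package and keying on an API key header; the header name and limits are assumptions.

const express = require('express')
const rateLimit = require('express-rate-limit')

const app = express()

app.use(
  rateLimit({
    windowMs: 15 * 60 * 1000, // 15-minute window
    max: 100, // requests per window per key (or per IP when no key is sent)
    keyGenerator: request => request.get('x-api-key') || request.ip
  })
)

app.listen(3000)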

ClearlyDefined badges for projects

  • Similar to Shields.io - have an endpoint where the URL contains your project details and that serves up a project image/SVG you can put in your markdown.
  • e.g. clearlydefined.io/badge/npm…
  • The badge works out if you’ve met the criteria and serves up the appropriate color badge (need to sort out what the criteria are for the 3 levels)
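
A sketch of how that endpoint might work by redirecting to a Shields.io static badge; the route, the score lookup, and the three-level thresholds are all assumptions.

const express = require('express')
const app = express()

// Placeholder for the real definition score computation
const getScore = async coordinates => 75

app.get('/badge/:type/:provider/:namespace/:name/:revision', async (request, response) => {
  const { type, provider, namespace, name, revision } = request.params
  const score = await getScore([type, provider, namespace, name, revision].join('/'))
  // Illustrative mapping of the three levels to colors
  const color = score >= 80 ? 'brightgreen' : score >= 50 ? 'yellow' : 'red'
  response.redirect(`https://img.shields.io/badge/ClearlyDefined-${score}%25-${color}.svg`)
})

app.listen(3000)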

API for github/npm search

To support the UI we need APIs roughly equivalent to (though not as fully featured as) the search APIs present on github.com and npms.io. We can turn around and call those if needed.

Here the user is looking for a package to queue. We want them to input good data, so we need auto-completion etc. This is true for GitHub orgs, repos, and commits, and npm scopes, names, and versions. Of course, there will be more package managers in the future.

We should also consider (though not necessarily implement right now) that the response will include ClearlyDefined info such as "yup, we've already run these tools on that version" that would further help guide the user.

Summarized data structure

Define (and implement) the Summarized data format. Consider building on the structure that Philippe talked about yesterday in MineCode.

Add batching GET operations

For various user and automation scenarios we need endpoints that can take a list of components and give back the related data (e.g., harvest results, component definitions, ...)
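
A minimal sketch of one such endpoint: a POST that takes an array of coordinates and returns a map of definitions; the route and the getDefinition stub are assumptions.

const express = require('express')
const app = express()

const getDefinition = async coordinates => ({ coordinates }) // stub for the real lookup

app.post('/definitions', express.json(), async (request, response) => {
  const coordinatesList = request.body // e.g. ["npm/npmjs/-/jquery/3.3.1", ...]
  const entries = await Promise.all(
    coordinatesList.map(async coordinates => [coordinates, await getDefinition(coordinates)])
  )
  response.send(Object.fromEntries(entries))
})

app.listen(3000)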

Clean up copyrights in summarized data

The data output from ScanCode, particularly around copyrights, is pretty noisy: random characters, duplicates that differ only by case or punctuation, Inc. vs Inc vs Incorporated, ...

This degrades the apparent quality of the data for the end user, so cleaning it up is key to showing value and building trust.

We can implement this in several different places. Ideally the upstream tools would just be less noisy but ... Current thinking is to put this in the summarizers. There is a summarizer for each tool, so each would do the work as it goes from the raw data form to the summarized/canonical form. This might be a shared bit of code that does all the simplifications, but rather than introduce a new step in the flow, consider adding it to summarization.
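
A sketch of what such a summarizer pass could do; the specific normalization rules (strip stray characters, unify the Inc variants, de-duplicate case-insensitively) are illustrative.

// Normalize noisy copyright statements and drop near-duplicates
function normalizeCopyrights(statements) {
  const seen = new Map()
  for (const raw of statements) {
    const cleaned = raw
      .replace(/[^\w\s,.()-]/g, ' ') // drop random characters
      .replace(/\b(incorporated|inc\.?)(\s|$)/gi, 'Inc.$2') // Inc vs Inc. vs Incorporated
      .replace(/\s+/g, ' ')
      .trim()
    const key = cleaned.toLowerCase() // collapse duplicates that differ only by case
    if (!seen.has(key)) seen.set(key, cleaned)
  }
  return [...seen.values()]
}

console.log(normalizeCopyrights(['Copyright Foo Inc', 'copyright foo inc.', 'Copyright Foo Incorporated ***']))
// ['Copyright Foo Inc.']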

Add the ability to filter files from summarization and have that list curatable

On looking at some initial data for jquery, it became apparent that it was very noisy due to things like .d.ts files being included and not very well understood by the scanner.

We can still have the harvesters harvest data about these files but when we summarize, we need to be able to filter these out.

Moreover, this needs to be curatable so that people can define the filter and see the outcome, then save that and have it apply to the final output.
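
A minimal sketch of applying a curatable glob filter before summarization, using minimatch; the patterns and file-entry shape are assumptions.

const minimatch = require('minimatch')

// Drop any file whose path matches one of the curated exclusion patterns
function filterFiles(files, excludePatterns) {
  return files.filter(file => !excludePatterns.some(pattern => minimatch(file.path, pattern)))
}

const kept = filterFiles(
  [{ path: 'src/jquery.js' }, { path: 'dist/types/jquery.d.ts' }, { path: 'test/unit/core.js' }],
  ['**/*.d.ts', 'test/**'] // e.g. drop TypeScript declaration files and tests from the summary
)
console.log(kept.map(file => file.path)) // ['src/jquery.js']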

Make merged metadata available to curators

Curators looking at a status check should be able to easily see the effect of the changes (e.g., merged data) as well as have handy pointers/links to available scans and the original source repo.

Authentication for write operations

API operations that make or propose modifications to data should require authentication. We need to decide what authentication makes sense.
