Comments (13)
Avoiding the rebase would be good - It gets unwieldy especially if lots of additions come in at the same time. What would you suggest as a unique key?
from diodb.
I have two options at the moment:
- incrementing numerical id
- company slug inferred from company name or company's slug on Crunchbase (e.g. "https://www.crunchbase.com/organization/bugcrowd" becomes "bugcrowd")
The first option is good because it's simple to issue new ids but it's hard to find the program you need without grep.
The second one is more human-friendly but it's harder to issue new ids (and there may be collisions, when a company has more than one program).
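For the slug option, here is a minimal sketch of how the key could be derived. The function names `slug_from_crunchbase` and `slug_from_name` are hypothetical, and the normalisation rule for company names (lowercase, hyphens for everything else) is an assumption rather than something agreed in this thread.

```python
import re
from urllib.parse import urlparse


def slug_from_crunchbase(url: str) -> str:
    """Return the last path segment of a Crunchbase organization URL."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1].lower()


def slug_from_name(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into hyphens (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


print(slug_from_crunchbase("https://www.crunchbase.com/organization/bugcrowd"))  # bugcrowd
print(slug_from_name("Bugcrowd Inc."))  # bugcrowd-inc
```

Neither function resolves collisions; two programs from the same company still map to the same slug.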
from diodb.
We can use numerical ids but have a workflow merging them together in a single file on push to the main branch.
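A rough sketch of what the merge step could look like if the workflow ran a small script. The function name `merge_programs`, the one-JSON-object-per-file layout, and the assumption that the combined file is a JSON array sorted by "program_name" are all mine, not decided here.

```python
import json
from pathlib import Path


def merge_programs(src_dir: str, out_file: str) -> int:
    """Collect every *.json under src_dir into one JSON array and return the count."""
    programs = []
    for path in sorted(Path(src_dir).rglob("*.json")):
        programs.append(json.loads(path.read_text(encoding="utf-8")))
    # Sort by name so the generated file is stable between runs.
    programs.sort(key=lambda p: p.get("program_name", "").lower())
    Path(out_file).write_text(json.dumps(programs, indent=2) + "\n", encoding="utf-8")
    return len(programs)
```

The workflow would call something like `merge_programs("programs", "program-list.json")` on push to the main branch. The output file should live outside the source directory, otherwise the next run would read it back in as a program.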
from diodb.
That's a really good idea! Maybe something like bucket sorting by the company's name (e.g. "apple" goes in a, "bugcrowd" goes in b, etc.) might be a way to organise things. It also lets us fix up the structure to allow multiple domains, as we discussed previously. Also, the idea of a GitHub Action to collate that information is awesome too! Sounds like a good improvement to the data structure!
from diodb.
Hi @prodigysml Do you mean something like this?
tree programs
programs
├── a
│   ├── amara.json
│   └── apple.json
└── b
    └── bugcrowd.json
where each .json file is:
cat programs/a/apple.json
{
"program_name":"Apple",
"policy_url":"https://developer.apple.com/security-bounty/",
"launch_date":"",
"offers_bounty":"yes",
"offers_swag":false,
"hall_of_fame":"",
"safe_harbor":"none",
"public_disclosure":"",
"pgp_key":"",
"hiring":"",
"securitytxt_url":"",
"preferred_languages":"",
"policy_url_status":"alive",
"contact_email":"[email protected]"
}
I'm not sure how to deal with collisions in such a case. For example, both the Android and Chrome programs are hosted at https://bughunters.google.com, so both would get a google.json file. As a workaround we could name them google.1.json and google.2.json, for example.
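To make the naming concrete, a small sketch of picking a free filename inside the letter bucket. `next_program_path` is a hypothetical helper, and it is a variant of the workaround above: the first program keeps the plain google.json and only later ones get a numeric suffix.

```python
from pathlib import Path


def next_program_path(root: Path, slug: str) -> Path:
    """Return <root>/<first letter>/<slug>.json, or <slug>.N.json if that name is taken."""
    if not slug:
        raise ValueError("slug must not be empty")
    bucket = root / slug[0].lower()
    candidate = bucket / f"{slug}.json"
    n = 1
    # Keep counting until we find a name that does not exist yet.
    while candidate.exists():
        candidate = bucket / f"{slug}.{n}.json"
        n += 1
    return candidate
```

This only avoids filename clashes. It does not tell you whether the new file describes a program that is already listed.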
from diodb.
Yup, that's what I meant! Hmm, I honestly didn't think about the collisions. Your idea sounds good, but maintainers will need to make sure they merge carefully and don't duplicate data.
from diodb.
Maybe we can automatically deduplicate (via GitHub Actions) based on the data provided? For example, check that "policy_url" is unique or that some combination of fields is unique.
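Something along these lines could back such a check. This is only a sketch: `find_duplicates` is a hypothetical name, the sample records are made up for illustration, and which fields make up the key is exactly the open question.

```python
from collections import defaultdict


def find_duplicates(programs, key_fields=("policy_url",)):
    """Group program names by the chosen key fields; keep groups with more than one entry."""
    seen = defaultdict(list)
    for program in programs:
        key = tuple(program.get(field, "") for field in key_fields)
        seen[key].append(program.get("program_name", "?"))
    return {key: names for key, names in seen.items() if len(names) > 1}


# Illustrative records only, not real diodb data.
sample = [
    {"program_name": "Android", "policy_url": "https://bughunters.google.com"},
    {"program_name": "Chrome", "policy_url": "https://bughunters.google.com"},
    {"program_name": "Apple", "policy_url": "https://developer.apple.com/security-bounty/"},
]
print(find_duplicates(sample))
# {('https://bughunters.google.com',): ['Android', 'Chrome']}
```

A CI job could fail when the returned dict is non-empty. Note that a check on "policy_url" alone would also flag the Android and Chrome case, so a wider key such as `("policy_url", "program_name")` may be needed.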
from diodb.
Hi @prodigysml, hi @yesnet0! I've just added a proof of concept for the issue in #351.
from diodb.
@nikitastupin I like it so far. In the context of what diodb is trying to solve (catalog all known policy URLs along with their safe harbor status and optional attributes), policy_url itself could be considered the primary key. @prodigysml and @jmanoto, any thoughts on how we could better manage collisions here?
from diodb.
Resolving conflicts on JSON isn't difficult IMO. Do you mean doubles?
from diodb.
Hi @sickcodes
Resolving conflicts on JSON isn't difficult IMO ...
I agree that compared to more complex merge conflicts resolving this might be easy (though I personally don't know the easy way to do it). However, if we can avoid conflicts with little or no tradeoff - why not avoid them?
I'm aware of the following tradeoffs: (1) we have to change the repo structure (short-term), and (2) anyone who adds a program has to figure out the primary key (filename) for it (long-term). We can mitigate the second tradeoff by providing a script that helps generate a new program (kind of like how npm init helps generate package.json).
Also, storing each program in a separate file could help to avoid duplicate entries. For example, we currently have 272 https://g.co/vrp policy_urls (grep 'g.co/vrp' program-list.json | wc -l) that point to the same Google VRP program. This fact isn't obvious when all programs are stored in a single file.
... do you mean doubles?
Sorry, I didn't quite get the point. Could you elaborate on what you mean by "doubles"?
from diodb.
Doubles, meaning duplicate entries.
a58b6b2#diff-3209bee5852a8fc2dde56c367fffe517831fa18462004498955d355234899867R39-R40
Previously, pandas would load the raw JSON into a DataFrame and write out alphabetically sorted, de-duplicated JSON, which jq then pretty-printed.
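For readers who haven't seen that step, here is a rough standard-library stand-in for the same idea. It is not the original pandas code: `sort_and_dedupe` is a hypothetical name, and both the flat JSON array input and the sort key "program_name" are assumptions.

```python
import json


def sort_and_dedupe(raw_json: str) -> str:
    """Drop exact duplicate records, sort by program name, and pretty-print."""
    records = json.loads(raw_json)
    # Serialising with sorted keys makes identical records compare equal.
    unique = {json.dumps(record, sort_keys=True): record for record in records}
    ordered = sorted(unique.values(), key=lambda r: r.get("program_name", "").lower())
    return json.dumps(ordered, indent=2)
```

Only records that match on every field are collapsed here; entries that share a policy_url but differ elsewhere are kept.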
from diodb.
Another way to solve the rebase problem would be to use issue forms (https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/syntax-for-issue-forms) instead of PRs. A contributor opens an issue and fills in a form. A maintainer then checks the submission and, if everything is fine, runs a chat-ops command to merge the program. It's not clear how to handle changes or deletions in this case, though.
from diodb.