dpg-standard's Issues

Generation of Markdown docs from JSON schema

From the description in unicef/publicgoods-roadmap#17:

screening-schema.json in unicef/publicgoods-candidates currently defines the set of questions that operationalize the standard defined in DPGAlliance/DPG-Standard.

The standard is defined in standard.md, and operationalized in standard-questions.md, yet these are three different files that have to be manually updated to keep them in sync. This is highly inefficient and prone to errors.

The schema should be the source of truth, and from there generate any other related files where the same information is presented in a more human-readable way.

The proposed tasks are as follows:

  • Learn GitHub. Go to https://lab.github.com/, and start with the first course Introduction to GitHub.
  • At the link above, also take the Communicating using Markdown course, as your first task will involve Markdown
  • At the link above, also take the GitHub Actions: Hello World course, as we’ll use GitHub Actions to automate syncing across repos (we are already heavily using them)
  • Create an empty repository in your personal account: https://github.com/dcha7225/JSON-MD
  • Add a copy of the nominee-schema.json and the screening-schema.json
  • Using Javascript/NodeJS write a script that reads the two JSON files above, and outputs a Markdown document similar to standard.md. Make the necessary changes to your JSON files to accomplish this task.
  • Using Javascript/NodeJS write a script that reads the two JSON files above, and outputs a Markdown document similar to standard-questions.md. Make the necessary changes to your JSON files to accomplish this task.
  • Using Github Actions, automate both scripts above so that Markdown files are updated anytime the JSON files change

Once all of the tasks above have been completed and reviewed:

Question: re: Do No Harm

#9. "Do No Harm" has 4 sub-components to it. Does the top-level, i.e., "All projects must demonstrate that they have taken steps to ensure that the project anticipates, prevents and does no harm", include aspects not covered by the sub-components?

In other words, can "do no harm" include things like human rights abuses?

Indicator 9.c Protection from Harassment

I would avoid using the word “mechanism” here; see 9.b for an explanation. Again, it is pretty obvious what the intent behind this indicator is, but here we are crossing over into a UX functionality domain, like blocking or reporting a user on Facebook. Although blocking and reporting an abuser will prevent you from being contacted by said user, or even from seeing the person’s posts, it does not by any means prevent or stop the harassment.

Most social media platforms, or other applications and services for that matter, do not offer any functionality providing for the safety or security of underage users.

Indicator on Governance

I believe the standard needs a section on governance. It is entirely possible that open projects maintain a rather restrictive policy for decision-making, changes to the project, inclusion and exclusion of parties. Open Source projects may follow many different models in this area, including the "Benevolent Dictator for Life", who might be the founder of the project. I think that may be a problematic model, but given its prevalence, it might be difficult to exclude it. I think it is nevertheless important for the alliance to have that discussion.

Personally (and I should emphasize that I speak only for myself), I tend to think that a governance model that says something like

"All projects must be governed by a multi-lateral decision body whose proceedings are public"

is a good minimum requirement. In that case, at the very least the BDfL would be required to appoint others from different groups to guide the project.

Form Suggestion: SDG Question

For the SDG question, we propose changing the prompt from "description" to "explain how this solution is relevant".

Indicator 8. Adherence to Standards & Best Practices

I would like it to read:
“Projects must demonstrate adherence to standards, best practices, and/or principles. For example, the Principles for Digital Development. See list here for reference”

A very good indicator. I would probably prefer to include a list of other standards as well, such as SDLC, SSDLC, OWASP ASVS, and ISO/IEC 27k, some of which are further covered under my paragraph “Summary and a few thoughts”. Depending on the project, there might be other standards, best practices or principles that are more applicable or better suited.

It would also be advisable to develop a best practice regarding DPG for the future.

Indicator 3. Clear Ownership

In this indicator, the sentence “Ownership of everything the project produces must be clearly defined and documented” could benefit from some clarification. Is this indicator directed at who owns the project itself, the algorithms, the application, the design, the UX design, the program, system or service? Or are we really taking “everything” into consideration?

If so, does this apply to content or data produced or generated by users? One could argue that the project (through interaction with users) produces data. Who owns the data automatically collected through interaction with the user? Or the data a user would have to share in order to be able to use the app, software or system? Does this cover both PII and non-PII, such as name, age, address, geolocation, platform, make/model of phone/tablet/browser, IMEI number and so forth?

Multi selector for license in standard-questions.md

This issue concerns standard-questions.md and "DPGs must use an open license. Please identify which of these approved open licenses this project uses:"

The text above refers to "licenses" in the plural, and that is correct, as software, content, AI and data projects can all use more than one license. The issue is that the submission form only allows the user to choose one license.

Suggested change: Change the drop-down in the form to another field type that allows the user to choose more than one license.
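For illustration only (the property name `license` and the enum values are assumptions, not the actual contents of the submission form schema), in JSON Schema terms the change corresponds to turning a single-value string `enum` into an array of those values:

```javascript
// Sketch: single-select vs. multi-select license field in JSON Schema.
// Field name and enum values are hypothetical.
const approved = ["MIT", "GPL-3.0-only", "CC-BY-4.0", "ODbL-1.0"];

// Before: a dropdown bound to a string field accepts exactly one license.
const singleSelect = { type: "string", enum: approved };

// After: an array field lets the submitter pick one or more licenses.
const multiSelect = {
  type: "array",
  items: { type: "string", enum: approved },
  minItems: 1, // a DPG must still declare at least one open license
  uniqueItems: true, // no duplicate selections
};
```

In the form UI, this maps to a multi-select widget (e.g. checkboxes) rather than a dropdown.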

We might also want to change the first sentence in the form text "DPGs must use an open license" as this refers to only one license.

Document Propagation of Changes

Following on the small change proposed in #73, we should document how such changes need to be propagated across repositories to keep all sources in sync.

My current understanding is as follows:

  1. DPGAlliance/DPG-Standard is the source of truth, and information found in all other places needs to be synced with what this repository contains.
  2. Our public website contains https://digitalpublicgoods.net/standard/ which needs to be updated manually to match the contents of standard.md.
  3. Our public website also contains the submission guide which needs to be updated manually as needed.
  4. Where relevant, unicef/publicgoods-candidates needs to be updated through the following two files: nominee-schema.json and screening-schema.json, and any changes propagated to all encoded nominees and digital public goods.
  5. Relevant changes need to be propagated to the Submission Form, controlled through the lacabra/publicgoods-submission repo, and its corresponding schema.js
  6. Relevant changes need to be propagated to the Eligibility Form at unicef/publicgoods-scripts by editing quizQuestions.js

@nathanbaleeta, @SarahWat, @Not-Whiskey, @Lucyeoh: Am I forgetting anything? Please comment on this issue or confirm that it looks good to you, and this will be added to this repository documentation.

FYI: @amreenp7, @nathanfletcher

Indicator 4. Platform Independence

This indicator is too vague, in my opinion: “demonstrate” and “indicated” are both abstract words that can be filled with just about anything. There are plenty of examples of huge gaps between what could work in theory and what actually works in real life. This is true both for small, agile IT projects and for huge government-run projects spending millions on planning, preparation and execution. I would strongly urge that the word “indicate” be replaced by “prove”, or that the sentence be changed to “…list existing, fully functional, open alternatives”, to avoid good projects being shut down by poor planning.

📝 🧑‍⚖ Expand acceptable content and data licenses to Creative Commons IGO variants

Summary

Add the Creative Commons IGO license variants as acceptable licenses for the DPG Standard.

Background

Recently, I discovered variants of the Creative Commons 3.0 licenses designed for intergovernmental organizations (IGOs), such as international aid organizations. These licenses remain identical to their non-IGO counterparts with the exception of a clause that requires disagreements to be handled through mediation and arbitration. For this same reason, the 3.0 IGO licenses are not superseded by the 4.0 international variants. Thus, this seems like a blind spot for projects and creative works using these licenses that may meet all other DPG Standard requirements except this one.

It is also worth noting that UNICEF is currently exploring an open access policy, which would use one of the 3.0 IGO variants as a default license. This means all works created by UNICEF staff, contractors, interns, volunteers, etc. would be licensed under one of these IGO licenses unless another license is explicitly selected.

Details

The implementation is below:

More context can also be found in the Creative Commons wiki:

https://wiki.creativecommons.org/wiki/Intergovernmental_Organizations#Can_intergovernmental_organizations_.28.22IGOs.22.29_use_CC_licenses.3F
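The issue's own implementation snippet is not reproduced above. Purely as a hypothetical illustration, adding the 3.0 IGO variants (identified here by their SPDX identifiers) to an approved-content-license list might look like this; the list structure and existing entries are assumptions, not the repository's actual format:

```javascript
// Hypothetical approved-license list; the existing entries and the list
// format are assumptions for illustration only.
const approvedContentLicenses = ["CC-BY-4.0", "CC-BY-SA-4.0", "CC0-1.0"];

// Creative Commons 3.0 IGO variants, by SPDX identifier. These are not
// superseded by the 4.0 international licenses, so they are added
// alongside the existing entries rather than replacing anything.
const igoVariants = [
  "CC-BY-3.0-IGO",
  "CC-BY-SA-3.0-IGO",
  "CC-BY-NC-3.0-IGO",
  "CC-BY-NC-SA-3.0-IGO",
];

const updatedLicenses = [...approvedContentLicenses, ...igoVariants];
```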

Outcome

  • More comprehensive selection of acceptable licenses for the DPG Standard.
  • Future-proofing compatibility with an open access policy for UNICEF.

Indicator 5. Documentation

I feel this indicator lacks a crucial point: the availability/sharing of the documentation. This might feel obvious, but a clarification regarding who the information should be accessible to, and how to ensure documentation integrity, updates, reviews and so forth, would be of great value here. I would add: “The project must have shareable documentation of the source code, use cases, and/or functional requirements”. It would not hurt to clarify further, for instance by adding desired formats. Some of the information might be publicly available, while other parts might benefit from being shared among those who will actually work on the project in some capacity. I would also like to see some kind of “tool bible” here, i.e. a listing of pre-approved tools used in the course of the project, as part of the documentation.

Also, there is the matter of language. Should they make the documentation and/or their instructions available in their native tongue, or are there one or more languages that are mandatory when delivering this? Who is responsible for translations?

Identify cadence for DPG Standard updates

Summary

Identify a cadence or release schedule for new updates and changes made to the DPG Standard.

Background

In the April Standard Council meeting, @prajectory mentioned the ongoing work to codify the governance around the Standard and the process used to make changes and revisions. One possible way to streamline this is by identifying a cadence in which we periodically review feedback and proposals made to the DPG Standard, and if needed, release a new update. Ideally, as a Standard, formal updates are few and infrequent, but a regular cadence allows us to work through a backlog of feedback.

For context, there are many other FOSS projects this might be modeled on. The CHAOSS Project, which produces metrics about open source community health, releases twice a year through a formal review process. Fedora Linux, a DPG, makes a major release of its operating system approximately every six months. Perhaps twice yearly or quarterly could be a good cadence, depending on the backlog and capacity of the team.

Details

A few thoughts on how this work might be completed:

  1. Conduct desk review of all pending feedback, proposals, and amendments to the DPG Standard.
  2. Reach quorum on a release cadence for the Standard.
  3. Consider scoping out dedicated time for part of the release process, e.g. proposal phase, review phase, final vote phase, etc.
  4. Begin review of proposals for the next targeted release.

Outcome

  • Improved program management around the DPG Standard.
  • Suggestions and feedback on the DPG Standard are regularly reviewed and incorporated when necessary.

Documentation for new cadence

In order to stabilise the Standard governance and account for its evolution, the following steps are needed:

  • Migration to projects board
  • New vertical labeling (horizontal being the process)
  • New Kanban chart to handle the pipeline
  • Make changes to the governance document - council composition, voting mechanism, expert consultations, community discussion, work groups
  • Standard Operating Procedure

More specific language pertaining to data or file formats for content and data

I am exploring the implications of accepting or rejecting the proposed change to standard.md. This should trigger a two-week community input period that will end February 27.

I suggest that we include some more specific language around data and file formats to ensure that indicator 8 is more specifically addressing the need for open and documented file and data formats.

This change would be highly relevant for content and data being vetted against indicator 2. By being more specific, we avoid the situation where a content repository/project releases content under an approved open license but at the same time uses closed or undocumented file formats, making reuse difficult or impossible.

An example of language that could make indicator 8 more specific: "Open data and content must be made accessible in an open data or file format, including relevant metadata and documentation."

Define list of acceptable licenses for DPG type "standard"

The DPG standard is very clear in stating which licenses are approved for software, content and data here, namely:

Projects must demonstrate the use of an approved open license. For open source software, only OSI approved licenses are accepted. For open content the use of a Creative Commons license is required. While we encourage projects to use a license that allows for both derivatives and commercial reuse (CC-BY and CC-BY-SA), or dedicate content to the public domain (CC0); licenses that do not allow for commercial reuse (CC-BY-NC and CC-BY-NC-SA) are also accepted. For open data, an Open Data Commons approved license is required. See The full license list for reference.

Yet, there is no mention of what licenses are approved when the digital public good under consideration is a standard.

Proposed way forward:

  • Identify an authoritative licensing body that we can rely on for choosing acceptable licenses for standards
  • Amend the text of the standard to explicitly list acceptable licenses for the standard type, and update this list accordingly.

Cc: @nathanbaleeta

Indicator 9.a Data Privacy & Security

I would like to add some clarity here by introducing a few more parameters:

“Projects collecting, processing, storing or distributing data must identify and list the data they include (or: …identify and list all data included). Projects must also demonstrate how they ensure the privacy, integrity and security of this data, in addition to the steps taken to prevent adverse impacts resulting from its collection, processing, storage, and distribution.”

As I see it, there are three “flaws” in this paragraph. The first is “collecting data”: the indicator should be applicable to every project that collects, processes, stores or distributes data.

The second is the term “types of data”. “Data types” and “types of data” both have numerous meanings: quantitative or qualitative data, if we are addressing data at a high level; short-term, long-term or useless data; primitive, composite or even abstract data; nominal, ordinal, discrete or continuous data; and I could go on, but that is beside the point. I think the intent behind the term “types of data” was for projects to identify and list all data collected, processed, stored or distributed, so a rewrite would be in order.

The third is the lack of the word “integrity” when it comes to adverse impacts. To me, data integrity is as important as privacy and security, as you cannot have the latter two without integrity being guaranteed. Will a DPIA (Data Protection Impact Assessment), or equivalent tools for assessing impact, be mandatory through any of the other paragraphs?

Using standards already in place will ease the task of demonstrating or verifying compliance. There might also be a need for a discussion concerning data distribution vs. data sharing from a security point of view. Should there be a reference to the three “Application Security Verification Levels” in OWASP ASVS somewhere here? Being able to correctly identify the level is crucial in order to achieve the right amount of security measures.

Specification of requirements for AI/machine learning

There are some very interesting projects around open AI under development and I think it would be good to run a process to add some more specific requirements around AI. I have noted some points to frame a possible discussion and process:

  • What licenses should be required for AI algorithms, models and datasets? (This needs to be added to indicator 2.)
  • Should we specify that there could be different licenses on source code/algorithms and training data? One could argue that this is covered in the standard in its current form, in indicator 2, with its reference to open source and open data.
  • Would we require the original training data to be openly licensed for an AI-model to be accepted as a DPG?
  • Can a dataset that is created based on non-open AI/ML be licensed as open data?
  • Regarding indicator 9. Can “no harm” in one context be “harm” in another context when for example ML classification is based on the same training data, but used on a different dataset?

DPG reference catalog for standards & best practices

Based on recent discussions in the DPG secretariat I am now suggesting that we start the process of defining the DPG reference catalog for open standards.

This reference catalog would be an overview of standards that are recommended for Digital Public Goods. The standards can be technical or semantic.

I have created a PR (#55) with a first, very early draft of the reference catalog, just to get us started in the process of defining a complete list.

I would suggest we add this to our work on version 2 of the DPG Standard to ensure involvement for all relevant stakeholders.

Indicator 3 expanded for Open Data

Can we extend this indicator to think of ownership in a more democratic manner? Open Data projects cannot claim ownership of everything they produce: they may have ownership over the governance of the project, but as a digital public good they don't have ownership over the data itself. In this case, they are data stewards more than data owners.

Changes to consider:
Clear stewardship and not ownership

Discussion: implementations or generic products

Proposal:
Update the standard to explicitly focus on the generic products and not implementations. BUT collect and display implementation questions. Evaluate the core product. Update the language to focus on the product and how it was designed.

In favour:

  • Implementations are very difficult to evaluate and track, and doing so is beyond the scope of the product owner

Against:

  • Reviews are more valuable if they include how they're used

Notes:

  • Add implementation and use cases in the registry
  • Can you be a DPG with no implementation?
  • Very difficult to separate product from implementation, but we are narrowing in on the good or product

Clarify the term "mandatory dependencies"

Clarify in indicator 4 what we mean by "mandatory dependencies" in plain language, i.e. "the solution can't perform its core function without this component" or "the solution doesn't work without this component".

The solution can't run & function without this.

Define "mandatory dependency" in the submission form (hard dependency vs. soft dependency: once we identify a hard dependency, is there an open alternative? How do we fix this hard dependency?), and clarify what counts as a significant change to the core product.

How hard does it have to be before it becomes prohibitive for someone else to pick up?

Indicator 7. Adherence to Privacy and Applicable Laws

The term “to the best of its knowledge” has the potential to become a huge liability. Again, I understand why the term is used and the intentions behind it; however, this is a classic pitfall when it comes to accountability. What about projects in countries that have little or no legislation regarding privacy and data protection? What international laws would apply then? Should there not at least be a “minimum” requirement regarding privacy and data protection?

Anyone can argue that they thought they did the right thing. I would seriously consider enforcing a minimum set of rules for privacy and data protection, and here is why:

Even though the U.S. and EU/EEC countries have laws and legislation concerning data privacy and data protection, the U.S. and the EU have different approaches to handling this. The lack of international laws ensuring data privacy and data protection brought forth the need for a common framework, and the “EU-U.S. Privacy Shield Framework” became a reality. This framework offered users (both businesses and governments) a chance to comply with applicable laws on both sides of the pond. Even though the EU, the U.S. and Switzerland initially agreed upon this framework, it was later subject to scrutiny and is today invalid as a legal instrument for regulating data flows between countries.

However, the DPG Standard is within its full rights to invoke a “minimum” requirement regarding privacy and data protection in order for any entity to be eligible to partake in various programs, scopes or schemes.

Privacy Laws > Privacy and other applicable laws

Indicator 7 deals with adherence to laws. This indicator also shares some commonalities with 9.a, which likewise deals with digital solutions that collect non-PII data. There has been an overwhelming number of follow-up conversations between the DPGA review team and the products submitted to be vetted as DPGs concerning indicator 7. That led us to speak to legal experts at reputed private and multilateral organisations.

Here is a detailed case that enumerates the challenges and issues we needed to address through consultations.

The outcome of the consultation led us on the following path:
Indicator 7 - make it restricted to privacy

  • Don’t say both “relevant” and “applicable”
  • International and domestic laws shouldn’t be mentioned categorically: “applicable” means either domestic or international; we don’t know what supersedes what, so let the project owner decide that
  • Stick to privacy laws

We then put this up for community discussion and received feedback that there are laws beyond privacy that are applicable to digital solutions. Our internal team at the DPGA Secretariat put together research and collated the responses we have received from projects to questions related to indicator 7; it found mention of many laws beyond privacy.

Therefore, the current course of action would be to change the text to "privacy and other applicable laws".

9a and 7 overlap to be resolved

With the recent updates to the standard in #57, focusing indicator 7 strictly on privacy laws, there is now a redundancy/overlap in our indicators. Indicator 7 requires that the digital solution adhere to applicable privacy laws, and indicator 9a, a subsection of do no harm dealing with privacy, requires that the digital solution give clear and specific answers on how privacy is protected.
The question now is: how do we resolve this redundancy?

The tactical options ahead of us are:
i. to merge 7 & 9a;
ii. to drop one of the two indicators (most likely 9a); or
iii. to keep both but clarify the language of each to remove the redundancy.

The tactical choice we make depends on our strategic vision. As a community we need to answer the following related questions.

a) What are our minimum requirements?
Since only "applicable" privacy laws apply, some digital solutions may clear indicator 7 based on their context but could still do harm in a way that locally applicable laws do not protect against. They would therefore violate indicator 9. As the alliance, do we have a stand on inalienable privacy rights that every DPG must protect?
b) How do we translate the intent of these indicators into a vetting process? Is it operationally feasible?
For any digital solution marked as a DPG in our registry, there is an expectation that the project has been vetted to adhere to the standard. Often, in response to questions under indicator 9a, digital solutions reply by sharing their privacy policy. This is insufficient to determine whether the digital solutions are actually anticipating and preventing privacy harms. We may need the digital solution owners to spell out their system of checks and balances, especially in areas we consider our minimum requirements.
Based on our answers to the questions (a, b) we should choose between our options (i, ii, iii).

@nathanbaleeta is also putting together questions that we have received from users regarding these two indicators. That will help us with insights on b).

Cyber Security and the DPG Standard

In order for DPGs to have robust "security by design", the DPGA has commissioned expert dialogues and community inputs to implement cyber security tenets in the design stage of the DPG.

There are 3 parts to this:

  1. How to retrofit the standard to meet the basic principles of cyber security by design?
  2. How to think through risk mitigation through robust documentation and audits?
  3. Guiding principles and best practices for product owners to deploy in the designing process of DPGs

Are there any Creative Commons licenses we do NOT want to accept

Today, standard indicator 2 only requires "the use of a Creative Commons license".

The rest of the components are currently "encouraged" - While we encourage projects to use a license that allows for both derivatives and commercial reuse (CC-BY and CC-BY-SA), or dedicate content to the public domain (CC0); licenses that do not allow for commercial reuse (CC-BY-NC and CC-BY-NC-SA) are also accepted.

This discussion is to see if there are any CC licenses we feel would exclude a project from DPG status.

Proposed change to wording for questions for indicator 7

Change the question for 7 to:

  • Has this project taken steps to ensure adherence with relevant privacy, domestic, and international laws? (yes/no)
  • If yes, please list some of the relevant laws that the project complies with:
  • If yes, please describe the steps this project has taken to ensure adherence (include links to terms of service, privacy policy, or other relevant documentation):

Indicator 9. Do No Harm

Although this is noble, I find it hard to see how this can be demonstrated, verified or even enforced. I would like to see a clarification as to what “harm” means in this context.

Examples: most social media platforms are set up for people to interact, share thoughts and ideas, and have a good time in general. A lot of what we know from social media is directly transferable to other kinds of applications, systems or programs. We still know that people have been abused, bullied and harassed, and have even committed suicide, through interactions on social media platforms. That is harm.

We also know social media platforms today are used for selling drugs, illegal contraband, weapons, human trafficking and much more. That is harm, either indirectly or directly.

Another form of harm is that induced by someone gaining access to personal and/or private information, either by exploiting technical flaws or by otherwise unlawfully accessing data, and using it to target specific people or groups (religious and political beliefs, sexual orientation, financial matters, health and mental health records, to name a few). Several Sony employees fell victim to extortion through a third-party service handling the employees’ medical data. This ended up doing harm to both individuals and the company.

We also know that user-created content being publicly available has led to the prosecution of people who do not share the views of regimes, religious organizations and political adversaries. Where does the project’s responsibility end, and where does the user’s responsibility start? Without a clarification, this will appear more like a buzzword or an empty phrase. To me, this should either be explained further or incorporated into Indicators 9.a-9.c.

Indicator 6. Mechanism for Extracting Data

I would change this to:
“If the project has non-personally identifiable information (non-PII), there must be a possibility of extracting or importing non-PII data from the system in a non-proprietary format.” It would be advisable either to include a list of approved formats or to include examples of non-proprietary formats. I would avoid using the word “mechanism” here; see Indicator 9.b for an explanation.

[Proposal] Sharpening the "Goods" language in the Standard

Background

The DPGA secretariat has established an operationalized definition of Digital Public Goods that allows for the assessment of open software, data, content and AI models to determine whether they are digital public goods. As the stewards and hosts of this open standard, our goal is to identify projects that conform to the UNSG’s description of DPGs described in the roadmap.

Within this definition there is already a distinction between the “goods” (i.e. the open software, the data, the models, the content) and the “project”: the purposeful design and deployment of these goods in adherence with laws and best practices, and for the attainment of the SDGs.

Today, we frequently conflate goods & projects in the standard.

As a result, assessing and evaluating an individual good, a piece of data, content, or even a piece of software, against the DPG Standard today would be very difficult. Even when such goods are licensed in a way that makes them adaptable and adoptable, they are still often “neutral” and could be used for harm as easily as for advancing the SDGs.

It is only when they are designed or deployed with the intention of creating a specific effect that we are able to compare them against the DPG Standard as it is written today.

Assuming our goal is to have a consistent interpretation of what is a Digital Public Good, I see three options:

Option 1: Separate the Goods from the Project

In this scenario any digital, openly licensed software, standard, content, data would be considered a digital public good. As digital goods with appropriate licensing they would address non-rivalry and non-excludability.

We would then additionally try to identify and promote projects that use DPGs to help attain the SDGs.

Implications:

  • All open source projects would be considered DPGs.
  • We would need to shift our wording to support projects that leverage DPGs to advance the SDGs.
  • We would need a name for projects that leverage DPGs to advance the SDGs, i.e. “DPGs for SDGs”, more like “open source for social good”.
  • The brand “DPG” would carry less weight; the need for a term distinct from “open source” is not clear.

Option 2: Explicitly focus on Goods NOT Projects (PREFERRED)

In this scenario the DPGA explicitly focuses on goods: openly licensed software, data, AI models, content, and standards that have been designed or deployed in a way that advances the SDGs.

We would update the language in the standard to focus explicitly on goods (removing references to projects), being careful to acknowledge that while we can assess a tool (data set, content collection, AI model) based on how it was designed and how it is currently being deployed, a tool does not have active intent or control over its future the way a project does (it is just a tool).

Implications:

  • The standard and questions that currently reference “projects” would be rewritten to focus explicitly on goods, e.g. “What category best describes this good? Select all that apply: Open Source Software, Open Data Set, Open AI Model, Open Standard, Open Content Collection.”
  • Goods can be software, collections of content, and/or data sets, as long as they are able to meet the requirements of the standard.
  • Technically, a single piece of content or data could be a DPG if it could show relevance to the SDGs in its design/deployment (though this would likely be difficult).
  • We would break “projects” down into their components and review a DPG that is both a platform and a collection of content/data as the platform and the collection separately, i.e. MET Norway would contain the DPGs “MET Norway software platform” and “MET Norway data set”. However, the individual pieces of data in the set would not be DPGs (only the collection).
  • Similarly, we would screen GDL as a platform and a content collection. However, every individual piece of content would be a DPG only insofar as it is part of the collection.

Option 3: Focus on projects not goods.

We screen and accept DPG “projects”, where projects are defined as planned undertakings that use open software, content, data, and AI models to do no harm and to advance the SDGs.

In keeping with the standard today, DPG projects would require data/software/content/standards to have:

  • Planned intent (relevant to the SDGs, with a commitment to doing no harm)
  • Accountability (clear ownership and documentation) at the project level

Implications:

  • We risk brand confusion by highlighting projects but calling them “digital public goods”.
  • We would not consider individual pieces of content/data/software/standards to be DPGs outside of a project designed to advance the SDGs.
  • We would have to update the standard/questions to better reflect projects that encompass multiple individually licensed components.
  • We would need to update Indicator 1 to focus on designed intent.

DPG standard, general thoughts

There is no point in reinventing the wheel. There are a lot of good standards out there that can be utilized both as inspiration and as a broad, dynamic knowledge base for the various communities interested in developing projects within the DPG scope. Among them are the SDLC (Software Development Life Cycle) and SSDLC (Secure Software Development Life Cycle) standards. There is also the ISO/IEC 27k series of international information-security standards, which can be utilized either by obtaining the proper certifications or by being in compliance with one or more of the standards.

Although far from perfect, the intentions behind the GDPR are admirable. A lot of European countries have high internet penetration rates and are well ahead of most developing countries when it comes to computer literacy, technology development, and tech usage. Still, it appears that we, as citizens of the tech-savvy parts of the world, need laws and legislation protecting us from big corporations, data mining, and misuse of the data we generate. The need for privacy and protection is no less for citizens of developing countries; quite the contrary. Datatilsynet's guide “Software Development with Data Protection by Design and by Default” deals specifically with software development (as described in Article 25 of the GDPR) and can be a useful tool even for projects operating outside the EU/EEA.

Define "functional open alternatives" in indicator 4

Indicator 4 in the standard allows open source projects that have mandatory dependencies which create more restrictions than the original license, IF the project is able to demonstrate independence from the closed component(s) AND/OR indicate the existence of functional, open alternatives.

  1. Platform Independence | If the project has mandatory dependencies that create more restrictions than the original license, the project(s) must be able to demonstrate independence from the closed component(s) and/or indicate the existence of functional, open alternatives.

The mention of “open alternatives” in indicator 4 should refer to working solutions, in order to avoid misleading adopters into thinking that something is a DPG because it can in theory run on a piece of open source software when in practice it only runs on some other, proprietary software.

Below are suggestions (to be discussed) on how this indicator could be improved to more clearly articulate the requirement for projects that may have a closed component where an open alternative exists but has not been implemented.
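One common way a project demonstrates independence from a closed component is to isolate it behind an interface, so that a functional open alternative can be dropped in and verified to actually work. A minimal, hypothetical sketch (the provider classes are illustrative; only the OpenStreetMap tile URL pattern is a real public scheme):

```python
from abc import ABC, abstractmethod

class MapTileProvider(ABC):
    """Interface isolating the application from any one tile provider."""
    @abstractmethod
    def tile_url(self, z: int, x: int, y: int) -> str: ...

class OpenStreetMapProvider(MapTileProvider):
    """Functional open alternative (OpenStreetMap tile scheme)."""
    def tile_url(self, z, x, y):
        return f"https://tile.openstreetmap.org/{z}/{x}/{y}.png"

class ProprietaryProvider(MapTileProvider):
    """Stand-in for a closed dependency; swappable, not mandatory."""
    def tile_url(self, z, x, y):
        return f"https://maps.example.com/{z}/{x}/{y}.png"  # hypothetical

def render_map(provider: MapTileProvider) -> str:
    # The application depends only on the interface, so the closed
    # component is optional rather than a mandatory dependency.
    return provider.tile_url(3, 4, 2)

print(render_map(OpenStreetMapProvider()))
```

The point of the sketch is the assessment criterion it suggests: "functional" could mean the open alternative passes the same integration path the proprietary one does, not merely that it exists on paper.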

Update Summary Indicator Text for "Do No Harm"

Current Indicator # 9 = Do No Harm (encompasses 9a,9b,9c)

Proposal = Steps Taken to Mitigate & Avoid Harm

The reason for this is that many solutions have expressed discomfort with the concept of assuring that they “do no harm”: as open source solutions, they have no visibility into who deploys or implements the products, or how.

This revised wording for indicator 9 allows solutions to describe the steps taken, as part of the design and development of the solution, to mitigate and avoid harm. Note that this is related to but distinct from PR #70.

Role of Sustainability of a DPG

The list of indicators suggests a set of criteria against which a solution can be assessed before a person or organization decides to adopt that solution.

I feel that, in order to achieve the SDGs, the adoption of DPGs by ministries is crucial compared to adoption by individuals or small organizations. From my experience engaging with ministries, I believe that in addition to the current list of indicators, the sustainability of a DPG is a major factor determining the adoption of a solution. This was also the belief of the HISP network.

Therefore, one or more indicators related to sustainability (e.g. community, capacity-building, support, funding, adoption) should be included in the core indicators list. This would also help countries that plan to include this list of indicators in their procurement processes for information systems.

Planning of V.2 of the DPG standard

During this week's meeting (on Feb 18), we will start our planning towards version 2 of the DPG standard. The idea is to have an open discussion that leads to a version plan that includes:

  • Major potential changes to the standard
  • A planned release date
  • A plan with dates for community input to the process and proposed changes

7. Adherence to Privacy and Applicable Laws

This indicator regularly causes issues because it is not clear whether we are requesting compliance from the generic product or from implementations, and it is not clear which laws are “applicable”.

From conversation #66 we're proposing 3 options for discussion.

  1. Replace with: “The software must state that, to the best of its developers' knowledge, the product was developed in compliance with the relevant laws of the jurisdiction in which the generic product was developed.”
  2. Delete it: it ends up either redundant with Indicator 9a or too implementation-focused.
  3. Replace with: “The DPG must be developed in such a way as to facilitate compliance with relevant local laws in future implementations.”

Indicator 9.b Inappropriate & Illegal Content

I would like this to read:
“Projects that collect, process, store or distribute content must have enforceable policies identifying inappropriate and illegal content, such as child sexual abuse materials, in addition to mechanisms for detecting, moderating, reporting and removing inappropriate/illegal content.”

First of all, I added “process” because, although it might not be apparent from the indicator above, in data processing additional tasks such as detection can be added with ease. Secondly, I added “reporting”, as there should be a way for users to report undesired content. Community help with moderation is always preferable.

My worry here is the latter part of the indicator: “…materials in addition to mechanisms for detecting, moderating, and removing inappropriate/illegal content”. The intentions are good and there is an obvious need for having such functions in place. For most people working in computing, the term mechanism (or computing mechanism) describes something whose function is to generate output strings from input strings and, if applicable, internal states, in accordance with a general rule that applies to all relevant strings and depends on the input strings and, possibly, the internal states for its application. “Internal states” refers to hidden variables affecting the mechanism's input and output behavior. So “mechanisms” is by all means a usable term.

However, having mechanisms in place for “detecting, moderating, and removing inappropriate/illegal content” is a tall order for any software developer. Even giants such as Facebook struggle to have this in place as a fully functional service. So I wonder if we can find a more suitable term without compromising the intent of the sentence. Moderating words, phrases, or even sentences is a fairly easy task. Detecting and blocking images being shared is an entirely different ballgame and can easily make or break a project financially, as it is demanding on so many levels.
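To illustrate why text moderation is the comparatively tractable part, here is a minimal, hypothetical sketch of a keyword filter plus a user-reporting hook. The blocklist, function names, and report queue are invented for the example; a real system would need context awareness, multiple languages, human review, and image/video detection, which is exactly the hard part discussed above.

```python
# Hypothetical keyword-based moderation with a user-reporting hook.
BLOCKLIST = {"badword", "slur"}  # placeholder terms, illustrative only

def moderate_text(text: str) -> tuple[bool, list[str]]:
    """Return (allowed, matched_terms) for a piece of user content."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    matches = sorted(words & BLOCKLIST)
    return (len(matches) == 0, matches)

reports: list[dict] = []  # queue of user reports awaiting human review

def report_content(content_id: str, reason: str) -> None:
    """Let users flag content that a simple filter cannot catch."""
    reports.append({"content_id": content_id, "reason": reason})

allowed, hits = moderate_text("this contains a badword, sadly")
print(allowed, hits)  # False ['badword']
```

Note how little the filter covers: anything outside an exact word match (misspellings, other languages, images) falls through to the reporting queue, which is why the proposed indicator text pairs detection with user reporting.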

Clear definition of requirements for standards in the DPG standard

The DPG standard defines standards as one of the categories of possible nominations to become DPGs.

The 9 indicators cover content, open source software, and data very well, but have no reference to actual criteria that would give a clear frame for standards.

I suggest a quick iteration to fix this with focus on:

  • What defines an open standard
  • Other requirements like reference implementation, open governance, transparent processes and vendor/platform independence
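One way to give standards a clear frame is to encode the proposed requirements as a machine-checkable checklist. A minimal, hypothetical sketch follows; the criteria names paraphrase the bullets above and are not official DPG criteria.

```python
# Hypothetical checklist for assessing a nominated open standard.
# Criteria paraphrase this issue's suggestions; illustrative only.
OPEN_STANDARD_CRITERIA = [
    "openly licensed specification",
    "reference implementation available",
    "open governance",
    "transparent processes",
    "vendor/platform independence",
]

def assess_standard(met: set[str]) -> tuple[bool, list[str]]:
    """Return (passes, missing_criteria) for a nominated standard."""
    missing = [c for c in OPEN_STANDARD_CRITERIA if c not in met]
    return (not missing, missing)

ok, missing = assess_standard({
    "openly licensed specification",
    "reference implementation available",
    "open governance",
})
print(ok, missing)
```

Encoding the criteria as data rather than prose would also make it easy to generate the human-readable questionnaire from a single source of truth.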

[Proposal] Retrofitting the standard for an open AI audience

The current standard addresses the needs of indicator 6 (Mechanism for Extracting Data) better for a software context than for AI. There is still room to be more explicit. This issue seeks to outline key questions of concern regarding open AI models and data extraction mechanisms for non-personally identifiable information:

  1. In the context of an open AI model, what qualifies as non-personally identifiable information?
  • Model weights & parameters, etc.
  2. Describe the mechanism for extracting or importing non-personally identifiable information from the system in a non-proprietary format. (The answers below are my thoughts on what answers this question would attract in an AI context.) Model persistence or serialization can occur through:
  • For scikit-learn, saving the model using Pickle (standard Python objects) or Joblib (efficient serialization of Python objects with NumPy arrays).
  • For Keras and TensorFlow, saving the model in HDF5 format with the .h5 extension.
  • For PyTorch, conventional approaches include Pickle, using either a .pt or .pth file extension.

PS: While the current wording of indicator 6 suffices for software, for AI models including keywords such as “model persistence” or “serialization” would make it clearer.
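A stdlib-only sketch of the distinction the indicator cares about: the weights dict below is a stand-in for real trained parameters; pickle is a Python-specific binary format, while JSON is a non-proprietary, language-neutral one (real frameworks would use the formats listed above).

```python
import json
import pickle

# Hypothetical model parameters standing in for real trained weights.
weights = {"layer1": [[0.1, -0.2], [0.3, 0.4]], "bias": [0.0, 0.5]}

# Pickle: convenient within Python, but a language-specific format.
blob = pickle.dumps(weights)

# JSON: a non-proprietary interchange format, closer to the
# "extractable in a non-proprietary format" intent of indicator 6.
text = json.dumps(weights)

# Both round-trip back to the same parameters.
assert pickle.loads(blob) == weights
assert json.loads(text) == weights
print("round-trip OK")
```

The question the standard would need to answer is which side of this line a given persistence format (e.g. .pt, .h5) falls on, since some are openly specified and some are framework-bound.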

Indicator 1 expanded to account for Open Data

Open data projects can have a multiplier effect and pave the way for domino effects that sometimes cannot be fully accounted for in design. Datasets used for weather tracking can have consequences for food security, which can have an impact on nutrition, which in turn has consequences for public health.

For this reason, open datasets will find it hard to prove that they are actively and directly advancing the SDGs. It may be easier for them to prove their “relevance” to the SDGs.

Some options to change this are as follows:

Indicator 1: demonstrate relevance to the SDGs, or: designed and developed to be relevant to the SDGs.

Content > Content Collections

If someone submits just one poem or a limited amount of data, will that project be considered a digital public good? It may be a good idea to use “content collections” and “data collections” instead of “content” or “data” wherever they are used in the context of the project, across all indicators of the standard.
