very-good-science / data-hazards


Data Hazards is a project to find a shared vocabulary for talking about worst-case scenarios of data science - and to use that vocabulary to help people understand and avoid Data Hazards.

Home Page: https://datahazards.com

License: Other

Languages: Makefile 0.08%, Python 0.43%, Batchfile 0.11%, CSS 0.08%, HTML 99.31%
Topics: ethics, ethics-resources, data-science, data-ethics, futures, future-thinking

data-hazards's People

Contributors

allcontributors[bot], huwwday, ismael-kg, mcnanton, melanieistefan, nataliezelenka, ninadicara


data-hazards's Issues

Where to test data hazards

  • Properly roll out for JGI seedcorn (November)
  • Test/get feedback on reflection activities/forms/process at workshop(s)

prepare graphs

  • Get data out of qualtrics (@ninadicara)
  • Pictographs (@NatalieThurlby)
    • Pictographs should show how many people found the workshops + materials useful/easy/etc.
  • Paired bar chart (@ninadicara)

Rough plan(s) of what a Data Hazards review would look like

Again, useful for bringing to collaborators.
What would a data hazards review look like, who would be involved, and what would the process be?

We could try a couple of options and see which works best, but it would help to have some initial ideas to work from.

Things we've discussed so far:

  • A panel review (similar to current ethics review panels)
  • A co-produced outcome based on a discussion between the researcher and the reviewer
  • Both researcher and reviewer independently reviewing and then combining afterwards

Examples of people/orgs doing good work on addressing the hazards

As I'm writing some hazards, I've been thinking it would be cool to showcase projects and people working directly on addressing the hazards in their areas, both as inspiration and to help people looking for ways to do the same.

Putting it as an issue so I don't forget, but likely to be an enhancement for the longer term.

Add timeline to website

  • Decide timeline for paper and let Tessa know
  • Add project timeline to website

We need to decide on a timeline for this paper.

We originally thought of data hazards as an opinion piece about a missing part of Data Ethics 🤔 , but our enthusiasm is turning it into something more solutions-based.

Maybe not, but it might be sensible to break this up into smaller, more bite-sized pieces (maybe we can ask Tessa?). From our reading so far, it seems like pointing out a problem is publishable on its own.
Step 1: Opinion article. Hey! We've noticed this problem through reading these papers! This is one part of the puzzle that especially needs an intervention, and that is especially resistant to interventions! Let's imagine worst-case scenarios! Here is how other industries do this! Let's draw some attention to this area and call for solutions!
Step 2: Presenting Data Hazards: we made them, we tested them, this is how people felt about them and what was gained, and here are all the materials! We're using them for seedcorn funding! We'd love it if you used them for your work/your funding/your journal, and we can help you use them through this mechanism.

Add "why are data hazards so scary?" to website

Basically want a blue info box on the home page that says what the purpose of Data Hazards is, setting the tone for:

  • Not meant to scare, but to catch your eye and signal that ethics is serious and important. Ideally, it would compel people to respect ethics as a crucial part of the data science process: just as handling chemicals safely is crucial for working with them, considering the impact of your work is always important.
  • "Handle with care": a chance for researchers to let potential users know that the responsibility for how this work is used doesn't start and end with the researcher; it continues, as the technology matures, with each person who uses it.

(Probably I'm missing something)

Make GitHub public

  • Fix/write README (all contributors matrix, link to site)
  • Write GH repo description and link to deployed site
  • Add GH repo labels, e.g. data-ethics, etc.
  • Add basic contribution guidelines
  • Set up all-contributors bot
  • Tidy issues
  • Turn repo to public
  • Make "Projects" private: somewhere to record private conversations if we need to.

Fix hazard examples

Some of the Hazard examples are out of date and need updating

  • DeepFold, in "Difficult to understand"

Ethical Approval

  • Get information about timeline, materials needed etc
  • Find out if we need someone to oversee
  • Write materials
  • Submit to ethics
  • Make inevitable revisions

Make a random generator

For creating situations to apply things in.

"What would be the result of this technology applied to..."
[the public, employees, *think of other group examples]

"in the context of"
[healthcare, finance, recruitment, policing, insurance, malicious hackers, government]
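A throwaway sketch of what this generator could look like, using the two lists above (the function name `random_scenario` is just illustrative, and the group list is still incomplete per the note above):

```python
import random

# Candidate lists from the notes above (more group examples still needed).
GROUPS = ["the public", "employees"]
CONTEXTS = [
    "healthcare", "finance", "recruitment", "policing",
    "insurance", "malicious hackers", "government",
]

def random_scenario(rng=random):
    """Return a randomly generated discussion prompt."""
    group = rng.choice(GROUPS)
    context = rng.choice(CONTEXTS)
    return (f"What would be the result of this technology applied to "
            f"{group}, in the context of {context}?")
```

This could later back a web form or a printable card deck; the core is just sampling one item from each list.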

Meta-analyse the Data Hazards project using the Data Hazards materials

  • We should do a self assessment of the Data Hazard labels!
  • This could be done by anyone with an interest in the project, at a workshop or asynchronously.
  • We should also ask other people
    • maybe we could do this at a Data Ethics Club?
    • we could first present the idea of Data Hazards
    • then ask them to use them to assess the Data Hazards project
    • we could then show them our self-assessment
    • at the end, we could present any initial findings of our other meetings

Tasks for seedcorn planning

To-do

  • Meet with Patty/Lily/Chiara (depending on who's organising) to decide how Data Hazards will feature in the seedcorn
    • what format they'd like to do it (e.g. self-assessed by applicants web form)
    • when that needs doing by
  • Design a survey for applicants to go out in the acceptance/rejection emails, asking for their thoughts on the Data Hazards ethics part of the process. Presumably this will require us to update our ethics application.
  • Write the form
    • Ask successful applicants if they'd be willing to complete the Data Hazards process (tickbox)
    • decide: asynchronous process OR workshop (decided: asynchronous)
  • Add page to the website, explaining the seed corn trial
    • Add Contents:
      • The purpose of applying data hazards
      • Which types of projects are relevant to apply hazards to
      • What the process will look like for them
        • Only inviting successful projects this time
      • Links to more context about the project
      • To contact me if they have any questions
      • The purpose of this trial
        • So far, researchers have found the process to be useful, and we think it will be for you, too.
        • They'll have an opportunity to give feedback
        • It won't affect their application if they choose not to take part
    • Update link from the timeline to the new materials
    • Get Lily to mirror the page on the JGI website as that will be a more official link for the applicants
  • Design the asynchronous materials #77
  • Write email text for successful/unsuccessful applicants
  • Update survey for asynchronous materials
  • Update Eng faculty ethics materials for asynchronous/seed corn process

Form materials

Conditions for funding

We strongly encourage all participants to consider the ethical and societal implications of their work. Successful projects will have the opportunity to take part in a short, guided Data Hazards self-assessment for ethics in data science to assist with this. You can read about the Data Hazards project here.

Application form

If successful, would you take part in a Data Hazards guided ethics self-assessment exercise? This is relevant to any project that involves statistics, AI, ML, algorithms, or data collection.
YES / NO

Direct harm/physical harm

Just mulling over the definition of the 'May cause direct harm' label.
Its definition is physical harm; my question is: do we want to restrict this to physical harm?

As an example, I'm thinking about deep fake generators which could cause a lot of psychological harm, or be used for various illegal purposes.

It might be that that fits better elsewhere, but I'm interested in your thoughts! I can see the appeal of having a label restricted just to physical harm; in that case, would it be better to change the title of the label to "May cause physical harm", to be explicit?

Create Data Hazards lesson plan materials

  • Create lesson page for students (resources they might want during a lecture)

    • Embed the slides
    • List the titles of the Data Hazards
    • Link to the full list of Hazards
  • Create lesson plan page for teachers (how to run a teaching session/lecture on it)

    • Timings of lesson we've tried and tested
    • Prep needed on the part of the teacher/lecturer
    • Slides (to reuse or as an example)

Fix GH action

  • Make it so that the GH action only copies the built pages over to the gh-pages branch on pushes (not PRs), but still does the build for PRs.

Credit for people who help with the project

In the spirit of all-contributors, it would be cool if on the website and repository we can credit people for their involvement in the project. e.g. peer review, feedback, ideas - would link in nicely too if we'd like people to be able to suggest changes to the hazards via GH issues.


To-do:

  • Add all-contributors page to the website
  • Make sure all-contributors is up to date #36
  • Look at workshop 1 feedback and decide if we want to make any tweaks for workshop 2 about asking about credit

How to ask for feedback

We want to make sure this idea gets feedback from the right people. How can we do this most effectively?

Idea:

  • Write a short summary of our plans, almost like a seed corn application, with our basic ideas (#1, #2, #3)
  • Circulate that to (some subset of) people to review, to begin to plan for a workshop where we will invite the others to test the actual implementation.
  • Have a workshop where we go through some of the above. Invite the other people to review, and also invite Data Ethics Club attendees. Invite people with problems separately so that we can look at the problems first. We basically need to decide what must be asked/decided before the workshop and what can be developed through it: e.g. at a workshop we could ask people to suggest additional hazard labels, but we would want some ready beforehand. Similarly, we can ask for feedback on the activities (i.e. how to assign labels), but we would want some activities ready to go.
    • How many workshops (1? 2?)
    • When should they be? (Would be cool to have one in Data Week if there's time?) (June 11th)
    • Maybe we could try an academic and a public-facing version and invite companies through Tech Ethics Bristol?

Create an initial list of Data Hazards

Using the GitHub project board to come up with an initial list of maximum 12 data hazards that we can put to collaborators to help explain the concept, and eventually test.

Make slides for data hazards html slides

Why

  • Can easily keep slides here on GH (no need to upload to OSF - automatically sync'd) to make them Open/shareable/accessible.
    • Might be especially useful for sharing materials for other people to run data hazards workshops via the website.
  • Easy for version control/reusing (parts of) slides, etc
  • Should automatically share the same look (CSS/fonts) as the website
  • I've been meaning to test this out 🧑‍🎓

How

  • Could use hieroglyph
    • Might need to build the website and slides separately with sphinx-build (will need to test), which might mean we need a Makefile.
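A minimal sketch of what the Sphinx setup might look like, assuming hieroglyph is installed (hieroglyph does provide a `slides` builder for `sphinx-build`; the output directory layout here is just an assumption):

```python
# conf.py (sketch) -- assumes `pip install hieroglyph`
extensions = [
    "hieroglyph",  # adds the "slides" builder alongside the normal "html" one
]

# The website and slides would then be built separately, e.g.:
#   sphinx-build -b html   source/ build/html/
#   sphinx-build -b slides source/ build/slides/
# (a Makefile could wrap these two commands)
```

Because both builds read the same source and conf.py, the slides should pick up the website's CSS/fonts unless overridden.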

Instructions on how to use the hazards

Planning

  • Make a webpage where people can download the hi res hazard images from
    • Gareth Jones said he might give this a go ๐Ÿ™Œ
  • Suggest linking to a full write-up of why you think each one applies (but how?)

Doing

  • How to do a self-reflection and how to display it (link to Gareth's app #56)
  • How to use in a workshop overview
    • Make slides more general (so anyone could use them to explain it)
    • check animation order on case study
    • Add recording
  • How to use for asynchronous review e.g. funding
  • How to know if your project is suitable
  • Browse through how different people have used them
    • Tell us how you've used "Data Hazards"

Add a licence

We need to add a licence to the repo; thinking CC BY-NC.

Design the asynchronous process for Data Hazards

I've been thinking about how to make filling in Data Hazards asynchronously easy. This is related to #45, but a little more specific.

Using Data Hazards = LARD (Learn, Apply, Reflect, Display).

  • Learn: find out about the data hazard labels: what are they and what's the purpose?
    • Dream: commission a short explainer video
    • Or: Record a basic explainer video
    • And/Or: update website with this in mind, i.e. minimum you need to know to apply the hazards page
  • Apply: apply them to your project
    • Dream: natively do this inside OSF (#76)
    • Or: Provide a web form that people can fill in where they can download their answers at the end.
  • Reflect: reflect on how other people may apply the labels differently, and how you might prevent some of the outcomes that you and others have identified. Getting other people's feedback is optional, but encouraged.
    • Dream: allow other people to ethics-register your project
    • Or: provide a mailing list, panel, discussion group, subreddit/whatever, where we apply the labels to registered projects
  • Display: display your hazard labels in slides, papers, and grant applications:
    • to let other people know that you've thought about this stuff, and what steps you've taken to minimise negative outcomes
    • to let other people know that when they use the tech/algorithm, they take on the responsibility of continuing to ensure that the work is used for everyone's benefit
    • Add Gareth's app #56, #45
    • Have a section of the website showing how other people have used them.

Videos

  • Run through hazards - @NatalieThurlby
  • Case study - @ninadicara

Video aims:

  • keep to 3 mins each
  • embedded, with adjustable playback speed
  • with transcript

Introductory section

Writing an introductory section of the piece which describes:

  • The need for this approach
  • Gaps in existing ethical review processes
  • What we hope the approach will bring to the data science community

Alt text

Need to add alt text to hazard labels.

Update 7.03.2022:
There are now alt text labels on all of the individual hazard page labels, but not on the main overview page.

Tasks for workshop organising

Tasks for @NatalieThurlby and @ninadicara

By 13th August

  • Decide tentative dates (we decided 1-3pm on the 21st Sept or 23rd Sept 2022)
  • Email Ismael to see if he can make those dates
  • Formally set date (Tuesday 21st Sept 1-3pm)
  • Email the workshop owners to see if they can make the new date. NB (!) Attach the participant information sheet to this email so that they can read it.

By 20th August

  • Make Eventbrite with consent sheet built into sign up for attendees - link to participant information sheet.
  • Send the sign up event to all the previously interested parties.
  • Send to Lily to advertise the sign up in the JGI newsletter

1 week before workshop

  • Send project owners the Qualtrics survey to complete (NB this includes the consent form)
  • Ensure Qualtrics is changed and ready to go (See #47)
  • Send email to participants
  • Send email to Ismael
