very-good-science / data-hazards


Data Hazards is a project to find a shared vocabulary for talking about worst-case scenarios of data science - and to use that vocabulary to help people understand and avoid Data Hazards.

Home Page: https://datahazards.com

License: Other

Languages: Makefile 0.08%, Python 0.43%, Batchfile 0.11%, CSS 0.08%, HTML 99.31%
Topics: ethics, ethics-resources, data-science, data-ethics, futures, future-thinking

data-hazards's People

Contributors

allcontributors[bot], huwwday, ismael-kg, mcnanton, melanieistefan, nataliezelenka, ninadicara


data-hazards's Issues

Where to test data hazards

  • Properly roll out for JGI seedcorn (November)
  • Test/get feedback on reflection activities/forms/process at workshop(s)

prepare graphs

  • Get data out of qualtrics (@ninadicara)
  • Pictographs (@NatalieThurlby)
    • Pictographs should show how many people found the workshops + materials useful/easy/etc.
  • Paired bar chart (@ninadicara)

Rough plan(s) of what a Data Hazards review would look like

Again, useful for bringing to collaborators.
What would a data hazards review look like, who would be involved, and what would the process be?

We could try a couple of options and see which works best, but it would help to have some initial ideas to work from.

Things we've discussed so far:

  • A panel review (similar to current ethics review panels)
  • A co-produced outcome based on a discussion between the researcher and the reviewer
  • Both researcher and reviewer independently reviewing and then combining afterwards

Examples of people/orgs doing good work on addressing the hazards

As I'm writing some hazards, I've been thinking it would be cool to showcase projects and people working directly on addressing the hazards in their areas, both as inspiration and to help people looking for ways to do the same.

Putting it as an issue so I don't forget, but likely to be an enhancement for the longer term.

Add timeline to website

  • Decide timeline for paper and let Tessa know
  • Add project timeline to website

We need to decide on a timeline for this paper.

We originally thought of data hazards as an opinion piece about a missing part of Data Ethics 🤔 , but our enthusiasm is turning it into something more solutions-based.

Maybe not, but it might be sensible to break this up into smaller, more bite-sized pieces (maybe we can ask Tessa?). From our reading so far, it seems like pointing out a problem is publishable on its own.
Step 1: Opinion article. Hey! We've noticed this problem through reading these papers! This is one part of the puzzle that especially needs an intervention, and that is especially resistant to interventions! Let's imagine worst-case scenarios! Here is how other industries do this! Let's draw some attention to this area and call for solutions!
Step 2: Presenting Data Hazards: we made them, we tested them, this is how people felt about them and what was gained, and here are all the materials! We're using them for seedcorn funding! We'd love it if you used them for your work/your funding/your journal, and we can help you use them through this mechanism.

Add "why are data hazards so scary?" to website

Basically want a blue info box on the home page that says what the purpose of Data Hazards is, setting the tone for:

  • Not meant to scare, but to catch your eye and signal that ethics is serious and important. Ideally, it would compel people to respect ethics as a crucial part of the data science process: just as handling chemicals safely is crucial for working with them, considering the impact of your work is always important.
  • "Handle with care": a chance for researchers to let potential users know that the responsibility for how this work is used doesn't start and end with the researcher; it continues, as the technology matures, with each person who uses it.

(Probably I'm missing something)

Make GitHub public

  • Fix/write README (all contributors matrix, link to site)
  • Write GH repo description and link to deployed site
  • Add GH repo labels, e.g. data-ethics, etc.
  • Add basic contribution guidelines
  • Set up all-contributors bot
  • Tidy issues
  • Turn repo to public
  • Make "Projects" private: somewhere to record private conversations if we need to.

Fix hazard examples

Some of the Hazard examples are out of date and need updating

  • DeepFold, in "Difficult to understand"

Ethical Approval

  • Get information about timeline, materials needed etc
  • Find out if we need someone to oversee
  • Write materials
  • Submit to ethics
  • Make inevitable revisions

Make a random generator

For creating situations to apply things in.

"What would be the result of this technology applied to..."
[the public, employees, *think of other group examples]

"in the context of"
[healthcare, finance, recruitment, policing, insurance, malicious hackers, government]
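A throwaway sketch of what this generator could look like, using the two lists above (the function name `random_scenario` is just illustrative, and the group list is still incomplete per the note above):

```python
import random

# Candidate lists from the notes above (more group examples still needed).
GROUPS = ["the public", "employees"]
CONTEXTS = [
    "healthcare", "finance", "recruitment", "policing",
    "insurance", "malicious hackers", "government",
]

def random_scenario(rng=random):
    """Return a randomly generated discussion prompt."""
    group = rng.choice(GROUPS)
    context = rng.choice(CONTEXTS)
    return (f"What would be the result of this technology applied to "
            f"{group}, in the context of {context}?")
```

This could later back a web form or a printable card deck; the core is just sampling one item from each list.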

Meta-analyse the Data Hazards project using the Data Hazards materials

  • We should do a self assessment of the Data Hazard labels!
  • This could be done by anyone with an interest in the project, at a workshop or asynchronously.
  • We should also ask other people
    • maybe we could do this at a Data Ethics Club?
    • we could first present the idea of Data Hazards
    • then ask them to use them to assess the Data Hazards project
    • we could then show them our self-assessment
    • at the end, we could present any initial findings of our other meetings

Tasks for seedcorn planning

To-do

  • Meet with Patty/Lily/Chiara (depending on who's organising) to decide how Data Hazards will feature in the seedcorn
    • what format they'd like to do it (e.g. self-assessed by applicants web form)
    • when that needs doing by
  • Design a survey for applicants to go out in the acceptance/rejection emails, asking for their thoughts on the Data Hazards ethics part of the process. Presumably this will require us to update our ethics application.
  • Write the form
    • Ask successful applicants if they'd be willing to complete the Data Hazards process (tickbox)
    • decide: asynchronous process OR workshop (decided: asynchronous)
  • Add page to the website, explaining the seed corn trial
    • Add Contents:
      • The purpose of applying data hazards
      • Which types of projects are relevant to apply hazards to
      • What the process will look like for them
        • Only inviting successful projects this time
      • Links to more context about the project
      • To contact me if they have any questions
      • The purpose of this trial
        • So far, researchers have found the process to be useful, and we think it will be for you, too.
        • They'll have an opportunity to give feedback
        • It won't affect their application if they choose not to take part
    • Update link from the timeline to the new materials
    • Get Lily to mirror the page on the JGI website as that will be a more official link for the applicants
  • Design the asynchronous materials #77
  • Write email text for successful/unsuccessful applicants
  • Update survey for asynchronous materials
  • Update Eng faculty ethics materials for asynchronous/seed corn process

Form materials

Conditions for funding

We strongly encourage all participants to consider the ethical and societal implications of their work. Successful projects will have the opportunity to take part in a short, guided Data Hazards self-assessment for ethics in data science to assist with this. You can read about the Data Hazards project here.

Application form

If successful, would you take part in a Data Hazards guided ethics self-assessment exercise? This is relevant to any project that involves statistics, AI, ML, algorithms, or data collection.
YES / NO

Direct harm/physical harm

Just mulling over the definition of the 'May cause direct harm' label.
Its definition is physical harm; my question is: do we want to restrict this to physical harm?

As an example, I'm thinking about deep fake generators which could cause a lot of psychological harm, or be used for various illegal purposes.

It might be that that fits better elsewhere, but I'm interested in your thoughts! I can see the appeal of having a label restricted just to physical harm; in that case, would it be better to change the title of the label to "May cause physical harm", to be explicit?

Create Data Hazards lesson plan materials

  • Create lesson page for students (resources they might want during a lecture)

    • Embed the slides
    • List the titles of the Data Hazards
    • Link to the full list of Hazards
  • Create lesson plan page for teachers (how to run a teaching session/lecture on it)

    • Timings of lesson we've tried and tested
    • Prep needed on the part of the teacher/lecturer
    • Slides (to reuse or as an example)

Fix GH action

  • Make it so that the GH action only copies the built pages over to the gh-pages branch on pushes (not PRs), but still does the build for PRs.

Credit for people who help with the project

In the spirit of all-contributors, it would be cool if on the website and repository we can credit people for their involvement in the project. e.g. peer review, feedback, ideas - would link in nicely too if we'd like people to be able to suggest changes to the hazards via GH issues.


To-do:

  • Add all-contributors page to the website
  • Make sure all-contributors is up to date #36
  • Look at workshop 1 feedback and decide if we want to make any tweaks for workshop 2 about asking about credit

How to ask for feedback

We want to make sure this idea gets feedback from the right people. How can we do this most effectively?

Idea:

  • Write a short summary of our plans, almost like a seed corn application, with our basic ideas (#1, #2, #3)
  • Circulate that to (some subset of) people to review, to begin to plan for a workshop where we will invite the others to test the actual implementation.
  • Have a workshop where we go through some of the above. Invite the other people to review, and also invite Data Ethics Club attendees. Invite people with problems separately so that we can look at the problems first. We basically need to decide what must be asked/decided before the workshop and what can be developed through it: e.g. at a workshop we could ask people to suggest additional hazard labels, but we would want some ready beforehand. Similarly, we can ask for feedback on the activities (i.e. how to assign labels), but we would want some activities ready to go.
    • How many workshops (1? 2?)
    • When should they be? (Would be cool to have one in Data Week if there's time?) (June 11th)
    • Maybe we could try an academic and a public-facing version and invite companies through Tech Ethics Bristol?

Create an initial list of Data Hazards

Using the GitHub project board to come up with an initial list of maximum 12 data hazards that we can put to collaborators to help explain the concept, and eventually test.

Make slides for data hazards html slides

Why

  • Can easily keep slides here on GH (no need to upload to OSF - automatically sync'd) to make them Open/shareable/accessible.
    • Might be especially useful for sharing materials for other people to run data hazards workshops via the website.
  • Easy for version control/reusing (parts of) slides, etc
  • Should automatically share the same look (CSS/fonts) as the website
  • I've been meaning to test this out 🧑‍🎓

How

  • Could use hieroglyph
    • Might need to build the website and slides separately with sphinx-build (will need to test), which might mean we need a Makefile.
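A minimal sketch of what the Sphinx setup might look like, assuming hieroglyph is installed (hieroglyph does provide a `slides` builder for `sphinx-build`; the output directory layout here is just an assumption):

```python
# conf.py (sketch) -- assumes `pip install hieroglyph`
extensions = [
    "hieroglyph",  # adds the "slides" builder alongside the normal "html" one
]

# The website and slides would then be built separately, e.g.:
#   sphinx-build -b html   source/ build/html/
#   sphinx-build -b slides source/ build/slides/
# (a Makefile could wrap these two commands)
```

Because both builds read the same source and conf.py, the slides should pick up the website's CSS/fonts unless overridden.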

Instructions on how to use the hazards

Planning

  • Make a webpage where people can download the hi res hazard images from
    • Gareth Jones said he might give this a go ๐Ÿ™Œ
  • Suggest linking to a full write-up of why you think each one applies (but how?)

Doing

  • How to do a self-reflection and how to display it (link to Gareth's app #56)
  • How to use in a workshop overview
    • Make slides more general (so anyone could use them to explain it)
    • check animation order on case study
    • Add recording
  • How to use for asynchronous review e.g. funding
  • How to know if your project is suitable
  • Browse through how different people have used them
    • Tell us how you've used "Data Hazards"

Add a licence

We need to add a licence to the repo; thinking CC BY-NC.

Design the asynchronous process for Data Hazards

I've been thinking about how to make filling in Data Hazards asynchronously easy. This is related to #45, but a little more specific.

Using Data Hazards = LARD (Learn, Apply, Reflect, Display).

  • Learn: find out about the data hazard labels: what are they and what's the purpose?
    • Dream: commission a short explainer video
    • Or: Record a basic explainer video
    • And/Or: update website with this in mind, i.e. minimum you need to know to apply the hazards page
  • Apply: apply them to your project
    • Dream: natively do this inside OSF (#76)
    • Or: Provide a web form that people can fill in where they can download their answers at the end.
  • Reflect: reflect on how other people may apply the labels differently, and how you might prevent some of the outcomes that you and others have identified. Getting other people's feedback is optional, but encouraged.
    • Dream: allow other people to ethics-register your project
    • Or: provide a mailing list, panel, discussion group, subreddit/whatever, where we apply the labels to registered projects
  • Display: display your hazard labels in slides, papers, and grant applications:
    • to let other people know that you've thought about this stuff, and what steps you've taken to minimise negative outcomes
    • to let other people know that when they use the tech/algorithm, they take on the responsibility of continuing to ensure that the work is used for everyone's benefit
    • Add Gareth's app #56, #45
    • Have a section of the website showing how other people have used them.

Videos

  • Run through hazards - @NatalieThurlby
  • Case study - @ninadicara

Video aims:

  • keep to 3 mins each
  • embedded, with adjustable playback speed
  • with transcript

Introductory section

Writing an introductory section of the piece which describes:

  • The need for this approach
  • Gaps in existing ethical review processes
  • What we hope the approach will bring to the data science community

Alt text

Need to add alt text to hazard labels.

Update 7.03.2022:
There are now alt text labels on all of the individual hazard page labels, but not on the main overview page.

Tasks for workshop organising

Tasks for @NatalieThurlby and @ninadicara

By 13th August

  • Decide tentative dates (we decided 1-3pm on the 21st Sept or 23rd Sept 2022)
  • Email Ismael to see if he can make those dates
  • Formally set date (Tuesday 21st Sept 1-3pm)
  • Email the workshop owners to see if they can make the new date. NB (!) Attach the participant information sheet to this email so that they can read it.

By 20th August

  • Make Eventbrite with consent sheet built into sign up for attendees - link to participant information sheet.
  • Send the sign up event to all the previously interested parties.
  • Send to Lily to advertise the sign up in the JGI newsletter

1 week before workshop

  • Send project owners the Qualtrics survey to complete (NB this includes the consent form)
  • Ensure Qualtrics is changed and ready to go (See #47)
  • Send email to participants
  • Send email to Ismael
