Data Hazards is a project to find a shared vocabulary for talking about worst-case scenarios of data science - and to use that vocabulary to help people understand and avoid Data Hazards.
As I'm writing some hazards, I'm thinking about how it would be cool to showcase projects and people working on directly addressing the hazards in their areas, both as a bit of inspiration and to help people looking for ways to do that.
Putting this down as an issue so I don't forget, but it's likely to be a longer-term enhancement.
We originally thought of Data Hazards as an opinion piece about a missing part of data ethics, but our enthusiasm is turning it into something more solutions-based.
Maybe not, but it might be sensible to break this up into smaller, more bite-sized pieces. Maybe we can ask Tessa? From our reading so far, it seems like pointing out a problem is publishable on its own.
Step 1: Opinion article. Hey! We've noticed this problem through reading these papers! This is one part of the puzzle that especially needs an intervention. This is one part of the problem that is especially resistant to interventions! Let's imagine worst case scenarios! Here is how other industries do this! Let's draw some attention to this area and call for solutions!
Step 2: Present Data Hazards: we made them, we tested them, this is how people felt about them and what was gained, and here are all the materials! We're using them for seed corn funding! We'd love it if you used them for your work/your funding/your journal, and we can help you use them through this mechanism.
Basically want a blue info box on the home page that says what the purpose of Data Hazards is, setting the tone for:
Not meant to scare, but to catch your eye and to signal that ethics is serious and important. Ideally, would compel people to respect ethics as a crucial part of the data science process. Just as handling chemicals safely is crucial for working with them, considering the impact of your work is always important.
"Handle with care": a chance for researchers to let potential users know that the responsibility for how this work is used doesn't start and end with the researcher, but continues, as the technology matures, with each person who uses it.
Meet with Patty/Lily/Chiara (depending on who's organising) to decide how Data Hazards will feature in the seedcorn
what format they'd like to use (e.g. a web form self-assessed by applicants)
when that needs doing by
Design a survey for applicants, to go out in the acceptance/rejection emails, asking for their thoughts on the Data Hazards ethics part of the process. Presumably this will require us to update our ethics application.
Write the form
Ask successful applicants if they'd be willing to complete the Data Hazards process (tickbox)
decide: asynchronous process OR workshop (decided: asynchronous)
Add page to the website, explaining the seed corn trial
Add Contents:
The purpose of applying data hazards
Which types of projects are relevant to apply hazards to
What the process will look like for them
Only inviting successful projects this time
Links to more context about the project
To contact me if they have any questions
The purpose of this trial
So far, researchers have found the process to be useful, and we think it will be for you, too.
They'll have an opportunity to give feedback
It won't affect their application if they choose not to take part
Update link from the timeline to the new materials
Get Lily to mirror the page on the JGI website as that will be a more official link for the applicants
Write email text for successful/unsuccessful applicants
Update survey for asynchronous materials
Update Eng faculty ethics materials for asynchronous/seed corn process
Form materials
Conditions for funding
We strongly encourage all participants to consider the ethical and societal implications of their work. Successful projects will have the opportunity to take part in a short, guided Data Hazards self-assessment for ethics in data science to assist with this. You can read about the Data Hazards project here.
Application form
If successful, would you take part in a Data Hazards guided ethics self-assessment exercise? This is relevant to any project that involves statistics, AI, ML, algorithms, or data collection.
YES / NO
Just mulling over the definition of the 'May cause direct harm' label.
Its definition says it is physical harm - my question is: do we want to restrict this to physical harm?
As an example, I'm thinking about deep fake generators which could cause a lot of psychological harm, or be used for various illegal purposes.
It might be that this fits better elsewhere, but I'm interested in your thoughts! I can see the appeal of having a label restricted to just physical harm - in which case, would it be better to change the title of the label to 'May cause physical harm' to be explicit?
In the spirit of all-contributors, it would be cool if we could credit people on the website and repository for their involvement in the project, e.g. peer review, feedback, ideas. This would also link in nicely if we'd like people to be able to suggest changes to the hazards via GitHub issues.
We want to make sure this idea gets feedback from the right people. How can we do this most effectively?
Idea:
Write a short summary of our plans, almost like a seed corn application, with our basic ideas (#1, #2, #3)
Circulate that to (some subset of) people for review, and begin planning a workshop where we will invite the others to test the actual implementation.
Have a workshop where we go through some of the above. Invite the other people to review, and also invite Data Ethics Club attendees. Invite people with problems separately so that we can look at the problems first. Basically, we need to decide what should be asked/decided before the workshop and what can be developed through it: for example, at a workshop we could ask whether people want to suggest additional hazard labels, but we would want some ready beforehand. Similarly, we can ask for feedback on the activities (i.e. how to assign labels), but we would want some activities ready to go.
How many workshops (1? 2?)
When should they be? (Would be cool to have one in Data Week if there's time?) (June 11th)
Maybe we could try an academic and a public-facing version and invite companies through Tech Ethics Bristol?
Using the GitHub project board to come up with an initial list of maximum 12 data hazards that we can put to collaborators to help explain the concept, and eventually test.
Or: Provide a web form that people can fill in where they can download their answers at the end.
Reflect: reflect on how other people may apply the labels differently, and how you might prevent some of the outcomes that you and others have identified. Getting other people's feedback is optional, but encouraged.
Dream: allow other people to ethics-register your project
Or: provide a mailing list, panel, discussion group, subreddit/whatever, where we apply the labels to registered projects
Display: display your hazard labels in slides, papers, and grant applications:
to let other people know that you've thought about this stuff, and what steps you've taken to minimise negative outcomes
to let other people know that when they use the tech/algorithm, they take on the responsibility of continuing to ensure that the work is used for everyone's benefit
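As a rough illustration of the "download your answers" and "display your hazard labels" ideas above, here is a minimal sketch of turning self-assessment answers into a markdown summary that a web form could offer for download. The function name, label names, and answer structure are all hypothetical, not part of any existing Data Hazards tooling.

```python
# Minimal sketch (hypothetical): render applied Data Hazard labels and
# the researcher's notes as a downloadable markdown report.

def render_hazard_report(project: str, labels: dict[str, str]) -> str:
    """Build a markdown summary from {label name: researcher's note}."""
    lines = [f"# Data Hazards self-assessment: {project}", ""]
    if not labels:
        lines.append("No hazard labels applied.")
    for label, note in labels.items():
        lines.append(f"## {label}")
        lines.append(note)
        lines.append("")
    return "\n".join(lines)

# Example answers (invented for illustration only)
answers = {
    "Reinforces existing biases": "Training data is survey-based; see mitigation plan.",
    "Lacks community involvement": "Planning a stakeholder workshop before deployment.",
}
report = render_hazard_report("Example project", answers)
print(report.splitlines()[0])  # → "# Data Hazards self-assessment: Example project"
```

A web form could serve this string as a file download, and the same markdown could be pasted into slides, papers, or grant applications.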
This is a list of reading that we think may be relevant to read/cite when we write up the Data Hazards project into a paper. If you think there's anything else we should be reading, please let us know in a comment below:
Decide tentative dates (we decided 1-3pm on the 21st Sept or 23rd Sept 2022)
Email Ismael to see if he can make those dates
Formally set date (Tuesday 21st Sept 1-3pm)
Email the workshop owners to see if they can make the new date. NB (!) Attach the participant information sheet to this email so that they can read it.
By 20th August
Make Eventbrite with consent sheet built into sign up for attendees - link to participant information sheet.
Send the sign up event to all the previously interested parties.
Send to Lily to advertise the sign up in the JGI newsletter
1 week before workshop
Send project owners the Qualtrics survey to complete (NB this includes the consent form)
Ensure Qualtrics is changed and ready to go (See #47)