
data-science-playground's People

Contributors

amanduhrr, cc-creativecommons-github-io-bot, charlineshen, flalam, gituxedo, kgodey, mathemancer, michaelrenmr, shubhamcs162, sohompaul, timidrobot

data-science-playground's Issues

Discussion: Project to Quantify the Commons

Background:

Creative Commons has submitted a project to the University of Michigan School of Information (UMSI), which has determined that it is a potential fit for the course SI 485: Information Analysis Capstone and Final Project. In this course, advanced undergraduate students deliver data-oriented solutions by developing and analyzing data sets and building tools that extract useful information for clients through manipulation, analysis, and visualization. This ticket is for discussing the project, with the goal of refining the questions we'd like answered and gathering input from those who have considered this challenge in the past.

Project General Information

Project Idea:

Creative Commons (CC) seeks to quantify the use of CC legal tools (works in the commons). CC legal tools include the licenses (e.g. CC BY, CC BY-NC-SA) and public declarations (e.g. CC0, PDM). This project would include data collection, analysis, and visualization.

Potential questions to be answered:

  • How many works are in the commons?
  • What can we determine from the rate of change?
  • How can those works be characterized (e.g. by legal tool, region, language)?
  • How can the data be managed to allow future trend analysis (e.g. which languages saw the largest growth in legal tool adoption)?
  • How can the use of CC legal tools be meaningfully visualized?

Developing reproducible methodologies for gathering information about the use of CC legal tools will help CC communicate its impact, support policy work (at all levels of government and within institutions), and support the wider community.

Full Description

Creative Commons (CC) seeks to quantify the use of CC legal tools (works in the commons). CC legal tools include the licenses (e.g. CC BY, CC BY-NC-SA) and public declarations (e.g. CC0, PDM). This project would include data collection, analysis, and visualization.
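Characterizing works by legal tool means mapping the license links found on a page or in API metadata to a tool name. A minimal sketch, assuming works point to the tool through a creativecommons.org deed URL; the function name `legal_tool_from_url` and the label format it returns are hypothetical choices, not an existing CC convention:

```python
import re
from typing import Optional

# License deeds, e.g. https://creativecommons.org/licenses/by-nc-sa/4.0/
_LICENSE_RE = re.compile(
    r"creativecommons\.org/licenses/(?P<code>[a-z-]+)/(?P<version>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)
# Public domain tools, e.g. https://creativecommons.org/publicdomain/zero/1.0/
_PUBLIC_DOMAIN_RE = re.compile(
    r"creativecommons\.org/publicdomain/(?P<tool>zero|mark)/(?P<version>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def legal_tool_from_url(url: str) -> Optional[str]:
    """Return a label such as 'CC BY-NC-SA 4.0', 'CC0 1.0' or 'PDM 1.0', else None."""
    match = _LICENSE_RE.search(url)
    if match:
        return f"CC {match.group('code').upper()} {match.group('version')}"
    match = _PUBLIC_DOMAIN_RE.search(url)
    if match:
        # "zero" is the CC0 dedication; "mark" is the Public Domain Mark.
        name = "CC0" if match.group("tool").lower() == "zero" else "PDM"
        return f"{name} {match.group('version')}"
    return None  # not a recognized CC legal tool URL
```

Under this sketch, jurisdiction ports such as `/licenses/by/3.0/us/` collapse to their base tool (`CC BY 3.0`); a real pipeline would need to decide whether to keep the jurisdiction in a separate column.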

First, this project should create reproducible processes or methodologies for building a dataset of information about works that are CC licensed or dedicated to the public domain. The dataset may be built from platform APIs (e.g. Flickr), Common Crawl data, etc. The project should create a starting place not only for the project itself but also for future efforts to extend the dataset and the meaning derived from it.
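For platform APIs, one reproducible approach is to ask the platform how many items match each license rather than downloading every item. The sketch below assumes the Flickr REST API's `flickr.photos.search` method with its `license` filter and JSON output; the API key is a placeholder, the numeric license ID has to be looked up (for example with `flickr.photos.licenses.getInfo`), and the query behavior should be checked against Flickr's API documentation before the totals are relied on. No request is sent here; the HTTP call is left to the caller.

```python
import json
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://www.flickr.com/services/rest/"


def build_search_url(api_key: str, license_id: str) -> str:
    """Build a flickr.photos.search request URL filtered by one Flickr license ID."""
    # Assumed parameters of flickr.photos.search; verify against Flickr's API docs.
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "license": license_id,
        "per_page": 1,  # only the total count is needed, not the photos themselves
        "format": "json",
        "nojsoncallback": 1,  # plain JSON instead of a JSONP callback wrapper
    }
    return f"{FLICKR_REST_ENDPOINT}?{urlencode(params)}"


def total_from_response(body: str) -> int:
    """Extract the total number of matching photos from a JSON search response."""
    payload = json.loads(body)
    if payload.get("stat") != "ok":
        raise ValueError(f"Flickr API error: {payload.get('message', 'unknown error')}")
    return int(payload["photos"]["total"])
```

Recording the exact request URL (minus the key) and the date of each run alongside the count is what makes the number reproducible later.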

Second, the project should begin to derive meaning from the dataset. How many works are currently in the commons? How has that number changed over time? How can those works be characterized (e.g. by legal tool, region, language)? How can the data be managed to allow future trend analysis (e.g. which languages saw the largest growth in legal tool adoption)?
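Trend questions such as "which languages saw the largest growth" reduce to comparing counts between two snapshots of the dataset. A minimal sketch, assuming each snapshot is a list of records with a `language` column (any other column, such as `legal_tool`, works the same way); `count_by` and `largest_growth` are hypothetical helper names, and the snapshots below are made-up illustrations, not real data:

```python
from collections import Counter
from typing import Iterable, Mapping, Optional, Tuple


def count_by(records: Iterable[Mapping[str, str]], column: str) -> Counter:
    """Count records per value of one column, e.g. "language" or "legal_tool"."""
    return Counter(record[column] for record in records)


def largest_growth(before: Counter, after: Counter) -> Optional[Tuple[str, int]]:
    """Return (key, increase) for the key whose count grew most between snapshots."""
    keys = sorted(set(before) | set(after))  # sorted so ties resolve deterministically
    if not keys:
        return None
    # Counter returns 0 for missing keys, so newly appearing values are handled.
    best = max(keys, key=lambda k: after[k] - before[k])
    return best, after[best] - before[best]


# Illustrative snapshots only.
earlier = [{"language": "en"}, {"language": "es"}]
later = [{"language": "en"}, {"language": "es"}, {"language": "es"},
         {"language": "es"}, {"language": "de"}]
print(largest_growth(count_by(earlier, "language"), count_by(later, "language")))  # ('es', 2)
```

This measures absolute growth. Relative growth (percentage change) answers a different question and tends to favor languages that start from very small counts, so the methodology should state which one it uses.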

Third, optionally, how can the data be visualized to communicate meaning and allow exploration?

Project Outcome

  • What deliverable(s) would students produce and share with your organization as a result of this project?
  • How do you plan to use the feedback, recommendations, or product you receive from the student team?

Students should create reproducible processes or methodologies for creating a dataset, the resulting dataset, and analysis. Optionally students may create visualizations of the dataset.

The processes, dataset, and analysis will help Creative Commons communicate its impact, support policy work (at all levels of government and within institutions), and support the wider community.

What do students need for this project to be successful?

Examples: skills needed, social impact orientation, interest or experience in a specific field/domain/industry.

Curiosity, motivation, proficiency in a programming language that can be used to query APIs and manipulate data (e.g. JavaScript, Perl, Python, Ruby; Python is preferred), and a recognition of the value of open knowledge.

Data Proposal Information

Data Set

We expect students to create a new data set for us.

Size of Data Set

How big is the data set? Approximately how many rows and columns does it have?

Between 200 million and 2 billion rows, with 10 columns. The last effort to quantify the commons, in 2017, estimated 1.4 billion works. I expect metadata can be discovered for at least 200 million works. Columns could include: URL, author, date, legal_tool, language, reference_count, etc.
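A fixed record type helps keep the dataset consistent across collection runs. The sketch below uses only the columns named above (URL, author, date, legal_tool, language, reference_count); the remaining columns implied by the 10-column estimate are not specified and are left out, the field types are assumptions, and CSV is used purely for illustration since no output format has been chosen.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional


@dataclass
class WorkRecord:
    """One row per work. Column names follow the proposal; types are assumptions."""
    url: str
    author: Optional[str]
    date: Optional[str]       # assumed ISO 8601 string, e.g. "2021-03-01"
    legal_tool: str           # e.g. "CC BY-SA 4.0" or "CC0 1.0"
    language: Optional[str]   # assumed language tag, e.g. "en"
    reference_count: int


def to_csv(records: Iterable[WorkRecord]) -> str:
    """Serialize records to CSV text with a header row; None becomes an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(WorkRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Keeping the schema in one place means a later collection run that adds a column fails loudly in review rather than silently producing files with different layouts.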

Findings from Data Set

What do you want to learn from your data set? Please share 3-5 specific questions that the data can help solve:

  • How many works are in the commons?
  • What can we determine from the rate of change?
  • How can those works be characterized (e.g. by legal tool, region, language)?
  • How can the data be managed to allow future trend analysis (e.g. which languages saw the largest growth in legal tool adoption)?

Data Availability, Type, Format

No dataset currently exists, and CC has not made a recommendation on format. Input is welcome on this subject.
