
analyze-github-pull-requests's Introduction

Natural Language Processing: Analyzing GitHub Pull Requests

Context

The dataset github_comments.tsv contains 4000 comments that developer teams published on GitHub pull requests.

Here is an explanation of the table columns:

  • Comment: the comment made by a developer on the pull request.
  • Comment_date: the date on which the comment was published.
  • Is_merged: whether the pull request on which the comment was made was accepted (and therefore merged) or rejected.
  • Merged_at: the date on which the pull request was merged (if accepted).
  • Request_changes: a binary label, 1 if the comment is a request for a change in the code and 0 otherwise.
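As a rough illustration of how these columns might be read, here is a minimal sketch using only the Python standard library. The helper name load_comments and the two sample rows are made up for the example, and the date and boolean formats shown are assumptions; the real file may encode them differently.

```python
import csv
import io


def load_comments(handle):
    """Read tab-separated rows into dicts, casting Request_changes to int."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        row["Request_changes"] = int(row["Request_changes"])
        rows.append(row)
    return rows


# Two made-up rows standing in for github_comments.tsv (not real data).
SAMPLE = (
    "Comment\tComment_date\tIs_merged\tMerged_at\tRequest_changes\n"
    "Please rename this variable\t2019-01-02T10:00:00Z\tTrue\t2019-01-03T10:00:00Z\t1\n"
    "Looks good to me\t2019-01-02T11:00:00Z\tTrue\t2019-01-03T10:00:00Z\t0\n"
)

rows = load_comments(io.StringIO(SAMPLE))
# Keep only the comments labelled as requests for change.
change_requests = [r for r in rows if r["Request_changes"] == 1]
print(len(rows), len(change_requests))
```

To read the actual file, the same function could be called on a handle opened with open("github_comments.tsv", newline="", encoding="utf-8").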

The goal is to dig deeper into the nature of blockers and analyze the requests for change. If possible, try to answer the following questions:

  • What are the most common problems that appear in these comments?
  • Can we cluster the problems by topic/problem type?
  • How long is the resolution time after a change was requested?
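One possible way to approach the last question is to measure the time between a change-request comment and the merge of its pull request. The sketch below assumes ISO 8601 timestamps and treats an empty Merged_at as "not merged"; both are assumptions about the data, the rows are made up, and the report may define resolution time differently.

```python
from datetime import datetime
from statistics import median


def parse_ts(text):
    # Assumed ISO 8601 input; a trailing "Z" is rewritten as an explicit
    # UTC offset so that older Python versions can parse it.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def resolution_hours(rows):
    """Hours from each change-request comment to the merge of its pull request."""
    hours = []
    for row in rows:
        # Skip comments that are not change requests, and unmerged pull requests.
        if row["Request_changes"] != 1 or not row["Merged_at"]:
            continue
        delta = parse_ts(row["Merged_at"]) - parse_ts(row["Comment_date"])
        hours.append(delta.total_seconds() / 3600)
    return hours


# Made-up rows for illustration only (not taken from the dataset).
rows = [
    {"Comment_date": "2019-01-02T10:00:00Z", "Merged_at": "2019-01-03T10:00:00Z", "Request_changes": 1},
    {"Comment_date": "2019-01-05T08:00:00Z", "Merged_at": "2019-01-05T20:00:00Z", "Request_changes": 1},
    {"Comment_date": "2019-01-06T09:00:00Z", "Merged_at": "2019-01-07T09:00:00Z", "Request_changes": 0},
    {"Comment_date": "2019-01-08T09:00:00Z", "Merged_at": "", "Request_changes": 1},
]

hours = resolution_hours(rows)
print(median(hours))
```

The median is used here rather than the mean because a few long-lived pull requests could otherwise dominate the summary.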

Content

  • Report.pdf is a PDF report that details my approach.
  • images is a collection of the images that I included in my report.
  • TopicModelling.ipynb is a Jupyter Notebook in which I did my analysis in Python.
  • corpus.pkl, dictionary.gensim, and all files starting with model… are files generated in the notebook that I use to avoid re-running some steps.

Theory covered

This project covers the following concepts:

  • Topic Modelling using LDA
  • Clustering through tf-idf and BoW
  • Dimension reduction through t-SNE and truncated SVD
  • Classification and Regression algorithms
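To make the bag-of-words and tf-idf ideas concrete, here is a small standard-library sketch. It is not the notebook's code: the function names and sample comments are invented for the example, and it uses the plain tf × log(N/df) weighting, whereas libraries often apply smoothed variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document."""
    bows = [Counter(tokenize(d)) for d in docs]  # bag-of-words counts
    n_docs = len(bows)
    # Document frequency: each term is counted once per document containing it.
    doc_freq = Counter(term for bow in bows for term in bow)
    vectors = []
    for bow in bows:
        total = sum(bow.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bow.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


# Made-up comments for illustration only.
docs = [
    "please add unit tests for this function",
    "missing unit tests here",
    "rename this variable please",
]
vectors = tfidf(docs)
sim_tests = cosine(vectors[0], vectors[1])   # both mention unit tests
sim_other = cosine(vectors[1], vectors[2])   # no shared terms
print(sim_tests > sim_other)
```

Vectors like these can then be passed to a clustering algorithm, or reduced in dimension first, so that comments about similar problems end up close together.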

Contributors

maelfabien
