Awesome Machine Learning On Source Code

A curated list of awesome research papers, datasets and software projects devoted to machine learning and source code. #MLonCode

Contents

  • Digests
  • Conferences
  • Competitions
  • Papers
  • Posts
  • Talks
  • Software
  • Datasets
  • Credits
  • Contributions
  • License

    Conferences

    Competitions

    • CodRep - competition on automatic program repair: given a source line, find the insertion point.

    Papers

    Program Synthesis and Induction

    Source Code Analysis and Language modeling

    Neural Network Architectures and Algorithms

    Embeddings in Software Engineering

    Program Translation

    Code Suggestion and Completion

    Program Repair and Bug Detection

    APIs and Code Mining

    Code Optimization

    Topic Modeling

    Sentiment Analysis

    Code Summarization

    Clone Detection

    Differentiable Interpreters

    Related research

    AST Differencing

    Binary Data Modeling

    Soft Clustering Using T-mixture Models

    Natural Language Parsing and Comprehension

    Posts

    Talks

    Software

    Machine Learning

    • Differentiable Neural Computer (DNC) - TensorFlow implementation of the Differentiable Neural Computer.
    • sourced.ml - Abstracts feature extraction from source code syntax trees and working with ML models.
    • vecino - Finds similar Git repositories.
    • apollo - Source code deduplication at scale, research.
    • gemini - Source code deduplication at scale, production.
    • enry - Insanely fast file-based programming language detector.
    • hercules - Git repository mining framework with batteries on top of go-git.
    • DeepCS - Keras and PyTorch implementations of DeepCS (Deep Code Search).
    • Code Neuron - Recurrent neural network to detect code blocks in natural language text.
    • Naturalize - Language-agnostic framework for learning coding conventions from a codebase and then exploiting this information to suggest better identifier names and formatting changes in the code.
    • Extreme Source Code Summarization - Convolutional attention neural network that learns to summarize source code into a short method name-like summary by just looking at the source code tokens.
    • Summarizing Source Code using a Neural Attention Model - CODE-NN, uses LSTM networks with attention to produce sentences that describe C# code snippets and SQL queries from Stack Overflow. Implemented in Torch; targets C# and SQL.
    • Probabilistic API Miner - Near parameter-free probabilistic algorithm for mining the most interesting API patterns from a list of API call sequences.
    • Interesting Sequence Miner - Novel algorithm that mines the most interesting sequences under a probabilistic model. It is able to efficiently infer interesting sequences directly from the database.
    • TASSAL - Tool for the automatic summarization of source code using autofolding. Autofolding automatically creates a summary of a source code file by folding non-essential code and comment blocks.
    • JNice2Predict - Efficient and scalable open-source framework for structured prediction, enabling one to build new statistical engines more quickly.
    • Clone Digger - Clone detection for Python and Java.
    • Sensibility - Uses LSTMs to detect and correct syntax errors in Java source code.
    • DeepBugs - Framework for learning bug detectors from an existing code corpus.
    • DeepSim - Deep learning-based approach to measuring the functional similarity of code.
    • rnn-autocomplete - Neural code autocompletion with RNN (bachelor's thesis).

    Utilities

    • go-git - Highly extensible Git implementation in pure Go which is friendly to data mining.
    • bblfsh - Self-hosted server for source code parsing.
    • engine - Scalable and distributed data retrieval pipeline for source code.
    • minhashcuda - Weighted MinHash implementation on CUDA to efficiently find duplicates.
    • kmcuda - k-means on CUDA to cluster and to search for nearest neighbors in dense space.
    • wmd-relax - Python package which finds nearest neighbors at Word Mover's Distance.
    • Tregex, Tsurgeon and Semgrex - Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions").
    • source{d} models - Machine Learning models for MLonCode trained using the source{d} stack.

    Datasets

    Credits

    Contributions

    See CONTRIBUTING.md. TL;DR: create a pull request which is signed off.

    License

    License: CC BY-SA 4.0
