Giter Club home page Giter Club logo

atomizer's Introduction

Atomizer

Build Status

Atomizer is an experimental static analysis tool for Erlang programs.

The Erlang compiler can warn you if you make a typo in a function name or a variable. Atomizer is an experiment in creating a tool that can warn about typos in Erlang atoms. Dialyzer might be able to detect misfit atoms in some cases, but its powers are limited. For example, it can never find a misfit atom in a pattern matching.

Given an Erlang program, atomizer does the following.

  1. Parse all *.erl files and extract all the atoms from them.
  2. Compute the normal form of each atom.
  3. Find all pairs of distinct atoms with the same normal form.
  4. Filter out all pairs of distinct atoms that are likely false positives.
  5. For each pair of distinct atoms guess which of them is likely the typo.

The most important parts of this algorithm, i.e. computation of the normal form, filtering false positives and guessing the typo, are heuristical. Finding a satisfying balance between sensitivity and precision for them is a challenge. The current implementation of the heuristics considers an atom misfit if it is spelled with an outlier casing, e.g. ip_address instead of ipAddress.

An atom is in the normal form when it is all lowercase and does not contain dashes and underscores. This definition ensures that atoms spelled in camelCase, snake_case, SCREAMING_SNAKE_CASE, kebab-case and their combination all have the same normal form.

A pair of distinct atoms with matching normal forms is only reported if all of the following is true.

  • One of the atoms occurs in the program at least n times more often than the other (n is arbitrarily chosen to be 4).
  • The atom with fewer occurrences only occurs in a single .erl file.
  • The atom with more occurrences occurs in the same file as the other one.

In a pair of distinct atoms with matching normal forms, the atom with the fewest occurrences in the program is considered to be the typo.

Atomizer is designed to be run on a large codebase. It utilizes concurrency for traversing the file system, parsing, computing normal forms and filtering.

Atomizer with the above choice of heuristics was able to find several bugs in a large (5MLOC) legacy Erlang application at Ericsson. The success was largely due to the fact that the application included a lot of machine-generated Erlang code from ASN.1 definitions. This generated code had a hodgepodge of naming conventions in the atoms that spead to the rest of the codebase.

Compile with make, run ./bin/atomizer.

atomizer's People

Contributors

aztek avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.