ProSeries Crash Analysis Tool (CAT)

CAT is a tool for offline crash report analysis that allows faster, more precise queries of ProSeries crash data. It includes a QuickBase crash report downloader, an XML parser, pandas DataFrame helper functions, and some text analysis tools.

If you are new to Python 3 [1], Jupyter notebooks [2], or pandas [3], check out the References section.

After installation, check out CrashAnalysisTour.ipynb to see examples of the commands available with CAT. Copy and rename ExampleNotebook.ipynb to create a new crash report.
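The tour notebook remains the place to see CAT's actual commands. As a rough, library-independent illustration of the kind of workflow described above (parse XML crash reports, then query them), here is a minimal sketch that uses only the Python standard library. The XML layout, the field names, and the functions `parse_report` and `count_by_field` are assumptions made for this example and are not part of the crash_analysis library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up XML for illustration only; real ProSeries crash reports will differ.
SAMPLE_REPORTS = [
    "<crash><version>1.0</version><message>NullReference in FormA</message></crash>",
    "<crash><version>1.1</version><message>Timeout while saving</message></crash>",
    "<crash><version>1.0</version><message>NullReference in FormB</message></crash>",
]


def parse_report(xml_text):
    """Flatten one report into a dict mapping each child tag to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def count_by_field(records, field):
    """Count how many records share each value of `field`."""
    return Counter(record.get(field, "") for record in records)


records = [parse_report(text) for text in SAMPLE_REPORTS]
version_counts = count_by_field(records, "version")

if __name__ == "__main__":
    for version, count in sorted(version_counts.items()):
        print(version, count)
```

A list of dicts like `records` can also be passed directly to `pandas.DataFrame(records)` when column-wise queries are more convenient.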

Installation

Prerequisite Software

Python 3 is required (3.5+ preferred). We recommend installing Python with Anaconda.
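To confirm which interpreter is active, a small check like the following can help. It is only a sketch: the function name `classify_python` and its three labels are invented for this example, and the thresholds simply restate the requirement above.

```python
import sys


def classify_python(version_info=sys.version_info):
    """Classify an interpreter version against the stated requirement."""
    major, minor = version_info[0], version_info[1]
    if major < 3:
        return "unsupported"  # Python 3 is required
    if (major, minor) < (3, 5):
        return "supported"    # Python 3, but older than the preferred 3.5
    return "preferred"        # 3.5 or newer


if __name__ == "__main__":
    print("Python {}.{}: {}".format(
        sys.version_info[0], sys.version_info[1], classify_python()))
```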

PyCharm or the Visual Studio Python Plugin is recommended for editing the crash_analysis library, but not required.

Git is also required for editing the crash_analysis library.

Set Up

In a terminal or command prompt, do the following:

  1. Download this repository: git clone https://github.intuit.com/arosengarten/CrashAnalysisTool.git. If you don't have git installed, you can instead download the repository from its webpage by clicking the "Clone or Download" button and selecting "Download Zip". However, without a git clone of the repo you will not be able to make lasting changes to the tool.

  2. Go inside the directory: cd CrashAnalysisTool. If you downloaded the zip file, extract it and go inside that directory.

  3. (Recommended) Create a virtual environment: conda create --name cat35 python=3.5. Otherwise, ensure that Python 3.5+ is your default python installation.

  4. (Recommended) Activate the virtual environment: source activate cat35 on OSX/Linux, or activate cat35 on Windows.

  5. Install required python packages: pip install -r requirements.txt

  6. Open or create crash_analysis/private.py and input the database id, username, password, and app token as strings. See internal ProSeries wiki for details.
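For step 6, the file might look roughly like the sketch below. The variable names are assumptions made for illustration; the names the library actually expects, and the real values, are documented in the internal ProSeries wiki.

```python
# crash_analysis/private.py
# Hypothetical layout: these variable names are assumptions, not the
# library's confirmed names. Check the internal ProSeries wiki.

# All four settings are plain strings; the values here are placeholders.
DATABASE_ID = "your-database-id"
USERNAME = "your-username"
PASSWORD = "your-password"
APP_TOKEN = "your-app-token"
```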

(Optional) Developer Setup

To contribute to the crash_analysis library, we recommend installing some extra Python packages. Activate your cat35 environment (step 4 in Set Up) and, from the CrashAnalysisTool directory, run the following commands:

pip install -r crash_analysis/module_requirements.txt
pip install -r crash_analysis/dev_requirements.txt
  • module_requirements.txt includes packages such as scikit-learn and gensim, which are necessary for the machine-learning modules in the library (not currently publicly accessible).
  • dev_requirements.txt includes packages that promote higher code quality, namely a Python linter (flake8/hacking) and a type checker (mypy).

Creating crash reports

  1. Open a command prompt or terminal inside the CrashAnalysisTool directory on your machine.

  2. In the command prompt or terminal, start the jupyter notebook: jupyter notebook

  3. A browser window should open up. Open src/ExampleNotebook.ipynb, copy it (File > Make A Copy...), and begin crash reporting!

References

  1. Learn X in Y Minutes where X = Python 3

  2. Jupyter Notebook Quickstart

  3. 10 minutes to pandas

Latest Changes

Oct 26, 2017

  • Revised documentation (this readme, docstrings in lib, and explicit comments in the example notebook)
  • Added types and doctests to a few modules.
  • Added dev requirements

April 14, 2017

  • Added QuickBase downloader that can download crashes by time range in parallel
  • Curated ExampleNotebook and CrashAnalysisTour
  • Completely upgraded to Python 3
  • (Finally) started writing documentation
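The idea of downloading by time range in parallel can be sketched as follows. This is not the code in downloader.py: `split_time_range`, `fetch_window`, and `download_in_parallel` are hypothetical names, and `fetch_window` is a stand-in that returns a label instead of contacting QuickBase.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def split_time_range(start, end, step):
    """Split [start, end) into consecutive (window_start, window_end) pairs."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


def fetch_window(window):
    """Hypothetical stand-in for a real download; returns a label only."""
    window_start, window_end = window
    return "{} to {}".format(window_start.date(), window_end.date())


def download_in_parallel(start, end, step, fetch=fetch_window, max_workers=4):
    """Fetch every window on a thread pool; results keep the window order."""
    windows = split_time_range(start, end, step)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, windows))


results = download_in_parallel(
    datetime(2017, 4, 1), datetime(2017, 4, 8), timedelta(days=3))
```

Threads suit this sketch because downloads are typically I/O-bound, and `pool.map` returns results in the same order as the windows.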

TODO

  • Finish adding type annotations
  • Use hacking/flake8 to lint the project and make sure it adheres to the community style guide
  • Refactor or gut analysis.py, which has not been used in a while.
  • Add unit tests (specifically to parser.py, downloader.py, and maybe analysis.py)
  • Finish adding doctests (specifically to preprocess.py). dataframe_helper.py is fully doctested.
  • (optional) Create sphinx documentation for project (put in root/docs/ directory)
  • (optional) Reorganize modules into subpackages (e.g. parser.py, quickbase.py, and downloader.py could go into a download sub-package)
  • (reach) Rehash document clustering investigation (see kmeans.py and lda.py). Maybe with more time and effort, ML could be useful for crash analysis.
  • (reach) Refactor downloader subpackage to live update data into an AWS database. Refactor notebooks to get data from AWS DB instead of manually downloading files.
