DocuScraper

About

This is a document scraper modeled after the `grep` command from the Linux command line. It compares the content of user-uploaded files against regex patterns, declared as either RegExp objects

let regEmail = new RegExp(`${text['0']}[a-zA-Z0-9_.]+@${text['1']}[a-zA-Z0-9_.]+\\.[${text['2']}]`, 'g');

or regex literals

let regEmail = /\b[a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.[a-zA-Z0-9_.]+\b/g;
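To illustrate the grep-like comparison, here is a minimal sketch that returns every line of some file content matching a pattern. The `grepLines` helper and the sample text are hypothetical and not taken from the project source; the email pattern follows the literal above, with the dot before the domain suffix escaped.

```javascript
// Hypothetical helper: return each line of `content` that contains a match,
// together with its 1-based line number (similar to `grep -n`).
function grepLines(content, pattern) {
  // Rebuild the pattern without the 'g'/'y' flags so that test() is stateless;
  // a global regex remembers lastIndex between calls.
  const flags = pattern.flags.replace(/[gy]/g, '');
  const lineMatcher = new RegExp(pattern.source, flags);

  return content
    .split(/\r?\n/)
    .map((text, index) => ({ line: index + 1, text }))
    .filter(({ text }) => lineMatcher.test(text));
}

const regEmail = /\b[a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.[a-zA-Z0-9_.]+\b/g;
const sample = 'first line\ncontact: jane.doe@example.com\nlast line';
const hits = grepLines(sample, regEmail);
console.log(hits);
```

Dropping the global flag before calling `test()` is a deliberate choice in this sketch: it avoids skipped lines caused by the regex carrying its `lastIndex` from one line to the next.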

Configurable regex params

  • email addresses
  • IP addresses
  • phone numbers
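As a rough sketch of what patterns for the other two categories could look like, the example below defines simplified IP address and phone number regexes. Both patterns, the `patterns` object, and the `findAll` helper are assumptions for illustration only; the project's actual patterns may differ.

```javascript
// Assumed, simplified patterns; not taken from the project source.
const patterns = {
  // Four groups of 1-3 digits separated by dots (does not check the 0-255 range).
  ip: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
  // North American style numbers such as 555-123-4567 or (555) 123-4567.
  phone: /(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b/g,
};

// Hypothetical helper: collect every match of a global pattern in `content`.
function findAll(content, pattern) {
  // With a global regex, String.prototype.match returns all matches or null.
  return content.match(pattern) || [];
}

const text = 'server 192.168.0.1 down, call (555) 123-4567 or 555-987-6543';
const ips = findAll(text, patterns.ip);
const phones = findAll(text, patterns.phone);
console.log(ips, phones);
```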

Installation

  • Fork the repo and clone it to your local machine
  • Install dependencies via `npm i`
  • Read the following scripts to understand their functionality

Scripts

  • creates a development bundle that's served to the browser
$ npm start
  • creates a production bundle that compresses file content to maximize performance
$ npm run build
  • creates a development bundle that runs in webpack-dev-server, a local development server that automatically reloads the browser when code changes, with no need to refresh the page manually
$ npm run dev
  • runs all Jest tests
$ npm test
  • resets the project repo by removing all bundle files and node_modules; use it when there are dependency conflicts in webpack
$ npm run reset
  • prints a tree of the repo's file structure in case you need to find a file
$ npm run printDir
./
├── AWS.config.json
├── README.md
├── __tests__
│   ├── __snapshots__
│   │   └── snapshot.test.js.snap
│   ├── api.test.js
│   ├── data
│   │   ├── sorted.txt
│   │   ├── test.txt
│   │   └── test2.txt
│   ├── fileToArray.test.js
│   ├── snapshot.test.js
│   └── sort.test.js
├── babel.config.json
├── client
│   ├── dist
│   │   ├── bundle.js
│   │   ├── bundle.js.LICENSE.txt
│   │   ├── index.html
│   │   ├── styles.css
│   │   └── styles.scss
│   └── src
│       ├── App.js
│       ├── Landing.js
│       ├── OptionModal.js
│       ├── Options.js
│       ├── components.js
│       └── index.js
├── component.js
├── jest.config.js
├── package-lock.json
├── package.json
├── server
│   ├── controllers
│   │   ├── AWS.js
│   │   ├── grep.js
│   │   └── sort.js
│   ├── routes
│   │   ├── getFiles.js
│   │   ├── grepFiles.js
│   │   └── test.js
│   └── server.js
└── webpack.config.js

Contributors

  • lawsan92
