
FASS - FastAPI - Selenium - Scraper

This simple server enables scraping of websites with dynamic content. It exposes the parser via a REST API at http://localhost:8000/parse and accepts POST requests such as:

curl -X POST "http://localhost:8000/parse/" -H "Content-Type: application/json" -d '[
            {
                "url": "https://github.com/pymzml/pymzML/",
                "name": "Github stars",
                "delay": "1",
                "patterns": [
                    {
                        "name": "Star Counter",
                        "regex": "Counter js-social-count\\\">(?P<Stars>[0-9]*)</span>"
                    }
                ]
            }
        ]'

The payload contains a list of websites to scrape. Each entry has a "url", a "name", a "delay" in seconds and a list of "patterns". The first two fields are self-explanatory; "delay" defines how many seconds the Selenium driver should wait before the page is scraped. "patterns" is a list of entities to extract from the page, each defined by a Python regular expression ("regex") and a "name" that is used as the key in the returned JSON.
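Writing the JSON by hand inside a single-quoted shell string makes the regex escaping (the \\\" sequence in the curl example) easy to get wrong. A minimal sketch, assuming you build the same payload in Python instead: json.dumps escapes the backslash and the double quote in the regex for you. The helper make_site is hypothetical and not part of FASS.

```python
import json


def make_site(url, name, delay, patterns):
    """Hypothetical helper: build one site entry shaped like the curl example."""
    return {
        "url": url,
        "name": name,
        "delay": str(delay),  # the README example sends the delay as a string
        "patterns": [{"name": n, "regex": r} for n, r in patterns],
    }


# Raw string: the regex contains \" so it matches the literal double quote
# that closes the class attribute in the page's HTML.
STAR_REGEX = r'Counter js-social-count\">(?P<Stars>[0-9]*)</span>'

payload = [
    make_site(
        "https://github.com/pymzml/pymzML/",
        "Github stars",
        1,
        [("Star Counter", STAR_REGEX)],
    )
]

# json.dumps turns \" into \\\" in the JSON text, exactly as in the curl body.
body = json.dumps(payload, indent=2)
print(body)
```

The printed body can be passed to curl with -d @file or sent directly from Python.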

The example above returns:

{
    "name":"Github stars",
    "all_fields_matched":true,
    "Star Counter":["154"]
}

Note that the matched values are always a list, since all occurrences on the page are matched. If the regex defines multiple groups, the returned list contains tuples (one per match).
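The list/tuple behaviour described above is what Python's re.findall does: with zero or one capturing group it returns a list of strings, with two or more groups a list of tuples. The sketch below is a hypothetical reconstruction of how a result like the one shown could be assembled from already-rendered HTML. The function name extract, the second pattern "Label and count", and the rule that all_fields_matched is true only when every pattern matches at least once are assumptions, not taken from the FASS source.

```python
import re


def extract(html, site):
    """Hypothetical: apply every pattern of one site entry to rendered HTML."""
    result = {"name": site["name"]}
    all_matched = True
    for pattern in site["patterns"]:
        # findall returns all non-overlapping matches on the page:
        # a list of strings for <= 1 group, a list of tuples for >= 2 groups.
        matches = re.findall(pattern["regex"], html)
        result[pattern["name"]] = matches
        if not matches:
            all_matched = False  # assumed meaning of all_fields_matched
    result["all_fields_matched"] = all_matched
    return result


site = {
    "name": "Github stars",
    "patterns": [
        # One named group -> list of strings.
        {"name": "Star Counter",
         "regex": r'Counter js-social-count\">(?P<Stars>[0-9]*)</span>'},
        # Hypothetical pattern with two groups -> list of tuples.
        {"name": "Label and count",
         "regex": r'<span class="(\w+)">([0-9]+)</span>'},
    ],
}
html = '<span class="Counter js-social-count">154</span><span class="forks">12</span>'
print(extract(html, site))
```

Here "Star Counter" yields ['154'] and "Label and count" yields [('forks', '12')]. Python's json module serializes such tuples as JSON arrays, e.g. [["forks", "12"]].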

Installation

From source

Clone this repo and build the image:

docker build -t fass_app .

From Docker Hub

docker pull zerealfu/fass:latest

Running the service

docker run -d -p 8000:8000 fass_app

or, if you pulled the image from Docker Hub:

docker run -d -p 8000:8000 zerealfu/fass:latest

Then execute the example request:

curl -X POST "http://localhost:8000/parse/" -H "Content-Type: application/json" -d '[
            {
                "url": "https://github.com/pymzml/pymzML/",
                "name": "Github stars",
                "delay": "1",
                "patterns": [
                    {
                        "name": "Star Counter",
                        "regex": "Counter js-social-count\\\">(?P<Stars>[0-9]*)</span>"
                    }
                ]
            }
        ]'
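If you prefer calling the service from Python rather than curl, a minimal sketch using only the standard library could look like this. It assumes the container from the previous step is listening on localhost:8000; build_parse_request and send are hypothetical helper names, and the 60-second timeout is an arbitrary choice meant to leave room for the page "delay" plus page loading.

```python
import json
import urllib.request


def build_parse_request(sites, base_url="http://localhost:8000"):
    """Hypothetical helper: prepare the same POST request as the curl example."""
    return urllib.request.Request(
        f"{base_url}/parse/",
        data=json.dumps(sites).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response (needs the running service)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sites = [
    {
        "url": "https://github.com/pymzml/pymzML/",
        "name": "Github stars",
        "delay": "1",
        "patterns": [
            {
                "name": "Star Counter",
                "regex": r'Counter js-social-count\">(?P<Stars>[0-9]*)</span>',
            }
        ],
    }
]

request = build_parse_request(sites)
# print(send(request))  # uncomment once the container is running
```

Building the request separately from sending it makes it easy to inspect the URL, method and body before the service is up.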

Have fun :)


fass's Issues

Use `latest` tag for Docker image

Someone mentioned this project on a Reddit thread the other week, and I noticed there wasn't an Unraid template for it, so I made one (should be available on the Unraid Community Apps store within a few hours). To make maintaining the template easier, can you please use the latest tag for the latest version of each Docker container you make (in addition to the usual explicit version number), so that users only have to target latest rather than needing to know the version number.
