jumia_market_scraper's Introduction

License: MIT Replit Badge Contributor Covenant Python 3.11 Build status Coverage Status

Jumia_API-like Data Scraping with Selenium and Flask

Bots must follow these rules, taken from Jumia's robots.txt:

  • Site scraping is permitted if the user-agent clearly identifies the client as a bot and names the bot owner, and the bot makes fewer than 200 requests per minute.
  • Bot identification must include an owner URL or contact, in case Jumia needs to reach the owner.
  • Bots with a fake user-agent will be blocked.
  • Bots that use too many IPs to increase performance may also be blocked.
  • If you need more than 200 RPM, contact the email techops at jumia com.
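One way to respect the rules above is to send an identifying user-agent and throttle requests on the client side. The sketch below is only an illustration, not code from this repository: `build_user_agent`, `RequestThrottle`, and the example bot name, URL, and email are all hypothetical.

```python
import time
from collections import deque


def build_user_agent(bot_name, owner_url, contact_email):
    """Return a user-agent string that names the bot and how to reach its owner."""
    return f"{bot_name} (+{owner_url}; {contact_email})"


class RequestThrottle:
    """Sliding-window limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests=200, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._timestamps = deque()

    def _drop_expired(self, now):
        # Forget requests that are older than the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def wait(self):
        """Block until another request is allowed, then record it."""
        now = self._clock()
        self._drop_expired(now)
        if len(self._timestamps) >= self.max_requests:
            # Sleep until the oldest recorded request leaves the window.
            self._sleep(self.window_seconds - (now - self._timestamps[0]))
            now = self._clock()
            self._drop_expired(now)
        self._timestamps.append(now)
```

Calling `throttle.wait()` before every page load keeps the scraper under the configured limit. Setting `max_requests` well below 200 leaves some headroom.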

This project builds an API that scrapes product information from Jumia web pages. It uses the Selenium library for web scraping and the Flask microframework to serve the API.

How it Works:

Web Scraping with Selenium:

  • Selenium is used to automate the process of opening a web browser, navigating to the desired webpage (e.g., a Jumia search results page), and extracting the necessary data.
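The Selenium step might look roughly like the sketch below. It is not the code in main.py. The search URL pattern, the CSS selectors (`article.prd`, `.name`, `.prc`), and the function names are assumptions for illustration and would need checking against the live page.

```python
from urllib.parse import urlencode

# Assumption: illustrative search URL; the real site and pattern may differ.
SEARCH_URL = "https://www.jumia.com.ng/catalog/"


def build_search_url(query, page=1):
    """Build the URL of one search results page."""
    return f"{SEARCH_URL}?{urlencode({'q': query, 'page': page})}"


def scrape_search_page(query, page=1):
    """Open one results page in headless Chrome and return a list of product dicts."""
    # Imported lazily so build_search_url works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(build_search_url(query, page))
        products = []
        # Assumption: each product card is an <article class="prd"> element.
        for card in driver.find_elements(By.CSS_SELECTOR, "article.prd"):
            products.append({
                "title": card.find_element(By.CSS_SELECTOR, ".name").text,
                "price": card.find_element(By.CSS_SELECTOR, ".prc").text,
                "image_url": card.find_element(By.CSS_SELECTOR, "img").get_attribute("src"),
            })
        return products
    finally:
        # Always close the browser, even if a selector fails.
        driver.quit()
```

The rating is left out here because a card may not carry one. A real scraper would need to handle missing elements for every field.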

Flask:

  • Flask is used to create an API endpoint that allows external applications to access the scraped data.
  • The API endpoint can be called with specific parameters, such as the search query or product category, to retrieve the relevant data.
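As a rough sketch of how such an endpoint could be wired up, the example below takes the scraping function as a parameter and returns its results as JSON. The route `/search/<query>/<page>` and the names `make_payload` and `create_app` are hypothetical and are not the routes defined in this project.

```python
def make_payload(query, page, products):
    """Build the JSON-serialisable response body for one scraped page."""
    return {
        "query": query,
        "page": page,
        "count": len(products),
        "products": products,
    }


def create_app(scrape_page):
    """Create a Flask app; scrape_page(query, page) must return a list of dicts."""
    # Imported here so make_payload can be reused without Flask installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/search/<query>/<int:page>")
    def search(query, page):
        # Run the scraper for the requested page and wrap the result as JSON.
        return jsonify(make_payload(query, page, scrape_page(query, page)))

    return app
```

Passing the scraper in as an argument keeps the Flask layer separate from Selenium, so the route can be exercised with a stub function. `create_app(my_scraper).run()` would serve it on Flask's default port, 5000.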

Valuable Information Retrieved:

The data scraped from the Jumia search results page typically includes the following valuable information:

  • Product title

  • Product price

  • Product image URL

  • Product rating
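The four fields above could be modelled as a small record type. This is only a sketch of one possible shape. The `Product` class, its field names, and the choice to keep the price as the scraped string are assumptions, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Product:
    title: str
    price: str                      # kept as scraped text, e.g. with currency symbol
    image_url: str
    rating: Optional[float] = None  # None when the listing shows no rating


def to_json(products):
    """Serialise a list of Product records to a JSON array."""
    return json.dumps([asdict(p) for p in products], ensure_ascii=False)
```

For example, `to_json([Product("Phone X", "100,000", "https://example.com/x.jpg", 4.5)])` produces a JSON array holding one object with the keys `title`, `price`, `image_url`, and `rating`.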

Example Usage:

To use the API, a client application can make a request to the API endpoint with the desired parameters. For instance, to scrape data for products related to "mobile phones," the client application would send a request to the endpoint with the search query "mobile phones."

Upon receiving the request, the API would invoke Selenium to scrape the Jumia search results page and extract the relevant product information. This information would then be returned to the client application in a structured format, such as JSON.

Requirements

This package requires the following to run:

  • Python 3.11
  • Selenium
  • Flask

Installation

First, clone the Git repository.

Change directory into the repository (or open it in a text editor); that is where the main Python file, main.py, lives. Then activate your virtual environment (venv).

Then install the dependencies:

pip install -r requirements.txt

Usage

Run the Python file:

python main.py

Endpoints:

  • GET / - Homepage (this page)
  • GET /product_name/{number_of_page} - Scrapes products from Jumia based on page number
  • GET /product_name/{discount_percentage}/{number_of_page} - Scrapes products with a given discount percentage from Jumia based on page number

Example:

    localhost/get_all/phones/2

Here, 2 is the number_of_page value.
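A client could call the example URL above with nothing but the standard library. The sketch below assumes the server listens on `http://localhost:5000` (Flask's default port) and that the response body is JSON. Both are assumptions, and `build_endpoint_url` and `fetch_products` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_endpoint_url(base_url, product_name, number_of_page):
    """Build a URL shaped like the example: <base>/get_all/<product_name>/<number_of_page>."""
    return f"{base_url.rstrip('/')}/get_all/{quote(product_name)}/{number_of_page}"


def fetch_products(base_url, product_name, number_of_page, timeout=60):
    """Request the endpoint and decode the JSON body (assumed response format)."""
    url = build_endpoint_url(base_url, product_name, number_of_page)
    # Scraping with a browser is slow, so allow a generous timeout.
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running locally, `fetch_products("http://localhost:5000", "phones", 2)` would request the same path as the example.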

Contribution

You can contribute to this project: clone the repo locally and commit your code on a separate branch.

You can also reach me via email or, better yet, send me a Twitter DM.

License

MIT license.

jumia_market_scraper's People

Contributors

arinze1020
