Multinational Retail Data Centralisation

Table of Contents

  1. Introduction
  2. Installation instructions
  3. Usage instructions
  4. File structure
  5. License information

Introduction

Multinational Retail Data Centralisation is a project that pulls data relating to a retail business from several sources and collates it into a single database. The data sources come in a variety of formats (PostgreSQL database, PDF, CSV, JSON) hosted on AWS, and each is accessed through a different method in order to exercise different tools.

Installation instructions

To install the Multinational Retail Data Centralisation project, clone and enter the repository:

git clone https://github.com/Ciaran-Mu/multinational-retail-data-centralisation.git
cd multinational-retail-data-centralisation

Before running the program, credentials for local and remote database access must be placed in a secret file called db_creds.yaml (not included in this repository). The credentials file should contain the following:

RDS_HOST: #HOST
RDS_PASSWORD: #PASSWORD
RDS_USER: #USER
RDS_DATABASE: #DATABASE
RDS_PORT: #PORT

LOCAL_HOST: #HOST
LOCAL_PASSWORD: #PASSWORD
LOCAL_USER: #USER
LOCAL_DATABASE: #DATABASE
LOCAL_PORT: #PORT
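
For reference, the credentials can be loaded and turned into a database connection roughly as follows (a minimal sketch assuming PyYAML and SQLAlchemy; the key names match the template above, but the exact code in database_utils.py may differ):

# Minimal sketch: load db_creds.yaml and build a SQLAlchemy engine.
# Assumes PyYAML and SQLAlchemy are installed.
import yaml
from sqlalchemy import create_engine

with open("db_creds.yaml") as f:
    creds = yaml.safe_load(f)

engine = create_engine(
    f"postgresql://{creds['RDS_USER']}:{creds['RDS_PASSWORD']}"
    f"@{creds['RDS_HOST']}:{creds['RDS_PORT']}/{creds['RDS_DATABASE']}"
)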

Usage instructions

Run the main file:

python main.py

Note that some functionality will not be accessible without the correct database credentials, which are read from the secret file db_creds.yaml described above.

The program runs through the stages of extracting the data, cleaning it, uploading it to a new local database and finally querying that database.
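
As an illustration, the overall flow of main.py can be sketched as follows (method and table names here are hypothetical; the real program defines its own):

# Hypothetical sketch of the pipeline; the actual main.py and its method
# names may differ. See "File structure" below for the real classes.
from database_utils import DatabaseConnector
from data_extraction import DataExtractor
from data_cleaning import DataCleaning

connector = DatabaseConnector()                                  # reads db_creds.yaml
extractor = DataExtractor()

raw_users = extractor.read_rds_table(connector, "legacy_users")  # hypothetical method and table
clean_users = DataCleaning.clean_user_data(raw_users)            # hypothetical static method
connector.upload_to_db(clean_users, "dim_users")                 # hypothetical upload method
# ...repeated for the other sources, followed by the SQL queries in data_querying.py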

The screenshots below show the typical terminal output of the program.

[Screenshots: Output 1, Output 2, Output 3]

File structure

database_utils.py contains the classes 'DatabaseConnector' and 'DatabaseModifier', which initiate a connection to a database and modify an existing database, respectively.
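
As an illustration, uploading a cleaned table to the local database amounts to something like the sketch below (the connection URL and table name are placeholders; the real upload logic lives in 'DatabaseConnector' and 'DatabaseModifier'):

import pandas as pd
from sqlalchemy import create_engine

# Illustrative only: write a cleaned DataFrame to the local Postgres database.
local_engine = create_engine("postgresql://USER:PASSWORD@localhost:5432/DATABASE")  # placeholder URL
cleaned_df = pd.DataFrame()  # stands in for a cleaned table
cleaned_df.to_sql("dim_users", local_engine, if_exists="replace", index=False)  # placeholder table name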

data_extraction.py contains the class 'DataExtractor', which provides methods for extracting data from several different sources.
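
A rough illustration of reading the non-database formats, assuming pandas, requests and tabula-py (the paths and URL are placeholders and the actual 'DataExtractor' methods may differ):

import pandas as pd
import requests
import tabula  # tabula-py: extracts tables from PDFs (requires Java)

# Paths and URLs below are placeholders, not the project's real sources.
csv_df = pd.read_csv("products.csv")
json_df = pd.DataFrame(requests.get("https://example.com/date_details.json").json())
pdf_tables = tabula.read_pdf("card_details.pdf", pages="all")  # returns a list of DataFrames
pdf_df = pd.concat(pdf_tables, ignore_index=True)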

data_cleaning.py contains the class 'DataCleaning', which provides static methods for performing the data cleaning steps.
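
For example, a cleaning method might take the shape below (illustrative only; the real rules are in data_cleaning.py and the column name is a placeholder):

import pandas as pd

class DataCleaning:
    @staticmethod
    def clean_user_data(df: pd.DataFrame) -> pd.DataFrame:
        # Drop fully-empty rows, parse dates and discard rows that fail to parse.
        df = df.dropna(how="all")
        df["join_date"] = pd.to_datetime(df["join_date"], errors="coerce")  # placeholder column
        return df.dropna(subset=["join_date"])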

data_querying.py contains the class 'DataQuerier', which provides static methods for sending a set of specific SQL queries to a database.
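
One such query method can be sketched like this (the table name and query are placeholders; the real queries are in data_querying.py):

import pandas as pd
from sqlalchemy import text

class DataQuerier:
    @staticmethod
    def stores_per_country(engine) -> pd.DataFrame:
        # Placeholder query: count stores per country in the local database.
        query = text(
            "SELECT country_code, COUNT(*) AS total_stores "
            "FROM dim_store_details GROUP BY country_code"
        )
        return pd.read_sql(query, engine)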

main.py contains the main program, which calls the relevant methods of the classes defined in the Python files above.

License information

MIT License
