PyAzBlob

Python tool to upload files from the local file system to the Azure Storage Blob Service.

Disclaimer

AzCopy is the official tool from Microsoft that, among many other things, implements bulk upload of files from the local file system to the Azure Storage Blob Service. PyAzBlob is a simple console application created in a few hours, mostly for fun and to practice with the Microsoft Azure Storage SDK for Python. However, it implements a couple of features that I find useful for personal use and that are not available in AzCopy.

  _____                     ____  _       _                
 |  __ \         /\        |  _ \| |     | |               
 | |__) |   _   /  \    ___| |_) | | ___ | |__             
 |  ___/ | | | / /\ \  |_  /  _ <| |/ _ \| '_ \            
 | |   | |_| |/ ____ \  / /| |_) | | (_) | |_) |           
 |_|    \__, /_/    \_\/___|____/|_|\___/|_.__/            
         __/ |                                             
        |___/                                              

Features

  • user-friendly console application with integrated help
  • recursive upload of files, preserving the folder structure of the local file system
  • ignored files defined by Unix-style glob patterns
  • logs uploaded files one by one, so that later runs skip files already uploaded to the same Azure Storage container
  • Azure Storage keys can be defined in environment variables or in an .ini file
  • two implementations: an event-based (asynchronous) one and a synchronous one, described below under Branches
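
The glob-based ignore list and the upload log can be pictured with a short sketch. It is not taken from the PyAzBlob source: the function names are hypothetical, and it assumes that both the ignore file and the upload log are plain text files with one entry per line, where lines starting with # are treated as comments.

```python
import fnmatch
import os


def load_lines(path):
    """Return the non-empty, non-comment lines of a text file, or [] if it is missing."""
    if not os.path.isfile(path):
        return []
    with open(path, encoding="utf-8") as handle:
        stripped = (line.strip() for line in handle)
        return [line for line in stripped if line and not line.startswith("#")]


def is_ignored(relative_path, patterns):
    """True if the path, or its file name alone, matches any glob pattern."""
    name = os.path.basename(relative_path)
    return any(
        fnmatch.fnmatch(relative_path, pattern) or fnmatch.fnmatch(name, pattern)
        for pattern in patterns
    )


def files_to_upload(root, patterns, already_uploaded):
    """Yield paths (relative to root, with forward slashes) that still need uploading."""
    for folder, _subfolders, names in os.walk(root):
        for name in sorted(names):
            full_path = os.path.join(folder, name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            # Skip files recorded by an earlier run and files matching an ignore glob.
            if relative in already_uploaded or is_ignored(relative, patterns):
                continue
            yield relative
```

A caller would typically build `patterns` with `load_lines(".pyazblobignore")`, build `already_uploaded` as a set from a log file of its choosing, and append each path to that log right after its upload succeeds.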

Branches

This repository has two branches, with two implementations of the application: one synchronous and one asynchronous.

The async version requires a Shared Access Signature (SAS) from a storage account, whereas the sync version requires the storage account name and an administrative key.

The asynchronous version offers the best performance, especially for large numbers of small files. In performance tests it was about 7 times faster than the synchronous implementation, using a single thread in both cases.
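
The reason a single thread can still gain so much from an asynchronous design is that small uploads spend most of their time waiting on the network. The sketch below only simulates that waiting with `asyncio.sleep`; it is not the PyAzBlob code, the function names are hypothetical, and it makes no attempt to reproduce the 7x figure.

```python
import asyncio
import time


async def fake_upload(name, latency=0.01):
    """Stand-in for one small upload: waits on simulated network I/O."""
    await asyncio.sleep(latency)
    return name


async def upload_sequentially(names):
    """Start each upload only after the previous one has finished."""
    return [await fake_upload(name) for name in names]


async def upload_concurrently(names):
    """Start all uploads at once; the event loop interleaves the waits."""
    return await asyncio.gather(*(fake_upload(name) for name in names))


def timed(coroutine):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coroutine)
    return result, time.perf_counter() - start
```

Calling `timed(upload_sequentially(names))` and `timed(upload_concurrently(names))` on the same list shows the sequential time growing with the number of files, while the concurrent time stays close to the latency of a single upload.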

This branch

This branch (master) contains the synchronous implementation, using the official Microsoft Azure Storage SDK for Python, which automatically handles chunked upload of files larger than 64 MB.
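
The SDK takes care of chunking on its own, so nothing like the following is needed in PyAzBlob. Purely as an illustration of the idea, and not as a description of the SDK's internals, a chunked read of a large file might look like this (the function name is hypothetical):

```python
import io

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the threshold mentioned above


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield successive blocks of at most chunk_size bytes from a binary stream."""
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        yield block


# Small demonstration with a 10-byte in-memory stream and 4-byte chunks.
sizes = [len(block) for block in iter_chunks(io.BytesIO(b"x" * 10), chunk_size=4)]
# sizes == [4, 4, 2]
```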

Requirements

  • Python 3.4 or later
  • An Azure Storage account

How to use

1. Download or clone this repository

# clone repository:
git clone https://github.com/RobertoPrevato/PyAzBlob.git

2. Create Python virtual environment and restore dependencies

# Linux:
python3 -m venv env
env/bin/pip install -r requirements.txt
# Windows:
py -3 -m venv env
env\Scripts\pip install -r requirements.txt

3. Activate Python virtual environment (Optional)

# Linux:
source env/bin/activate
# Windows:
env\Scripts\activate.bat

4. Configure the Azure Storage account

Configure the Azure Storage account name and key in the settings.ini file, which the console application reads when the script runs. The key and name are used only by the official Microsoft Azure Storage SDK for Python, as can be verified in the source code.
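
As a rough illustration of how an .ini file like this can be read with the standard library, here is a sketch. The section name and option names are hypothetical; check the settings.ini shipped with the repository for the names the application actually expects.

```python
import configparser

# Hypothetical layout; the real settings.ini in the repository may use
# different section and option names.
EXAMPLE_SETTINGS = """
[storage]
account_name = mystorageaccount
account_key = <your-account-key>
container_name = backups
"""


def read_storage_settings(text):
    """Parse ini text and return the options of the assumed [storage] section."""
    # Interpolation is disabled so that a '%' in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("storage"):
        raise ValueError("missing [storage] section")
    return dict(parser["storage"])


settings = read_storage_settings(EXAMPLE_SETTINGS)
```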

Recommendations: if you are creating an Azure Storage account for backups, use Standard performance and LRS (Locally Redundant Storage). Use private containers if you want your data to stay private.

4.1 Useful links

The storage account name and keys can be found in the Azure Portal under Settings > Access keys.

5. Run the console application

If the environment was activated, use python; otherwise use env/bin/python on Linux or env\Scripts\python on Windows.

# display the help:
python pyazblob.py -h

Example: upload all files from /home/username/Pictures/ recursively, keeping the folder structure starting from /Pictures/:

python pyazblob.py -p /home/username/Pictures/ -c /home/username/ -r

Upload all files from C:\Users\username\Documents\ recursively, keeping the folder structure starting from \Documents\:

python pyazblob.py -p C:\Users\username\Documents\ -c C:\Users\username\ -r
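
The effect of the two path arguments in these examples can be sketched as follows, assuming that -c names the leading part of the local path that is cut off before the rest is used as the blob name. This reading is inferred from the examples, and the function is hypothetical rather than taken from the PyAzBlob source.

```python
from pathlib import PurePosixPath, PureWindowsPath


def blob_name(file_path, cut_path, windows=False):
    """Return file_path relative to cut_path, using forward slashes."""
    flavour = PureWindowsPath if windows else PurePosixPath
    relative = flavour(file_path).relative_to(flavour(cut_path))
    return relative.as_posix()


linux_name = blob_name("/home/username/Pictures/2017/cat.jpg", "/home/username/")
# linux_name == "Pictures/2017/cat.jpg"

windows_name = blob_name(
    "C:\\Users\\username\\Documents\\notes\\todo.txt",
    "C:\\Users\\username\\",
    windows=True,
)
# windows_name == "Documents/notes/todo.txt"
```

`relative_to` raises `ValueError` when the file is not located under the cut path, which is a convenient early check for a mistyped argument.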

Configuration options

  • define ignored file paths (Unix-style globs) using a .pyazblobignore file
  • define the Azure Storage account key, account name and destination container name using the settings.ini file, or the following environment variables:

Name                 Scope
PYAZ_ACCOUNT_NAME    account name
PYAZ_ACCOUNT_KEY     account key
PYAZ_CONTAINER_NAME  container name
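
One way to combine the two configuration sources is sketched below. The environment variable names come from the list above; the setting names, the function, and the choice to let environment variables take precedence over file values are assumptions made for this example, not documented behavior of PyAzBlob.

```python
import os

# Environment variable names listed above, mapped to setting names that are
# hypothetical and used only in this sketch.
ENV_TO_SETTING = {
    "PYAZ_ACCOUNT_NAME": "account_name",
    "PYAZ_ACCOUNT_KEY": "account_key",
    "PYAZ_CONTAINER_NAME": "container_name",
}


def resolve_settings(file_settings, environ=None):
    """Merge file settings with environment variables; the environment wins here."""
    environ = os.environ if environ is None else environ
    merged = dict(file_settings)
    for variable, setting in ENV_TO_SETTING.items():
        value = environ.get(variable)
        if value:
            merged[setting] = value
    # Fail early if any of the three required values is still missing.
    missing = [name for name in ENV_TO_SETTING.values() if not merged.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return merged
```

Passing an explicit `environ` mapping keeps the function easy to test without touching the real process environment.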
