
download-file's Introduction

File Download

A file download tool that downloads multiple files from remote servers to the local machine.

Tools and Language

  • Python version: 3.7

Activating the dev environment: source ./dev/bin/activate

Deactivating: deactivate

Installing dependencies: pip install -r requirements.txt

Linting code: pylint

To run from localhost, create an FTP password and put it in the folder ftp/passwd.

Notes

  • Supported protocols: HTTP(S) and FTP. The downloader.py file makes it easy to add new protocols.
  • To avoid memory issues with large files, downloads are streamed instead of being loaded into memory all at once.
  • To support downloading multiple files, I use a thread pool executor, so the requests run concurrently rather than one after another.
  • To avoid partial data, any file whose download raises an exception is removed.
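
The streaming, thread pool, and cleanup notes above can be sketched together as follows. This is a minimal illustration, not the project's actual code: it assumes the standard library's urllib rather than whatever client downloader.py uses, and the names download_one, download_all, and CHUNK_SIZE are hypothetical.

```python
import os
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlparse

CHUNK_SIZE = 64 * 1024  # bytes copied per read; an arbitrary choice


def download_one(url, dest_dir):
    """Stream one URL into dest_dir; remove the file if the copy fails."""
    # File name = last part of the URL path, as the README describes.
    name = os.path.basename(urlparse(url).path) or "download"
    path = os.path.join(dest_dir, name)
    with urllib.request.urlopen(url) as response:
        try:
            with open(path, "wb") as out:
                # Copy in fixed-size chunks so the whole file is never in memory.
                shutil.copyfileobj(response, out, CHUNK_SIZE)
        except BaseException:
            # Do not leave partial data behind.
            if os.path.exists(path):
                os.remove(path)
            raise
    return path


def download_all(urls, dest_dir, max_workers=4):
    """Download concurrently; return (saved, failed) dicts keyed by URL."""
    os.makedirs(dest_dir, exist_ok=True)
    saved, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, url, dest_dir): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                saved[url] = future.result()
            except Exception as exc:
                failed[url] = exc
    return saved, failed
```

Collecting failures per URL, instead of stopping at the first exception, lets the remaining downloads finish even when one of them fails.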

Things to improve

  • Handling unique file names better. Currently the file name is taken only from the last part of the URL, which is not guaranteed to be unique. Some ways to improve this:

    • grouping files into folders by the domain of each URL.
    • splitting the URL path and concatenating its parts into the file name.
  • Allowing the user to choose the destination folder. The code already has an option to set the destination folder; with more time, I would read it from user input. Currently, downloaded files are saved to the folder files inside the source code.

  • Guessing the file type when the URL has no file extension. A library named python-magic can detect the file type from the file's content, but the supported file types are limited.

  • Adding integration tests.
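
The two file-naming ideas above (grouping by domain and concatenating the URL path) could be combined roughly as follows. This is only a sketch of one possible approach; the function name local_path_for, the "_" separator, and the fallback names are assumptions, not part of the project.

```python
import os
from urllib.parse import urlparse


def local_path_for(url, dest_dir="files"):
    """Build a local path of the form dest_dir/<domain>/<path parts joined by _>."""
    parsed = urlparse(url)
    # Group by domain; fall back to a placeholder when the URL has no host.
    domain = parsed.hostname or "unknown-host"
    # Drop empty segments and "." / ".." so the name cannot escape dest_dir.
    parts = [p for p in parsed.path.split("/") if p and p not in (".", "..")]
    name = "_".join(parts) or "index"
    return os.path.join(dest_dir, domain, name)
```

With this scheme, two URLs that end in the same file name but differ in domain or directory map to different local paths.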
