Veeam_Python_Developer_Test

This is an assessment for a Python Developer in QA position at Veeam.

The Challenge

Please implement a program that synchronizes two folders: source and replica. The program should maintain a full, identical copy of source folder at replica folder. Solve the test task by writing a program in one of these programming languages:

  • Python
  • C/C++
  • C#

  1. Synchronization must be one-way: after the synchronization, the content of the replica folder should be modified to exactly match the content of the source folder;

  2. Synchronization should be performed periodically.

  3. File creation/copying/removal operations should be logged to a file and to the console output;

  4. Folder paths, synchronization interval and log file path should be provided using the command line arguments;

  5. It is undesirable to use third-party libraries that implement folder synchronization;

  6. It is allowed (and recommended) to use external libraries implementing other well-known algorithms. For example, there is no point in implementing yet another function that calculates MD5 if you need it for the task – it is perfectly acceptable to use a third-party (or built-in) library.

The Final Solution

In my final approach I decided to implement my own version of the main functionality I need from filecmp.

For that I created a Comparer class that compares two paths and produces one list per comparison scenario, which I can in turn use to sync each scenario.

  1. source_only

    • List of the content names that are only in the source folder
  2. replica_only

    • List of the content names that are only in the replica folder
  3. common_dirs

    • List of directory names that are common between source and replica
  4. diff_files

    • List of file names that are common between source and replica and have changed.
    • Uses file name, last modified time and file size for a first comparison to avoid the expensive hashing step.
    • Uses the md5 hashing function from hashlib if any of the previous checks fail.
    • This comparison approach trades security for performance; a production-grade tool might have to focus more on security or provide a flag that lets the user choose the approach that suits them best.
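
The diff_files check described above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual Comparer code: the function names `md5_of` and `files_differ` are hypothetical, and the sketch assumes that two files with equal size and equal modification time are treated as unchanged, with hashing used only when that quick check fails.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are never read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_differ(source_file: Path, replica_file: Path) -> bool:
    """Return True when the replica copy should be refreshed from the source.

    Hypothetical helper: the cheap metadata check runs first, the hash second.
    """
    src_stat = source_file.stat()
    rep_stat = replica_file.stat()
    # Fast path: same size and same modification time, assume unchanged.
    if (src_stat.st_size, src_stat.st_mtime_ns) == (rep_stat.st_size, rep_stat.st_mtime_ns):
        return False
    # Slow path: metadata differs, so compare the actual contents via MD5.
    return md5_of(source_file) != md5_of(replica_file)
```

The fast path is where the tradeoff lives: a file whose content changed while its size and timestamp stayed the same would be missed, which is why a stricter tool might always hash.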

With that I can implement the Synchronizer class the same way I did with the filecmp lib in the naive solution.

The Synchronizer class

class Synchronizer:
    """
    Synchronizer class to sync the source and replica folders

    @param source: pathlib.Path 
        Path to the source folder
    @param replica: pathlib.Path 
        Path to the replica folder
    @param logger: logging.Logger 
        Logger object responsible for logging the actions to a file and to stdout

    All methods operate only on the root level of the source and replica folders, that's why there is a recursive call to the synchronize method in the search_child_folders method.

    @method add_missing_in_replica: Search for files and folders not present in replica but present in source and copy it to replica
    @method remove_extra_in_replica: Search for files and folders not present in source but present in replica and remove it from replica 
    @method sync_changed_files: Search for files that have been changed and sync it to replica
    @method search_child_folders: Recursively search common folders between source and replica
    @method synchronize: Main method to synchronize the source and replica folders

    """

This class implements 4 methods that satisfy the requirements of the challenge.

  1. add_missing_in_replica

    • Checks the source_only list for files or folders that are only present in the source folder and copies them to the replica folder.
  2. remove_extra_in_replica

    • Checks the replica_only list for files or folders that are not present in the source and removes them from the replica folder.
  3. sync_changed_files

    • Checks the diff_files list for items that are present in both folders but have different contents and copies them from the source folder to the replica folder.
  4. search_child_folders

    • Checks for folders present in both the source and replica folders and recurses: it creates a new Synchronizer object for each pair of child folders and runs the same sync process on them, until the two parents have no more child folders in common.
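
The four steps above can be condensed into a single recursive function to show how they fit together. This is only a sketch under stated assumptions: `sync_once` is a hypothetical name, the standard library's `filecmp.cmp` stands in for the custom diff_files check, logging is left out, and symbolic links are not handled.

```python
import filecmp
import shutil
from pathlib import Path


def remove_entry(path: Path) -> None:
    """Delete a file or a whole directory tree from the replica."""
    if path.is_dir():
        shutil.rmtree(path)
    else:
        path.unlink()


def copy_entry(src: Path, dst: Path) -> None:
    """Copy a file or a whole directory tree from source to replica."""
    if src.is_dir():
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)


def sync_once(source: Path, replica: Path) -> None:
    """One-way pass that makes replica mirror source (illustrative only)."""
    replica.mkdir(parents=True, exist_ok=True)
    source_names = {entry.name for entry in source.iterdir()}
    replica_names = {entry.name for entry in replica.iterdir()}

    # Step 1 (add_missing_in_replica): copy what exists only in source.
    for name in sorted(source_names - replica_names):
        copy_entry(source / name, replica / name)

    # Step 2 (remove_extra_in_replica): delete what exists only in replica.
    for name in sorted(replica_names - source_names):
        remove_entry(replica / name)

    for name in sorted(source_names & replica_names):
        src, dst = source / name, replica / name
        if src.is_dir() and dst.is_dir():
            # Step 4 (search_child_folders): recurse into common folders.
            sync_once(src, dst)
        elif src.is_file() and dst.is_file():
            # Step 3 (sync_changed_files): overwrite files whose bytes differ.
            if not filecmp.cmp(src, dst, shallow=False):
                shutil.copy2(src, dst)
        else:
            # Same name but different kinds (file vs. folder): replace it.
            remove_entry(dst)
            copy_entry(src, dst)
```

Calling `sync_once(Path("source"), Path("replica"))` once per interval would give the periodic behavior the challenge asks for.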

Installation

Requirements

Python v3.12

pip v23.3.2

  1. Clone the repository
git clone https://github.com/Desgue/Veeam_Python_Developer_Test.git

Usage

python main.py --source <path_to_source_folder> --replica <path_to_replica_folder> --log <path_to_log_file> [--interval <interval_number_in_seconds>]

Arguments

-h, --help

  • Description: Show the help menu that indicates what each command does and how to use it.
  • Usage: -h or --help

-s, --source

  • Description: Absolute path for the source folder.
  • Usage: --source <absolute_path_to_source_folder> or -s <absolute_path_to_source_folder>
  • Required: True
  • Type: String

-r, --replica

  • Description: Absolute path to the replica folder.
  • Usage: --replica <absolute_path_to_replica_folder> or -r <absolute_path_to_replica_folder>
  • Required: True
  • Type: String

-l, --log

  • Description: Absolute path to the .log file; if the file does not exist it will be created.
  • Usage: -l <absolute_path_to_log_file> or --log <absolute_path_to_log_file>
  • Required: True
  • Type: String

-i, --interval

  • Description: Interval at which the program performs the synchronization task. Expressed in seconds. Default is 60 seconds.
  • Usage: -i <interval_number_in_seconds> or --interval <interval_number_in_seconds>
  • Default: 60s
  • Type: Integer
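
For illustration, the arguments listed above map naturally onto the standard library's argparse module. Whether main.py is actually built this way is an assumption; `build_parser` is a hypothetical name and the help strings are paraphrased from the descriptions above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (sketch, not the real main.py)."""
    parser = argparse.ArgumentParser(
        description="Periodically synchronize a replica folder with a source folder."
    )
    # argparse adds -h/--help automatically.
    parser.add_argument("-s", "--source", type=str, required=True,
                        help="Absolute path to the source folder.")
    parser.add_argument("-r", "--replica", type=str, required=True,
                        help="Absolute path to the replica folder.")
    parser.add_argument("-l", "--log", type=str, required=True,
                        help="Absolute path to the .log file (created if missing).")
    parser.add_argument("-i", "--interval", type=int, default=60,
                        help="Seconds between synchronization runs (default: 60).")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.source, args.replica, args.log, args.interval)
```

With this layout, omitting any of the three required flags makes argparse print a usage error and exit, while omitting --interval falls back to 60 seconds.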

Final Considerations

  1. Error handling could be improved; for that I need to read the docs of each library I am using and understand which exceptions can be raised.
  2. Even though I performed manual testing to ensure all behaviors function as expected, an automated test script could be created to check them thoroughly.
  3. The terminal interface could handle shutdown more gracefully instead of relying on Ctrl+C to stop the script, which would also make it possible to log the end of a script session for further analysis.
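
One possible direction for the third point is sketched below. It still relies on Ctrl+C, but catches the resulting KeyboardInterrupt so the end of the session gets logged. The function name `run_until_interrupted` and its parameters are hypothetical and do not come from the repository.

```python
import logging
import time
from typing import Callable


def run_until_interrupted(
    task: Callable[[], None],
    interval: float,
    logger: logging.Logger,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run task every `interval` seconds until Ctrl+C, then log the session end.

    Returns the number of runs that completed. The `sleep` parameter exists
    only so the loop can be exercised without real waiting.
    """
    runs = 0
    logger.info("Synchronization session started")
    try:
        while True:
            task()
            runs += 1
            sleep(interval)
    except KeyboardInterrupt:
        # Ctrl+C lands here instead of producing a traceback.
        logger.info("Interrupt received, shutting down")
    finally:
        logger.info("Synchronization session ended after %d run(s)", runs)
    return runs
```

Note that an interrupt arriving while `task` is in the middle of copying a file would still abort that copy; a fuller solution might defer the shutdown until the current pass finishes.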

Total time spent on this project was about 8 hours, spread between reading about folder synchronization, searching for and reading the docs of the libraries I decided to use, and actually implementing and refactoring the code.
