harvey's Introduction

Hey, I'm Justin Hammond 👋

Senior Software Engineer @EasyPost, IT Pro, Tech Enthusiast

I love all things tech. I've been programming for 18+ years, tinkering with electronics for 15+ years, and founding or building tech companies for 10+ years. I'm an open source fanatic, Apple fanboy, and love to explore new tech. I spend my time coding open source projects, tinkering with electronics and new tech products, and consulting teams on how to get things done.

Noteworthy Projects

The following are items that may not be represented on my GitHub profile but are noteworthy in the software space:

harvey's People

Contributors

justintime50

harvey's Issues

Add Healthcheck After Deploys

Harvey never actually checks that your container is running; it simply tries to run a container.

There should be functionality to allow Harvey to do a healthcheck on the container prior to marking a deploy/full pipeline successful. This will ensure it didn't start up and immediately exit.
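
A minimal sketch of such a check, assuming a hypothetical `get_status` callable that returns a Docker-style state string such as "running" or "exited" (how Harvey would actually obtain the state is not shown here). The container must be seen running on several consecutive polls before the deploy is considered successful:

```python
import time
from typing import Callable


def wait_until_healthy(
    get_status: Callable[[], str],
    checks: int = 3,
    delay_seconds: float = 1.0,
) -> bool:
    """Return True only if the container is running on every poll.

    get_status is a hypothetical callable (an assumption, not Harvey's API)
    that returns the container's current state string.
    """
    for _ in range(checks):
        # Wait first so a container that exits right after starting is caught.
        time.sleep(delay_seconds)
        if get_status() != "running":
            return False
    return True
```

Polling more than once is the point of the design: a single check right after startup could pass for a container that exits a moment later.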

Cache vs No Cache

Ensure we have the right mix of cache vs. no-cache. It's imperative that each build is unique and pulls in data properly, but for speed we should also cache what we can without carrying over residue from previous runs.

Add Linting

Harvey needs to be linted. However, this should happen after bug fixes and stability improvements.

Switch to Using Docker-Py Instead of Shelling out for Docker Commands

Summary

Initially, Harvey was built as a client library around the Docker API, and the pipelines were then built on top of it. A much more robust client library for Docker, docker-py, already exists, and we should use it instead.

Acceptance Criteria

  • Switch containers and images to use the Docker client library
  • Ensure that we still get timeouts on actions, as we do now with the subprocess module
  • Ensure the docker package is in the setup.py file

Related Issues

  • #23
  • #20
  • #19
  • #15
  • TODO item found in images.py: Use the Docker API for building instead of a shell command (PR #43)

Speed Improvements to Stage Execution Times

  • The Build stage is INCREDIBLY fast - wahoo!
  • The Test stage needs help. Because we build unique images every single time, with unique tags and unique containers, nothing can be cached. This is great for security and for isolating each person's tests in a unique container, but it is terrible for performance. Some scripts that take only 1-3 seconds to run have a test stage of 15-20 seconds because all the overhead is in Docker.
  • The Deploy stage is fine. Spinning up the container isn't the problem; tearing down the old one is. Many containers wait the default 10 seconds for a "graceful shutdown" before stopping, but many of the configured projects have no graceful shutdown in place and simply wait out the 10 seconds. Killing containers immediately may not be a great idea because some projects do have a graceful shutdown. We need to find a happy medium where we can shut down containers ASAP. This may require a per-project config.
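
One way the per-project config could look, as a rough sketch: the `stop_timeout` key is a hypothetical config name (not something Harvey defines today), and the command relies on the `-t` option of `docker stop`, which sets how many seconds Docker waits before killing the container.

```python
from typing import List

# Docker waits 10 seconds by default before killing a container.
DEFAULT_STOP_TIMEOUT_SECONDS = 10


def build_stop_command(container_name: str, project_config: dict) -> List[str]:
    """Build a `docker stop` command using the project's own grace period.

    "stop_timeout" is a hypothetical per-project config key.
    """
    timeout = int(project_config.get("stop_timeout", DEFAULT_STOP_TIMEOUT_SECONDS))
    return ["docker", "stop", "-t", str(timeout), container_name]
```

Projects without a graceful shutdown could set `stop_timeout` to 0, while projects that need one keep the default or a longer value.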

Add Authentication

Anyone can hit an endpoint and, with the right data, royally mess up Docker on the host machine. Protect Harvey with per-user authentication. Ensure that only the user who created a project in Harvey can touch it in any way, and that unauthenticated users cannot use Harvey at all.

Pipeline timers do not account for the "startup" time

Pipeline timers start when the pipeline starts, but the pipeline only starts after Harvey boots up and clones/pulls the project, so some time goes unaccounted for. Pass the start time from the very beginning to the pipelines to get an accurate reading.
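
The idea can be sketched as follows. The `setup` and `stages` callables are hypothetical stand-ins for the clone/pull step and the pipeline stages; the only point is that the clock starts before either runs:

```python
import time
from typing import Callable


def timed_pipeline(setup: Callable[[], None], stages: Callable[[], None]) -> float:
    """Return total elapsed seconds, including the setup (clone/pull) time."""
    # Start the clock before any setup work so nothing goes unaccounted for.
    start_time = time.monotonic()
    setup()
    stages()
    return time.monotonic() - start_time
```

`time.monotonic()` is used instead of wall-clock time so the measurement is not affected by system clock adjustments.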

Add logging to Harvey

Summary

Harvey should log errors and stack traces for troubleshooting. Pair this with #7.

Python's standard logging package should do the trick.

https://realpython.com/the-most-diabolical-python-antipattern/

Acceptance Criteria

  • Add logging throughout the app, giving us insight into what's happening with Harvey (keep this separate from request logging) - use standard Python logging
  • Add logging on the endpoints (requests) so we know what's coming in (keep this separate from app logging) - use Flask logging
  • Allow logging to be configurable by the end user
  • Keep logging of pipelines separate from the two items above
  • Ensure all types of logs (requests, application, pipeline) roll over once the file or files become too big or too many
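
A minimal sketch of how separate, size-limited logs could be set up with the standard library's `RotatingFileHandler`. The logger names, file paths, and size limits are illustrative assumptions; the level and limits are parameters so the end user can configure them:

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(
    name: str,
    log_path: str,
    level: int = logging.INFO,
    max_bytes: int = 1_000_000,
    backup_count: int = 5,
) -> logging.Logger:
    """Create a logger that writes to its own rotating file."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Keep this log out of parent loggers so each log type stays separate.
    logger.propagate = False
    if not logger.handlers:
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backup_count
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling it once per log type, for example `build_logger("harvey.app", "app.log")` and `build_logger("harvey.pipeline", "pipeline.log")`, would keep application and pipeline output in different files, each rolling over at `max_bytes` and keeping at most `backup_count` old files.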

Add Support for Parallel Tests

Users may want to test their code across various versions of a programming language. Add support for running tests in parallel (e.g., PHP 7.2, 7.3, 7.4).

Criteria:

  • We'll need to ensure there is a limit on how many parallel tests any single user can run at a time.
  • We'll need to string all the outputs together or associate them with that user
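
Both criteria can be sketched with a bounded thread pool. The `run_test` callable and the limit of 2 are assumptions for illustration; in practice `run_test` would start a test container for the given language version:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical cap on how many tests one user may run at a time.
MAX_PARALLEL_TESTS = 2


def run_parallel_tests(
    versions: List[str],
    run_test: Callable[[str], str],
    max_parallel: int = MAX_PARALLEL_TESTS,
) -> Dict[str, str]:
    """Run one test per version, never more than max_parallel at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in input order, so outputs line up with versions.
        outputs = list(pool.map(run_test, versions))
    return dict(zip(versions, outputs))
```

Returning a mapping of version to output keeps all results for one request together, so they can be associated with the user who triggered them.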

Lock Does Not Exist for Project That is Brand New

If you try to deploy a brand new project to Harvey, the pipeline will blow up stating there is no lock and will exit.

Exception in thread Thread-25:
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/Users/admin/git/personal/harvey/venv/lib/python3.9/site-packages/sentry_sdk/integrations/threading.py", line 69, in run
    reraise(*_capture_exception())
  File "/Users/admin/git/personal/harvey/venv/lib/python3.9/site-packages/sentry_sdk/_compat.py", line 54, in reraise
    raise value
  File "/Users/admin/git/personal/harvey/venv/lib/python3.9/site-packages/sentry_sdk/integrations/threading.py", line 67, in run
    return old_run_func(self, *a, **kw)
  File "/opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/admin/git/personal/harvey/harvey/pipelines.py", line 94, in run_pipeline
    webhook_config, webhook_output, start_time = Pipeline.initialize_pipeline(webhook)
  File "/Users/admin/git/personal/harvey/harvey/pipelines.py", line 28, in initialize_pipeline
    if Lock.lookup_project_lock(Webhook.repo_full_name(webhook)) is True:
  File "/Users/admin/git/personal/harvey/harvey/locks.py", line 42, in lookup_project_lock
    raise ValueError('Lock does not exist!')
ValueError: Lock does not exist!

Let's fix this so that Harvey gracefully handles the case where no lock can be found.
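
One possible shape for the fix, sketched under the assumption that locks are kept in a dictionary keyed by project name (the real storage behind `Lock.lookup_project_lock` is not shown in the traceback). A project with no lock entry is simply treated as unlocked instead of raising:

```python
from typing import Dict


def lookup_project_lock(locks: Dict[str, bool], project_name: str) -> bool:
    """Return True if the project is locked.

    A brand-new project has no entry yet, so a missing lock means "unlocked"
    rather than an error. The dict-based store is an assumption.
    """
    return bool(locks.get(project_name, False))
```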

Use ssh_url for "git clone"

If we don't use ssh_url for git clone then private repos can't be cloned into Harvey. We'll need to:

  1. Change the reference
  2. Document using SSH keys
  3. Create a helper script to try automating that?
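
The first step could look roughly like this. GitHub webhook payloads carry an `ssh_url` field on the `repository` object; the function name and the `destination` parameter are illustrative assumptions:

```python
from typing import List


def build_clone_command(webhook: dict, destination: str) -> List[str]:
    """Build a `git clone` command that uses the repository's SSH URL."""
    # Cloning over SSH works for private repos once an SSH key is set up.
    ssh_url = webhook["repository"]["ssh_url"]
    return ["git", "clone", ssh_url, destination]
```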

Add Subprocess Timeouts

Currently, subprocesses can run forever and not exit. Add a timeout to each subprocess to ensure they don't hang or spin forever.
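
A minimal sketch using the `timeout` argument of `subprocess.run`, which kills the child process and raises `subprocess.TimeoutExpired` when the limit is exceeded. The wrapper name and the choice to return `None` on timeout are assumptions:

```python
import subprocess
from typing import List, Optional


def run_with_timeout(command: List[str], timeout_seconds: float) -> Optional[str]:
    """Run a command and return its stdout, or None if it timed out."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The child process has already been killed at this point.
        return None
    return completed.stdout
```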

Security Audit & Authentication

Security Audit

  • Add authentication to all API endpoints and ensure only those with a valid API key can perform actions
  • Ensure users can only interact with their own projects

See #1
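
Both checks can be sketched together. The two dictionaries (API key to user, project to owner) are hypothetical stand-ins for wherever Harvey would store this data; `hmac.compare_digest` is used so key comparison takes constant time:

```python
import hmac
from typing import Dict, Optional


def can_access_project(
    api_key: str,
    project: str,
    api_keys: Dict[str, str],
    project_owners: Dict[str, str],
) -> bool:
    """Allow access only to a valid key whose user owns the project.

    api_keys maps key -> user and project_owners maps project -> user;
    both stores are assumptions for illustration.
    """
    user: Optional[str] = None
    for known_key, known_user in api_keys.items():
        # Constant-time comparison avoids leaking key contents via timing.
        if hmac.compare_digest(api_key.encode(), known_key.encode()):
            user = known_user
    return user is not None and project_owners.get(project) == user
```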

Change "cd" to "git -C"

Harvey currently navigates to the project directory and then pulls. Instead of cd <dir> && git pull, Harvey should use git -C <dir> pull, which is safer and simpler.
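
As a sketch, the command can be built as an argument list so that no shell and no directory change are involved (the function name is an assumption):

```python
from typing import List


def build_pull_command(project_dir: str) -> List[str]:
    """Build a pull command that targets project_dir without changing directory."""
    # `git -C <dir>` runs git as if it had been started in <dir>.
    return ["git", "-C", project_dir, "pull"]
```

Passing the result to `subprocess.run` as a list avoids shell parsing of the directory name, which is part of why this form is safer than chaining `cd` and `git pull`.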

Add "Pull" Pipeline

Currently, if you only want to pull changes when a webhook fires, you need to test as well. Add a new pipeline that just runs git pull when a webhook is received. This is great for people who want to use custom logic to test/deploy their repo.

Add Database

Currently, logs are stored in actual log files named after the pipeline ID, which is just a randomly generated ID.

Structure:

logs
    project_1
        1234567890.log
        0987654321.log
    project_2
        ...
    ...

This is great for now, but we should instead be saving log data to a database.

Brainstorming some initial columns and tables:

logs

  • id [int]
  • pipeline_id [int]
  • log_content [text]
  • created_at [datetime]

users

  • id [int]
  • user_id [int]
  • email [varchar]
  • password [varchar]
  • created_at [datetime]
  • updated_at [datetime]
  • deleted_at [datetime]

pipelines

  • id [int]
  • pipeline_id [int]
  • success [bool]
  • configuration [text]? (store the JSON configuration)
  • user_id [int] (user who created the pipeline... how is this derived?)
  • pipeline_time [varchar] (time the total pipeline took to build)
  • created_at [datetime]
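
The brainstormed tables could be sketched with the standard library's `sqlite3` module. SQLite, the column constraints, and the foreign keys are assumptions layered on top of the column lists above; datetimes are stored as text because SQLite has no dedicated datetime type:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    email TEXT NOT NULL,
    password TEXT NOT NULL,  -- assumption: store a hash, not the raw password
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT,
    deleted_at TEXT
);
CREATE TABLE IF NOT EXISTS pipelines (
    id INTEGER PRIMARY KEY,
    pipeline_id INTEGER NOT NULL UNIQUE,
    success INTEGER NOT NULL,  -- SQLite stores booleans as 0 or 1
    configuration TEXT,        -- JSON configuration as text
    user_id INTEGER REFERENCES users(id),
    pipeline_time TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY,
    pipeline_id INTEGER NOT NULL REFERENCES pipelines(pipeline_id),
    log_content TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database at path and create the tables if they are missing."""
    connection = sqlite3.connect(path)
    connection.executescript(SCHEMA)
    return connection
```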

Add Email Support

Add support for sending email when pipelines finish, in addition to the Slack logic that already exists.

Container Healthchecks Are Randomly Failing

Container health checks are randomly failing and I'm unsure why (see recent Slack output for more info). Let's investigate and correct this. Logging will be helpful to determine why.

Do Not Allow Multiple Concurrent Pipelines for the Same Project

Docker Compose does weird things when you start two concurrent docker compose up -d commands for the same project (only possible because they run in separate threads via Harvey), ultimately leading to Docker crashing completely.

Let's add a check prior to running a pipeline that will lock deployments for that project and only release it once the pipeline is finished (success or fail) so that we don't inadvertently break Docker.
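
A minimal in-process sketch of that check, assuming pipelines run as threads inside a single Harvey process (as described above). The function and variable names are hypothetical. A second pipeline for the same project is rejected rather than queued, and the lock is released whether the pipeline succeeds or fails:

```python
import threading
from collections import defaultdict
from typing import Callable, DefaultDict

# One lock per project name, created on first use.
_project_locks: DefaultDict[str, threading.Lock] = defaultdict(threading.Lock)
# Guards the dictionary itself so two threads cannot race on lock creation.
_registry_guard = threading.Lock()


def run_exclusively(project: str, pipeline: Callable[[], None]) -> bool:
    """Run pipeline unless one is already running for this project.

    Returns False (without running anything) if the project is locked.
    """
    with _registry_guard:
        lock = _project_locks[project]
    if not lock.acquire(blocking=False):
        return False
    try:
        pipeline()
    finally:
        # Release on success or failure so the project never stays locked.
        lock.release()
    return True
```

An in-memory lock like this is lost when Harvey restarts; persisting the lock state would be needed if locks must survive restarts.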

Dangling Images/Containers

Ensure that we aren't leaving dangling images or containers lying around, eating up resources and disk space.

Docker-in-Docker Support (For Tests)

Figure out a good way to implement a Docker-in-Docker concept (which, yes, is tricky and bad, but people may want to test their Docker containers).

Better User Logging

Improve logging: include output from everything in the log (the build output and each step's output), not just container logs.

Also add error handling across the board.

Replace Flask with WSGI Server

Currently, a Flask router wraps all the routes to the API/webhooks, which is fine for development and small personal use but will not scale in production environments. Replace the Flask app.py with a production-ready WSGI server.

This will depend on #10

Update Documentation

Summary

The documentation is in a sad state and has been for a while. The project has changed so frequently that it's been difficult to document how to get started with Harvey. However, it's stabilizing, and the docs could use a serious refresh to make getting started dead simple, especially because the project is so large and requires a lot of info.

Acceptance Criteria

  • Update the documentation to describe better what Harvey is
  • Update the documentation to show how to use Harvey

Fix Container Healthchecks for "Compose" Workflows

Currently, the container healthcheck functionality only works for non-compose workflows. As I solely use compose workflows in Harvey, it'd be great to get this working again.

Notes I had from a previous commit:

* Healthchecks currently fail for docker-compose deploys as the container name is specified in the compose files vs the webhook
* Fix healthchecks for compose and create a way we can line up the name in code with the name in files

Basically, the healthcheck runs against a container name that doesn't exist, and it therefore fails.

Harvey Stopped Saving to SQLite DB

Harvey appears to have stopped saving to the SQLite DB in prod. I'm not yet sure why. It started about a week ago, regardless of project. This could simply be because the deployed version of Harvey is a "nightly build" with a bug. This will need some investigation.

The biggest offender is the pipeline logs. Harvey still builds and fires off the Slack notification, though, so only the saving (or retrieving?) of the logs is affected.

Try/Catch Logic

Introduce try/catch logic to ensure each step of the process works correctly. Some of this is already in place and mostly stops bad things from happening, but some errors still aren't caught, and other places have no try/catch logic at all.

Fix Encoding

Fix the Latin encoding for logs, which messes with output. Find something universal that is friendlier to all terminals and text editors.

Pipelines Need a Timeout

It’s possible for users to add a rogue script in the testing stage that runs forever, tying up resources, or for certain build stages to run too long. Add a timeout for each stage that exits the pipeline if exceeded.

Flush Logs After Time Period

Harvey will quickly build up hundreds or thousands of log files. We need to add some logic to flush logs after a certain date or time period. Should the user be able to configure this?
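
A rough sketch of age-based flushing, assuming logs are `.log` files under a root directory with one subdirectory per project. The function name and the `max_age_days` parameter are illustrative; the parameter is what a user-facing setting could map to:

```python
import time
from pathlib import Path
from typing import Optional, Union


def flush_old_logs(
    log_root: Union[str, Path],
    max_age_days: float,
    now: Optional[float] = None,
) -> int:
    """Delete .log files older than max_age_days and return how many were removed."""
    current_time = time.time() if now is None else now
    cutoff = current_time - max_age_days * 86400  # 86400 seconds per day
    removed = 0
    # Collect matches first so files are not deleted while still being walked.
    for log_file in list(Path(log_root).rglob("*.log")):
        if log_file.stat().st_mtime < cutoff:
            log_file.unlink()
            removed += 1
    return removed
```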

Change Default Git Pull Behavior

When pulling repos, you'll receive the following warning:

warning: Pulling without specifying how to reconcile divergent branches is
discouraged. You can squelch this message by running one of the following
commands sometime before your next pull:

  git config pull.rebase false  # merge (the default strategy)
  git config pull.rebase true   # rebase
  git config pull.ff only       # fast-forward only

You can replace "git config" with "git config --global" to set a default
preference for all repositories. You can also pass --rebase, --no-rebase,
or --ff-only on the command line to override the configured default per
invocation.

To fix this, let's make all pulls fast-forward only.

Dockerize Harvey

Explore dockerizing this project. This is difficult because we use the Docker socket to connect to the Docker instance, and putting Harvey inside Docker makes connecting harder.

Allow Pipelines to be Retried

Currently, if the admin of Harvey goofs and changes files locally, Harvey can't pull in the project.

We should allow a pipeline to be retried (say, after they stash or remove the changes). Currently the only way to do this is to redeliver the webhook, which isn't great. We should probably save the attempt somehow so it can be retried locally?
