pcw's Introduction

OpenQA Public cloud Helper

Jose Lausuch: Where you thinking about PCW while laying on the beach? 😛
Anton Smorodskyi: YES, constantly! :partygeeko: I see it in every palm on the beach !

PublicCloud-Watcher (PCW) is a web app which monitors, displays and deletes resources on various Cloud Service Providers (CSPs). PCW has two main flows:

  1. Update run (implemented in ocw/lib/db.py). Executed every 45 minutes. Concentrates on deleting VMs (in the case of Azure, Resource Groups).

    • Each update scans the accounts defined in the configuration file and writes the results into a local SQLite database. Newly discovered entities are assigned a mandatory time-to-live (TTL) value. The TTL is taken from the openqa_ttl tag if the entity has one. Otherwise PCW checks pcw.ini for the updaterun/default_ttl setting, and if that setting is not defined either, PCW uses the hard-coded value from webui/settings.py. The database has a web UI where you can manually trigger deletion of a given entity.
    • After persisting the results into the database, PCW decides which entities need to be deleted. An entity survives in either of two ways: a. It has the tag pcw_ignore (with any value). b. Its age is lower than its TTL. Age is calculated as the delta between last_seen and first_seen.
    • For entities that survive cleanup, PCW sends a notification email to the list defined in the config.
  2. Cleanup (implemented in ocw/lib/cleanup.py). Executed via a Django command. Concentrates on everything except VM deletion. This varies a lot per CSP, so the details are listed per provider:

    • For Azure, the following entities are monitored (see ocw/lib/azure.py for details): a. bootdiagnostics b. Blobs in the sle-images container c. Disks assigned to certain resource groups d. Images assigned to certain resource groups
    • For EC2, the following entities are monitored (see ocw/lib/ec2.py for details): a. Images in all defined regions b. Snapshots in all defined regions c. Volumes in all defined regions d. VPCs (deleting a VPC means first deleting all entities assigned to it, such as security groups and networks)
    • For GCE, disks, images and network resources are deleted (see ocw/lib/gce.py for details)
    • For Openstack, instances, images and keypairs are deleted (see ocw/lib/openstack.py for details)
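
The TTL lookup order and the survival rule from the update run can be sketched as follows. This is a minimal illustration, not the code in ocw/lib/db.py: the function names, the fallback constant and the assumption that TTL values are plain seconds are all hypothetical.

```python
from configparser import ConfigParser
from datetime import datetime, timedelta

# Hypothetical stand-in for the hard-coded value in webui/settings.py.
FALLBACK_TTL_SECONDS = 7200


def resolve_ttl(tags: dict, config: ConfigParser) -> timedelta:
    """Pick the TTL: openqa_ttl tag first, then updaterun/default_ttl, then the fallback."""
    if "openqa_ttl" in tags:
        # Assumption: the tag value is a number of seconds.
        return timedelta(seconds=int(tags["openqa_ttl"]))
    if config.has_option("updaterun", "default_ttl"):
        return timedelta(seconds=config.getint("updaterun", "default_ttl"))
    return timedelta(seconds=FALLBACK_TTL_SECONDS)


def survives(tags: dict, first_seen: datetime, last_seen: datetime, ttl: timedelta) -> bool:
    """An entity survives if it is tagged pcw_ignore or if its age is below the TTL."""
    if "pcw_ignore" in tags:
        return True
    age = last_seen - first_seen
    return age < ttl


if __name__ == "__main__":
    config = ConfigParser()
    config.read_string("[updaterun]\ndefault_ttl = 3600\n")
    ttl = resolve_ttl({"openqa_ttl": "600"}, config)
    start = datetime(2024, 1, 1, 12, 0)
    print(survives({}, start, start + timedelta(minutes=5), ttl))
    print(survives({}, start, start + timedelta(minutes=30), ttl))
```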

The fastest way to run PCW is via the provided containers, as described in the Running a container section.

Install

PCW has three sets of requirements files for virtual environments:

  • requirements.txt: common usage, for everything except K8S-related cleanups
  • requirements_k8s.txt: because of the high volume of dependencies needed only for a single use case (K8S cleanups), these are split out into a separate file
  • requirements_test.txt: contains the dependencies needed to run pcw's unit tests

It's recommended to set up pcw in a virtual environment to avoid package collisions:

virtualenv venv
. venv/bin/activate
pip install -r requirements.txt

Configure and run

Configuration of PCW happens via a global config file in /etc/pcw.ini. See templates/pcw.ini for a configuration template. To start, copy the template over:

cp templates/pcw.ini /etc/pcw.ini

To connect to a CSP, PCW needs Service Principal details. Depending on the namespaces defined in pcw.ini, PCW expects JSON files to be created under /var/pcw/[namespace name]/[Azure/EC2/GCE/Openstack].json. See templates/var/example_namespace/ for examples.
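
To check a credentials directory before starting PCW, a small helper along these lines may be useful. It is only a sketch of the path pattern quoted above: the function names are hypothetical, and it lists all four provider files per namespace even though a given setup may need only some of them.

```python
from pathlib import Path

# Provider file names, following the pattern /var/pcw/[namespace name]/[provider].json.
PROVIDERS = ("Azure", "EC2", "GCE", "Openstack")


def credential_paths(namespaces, base="/var/pcw"):
    """Return the candidate JSON path for every namespace/provider pair."""
    return [
        Path(base) / namespace / f"{provider}.json"
        for namespace in namespaces
        for provider in PROVIDERS
    ]


def missing_credentials(namespaces, base="/var/pcw"):
    """Return the candidate paths that do not exist as files."""
    return [path for path in credential_paths(namespaces, base) if not path.is_file()]


if __name__ == "__main__":
    # "qac" is just an example namespace name.
    for path in missing_credentials(["qac"]):
        print(f"not found: {path}")
```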

PCW supports email notifications about left-over instances. See the notify section in pcw.ini and the corresponding comments.

# Setup virtual environment
virtualenv env
source env/bin/activate
pip install -r requirements.txt


## Configuration steps, only required once to setup the database and user
# Setup database
python manage.py migrate
# Setup superuser (OPTIONAL)
python manage.py createsuperuser --email <admin email> --username admin
python manage.py collectstatic


## Running the webapp server
python manage.py runserver

By default, PCW runs on http://127.0.0.1:8000/

Building PCW containers

In the containers folder you may find several Dockerfiles for building different images.

Running a container

You can use the already built containers within this repository:

podman pull ghcr.io/suse/pcw:latest
podman pull ghcr.io/suse/pcw_k8s:latest

The PCW container supports two volumes to be mounted:

  • (required) /etc/pcw.ini - configuration ini file
  • (optional) /pcw/db - volume where the database file is stored

To create a container using e.g. the data directory /srv/pcw for both volumes and expose port 8000, run the following:

podman create --hostname pcw --name pcw -v /srv/pcw/pcw.ini:/etc/pcw.ini -v /srv/pcw/db:/pcw/db -v <local creds storage>:/var/pcw -p 8000:8000/tcp ghcr.io/suse/pcw:latest
podman start pcw

By default, the pcw container runs the /pcw/container-startup helper script. You can interact with it by running:

podman exec pcw /pcw/container-startup help

podman run -ti --rm --hostname pcw --name pcw -v /srv/pcw/pcw.ini:/etc/pcw.ini -v <local creds storage>:/var/pcw -v /srv/pcw/db:/pcw/db -p 8000:8000/tcp ghcr.io/suse/pcw:latest /pcw/container-startup help

To create a user within the container named pcw, run:

podman exec pcw /pcw/container-startup createuser admin USE_A_STRONG_PASSWORD

Devel version of container

There is a devel version of the container file. The main difference is that the source files are not copied into the image but are expected to be mounted via a volume. This eases development in an environment as close as possible to a production run.

Expected use would be:

make podman-container-devel
podman run  -v <local path to ini file>:/etc/pcw.ini -v <local creds storage>:/var/pcw -v <path to this folder>:/pcw  -t pcw-devel "python3 manage.py <any command available>"

Codecov

Running codecov locally requires installing pytest, pytest-cov and codecov. Then you can run it with:

BROWSER=$(xdg-settings get default-web-browser)
pytest -v --cov=./ --cov-report=html && $BROWSER htmlcov/index.html

and explore the results in your browser.

Debug

To simplify problem investigation, pcw has four Django commands:

  • cleanup
  • updaterun
  • dumpstate
  • rmclusters

These allow triggering core functionality without the web UI. It is highly recommended to set dry_run = True in pcw.ini in such cases.

Testing

virtualenv .
source bin/activate
pip install -r requirements_test.txt
make test

The tests contain a Selenium test for the webUI that uses Podman. Make sure that you have the latest geckodriver installed anywhere in your PATH and that the podman.socket is enabled: systemctl --user enable --now podman.socket

Set the SKIP_SELENIUM environment variable when running pytest or make test to skip the Selenium test.

pcw's People

Contributors

asmorodskyi, b10n1k, cfconrad, dependabot[bot], grisu48, ilausuch, jlausuch, mimi1vx, mpagot, pdostal, ricardobranco777

pcw's Issues

ID in email notification has wrong format

The ID in the email notification is formatted in exponent format, which is odd:

Message from https://publiccloud-ng.qa.suse.de/

Provider      id       Created-By   Namespace    Age                        Delete                       openQA
===============================================================================================================
GCE        5.556e+18   <redacted>     ccoe        12h1m   <redacted>

The id field is displayed as 5.556e+18, which indicates a ridiculously high ID number. Either the formatting is wrong or the id is weird.
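
One plausible cause, not confirmed by this report, is that the numeric id is handled as a float somewhere on the way into the table. The sketch below reproduces the exponent form with a made-up id and shows a hypothetical format_id helper that always renders ids as plain strings.

```python
def format_id(raw_id) -> str:
    """Render a resource id as a plain string, never in exponent notation."""
    if isinstance(raw_id, float):
        # A float may already have lost the low digits of a 64-bit id.
        return str(int(raw_id))
    return str(raw_id)


if __name__ == "__main__":
    gce_id = 5556000000000000123  # made-up id, for illustration only
    print(f"{float(gce_id):.4g}")  # prints 5.556e+18
    print(format_id(gce_id))       # prints 5556000000000000123
```

Keeping ids as strings from the moment they are read avoids the precision loss altogether.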

Get rid of generic exception catching

The code has many except Exception clauses, which is not only bad practice but also harmful when trying to catch or debug other exceptions in the code. In each case we must list all expected exceptions, however painful that is.
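
As an illustration of the intended style, the sketch below handles only the exceptions a JSON credentials loader is expected to hit and lets everything else propagate. The function load_credentials and the CredentialsError class are hypothetical and are not taken from the pcw code base.

```python
import json


class CredentialsError(Exception):
    """Hypothetical error type for problems with a credentials file."""


def load_credentials(path: str) -> dict:
    """Load a JSON credentials file, handling only the failures we expect."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError as err:
        raise CredentialsError(f"{path} does not exist") from err
    except json.JSONDecodeError as err:
        raise CredentialsError(f"{path} is not valid JSON: {err}") from err
    # Anything else (for example a TypeError from a bad argument) is a bug
    # and is deliberately not caught here.
```

Because no generic handler is present, an unexpected error surfaces with its original traceback instead of being swallowed.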

Collision of static file in new container startup

Restarting the new pcw container fails after some attempts with the following issue:

Mar 22 14:18:03 larrytornado conmon[5992]: Operations to perform:
Mar 22 14:18:03 larrytornado conmon[5992]:   Apply all migrations: admin, auth, contenttypes, ocw, sessions
Mar 22 14:18:03 larrytornado conmon[5992]: Running migrations:
Mar 22 14:18:03 larrytornado conmon[5992]:   No migrations to apply.
Mar 22 14:18:05 larrytornado conmon[5992]: 
Mar 22 14:18:05 larrytornado conmon[5992]: You have requested to collect static files at the destination
Mar 22 14:18:05 larrytornado conmon[5992]: location as specified in your settings:
Mar 22 14:18:05 larrytornado conmon[5992]: 
Mar 22 14:18:05 larrytornado conmon[5992]:     /pcw/static
Mar 22 14:18:05 larrytornado conmon[5992]: 
Mar 22 14:18:05 larrytornado conmon[5992]: This will overwrite existing files!
Mar 22 14:18:05 larrytornado conmon[5992]: Are you sure you want to do this?
Mar 22 14:18:05 larrytornado conmon[5992]: 
Mar 22 14:18:05 larrytornado conmon[5992]: Type 'yes' to continue, or 'no' to cancel: 
Mar 22 14:18:05 larrytornado conmon[5992]: Traceback (most recent call last):
Mar 22 14:18:05 larrytornado conmon[5992]:   File "manage.py", line 15, in <module>
Mar 22 14:18:05 larrytornado conmon[5992]:     execute_from_command_line(sys.argv)
Mar 22 14:18:05 larrytornado conmon[5992]:   File "/usr/lib/python3.6/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
Mar 22 14:18:05 larrytornado conmon[5992]:     utility.execute()
Mar 22 14:18:05 larrytornado conmon[5992]:   File "/usr/lib/python3.6/site-packages/django/core/management/__init__.py", line 395, in execute
Mar 22 14:18:05 larrytornado conmon[5992]:     self.fetch_command(subcommand).run_from_argv(self.argv)
Mar 22 14:18:05 larrytornado conmon[5992]:   File "/usr/lib/python3.6/site-packages/django/core/management/base.py", line 330, in run_from_argv
Mar 22 14:18:05 larrytornado conmon[5992]:     self.execute(*args, **cmd_options)
Mar 22 14:18:05 larrytornado conmon[5992]:   File "/usr/lib/python3.6/site-packages/django/core/management/base.py", line 371, in execute
Mar 22 14:18:05 larrytornado conmon[5992]:     output = self.handle(*args, **options)
Mar 22 14:18:05 larrytornado conmon[5992]:   File "/usr/lib/python3.6/site-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 191, in handle
Mar 22 14:18:05 larrytornado conmon[5992]:     if input(''.join(message)) != 'yes':
Mar 22 14:18:05 larrytornado conmon[5992]: EOFError: EOF when reading a line

The culprit is most likely the container-startup script, where collectstatic might cause this collision.
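
The traceback ends in input() raising EOFError, which is what a confirmation prompt does when no terminal is attached. The sketch below reproduces that failure mode in isolation; confirm_overwrite and run_without_terminal are hypothetical helpers, not Django or pcw code. Django's collectstatic accepts --noinput to suppress the prompt; whether container-startup passes it is not stated in this report.

```python
import io
import sys


def confirm_overwrite(interactive: bool = True) -> bool:
    """Ask before overwriting files, unless running non-interactively."""
    if not interactive:
        return True  # nothing to ask, e.g. inside a detached container
    return input("Type 'yes' to continue, or 'no' to cancel: ") == "yes"


def run_without_terminal(interactive: bool) -> str:
    """Call confirm_overwrite with an empty stdin, as a detached container would."""
    original_stdin = sys.stdin
    sys.stdin = io.StringIO("")  # reads hit end-of-file immediately
    try:
        return "ok" if confirm_overwrite(interactive) else "cancelled"
    except EOFError:
        return "EOFError"
    finally:
        sys.stdin = original_stdin


if __name__ == "__main__":
    print(run_without_terminal(interactive=True))   # the prompt fails
    print(run_without_terminal(interactive=False))  # the prompt is skipped
```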

Pipeline fails in Python 3.6

The CI pipeline for python 3.6 fails because python 3.6 is deprecated:

______________________ ERROR collecting tests/test_gce.py ______________________
tests/test_gce.py:1: in <module>
    from ocw.lib.gce import GCE
ocw/lib/gce.py:4: in <module>
    from google.oauth2 import service_account
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/google/oauth2/service_account.py:77: in <module>
    from google.auth import _service_account_info
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/google/auth/_service_account_info.py:22: in <module>
    from google.auth import crypt
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/google/auth/crypt/__init__.py:43: in <module>
    from google.auth.crypt import rsa
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/google/auth/crypt/rsa.py:20: in <module>
    from google.auth.crypt import _cryptography_rsa
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/google/auth/crypt/_cryptography_rsa.py:22: in <module>
    import cryptography.exceptions
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/cryptography/__init__.py:28: in <module>
    stacklevel=2,
E   cryptography.utils.CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.

Auto delete of left-overs

Add a tag with a maximum TTL to instances/resources. If this TTL expires, pcw can delete that instance without asking a user.

Refactor the handling of outdated timestamps

Timestamps for deletion are currently scattered over multiple variables, some of them CSP-specific, e.g.

[cleanup]
# Max age of an image file
max-images-age-hours = 24

[cleanup.namespace.qac]
# EC2 snapshots younger than this amount of days will be ignored
ec2-max-snapshot-age-days = 2
# EC2 volumes younger than this amount of days will be ignored
ec2-max-volumes-age-days = 2

We should unify the TTL handling and instead of having hours and days variables allow the user to set a custom duration that will be parsed (e.g. max-image-age = 24h or ec2-max-volumes-age = 2d).
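
A parser for such duration values could look like the sketch below. The accepted unit suffixes (m, h, d, w) and the function name parse_duration are assumptions for illustration; the issue itself only mentions the 24h and 2d forms.

```python
import re
from datetime import timedelta

# Assumed unit suffixes; only "h" and "d" appear in the proposal above.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
_DURATION = re.compile(r"^\s*(\d+)\s*([mhdw])\s*$")


def parse_duration(value: str) -> timedelta:
    """Parse strings such as '24h' or '2d' into a timedelta."""
    match = _DURATION.match(value.lower())
    if match is None:
        raise ValueError(f"Invalid duration {value!r}, expected e.g. '24h' or '2d'")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


if __name__ == "__main__":
    print(parse_duration("24h"))
    print(parse_duration("2d"))
```

With a single parser, options like max-image-age and ec2-max-volumes-age can share one code path instead of separate hours and days variables.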

wrap all dry_run use cases in generic function

Currently the project has a lot of duplicated code like:

if self.dry_run:
    log_info('No code execution due to dry_run')
else:
    execute_something()

This can and should be refactored into a generic function that accepts the function to call as a parameter, replacing all these if/else blocks with a one-liner.
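
One possible shape for such a generic function is sketched below. The Provider class and the execute method name are hypothetical; only the dry_run attribute comes from the snippet in this issue.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class Provider:
    """Hypothetical base class; only the dry_run handling is shown."""

    def __init__(self, dry_run: bool = False) -> None:
        self.dry_run = dry_run

    def execute(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Optional[Any]:
        """Call func(*args, **kwargs) unless dry_run is set; return None when skipped."""
        if self.dry_run:
            name = getattr(func, "__name__", repr(func))
            logger.info("No execution of %s due to dry_run", name)
            return None
        return func(*args, **kwargs)


if __name__ == "__main__":
    def delete_instance(instance_id: str) -> str:
        return f"deleted {instance_id}"

    print(Provider(dry_run=True).execute(delete_instance, "vm-1"))
    print(Provider(dry_run=False).execute(delete_instance, "vm-1"))
```

Each call site then shrinks to a single line such as self.execute(delete_instance, instance_id).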
