Comments (3)

mshudrak commented on July 28, 2024

[Updated below] Hmm, I would go with multithreading for the GCP resource crawler (e.g. use multiprocessing.pool.ThreadPool or concurrent.futures.ThreadPoolExecutor). Forking or spinning up a new process for each request or set of requests might be expensive. I'd use actual multiprocessing per GCP project and/or service account key, where the time spent spinning up a new process is negligible compared to the scanning time. I think having one solution for all OSes will make implementation and maintenance easier (so spawn is preferable).
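For illustration, a minimal sketch of the thread-pool idea using concurrent.futures.ThreadPoolExecutor. The `fetch_resource` function and the resource names are hypothetical stand-ins for blocking GCP API calls, not actual gcp_scanner code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_resource(name: str) -> str:
    """Hypothetical stand-in for one blocking GCP API request."""
    time.sleep(0.05)  # simulate network latency
    return f"{name}: ok"


def crawl(resource_names, max_workers: int = 8):
    """Run the blocking requests on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_resource, resource_names))


if __name__ == "__main__":
    # Assumed resource names, for demonstration only.
    print(crawl(["compute_instances", "storage_buckets", "iam_policy"]))
```

Threads suit this case because the work is mostly waiting on the network, so no new process has to be started per request.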

Update: I just read the documentation on worker pools. You can disregard what I wrote above about a process per request or set of requests. You have a pool of worker processes waiting for new requests to come in. BUT we need to make sure all workers live as long as the scanning loop does. That way the cost of spinning everything up is paid just once, at startup (this is not super important IMO). However, keep in mind that the objects you pass must be picklable in the case of concurrent.futures.ProcessPoolExecutor, according to the documentation (https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor). Everything sent to or received from our functions should be picklable AFAIK, but I'd check it anyway.
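A rough sketch of both points, assuming a hypothetical `scan_request` worker and dict-shaped requests (neither is taken from gcp_scanner): the pool is created once and reused for every batch in the loop, and each argument is checked with pickle.dumps before being handed to a worker.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor


def is_picklable(obj) -> bool:
    """Return True if obj can be serialized for transfer to a worker process."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def scan_request(request: dict) -> dict:
    """Hypothetical worker; defined at module level so it can be pickled."""
    return {"resource": request["resource"], "status": "scanned"}


def run_scan_loop(batches, max_workers: int = 2):
    """Create the pool once and reuse the same workers for every batch."""
    results = []
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        for batch in batches:
            for request in batch:
                if not is_picklable(request):
                    raise ValueError(f"request is not picklable: {request!r}")
            results.extend(pool.map(scan_request, batch))
    return results


if __name__ == "__main__":
    # Assumed request shape, for demonstration only.
    demo_batches = [[{"resource": "bucket-a"}], [{"resource": "bucket-b"}]]
    print(run_scan_loop(demo_batches))
```

The `if __name__ == "__main__":` guard matters with the spawn start method, since worker processes re-import the main module.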

from gcp_scanner.

peb-peb commented on July 28, 2024

Meeting Notes for 10/07/2023:

Decision 1

We will go with the default approach and support all 3 types of systems. The primary reason is that the IO tasks take on the order of seconds, while using fork would save only milliseconds (which is negligible in comparison).

Decision 2

Discussion of further PRs and issues. A starting point would be parallelizing Storage Bucket.

Decision 3

The projects list needs to be queried first. So, instead of placing it in the groups and handling it there, we can make an exception and query the projects list outside the loop.

Decision 4

concurrent.futures.ProcessPoolExecutor is the way to go.
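Decisions 3 and 4 could be combined roughly as follows. This is only a sketch under assumptions: `list_projects` and `scan_project` are hypothetical placeholders, and the actual changes in the PRs may look different.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def list_projects():
    """Hypothetical placeholder for the projects query, which must run first."""
    return ["demo-project-1", "demo-project-2"]


def scan_project(project_id: str) -> dict:
    """Hypothetical placeholder for scanning one project's resources."""
    return {"project": project_id, "resources": ["storage_buckets"]}


def scan_all(max_workers: int = 2) -> dict:
    # The projects list is queried once, outside the parallel part.
    project_ids = list_projects()
    results = {}
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scan_project, pid): pid for pid in project_ids}
        # Collect each project's result as soon as its worker finishes.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    print(scan_all())
```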

peb-peb commented on July 28, 2024

Completed with PRs #265 and #269.
Closing.
