Comments (3)
[Updated below] I would go with multithreading for the GCP resource crawlers (e.g. multiprocessing.pool.ThreadPool or concurrent.futures.ThreadPoolExecutor). Forking or spinning up a new process for each request or set of requests might be expensive. I'd use actual multiprocessing per GCP project and/or service account key, where the time spent spawning or forking a process is negligible compared to the scanning time. I think having one solution for all OSes will make implementation and maintenance easier (so spawn is preferable).
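A minimal sketch of the thread-based option, assuming each crawler call is an I/O-bound API request. `crawl_resource` and the resource names are hypothetical placeholders, not functions from gcp_scanner:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def crawl_resource(resource_name: str) -> dict:
    """Hypothetical stand-in for one I/O-bound GCP API call."""
    time.sleep(0.01)  # simulates waiting on the network
    return {"resource": resource_name, "items": []}


def crawl_all(resource_names, max_workers=4):
    """Run the crawler calls on a shared pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(crawl_resource, resource_names))
```

Threads share memory with the caller, so nothing has to be pickled and no process start-up cost is paid per request.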
Update: I just read the documentation on worker pools. You can disregard what I wrote above about a process per request or set of requests: a pool keeps process workers waiting for new requests to come in. BUT we need to make sure all workers live as long as the scanning loop lives. That way the start-up cost is paid only once (this is not super important IMO). However, keep in mind that the objects you pass must be picklable in the case of concurrent.futures.ProcessPoolExecutor, according to the documentation (https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor). Everything sent to or received from our functions should be picklable AFAIK, but I'd check it anyway.
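One way to do that check is to try serializing a value with `pickle` before handing it to a process pool. This is a sketch only; `is_picklable` is a hypothetical helper, not part of gcp_scanner or the standard library:

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if obj can be serialized with pickle.

    ProcessPoolExecutor pickles the callable, its arguments, and its
    return value, so all three should pass this check.
    """
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        # Unpicklable objects raise one of these, depending on their type.
        return False
    return True
```

Plain dicts, lists, and strings pass; objects such as lambdas, generators, and open handles typically do not, so client objects are usually safer to build inside the worker than to pass in.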
from gcp_scanner.
Meeting Notes for 10/07/2023:
Decision 1
We will go with the default approach and support all 3 types of systems. The primary reason is that the IO tasks take seconds, while optimizing them using fork would save only milliseconds (which is negligible in comparison).
Decision 2
Discussion on further PRs and issues. A starting point would be parallelizing Storage Bucket.
Decision 3
The projects list needs to be queried first. So, instead of going with groups and handling them, we can make an exception and query the projects list outside the loop.
Decision 4
concurrent.futures.ProcessPoolExecutor is the way to go.
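Decisions 3 and 4 together could look roughly like the sketch below: the projects list is fetched first, and only the per-project scans go to a pool that is created once. All function names here (`list_projects`, `scan_project`, `scan_all`, `scan_with_process_pool`) are hypothetical and do not reflect the code merged in the actual PRs:

```python
from concurrent.futures import Executor, ProcessPoolExecutor


def list_projects(credentials_name: str) -> list[str]:
    """Hypothetical stand-in for the projects list query."""
    return [f"{credentials_name}-project-{i}" for i in range(3)]


def scan_project(project_id: str) -> dict:
    """Hypothetical per-project scan; module-level so it can be pickled."""
    return {"project_id": project_id, "resources": {}}


def scan_all(credentials_name: str, executor: Executor) -> list[dict]:
    # The projects list is queried first, outside the parallel part.
    projects = list_projects(credentials_name)
    # Per-project scans are then fanned out to the long-lived pool.
    return list(executor.map(scan_project, projects))


def scan_with_process_pool(credential_names: list[str]) -> dict:
    # One pool for the whole scanning loop, so workers start only once.
    with ProcessPoolExecutor() as pool:
        return {name: scan_all(name, pool) for name in credential_names}
```

`scan_with_process_pool` should be called from under an `if __name__ == "__main__":` guard, since worker processes need to be able to import the main module, which matters on platforms that use the spawn start method.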
Completed with PRs #265 and #269.
Closing.