mc2-project / mc2
A Platform for Secure Analytics and Machine Learning
License: Apache License 2.0
Hi, when I run federated XGBoost, do I need to install secure-xgboost first, following the steps at the link below?
https://mc2-xgboost.readthedocs.io/en/latest/build.html#installing-secure-xgboost-dependencies
Could you help with this? Thanks.
docker run --env HTTP_PROXY="XXXXXXXXXXXXX" --env HTTPS_PROXY="XXXXXXXXXXXX" -it -v ~/github/mc2-project/playground:/mc2/client/playground mc2project/mc2_img:v0.1.3
root@da42abf41b17:/mc2/client# cp -r quickstart/* playground
root@da42abf41b17:/mc2/client# mc2 configure $(pwd)/playground/config.yaml
2021-12-14 07:13:06 - INFO - Set configuration path to /mc2/client/playground/config.yaml
root@da42abf41b17:/mc2/client# mc2 init
2021-12-14 07:13:12 - WARNING - Skipping keypair generation - private key already exists at /mc2/client/playground/keys/user1.pem
2021-12-14 07:13:12 - WARNING - Skipping symmetric key generation - key already exists at /mc2/client/playground/keys/user1_sym.key
2021-12-14 07:13:12 - INFO - init finished successfully
root@da42abf41b17:/mc2/client# mc2 start
2021-12-14 07:13:15 - INFO - Running 'cd /mc2/opaque-sql; build/sbt run' locally
2021-12-14 07:13:15 - INFO - start finished successfully
root@da42abf41b17:/mc2/client# mc2 upload
2021-12-14 07:13:23 - HOST - info - Successfully initialized cryptography module.
2021-12-14 07:13:23 - INFO - Encrypted /mc2/client/playground/data/opaquesql.csv in sql format and outputted to /mc2/client/playground/data/opaquesql.csv.enc
2021-12-14 07:13:23 - INFO - Using local deployment. Copying /mc2/client/playground/data/opaquesql.csv.enc to /mc2/data/opaquesql.csv.enc
2021-12-14 07:13:23 - INFO - Using local deployment. Copying /mc2/client/playground/data/opaquesql.csv.enc to /mc2/data/opaquesql.csv.enc
2021-12-14 07:13:23 - INFO - upload finished successfully
root@da42abf41b17:/mc2/client# mc2 run
E1214 07:13:26.457284059 214 http_proxy.cc:81] 'https' scheme not supported in proxy URI
Traceback (most recent call last):
File "/mc2/client/mc2.py", line 192, in
mc2.configure_job(config)
File "/usr/local/lib/python3.6/dist-packages/mc2client-0.0.1-py3.6.egg/mc2client/core.py", line 1215, in configure_job
_attest(head_address, simulation_mode, enclave_signer_pem)
File "/usr/local/lib/python3.6/dist-packages/mc2client-0.0.1-py3.6.egg/mc2client/core.py", line 1271, in _attest
response = stub.GetRemoteEvidence(attest_pb2.AttestationStatus(status=0))
File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 946, in call
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1639466006.457566980","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3008,"referenced_errors":[{"created":"@1639466006.457565708","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}"
Currently, the tracker_uri and tracker_port variables in rabit are set by looking at environment variables. Whether XGBoost runs in a distributed manner is determined by the value of tracker_uri (whether or not it is null). Unfortunately, when running Federated XGBoost with RPC, setting the environment variables before individual RPC calls doesn't seem to modify the tracker_uri and tracker_port variables, meaning that XGBoost doesn't run in distributed mode.
A possible fix is to pass the tracker URI and the tracker port as arguments to the RPC call, then set the tracker_uri and tracker_port variables in the Rabit instance on each worker. The variables are currently set from environment variables in the AllreduceBase::SetParam function, which is called during initialization. We'll need to modify how these variables are set, setting them on the workers immediately after the RPC call is made on the tracker.
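A minimal sketch of the proposed flow, written in Python for illustration only (the real change would live in rabit's C++ code). The names TrackerInfo, WorkerRabit, and handle_train_rpc are hypothetical. The point is only that the tracker address travels as RPC arguments and is applied to the worker's state directly, instead of being read from environment variables once at initialization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackerInfo:
    """Tracker address carried as RPC arguments (hypothetical message)."""
    uri: str
    port: int


class WorkerRabit:
    """Stand-in for the per-worker Rabit state; not the real rabit API."""

    def __init__(self) -> None:
        # As described above, a null tracker_uri means the worker is not
        # running in distributed mode.
        self.tracker_uri: Optional[str] = None
        self.tracker_port: Optional[int] = None

    def set_tracker(self, info: TrackerInfo) -> None:
        self.tracker_uri = info.uri
        self.tracker_port = info.port

    @property
    def is_distributed(self) -> bool:
        return self.tracker_uri is not None


def handle_train_rpc(worker: WorkerRabit, info: TrackerInfo) -> bool:
    """Hypothetical RPC handler: apply tracker settings before any training."""
    worker.set_tracker(info)
    return worker.is_distributed
```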
Add TLS to FederatedXGBoost to reduce network leakage
To enforce code style in the repo, we should add a GitHub Actions linter. In particular, we should add black and flake8 for Python, and a Chromium-style clang-format checker.
We currently have the Super Linter commented out. We can likely reuse the Super Linter for Python black and flake8. We'll have to find another linter for clang-format, as Super Linter doesn't support C++.
Set up CI for PRs
To prevent inconsistent key sizes across the MC2 ecosystem, we should not allow users to specify a key size in generate_symmetric_key(). Instead, we should retrieve the CIPHER_KEY_SIZE from C++, similar to this, and use the CIPHER_KEY_SIZE as the number of bytes for our generated key.
To do so, we'll need to add a function in src/c_api.cpp, similar to the cipher_iv_size() function, that gets the CIPHER_KEY_SIZE from C++.
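A minimal sketch of the intended behavior. CIPHER_KEY_SIZE is hard-coded to 32 here purely as a placeholder; in the proposal it would be retrieved from C++ through the new function in src/c_api.cpp. The signature and file-writing behavior shown are assumptions, not the client's actual implementation.

```python
import secrets

# Placeholder value (assumption). In the proposal this would be fetched
# from C++ rather than defined in Python.
CIPHER_KEY_SIZE = 32


def generate_symmetric_key(path: str) -> None:
    """Write a random key of exactly CIPHER_KEY_SIZE bytes to `path`.

    No key-size argument is accepted, so callers cannot create keys of
    inconsistent length.
    """
    key = secrets.token_bytes(CIPHER_KEY_SIZE)
    with open(path, "wb") as f:
        f.write(key)
```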
Should we give mc2 configure $(path-to-config-file) a default path, so that the user only needs to run it when the config file lives somewhere else?
We currently offer a Docker image with necessary dependencies, but the image is quite large (~5GB) and takes a while to download. Consequently, we should also provide a Dockerfile as part of this repo to enable users to build the container much faster locally.
Currently, we hardcode the port that Opaque SQL listens on in the client. We should make this port configurable, either by allowing users to specify it in the global configuration or through an environment variable.
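One way the lookup could work, as a sketch: an environment variable overrides the global configuration, which overrides a built-in default. The default value 50052, the config key opaque_sql_port, and the variable name MC2_OPAQUE_SQL_PORT are all placeholders chosen for this example, not names taken from the codebase.

```python
import os
from typing import Any, Mapping

# Placeholder default and hypothetical names (assumptions for this sketch).
DEFAULT_OPAQUE_SQL_PORT = 50052
PORT_ENV_VAR = "MC2_OPAQUE_SQL_PORT"
PORT_CONFIG_KEY = "opaque_sql_port"


def opaque_sql_port(config: Mapping[str, Any]) -> int:
    """Resolve the port: environment variable, then config, then default."""
    raw = os.environ.get(
        PORT_ENV_VAR, config.get(PORT_CONFIG_KEY, DEFAULT_OPAQUE_SQL_PORT)
    )
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return port
```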
Hi, when I clone your project with "git clone --recursive https://github.com/mc2-project/mc2.git", it seems that there's only mc2-xgboost in this repo, but not secure-xgboost. (To keep going, I cloned it from another repo.) Is there any difference between mc2-xgboost and secure-xgboost here?
Thanks.
Dependabot couldn't find the submodule secure-xgboost. Perhaps it isn't committed, or isn't a submodule?
Currently, the system always uses the star topology. We want to add a parameter that lets us use the ring topology as well.
Currently, MC2 Client doesn't support uploading entire directories to Azure blob storage. We'd like to add support for this, as data encrypted in sql format is always outputted as a directory with a data sub-directory and a schema sub-directory.
To do this, we'll have to investigate how to upload/download directories to/from Azure blob storage using the Azure Python SDK, and modify the upload_data() and download_data() functions.
Is this at all possible?
Add a YAML config file to specify the host IPs and the federated job parameters (worker memory, number of workers, etc., from the dmlc-submit command).
Currently, the Python client does not have robust, friendly exception classes. It throws a segmentation fault on badly formatted schema files, as well as the following error, which does a poor job of hinting at what went wrong:
terminate called after throwing an instance of 'std::runtime_error'
what(): Not a number.
Aborted (core dumped)
We should add exception classes to the client that handle the lower-level errors bubbling up to the caller and carry more information about what went wrong.
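A sketch of what such a hierarchy might look like. The class names and the toy parser are hypothetical. Note that an uncaught C++ exception that aborts the process, as in the output above, cannot be caught from Python, so a check like this would have to run before the call crosses into native code (or the native layer would have to report errors instead of terminating).

```python
class MC2ClientError(Exception):
    """Base class for errors raised by the client (hypothetical name)."""


class SchemaFormatError(MC2ClientError):
    """Raised when a schema file cannot be parsed."""

    def __init__(self, path: str, line_no: int, reason: str) -> None:
        super().__init__(f"{path}, line {line_no}: {reason}")
        self.path = path
        self.line_no = line_no


def parse_numeric_field(path: str, line_no: int, text: str) -> int:
    """Toy validator: convert a schema field to int with a descriptive error."""
    try:
        return int(text)
    except ValueError as err:
        raise SchemaFormatError(
            path, line_no, f"expected a number, got {text!r}"
        ) from err
```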
There are currently a few places that still use print statements. Can we switch to logger instead?
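For reference, a small sketch of a shared logger built on the standard logging module. The logger name "mc2client" and the helper get_logger are assumptions; the format string is chosen to resemble the "timestamp - LEVEL - message" lines the client already emits.

```python
import logging


def get_logger(name: str = "mc2client") -> logging.Logger:
    """Return a logger configured once with a timestamped format."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter(
                "%(asctime)s - %(levelname)s - %(message)s",
                datefmt="%Y-%m-%d %H:%M:%S",
            )
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


logger = get_logger()
# Replaces a bare print(...) call.
logger.info("Set configuration path to %s", "/path/to/config.yaml")
```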
Hi,
I ran the following steps for federated learning:
git clone --recursive https://github.com/mc2-project/mc2.git
cd mc2/secure-xgboost
I can't find the secure-xgboost module in mc2. Thanks!
mc2 client documentation is 404
Currently, the system requires data to be present on all parties, including the master/tracker. We want to make data on the master/tracker optional.
During the project launch process, when I call Azure CLI, I encounter an issue indicating "API deprecated starting 2.21.0". However, the current version of Azure CLI I have installed is 2.0.25, which is lower than the 2.21.0 mentioned in the prompt. How should I solve this problem?
Replace SSH calls to start jobs with gRPC calls
Umbrella issue of #12
Hey guys, I followed your README to run federated XGBoost, but I ran into some problems.
Question 1: Should hosts.config be under the sample folder? Also, should I run the ./start_job.sh script from the sample folder, since I see ../../ at the bottom of the script (it runs scripts in the dmlc-core folder)? And is it necessary to include start_job.sh and hosts.config under the federated-xgboost folder?
Question 2: As I understand it, each client's dataset lives on its own local node, and the files probably need to have the same names (e.g. train.csv, test.csv). In my case, I launched three different servers in VirtualBox, so they cannot have the same file path as the tracker (my local machine, which I treat as the master). Hence, I am not sure whether I can use a relative path such as fxgb.load_training_data('../../data/train.csv') instead of your fxgb.load_training_data('home/ubuntu/mc2/data/train.csv').
Question 3: As I mentioned, I use VirtualBox to launch three servers as worker nodes. I encounter this error:
File "/Users/shi/Desktop/mc2/dmlc-core/tracker/dmlc_tracker/tracker.py", line 416, in get_host_ip
hostIP = socket.gethostbyname(socket.gethostname())
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
Do you have any idea how to solve this issue? I included 192.168.56.101 and the other two host IPs in hosts.config.
Thanks, guys.
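On Question 3, the gaierror is raised when the machine's own hostname cannot be resolved to an address. One possible workaround, sketched here as an assumption rather than as the dmlc-core implementation, is to fall back to asking the OS which local address it would use for outbound traffic. The target address 10.255.255.255 is arbitrary and does not need to be reachable.

```python
import socket


def get_host_ip() -> str:
    """Best-effort local IP lookup with fallbacks (sketch only)."""
    try:
        return socket.gethostbyname(socket.gethostname())
    except socket.gaierror:
        pass  # hostname does not resolve; try the fallback below
    # Connecting a UDP socket sends no packets, but it makes the OS pick
    # the outgoing interface, whose address can then be read back.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()
```

On a machine with several interfaces (for example a VirtualBox host-only network alongside the main one), the address returned this way may not be the one the workers can reach, so passing the host IP explicitly may still be necessary.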
If the master's internal and external IPs differ, the tracker may not assign the master rank 0.
To improve usability, we'd like to add a logger to the Python code to provide more detail on what's actually being run.
Currently, generate_keypair() outputs a private key and a certificate. We should modify this function to also generate and output a public key.
Currently, MC2 Client allows users to upload and download files to Azure blob storage. Users should also be able to remotely delete files they uploaded to storage. We should add a new function to enable them to do so.
Make authentication robust (autogenerate certs/keys, authenticate both sides)
Currently, the user has to specify a username for MC2, and a second username for authentication for launched Azure VMs.
We should consolidate this and have the user input only one username to simplify the configuration process. Changes will have to be made in the various places in the codebase that load the config ssh_user field: instead of loading the Azure config ssh_user field, we should load the global config user field.