Comments (14)
Is there a way to perform 2PC private inference using only one machine?
The flax resnet example simulates multiple parties via multiple processes, so it already runs on a single server. To run 2PC, you just need to replace the 3pc config with the 2pc config. If you want to limit the network bandwidth and latency to simulate WAN settings, you need to run the SPU runtime on multiple nodes (VMs or docker containers). Please refer to this document.
from spu.
Hi @tpppppub. Thanks for your prompt reply. Directly replacing the 3pc in ResNet50 config with 2pc and modifying the "protocol": "CHEETAH" to "protocol": "SEMI2K" works for me.
An additional question: is the simulation of multiple parties via multiple processes roughly equivalent to a LAN? If so, what LAN bandwidth and latency does this simulation correspond to?
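The protocol switch described above can be sketched as a small config patch. This is only an illustration: the key path mirrors the layout of the 2pc.json posted later in this thread, and the helper name with_protocol and the trimmed-down base dict are hypothetical, not part of SPU.

```python
import copy
import json


def with_protocol(conf: dict, protocol: str) -> dict:
    """Return a copy of a cluster config with the SPU protocol replaced."""
    patched = copy.deepcopy(conf)  # leave the caller's config untouched
    patched["devices"]["SPU"]["config"]["runtime_config"]["protocol"] = protocol
    return patched


# Trimmed-down stand-in for a 2pc config; only the fields touched here are shown.
base = {
    "devices": {
        "SPU": {
            "kind": "SPU",
            "config": {"runtime_config": {"protocol": "CHEETAH", "field": "FM64"}},
        }
    }
}

semi2k = with_protocol(base, "SEMI2K")
print(json.dumps(semi2k["devices"]["SPU"]["config"]["runtime_config"]))
```

Working on a copy keeps the original config around, which is convenient when comparing runs under both protocols.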
I don't think they are strictly equivalent. It depends on your network conditions.
Got it. Thanks.
The multiprocess simulation's network is just the local loopback interface, which usually does not go through the network adapter at all and is significantly faster than a real LAN.
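To get a rough feel for how fast loopback is on a given machine, one option is to time small TCP round trips over 127.0.0.1 and compare the result with a ping to another host on the LAN. The sketch below uses only the Python standard library; the function names are hypothetical and it says nothing about how SPU itself uses the link.

```python
import socket
import threading
import time


def echo_once(server: socket.socket) -> None:
    """Accept one connection and echo every chunk back until the peer closes."""
    conn, _ = server.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def loopback_rtt_us(rounds: int = 1000) -> float:
    """Mean round-trip time in microseconds for 1-byte messages over loopback."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        # Disable Nagle's algorithm so tiny messages are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            client.sendall(b"x")
            client.recv(64)
        elapsed = time.perf_counter() - start
    worker.join(timeout=1)
    server.close()
    return elapsed / rounds * 1e6


print(f"mean loopback round trip: {loopback_rtt_us(200):.1f} us")
```

The number it prints depends on the machine and OS, so it is only useful as a relative comparison against a real network measurement.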
Hi @anakinxc. Thanks for your helpful explanation. I'll try a real LAN/WAN setting once I have solved the 2PC implementation problem.
I tried to run with 2pc.json and got an error. Here is the output:
E0408 15:23:17.155576810 9376 chttp2_server.cc:1060] UNKNOWN:No address added out of total 1 resolved for '127.0.0.1:61320' {created_time:"2024-04-08T15:23:17.155547663+08:00", children:[UNKNOWN:Unable to configure socket {fd:5, created_time:"2024-04-08T15:23:17.155536205+08:00", children:[UNKNOWN:Address already in use {created_time:"2024-04-08T15:23:17.155524529+08:00", errno:98, os_error:"Address already in use", syscall:"bind"}]}]}
Traceback (most recent call last):
  File "/mnt/c/Users/78299/Desktop/privit-main/src/benchmark/nodectl.py", line 47, in
    ppd.RPC.serve(args.node_id, nodes_def)
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 210, in serve
    server.add_insecure_port(nodes_def[node_id])
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/grpc/_server.py", line 1329, in add_insecure_port
    return _common.validate_port_binding_result(
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/grpc/_common.py", line 181, in validate_port_binding_result
    raise RuntimeError(_ERROR_MESSAGE_PORT_BINDING_FAILED % address)
RuntimeError: Failed to bind to address 127.0.0.1:61320; set GRPC_VERBOSITY=debug environment variable to see detailed error message.
I have restarted the program and it is still stuck. Could you please take a look at how to solve this 'Address already in use' problem? Thanks.
If it's localhost, try using a different port.
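Before picking a different port, it can help to check whether a candidate is actually free. The following is a minimal standard-library sketch; can_bind and find_free_port are hypothetical helper names, not SPU or gRPC APIs.

```python
import socket


def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:  # e.g. errno 98, "Address already in use"
            return False
    return True


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a port that is free at this moment."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))  # port 0 means "any free port"
        return probe.getsockname()[1]


for port in (61320, 61321):
    print(port, "free" if can_bind("127.0.0.1", port) else "in use")
print("suggested port:", find_free_port())
```

The check is only a snapshot: another process can still grab the port between the probe and the moment the server starts, so treat the result as a hint rather than a guarantee.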
Hi @anakinxc. I have restarted my computer and tried different ports. The error is now different:
node-side:
[2024-04-08 15:46:29,585] [MainProcess] Starting grpc server at 127.0.0.1:61320
[2024-04-08 15:47:32,656] [MainProcess] Run : builtin_spu_init at node:0
E0408 15:47:32.659998 310 external/com_github_brpc_brpc/src/brpc/server.cpp:1068] Fail to listen 127.0.0.1:61320
[2024-04-08 15:47:32.670] [warning] [channel.h:160] Channel destructor is called before WaitLinkTaskFinish, try stop send thread
[2024-04-08 15:47:32,671] [MainProcess] Traceback (most recent call last):
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 328, in Run
    ret_objs = fn(self, *args, **kwargs)
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 550, in builtin_spu_init
    link = libspu.link.create_brpc(desc, my_rank)
RuntimeError: what:
    [external/yacl/yacl/link/transport/brpc_link.cc:104] brpc server failed start
stacktrace:
#0 yacl::link::FactoryBrpc::CreateContext()+0x7f5b2adda0c8
#1 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7f5b293e6f84
#2 pybind11::cpp_function::dispatcher()+0x7f5b293ba98b
#3 cfunction_call+0x4fc697
Terminal-side:
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 979, in init
    results = [future.result() for future in futures]
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 979, in
    results = [future.result() for future in futures]
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 253, in run
    return self._call(self._stub.Run, fn, *args, **kwargs)
  File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 246, in _call
    raise Exception("remote exception", result)
Exception: ('remote exception', Exception('Traceback (most recent call last):\n File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 328, in Run\n ret_objs = fn(self, *args, **kwargs)\n File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 550, in builtin_spu_init\n link = libspu.link.create_brpc(desc, my_rank)\nRuntimeError: what: \n\t[external/yacl/yacl/link/transport/brpc_link.cc:104] brpc server failed start\nstacktrace: \n#0 yacl::link::FactoryBrpc::CreateContext()+0x7f5b2adda0c8\n#1 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7f5b293e6f84\n#2 pybind11::cpp_function::dispatcher()+0x7f5b293ba98b\n#3 cfunction_call+0x4fc697\n\n\n'))
Thanks.
Can you share a snippet of your 2pc.json?
Sure. Here is the whole 2pc.json:
{
    "id": "colocated.2pc",
    "nodes": {
        "node:0": "127.0.0.1:61320",
        "node:1": "127.0.0.1:61321"
    },
    "devices": {
        "SPU": {
            "kind": "SPU",
            "config": {
                "node_ids": [
                    "node:0",
                    "node:1"
                ],
                "experimental_data_folder": [
                    "/tmp/spu_data_0/",
                    "/tmp/spu_data_1/"
                ],
                "spu_internal_addrs": [
                    "127.0.0.1:61320",
                    "127.0.0.1:61321"
                ],
                "runtime_config": {
                    "protocol": "SEMI2K",
                    "field": "FM64",
                    "enable_pphlo_profile": true,
                    "enable_hal_profile": true
                }
            }
        },
        "P1": {
            "kind": "PYU",
            "config": {
                "node_id": "node:0"
            }
        },
        "P2": {
            "kind": "PYU",
            "config": {
                "node_id": "node:1"
            }
        }
    }
}
Hi @anakinxc. I have checked the 2pc.json file again and found that:
"nodes": {
    "node:0": "127.0.0.1:61320",
    "node:1": "127.0.0.1:61321"
},
and
"spu_internal_addrs": [
    "127.0.0.1:61320",
    "127.0.0.1:61321"
],
should be different. I modified them to 127.0.0.1:61320/127.0.0.1:61321 and 127.0.0.1:61330/127.0.0.1:61331, and it works.
But I don't understand how these IPs are determined and why they need to be different?
Thank you for your patient comments.
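The addresses in "nodes" are for Python-layer communication, and "spu_internal_addrs" are for the SPU runtimes.
In the current implementation, these two layers use different RPC frameworks (the logs above show a gRPC server for the former and a brpc server for the latter), so the two sets of addresses have to be different at this point.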
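Since both layers start their own server, a quick sanity check before launching is to look for addresses that appear in both lists. This is a sketch over plain dicts shaped like the 2pc.json above; shared_addresses is a hypothetical helper, and assigning the 6133x ports to "spu_internal_addrs" in the fixed config is an assumption about how the working setup was arranged.

```python
def shared_addresses(conf: dict) -> set:
    """Addresses used both by the Python-layer nodes and by the SPU runtimes."""
    node_addrs = set(conf["nodes"].values())
    spu_addrs = set(conf["devices"]["SPU"]["config"]["spu_internal_addrs"])
    return node_addrs & spu_addrs


nodes = {"node:0": "127.0.0.1:61320", "node:1": "127.0.0.1:61321"}

# Config as first posted: both layers point at the same two ports.
broken = {
    "nodes": nodes,
    "devices": {"SPU": {"config": {
        "spu_internal_addrs": ["127.0.0.1:61320", "127.0.0.1:61321"],
    }}},
}

# Assumed arrangement of the working config: SPU runtimes moved to 6133x.
fixed = {
    "nodes": nodes,
    "devices": {"SPU": {"config": {
        "spu_internal_addrs": ["127.0.0.1:61330", "127.0.0.1:61331"],
    }}},
}

print(sorted(shared_addresses(broken)))  # ['127.0.0.1:61320', '127.0.0.1:61321']
print(sorted(shared_addresses(fixed)))   # []
```

An empty result only rules out this particular collision; a port can still be occupied by an unrelated process.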
Thank you so much for your explanation. I'll keep this in mind.