How to use SPU to evaluate private models in a 2PC setting with only one machine? (secretflow/spu)

Comments (14)

tpppppub commented on June 12, 2024

Is there a way to perform 2PC private inference using only one machine?
The flax resnet example simulates multiple parties via multiple processes, so it already runs on a single server. To run 2PC, just replace the 3pc config with the 2pc config. If you want to limit network bandwidth and latency to simulate WAN settings, you need to run the SPU runtime on multiple nodes (VMs or docker containers). Please refer to this document.

from spu.

warpoons commented on June 12, 2024

Hi @tpppppub. Thanks for your prompt reply. Replacing the 3pc config in the ResNet50 example with 2pc and changing "protocol": "CHEETAH" to "protocol": "SEMI2K" works for me.

An additional question: is simulating multiple parties via multiple processes roughly equivalent to a LAN? If so, what bandwidth and latency does this simulation correspond to, compared with a real LAN setting?

tpppppub commented on June 12, 2024

I don't think they are strictly equivalent. It depends on your network conditions.

warpoons commented on June 12, 2024

Got it. Thanks.

anakinxc commented on June 12, 2024

The multiprocess simulation's network is just the local loopback, which usually does not actually go through a network adapter and is significantly faster than a real LAN.
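To get a rough feel for how fast the loopback is on a given machine, one option is to time a small TCP echo over 127.0.0.1 and compare it with a ping to another host on the LAN. The following is only a sketch using the Python standard library; `measure_loopback_rtt` is a hypothetical helper, not part of SPU, and the numbers it produces say nothing about SPU's own traffic pattern.

```python
import socket
import threading
import time


def measure_loopback_rtt(rounds: int = 200, payload: bytes = b"x" * 64) -> float:
    """Return the mean round-trip time (seconds) of a TCP echo over 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def echo() -> None:
        # Echo everything back until the client closes the connection.
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                conn.sendall(data)

    thread = threading.Thread(target=echo, daemon=True)
    thread.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        # Disable Nagle's algorithm so small messages are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            client.sendall(payload)
            received = 0
            while received < len(payload):
                chunk = client.recv(4096)
                if not chunk:
                    raise ConnectionError("echo server closed the connection")
                received += len(chunk)
        elapsed = time.perf_counter() - start

    thread.join(timeout=1)
    server.close()
    return elapsed / rounds


if __name__ == "__main__":
    print(f"mean loopback RTT: {measure_loopback_rtt() * 1e6:.1f} microseconds")
```

The measured value depends heavily on the OS and machine load, so it is best treated as a lower bound on what any real network between the parties would add.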

warpoons commented on June 12, 2024

Hi @anakinxc. Thanks for your helpful explanation. I'll try a real LAN/WAN setting once I have solved the 2PC implementation problem.

I tried to run with 2pc.json and hit an error. Here is the output:

E0408 15:23:17.155576810 9376 chttp2_server.cc:1060] UNKNOWN:No address added out of total 1 resolved for '127.0.0.1:61320' {created_time:"2024-04-08T15:23:17.155547663+08:00", children:[UNKNOWN:Unable to configure socket {fd:5, created_time:"2024-04-08T15:23:17.155536205+08:00", children:[UNKNOWN:Address already in use {created_time:"2024-04-08T15:23:17.155524529+08:00", errno:98, os_error:"Address already in use", syscall:"bind"}]}]}
Traceback (most recent call last):
File "/mnt/c/Users/78299/Desktop/privit-main/src/benchmark/nodectl.py", line 47, in
ppd.RPC.serve(args.node_id, nodes_def)
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 210, in serve
server.add_insecure_port(nodes_def[node_id])
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/grpc/_server.py", line 1329, in add_insecure_port
return _common.validate_port_binding_result(
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/grpc/_common.py", line 181, in validate_port_binding_result
raise RuntimeError(_ERROR_MESSAGE_PORT_BINDING_FAILED % address)
RuntimeError: Failed to bind to address 127.0.0.1:61320; set GRPC_VERBOSITY=debug environment variable to see detailed error message.

I have restarted the program and it is still stuck. Could you please take a look at how to solve this 'Address already in use' problem? Thanks.

anakinxc commented on June 12, 2024

If it's localhost, try using a different port.
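Before picking a different port, it can help to check which ports are actually free. The sketch below tries to bind a TCP socket with the Python standard library; `is_port_free` and `pick_free_port` are hypothetical helpers, not part of SPU or gRPC. Note that a port reported as free can still be taken by another process before the node starts.

```python
import socket


def is_port_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically errno 98, "Address already in use", on Linux.
            return False
        return True


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "any free port"
        return sock.getsockname()[1]


if __name__ == "__main__":
    # 61320 is the port from the error message in this thread.
    print("127.0.0.1:61320 free?", is_port_free("127.0.0.1", 61320))
    print("a free port:", pick_free_port())
```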

warpoons commented on June 12, 2024

Hi @anakinxc. I have restarted my computer and tried different ports. The errors are now different:
Node side:
[2024-04-08 15:46:29,585] [MainProcess] Starting grpc server at 127.0.0.1:61320
[2024-04-08 15:47:32,656] [MainProcess] Run : builtin_spu_init at node:0
E0408 15:47:32.659998 310 external/com_github_brpc_brpc/src/brpc/server.cpp:1068] Fail to listen 127.0.0.1:61320
[2024-04-08 15:47:32.670] [warning] [channel.h:160] Channel destructor is called before WaitLinkTaskFinish, try stop send thread
[2024-04-08 15:47:32,671] [MainProcess] Traceback (most recent call last):
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 328, in Run
ret_objs = fn(self, *args, **kwargs)
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 550, in builtin_spu_init
link = libspu.link.create_brpc(desc, my_rank)
RuntimeError: what:
[external/yacl/yacl/link/transport/brpc_link.cc:104] brpc server failed start
stacktrace:
#0 yacl::link::FactoryBrpc::CreateContext()+0x7f5b2adda0c8
#1 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7f5b293e6f84
#2 pybind11::cpp_function::dispatcher()+0x7f5b293ba98b
#3 cfunction_call+0x4fc697

Terminal-side:
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 979, in init
results = [future.result() for future in futures]
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 979, in
results = [future.result() for future in futures]
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 253, in run
return self._call(self._stub.Run, fn, *args, **kwargs)
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 246, in _call
raise Exception("remote exception", result)
Exception: ('remote exception', Exception('Traceback (most recent call last):\n File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 328, in Run\n ret_objs = fn(self, *args, **kwargs)\n File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 550, in builtin_spu_init\n link = libspu.link.create_brpc(desc, my_rank)\nRuntimeError: what: \n\t[external/yacl/yacl/link/transport/brpc_link.cc:104] brpc server failed start\nstacktrace: \n#0 yacl::link::FactoryBrpc::CreateContext()+0x7f5b2adda0c8\n#1 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7f5b293e6f84\n#2 pybind11::cpp_function::dispatcher()+0x7f5b293ba98b\n#3 cfunction_call+0x4fc697\n\n\n'))

Thanks.

anakinxc commented on June 12, 2024

Can you share a snippet of your 2pc.json?

warpoons commented on June 12, 2024

Sure. Here is the whole 2pc.json:
{
    "id": "colocated.2pc",
    "nodes": {
        "node:0": "127.0.0.1:61320",
        "node:1": "127.0.0.1:61321"
    },
    "devices": {
        "SPU": {
            "kind": "SPU",
            "config": {
                "node_ids": [
                    "node:0",
                    "node:1"
                ],
                "experimental_data_folder": [
                    "/tmp/spu_data_0/",
                    "/tmp/spu_data_1/"
                ],
                "spu_internal_addrs": [
                    "127.0.0.1:61320",
                    "127.0.0.1:61321"
                ],
                "runtime_config": {
                    "protocol": "SEMI2K",
                    "field": "FM64",
                    "enable_pphlo_profile": true,
                    "enable_hal_profile": true
                }
            }
        },
        "P1": {
            "kind": "PYU",
            "config": {
                "node_id": "node:0"
            }
        },
        "P2": {
            "kind": "PYU",
            "config": {
                "node_id": "node:1"
            }
        }
    }
}

warpoons commented on June 12, 2024

Hi @anakinxc. I have checked the 2pc.json file again and found that:

"nodes": {
"node:0": "127.0.0.1:61320",
"node:1": "127.0.0.1:61321"
},

and

"spu_internal_addrs": [
"127.0.0.1:61320",
"127.0.0.1:61321"
],

should be different. I changed them to 127.0.0.1:61320/127.0.0.1:61321 and 127.0.0.1:61330/127.0.0.1:61331, and it works.

But I don't understand how these addresses are determined and why they need to be different?

Thank you for your patience.

anakinxc commented on June 12, 2024

The addresses in nodes are for Python-layer communication, and spu_internal_addrs are for the SPU runtimes.

In the current implementation, these layers use different RPC frameworks, so the addresses have to be different for now.
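Given that explanation, and the logs earlier in the thread where a grpc server and then a brpc server both try to listen on 127.0.0.1:61320, a quick sanity check before launching the nodes is to confirm that no address appears twice across nodes and spu_internal_addrs. This is a minimal sketch assuming the config layout posted in this thread; `find_address_conflicts` is a hypothetical helper, not part of SPU, and it compares address strings literally (it would not notice that "localhost:61320" and "127.0.0.1:61320" refer to the same port). The port assignment in `fixed` is one possible reading of the fix described above.

```python
from collections import Counter


def find_address_conflicts(config: dict) -> list:
    """Return addresses used more than once across `nodes` and `spu_internal_addrs`."""
    addrs = list(config.get("nodes", {}).values())
    for device in config.get("devices", {}).values():
        if device.get("kind") == "SPU":
            addrs.extend(device.get("config", {}).get("spu_internal_addrs", []))
    return sorted(addr for addr, count in Counter(addrs).items() if count > 1)


def make_config(spu_addrs: list) -> dict:
    """Build a trimmed-down 2PC config with the given SPU runtime addresses."""
    return {
        "nodes": {"node:0": "127.0.0.1:61320", "node:1": "127.0.0.1:61321"},
        "devices": {
            "SPU": {
                "kind": "SPU",
                "config": {
                    "node_ids": ["node:0", "node:1"],
                    "spu_internal_addrs": spu_addrs,
                },
            },
            "P1": {"kind": "PYU", "config": {"node_id": "node:0"}},
            "P2": {"kind": "PYU", "config": {"node_id": "node:1"}},
        },
    }


# The config that failed: both layers share the same two ports.
broken = make_config(["127.0.0.1:61320", "127.0.0.1:61321"])
# The config that worked: the SPU runtimes get their own ports.
fixed = make_config(["127.0.0.1:61330", "127.0.0.1:61331"])

print(find_address_conflicts(broken))  # ['127.0.0.1:61320', '127.0.0.1:61321']
print(find_address_conflicts(fixed))   # []
```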

warpoons commented on June 12, 2024

Thank you so much for your explanation. I'll keep this in mind.
