tpppppub commented on June 12, 2024

Is there a way to perform 2PC private inference using only one machine?

The flax resnet example simulates multiple parties with multiple processes, so it already runs on a single server. To run 2PC, you just need to replace the 3pc config with the 2pc config. If you want to limit network bandwidth and latency to simulate WAN settings, you need to run the SPU runtime on multiple nodes (VMs or Docker containers). Please refer to this document.

from spu.

warpoons commented on June 12, 2024

Hi @tpppppub. Thanks for your prompt reply. Replacing the 3pc config with the 2pc config in the ResNet50 example and changing "protocol": "CHEETAH" to "protocol": "SEMI2K" works for me.

An additional question: is simulating multiple parties via multiple processes roughly equivalent to a LAN? If so, what LAN bandwidth and latency does this simulation correspond to?

tpppppub commented on June 12, 2024

I don't think they are strictly equivalent. It depends on your network conditions.

warpoons commented on June 12, 2024

Got it. Thanks.

anakinxc commented on June 12, 2024

The network in a multi-process simulation is just the local loopback interface, which usually does not go through a network adapter at all and is significantly faster than a real LAN.
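To get a rough feel for how fast loopback is on a given machine, a minimal sketch like the following can time a 1-byte TCP ping over 127.0.0.1. It uses only the Python standard library and is not part of SPU; the function names `echo_once` and `loopback_rtt_seconds` are hypothetical. The number it prints depends entirely on the host, so treat it as a local measurement rather than a figure for LAN comparison.

```python
import socket
import threading
import time


def echo_once(server: socket.socket) -> None:
    """Accept one connection and echo bytes back until the peer closes."""
    conn, _ = server.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def loopback_rtt_seconds(rounds: int = 200) -> float:
    """Return the mean round-trip time of a 1-byte ping over 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
    worker.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        # Disable Nagle's algorithm so tiny messages are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            client.sendall(b"x")
            client.recv(1)
        elapsed = time.perf_counter() - start

    worker.join(timeout=5)
    server.close()
    return elapsed / rounds


if __name__ == "__main__":
    print(f"mean loopback RTT: {loopback_rtt_seconds() * 1e6:.1f} us")
```

Running the same ping between two machines on the target LAN or WAN gives a like-for-like comparison.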

warpoons commented on June 12, 2024

Hi @anakinxc. Thanks for your helpful explanation. I'll try a real LAN/WAN setting once I have solved the 2PC implementation problem.

I tried to run with 2pc.json and hit an error. Here is the output:

E0408 15:23:17.155576810 9376 chttp2_server.cc:1060] UNKNOWN:No address added out of total 1 resolved for '127.0.0.1:61320' {created_time:"2024-04-08T15:23:17.155547663+08:00", children:[UNKNOWN:Unable to configure socket {fd:5, created_time:"2024-04-08T15:23:17.155536205+08:00", children:[UNKNOWN:Address already in use {created_time:"2024-04-08T15:23:17.155524529+08:00", errno:98, os_error:"Address already in use", syscall:"bind"}]}]}
Traceback (most recent call last):
File "/mnt/c/Users/78299/Desktop/privit-main/src/benchmark/nodectl.py", line 47, in
ppd.RPC.serve(args.node_id, nodes_def)
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 210, in serve
server.add_insecure_port(nodes_def[node_id])
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/grpc/_server.py", line 1329, in add_insecure_port
return _common.validate_port_binding_result(
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/grpc/_common.py", line 181, in validate_port_binding_result
raise RuntimeError(_ERROR_MESSAGE_PORT_BINDING_FAILED % address)
RuntimeError: Failed to bind to address 127.0.0.1:61320; set GRPC_VERBOSITY=debug environment variable to see detailed error message.

I have restarted the program and it is still stuck. Could you please take a look at how to solve this 'Address already in use' problem? Thanks.

anakinxc commented on June 12, 2024

If you are running on localhost, try using a different port.
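Before editing the config, it can help to check whether a port is actually free. The following is a minimal standard-library sketch, not an SPU utility; `port_is_free` and `pick_free_port` are hypothetical helper names. It tries to bind the address the same way a server would, and `bind` fails with "Address already in use" when another socket holds the port.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically errno 98 (EADDRINUSE) on Linux when the port is taken.
            return False
        return True


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))  # port 0 means "any free port"
        return probe.getsockname()[1]


if __name__ == "__main__":
    for candidate in (61320, 61321):
        print(candidate, "free" if port_is_free("127.0.0.1", candidate) else "in use")
    print("suggested port:", pick_free_port())
```

The check is only a snapshot: another process can still claim the port between the check and the real server start.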

warpoons commented on June 12, 2024

Hi @anakinxc. I have restarted my computer and tried different ports. The error is now different:
node-side:
[2024-04-08 15:46:29,585] [MainProcess] Starting grpc server at 127.0.0.1:61320
[2024-04-08 15:47:32,656] [MainProcess] Run : builtin_spu_init at node:0
E0408 15:47:32.659998 310 external/com_github_brpc_brpc/src/brpc/server.cpp:1068] Fail to listen 127.0.0.1:61320
[2024-04-08 15:47:32.670] [warning] [channel.h:160] Channel destructor is called before WaitLinkTaskFinish, try stop send thread
[2024-04-08 15:47:32,671] [MainProcess] Traceback (most recent call last):
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 328, in Run
ret_objs = fn(self, *args, **kwargs)
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 550, in builtin_spu_init
link = libspu.link.create_brpc(desc, my_rank)
RuntimeError: what:
[external/yacl/yacl/link/transport/brpc_link.cc:104] brpc server failed start
stacktrace:
#0 yacl::link::FactoryBrpc::CreateContext()+0x7f5b2adda0c8
#1 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7f5b293e6f84
#2 pybind11::cpp_function::dispatcher()+0x7f5b293ba98b
#3 cfunction_call+0x4fc697

Terminal-side:
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 979, in init
results = [future.result() for future in futures]
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 979, in
results = [future.result() for future in futures]
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 253, in run
return self._call(self._stub.Run, fn, *args, **kwargs)
File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 246, in _call
raise Exception("remote exception", result)
Exception: ('remote exception', Exception('Traceback (most recent call last):\n File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 328, in Run\n ret_objs = fn(self, *args, **kwargs)\n File "/home/warpoons/anaconda3/envs/spu_py310/lib/python3.10/site-packages/spu/utils/distributed.py", line 550, in builtin_spu_init\n link = libspu.link.create_brpc(desc, my_rank)\nRuntimeError: what: \n\t[external/yacl/yacl/link/transport/brpc_link.cc:104] brpc server failed start\nstacktrace: \n#0 yacl::link::FactoryBrpc::CreateContext()+0x7f5b2adda0c8\n#1 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7f5b293e6f84\n#2 pybind11::cpp_function::dispatcher()+0x7f5b293ba98b\n#3 cfunction_call+0x4fc697\n\n\n'))

Thanks.

anakinxc commented on June 12, 2024

Can you share a snippet of your 2pc.json?

warpoons commented on June 12, 2024

Sure. Here is the whole 2pc.json:
{
    "id": "colocated.2pc",
    "nodes": {
        "node:0": "127.0.0.1:61320",
        "node:1": "127.0.0.1:61321"
    },
    "devices": {
        "SPU": {
            "kind": "SPU",
            "config": {
                "node_ids": [
                    "node:0",
                    "node:1"
                ],
                "experimental_data_folder": [
                    "/tmp/spu_data_0/",
                    "/tmp/spu_data_1/"
                ],
                "spu_internal_addrs": [
                    "127.0.0.1:61320",
                    "127.0.0.1:61321"
                ],
                "runtime_config": {
                    "protocol": "SEMI2K",
                    "field": "FM64",
                    "enable_pphlo_profile": true,
                    "enable_hal_profile": true
                }
            }
        },
        "P1": {
            "kind": "PYU",
            "config": {
                "node_id": "node:0"
            }
        },
        "P2": {
            "kind": "PYU",
            "config": {
                "node_id": "node:1"
            }
        }
    }
}

warpoons commented on June 12, 2024

Hi @anakinxc. I have checked the 2pc.json file again and found that the addresses in

"nodes": {
    "node:0": "127.0.0.1:61320",
    "node:1": "127.0.0.1:61321"
},

and

"spu_internal_addrs": [
    "127.0.0.1:61320",
    "127.0.0.1:61321"
],

should be different. I changed them to 127.0.0.1:61320/127.0.0.1:61321 and 127.0.0.1:61330/127.0.0.1:61331 respectively, and now it works.

But I don't understand how these addresses are determined, and why do they need to be different?

Thank you for your patience.

anakinxc commented on June 12, 2024

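The addresses in nodes are for communication at the Python layer, and spu_internal_addrs are for the SPU runtimes.

In the current implementation, these two layers use different RPC frameworks (gRPC for the Python layer and brpc for the SPU runtime, as the logs above show), so for now they cannot share the same address and port.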
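A config can be checked for this kind of clash before launching anything. The sketch below is a hypothetical helper, not part of SPU: `find_address_conflicts` is an assumed name, and the two dictionaries are trimmed copies of the config from this thread (the original one and the one with 61330/61331 that was reported to work).

```python
def find_address_conflicts(config: dict) -> list[str]:
    """Return addresses used both in `nodes` and in a device's `spu_internal_addrs`."""
    node_addrs = set(config.get("nodes", {}).values())
    conflicts = set()
    for device in config.get("devices", {}).values():
        internal = device.get("config", {}).get("spu_internal_addrs", [])
        conflicts.update(addr for addr in internal if addr in node_addrs)
    return sorted(conflicts)


# Trimmed version of the config that failed with "Fail to listen".
broken = {
    "nodes": {"node:0": "127.0.0.1:61320", "node:1": "127.0.0.1:61321"},
    "devices": {
        "SPU": {
            "kind": "SPU",
            "config": {"spu_internal_addrs": ["127.0.0.1:61320", "127.0.0.1:61321"]},
        },
        "P1": {"kind": "PYU", "config": {"node_id": "node:0"}},
    },
}

# Same config with the SPU runtime moved to its own ports.
fixed = {
    "nodes": {"node:0": "127.0.0.1:61320", "node:1": "127.0.0.1:61321"},
    "devices": {
        "SPU": {
            "kind": "SPU",
            "config": {"spu_internal_addrs": ["127.0.0.1:61330", "127.0.0.1:61331"]},
        },
        "P1": {"kind": "PYU", "config": {"node_id": "node:0"}},
    },
}

if __name__ == "__main__":
    print(find_address_conflicts(broken))  # ['127.0.0.1:61320', '127.0.0.1:61321']
    print(find_address_conflicts(fixed))   # []
```

To check a real file, load it with `json.load` and pass the resulting dictionary to the same function.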

warpoons commented on June 12, 2024

Thank you so much for your explanation. I'll keep this in mind.
