Comments (6)
It's a system limit on the number of open files or connections; you can refer to https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/ to increase it.
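If changing the limit system-wide is awkward, the soft limit can also be raised from inside the Python process before initializing xorbits. This is a minimal sketch using only the standard `resource` module (Linux; not part of xorbits itself):

```python
import resource

# Current per-process limit on open file descriptors: (soft, hard).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"before: soft={soft}, hard={hard}")

# A non-root process may raise its soft limit up to the hard limit;
# raising the hard limit itself requires root (sysctl / limits.conf).
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(f"after: soft={resource.getrlimit(resource.RLIMIT_NOFILE)[0]}")
```

Run this at the top of the script, before the first xorbits call, so every worker process inherits the raised limit.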
I tried, but it's not working. Is it possible to limit the number of worker processes in xorbits?
Hi, could you please provide the complete error stack and message for us to debug? Thanks.
Also, you could try the mmap backend to run your code. Enable it like this:
import xorbits
xorbits.init(storage_config={"mmap": {"root_dirs": "<your dir>"}})
Keep your code as-is and just add this line when initializing xorbits.
I hit this issue again. Please help me solve it.
===================================
I create a dataframe:
N1 | N2
---|---
query | ori_txt
text len 10-50 | 300-5000
The code I run is as below:
import xorbits.pandas as pd
from xorbits.experimental import dedup

data_lst = [{'query': 'xxx', 'ori_txt': 'xxx'}, {'query': 'xxx', 'ori_txt': 'xxx'}]
df = pd.DataFrame(data_lst)
res = dedup(df, col="query", method="minhash", threshold=threshold,
            num_perm=num_perm, min_length=min_length, ngrams=ngrams, seed=seed,
            verbose=True)
print('ori len: ', len(data_lst))
print('dedup len: ', len(res['query'].tolist()))
data_lst contains 10k+ records.
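For context on what `method="minhash"` does conceptually: each text is reduced to a small signature over its character n-grams, and rows whose signatures agree on more than `threshold` of the slots are treated as duplicates. Below is a standalone stdlib sketch of that idea; it is not xorbits' actual implementation, and all function names here are illustrative:

```python
import hashlib
import random

def minhash_signature(text, num_perm=16, ngrams=3, seed=42):
    """Hash the set of character n-grams under num_perm salted hash
    functions, keeping the minimum per function: similar texts share
    many of these minimums (MinHash estimates Jaccard similarity)."""
    shingles = {text[i:i + ngrams] for i in range(max(1, len(text) - ngrams + 1))}
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_perm)]
    return tuple(
        min(int.from_bytes(hashlib.md5(f"{salt}:{s}".encode()).digest()[:8], "big")
            for s in shingles)
        for salt in salts
    )

def dedup_by_minhash(records, col="query", threshold=0.8, **kw):
    kept, sigs = [], []
    for rec in records:
        sig = minhash_signature(rec[col], **kw)
        # Estimated similarity = fraction of matching signature slots.
        if all(sum(a == b for a, b in zip(sig, s)) / len(sig) < threshold
               for s in sigs):
            kept.append(rec)
            sigs.append(sig)
    return kept

rows = [{"query": "how to open a file"},
        {"query": "how to open a file"},
        {"query": "closing network sockets"}]
print(len(dedup_by_minhash(rows)))  # identical queries collapse to one
```

This quadratic all-pairs comparison is only for illustration; real implementations bucket signatures with locality-sensitive hashing to avoid comparing every pair.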
===================================
I got the error below:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/forkserver.py", line 258, in main
fds = reduction.recvfds(s, MAXFDS_TO_SEND + 1)
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/reduction.py", line 159, in recvfds
raise EOFError
EOFError
Traceback (most recent call last):
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xorbits/_mars/deploy/oscar/session.py", line 1954, in get_default_or_create
session = new_session("127.0.0.1", init_local=True, **kwargs)
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xorbits/_mars/deploy/oscar/session.py", line 1924, in new_session
session = SyncSession.init(
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xorbits/_mars/deploy/oscar/session.py", line 1550, in init
isolated_session = fut.result()
File "/home//miniconda3/envs/train_py310/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/home//miniconda3/envs/train_py310/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xorbits/_mars/deploy/oscar/session.py", line 775, in init
await new_cluster_in_isolation(
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xorbits/_mars/deploy/oscar/local.py", line 101, in new_cluster_in_isolation
await cluster.start()
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xorbits/_mars/deploy/oscar/local.py", line 344, in start
await self._start_worker_pools()
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xorbits/_mars/deploy/oscar/local.py", line 386, in _start_worker_pools
worker_pool = await create_worker_actor_pool(
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xorbits/_mars/deploy/oscar/pool.py", line 310, in create_worker_actor_pool
return await create_actor_pool(
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/api.py", line 179, in create_actor_pool
return await get_backend(scheme).create_actor_pool(
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/backends/indigen/backend.py", line 49, in create_actor_pool
return await create_actor_pool(
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/backends/pool.py", line 1585, in create_actor_pool
pool: MainActorPoolType = await pool_cls.create(
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/backends/pool.py", line 1282, in create
processes, ext_addresses = await cls.wait_sub_pools_ready(tasks)
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/backends/indigen/pool.py", line 221, in wait_sub_pools_ready
process, status = await task
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/backends/indigen/pool.py", line 213, in start_sub_pool
return await create_pool_task
File "/home//miniconda3/envs/train_py310/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/backends/indigen/pool.py", line 203, in start_pool_in_process
process.start()
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/context.py", line 300, in _Popen
return Popen(process_obj)
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/popen_forkserver.py", line 35, in __init__
super().__init__(process_obj)
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/popen_forkserver.py", line 58, in _launch
f.write(buf.getbuffer())
BrokenPipeError: [Errno 32] Broken pipe
2024-04-12 13:59:09,343 asyncio 671836 ERROR Task was destroyed but it is pending!
task: <Task pending name='Task-10' coro=<MainActorPoolBase.monitor_sub_pools() running at /home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/backends/pool.py:1458> wait_for=<Future pending cb=[Task.task_wakeup()]>>
2024-04-12 13:59:09,344 asyncio 671836 ERROR Task exception was never retrieved
future: <Task finished name='Task-402' coro=<MainActorPool.start_sub_pool() done, defined at /home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/backends/indigen/pool.py:180> exception=OSError(24, 'Too many open files')>
Traceback (most recent call last):
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/backends/indigen/pool.py", line 213, in start_sub_pool
return await create_pool_task
File "/home//miniconda3/envs/train_py310/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home//miniconda3/envs/train_py310/lib/python3.10/site-packages/xoscar/backends/indigen/pool.py", line 203, in start_pool_in_process
process.start()
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/context.py", line 300, in _Popen
return Popen(process_obj)
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/popen_forkserver.py", line 35, in __init__
super().__init__(process_obj)
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/popen_forkserver.py", line 51, in _launch
self.sentinel, w = forkserver.connect_to_new_process(self._fds)
File "/home//miniconda3/envs/train_py310/lib/python3.10/multiprocessing/forkserver.py", line 87, in connect_to_new_process
with socket.socket(socket.AF_UNIX) as client:
File "/home//miniconda3/envs/train_py310/lib/python3.10/socket.py", line 232, in __init__
_socket.socket.__init__(self, family, type, proto, fileno)
OSError: [Errno 24] Too many open files
===================================
I ran the following commands:
sudo sysctl -w fs.file-max=100000
ulimit -S -n 1048576
It still errors out. The error comes from len(res['query'].tolist()), so how should I retrieve the result?
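One thing worth checking (an assumption about your setup): `ulimit -S -n` only affects the shell it was run in and that shell's children, so if the script is launched from another terminal, a notebook server, or a scheduler, the Python process may never see the raised limit. A quick stdlib check from inside the same process that runs xorbits:

```python
import resource

# Confirm this Python process actually inherited the raised soft limit;
# `ulimit` changes do not persist across shells or terminal sessions.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")
```

If `soft` still prints the old value (commonly 1024), the limit change never reached the process that spawns the worker pools.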
Thanks for your reply.