Comments (4)
Wendy Wong commented: https://support.h2o.ai/a/tickets/102968 -> more details of the investigation and logs
When running
{quote}self.raw_frame = h2o.assign(self.raw_frame, 'raw_frame')
{quote}
on an H2O cluster with 6 nodes of 8 GB each and a 3.1 GB dataset, the following connection error occurs:
h2o.exceptions.H2OConnectionError: Unexpected HTTP error: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Right before the assign, the data is loaded and sorted with the following code:
{quote}import logging
import os
import h2o
temp_raw = h2o.import_file(path=os.path.join(self.config['input_dir'], self.config['data_frame_file']),
                           destination_frame="temp_raw", col_names=col_names, col_types=dict_types_import,
                           skipped_columns=skipped_indices, na_strings=["Infinity", "-Infinity", "NaN"])
logging.info("ID of temp_raw " + temp_raw.frame_id)
logging.info(...)  # exact call not preserved in the ticket
self.raw_frame = temp_raw.sort([self.config['col_name_id']])
logging.info("ID of temp_raw " + temp_raw.frame_id)
logging.info(...)  # exact call not preserved in the ticket
{quote}
Without the sort, the assign succeeds. With the exact same code (including the sort), the assign also succeeds on a 1.6 GB dataset. Sort + assign also succeeds on a single-node cluster with 48 GB.
Replacing the assign with another function or property access on self.raw_frame (e.g. nrows, frame_id) fails in the same way.
Attached is the output of the script we are running. We added some logging.info statements that list the H2O frames present in the H2O backend; in the output they look like this: [2022-09-15 12:57:25] [INFO ] – ['cols_frame', 'temp_raw']
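To compare runs (failing multi-node vs. succeeding single-node), it can help to emit the frame listings in the same layout as the attached output. The ticket does not show the exact logging call or format string, so this is a sketch under assumptions: the format string below is a guess that reproduces the `[timestamp] [INFO ] – [...]` layout, and on a live cluster the key list is assumed to come from `h2o.ls()`, taken here to return a pandas DataFrame with a `key` column.

```python
import logging

# Assumed format string: reproduces lines such as
# "[2022-09-15 12:57:25] [INFO ] – ['cols_frame', 'temp_raw']".
LOG_FORMAT = "[%(asctime)s] [%(levelname)-5s] – %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_logger(stream=None):
    """Return a logger that writes frame listings in the attached layout.

    stream=None makes StreamHandler write to sys.stderr.
    """
    logger = logging.getLogger("frame_tracker")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, DATE_FORMAT))
    logger.handlers.clear()  # avoid duplicate output on repeated calls
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_frames(logger, frame_keys):
    """Log the frame keys currently present in the H2O backend.

    With a live cluster, frame_keys could be h2o.ls()["key"].tolist()
    (assumption). Keys are sorted so listings from different runs diff cleanly.
    """
    logger.info(sorted(frame_keys))
```

Calling `log_frames(make_logger(), ...)` before and after the sort and the assign gives one listing per step, which makes it easier to see which frames are still held in the backend when the connection drops.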
from h2o-3.
Wendy Wong commented: Yuliia has generated a dataset that reproduces the error. It is located at s3://h2o-public-test-data/bigdata/laptop/sort_merge_tests/pubdev_8866_mem_leak_data.csv
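The import, sort, and assign sequence from the first comment can be condensed into a minimal reproduction against this dataset. This is a sketch under assumptions: the ID column name is hypothetical (the ticket does not name it for this file), the col_names/col_types/skipped_columns overrides of the original script are omitted, the multi-node cluster is assumed to be running already with S3 access, and the `h2o` module is passed in as a parameter so the sequence can also be exercised with a stand-in object.

```python
import logging

# Dataset location from the ticket.
DATA_PATH = ("s3://h2o-public-test-data/bigdata/laptop/sort_merge_tests/"
             "pubdev_8866_mem_leak_data.csv")
ID_COLUMN = "id"  # hypothetical: substitute the dataset's actual ID column


def reproduce(h2o, path=DATA_PATH, id_column=ID_COLUMN):
    """Import, sort by the ID column, then h2o.assign, mirroring the report.

    `h2o` is normally the real h2o module, already connected to the cluster.
    """
    temp_raw = h2o.import_file(path=path, destination_frame="temp_raw",
                               na_strings=["Infinity", "-Infinity", "NaN"])
    logging.info("ID of temp_raw %s", temp_raw.frame_id)

    sorted_frame = temp_raw.sort([id_column])
    # Reported failure point on the multi-node cluster: this call
    # (or an access such as sorted_frame.nrows) raises H2OConnectionError.
    raw_frame = h2o.assign(sorted_frame, "raw_frame")
    logging.info("ID of raw_frame %s", raw_frame.frame_id)
    return raw_frame
```

Against a live cluster this would be run as `reproduce(h2o)` after `h2o.connect(url=...)`. Per the report, the same sequence on a single-node 48 GB cluster succeeds, which gives a natural control run.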
Wendy Wong commented: This one needs to be done by Q2 of 2023.
JIRA Issue Details
Jira Issue: PUBDEV-8937
Assignee: Wendy Wong
Reporter: Wendy Wong
State: In Progress
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
Related Issues (20)
- Add git user info into release pipeline
- UpliftDRF - fix find best split point
- Add GLM Ordinal regression loglikelihood and AIC calculation.
- XGBoost support all parameters available for booster=gblinear
- Rename loglikelihood to negative_loglikelihood when it actually means the -log(likelihood)
- xgboost extension fails to initialize on JDK 17 due to attempt to use reflection to load native library
- UpliftDRF MLI - Implement Shapley values
- The explain function is not working with UpliftDRF model
- Reimplement the explain function to support uplift models
- h2o 3.44.0.3 does not support JDK/Java 21
- Address CVE-2023-35116 in h2o-steam.jar
- Add newer R versions on jenkins for automated tests
- Improve perRow metric calculation
- StackedEnsemble: Error reading MOJO JSON: Object not supported
- Update warning regarding multithreading in H2OFrame.as_data_frame
- Increase run time for Java 8 Core unit
- CVEs for Sparkling Water CVE-2023-2976
- CVEs for Sparkling Water CVE-2023-36478
- fix intermittent test failure: Java 8 AutoML JUnit / ai.h2o.automl.AutoMLTest.test_algos_have_default_parameters_enforcing_reproducibility
- One-Class Classification to detect anomaly or not