Comments (5)

tomasfryda commented on June 11, 2024

Thank you for the log entry. I think we'll need to fix both or at least make sure the problem is not in (1) as well.

The exception from Java has a combination of path separators (/ and \), but the log entry contains just \, so I think there is some wrong conversion of path separators in the Java backend. It's possible that issue (1) is only in the error-handling part, which could explain why the previous version worked, but we should look into it to make sure.

java.lang.RuntimeException: java.io.FileNotFoundException: [C:\Users\ROEL](file:///C:/Users/ROEL)~1.VER\AppData\Local\Temp\tmpmg5yuobe.h2oframe2Convert.csv 

from h2o-3.

tomasfryda commented on June 11, 2024

Thank you for taking the time to pinpoint the issue. Unfortunately, I don't have a Windows machine, so I have just 2 untested hypotheses:
(1) unexpected character in the path,
(2) two processes trying to open the same file (which is supported on unix-like systems but not on Windows).

If it's just (1), could you provide us with part of the h2o log? I'm interested in the log entry ExportFiles processing (SOME_PATH), e.g.:

01-31 10:35:21.528 127.0.0.1:54321       3076   5715826-38  INFO water.default: ExportFiles processing (/tmp/iris.csv)

If that is the only problem, you could work around it by adding something like the following to the top of your script/Jupyter notebook (just make sure the path exists).

import tempfile
tempfile.tempdir = "C:\\tmp\\"
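
A minimal sketch of the same workaround that also takes care of the "make sure the path exists" part. The helper name use_temp_dir is hypothetical; it only relies on the standard os and tempfile modules.

```python
import os
import tempfile


def use_temp_dir(path):
    """Create `path` if it is missing and make it the tempfile default."""
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    tempfile.tempdir = path           # later NamedTemporaryFile() calls land here
    return tempfile.gettempdir()      # returns the directory now in effect
```

On Windows this would be called as use_temp_dir("C:\\tmp") before the frame is converted. This has not been tested against h2o itself.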

If it's (2), we will need to fix how the temporary file is created. It should be a simple thing to fix. I think something like the following would do. cc @wendycwong

--- a/h2o-py/h2o/frame.py
+++ b/h2o-py/h2o/frame.py
@@ -1970,12 +1970,16 @@ class H2OFrame(Keyed, H2ODisplay):
         if can_use_pandas() and use_pandas:
             import pandas
             if (can_use_datatable()) or (can_use_polars() and can_use_pyarrow()): # can use multi-thread
-                with tempfile.NamedTemporaryFile(suffix=".h2oframe2Convert.csv") as exportFile:
+                exportFile = tempfile.NamedTemporaryFile(suffix=".h2oframe2Convert.csv", delete=False)
+                try:
+                    exportFile.close()
                     h2o.export_file(self, exportFile.name, force=True)
                     if can_use_datatable(): # use datatable for multi-thread by default
                         return self.convert_with_datatable(exportFile.name)
                     elif can_use_polars() and can_use_pyarrow():  # polar/pyarrow if datatable is not available
                         return self.convert_with_polars(exportFile.name)
+                finally:
+                    os.unlink(exportFile.name)
             warnings.warn("converting H2O frame to pandas dataframe using single-thread.  For faster conversion using"
                           " multi-thread, install datatable (for Python 3.9 or lower), or polars and pyarrow "
                           "(for Python 3.10 or above).", H2ODependencyWarning)

You can patch your h2o library using that code, but it might get a little more involved. If it's just (2), I think we could manage to release the fix in the upcoming major release (likely within the next month). If the problem is in (1) as well, we would probably need your help in providing us with the line from the log.
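
The pattern in the diff can be tried outside h2o with a small standalone sketch. The function with_closed_temp_file and its writer/reader callbacks are hypothetical stand-ins for h2o.export_file and the convert_with_* methods; the point is only that the Python handle is closed before anything else opens the file.

```python
import os
import tempfile


def with_closed_temp_file(writer, reader, suffix=".h2oframe2Convert.csv"):
    """Reserve a temp file name, close our handle, then let others use the path."""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.close()              # release our handle so another process can open the file
        writer(tmp.name)         # stand-in for h2o.export_file(frame, path, force=True)
        return reader(tmp.name)  # stand-in for convert_with_datatable / convert_with_polars
    finally:
        os.unlink(tmp.name)      # delete=False means cleanup is our responsibility
```

With delete=False the file survives close(), so the cleanup has to happen in the finally block, as in the diff.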

RoelVerbelen commented on June 11, 2024

Hi @tomasfryda

Thanks for your response.

Here is that part of the logs:

01-31 10:18:43.312 127.0.0.1:54321       31396  8557915-20  INFO water.default: ExportFiles processing (C:\Users\ROEL~1.VER\AppData\Local\Temp\tmpmg5yuobe.h2oframe2Convert.csv)
01-31 10:18:43.314 127.0.0.1:54321       31396  8557915-20  WARN water.default: File C:\Users\ROEL~1.VER\AppData\Local\Temp\tmpmg5yuobe.h2oframe2Convert.csv exists, but will be overwritten!
01-31 10:18:43.325 127.0.0.1:54321       31396      FJ-1-7 ERROR water.default: 
java.lang.RuntimeException: java.io.FileNotFoundException: C:\Users\ROEL~1.VER\AppData\Local\Temp\tmpmg5yuobe.h2oframe2Convert.csv (The process cannot access the file because it is being used by another process)

Sounds like it might be (2) instead?

kalaiselvan263 commented on June 11, 2024

@tomasfryda Do you have any workaround for this?

tomasfryda commented on June 11, 2024

@kalaiselvan263 Not yet. I think the modification I suggested (#16045 (comment)) would work, but I don't have a Windows machine to test it on.

You would need to find where the h2o package is installed and navigate to the file frame.py. On macOS, running this in the python3 that has h2o installed gives me the path to the file: import sysconfig; print(sysconfig.get_paths()["purelib"]+"/h2o/frame.py"). I think it would work on Windows as well (you'd just need to change / to \).
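
A sketch of the same lookup that lets os.path.join choose the separator, so the one-liner does not need editing on Windows. The helper name installed_file_path is hypothetical. It assumes h2o sits in the "purelib" directory of the interpreter that runs it, and it does not check that the file exists.

```python
import os
import sysconfig


def installed_file_path(*parts):
    """Join path parts onto this interpreter's site-packages ("purelib") directory."""
    return os.path.join(sysconfig.get_paths()["purelib"], *parts)


print(installed_file_path("h2o", "frame.py"))
```

If h2o was installed somewhere else (for example a user-level install), the package directory can instead be read from os.path.dirname(h2o.__file__) after import h2o.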

If that doesn't work, you can change exportFile to some predefined path that does not contain any special characters, e.g.:

import random
# randint needs integer bounds; a float such as 1e8 raises TypeError on recent Python versions
exportFile = "C:\\tmp\\h2o_tempfile_{}.csv".format(random.randint(0, 10**8))

It's not perfect: with this change there could be issues with multiple users trying to do the same thing at the same time, but the probability of that is pretty low (1e-8). On Windows you'd be more likely to end up with the same error (The process cannot access the file because it is being used by another process), so you'd just have to retry.
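
If the chance of a name collision is a concern, one alternative (an assumption, not something proposed in this thread) is to build the name from a UUID instead of a random integer. The helper unique_export_path is hypothetical, and the directory still has to exist and contain no special characters.

```python
import os
import uuid


def unique_export_path(directory, prefix="h2o_tempfile_", suffix=".csv"):
    """Build a file path in `directory` whose name contains a random UUID."""
    name = "{}{}{}".format(prefix, uuid.uuid4().hex, suffix)
    return os.path.join(directory, name)
```

For example, unique_export_path("C:\\tmp") would take the place of the random.randint-based name.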
