Comments (9)

david-cortes commented on August 23, 2024

Thanks for the bug report. The version of this library that was uploaded to PyPI yesterday had a bad bug that would make code that builds more than one model either crash or produce near-random results. If that is the version you were using, could you confirm whether the issue is still present with the latest version, uploaded today (0.5.20.post3)?

pip install isotree==0.5.20.post3

from isotree.

tufanbt commented on August 23, 2024

I found a workaround: a for loop in a bash command that invokes the Python script once per iteration, instead of a for loop inside the Python script.
As in the commit you referenced above, I got some interesting error messages at the end of each script run, although the code worked fine and printed and saved outputs as expected. The error messages mentioned things like "double-free" and "segmentation fault (core dumped)".
Each script built just one model. Should I worry about their output quality (could those be near-random results)?
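
For readers who want the same per-iteration isolation without a bash loop, here is a minimal sketch that launches a fresh Python interpreter for each iteration. The helper name `run_isolated` and the toy snippet it runs are hypothetical placeholders, not part of isotree; in practice the snippet would be replaced by the real model-fitting script.

```python
import subprocess
import sys

def run_isolated(code: str) -> str:
    """Run a code snippet in a fresh interpreter and return its stdout.

    Hypothetical helper: each call starts a new process, so all memory
    used by that iteration is returned to the OS when the process exits.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the child fails or crashes
    )
    return result.stdout.strip()

# Toy stand-in for "fit one model per iteration".
outputs = [run_isolated(f"print({i} * {i})") for i in range(3)]
print(outputs)
```

With `check=True`, a child that exits abnormally (for example after a segmentation fault) raises an exception in the parent instead of passing silently, which makes crashes at the end of a run harder to overlook.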

david-cortes commented on August 23, 2024

To clarify: are you still seeing these error messages with version 0.5.20.post3 (as opposed to 0.5.20.post2)?

Regarding the reliability of the models - if at any point you see a message about double-free or memory corruption in any library, then yes, there is some chance that whatever the model outputs could be random noise.

tufanbt commented on August 23, 2024

For now, those error messages no longer appear. With the for loop, my memory usage swings between 580 and 620 GB during iterations, up from ~35 GB just before the loop started. So I assume the imputer is eating up all that memory after training, given the size of my data and some categorical features. Here is the final question: how can I reset the memory usage of the IsolationForest instance without killing the kernel (on Jupyter) or ending the process (when running .py scripts)?

david-cortes commented on August 23, 2024

I don't know. This library uses Cython, which internally should call a method __dealloc__ at some point when the object is garbage collected. I think that's likely to happen when you delete the object and then call gc.collect(), but I am not sure whether it's guaranteed to happen every time you call the GC manually, whether it follows some other heuristic, or whether Cython itself holds off the call to __dealloc__ until later.
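
The delete-then-collect pattern mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: `FakeModel` is a hypothetical stand-in, not an isotree class, and the sketch shows general CPython behaviour rather than confirming when Cython runs __dealloc__. In CPython, an object with no remaining references is normally finalized as soon as the last reference goes away, while gc.collect() matters mainly for objects caught in reference cycles.

```python
import gc
import weakref

class FakeModel:
    """Hypothetical stand-in for a fitted model holding a large buffer."""
    def __init__(self):
        self.buffer = bytearray(10_000_000)

freed = []  # records which objects have been finalized

# Case 1: no reference cycle. Dropping the last reference is enough.
model = FakeModel()
weakref.finalize(model, freed.append, "model finalized")
del model

# Case 2: the object refers to itself, so only the cycle collector
# can reclaim it once the name is deleted.
cyclic = FakeModel()
cyclic.self_ref = cyclic
weakref.finalize(cyclic, freed.append, "cyclic model finalized")
del cyclic
gc.collect()  # force a full collection so the cycle is reclaimed now

print(freed)
```

If a variable in a notebook still points at the object (including Jupyter's output history such as `_` or `Out`), neither `del` on one name nor gc.collect() will free it, since a live reference remains.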

tufanbt commented on August 23, 2024

Well then, as far as I understand, my problem has nothing to do with this library, but maybe with its dependencies. I am closing the issue.

tufanbt commented on August 23, 2024

Here is a more tangible problem: imputer.drop_imputer() kills the JupyterLab kernel without any message in the notebook or in the terminal running Jupyter. That may be related to my problem, since this seems to be what the documentation suggests for clearing the imputer's memory usage. Reopening the issue.

david-cortes commented on August 23, 2024

Yes, thanks for pointing this out. Should be solved now:

pip install -U git+https://github.com/david-cortes/isotree.git

Also, if I understood your problem correctly, you are monitoring memory consumption through some external tool after fitting the model. In that case, even dropping the imputer might not show a large difference, since memory consumed and then freed by a process is not released back to the OS in its entirety unless you're using a system like FreeBSD, or unless you're LD_PRELOAD'ing libjemalloc.so or similar. Even so, the memory consumed by the process should not keep increasing as more objects are created, as it will reuse what it had previously requested.

tufanbt commented on August 23, 2024

I am using htop on a Linux machine. That's fine if it will reuse it as you suggested, and my recent experience validates your suggestion. Closing the issue (for good, I hope 😄)