
Comments (5)

AdrianAntico avatar AdrianAntico commented on May 27, 2024

@PyntieHet Can you try changing the name of "Product.Type" to "ProductType" and rerunning? If that doesn't work, can you send over your code and a data snippet (feel free to mask the data). I just tested a 3 group case with and without XREGS on my side and had no issue.

from autoquant.

PyntieHet avatar PyntieHet commented on May 27, 2024

Changed the column name and got the same error.

As I was preparing a reproducible example to send over, I trimmed the dataset down from 11M rows (the medium dataset here; I've got another pushing 1B rows for future use) to 10,000, and it appears to be working as intended. I'll be honest, I wasn't expecting that.

Just tested with 4 groups on the smaller data and it appears to be working as intended as well.

So it might be a RAM allocation issue on my PC forcing it to drop a column before writing the output; it appears to be an issue with the larger data I am providing, not with the number of group variables. As the error persists on an additional run, I'll have to see whether it also occurs on cloud compute with more resources, but I am thinking this is a hardware limitation for now.


AdrianAntico avatar AdrianAntico commented on May 27, 2024

@PyntieHet That is interesting. How much RAM do you have available? It's possible that an operation gets cut short due to a memory issue but the run then proceeds to the next step, which depends on the previous one, causing the error. I haven't tried building a forecast model with that much data. data.table is used throughout, however, which should minimize RAM usage (and run times). I can do some testing to see how far I can push it on my side, but it sounds like spinning up a cloud instance might be needed for the 1B row example you're mentioning.

You could try separating out the data based on the group level combinations and then building separate models for each subset... I'm going to close the ticket for now.
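The split-by-group approach could be sketched with data.table roughly as follows. This is only a sketch: `fit_forecast()` is a placeholder for whatever AutoQuant forecasting call you are already using, and `full_data`, `ProductType`, and `Region` stand in for your actual data and group columns.

```r
library(data.table)

DT <- as.data.table(full_data)  # full_data: your source data (placeholder)

# Build one key per unique combination of the group variables
DT[, GroupKey := do.call(paste, c(.SD, sep = "_")),
   .SDcols = c("ProductType", "Region")]  # assumed group columns

# Fit an independent model on each subset; each subset fits in RAM
models <- lapply(split(DT, by = "GroupKey"), function(subset_dt) {
  fit_forecast(data = subset_dt)  # placeholder for the AutoQuant call
})
```

Each subset is much smaller than the full table, so the memory high-water mark per model drops accordingly, at the cost of losing any pooling across groups.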


PyntieHet avatar PyntieHet commented on May 27, 2024

I have 32GB available, and usage does get near the max, so I've got a feeling that's it. I am currently following your suggestion of splitting the data out into individual models, though I was hoping to avoid that.

I did try disk.frame, which is built on data.table, as a way to circumvent RAM limits, but it doesn't fit well with the current package structure since it's still experimental, so I had to put that aside for now.

An ideal option would be to stream new data into an existing model, as that would dramatically reduce the overhead of updating this routinely. That appears to be available in the Python version of CatBoost via init_model, but it is oddly missing from the R version as far as I can tell.

Thanks for looking into this for me though.


AdrianAntico avatar AdrianAntico commented on May 27, 2024

You could set up a foreach() job (foreach() lives in the foreach package, paired with a parallel backend such as doParallel) that runs each model independently. Just make sure you feed the subsetted data to each job; otherwise the full data gets duplicated in every worker.
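A minimal sketch of that pattern, assuming the foreach and doParallel packages are installed; `fit_forecast()` and `full_data` are placeholders for your actual AutoQuant call and data, and `GroupKey` is an assumed column holding the group-combination key:

```r
library(data.table)
library(foreach)
library(doParallel)  # also loads the parallel package

DT <- as.data.table(full_data)       # placeholder source data

# Pre-split BEFORE launching workers, so each worker receives
# only its own slice rather than a copy of the full table
subsets <- split(DT, by = "GroupKey")

cl <- makeCluster(4)                 # size to available RAM, not just cores
registerDoParallel(cl)

results <- foreach(s = subsets, .packages = "data.table") %dopar% {
  fit_forecast(data = s)             # placeholder for the AutoQuant call
}

stopCluster(cl)
```

Note that with a PSOCK cluster each worker is a separate R process, so worker count times per-subset memory must still fit in your 32GB.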

