
Comments (5)

AdrianAntico avatar AdrianAntico commented on May 27, 2024

@PyntieHet Can you try changing the name of "Product.Type" to "ProductType" and rerunning? If that doesn't work, can you send over your code and a data snippet (feel free to mask the data). I just tested a 3 group case with and without XREGS on my side and had no issue.

from autoquant.

PyntieHet avatar PyntieHet commented on May 27, 2024

Changed the column name and got the same error.

As I was preparing a reproducible example to send over, I trimmed the dataset down from 11M rows (the medium dataset here; I've got another pushing 1B rows for future use) to 10,000, and it appears to be working as intended. I'll be honest, I wasn't expecting that.

Just tested with 4 groups on the smaller data and it appears to be working as intended as well.

So it might be a RAM allocation issue on my PC forcing it to drop a column before writing the output; it appears to be an issue with the larger data I am providing, not with the number of group variables. As the error persists on an additional run, I'll have to see whether it also occurs on cloud compute with more resources, but I am thinking this is a hardware limitation for now.


AdrianAntico avatar AdrianAntico commented on May 27, 2024

@PyntieHet That is interesting. How much RAM do you have available? It's possible that an operation gets cut short due to a memory issue but the run then proceeds to the next step, which depends on the previous one, causing the error. I haven't tried building a forecast model with that much data. data.table is used throughout, however, which should minimize RAM usage (and run times). I can do some testing to see how far I can push it on my side, but it sounds like spinning up a cloud instance might be needed for the 1B row example you're mentioning.

You could try separating out the data based on the group level combinations and then building separate models for each subset... I'm going to close the ticket for now.
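The split-by-group approach could be sketched with data.table roughly as follows. This is only a sketch: `fit_forecast()` is a placeholder for whatever AutoQuant forecasting call you are already using, and `full_data`, `ProductType`, and `Region` stand in for your actual data and group columns.

```r
library(data.table)

DT <- as.data.table(full_data)  # full_data: your source data (placeholder)

# Build one key per unique combination of the group variables
DT[, GroupKey := do.call(paste, c(.SD, sep = "_")),
   .SDcols = c("ProductType", "Region")]  # assumed group columns

# Fit an independent model on each subset; each subset fits in RAM
models <- lapply(split(DT, by = "GroupKey"), function(subset_dt) {
  fit_forecast(data = subset_dt)  # placeholder for the AutoQuant call
})
```

Each subset is much smaller than the full table, so the memory high-water mark per model drops accordingly, at the cost of losing any pooling across groups.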


PyntieHet avatar PyntieHet commented on May 27, 2024

I have 32GB available, and usage does get near the max, so I've got a feeling that's it. I am currently following your suggestion of splitting the data out into individual models, though I was hoping to avoid that.

I did try disk.frame, which is built on data.table, as a way to circumvent RAM limits, but it doesn't fit well with the current package structure since it's still experimental, so I had to put that aside for now.

An ideal option would be to stream new data into an existing model, as that would dramatically reduce the overhead of updating this routinely. That appears to be available in the Python version of CatBoost via init_model, but it is oddly missing from the R version as far as I can tell.

Thanks for looking into this for me though.


AdrianAntico avatar AdrianAntico commented on May 27, 2024

You could set up a foreach() job (foreach() lives in the foreach package, paired with a parallel backend such as doParallel) that runs each model independently. Just make sure you feed the subsetted data to each job; otherwise the full data gets duplicated in every worker.
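A minimal sketch of that pattern, assuming the foreach and doParallel packages are installed; `fit_forecast()` and `full_data` are placeholders for your actual AutoQuant call and data, and `GroupKey` is an assumed column holding the group-combination key:

```r
library(data.table)
library(foreach)
library(doParallel)  # also loads the parallel package

DT <- as.data.table(full_data)       # placeholder source data

# Pre-split BEFORE launching workers, so each worker receives
# only its own slice rather than a copy of the full table
subsets <- split(DT, by = "GroupKey")

cl <- makeCluster(4)                 # size to available RAM, not just cores
registerDoParallel(cl)

results <- foreach(s = subsets, .packages = "data.table") %dopar% {
  fit_forecast(data = s)             # placeholder for the AutoQuant call
}

stopCluster(cl)
```

Note that with a PSOCK cluster each worker is a separate R process, so worker count times per-subset memory must still fit in your 32GB.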

