Comments (5)
@PyntieHet Can you try changing the name of "Product.Type" to "ProductType" and rerunning? If that doesn't work, can you send over your code and a data snippet (feel free to mask the data)? I just tested a 3-group case with and without XREGS on my side and had no issues.
from autoquant.
Changed the column name and got the same error.
As I was preparing a reproducible example to send over, I trimmed the dataset down from 11M rows (a medium dataset here; I've got another pushing 1B rows for future use) to 10,000, and it appears to be working as intended. I'll be honest, I wasn't expecting that.
Just tested with 4 groups on the smaller data and it appears to be working as intended as well.
So I don't know if it might be a RAM allocation issue on my PC forcing it to drop a column before trying to write the output, but it appears to be an issue with the larger data I am providing and not the number of group variables. As the error still persists on an additional run, I'll have to see whether it also occurs on cloud compute with more resources, but I am currently thinking this might be a hardware issue.
@PyntieHet That is interesting. How much RAM do you have available? It's possible that an operation gets cut short due to the memory issue but the code then proceeds to the next step, which depends on the previous step, causing an error. I haven't tried building a forecast model with that much data. data.table is used throughout, however, which should minimize RAM usage (and run times). I can do some testing to see how far I can push it on my side, but it sounds like spinning up a cloud instance might be needed for the 1B row example you're mentioning.
You could try separating out the data based on the group levels combinations and then build separate models for each subset... I'm going to close the ticket for now.
I have 32GB available at the moment, but usage does go to near max, so I've got a feeling that's it. I am trying your suggestion of splitting out into individual models but was hoping not to have to.
I did try disk.frame, which is built on data.table, as a way to circumvent RAM limits, but it doesn't fit well with the current package structure as it's still experimental, so I had to put that to the side for now.
An ideal option would be to stream in new data on an existing model, as that would dramatically reduce the overhead I would need to update this routinely. That appears to be available in the Python version of CatBoost with init_model but is oddly missing in the R version from what I can tell.
Thanks for looking into this for me though.
You could set up a foreach job (the foreach package with a parallel backend such as doParallel) that runs each model independently. Just make sure you are feeding only the subsetted data to each job; otherwise the full data gets duplicated in each job.
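To make the "feed only the subset to each job" point concrete, here is a minimal, language-agnostic sketch of the pattern, written in Python with the standard library only. It is not autoquant code: `split_by_group`, `fit_subset`, `fit_all`, the `Store` column, and the mean-based "model" are all hypothetical stand-ins for a real per-group forecast build.

```python
from collections import defaultdict
from statistics import fmean


def split_by_group(rows, group_keys):
    """Partition rows (dicts) into one list per combination of group levels."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[tuple(row[k] for k in group_keys)].append(row)
    return dict(subsets)


def fit_subset(job):
    """Hypothetical stand-in for a per-group model fit: returns the mean target."""
    group, subset = job
    return group, fmean(r["y"] for r in subset)


def fit_all(rows, group_keys, map_fn=map):
    """Run one job per group; each job receives only its own subset."""
    jobs = split_by_group(rows, group_keys).items()
    return dict(map_fn(fit_subset, jobs))


# Toy data; "Store" is an assumed second grouping column.
rows = [
    {"Store": "A", "ProductType": "x", "y": 10.0},
    {"Store": "A", "ProductType": "x", "y": 14.0},
    {"Store": "A", "ProductType": "y", "y": 3.0},
    {"Store": "B", "ProductType": "x", "y": 7.0},
]
models = fit_all(rows, ["Store", "ProductType"])
```

Because each job is a `(group, subset)` pair, swapping the built-in `map` for the `map` method of a `concurrent.futures.ProcessPoolExecutor` should send each worker only its own rows rather than the full table. The same idea applies to a foreach loop in R: iterate over the pre-split subsets, not over group labels that index into the full data inside the loop body.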