Comments (5)
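The whole idea is that you must prepare the data for boosting. Tree-based methods work by making cuts (splits) on feature values, choosing each cut to maximize information gain, i.e. to reduce impurity such as entropy as much as possible. Boosting doesn't fit any linear function to the data, so it cannot extrapolate. If a training feature takes values in [9, 11], boosting may make some cuts inside that range. But as soon as you check the model on a validation set where this feature drifts to [11, 13], all of those values fall on the same side of every cut, the previous cuts don't separate anything, and on a binary task we might get about 0.5 accuracy, no better than chance. Okay, so you need to normalize the data, but in my opinion the ordinary sklearn MinMaxScaler or StandardScaler just rescale the data linearly: fitted on the training set, they still map [11, 13] outside the training range, so the problem remains. Some kind of shifting (differencing) transformation may help instead. That way we get deltas, which are much better suited to classification problems. Moreover, we can try to normalize these deltas by dividing them by the value of the original feature.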
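To see why cuts learned on [9, 11] stop working on [11, 13], here is a minimal sketch with made-up data: a single-split regression tree (a decision stump written from scratch for illustration, not CatBoost's oblivious trees) fit on a feature in [9, 11] with an upward-trending target. Every validation value above 11 lands in the same leaf, so the prediction stays flat however far the feature drifts. The names fit_stump and predict_stump are hypothetical.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimising the squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Candidate cut halfway between two neighbouring training values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = sum((y - lmean) ** 2 for y in left) + sum((y - rmean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return threshold, lmean, rmean


def predict_stump(stump, x):
    threshold, lmean, rmean = stump
    return lmean if x <= threshold else rmean


# Hypothetical training data: feature in [9, 11], target grows with the feature.
train_x = [9.0 + 0.25 * i for i in range(9)]
train_y = [2 * x for x in train_x]
stump = fit_stump(train_x, train_y)

# Validation feature has drifted to [11, 13]: every point is past the cut.
valid_x = [11.5, 12.0, 13.0]
preds = [predict_stump(stump, x) for x in valid_x]
print(stump, preds)
```

A boosted ensemble is a sum of such trees, so its output is likewise constant beyond the largest training value; a regression target is used here only to keep the toy small.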
My transformation looks like: (df.feature - df.feature.shift(-1))/df.feature
@annaveronika am I right?
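The transformation above can be tried in pandas on a hypothetical drifting series. One detail worth checking: Series.shift(-1) moves the next row's value up, so if rows are sorted oldest-first, each delta uses a future value that won't be available at prediction time. This sketch computes that version alongside lagged ones based on shift(1) and diff(), which only look backwards; the series and column names are made up.

```python
import pandas as pd

# Hypothetical series drifting upward: early rows in [9, 11], later rows in [11, 13].
df = pd.DataFrame({"feature": [9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0]})

# Transformation from the comment. shift(-1) pulls the NEXT row's value,
# so with rows sorted oldest-first each delta depends on the future.
df["delta_next"] = (df.feature - df.feature.shift(-1)) / df.feature

# Lagged relative change using only the previous row (same idea as pct_change()).
df["delta_prev"] = (df.feature - df.feature.shift(1)) / df.feature.shift(1)

# Plain first difference: the level drifts, but the deltas stay in one range.
df["diff_prev"] = df.feature.diff()

# The first lagged row and the last delta_next row are NaN and need handling.
print(df)
```

In this toy series the raw level leaves the early range while diff_prev is the same for every row, which is the property the comment is after.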
We don't have any specific support for time series in catboost, so you need to find your own way to prepare data to use it in gradient boosting.
Shouldn't the video be changed, as it's misleading?
No, it's very common to use gradient boosting for time series. How you use it depends on your task.
I'm wondering how to use "has_time=True": how do I tell CatBoost which column is my Date column? I saw that there is a way to build a "Data format description" file, but how do you pass it to the algorithm, and does CatBoost use that information to improve the results? Could you give an example of how to implement this when one of your columns is a pandas Date type and has_time=True? (I also already have columns that break the dates down into their components.)
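Going by CatBoost's parameter documentation, has_time=True tells CatBoost to use the order of objects as given in the input data instead of random permutations (used when computing statistics for categorical features and when choosing the tree structure). As far as I know it does not take a column name, so the usual approach is to sort rows by date before training and drop the raw datetime column, keeping the numeric date components. A minimal sketch, assuming a DataFrame with a date column and a numeric target; the helpers prepare_time_ordered and fit_time_ordered and the sample frame are hypothetical, and CatBoostRegressor is just one choice (CatBoostClassifier accepts has_time as well).

```python
import pandas as pd


def prepare_time_ordered(df: pd.DataFrame, date_col: str, target_col: str):
    """Sort rows chronologically and split them into features and target.

    has_time=True works on row order, so the sorting has to happen before
    the data reaches CatBoost.
    """
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # Stable sort keeps the original relative order of rows sharing a date.
    out = out.sort_values(date_col, kind="mergesort").reset_index(drop=True)
    # Drop the raw datetime column; keep any numeric date components already present.
    X = out.drop(columns=[date_col, target_col])
    y = out[target_col]
    return X, y


def fit_time_ordered(X: pd.DataFrame, y: pd.Series, **params):
    # Imported lazily so the preparation step runs without catboost installed.
    from catboost import CatBoostRegressor

    # has_time=True: treat the row order of X as time order, no random permutations.
    model = CatBoostRegressor(has_time=True, verbose=False, **params)
    model.fit(X, y)
    return model


# Tiny made-up frame whose rows arrive out of chronological order.
df = pd.DataFrame({
    "date": ["2021-03-01", "2021-01-01", "2021-02-01"],
    "month": [3, 1, 2],
    "x": [0.3, 0.1, 0.2],
    "target": [30.0, 10.0, 20.0],
})
X, y = prepare_time_ordered(df, "date", "target")
```

With catboost installed, fit_time_ordered(X, y, iterations=100) would then train on the sorted rows. Whether this improves scores depends on the data; the main effect is that statistics for a row are computed from the rows before it, so later information doesn't leak into earlier rows.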
Related Issues (20)
- "RuntimeError: Attempt to pop from an empty stack" is raised when running models fit in parallel with threads.
- Python package--build from source failed
- Different Between PairLogitPairwise and PairLogit and Impact on Categorical Values
- Val in CatBoost's plot_tree
- Major difference between predictions from trained model
- "Plain" train mode still build the oblivious tree
- Question for building ordered boosting tree
- Get difference tree result when converting cat_features to numerical values
- Why does leaf value in plot tree is related to learning rate?
- How to recursive remove features by best loss ?
- Issue with Categorical Feature Encoding in Binary Classification
- Request to enable sample weights for Cox and AFT objectives
- C++ standalone evaluator multiclass support
- The results calculated according to the formula described in the doc are different from the results displayed by the model.
- Documentation: broken links
- SetPredictionType(modelHandle, APT_CLASS) is broken
- Where is the place for calculating the score function?
- Build catboost python package with custom glibc
- Custom RMSE loss in tutorial get difference tree structure with the original RMSE loss!!!
- Tensor Search Helpers Should Be Unreachable