Comments (7)
I don't think you will need to write the model twice, only the inference part (which looks quite easy). For your next model you could just write the fit method in numpy and the predict method in PyTorch, so that you don't have to replicate any work. Keep us posted!
from hummingbird.
Ciao Marco, adding a custom op shouldn't be too hard. Unfortunately, at the moment we don't provide a specific API for this, but I can tell you how to do it (we love contributions 😄).
First, you need to add the class of your custom op to the list of supported ops.
Then you need to write a converter that takes your operator as input and returns a PyTorch version of the model. To do this, you first need to register a converter. You can use this as an example, where instead of "SklearnMLPClassifier" you should put "Sklearn_your_custom_op_class_name".
Then you need to provide the actual converter. Given that your implementation pretty much uses a bunch of np functions, it should be straightforward to implement. You can look at other converter implementations to get an idea of how to do it. For example here.
Let me know if this works for you.
from hummingbird.
Hi @interesaaat!
Thank you for your reply!
So, if I understand correctly, the idea is that you take parameters and other relevant attributes (e.g. classes_) from fitted sklearn estimators and pass them to a corresponding nn.Module that implements the same logic.
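As a rough illustration of that idea, here is a minimal sketch of an inference-only module built from fitted parameters. It does not use Hummingbird's converter registration; the names PCAResidualModule and fit_pca are hypothetical, and fit_pca is only a numpy stand-in for the components_ and mean_ attributes a fitted sklearn PCA would provide.

```python
import numpy as np
import torch


class PCAResidualModule(torch.nn.Module):
    """Hypothetical inference-only module: PCA reconstruction error per row."""

    def __init__(self, components: np.ndarray, mean: np.ndarray):
        super().__init__()
        # Fitted parameters are stored as buffers so they travel with the module.
        self.register_buffer("components", torch.from_numpy(components).float())
        self.register_buffer("mean", torch.from_numpy(mean).float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        centered = x - self.mean
        scores = centered @ self.components.T   # equivalent of transform
        x_hat = scores @ self.components        # equivalent of inverse_transform (still centered)
        residuals = centered - x_hat
        return torch.sqrt(torch.sum(residuals ** 2, dim=1))


def fit_pca(X: np.ndarray, n_components: int):
    """Assumed stand-in for a fitted sklearn PCA: returns (components, mean) via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return vt[:n_components], mean


# Fit with numpy, predict with tensor operations only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
components, mean = fit_pca(X, n_components=2)
module = PCAResidualModule(components, mean)
spe = module(torch.from_numpy(X).float())
```

The forward pass here uses only torch operations, which is the property a converted model needs in order to be exported later.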
However, I'm wondering if, in this case, it's easier to simply create a new nn.Module class (instead of inheriting from sklearn.base.BaseEstimator) that internally uses a Hummingbird-converted class.
I'm not 100% sure how I would write it, but what I mean is something like this (ignoring the fact that inverse_transform is not supported):
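import numpy as np
import torch
from sklearn.decomposition import PCA
from hummingbird.ml import convert

class PCADetector(torch.nn.Module):
    def __init__(self, n_components):
        super().__init__()
        self.n_components = n_components

    def fit(self, X: np.ndarray):
        # Fit with sklearn, then keep only the Hummingbird-converted model.
        model = PCA(n_components=self.n_components)
        model.fit(X)
        self.estimator_ = convert(model, backend="pytorch", test_input=X)

    def forward(self, x):
        # Norm of the residual between x and its PCA reconstruction, per row.
        x_hat = self.estimator_.inverse_transform(self.estimator_.transform(x))
        residuals = x - x_hat
        spe = np.sqrt(np.sum(residuals**2, axis=1))
        return spe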
To give a little bit of context, I'll try to explain why I would like to do this.
The final goal is to be able to deploy these models without having to deal with Python packages. The big problem with sklearn, and Python in general for machine learning, is that it's very difficult to deploy custom models when you are not allowed to use Docker in production (our case). Custom models might be defined in a project-related repo, and the only way to ship them is to bundle them together with the source code. But this is something we want to avoid, as it might raise other dependency issues.
Another approach is to compile or convert the model to something like ONNX or TVM. However, ONNX and TVM support is very limited for custom models that do not use deep learning frameworks. That's why I'm trying to understand whether Hummingbird could help me.
However, I'm not sure whether a composition approach, like the one above, can be adapted to work with subsequent conversions to ONNX or similar.
On the other hand, following the official approach you described to register custom operators in Hummingbird might be more robust.
What do you think?
Thanks again,
Marco.
from hummingbird.
Yes, the approach above won't work: even if you wrap the model as a PyTorch module, the internal code still uses numpy, so you would still need that dependency plus Python. Hummingbird should be able to help in your use case because, as long as you provide your model implementation as tensor operations, you can export it with TorchScript or ONNX without Python or any other dependencies.
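To make the export step concrete, here is a minimal sketch, assuming a toy tensor-only module (RowDistance is a hypothetical name, not part of Hummingbird). It compiles the module with torch.jit.script, serializes it, and loads it back. An in-memory buffer is used only to keep the example self-contained; in practice the archive would be written to a file.

```python
import io

import torch


class RowDistance(torch.nn.Module):
    """Toy tensor-only model: Euclidean distance of each row to a stored center."""

    def __init__(self, center: torch.Tensor):
        super().__init__()
        self.register_buffer("center", center)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sqrt(torch.sum((x - self.center) ** 2, dim=1))


model = RowDistance(torch.tensor([1.0, 2.0, 3.0]))

# Compile to TorchScript; this only succeeds if forward uses tensor operations.
scripted = torch.jit.script(model)

# Serialize and reload; the reloaded module no longer needs the class definition.
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
restored = torch.jit.load(buffer)

x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 6.0, 3.0]])
out = restored(x)
```

A saved TorchScript archive of this kind can also be loaded from C++ through LibTorch, which is what removes the Python dependency at deployment time.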
from hummingbird.
The only thing I don't like is having to write the same model twice. But, at this point, I guess this is the only way to go (I've really investigated all the possible solutions I could find), because the only alternative I see is to write the code directly in a compiled language.
I'll try to implement it and let you know if that works.
from hummingbird.
Closing for the moment. We can reopen if needed.
from hummingbird.
Unfortunately I haven't had the time to work on it. I'll update you as soon as I can.
Thanks!
from hummingbird.