Comments (6)
There are plans to add automated model retraining/finetuning, but they will live outside of Obsei. The idea is to keep the model-serving part isolated from the training/finetuning part. First, we would need to figure out how to automatically collect user feedback on model predictions.
For now we can go ahead with the Pythonic implementation.
@lalitpagaria I think here we can use PyTorch data loaders, it'd elegantly batch data for us.
@shahrukhx01 If I understand correctly, you are referring to https://pytorch.org/docs/stable/data.html, right?
But that would add a PyTorch dependency to other parts, such as sources, sinks, and a few simple analyzers that are not transformers-based.
How about a simple solution like the following:
https://stackoverflow.com/questions/8290397/how-to-split-an-iterable-in-constant-size-chunks
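A minimal sketch of that pure-Python approach, using only the standard library so that no PyTorch dependency is needed. The name `chunked` is hypothetical and not an existing Obsei function:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `iterable` (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    iterator = iter(iterable)
    while True:
        # islice pulls up to `size` items; an empty batch means the input is exhausted.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

For example, `list(chunked(range(7), 3))` gives `[[0, 1, 2], [3, 4, 5], [6]]`. Because it works on any iterable, it also handles generators whose length is not known in advance.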
Also, the analyzers currently call the transformers pipeline sequentially, one text at a time. But some pipelines accept a list[str], so we can leverage transformers' internal batching as well.
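A rough sketch of calling a pipeline once per batch instead of once per text, assuming the pipeline accepts a list[str] and returns one result per input. `run_batched` and the `BatchPipeline` alias are hypothetical names, and the pipeline is modeled as a plain callable so the example does not depend on transformers:

```python
from itertools import islice
from typing import Any, Callable, Iterable, List

# Hypothetical stand-in for a pipeline: maps a list of texts to one result per text.
BatchPipeline = Callable[[List[str]], List[Any]]


def run_batched(pipe: BatchPipeline, texts: Iterable[str], batch_size: int = 8) -> List[Any]:
    """Call `pipe` once per batch of up to `batch_size` texts, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(texts)
    results: List[Any] = []
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # One call per batch lets the pipeline batch the texts internally.
        results.extend(pipe(batch))
    return results
```

Exposing `batch_size` as a parameter keeps it configurable per analyzer. With a stub pipeline that upper-cases its inputs, `run_batched(stub, ["a", "b", "c", "d", "e"], batch_size=2)` makes three calls (batch sizes 2, 2 and 1) and returns `["A", "B", "C", "D", "E"]`.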
@lalitpagaria Yes, in that case we can go with the plain Pythonic implementation. If you plan to add automated model retraining/finetuning, a DataLoader-based implementation could come in handy; otherwise, what you suggested fits our requirements.
@lalitpagaria Could you please confirm which of the following analyzers need this change? Since this won't take long, I'll create a quick PR based on the Pythonic approach:
- base_analyzer.py
- classification_analyzer.py
- dummy_analyzer.py
- ner_analyzer.py
- pii_analyzer.py
- sentiment_analyzer.py
- translation_analyzer.py
Almost all of them, as each one receives a list as input. Just make sure the batch_size input parameter is configurable.