Comments (4)

emeli-dral commented on May 21, 2024

Hey @rmminusrslash,
Thanks for the additional details!

We thought about adding an error message based on data size. But the limit depends on the user's infrastructure, especially when the tool runs locally, so it would be hard to set a universal threshold above which sampling should be applied. As a priority, we are also working right now on speeding up the UI, which should solve some of the cases where reports are too large to display. Hopefully, it will help a lot 🤞

We are thinking about adding a flag later that users can set themselves ("large dataset"), which would generate a variation of the report better suited to larger datasets. It would include not only sampling but also different, aggregated views for some parts of the report.

Agreed on your point about making the large-dataset limitation and the sampling option clearer for Jupyter notebook users: we have now added this to the Quick-start part of the docs.

from evidently.

emeli-dral commented on May 21, 2024

Hey @rmminusrslash,
Thanks for reporting! Unfortunately, this is a current limitation of the tool.

The report is large because the tool stores all the data necessary to generate interactive plots directly inside the HTML. We plan to fix it when we create a service version of the tool (where we decouple the data storage and the browser-based web service).

For now there are two workarounds:

  1. Use a sampling strategy for your dataset, for instance random sampling. In a Jupyter notebook, that can be done directly with pandas. For the command line interface, we have a configuration option: you can choose random sampling or pick every n-th row.
  2. Use the JSON profile. This way, Evidently calculates the metrics and statistical tests, but they can be logged or displayed elsewhere. We have an example for MLflow https://docs.evidentlyai.com/step-by-step-guides/integrations/evidently-+-mlflow and I am now working on one for Grafana.
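
For the notebook case, the first workaround could look roughly like the sketch below. It uses only pandas and does not call Evidently itself; the helper names `random_sample` and `every_nth_row`, the toy data, and the 10,000-row target are illustrative assumptions, not part of the tool.

```python
import pandas as pd


def random_sample(df: pd.DataFrame, n: int, seed: int = 0) -> pd.DataFrame:
    """Return at most n randomly chosen rows (hypothetical helper)."""
    if len(df) <= n:
        # Nothing to do: the frame is already small enough.
        return df
    # A fixed random_state keeps the sample reproducible between runs.
    return df.sample(n=n, random_state=seed)


def every_nth_row(df: pd.DataFrame, step: int) -> pd.DataFrame:
    """Keep rows 0, step, 2*step, ... (hypothetical helper)."""
    return df.iloc[::step]


# Toy data standing in for a real dataset.
full = pd.DataFrame(
    {
        "feature": range(100_000),
        "target": [i % 2 for i in range(100_000)],
    }
)

sampled = random_sample(full, n=10_000)  # random sampling
thinned = every_nth_row(full, step=10)   # every 10th row
```

The reduced frame is then what you would pass to the report instead of the full dataset. If you compare two datasets, sample both of them the same way.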

We understand this limits how you can use the tool right now, and we are working hard to get to a more full-featured version!

rmminusrslash commented on May 21, 2024

Hey @emeli-dral,

Ah, I probably should have been clearer about what I was asking. I tried sampling once I figured out the root cause, and it worked for up to 10K datapoints.

Would it make sense to

  • add sampling as the default if the dataset exceeds the current limits (and display a message that sampling happened)
  • if you decide against it, at least raise an "unsupported" exception that mentions the sampling option, and mention the limitation in the docs
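
To make the two options concrete, here is a rough sketch of a pre-processing step covering both. Everything in it is hypothetical: `prepare_for_report`, the `auto_sample` switch, and the `MAX_ROWS` value of 10,000 (simply the size that worked for me) are assumptions for illustration, not anything Evidently provides.

```python
import warnings

import pandas as pd

MAX_ROWS = 10_000  # assumed limit; the real one depends on the user's setup


def prepare_for_report(
    df: pd.DataFrame,
    max_rows: int = MAX_ROWS,
    auto_sample: bool = True,
    seed: int = 0,
) -> pd.DataFrame:
    """Hypothetical guard that runs before a report is built."""
    if len(df) <= max_rows:
        return df
    if not auto_sample:
        # Option 2: fail loudly and point the user at the sampling option.
        raise ValueError(
            f"Dataset has {len(df)} rows, above the limit of {max_rows}. "
            f"Sample it first, e.g. df.sample(n={max_rows})."
        )
    # Option 1: sample by default and tell the user that it happened.
    warnings.warn(
        f"Dataset has {len(df)} rows; sampled down to {max_rows} for the report.",
        UserWarning,
        stacklevel=2,
    )
    return df.sample(n=max_rows, random_state=seed)
```

Either branch avoids the silent failure: the user gets a warning or an exception that names the workaround.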

The current behavior of failing silently might not be ideal until you release the full version (unless you expect people to try the tool mostly with toy data).

emeli-dral commented on May 21, 2024

Reports no longer use raw data plots by default, which reduces report size significantly.
