
lt_downsampling_java8's Introduction

Largest-Triangle downsampling algorithm implementations for Java8

These implementations are based on the paper "Downsampling Time Series for Visual Representation" by Sveinn Steinarsson, Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland (2013). You can read the paper here

The goal of Largest-Triangle downsampling algorithms for data visualization is to reduce the number of points in a number series without losing important visual features of the resulting graph. It is important to be aware that these algorithms are not numerically correct.

Download

Latest version: 0.1.0

You can add this library to your Maven/Gradle/SBT/Leiningen project through JitPack.io. Follow the instructions here.

Example Gradle instructions

Add this into your build.gradle file:

allprojects {
  repositories {
    maven { url 'https://jitpack.io' }
  }
}

dependencies {
  implementation 'com.github.ggalmazor:lt_downsampling_java8:0.1.0'
}

Largest-Triangle Three-Buckets

This version of the algorithm groups the points into equally sized buckets and then selects from each bucket the point that forms the triangle with the largest area together with points in the neighbouring buckets.

You can produce a downsampled version of an input series with:

List<Point> input = Arrays.asList(...);
int numberOfBuckets = 200;

List<Point> output = LTThreeBuckets.ofSorted(input, numberOfBuckets);

The first and last points of the original series are always in the output. The remaining points are grouped into the requested number of buckets and the algorithm chooses the best point from each bucket, so with 200 buckets the output is a list of 202 elements.
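The selection step described above can be sketched as follows. This is a minimal, self-contained illustration of the Largest-Triangle Three-Buckets idea and not this library's implementation: the class name `LttbSketch`, the use of `double[]` pairs instead of `Point`, and the way bucket boundaries are computed are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and bucket boundaries are assumptions, not the library's code.
final class LttbSketch {

  // Twice the area of triangle (a, b, c). The constant factor does not change the ranking.
  static double area2(double[] a, double[] b, double[] c) {
    return Math.abs((a[0] - c[0]) * (b[1] - a[1]) - (a[0] - b[0]) * (c[1] - a[1]));
  }

  // Each point is {x, y}; the input must already be sorted by x.
  static List<double[]> downsample(List<double[]> sorted, int buckets) {
    int n = sorted.size();
    int middle = n - 2; // points available for bucketing (first and last are kept as-is)
    if (buckets <= 0 || middle <= buckets) {
      return new ArrayList<>(sorted); // nothing to reduce
    }
    List<double[]> out = new ArrayList<>(buckets + 2);
    double[] selected = sorted.get(0);
    out.add(selected);
    for (int b = 0; b < buckets; b++) {
      int start = 1 + (int) ((long) b * middle / buckets);
      int end = 1 + (int) ((long) (b + 1) * middle / buckets);
      // The "next" vertex is the average of the following bucket,
      // or the final point of the series when this is the last bucket.
      int nextEnd = (b == buckets - 1) ? n : 1 + (int) ((long) (b + 2) * middle / buckets);
      double sumX = 0, sumY = 0;
      for (int i = end; i < nextEnd; i++) {
        sumX += sorted.get(i)[0];
        sumY += sorted.get(i)[1];
      }
      double[] next = {sumX / (nextEnd - end), sumY / (nextEnd - end)};
      // Keep the point that forms the largest triangle with the previously
      // selected point and the average of the next bucket.
      double[] best = sorted.get(start);
      double bestArea = -1;
      for (int i = start; i < end; i++) {
        double area = area2(selected, sorted.get(i), next);
        if (area > bestArea) {
          bestArea = area;
          best = sorted.get(i);
        }
      }
      selected = best;
      out.add(best);
    }
    out.add(sorted.get(n - 1));
    return out;
  }
}
```

In this sketch, calling `LttbSketch.downsample(points, 200)` on a series with more than 202 points returns 202 elements: the first point, one point per bucket, and the last point.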

Notes on Point types

  • This library requires you to provide lists of instances of the Point supertype.
  • It also provides and uses internally the DoublePoint subtype, which can also be used to feed data to the library.
  • However, users are free to create implementations of Point that best fit their domain.

Largest-Triangle Dynamic

Not yet implemented

Example

This is what a raw time series with ~5000 data points and its downsampled versions (2000, 500, and 250 buckets) look like (graphed by AirTable):

[four images]

These are closeups for 250, 500, 1000, and 2000 buckets with the raw data in the background:

[four images]

Other Java implementations you might want to check

lt_downsampling_java8's People

Contributors

capsuleman, dependabot[bot], ggalmazor, github-actions[bot], gradle-update-robot


lt_downsampling_java8's Issues

Memory consumption issue

Hi @ggalmazor!
I submitted a PR/opened an issue with a bug in your library about a year ago. Thanks again for your library; it's been incredibly useful in my project. Now, I'm reaching out for a feature request I'd like to implement in your library 😊

Context

In my project, we utilize the down-sampling library to visualize large datasets (approximately 1,000,000 points) on a web interface that can handle only a few thousand points. These points are timeseries with two attributes: timestamp (Date) and measure (long). To integrate with your library, I need to extend this class with com.ggalmazor.ltdownsampling.Point, which uses two BigDecimal. These two additions cost at least 32b x 2 per point.
About a month ago, we encountered a memory issue when multiple requests were made simultaneously. As a quick fix, we increased the application's memory allocation. However, we're now seeking a long-term solution that would be more memory-efficient.

Proposed Implementation

I'm considering creating a new class, lighter than Point, without BigDecimal attributes x and y. Instead, it would have only getters and setters performing on-the-fly conversion between the attributes and the BigDecimal value used by the algorithm. This implementation would introduce a new class without altering the core of the library. The Point class would still be available for backward compatibility.
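A rough sketch of that idea, under stated assumptions: the interface `XY` below is a hypothetical stand-in for the library's point contract (the real `Point` type is not shown here and may differ), and `LightMeasurePoint` is an invented name. The class keeps only two primitive fields and builds a `BigDecimal` when a getter is called.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for the library's point contract; the real type may differ.
interface XY {
  BigDecimal getX();
  BigDecimal getY();
}

// Stores two primitives and converts to BigDecimal only on demand.
final class LightMeasurePoint implements XY {
  private final long timestampMillis; // e.g. Date.getTime()
  private final long measure;

  LightMeasurePoint(long timestampMillis, long measure) {
    this.timestampMillis = timestampMillis;
    this.measure = measure;
  }

  @Override
  public BigDecimal getX() {
    return BigDecimal.valueOf(timestampMillis); // converted on each call, not stored
  }

  @Override
  public BigDecimal getY() {
    return BigDecimal.valueOf(measure); // converted on each call, not stored
  }
}
```

The trade-off is the one raised under Potential Blockers: no `BigDecimal` is retained per point, but every getter call may allocate a short-lived one, so an algorithm that reads each coordinate many times pays for the conversion repeatedly.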

Potential Blockers

Are there any potential blockers for developing this implementation? I'm thinking about:

  • Utilizing specific BigDecimal methods
  • Intensively using BigDecimal values, which might negatively impact algorithm performance when converting values on-the-fly multiple times
  • Any other considerations?

I'd be thrilled to update the library in this direction and open a PR for these changes. Do you agree with updating the library in this direction?

P.S.: Is there a chance of duplicated arrays during down-sampling (resulting in extra memory usage)?

Last bucket is too wide

Hello @ggalmazor!

Thanks for your implementation of the LTTB algorithm, I am currently using it on a project with large time series!
In a recent development, I had to downsample a new type of data, and some of the series have a size of the same order of magnitude as the number of buckets: ~5500 pts for 1000 buckets, for example.
Because the last bucket takes all the remaining space, it is too wide:

int regular_bucket_size = data.size() / numberOfBucket; // 5
int last_bucket_size = data.size() / numberOfBucket + data.size() % numberOfBucket; // 505 -> ~10% of data

Which gives results like this one:
[image]

In this Python implementation, bucket sizes are split more evenly:
https://git.sr.ht/~javiljoen/lttb-numpy/tree/master/item/src/lttb/lttb.py#L81 (cf: https://numpy.org/doc/stable/reference/generated/numpy.array_split.html)
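To make the difference concrete, here is a small sketch comparing the two ways of sizing buckets. It is an illustration only, not the library's code; the class and method names are invented. `evenSplit` follows the rule documented for `numpy.array_split`: the first `points % buckets` buckets get one extra element and the rest get `points / buckets`.

```java
import java.util.Arrays;

// Illustrative helper; names are hypothetical and not part of the library.
final class BucketSizes {

  // Sizing described in this issue: the last bucket absorbs the whole remainder.
  static int[] remainderInLast(int points, int buckets) {
    int[] sizes = new int[buckets];
    Arrays.fill(sizes, points / buckets);
    sizes[buckets - 1] += points % buckets;
    return sizes;
  }

  // Even sizing: the remainder is spread one element at a time over the first buckets.
  static int[] evenSplit(int points, int buckets) {
    int base = points / buckets;
    int extra = points % buckets;
    int[] sizes = new int[buckets];
    for (int i = 0; i < buckets; i++) {
      sizes[i] = base + (i < extra ? 1 : 0);
    }
    return sizes;
  }
}
```

With 5500 points and 1000 buckets, `remainderInLast` yields 999 buckets of 5 and one of 505, while `evenSplit` yields 500 buckets of 6 followed by 500 buckets of 5.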

I would be happy to contribute with a fix soon.

Best regards,
Guillaume Vagner
