Giter Club home page Giter Club logo

Comments (15)

odrotbohm avatar odrotbohm commented on June 2, 2024 4

I'll make sure the optimization gets applied in the 1.2 GA release in two weeks and the corresponding service releases for 1.0.x and 1.1.x.

from spring-modulith.

JakeTiritilli avatar JakeTiritilli commented on June 2, 2024 3

I'm not sure about other databases, but for PostgreSQL we found that the hash index works well given that Modulith is doing a full equality comparison on the serialized event instead of searching for a substring. And it's not restricted by the size limit that the b-tree index has. We saw a significant performance improvement in the setting of the completion timestamp after adding this index.

CREATE INDEX IF NOT EXISTS event_publication_serialized_event_hash_idx ON event_publication USING hash(serialized_event);

from spring-modulith.

odrotbohm avatar odrotbohm commented on June 2, 2024 1

Okay, I'll replace the original index for Postgres then. I think I'll refrain from adding additional indexes to make sure we only support the interactions our code triggers. Totally fine if you add more indexes to support interactions you need on top.

from spring-modulith.

odrotbohm avatar odrotbohm commented on June 2, 2024

If you want to use bigger event payloads, you need to adapt the schema definitions declared here to scale up the size of the serialized event column.

from spring-modulith.

razubuddy avatar razubuddy commented on June 2, 2024

Yes, I use default JDBC PostgreSQL schema
Default provided schema for PostgreSQL in 1.2.0-M2 looks like this:

CREATE TABLE IF NOT EXISTS event_publication
(
  id               UUID NOT NULL,
  listener_id      TEXT NOT NULL,
  event_type       TEXT NOT NULL,
  serialized_event TEXT NOT NULL,
  publication_date TIMESTAMP WITH TIME ZONE NOT NULL,
  completion_date  TIMESTAMP WITH TIME ZONE,
  PRIMARY KEY (id)
);
CREATE INDEX IF NOT EXISTS event_publication_by_listener_id_and_serialized_event_idx ON event_publication (listener_id, serialized_event);
CREATE INDEX IF NOT EXISTS event_publication_by_completion_date_idx ON event_publication (completion_date);

The index event_publication_by_listener_id_and_serialized_event_idx is justified by query in JdbcEventPublicationRepository:

UPDATE EVENT_PUBLICATION
SET COMPLETION_DATE = ?
WHERE
    LISTENER_ID = ?
    AND SERIALIZED_EVENT = ?

This make me think I should care SERIALIZED_EVENT to be relatively small if it is indexed and Spring Modulith is designed in this direction, so I think using TEXT datatype is misleading.
On the other hand if you plan to support bigger event payloads, then it shouldn't be indexed and updating COMPLETION_DATE query could be redesigned, am I wright?

from spring-modulith.

odrotbohm avatar odrotbohm commented on June 2, 2024

I am not a Postgres expert, but it looks like we might need to optimize our index declaration. Would you mind adding USING GIN to the declaration and try again, as recommended here?

from spring-modulith.

razubuddy avatar razubuddy commented on June 2, 2024

GIN, GIST index types are dedicated for column types like tsvector, it is won't create for text column type

from spring-modulith.

matiwinnetou avatar matiwinnetou commented on June 2, 2024

Also running into this issue, in my case I can make events smaller but not sure how long this trick will suffice.

from spring-modulith.

matiwinnetou avatar matiwinnetou commented on June 2, 2024

@odrotbohm what if one introduces another field, a hash of serialised_event and then put index on it?

from spring-modulith.

matiwinnetou avatar matiwinnetou commented on June 2, 2024

It seems like GIN indices are supported by using a trick:

image

from spring-modulith.

matiwinnetou avatar matiwinnetou commented on June 2, 2024
image

from spring-modulith.

matiwinnetou avatar matiwinnetou commented on June 2, 2024

Here is my solution, I had to split index into 2:

-- Create a GIN index on just the tsvector column
CREATE INDEX IF NOT EXISTS serialized_event_tsvector_idx 
ON event_publication 
USING gin (to_tsvector('simple', serialized_event));

-- Create a separate index for listener_id if needed
CREATE INDEX IF NOT EXISTS listener_id_idx 
ON event_publication 
(listener_id);

Unfortunately I could not use JSONB as it is recommended because this is a serialized_event is a TEXT field on JPA in Modulith.

from spring-modulith.

dustinsand avatar dustinsand commented on June 2, 2024

Hi @odrotbohm , in case you missed the post from @JakeTiritilli about the hash index for PostgreSQL, I wanted to bring it to your attention because it dramatically reduced the load on PostgreSQL when the event_publication table had a large volume of records (1+ million in our case during performance testing) for SELECT and UPDATE. The SQL commands are now doing an "Index Scan using event_publication_serialized_event_hash_idx on event_publication" with O(1) !

from spring-modulith.

odrotbohm avatar odrotbohm commented on June 2, 2024

Are you folks using this index in addition to the event_publication_by_listener_id_and_serialized_event_idx one we already declare or as a replacement for that?

from spring-modulith.

dustinsand avatar dustinsand commented on June 2, 2024

Hi @odrotbohm, we replaced event_publication_by_listener_id_and_serialized_event_idx with the hash index.

FYI, we also added an index specifically for queries by the listener_id and completion_date for use cases we had.

CREATE INDEX IF NOT EXISTS event_publication_listener_id_idx ON event_publication(listener_id);
CREATE INDEX IF NOT EXISTS event_publication_completion_date_idx ON event_publication(completion_date);

from spring-modulith.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.