Giter Club home page Giter Club logo

Comments (5)

jkburges avatar jkburges commented on August 15, 2024

An outage which just happened was possibly (likely) caused by materialised view refresh.

from aatams.

jkburges avatar jkburges commented on August 15, 2024

I'm not sure if I've made it clear enough in the past w.r.t. to the seriousness of this issue.

There are two main problems occurring as a result of the issues list above:

  1. data consistency - by having "join" tables and materialised views, the chances of having inconsistent data goes up immensely, and so this is related to the work @xhoenner is doing currently with the reconciliation;
  2. performance, reliability - the materialised view refresh is putting an enormous load on the system every night, and is in fact making the app unavailable while it is happening. With continued growth of the DB, this outage period is only going to increase, possibly to the point of where it starts overlapping with business hours.

from aatams.

jkburges avatar jkburges commented on August 15, 2024

I just prototyped (1) from above (loading detections using COPY).

I was able to load ~3.5M detections in around 11s (~300k records/s) - this is around 1500x as fast as the web app currently loads detections (including all the validation and what not).

FWIW, we could re-load every uploaded detection (~90M) in around 5 minutes this way :-)

from aatams.

jkburges avatar jkburges commented on August 15, 2024

To be fairer, I added a compound index and a primary key constraint. It takes now ~40s to load 3.5M records - still fast enough.

Table definition:

-- Table: raw_det

-- DROP TABLE raw_det;

CREATE TABLE raw_det
(
  "timestamp" timestamp with time zone,
  receiver_name text,
  transmitter_id text,
  id bigint NOT NULL,
  CONSTRAINT raw_det_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE raw_det
  OWNER TO aatams;

-- Index: pagination_index

-- DROP INDEX pagination_index;

CREATE INDEX pagination_index
  ON raw_det
  USING btree
  ("timestamp", receiver_name COLLATE pg_catalog."default", transmitter_id COLLATE pg_catalog."default");

COPY query (including some awk magic to add an id column, we would need to do a similar thing to add a receiver_download_file_id column):

COPY raw_det
FROM PROGRAM 'awk -F, ''{$(NF+1)=++i;}1'' OFS=, /tmp/big_dets.csv'
CSV HEADER DELIMITER ',';

from aatams.

jkburges avatar jkburges commented on August 15, 2024

Note that in 1), the file would have to exist on the same server as the DB. Possibly, rather than storing the file on disk, we could store it in the DB (as a BLOB or some such), and COPY from that.

This would also simplify deployment and backups somewhat (because only the DB has state, not DB + filesystem).

from aatams.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.