mfvanek / pg-index-health Goto Github PK

View Code? Open in Web Editor NEW

118.0 8.0 14.0 2.02 MB

pg-index-health is a Java library for analyzing and maintaining indexes and tables health in Postgresql databases.

License: Apache License 2.0

Java 100.00%

java postgres postgresql postgresql-monitoring health-check sql indexes gradle testcontainers java-11

pg-index-health's Introduction

pg-index-health is a Java library for analyzing and maintaining indexes and tables health in PostgreSQL databases.

Supported PostgreSQL versions

Support for previous versions of PostgreSQL

Compatibility with PostgreSQL versions 9.6, 10 and 11 is no longer guaranteed, but it is very likely.
We focus only on the currently maintained versions of PostgreSQL.
For more information please see PostgreSQL Versioning Policy.

Supported Java versions

Supports Java 11 and above
For Java 8 compatible version take a look at release 0.7.0 and lower

Available checks

pg-index-health allows you to detect the following problems:

Invalid (broken) indexes (sql).
Duplicated (completely identical) indexes (sql).
Intersected (partially identical) indexes (sql).
Unused indexes (sql).
Foreign keys without associated indexes (sql).
Indexes with null values (sql).
Tables with missing indexes (sql).
Tables without primary key (sql).
Indexes bloat (sql).
Tables bloat (sql).
Tables without description (sql).
Columns without description (sql).
Columns with json type (sql).
Columns of serial types that are not primary keys (sql).
Functions without description (sql).
Indexes with boolean (sql).
Tables with not valid constraints (sql).
B-tree indexes on array columns (sql).

For raw sql queries see pg-index-health-sql project.

How does it work?

pg_index_health utilizes the Cumulative Statistics System (formerly known as PostgreSQL's statistics collector).
You can call pg_stat_reset() on each host to reset all statistics counters for the current database to zero but the best way to do it is to use DatabaseManagement::resetStatistics() method.

Installation

Using Gradle:

implementation 'io.github.mfvanek:pg-index-health:0.11.0'

with Kotlin DSL

implementation("io.github.mfvanek:pg-index-health:0.11.0")

Using Maven:

<dependency>
  <groupId>io.github.mfvanek</groupId>
  <artifactId>pg-index-health</artifactId>
  <version>0.11.0</version>
</dependency>

Articles and publications

Index health in PostgreSQL through the eyes of a Java developer

How to use

There are three main scenarios of using pg-index-health in your projects:

unit\functional testing;
collecting indexes health data and monitoring bloat;
analysis of database configuration.

All these cases are covered with examples in the pg-index-health-demo project.

Integration with Spring Boot

There is a Spring Boot starter pg-index-health-test-starter for unit/integration testing as well.
More examples you can find in pg-index-health-spring-boot-demo project.

Questions, issues, feature requests and contributions

If you have any question or a problem with the library, please file an issue.
Contributions are always welcome! Please see contributing guide for more details.
We utilize Testcontainers for testing pg-index-health. So you need to have Docker installed on your machine.

pg-index-health's People

Contributors

Stargazers

Watchers

Forkers

kondratyev-nv davletvm gridl biter777 sbutterfly dgrushetskiy mslowiak miguelff hiteshbedre enkarin alexrogalskiy manumafe98 blohny temamak mfaulther

pg-index-health's Issues

Bloat monitoring doesn't work properly under user without read privileges.

Mention this point in the documentation.
Need to fix tables bloat query.

Move all sql queries to a separate repository and connect it as a git submodule

Refactoring is needed on IndexMaintenance to separate diagnostics on indexes and tables into different classes.

Consider to add diagnostic on usage serial/bigserial types for non primary key columns

Sometimes in database migrations we've faced with curious constructions like

CREATE TABLE some_table
(
    id bigserial primary key,
    document_copy_id bigserial, // it's a foreign key
...

It's look like a code smell to use serial/bigserial type for foreign key column in that case.
So it would be great to have a diagnostic for finding and preventing such cases.

Also It would be great to generate sql query to fix this case (after #129 ). For example:

alter table some_table alter column document_copy_id drop default;
drop sequence if exists some_table_document_copy_id_seq;

Consider to add check for columns with type json

With json type it's possible to get an error
org.postgresql.util.PSQLException: ERROR: could not identify an equality operator for type json
in queries like
select distinct id, json_column from table_with_json;

We should avoid using json type.
Jsonb should be preferred instead

Add support for PostgreSQL 13

Blocked by mfvanek/pg-index-health-sql#8

Support several PgContext in indexMaintenance.get* methods

It is needed, when you want to run checks on several schemas in cluster.
Or in all schemas

Add examples how to use in tests to README

Hello, could you add examples of using your library at unit/functional tests to readme?

Add a check for functions and procedures without a description

Please, add a check for functions and procedures without a description. You can use the following query for it:

SELECT
    n.nspname,
    p.proname, obj_description(p.oid)
FROM 
    pg_catalog.pg_namespace n
JOIN 
    pg_catalog.pg_proc p ON 
    p.pronamespace = n.oid
WHERE 
    (obj_description(p.oid) is null or length(trim(obj_description(p.oid))) = 0)
AND
    n.nspname = 'public'::text;

Prepare HighAvailabilityPgConnectionImpl for using as a long-live singleton (spring bean)

HighAvailabilityPgConnectionImpl needs to be a long-live singleton for using as a spring bean in spring boot starter.
In this case we cannot determe primary host in constructor and remember it because in real PostgreSQL cluster primary host may change its location due to failover or switchover.

This issue is linked with mfvanek/pg-index-health-spring-boot-demo#11

Sonar. Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception

Need to fix code smells https://sonarcloud.io/project/issues?resolved=false&severities=MAJOR&id=mfvanek_pg-index-health

Do not create separate connection to the primary host

Use pg_is_in_recovery() to mark primary host.
This will save on database connections.

Consider to add Forbidden API Checker

Add Forbidden API Checker instead of IllegalMethodCall from checkstyle
https://github.com/policeman-tools/forbidden-apis
https://plugins.gradle.org/plugin/de.thetaphi.forbiddenapis

Create a Spring Boot Starter in order to simplify using with @SpringBootTest

Based on https://github.com/mfvanek/pg-index-health-demo/blob/master/src/test/java/io/github/mfvanek/pg/index/health/demo/IndexesMaintenanceTest.java

We need something like

import io.github.mfvanek.pg.connection.PgConnection;
import io.github.mfvanek.pg.connection.PgConnectionImpl;
import io.github.mfvanek.pg.index.maintenance.IndexMaintenanceOnHostImpl;
import io.github.mfvanek.pg.index.maintenance.IndexesMaintenanceOnHost;
import io.github.mfvanek.pg.table.maintenance.TablesMaintenanceOnHost;
import io.github.mfvanek.pg.table.maintenance.TablesMaintenanceOnHostImpl;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;

import javax.sql.DataSource;

@TestConfiguration
public class DatabaseStructureHealthConfig {

  @Bean
  public PgConnection pgConnection(@Qualifier("dataSource") DataSource dataSource) {
    return PgConnectionImpl.ofPrimary(dataSource);
  }

  @Bean
  public IndexesMaintenanceOnHost indexesMaintenance(PgConnection pgConnection) {
    return new IndexMaintenanceOnHostImpl(pgConnection);
  }

  @Bean
  public TablesMaintenanceOnHost tablesMaintenance(PgConnection pgConnection) {
    return new TablesMaintenanceOnHostImpl(pgConnection);
  }
}

Migrate to AssertJ in tests

https://github.com/assertj

GitHub CI is broken for PR from forks

See https://github.com/mfvanek/pg-index-health/pull/124/checks

Run docker/login-action@v1
  with:
    ecr: auto
    logout: true
  env:
    TEST_PG_VERSION: 9.6.[2](https://github.com/mfvanek/pg-index-health/pull/124/checks#step:4:2)[3](https://github.com/mfvanek/pg-index-health/pull/124/checks#step:4:3)
    JAVA_HOME: /opt/hostedtoolcache/Java_Adopt_jdk/8.0.292-1/x6[4](https://github.com/mfvanek/pg-index-health/pull/124/checks#step:4:4)
Error: Username and password required

Add ability to distinguish between index types in checks for duplicated and intersected indexes.

btree/gist/gin etc.

Add aggregated Jacoco report

https://docs.gradle.org/current/samples/sample_jvm_multi_project_with_code_coverage_standalone.html

False positive result on missed primary key for materialized views.

Wrong detection of Intersected Indexes

create table test_table (
    id bigserial primary key,
    key bigint not null
);
create index indx on test_table(id, key);

Will fail on intersected indexes:
indx and test_table_pkey

How should I resolve this issue?

Add prometheus instrumentation

Consider adding metrics in prometheus format.
We should use Micrometer 1.9.12 (or higher but compatible with Spring Boot 2.7.x)

Consider adding new checks based on the article by Kirill Borovikov posted on Habr

See https://habr.com/ru/company/tensor/blog/488104/

Better API for using in tests

Hello. Let's test our indexes:

        List<DuplicatedIndexes> intersectedIndexes = indexMaintenance.getIntersectedIndexes(PG_CONTEXT);
        Assertions.assertThat(intersectedIndexes)
            .usingRecursiveFieldByFieldElementComparator()
            .containsExactlyInAnyOrder(
                DuplicatedIndexes.of(List.of(
                     IndexWithSize.of("table_name1", "index1", 1234),
                     IndexWithSize.of("table_name2", "index2", 54321))
           ));

There are several problems with API:

No equals methods in pojo classes. That's why we should use reflection to compare (usingRecursiveFieldByFieldElementComparator).
No varchar on DuplicatedIndexes.of(...). We wrap via List.of(..)
No guarantee order of IndexWithSize inside of DuplicatedIndexes. And tests may fail unpredictably. And users can't influence on it
Index size can be different, depending of the use of index

To fix all of them, I transform result into another datastructure:

     List<DuplicatedIndexes> intersectedIndexes = indexMaintenance.getIntersectedIndexes(PG_CONTEXT);

        List<Map.Entry<String, List<String>>> tableToIndexes = intersectedIndexes.stream()
            .map(index -> {
                String tableName = index.getTableName();
                List<String> indexes = index.getDuplicatedIndexes().stream()
                    .map(Index::getIndexName).sorted().collect(Collectors.toList());
                return Map.entry(tableName, indexes);
            })
            .collect(Collectors.toList());

        Assertions.assertThat(tableToIndexes)
            .containsExactlyInAnyOrder(
                Map.entry("hiding", List.of("hidings_primary_key", "sku_hidings_live_indx")),
                Map.entry("msku", List.of("ix_live_msku", "msku_title_indx")),
                Map.entry("msku", List.of("msku_pkey", "msku_title_indx"))
            );

        List<IndexWithNulls> indexesWithNulls = indexMaintenance.getIndexesWithNullValues(PG_CONTEXT);

        List<Map.Entry<String, String>> tableToIndexes = indexesWithNulls.stream()
            .map(index -> {
                String tableName = index.getTableName();
                String value = index.getIndexName() + " " + index.getNullableField();
                return Map.entry(tableName, value);
            })
            .collect(Collectors.toList());

        Assertions.assertThat(tableToIndexes)
            .containsExactlyInAnyOrder(
                Map.entry("hiding", "hidings_search_indx comment")
            );

Can you provide more flexible API for tests?

Add validation of JPA Entities against real db schema

Hibernate can do such validation on startup but usually in real apps it isn't an appropriate behavior.

Add diagnostics for measuring bloat of indexes and tables

bloat control

Add ability to accept zero value of thresholdUnitsCount argument in withIndexSizeThreshold and withTableSizeThreshold methods in Exclusions.Builder

Add new test case for indexes with different opclass

Add tests for

CREATE INDEX ON firms USING btree(lower(name));
CREATE INDEX ON firms USING btree(lower(name) text_pattern_ops);

See https://habr.com/ru/post/490824/#comment_21376248

Add retry policy in sql query execution to resist network inconsistency

[2020-04-09 08:04:44,663] ERROR Task execution error: class java.lang.RuntimeException
java.lang.RuntimeException: org.postgresql.util.PSQLException: Connection to :6432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at io.github.mfvanek.pg.utils.QueryExecutor.executeQuery(QueryExecutor.java:110)
at io.github.mfvanek.pg.utils.QueryExecutor.executeQueryWithSchema(QueryExecutor.java:64)
at io.github.mfvanek.pg.index.maintenance.IndexMaintenanceImpl.executeQuery(IndexMaintenanceImpl.java:210)
at io.github.mfvanek.pg.index.maintenance.IndexMaintenanceImpl.getPotentiallyUnusedIndexes(IndexMaintenanceImpl.java:88)
at io.github.mfvanek.pg.index.health.IndexesHealthImpl.lambda$getUnusedIndexes$0(IndexesHealthImpl.java:102)
at io.github.mfvanek.pg.index.health.IndexesHealthImpl.doOnHost(IndexesHealthImpl.java:188)
at io.github.mfvanek.pg.index.health.IndexesHealthImpl.getUnusedIndexes(IndexesHealthImpl.java:101)
at io.github.mfvanek.pg.index.health.logger.AbstractIndexesHealthLogger.logUnusedIndexes(AbstractIndexesHealthLogger.java:123)
at io.github.mfvanek.pg.index.health.logger.AbstractIndexesHealthLogger.logAll(AbstractIndexesHealthLogger.java:64)
at ru.yandex.market.notifier.jobs.db.maintenance.NotifierIndexesHealthLoggerJob.doJob(NotifierIndexesHealthLoggerJob.java:34)
at ru.yandex.market.notifier.jobs.zk.AbstractJob.run(AbstractJob.java:25)
at ru.yandex.market.checkout.common.tasks.ZooTask.fireRunnable(ZooTask.java:366)
at ru.yandex.market.checkout.common.tasks.ZooTask.fire(ZooTask.java:357)
at ru.yandex.market.checkout.common.tasks.ZooTask.tryToFireOnSchedule(ZooTask.java:347)
at ru.yandex.market.checkout.common.tasks.ZooTask.runOnSchedule(ZooTask.java:292)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.postgresql.util.PSQLException: Connection to :6432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at org.postgresql.Driver$ConnectThread.getResult(Driver.java:409)
at org.postgresql.Driver.connect(Driver.java:267)
at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:55)
at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:355)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:899)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:429)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:354)
at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:753)
at io.github.mfvanek.pg.utils.QueryExecutor.executeQuery(QueryExecutor.java:97)
... 15 common frames omitted
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:591)
at org.postgresql.core.PGStream.(PGStream.java:75)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:91)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.(PgConnection.java:195)
at org.postgresql.Driver.makeConnection(Driver.java:458)
at org.postgresql.Driver.access$100(Driver.java:57)
at org.postgresql.Driver$ConnectThread.run(Driver.java:368)
... 1 common frames omitted

Consider to add a check to prevent unnecessary where-clauses in index creating queries

We need a check to prevent unnecessary where-clauses like this
create index idx_table_a on table_a (client_id) where client_id is not null;
when client_id is declared as not null in table

Wrong detection of Duplicated indexes

create table t
(
    id   bigserial primary key,
    text text not null
);
create index ix_text on t (text);
create index ix_text_collate on t (text COLLATE "C.UTF-8");

Migrate codebase to Java 11

Create a branch with code for Java 8
Fix the latest Java 8 compatible release in the documentation
Migrate codebase to Java 11

Add feature to create cluster containers programmaticaly

Current state of PostgreSqlClusterWrapper provides two containers of PostgreSQL with hard coded configuration.

For various demo/test scenarios, it is essential to have programmatic control over the process of creating a PostgreSQL HA cluster. We need an API that provides control over the creation of the required number of PostgreSQL instances and their configuration.

Explicitly specify pg_catalog and information_schema for all sql queries

Replace "Master" with "Primary"

Generate package-info.java and add to the project

See https://www.baeldung.com/java-package-info

Log last stat_reset timestamp

From pg_stat_database https://www.postgresql.org/docs/10/monitoring-stats.html

Wrong detection of intersected indexes

create table msku.msku (
  market_sku_id                      bigserial        primary key,
  parent_model_id                    bigint           not null,
  category_id                        bigint           not null,
  vendor_id                          bigint           not null,
  sku_type                           msku.sku_type_enum,
  title                              text,
  picture_url                        text,
  published_on_blue_market           boolean          not null default false,
  published_on_market                boolean          not null default false,
  created_ts                         timestamp,
  deleted                            boolean
);

create index concurrently msku_title_indx on msku.msku(market_sku_id, title) where not deleted;

False positive
DuplicatedIndexes{tableName='msku.msku', totalSize=16384, indexes=[IndexWithSize{tableName='msku.msku', indexName='msku.msku_title_indx', indexSizeInBytes=8192}, IndexWithSize{tableName='msku.msku', indexName='msku.msku_pkey', indexSizeInBytes=8192}]}]>

Add ability to suggest recommended values to PostgreSQL's important settings.

get max_connections and calculate recommended values

Add a check for identifiers with maximum length

By default identifier length is bounded with 63 bytes.
When any identifier (table/index/sequence name) is out of this range PostgreSQL will silently truncate the identifier name. No errors are raised.
It might be confusing for developers in case of two identifiers with long names that differ a bit at the ending:

create index concurrently if not exists my_custom_index_on_my_custom_table_with_very_loooooooooooooooooooong_name_on_field1
    on my_custom_table (field1);
create index concurrently if not exists my_custom_index_on_my_custom_table_with_very_loooooooooooooooooooong_name_on_field2
    on my_custom_table (field2);

For first index you will get a warning with sql state 42622 like:
identifier "my_custom_index_on_my_custom_table_with_very_loooooooooooooooooooong_name_on_field1" will be truncated to "my_custom_index_on_my_custom_table_with_very_looooooooooooooooo"
and index will be created with truncated name.

For second index you'll get a warning:
relation "my_custom_index_on_my_custom_table_with_very_looooooooooooooooo" already exists, skipping and no index

pg-index-health as cmd-utility

Иван, приветствую!

Подскажи пожалуйста, а можно ли оформить pg-index-health как консольную утилиту?

Migrate gradle build config to buildSrc

After #200 being merged

See https://github.com/Timester/tmsvr-blog

Add nullability mark field to ForeignKey class and method for generating sql request for creating corresponding index

If foreign key is not covered with corresponding index we have to create it manually.
This is boring and error prone operation due to several circumstances: possible table downtime when creating index and nullability of fields from which index consists.
We can generate sql request for index creation automatically.
For example:

ForeignKey{tableName='leads.lead_clob', constraintName='lead_clob_fkey_lead_info_id', columnsInConstraint=[lead_info_id]}

might lead to a few possible solutions:

concurrently and field is nullable

create index concurrently if not exists lead_clob_lead_info_id_without_nulls_idx
    on leads.lead_clob (lead_info_id) where lead_info_id is not null;

concurrently and field is not null

create index concurrently if not exists lead_clob_lead_info_id_idx
    on leads.lead_clob (lead_info_id);

with table lock

create index if not exists lead_clob_lead_info_id_without_nulls_idx
    on leads.lead_clob (lead_info_id) where lead_info_id is not null;

and etc.

We need to implement a kind of Formatter with fluent API based on GOF builder pattern which will generate text for sql requests.
Options:

concurrently or not (default true)
exclude nulls or not (default true)
add "without_null" part to the name of index or not (default true)
add "idx" as prefix or suffix or not at all (default add as suffix)
generated index name length cannot exceed 63. see PostgreSQL Limits

create table if not exists another_table
(
    id bigserial primary key, -- should NOT be found
    constraint not_reserved_id check (id > 1000),
    constraint less_than_million check (id < 1000000)
);

create table if not exists one_more_table
(
    id bigserial, -- should be found. Why not PK?
    constraint unique_id unique (id),
    constraint not_reserved_id check (id > 1000),
    constraint less_than_million check (id < 1000000)
);

create table if not exists test_table
(
    id bigserial, -- should be found
    num bigserial, -- should be found
    constraint test_table_pkey_id primary key (id),
    constraint test_table_fkey_other_id foreign key (id) references another_table (id),
    constraint test_table_fkey_one_more_id foreign key (id) references one_more_table (id)
);

Use named parameters instead of ? in sql queries.

Linked with #49