Giter Club home page Giter Club logo

pg-index-health's Introduction

pg-index-health

pg-index-health is a Java library for analyzing and maintaining indexes and tables health in PostgreSQL databases.

Java CI Maven Central License: Apache 2.0 javadoc codecov

Bugs Vulnerabilities Code Smells Lines of Code Coverage

Supported PostgreSQL versions

PostgreSQL 12 PostgreSQL 13 PostgreSQL 14 PostgreSQL 15 PostgreSQL 16

Support for previous versions of PostgreSQL

Compatibility with PostgreSQL versions 9.6, 10 and 11 is no longer guaranteed, but it is very likely.
We focus only on the currently maintained versions of PostgreSQL.
For more information please see PostgreSQL Versioning Policy.

Supported Java versions

Supports Java 11 and above
For Java 8 compatible version take a look at release 0.7.0 and lower

Available checks

pg-index-health allows you to detect the following problems:

  1. Invalid (broken) indexes (sql).
  2. Duplicated (completely identical) indexes (sql).
  3. Intersected (partially identical) indexes (sql).
  4. Unused indexes (sql).
  5. Foreign keys without associated indexes (sql).
  6. Indexes with null values (sql).
  7. Tables with missing indexes (sql).
  8. Tables without primary key (sql).
  9. Indexes bloat (sql).
  10. Tables bloat (sql).
  11. Tables without description (sql).
  12. Columns without description (sql).
  13. Columns with json type (sql).
  14. Columns of serial types that are not primary keys (sql).
  15. Functions without description (sql).
  16. Indexes with boolean (sql).
  17. Tables with not valid constraints (sql).
  18. B-tree indexes on array columns (sql).

For raw sql queries see pg-index-health-sql project.

How does it work?

pg_index_health utilizes the Cumulative Statistics System (formerly known as PostgreSQL's statistics collector).
You can call pg_stat_reset() on each host to reset all statistics counters for the current database to zero but the best way to do it is to use DatabaseManagement::resetStatistics() method.

Installation

Using Gradle:

implementation 'io.github.mfvanek:pg-index-health:0.11.0'
with Kotlin DSL
implementation("io.github.mfvanek:pg-index-health:0.11.0")

Using Maven:

<dependency>
  <groupId>io.github.mfvanek</groupId>
  <artifactId>pg-index-health</artifactId>
  <version>0.11.0</version>
</dependency>

Articles and publications

How to use

There are three main scenarios of using pg-index-health in your projects:

  • unit\functional testing;
  • collecting indexes health data and monitoring bloat;
  • analysis of database configuration.

All these cases are covered with examples in the pg-index-health-demo project.

Integration with Spring Boot

There is a Spring Boot starter pg-index-health-test-starter for unit/integration testing as well.
More examples you can find in pg-index-health-spring-boot-demo project.

Questions, issues, feature requests and contributions

  • If you have any question or a problem with the library, please file an issue.
  • Contributions are always welcome! Please see contributing guide for more details.
  • We utilize Testcontainers for testing pg-index-health. So you need to have Docker installed on your machine.

pg-index-health's People

Contributors

dependabot[bot] avatar enkarin avatar evreke avatar kondratyev-nv avatar mfaulther avatar mfvanek avatar mslowiak avatar sbutterfly avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pg-index-health's Issues

Consider to add diagnostic on usage serial/bigserial types for non primary key columns

Sometimes in database migrations we've faced with curious constructions like

CREATE TABLE some_table
(
    id bigserial primary key,
    document_copy_id bigserial, // it's a foreign key
...

It's look like a code smell to use serial/bigserial type for foreign key column in that case.
So it would be great to have a diagnostic for finding and preventing such cases.

Also It would be great to generate sql query to fix this case (after #129 ). For example:

alter table some_table alter column document_copy_id drop default;
drop sequence if exists some_table_document_copy_id_seq;

Consider to add check for columns with type json

With json type it's possible to get an error
org.postgresql.util.PSQLException: ERROR: could not identify an equality operator for type json
in queries like
select distinct id, json_column from table_with_json;

We should avoid using json type.
Jsonb should be preferred instead

Add a check for functions and procedures without a description

Please, add a check for functions and procedures without a description. You can use the following query for it:

SELECT
    n.nspname,
    p.proname, obj_description(p.oid)
FROM 
    pg_catalog.pg_namespace n
JOIN 
    pg_catalog.pg_proc p ON 
    p.pronamespace = n.oid
WHERE 
    (obj_description(p.oid) is null or length(trim(obj_description(p.oid))) = 0)
AND
    n.nspname = 'public'::text;

Create a Spring Boot Starter in order to simplify using with @SpringBootTest

Based on https://github.com/mfvanek/pg-index-health-demo/blob/master/src/test/java/io/github/mfvanek/pg/index/health/demo/IndexesMaintenanceTest.java

We need something like

import io.github.mfvanek.pg.connection.PgConnection;
import io.github.mfvanek.pg.connection.PgConnectionImpl;
import io.github.mfvanek.pg.index.maintenance.IndexMaintenanceOnHostImpl;
import io.github.mfvanek.pg.index.maintenance.IndexesMaintenanceOnHost;
import io.github.mfvanek.pg.table.maintenance.TablesMaintenanceOnHost;
import io.github.mfvanek.pg.table.maintenance.TablesMaintenanceOnHostImpl;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;

import javax.sql.DataSource;

@TestConfiguration
public class DatabaseStructureHealthConfig {

  @Bean
  public PgConnection pgConnection(@Qualifier("dataSource") DataSource dataSource) {
    return PgConnectionImpl.ofPrimary(dataSource);
  }

  @Bean
  public IndexesMaintenanceOnHost indexesMaintenance(PgConnection pgConnection) {
    return new IndexMaintenanceOnHostImpl(pgConnection);
  }

  @Bean
  public TablesMaintenanceOnHost tablesMaintenance(PgConnection pgConnection) {
    return new TablesMaintenanceOnHostImpl(pgConnection);
  }
}

GitHub CI is broken for PR from forks

See https://github.com/mfvanek/pg-index-health/pull/124/checks

Run docker/login-action@v1
  with:
    ecr: auto
    logout: true
  env:
    TEST_PG_VERSION: 9.6.[2](https://github.com/mfvanek/pg-index-health/pull/124/checks#step:4:2)[3](https://github.com/mfvanek/pg-index-health/pull/124/checks#step:4:3)
    JAVA_HOME: /opt/hostedtoolcache/Java_Adopt_jdk/8.0.292-1/x6[4](https://github.com/mfvanek/pg-index-health/pull/124/checks#step:4:4)
Error: Username and password required

Wrong detection of Intersected Indexes

create table test_table (
    id bigserial primary key,
    key bigint not null
);
create index indx on test_table(id, key);

Will fail on intersected indexes:
indx and test_table_pkey

How should I resolve this issue?

Add prometheus instrumentation

Consider adding metrics in prometheus format.
We should use Micrometer 1.9.12 (or higher but compatible with Spring Boot 2.7.x)

Better API for using in tests

Hello. Let's test our indexes:

        List<DuplicatedIndexes> intersectedIndexes = indexMaintenance.getIntersectedIndexes(PG_CONTEXT);
        Assertions.assertThat(intersectedIndexes)
            .usingRecursiveFieldByFieldElementComparator()
            .containsExactlyInAnyOrder(
                DuplicatedIndexes.of(List.of(
                     IndexWithSize.of("table_name1", "index1", 1234),
                     IndexWithSize.of("table_name2", "index2", 54321))
           ));

There are several problems with API:

  1. No equals methods in pojo classes. That's why we should use reflection to compare (usingRecursiveFieldByFieldElementComparator).
  2. No varchar on DuplicatedIndexes.of(...). We wrap via List.of(..)
  3. No guarantee order of IndexWithSize inside of DuplicatedIndexes. And tests may fail unpredictably. And users can't influence on it
  4. Index size can be different, depending of the use of index

To fix all of them, I transform result into another datastructure:

     List<DuplicatedIndexes> intersectedIndexes = indexMaintenance.getIntersectedIndexes(PG_CONTEXT);

        List<Map.Entry<String, List<String>>> tableToIndexes = intersectedIndexes.stream()
            .map(index -> {
                String tableName = index.getTableName();
                List<String> indexes = index.getDuplicatedIndexes().stream()
                    .map(Index::getIndexName).sorted().collect(Collectors.toList());
                return Map.entry(tableName, indexes);
            })
            .collect(Collectors.toList());

        Assertions.assertThat(tableToIndexes)
            .containsExactlyInAnyOrder(
                Map.entry("hiding", List.of("hidings_primary_key", "sku_hidings_live_indx")),
                Map.entry("msku", List.of("ix_live_msku", "msku_title_indx")),
                Map.entry("msku", List.of("msku_pkey", "msku_title_indx"))
            );

Or

        List<IndexWithNulls> indexesWithNulls = indexMaintenance.getIndexesWithNullValues(PG_CONTEXT);

        List<Map.Entry<String, String>> tableToIndexes = indexesWithNulls.stream()
            .map(index -> {
                String tableName = index.getTableName();
                String value = index.getIndexName() + " " + index.getNullableField();
                return Map.entry(tableName, value);
            })
            .collect(Collectors.toList());

        Assertions.assertThat(tableToIndexes)
            .containsExactlyInAnyOrder(
                Map.entry("hiding", "hidings_search_indx comment")
            );

Can you provide more flexible API for tests?

Add retry policy in sql query execution to resist network inconsistency

[2020-04-09 08:04:44,663] ERROR Task execution error: class java.lang.RuntimeException
java.lang.RuntimeException: org.postgresql.util.PSQLException: Connection to :6432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at io.github.mfvanek.pg.utils.QueryExecutor.executeQuery(QueryExecutor.java:110)
at io.github.mfvanek.pg.utils.QueryExecutor.executeQueryWithSchema(QueryExecutor.java:64)
at io.github.mfvanek.pg.index.maintenance.IndexMaintenanceImpl.executeQuery(IndexMaintenanceImpl.java:210)
at io.github.mfvanek.pg.index.maintenance.IndexMaintenanceImpl.getPotentiallyUnusedIndexes(IndexMaintenanceImpl.java:88)
at io.github.mfvanek.pg.index.health.IndexesHealthImpl.lambda$getUnusedIndexes$0(IndexesHealthImpl.java:102)
at io.github.mfvanek.pg.index.health.IndexesHealthImpl.doOnHost(IndexesHealthImpl.java:188)
at io.github.mfvanek.pg.index.health.IndexesHealthImpl.getUnusedIndexes(IndexesHealthImpl.java:101)
at io.github.mfvanek.pg.index.health.logger.AbstractIndexesHealthLogger.logUnusedIndexes(AbstractIndexesHealthLogger.java:123)
at io.github.mfvanek.pg.index.health.logger.AbstractIndexesHealthLogger.logAll(AbstractIndexesHealthLogger.java:64)
at ru.yandex.market.notifier.jobs.db.maintenance.NotifierIndexesHealthLoggerJob.doJob(NotifierIndexesHealthLoggerJob.java:34)
at ru.yandex.market.notifier.jobs.zk.AbstractJob.run(AbstractJob.java:25)
at ru.yandex.market.checkout.common.tasks.ZooTask.fireRunnable(ZooTask.java:366)
at ru.yandex.market.checkout.common.tasks.ZooTask.fire(ZooTask.java:357)
at ru.yandex.market.checkout.common.tasks.ZooTask.tryToFireOnSchedule(ZooTask.java:347)
at ru.yandex.market.checkout.common.tasks.ZooTask.runOnSchedule(ZooTask.java:292)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.postgresql.util.PSQLException: Connection to :6432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at org.postgresql.Driver$ConnectThread.getResult(Driver.java:409)
at org.postgresql.Driver.connect(Driver.java:267)
at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:55)
at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:355)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:899)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:429)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:354)
at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:753)
at io.github.mfvanek.pg.utils.QueryExecutor.executeQuery(QueryExecutor.java:97)
... 15 common frames omitted
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:591)
at org.postgresql.core.PGStream.(PGStream.java:75)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:91)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.(PgConnection.java:195)
at org.postgresql.Driver.makeConnection(Driver.java:458)
at org.postgresql.Driver.access$100(Driver.java:57)
at org.postgresql.Driver$ConnectThread.run(Driver.java:368)
... 1 common frames omitted

Wrong detection of Duplicated indexes

create table t
(
    id   bigserial primary key,
    text text not null
);
create index ix_text on t (text);
create index ix_text_collate on t (text COLLATE "C.UTF-8");

Migrate codebase to Java 11

Create a branch with code for Java 8
Fix the latest Java 8 compatible release in the documentation
Migrate codebase to Java 11

Add feature to create cluster containers programmaticaly

Current state of PostgreSqlClusterWrapper provides two containers of PostgreSQL with hard coded configuration.

For various demo/test scenarios, it is essential to have programmatic control over the process of creating a PostgreSQL HA cluster. We need an API that provides control over the creation of the required number of PostgreSQL instances and their configuration.

Wrong detection of intersected indexes

create table msku.msku (
  market_sku_id                      bigserial        primary key,
  parent_model_id                    bigint           not null,
  category_id                        bigint           not null,
  vendor_id                          bigint           not null,
  sku_type                           msku.sku_type_enum,
  title                              text,
  picture_url                        text,
  published_on_blue_market           boolean          not null default false,
  published_on_market                boolean          not null default false,
  created_ts                         timestamp,
  deleted                            boolean
);

create index concurrently msku_title_indx on msku.msku(market_sku_id, title) where not deleted;

False positive
DuplicatedIndexes{tableName='msku.msku', totalSize=16384, indexes=[IndexWithSize{tableName='msku.msku', indexName='msku.msku_title_indx', indexSizeInBytes=8192}, IndexWithSize{tableName='msku.msku', indexName='msku.msku_pkey', indexSizeInBytes=8192}]}]>

Add a check for identifiers with maximum length

By default identifier length is bounded with 63 bytes.
When any identifier (table/index/sequence name) is out of this range PostgreSQL will silently truncate the identifier name. No errors are raised.
It might be confusing for developers in case of two identifiers with long names that differ a bit at the ending:

create index concurrently if not exists my_custom_index_on_my_custom_table_with_very_loooooooooooooooooooong_name_on_field1
    on my_custom_table (field1);
create index concurrently if not exists my_custom_index_on_my_custom_table_with_very_loooooooooooooooooooong_name_on_field2
    on my_custom_table (field2);

For first index you will get a warning with sql state 42622 like:
identifier "my_custom_index_on_my_custom_table_with_very_loooooooooooooooooooong_name_on_field1" will be truncated to "my_custom_index_on_my_custom_table_with_very_looooooooooooooooo"
and index will be created with truncated name.

For second index you'll get a warning:
relation "my_custom_index_on_my_custom_table_with_very_looooooooooooooooo" already exists, skipping and no index

pg-index-health as cmd-utility

Иван, приветствую!

Подскажи пожалуйста, а можно ли оформить pg-index-health как консольную утилиту?

Add nullability mark field to ForeignKey class and method for generating sql request for creating corresponding index

If foreign key is not covered with corresponding index we have to create it manually.
This is boring and error prone operation due to several circumstances: possible table downtime when creating index and nullability of fields from which index consists.
We can generate sql request for index creation automatically.
For example:

ForeignKey{tableName='leads.lead_clob', constraintName='lead_clob_fkey_lead_info_id', columnsInConstraint=[lead_info_id]}

might lead to a few possible solutions:

  1. concurrently and field is nullable
create index concurrently if not exists lead_clob_lead_info_id_without_nulls_idx
    on leads.lead_clob (lead_info_id) where lead_info_id is not null;
  1. concurrently and field is not null
create index concurrently if not exists lead_clob_lead_info_id_idx
    on leads.lead_clob (lead_info_id);
  1. with table lock
create index if not exists lead_clob_lead_info_id_without_nulls_idx
    on leads.lead_clob (lead_info_id) where lead_info_id is not null;

and etc.

We need to implement a kind of Formatter with fluent API based on GOF builder pattern which will generate text for sql requests.
Options:

  1. concurrently or not (default true)
  2. exclude nulls or not (default true)
  3. add "without_null" part to the name of index or not (default true)
  4. add "idx" as prefix or suffix or not at all (default add as suffix)
  5. generated index name length cannot exceed 63. see PostgreSQL Limits

[Bug] Wrong detection on column that is primary key and also have another constraints

create table if not exists another_table
(
    id bigserial primary key, -- should NOT be found
    constraint not_reserved_id check (id > 1000),
    constraint less_than_million check (id < 1000000)
);

create table if not exists one_more_table
(
    id bigserial, -- should be found. Why not PK?
    constraint unique_id unique (id),
    constraint not_reserved_id check (id > 1000),
    constraint less_than_million check (id < 1000000)
);

create table if not exists test_table
(
    id bigserial, -- should be found
    num bigserial, -- should be found
    constraint test_table_pkey_id primary key (id),
    constraint test_table_fkey_other_id foreign key (id) references another_table (id),
    constraint test_table_fkey_one_more_id foreign key (id) references one_more_table (id)
);

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.