ChistaDATA Anansi Profiler

Introduction

ChistaDATA Anansi is a log analysis tool built for ClickHouse. It is a CLI tool written in pure Go. The reports generated by the profiler shed light on different aspects of queries, such as execution time, memory usage, bytes read, and other basic but crucial information.

Anansi takes the log output of a running ClickHouse or PostgreSQL database instance and processes it into a report, either plain text or Markdown (see "--report-type"). The report presents everything it finds in a clear, easy-to-read format.

To let the Anansi profiler parse a PostgreSQL log file, do the following:

  1. Specify the database type with the flag "--database-type=postgres" (the default is clickhouse).

  2. If your PostgreSQL log file uses a custom prefix, specify it with the flag "--log-prefix=". This lets the Anansi profiler correctly identify and parse the relevant information in your log file.

  3. By default, the tool captures all DML/DDL queries. To skip a query type, pass it to "--discard-queries", for example "--discard-queries=insert". Possible values are select, update, delete, and insert.

  4. Set the following parameters on the PostgreSQL side (for example in postgresql.conf):

       log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
       log_min_duration_statement = 5000    # log only queries that run for 5 seconds or longer (value in milliseconds)
       log_checkpoints = on
       log_connections = on
       log_disconnections = on
       log_lock_waits = on
       log_temp_files = 0
       log_autovacuum_min_duration = 0
       log_error_verbosity = default
       log_statement = off
       lc_messages = 'C'
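With these settings, each slow statement is written as a single log line whose fields follow log_line_prefix, followed by PostgreSQL's "duration: ... ms  statement: ..." message. The Go sketch below shows one way such a line could be matched field by field, and how a "--discard-queries" style filter could use the statement's leading keyword. The regular expression, the Entry type, and the parseLine, queryType, and keep functions are hypothetical illustrations, not the profiler's actual parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// logLine is a hypothetical pattern for a line produced with
// log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
// and log_min_duration_statement enabled. %t is assumed to be
// "YYYY-MM-DD HH:MM:SS TZ" (three space-separated tokens).
var logLine = regexp.MustCompile(
	`^(\S+ \S+ \S+) \[(\d+)\]: user=([^,]*),db=([^,]*),app=([^,]*),client=(\S*) ` +
		`LOG:\s+duration: ([\d.]+) ms\s+statement: (.*)$`)

// Entry holds the fields extracted from one log line (illustrative only).
type Entry struct {
	Timestamp   string
	PID         int
	User, DB    string
	App, Client string
	DurationMs  float64
	Query       string
}

// parseLine returns the extracted fields and true if the line matches.
func parseLine(line string) (Entry, bool) {
	m := logLine.FindStringSubmatch(line)
	if m == nil {
		return Entry{}, false
	}
	pid, _ := strconv.Atoi(m[2]) // the regex guarantees digits
	dur, err := strconv.ParseFloat(m[7], 64)
	if err != nil {
		return Entry{}, false
	}
	return Entry{Timestamp: m[1], PID: pid, User: m[3], DB: m[4],
		App: m[5], Client: m[6], DurationMs: dur, Query: m[8]}, true
}

// queryType returns the statement's leading keyword in lowercase.
func queryType(q string) string {
	fields := strings.Fields(q)
	if len(fields) == 0 {
		return ""
	}
	return strings.ToLower(fields[0])
}

// keep reports whether an entry survives a discard set such as {"insert"}.
func keep(e Entry, discard map[string]bool) bool {
	return !discard[queryType(e.Query)]
}

func main() {
	line := "2022-10-03 18:42:08 UTC [4242]: user=app,db=shop,app=psql,client=10.0.0.5 " +
		"LOG:  duration: 5123.456 ms  statement: INSERT INTO salary VALUES (1)"
	e, ok := parseLine(line)
	if !ok {
		fmt.Println("no match")
		return
	}
	discard := map[string]bool{"insert": true} // like --discard-queries=insert
	fmt.Println(e.DB, e.DurationMs, queryType(e.Query), keep(e, discard))
}
```

A real parser would also need to handle multi-line statements and the other message types enabled above (checkpoints, connections, lock waits), which this sketch ignores.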

Features

  • Multiple log file support
  • Report customization using CLI arguments
  • Multiple report formats are available (text, markdown)
  • Can be compiled to binaries compatible with multiple OS and architectures
  • Capable of processing huge files
  • Supports compressed files
  • Capable of fetching files from S3
  • Reports currently focus on the top N queries. Reports on errors occurring in the system, checkpoint behaviour, vacuum behaviour, trends, and other basic but critical information for these database systems are planned for the future.

Installation

Prerequisites
  • Go
  • git
  • make
Clone the repo
git clone https://github.com/ChistaDATA/ChistaDATA-Anansi-Profiler.git
cd ChistaDATA-Anansi-Profiler
Build
make build

The anansi-profiler executable is generated in the working directory.

Example

Run with a sample file
./anansi-profiler -n 2 ./server.log
Output

# Current date: 2022-10-06 13:06:39.088724 +0530 IST m=+19.943042793
# Hostname: local
# Files:
	* server.log
# Overall: 243122, Unique: 17 , QPS: 9.10
# Time range: 2022-10-03 18:42:08.317458 +0000 UTC to 2022-10-04 02:07:25.907276 +0000 UTC
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time        13676.0   0.00s   0.93s   0.06s   0.72s   0.18s   0.00s
# Rows read         53.44B    0.00   0.53M   0.22M   0.52M   0.16M   0.26M
# Bytes read        1.43TB   0.00B 26.02MB  6.16MB 24.91MB  7.74MB  2.27MB
# Peak Memory                0.00B  1.67GB  0.12GB  1.55GB  0.39GB  4.00MB

# Profile
# Rank Response time   Calls R/Call Query
# ==== =============== ===== ====== =====
#    1 12340.9  90.24% 16995  0.73s select distinct(city) from salary s1 LEFT JOIN salary s2 ON s1.event_id = s2.eve
#    2  43.53s   0.32% 17070  0.00s select min(amount) from salary where rand > 10000
#    3   0.49s   0.13%     1  0.49s insert into salary select * from salary
#    4   0.01s   0.00%     1  0.01s select * from land_registry_price_paid_uk limit 5;

# Query 1 : 1.243 QPS
# Time range: From 2022-10-03 18:42:08.4814 +0000 UTC to 2022-10-04 02:05:44.907986 +0000 UTC
# ====================================================================
# Attribute      total     min     max     avg     95%  stddev  median
# ============ ======= ======= ======= ======= ======= ======= =======
# Count         17.00K 
# Exec time    12340.9   0.00s   0.93s   0.73s   0.75s   0.04s   0.73s
# Rows read      8.77B    0.00   0.53M   0.52M   0.53M  21.50K   0.52M
# Bytes read    0.35TB   0.00B 22.03MB 21.33MB 21.97MB  0.89MB 21.37MB
# Peak Memory            0.00B  1.67GB  1.54GB  1.55GB 64.00MB  1.55GB
# ====================================================================
# Databases    default (16995/16995)  
# Hosts        [::1] (16995/16995)  
# Users        default (16995/16995)  
# Completion   16988/16995
# Query_time distribution
# ====================================================================
#   1us  
#  10us  
# 100us  
#   1ms  
#  10ms  
# 100ms  ###########################################################
#    1s  
#  10s+  
# ====================================================================
# Query
select distinct(city) from salary s1 LEFT JOIN salary s2 ON s1.event_id = s2.event_id 

# Query 2 : 1.248 QPS
# Time range: From 2022-10-03 18:42:11.876586 +0000 UTC to 2022-10-04 02:07:25.693396 +0000 UTC
# ====================================================================
# Attribute      total     min     max     avg     95%  stddev  median
# ============ ======= ======= ======= ======= ======= ======= =======
# Count         17.07K 
# Exec time     43.53s   0.00s   0.13s   0.00s   0.00s   0.00s   0.00s
# Rows read      4.41B    0.00   0.27M   0.26M   0.27M  10.29K   0.26M
# Bytes read   65.66GB   0.00B  4.07MB  3.94MB  4.05MB  0.16MB  3.95MB
# Peak Memory           0.53MB  4.20MB  2.88MB  2.89MB 90.90KB  2.88MB
# ====================================================================
# Databases    default (17070/17070)  
# Hosts        [::1] (17070/17070)  
# Users        default (17070/17070)  
# Completion   17066/17070
# Query_time distribution
# ====================================================================
#   1us  
#  10us  
# 100us  
#   1ms  ###########################################################
#  10ms  
# 100ms  
#    1s  
#  10s+  
# ====================================================================
# Query
select min(amount) from salary where rand > 10000

Usage

anansi-profiler <file-paths> ...

Arguments:

<file-paths> ...    Paths of log files

Flags:
  -h, --help                                 Show context-sensitive help.
  -n, --top-query-count=10                   Count of queries for top x table
  -r, --report-type="text"                   Report type to be generated, types: md, text
  -c, --minimum-query-call-count=1           Minimum no of query calls needed
      --database-type="clickhouse"           Which database? Possible values: clickhouse, postgres
      --database-version="0"                 Database version
      --sort-field="ExecTime"                Sort queries by the given field, possible values: ExecTime, RowsRead, BytesRead, PeakMemory, QPS, QueryCount
      --sort-field-operation="max"           Sort queries by the given operation on field, possible values: sum, min, max, avg, per95, stdDev, median
      --sort-order="desc"                    Sort order, possible values: asc, desc
      --log-level="error"                    Log level, possible values: panic, fatal, error, warn, info, debug, trace
      --log-prefix=""                        Prefix of log
      --discard-queries=,...                 It will consider all the query types by default but type of queries can be discarded, possible values: select, update, delete, insert
      --s3-access-key-id=STRING
      --s3-secret-access-key=STRING
      --s3-session-token=STRING
      --s3-region=STRING
      --s3-object-urls=S3-OBJECT-URLS,...
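One plausible reading of how the selection flags combine: queries called fewer than "-c" times are dropped, the rest are ordered by the chosen "--sort-field" and "--sort-field-operation" in "--sort-order", and the first "-n" are reported. The Go sketch below illustrates that pipeline for ExecTime with the max operation. The QueryStats type and topQueries function are hypothetical; the sample values are taken from the example output above.

```go
package main

import (
	"fmt"
	"sort"
)

// QueryStats is a hypothetical per-query aggregate.
type QueryStats struct {
	Query      string
	Calls      int
	MaxExecSec float64
}

// topQueries filters by minimum call count (like -c), sorts by max exec time
// (like --sort-field=ExecTime --sort-field-operation=max), descending unless
// asc is true (like --sort-order), and keeps at most n entries (like -n).
func topQueries(qs []QueryStats, minCalls, n int, asc bool) []QueryStats {
	out := make([]QueryStats, 0, len(qs))
	for _, q := range qs {
		if q.Calls >= minCalls {
			out = append(out, q)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		if asc {
			return out[i].MaxExecSec < out[j].MaxExecSec
		}
		return out[i].MaxExecSec > out[j].MaxExecSec
	})
	if n < len(out) {
		out = out[:n]
	}
	return out
}

func main() {
	qs := []QueryStats{
		{"select min(amount) from salary where rand > 10000", 17070, 0.13},
		{"insert into salary select * from salary", 1, 0.49},
		{"select distinct(city) from salary s1 LEFT JOIN salary s2 ON s1.event_id = s2.event_id", 16995, 0.93},
	}
	// Roughly: -n 2 -c 2 --sort-field=ExecTime --sort-field-operation=max --sort-order=desc
	for i, q := range topQueries(qs, 2, 2, false) {
		fmt.Println(i+1, q.MaxExecSec, q.Query)
	}
}
```

With "-c 2", the single-call insert is filtered out before sorting, so it cannot appear in the top list regardless of its exec time.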

Contributors

sinsinan, jimdegriz, askdba, ihsnlky