ChistaDATA Anansi Profiler

Introduction

ChistaDATA Anansi is a log analysis tool built for ClickHouse and PostgreSQL. It is a CLI tool written in pure Go. The reports generated by the profiler shed light on different aspects of queries, such as execution time, memory usage, bytes read and other basic but crucial information.

Anansi takes the log output of a running ClickHouse or PostgreSQL database instance and processes it into a report (text or markdown) that presents the information it finds in a clear, easy-to-understand style.

To let the Anansi profiler parse a PostgreSQL log file, configure the following:

  1. Specify the database type with the flag "--database-type=postgres".

  2. If your PostgreSQL log uses a custom line prefix, specify it with the flag "--log-prefix=". This lets the Anansi profiler correctly identify and parse the relevant information in your log file.

  3. By default, the tool captures all DML/DDL queries. To discard a query type, pass it to "--discard-queries", for example "--discard-queries=insert" (possible values: select, update, delete, insert).

  4. Set the following parameters on the PostgreSQL side:

       log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
       log_min_duration_statement = 5000    # log only statements that run for at least 5 seconds
       log_checkpoints = on
       log_connections = on
       log_disconnections = on
       log_lock_waits = on
       log_temp_files = 0
       log_autovacuum_min_duration = 0
       log_error_verbosity = default
       log_statement = off
       lc_messages = 'C'
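
To illustrate what the prefix in step 4 makes parseable, here is a minimal Go sketch that extracts the fields from one slow-query line and applies a "--discard-queries"-style filter. It is not Anansi's parser: the regular expression, the Entry type and the helper names are hypothetical, and it assumes the "LOG:  duration: ... ms  statement: ..." lines that log_min_duration_statement emits in English (which lc_messages = 'C' ensures).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// logLine matches a slow-query line written with the recommended
// log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
// (hypothetical pattern, not taken from Anansi).
var logLine = regexp.MustCompile(
	`^(\S+ \S+ \S+) \[(\d+)\]: user=([^,]*),db=([^,]*),app=([^,]*),client=(\S*) LOG:\s+duration: ([\d.]+) ms\s+statement: (.*)$`)

// Entry holds the fields extracted from one log line (hypothetical type).
type Entry struct {
	Timestamp  string
	PID        int
	User       string
	Database   string
	App        string
	Client     string
	DurationMs float64
	Query      string
}

// parseLine returns the parsed entry and true, or false if the line is
// not a duration/statement line.
func parseLine(line string) (Entry, bool) {
	m := logLine.FindStringSubmatch(line)
	if m == nil {
		return Entry{}, false
	}
	pid, _ := strconv.Atoi(m[2])
	dur, err := strconv.ParseFloat(m[7], 64)
	if err != nil {
		return Entry{}, false
	}
	return Entry{m[1], pid, m[3], m[4], m[5], m[6], dur, m[8]}, true
}

// queryType returns the lower-cased first keyword of a statement, e.g.
// "select" or "insert", for comparison with --discard-queries values.
func queryType(query string) string {
	fields := strings.Fields(query)
	if len(fields) == 0 {
		return ""
	}
	return strings.ToLower(fields[0])
}

func main() {
	line := `2023-03-01 10:15:30 UTC [4242]: user=app,db=shop,app=psql,client=10.0.0.5 LOG:  duration: 5012.345 ms  statement: SELECT * FROM orders`
	discard := map[string]bool{"insert": true} // as if --discard-queries=insert
	if e, ok := parseLine(line); ok && !discard[queryType(e.Query)] {
		fmt.Printf("%s db=%s %.3f ms %s\n", e.User, e.Database, e.DurationMs, e.Query)
	}
}
```

Matching the whole line with one anchored expression keeps the sketch short. A real parser would also have to handle multi-line statements and the other message types that the settings above enable (checkpoints, connections, lock waits, and so on).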

Features

  • Multiple log file support
  • Report customization using CLI arguments
  • Multiple report formats are available (text, markdown)
  • Can be compiled to binaries compatible with multiple OS and architectures
  • Capable of processing huge files
  • Supports compressed files
  • Capable of fetching files from S3
  • Reports currently cover the top n queries. Reports on errors occurring in the system, checkpoint behaviour, vacuum behaviour, trends and other basic but critical information for these database systems are planned for the future.

Installation

Prerequisites

  • Go
  • git
  • make

Clone the repo

git clone https://github.com/ChistaDATA/ChistaDATA-Anansi-Profiler.git

Build

make build

Your executable anansi-profiler will be generated in the working directory.

Example

Run with a sample file

./anansi-profiler -n 2 ./server.log

Output

# Current date: 2022-10-06 13:06:39.088724 +0530 IST m=+19.943042793
# Hostname: local
# Files:
 * server.log
# Overall: 243122, Unique: 17 , QPS: 9.10
# Time range: 2022-10-03 18:42:08.317458 +0000 UTC to 2022-10-04 02:07:25.907276 +0000 UTC
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time        13676.0   0.00s   0.93s   0.06s   0.72s   0.18s   0.00s
# Rows read         53.44B    0.00   0.53M   0.22M   0.52M   0.16M   0.26M
# Bytes read        1.43TB   0.00B 26.02MB  6.16MB 24.91MB  7.74MB  2.27MB
# Peak Memory                0.00B  1.67GB  0.12GB  1.55GB  0.39GB  4.00MB

# Profile
# Rank Response time   Calls R/Call Query
# ==== =============== ===== ====== =====
#    1 12340.9  90.24% 16995  0.73s select distinct(city) from salary s1 LEFT JOIN salary s2 ON s1.event_id = s2.eve
#    2  43.53s   0.32% 17070  0.00s select min(amount) from salary where rand > 10000
#    3   0.49s   0.13%     1  0.49s insert into salary select * from salary
#    4   0.01s   0.00%     1  0.01s select * from land_registry_price_paid_uk limit 5;

# Query 1 : 1.243 QPS
# Time range: From 2022-10-03 18:42:08.4814 +0000 UTC to 2022-10-04 02:05:44.907986 +0000 UTC
# ====================================================================
# Attribute      total     min     max     avg     95%  stddev  median
# ============ ======= ======= ======= ======= ======= ======= =======
# Count         17.00K 
# Exec time    12340.9   0.00s   0.93s   0.73s   0.75s   0.04s   0.73s
# Rows read      8.77B    0.00   0.53M   0.52M   0.53M  21.50K   0.52M
# Bytes read    0.35TB   0.00B 22.03MB 21.33MB 21.97MB  0.89MB 21.37MB
# Peak Memory            0.00B  1.67GB  1.54GB  1.55GB 64.00MB  1.55GB
# ====================================================================
# Databases    default (16995/16995)  
# Hosts        [::1] (16995/16995)  
# Users        default (16995/16995)  
# Completion   16988/16995
# Query_time distribution
# ====================================================================
#   1us  
#  10us  
# 100us  
#   1ms  
#  10ms  
# 100ms  ###########################################################
#    1s  
#  10s+  
# ====================================================================
# Query
select distinct(city) from salary s1 LEFT JOIN salary s2 ON s1.event_id = s2.event_id 

# Query 2 : 1.248 QPS
# Time range: From 2022-10-03 18:42:11.876586 +0000 UTC to 2022-10-04 02:07:25.693396 +0000 UTC
# ====================================================================
# Attribute      total     min     max     avg     95%  stddev  median
# ============ ======= ======= ======= ======= ======= ======= =======
# Count         17.07K 
# Exec time     43.53s   0.00s   0.13s   0.00s   0.00s   0.00s   0.00s
# Rows read      4.41B    0.00   0.27M   0.26M   0.27M  10.29K   0.26M
# Bytes read   65.66GB   0.00B  4.07MB  3.94MB  4.05MB  0.16MB  3.95MB
# Peak Memory           0.53MB  4.20MB  2.88MB  2.89MB 90.90KB  2.88MB
# ====================================================================
# Databases    default (17070/17070)  
# Hosts        [::1] (17070/17070)  
# Users        default (17070/17070)  
# Completion   17066/17070
# Query_time distribution
# ====================================================================
#   1us  
#  10us  
# 100us  
#   1ms  ###########################################################
#  10ms  
# 100ms  
#    1s  
#  10s+  
# ====================================================================
# Query
select min(amount) from salary where rand > 10000

Usage

anansi-profiler <file-paths> ...

Arguments:

<file-paths> ...    Paths of log files

Flags:

  -h, --help                                 Show context-sensitive help.
  -n, --top-query-count=10                   Count of queries for top x table
  -r, --report-type="text"                   Report type to be generated, types: md, text
  -c, --minimum-query-call-count=1           Minimum no of query calls needed
      --database-type="clickhouse"           Which database? Possible values: clickhouse, postgres
      --database-version="0"                 Database version
      --sort-field="ExecTime"                Sort queries by the given field, possible values: ExecTime, RowsRead, BytesRead, PeakMemory, QPS, QueryCount
      --sort-field-operation="max"           Sort queries by the given operation on field, possible values: sum, min, max, avg, per95, stdDev, median
      --sort-order="desc"                    Sort order, possible values: asc, desc
      --log-level="error"                    Log level, possible values: panic, fatal, error, warn, info, debug, trace
      --log-prefix=""                        Prefix of log
      --discard-queries=,...                 It will consider all the query types by default but type of queries can be discarded, possible values: select, update, delete, insert
      --s3-access-key-id=STRING
      --s3-secret-access-key=STRING
      --s3-session-token=STRING
      --s3-region=STRING
      --s3-object-urls=S3-OBJECT-URLS,...
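
The --sort-field, --sort-field-operation, --sort-order, --minimum-query-call-count and --top-query-count flags together decide which queries appear in the profile. Below is a minimal Go sketch of that selection. It assumes the per-query aggregates have already been computed. QueryStats and topQueries are hypothetical and do not describe Anansi's internals. The sample figures are taken from the report above.

```go
package main

import (
	"fmt"
	"sort"
)

// QueryStats is a hypothetical per-query aggregate: for each field
// (ExecTime, RowsRead, ...) it stores values keyed by operation (sum, max, ...).
type QueryStats struct {
	Query string
	Calls int
	Stats map[string]map[string]float64 // field -> operation -> value
}

// topQueries drops queries with fewer than minCalls calls, sorts the rest by
// Stats[field][op] in the requested order ("asc" or "desc") and keeps at most n.
func topQueries(qs []QueryStats, field, op, order string, minCalls, n int) []QueryStats {
	var kept []QueryStats
	for _, q := range qs {
		if q.Calls >= minCalls {
			kept = append(kept, q)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		a, b := kept[i].Stats[field][op], kept[j].Stats[field][op]
		if order == "asc" {
			return a < b
		}
		return a > b
	})
	if len(kept) > n {
		kept = kept[:n]
	}
	return kept
}

func main() {
	qs := []QueryStats{
		{"select distinct(city) ...", 16995, map[string]map[string]float64{"ExecTime": {"max": 0.93, "sum": 12340.9}}},
		{"select min(amount) ...", 17070, map[string]map[string]float64{"ExecTime": {"max": 0.13, "sum": 43.53}}},
		{"insert into salary ...", 1, map[string]map[string]float64{"ExecTime": {"max": 0.49, "sum": 0.49}}},
	}
	// Roughly: --sort-field=ExecTime --sort-field-operation=sum --sort-order=desc -n 2 -c 1
	for _, q := range topQueries(qs, "ExecTime", "sum", "desc", 1, 2) {
		fmt.Printf("%5d calls  %s\n", q.Calls, q.Query)
	}
}
```

The sketch filters by call count before sorting, so the sort only has to process queries that can still appear in the report.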

Issues

Fetch files from S3

S3 is the default object store these days, so an S3 fetch feature would be a great addition.

The user could specify the access_id, secret and file URLs, and the program would fetch and parse the files.

Improve log processing and output

Is your feature request related to a problem? Please describe.
Display the progress of the profiling from start to end.

Describe the solution you'd like
I would like to see when the process starts and what percentage of the log file has been processed so far.

Describe alternatives you've considered
When the program completes, it does not display anything. It should display a summary of how many queries were processed, the output file name and location, the duration it took, and the file size.

Additional context
The ability to choose the output file name and location is advisable. The current working directory may not be suitable for profiler results.

Add support for compressed files in query profiler

I would like to suggest adding support for compressed files in the query profiler. Currently, the query profiler only accepts uncompressed files, which can be a problem when working with large datasets. By adding support for compressed files, users can upload and analyze large data files more easily.

The query profiler could be updated to support common compression formats such as gzip or zip. This would involve adding a decompression step before parsing the file and analyzing the queries.
