
clickhouse-bulk's Introduction

ClickHouse-Bulk


Simple Yandex ClickHouse insert collector. It collects requests and sends them to ClickHouse servers.

Installation

Download the binary for your platform

or

Use the Docker image

or build from source (Go 1.13+):

git clone https://github.com/nikepan/clickhouse-bulk
cd clickhouse-bulk
go build

Features

  • Groups n requests and sends them to any of the ClickHouse servers
  • Sends collected data at a configurable interval
  • Tested with VALUES and TabSeparated formats
  • Supports multiple target servers
  • Supports passing the query in the query parameters or in the body
  • Supports other query parameters such as username, password, and database
  • Supports basic authentication

For example:

INSERT INTO table3 (c1, c2, c3) VALUES ('v1', 'v2', 'v3')
INSERT INTO table3 (c1, c2, c3) VALUES ('v4', 'v5', 'v6')

are combined and sent as

INSERT INTO table3 (c1, c2, c3) VALUES ('v1', 'v2', 'v3')('v4', 'v5', 'v6')

Options

  • -config - path to the config file (JSON); default config.json

Configuration file

{
  "listen": ":8124",
  "flush_count": 10000, // check by \n char
  "flush_interval": 1000, // milliseconds
  "clean_interval": 0, // how often cleanup internal tables - e.g. inserts to different temporary tables, or as workaround for query_id etc. milliseconds
  "remove_query_id": true, // some drivers sends query_id which prevents inserts to be batched
  "dump_check_interval": 300, // interval for try to send dumps (seconds); -1 to disable
  "debug": false, // log incoming requests
  "dump_dir": "dumps", // directory for dump unsended data (if clickhouse errors)
  "clickhouse": {
    "down_timeout": 60, // wait if server in down (seconds)
    "connect_timeout": 10, // wait for server connect (seconds)
    "tls_server_name": "", // override TLS serverName for certificate verification (e.g. in cases you share same "cluster" certificate across multiple nodes)
    "insecure_tls_skip_verify": false, // INSECURE - skip certificate verification at all
    "servers": [
      "http://127.0.0.1:8123"
    ]
  },
  "metrics_prefix": "prefix"
}

Environment variables (used by the docker image)

  • CLICKHOUSE_BULK_DEBUG - enable debug logging
  • CLICKHOUSE_SERVERS - comma-separated list of servers
  • CLICKHOUSE_FLUSH_COUNT - number of rows per insert
  • CLICKHOUSE_FLUSH_INTERVAL - insert interval
  • CLICKHOUSE_CLEAN_INTERVAL - internal tables clean interval
  • DUMP_CHECK_INTERVAL - interval for resending dumps
  • CLICKHOUSE_DOWN_TIMEOUT - wait time if a server is down
  • CLICKHOUSE_CONNECT_TIMEOUT - ClickHouse server connect timeout
  • CLICKHOUSE_TLS_SERVER_NAME - server name for TLS certificate verification
  • CLICKHOUSE_INSECURE_TLS_SKIP_VERIFY - skip certificate verification entirely
  • METRICS_PREFIX - prefix for Prometheus metrics

Quickstart

Run ./clickhouse-bulk and send queries to :8124.
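
A minimal client sketch, assuming the default listen address :8124 and the table3 from the example above: each request carries a regular INSERT, and clickhouse-bulk groups the rows before forwarding a single batched INSERT to ClickHouse.

package main

import (
	"log"
	"net/http"
	"strings"
)

func main() {
	// Hypothetical example: post one small INSERT; the collector batches it
	// with other requests before forwarding to ClickHouse.
	body := strings.NewReader("INSERT INTO table3 (c1, c2, c3) VALUES ('v1', 'v2', 'v3')")
	resp, err := http.Post("http://127.0.0.1:8124/", "text/plain", body)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}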

Metrics

Check the main metrics manually: curl -s http://127.0.0.1:8124/metrics | grep "^ch_"

  • ch_bad_servers 0 - current count of bad servers
  • ch_dump_count 0 - dumps saved since launch
  • ch_queued_dumps 0 - dump files currently in the dump directory
  • ch_good_servers 1 - current count of good servers
  • ch_received_count 40 - requests received since launch
  • ch_sent_count 1 - requests sent since launch

Tips

For better performance, the keywords FORMAT and VALUES must be uppercase.

clickhouse-bulk's People

Contributors

dependabot[bot], dern3rd, fearlsgroove, khlystov, lenstr, nikepan, splichy, testwill, the-alchemist, tuxrace


clickhouse-bulk's Issues

Make an option to choose where logs are stored

Currently all logs go to syslog, which is not very convenient. Please add an option to choose where logs are stored (file, syslog, or no logging at all) and make it possible to set the log level (for example, I don't want to store lines like 'clickhouse-bulk[40680]: 2020/08/17 13:58:01 INFO: send 0 rows to ...').

Reducing log output

Hey there,

To install clickhouse-bulk on our server I added a systemd service for it. I noticed that we frequently run into issues with the journal log filling up the whole disk.
I've tried to reconfigure systemd-journald to limit disk usage, but because the clickhouse-bulk log produces so much output we then lose a lot of other data (the log file grows by more than 100 MB per hour).

Could we make the "sending x rows" and "sent x rows" messages configurable, so I can deactivate them and only log warn/error messages?

Thanks in advance

ERROR: server error (400) Wrong server status 400 after updating to ClickHouse server version 21.4.4.30 (official build)

Hi.
The proxy cannot execute queries and prints this log after the update; it worked fine before the update:

request: "INSERT INTO `display` (`uuid`,`user_id`,`app_uuid`,`uuid1`,`uuid2`,`created_at`) VALUES\n('7ecd41eb-58b6-44d2-ab59-01235bc32135',86,'00806453-89a0-4fd2-9f9f-2b012f45049e','0069f823-f901-48c6-b8bb-3d5a5d61d470','4264487b-fa40-47ae-939b-a492df46caaa','1618914155')"
2021/04/21 07:12:41.249482 INFO: sending 1 rows to http://192.168.88.1:8123 of INSERT INTO `display` (`uuid`,`user_id`,`app_uuid`,`uuid1`,`uuid1`,`created_at`) VALUES
2021/04/21 07:12:41.257645 INFO: sent 1 rows to http://192.168.88.1:8123 of INSERT INTO `display` (`uuid`,`user_id`,`app_uuid`,`uuid1`,`uuid1`,`created_at`) VALUES
2021/04/21 07:12:41.257817 ERROR: server error (400) Wrong server status 400:

Please take a look, it is very urgent for us.
Best Regards
Arthur

503 when I send queries to port 8124

Hi
I can't send queries to clickhouse-bulk.
I added clickhouse-bulk to my docker-compose:

clickhouse:
    image: yandex/clickhouse-server:21.1.2
    ports:
      - 8123:8123
      - 9090:9000

clickhouse-bulk:
    image: nikepan/clickhouse-bulk:1.3.3
    ports:
     - "8124:8124"
    environment:
     - CLICKHOUSE_SERVERS=http://0.0.0.0:8123

When I try to send queries to port 8124 I get a 503:

*   Trying 127.0.0.1:8124...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8124 (#0)
> POST /?query=CREATE&DATABASE&IF&NO&EXISTS&test HTTP/1.1
> Host: 127.0.0.1:8124
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< Content-Type: text/plain; charset=UTF-8
< Date: Mon, 26 Jul 2021 23:10:49 GMT
< Content-Length: 0
< 
* Connection #0 to host 127.0.0.1 left intact

Both containers are running and curl http://0.0.0.0:8124/metrics | grep "^ch_" works.
What am I doing wrong?

Incorrect ParseQuery logic

ParseQuery in collector.go incorrectly handles the case where there are params both before and after the query:

if eoq >= 0 {

It can produce incorrect results or panic: runtime error: slice bounds out of range, depending on the parameter lengths, apparently because eoq is an offset within the substring that follows query=, so the slice bounds on the full string must be shifted by i.

Instead of

if eoq >= 0 {
			q = queryString[i+6 : eoq+6]
			params = queryString[:i] + queryString[eoq+7:]
}

It should be:

if eoq >= 0 {
			q = queryString[i+6 : i+eoq+6]
			params = queryString[:i] + queryString[i+eoq+7:]
}

Example of problematic string:
queryString = "a=11111111111111111111111111111&query=insert into x format fmt&a=1"

Option to periodically flush collected data from memory to disk

This option could be helpful when the service might be killed by the OOM killer, an unexpected server reboot, etc. It would prevent losing all of the data collected in memory and guarantee delivery after the service recovers.
Useful new options:

  1. Enable flush to disk
  2. How often
  3. Retention policy

Could this feature be implemented?

Doesn't work with the phpClickHouse client

When making requests with https://github.com/smi2/phpClickHouse (which is the best library at the moment), it seems that clickhouse-bulk expects the auth configuration via query parameters, but the new version of the phpClickHouse lib authorizes via headers.

Unfortunately phpClickHouse doesn't provide a way to send custom query params, so I can't work around it by adding the username/password to the query string.

Is there any solution for this? Or could you update clickhouse-bulk to accept the parameters the new way as well?

One of the official ClickHouse auth methods uses the headers X-ClickHouse-Key and X-ClickHouse-User.

Here are all the options:

  1. Using HTTP Basic Authentication. Example:
    $ echo 'SELECT 1' | curl 'http://user:password@localhost:8123/' -d @-
  2. In the ‘user’ and ‘password’ URL parameters. Example:
    $ echo 'SELECT 1' | curl 'http://localhost:8123/?user=user&password=password' -d @-
  3. Using ‘X-ClickHouse-User’ and ‘X-ClickHouse-Key’ headers. Example:
    $ echo 'SELECT 1' | curl -H 'X-ClickHouse-User: user' -H 'X-ClickHouse-Key: password' 'http://localhost:8123/' -d @-

https://clickhouse.tech/docs/en/interfaces/http/
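
Until header authentication is supported natively, a possible workaround is to put a tiny reverse proxy in front of clickhouse-bulk that copies the X-ClickHouse-User / X-ClickHouse-Key headers into the user / password query parameters described above. A rough sketch in Go; the listen address and the clickhouse-bulk address are assumptions:

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Assumed address of the clickhouse-bulk instance.
	target, err := url.Parse("http://127.0.0.1:8124")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	orig := proxy.Director
	proxy.Director = func(r *http.Request) {
		orig(r)
		q := r.URL.Query()
		// Map the ClickHouse auth headers onto the URL parameters.
		if u := r.Header.Get("X-ClickHouse-User"); u != "" {
			q.Set("user", u)
		}
		if p := r.Header.Get("X-ClickHouse-Key"); p != "" {
			q.Set("password", p)
		}
		r.URL.RawQuery = q.Encode()
	}
	// Point phpClickHouse at this address instead of at clickhouse-bulk directly.
	log.Fatal(http.ListenAndServe(":8125", proxy))
}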

High Memory Usage

I use the bulker in one of my projects via the binary. I receive logs from AWS Lambda and send them to the bulker. I noticed that memory usage increases linearly and never stabilizes. Do you have any clue what the reason could be?

Here is my config file:

  "listen": ":8124",
  "flush_count": 100000,
  "flush_interval": 5000,
  "dump_check_interval": 300,
  "debug": false,
  "dump_dir": "dumps",
  "clickhouse": {
    "down_timeout": 60,
    "connect_timeout": 10,
    "servers": [
      "http://X.X.X.X:8123",
      "http://X.X.X.X:8123",
      "http://X.X.X.X:8123",
      "http://X.X.X.X:8123"
    ]
  }
}

ERROR 502: No working clickhouse servers

{
  "listen": ":8123",
  "flush_count": 10000,
  "flush_interval": 3000,
  "debug": true,
  "dump_dir": "dumps",

  "clickhouse": {
    "down_timeout": 300,
    "servers": [
      "http://0.0.0.0:8070"
    ]
  }
}
2018/07/11 09:15:27 query query=Insert+into+Log_buffer+FORMAT+JSONEachRow&input_format_skip_unknown_fields=1 {"ts":"2018-07-11 09:15:27","level":"DEBUG","logger":"plugins.base_core","pid":19847,"procname":"wkr:1","file":"base_core.py:352","body":"Action start for '***********'","node":"US-2","jobid":"51399907","uid":"2","type":"monitor","plug":"*****"}
2018/07/11 09:15:27 Send ERROR 502: No working clickhouse servers

while a direct insert into CH works fine:

$ curl 0.0.0.0:8087
Ok.

🙏

P.S. Can't it handle the load?

Not working?

Hello. I wrote a simple test script in Go:

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "strings"
)

func main() {
    fmt.Println("Hello, playground")
    for i := 0; i < 500; i++ {
        post(fmt.Sprintf("(%d)", i))
    }
    println("done")
}

func post(b string) {
    bod := strings.NewReader(b)
    req, err := http.NewRequest("POST", "http://127.0.0.1:8124/?query=INSERT%20INTO%20t%20VALUES", bod)
    if err != nil {
        panic(err)
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    _, err = ioutil.ReadAll(resp.Body)

    if err != nil {
        panic(err)
    }
}

I have the default params in the config.
I see these records in the log:

2019/11/18 18:14:07 DEBUG: query query=INSERT%20INTO%20t%20VALUES (493)
2019/11/18 18:14:07 DEBUG: query query=INSERT%20INTO%20t%20VALUES (494)
2019/11/18 18:14:07 DEBUG: query query=INSERT%20INTO%20t%20VALUES (495)
2019/11/18 18:14:07 DEBUG: query query=INSERT%20INTO%20t%20VALUES (496)
2019/11/18 18:14:07 DEBUG: query query=INSERT%20INTO%20t%20VALUES (497)
2019/11/18 18:14:07 DEBUG: query query=INSERT%20INTO%20t%20VALUES (498)
2019/11/18 18:14:07 DEBUG: query query=INSERT%20INTO%20t%20VALUES (499)
2019/11/18 18:14:08 INFO: send 500 rows to http://u:pass@ip:8123 of INSERT INTO t VALUES

But in CH I see:
curl 'some:8123?query=SELECT%20MAX(a)%20FROM%20t'
255
It looks like the first packet was sent twice:

250
251
252
253
254
255
0
1
2
3
4
5
6

Deadlocks while trying to write dumps and uploading them to database at the same time

There are two locks (FileDumper.mu and Clickhouse.mu) that are acquired in different orders.

When dumping files to disk, Clickhouse.Dump locks Clickhouse.mu, then FileDumper.Dump is called, which locks FileDumper.mu. At the same time, FileDumper.Listen calls ProcessNextDump while holding FileDumper.mu and then calls SendQuery->GetNextQuery, which locks Clickhouse.mu.

One potential solution is to remove the lock from Clickhouse.Dump, since it already locks FileDumper.mu in FileDumper.Dump.
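
For readers unfamiliar with the failure mode, here is a generic Go sketch (not the project's actual code) of the pattern described above: two goroutines taking the same pair of mutexes in opposite order, which can block each other forever.

package main

import "sync"

var dumperMu, clickhouseMu sync.Mutex

// Path 1 (like Clickhouse.Dump -> FileDumper.Dump): clickhouseMu first, then dumperMu.
func dumpToDisk() {
	clickhouseMu.Lock()
	defer clickhouseMu.Unlock()
	dumperMu.Lock() // may wait forever if resendDumps holds dumperMu and wants clickhouseMu
	defer dumperMu.Unlock()
	// ... write the dump file ...
}

// Path 2 (like FileDumper.Listen -> SendQuery): dumperMu first, then clickhouseMu.
func resendDumps() {
	dumperMu.Lock()
	defer dumperMu.Unlock()
	clickhouseMu.Lock() // may wait forever if dumpToDisk holds clickhouseMu and wants dumperMu
	defer clickhouseMu.Unlock()
	// ... send the dump to ClickHouse ...
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); dumpToDisk() }()
	go func() { defer wg.Done(); resendDumps() }()
	wg.Wait() // with unlucky timing the two goroutines deadlock on each other's mutex
}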

No working clickhouse servers

{
  "listen": ":8124",
  "flush_count": 10000,
  "flush_interval": 1000,
  "dump_check_interval": 300,
  "debug": false,
  "dump_dir": "dumps",
  "clickhouse": {
    "down_timeout": 60,
    "connect_timeout": 10,
    "servers": [
      "http://clickhouse:[email protected]:8123"
    ]
  }
}

$ curl -s http://127.0.0.1:8124/metrics | grep "^ch_"
ch_bad_servers 0
ch_dump_count 0
ch_good_servers 0
ch_queued_dumps 0
ch_received_count 0
ch_sent_count 0

$ curl http://clickhouse:[email protected]:8123
Ok.

It seems that it does not see the servers. How can I solve this problem?

Eternal cycle of requests with errors

For example, if I send an INSERT request with the wrong date format "2020-02-06 07:15:23.364727" (Ruby Time format), this request will never be executed and keeps wasting computation power.

Could you provide a solution for this?

For example: if the exception is not about the ClickHouse server connection, remove the bad request from the sending cycle into a separate dump file with the list of bad requests.

Very strange error on insert

Periodically I get either a 'database not found' error or an auth error.

-- auto-generated definition
create table test
(
id Int32,
name String
)
engine = MergeTree PARTITION BY id
PRIMARY KEY id
ORDER BY (id, name)
SETTINGS index_granularity = 8192;

⇨ http server started on [::]:8124
2021/09/10 21:26:10.503957 DEBUG: query INSERT INTO gc.test (id, name) VALUES (7, 'xcvbx')
2021/09/10 21:26:11.506905 INFO: sending 1 rows to http://10.0.10.141:8123 of INSERT INTO gc.test (id, name) VALUES
2021/09/10 21:26:11.521327 INFO: sent 1 rows to http://10.0.10.141:8123 of INSERT INTO gc.test (id, name) VALUES
2021/09/10 21:26:16.768973 DEBUG: query INSERT INTO gc.test (id, name) VALUES (8, 'xcvbx')
2021/09/10 21:26:17.504073 INFO: sending 1 rows to http://10.0.10.142:8123 of INSERT INTO gc.test (id, name) VALUES
2021/09/10 21:26:17.517043 INFO: sent 1 rows to http://10.0.10.142:8123 of INSERT INTO gc.test (id, name) VALUES
2021/09/10 21:26:17.517161 ERROR: Send (500) Wrong server status 500:
response: Code: 516, e.displayText() = DB::Exception: chtdidx: Authentication failed: password is incorrect or there is no user with such name (version 21.2.2.8 (official build))

request: "INSERT INTO gc.test (id, name) VALUES\n(8, 'xcvbx')"; response Code: 516, e.displayText() = DB::Exception: chtdidx: Authentication failed: password is incorrect or there is no user with such name (version 21.2.2.8 (official build))

2021/09/10 21:26:19.228692 DEBUG: query INSERT INTO gc.test (id, name) VALUES (8, 'xcvbx')
2021/09/10 21:26:19.508245 INFO: sending 1 rows to http://10.0.10.143:8123 of INSERT INTO gc.test (id, name) VALUES
2021/09/10 21:26:19.522896 INFO: sent 1 rows to http://10.0.10.143:8123 of INSERT INTO gc.test (id, name) VALUES
2021/09/10 21:26:19.523012 ERROR: Send (500) Wrong server status 500:
response: Code: 516, e.displayText() = DB::Exception: chtdidx: Authentication failed: password is incorrect or there is no user with such name (version 21.2.2.8 (official build))

request: "INSERT INTO gc.test (id, name) VALUES\n(8, 'xcvbx')"; response Code: 516, e.displayText() = DB::Exception: chtdidx: Authentication failed: password is incorrect or there is no user with such name (version 21.2.2.8 (official build))

2021/09/10 21:26:22.539692 DEBUG: query INSERT INTO gc.test (id, name) VALUES (8, 'xcvbx')
2021/09/10 21:26:23.503982 INFO: sending 1 rows to http://10.0.10.141:8123 of INSERT INTO gc.test (id, name) VALUES
2021/09/10 21:26:23.531772 INFO: sent 1 rows to http://10.0.10.141:8123 of INSERT INTO gc.test (id, name) VALUES

Many files in dumps directory

Hi!

Right now, on one of the instances there are 197 files in the dumps directory; they contain seemingly failed queries and it looks like this:

-rw-r--r--    1 app     app          5130 Nov 26 13:21 dump202311241214102-98-500.dmp
-rw-r--r--    1 app     app         12358 Nov 26 13:21 dump202311241214102-99-500.dmp
...
app@clickhouse-bulk-576cb9c658-h2dvx $ ls -1 /app/dumps | wc -l
197

What does clickhouse-bulk do with all these files, and should it remove them after resending?

Data is not sent with multiple ClickHouse servers

We have the following configuration:

{
  "listen": ":8124",
  "flush_count": 30000,
  "flush_interval": 1000,
  "debug": false,
  "dump_dir": "dumps",
  "clickhouse": {
    "down_timeout": 300,
    "servers": [
      "http://127.0.0.1:8123",
      "http://172.16.10.78:8123"
    ]
  }
}

ConnectTimeout option does not work properly

Hi,

The NewClickhouse method implements the wrong behavior for the ConnectTimeout option:

c.ConnectTimeout = connectTimeout
if c.ConnectTimeout > 0 {
    c.ConnectTimeout = 10
}

So if I set any positive value for ConnectTimeout, it will not be used but will be rewritten to 10 seconds.
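
Presumably the check is simply inverted; a minimal sketch of what the condition likely should be (a hypothetical helper, not the project's actual code):

// effectiveConnectTimeout returns the configured timeout in seconds,
// falling back to a 10-second default only when no positive value is given.
func effectiveConnectTimeout(connectTimeout int) int {
	if connectTimeout <= 0 {
		return 10
	}
	return connectTimeout
}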

ERROR: Send (503) No working clickhouse servers; response

Under load, this log periodically appears:

clickhouse-bulk_1 | 2021/03/05 11:03:10.847398 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.847752 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.847858 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.847954 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.848023 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.848081 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.848224 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.848425 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.848488 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.848615 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.848839 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.848950 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.849238 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.849771 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.850151 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.850361 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.850426 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.850513 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.850565 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:03:10.850753 ERROR: Send (503) No working clickhouse servers; response
clickhouse-bulk_1 | 2021/03/05 11:04:32.243796 INFO: sending 26 rows to http://default:root@11111111:8123 of INSERT INTO lkdn_profiles.employees (
clickhouse-bulk_1 | 2021/03/05 11:04:42.244128 ERROR: server down (502): Post http://default:***@11111111:8123: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
clickhouse-bulk_1 | 2021/03/05 11:04:42.244156 INFO: sending 26 rows to http://default:root@11111111:8123 of INSERT INTO lkdn_profiles.employees (
clickhouse-bulk_1 | 2021/03/05 11:04:52.244517 ERROR: server down (502): Post http://default:***@11111111:8123: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
clickhouse-bulk_1 | 2021/03/05 11:04:52.244552 INFO: sending 26 rows to http://default:root@11111111:8123 of INSERT INTO lkdn_profiles.employees (
clickhouse-bulk_1 | 2021/03/05 11:05:02.244919 ERROR: server down (502): Post http://default:***@11111111:8123: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
clickhouse-bulk_1 | 2021/03/05 11:05:02.244950 INFO: sending 26 rows to http://default:root@11111111:8123 of INSERT INTO lkdn_profiles.employees (
clickhouse-bulk_1 | 2021/03/05 11:05:12.245236 ERROR: server down (502): Post http://default:***@11111111:8123: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
clickhouse-bulk_1 | 2021/03/05 11:05:12.245261 INFO: sending 26 rows to http://default:root@11111111:8123 of INSERT INTO lkdn_profiles.employees (
clickhouse-bulk_1 | 2021/03/05 11:05:22.245596 ERROR: server down (502): Post http://default:***@11111111:8123: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
clickhouse-bulk_1 | 2021/03/05 11:05:22.245626 ERROR: server error (503) No working clickhouse servers

but at the same time the ClickHouse server itself is alive:
echo 'SELECT 1' | curl 'http://default:root@1111:8123/' --data-binary @-
1

Bulk inserting is not working

Hi, first of all, thank you for making clickhouse-bulk 💐

I am running with this config:

{
  "listen": ":8125",
  "flush_count": 10000,
  "flush_interval": 3000,
  "debug": true,
  "dump_dir": "dumps",
  "clickhouse": {
    "down_timeout": 300,
    "servers": [
      "http://127.0.0.1:8123"
    ]
  }
}

Shouldn't this config collect incoming requests and insert them in bulk every 3 seconds? I am watching the logs and see every insert I send via HTTP (I insert with Python's requests) being processed immediately as I send it. What am I doing wrong?

connect.Ping() results in bad connection error

One common practice after creating the connection is to check for a ping/pong from the server:

if err := connect.Ping(); err != nil {
	logger.Fatal(err)
	return nil, err
}

This method works as intended when connecting to a ClickHouse server via either HTTP or TCP.

When I connect to clickhouse-bulk over HTTP instead, I receive a bad connection error from the driver.
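
Since clickhouse-bulk is an insert collector rather than a full ClickHouse HTTP implementation, the driver's Ping (which runs a lightweight query or handshake against the server) may not be answered the way the driver expects. A hedged alternative for a liveness check is to hit the collector's documented /metrics endpoint directly; a sketch, assuming the default address:

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Assumed clickhouse-bulk address; the /metrics endpoint is documented in the README.
	resp, err := http.Get("http://127.0.0.1:8124/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("clickhouse-bulk is up, status:", resp.Status)
}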

'FORMAT' in string field breaks the insert query

For example, this query:

INSERT INTO test (date, args) VALUES ('2019-06-13', 'query=select%20args%20from%20test%20group%20by%20date%20FORMAT%20JSON')

or this

INSERT INTO test (date, args) VALUES ('2019-06-13', 'query=select%2520args%2520from%2520test%2520group%2520by%2520date%2520FORMAT%2520JSON')

generates an error:

2019/06/13 12:49:46 Send ERROR 500: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected ( before: \'NDA\', ...: (at row 2)

Request Timeout when sending more than X rows at once

Hello, Nikolay!

I get a Request Timeout when sending more than X (about 1000 in my case) rows at once.
I batch requests on the client side (Node.js) and send the batch every 2500 ms. Everything goes well until my batch size reaches about 1000-1100 rows. I'm using the official ClickHouse client for Node.js.

The bulk instance is deployed on another host reachable via a 1 Gbit network.

Do you have any idea why and how this could happen?

App stops sending data to ClickHouse after receiving an error from it

The last lines in log file:

2023/04/21 08:29:37.756987 ERROR: server down (502): Post "CH_URL": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/04/21 08:29:37.756995 ERROR: Send (503) No working clickhouse servers; response

After that, data isn't sent to the CH server and the pod's RAM usage increases over time.

I've checked the app status and it looks OK:

/app # curl -s http://127.0.0.1:8124/metrics | grep "^ch_"
ch_bad_servers 0
ch_dump_count 14763
ch_good_servers 1
ch_queued_dumps 14743
ch_received_count 5.4660103e+07
ch_sent_count 2.154301e+06

Respect database specifier in connection string

So I have two databases, let's call them default and newdb, containing identical tables. If I connect to ClickHouse directly using http://username:password@localhost:8123/default or http://username:password@localhost:8123/newdb, I am able to submit queries to the correct database.

However, with clickhouse-bulk and the exact same connection strings as above, inserts to both databases are aggregated into the same DB.

I can run (and am now running) a copy of clickhouse-bulk per database, but this seems sub-optimal. At an absolute minimum, clickhouse-bulk should reject queries sent to /newdb if it is only going to insert into a single DB.
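
As a possible workaround until the path is respected, the README lists database among the supported query parameters, so a client could pass the target database explicitly on each request instead of in the URL path. A sketch with hypothetical table and database names:

package main

import (
	"log"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	// Hypothetical example: direct the insert at newdb via the database query parameter.
	q := url.Values{}
	q.Set("database", "newdb")
	q.Set("query", "INSERT INTO mytable (a) VALUES")
	resp, err := http.Post(
		"http://127.0.0.1:8124/?"+q.Encode(),
		"text/plain",
		strings.NewReader("(1)"),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}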

Q: is bulk a remedy for max_connections_count

Hi there,
Could you kindly confirm my initial thoughts about using your tool as a savior for my system?

I have a scenario where small inserts of data are posted to ClickHouse (e.g. GPS updates from a number of mobile devices).
ClickHouse often returns an HTTP 500 error because the max connection count is reached or because of a timeout.
There are some materialized views that are calculated on insert, which might slow things down.
I changed the default value from 100 to 500, but it doesn't seem to help; more queries just end up waiting.

I thought that using your tool could improve the situation, since bulk inserts are advised for the performance boost.
Another option I am considering is the use of Buffer tables.

Thanks!

clickhouse-bulk stops resending old dumps after a while

We have noticed that we have some old dumps with a 502 error. They are not resent. When the service is restarted, the dumps are resent after 5 minutes. It looks like there is a bug with d.LockedFiles and these dumps stay locked.
