archan937 / clickhouse Goto Github PK
View Code? Open in Web Editor NEWA Ruby database driver for Clickhouse
Home Page: https://rubygems.org/gems/clickhouse
License: MIT License
A Ruby database driver for Clickhouse
Home Page: https://rubygems.org/gems/clickhouse
License: MIT License
Is there a way to establish a connection to Clickhouse having your username & password set? I tried both
http://username:mypass@myhost:8123
and
http://myhost:8123?user=username&password=mypass
and neither works.
It seems like there's no way to pass config to underlying Faraday adapters.
Right now we have to do this ugly hack:
module Clickhouse
class Connection
module Client
def client
@client ||= Faraday.new(url: url) do |f|
f.adapter :net_http do |http|
http.read_timeout = 300
end
end
end
end
end
end
Is there anyway to pass this config to the client? Or perhaps we should make 300 seconds the default timeout, since that's the default in Clickhouse server as well?
If you try to insert_rows with string value like:
''The String'' with continuation
(there is 2 single quotes there)
Default Ruby CSV gem encode this string without " quotes, because there isn't any characters required to be encoded. But Clickhouse consider this 2 single quotes as field quotation mark and trigger an error: "comma required before with continuation
"
Now there is no way to change CSV generation mode without patching the gem. (Adding :force_quote=>true
solves the problem). So the problem is quite unpleasant. Probable you should add CSV options to insert_rows() arguments.
session_id parameter for the connection can be used to setup a session with Clickhouse
One application is using TEMPORARY TABLE
If I have a table with two columns, “foo” and “bar”, I cannot use insert_rows
to insert a record with only “foo”, because insert_rows
doesn't actually use the “rows” parameter. It should be something like
INSERT INTO #{table} (#{rows.join(",")}) ...
Long SQL queries cause clickhouse to respond 400 on a valid query
I finally had to switch to POST request in this way:
def query_long part1, part2
data = JSON.parse Clickhouse.connection.execute(part1, part2 + " FORMAT JSONCompact")
names = data["meta"].collect{|column| column["name"]}
types = data["meta"].collect{|column| column["type"]}
Clickhouse::Connection::Query::ResultSet.new data["data"], names, types
end
This would be cool if GEM will make this decision automatically
When performing heavy Clickhouse request it may take too much time to finish. Sometimes it may even exceed HTTPClient
connection timeout. Here is what happens in details:
Clickhouse.establish_connection(urls: ['host1.lvh.me:8123', 'host2.lvh.me:8123'])
Clickhouse::Connection::Client#request
receives Faraday::TimeoutError
exception.Clickhouse::Cluster#method_missing
retries the same request on another Clickhouse host from pond
's pool with the same result. If servers are under really heavy load and request lasts long enough, we will run out of servers in the pool and... come to the first one again which is still working over the heavy query!As a result each Clickhouse server in cluster runs the same query over and over again increasing the load.
This bloat ends by reaching maximum number of simultaneous queries, when Clickhouse refuses to take another query.
Clickhouse.connection.create_table("events") do |t|
t.fixed_string :id, 16
t.array :array_data, 'String'
t.date :date
t.engine "MergeTree(date, (date), 8192)"
end
Clickhouse.connection.insert_rows("events", :names => %w(id array_data date)) do |rows|
rows << ["d91d1c90", ["1", "2"], "2016-10-17"]
end
Clickhouse::QueryError: Code: 26, e.displayText() = DB::Exception: Cannot parse quoted string: expected opening quote:
Row 1:
Column 0, name: id, type: FixedString(16), parsed text: "d91d1c90"
Column 1, name: array_data, type: Array(String), parsed text: <EMPTY>ERROR
It happens because CSV generate array with double quotes, and Clickhouse cant understand this format
\"[\"1\", \"2\"]\"
Worked example with single quotes:
Clickhouse.connection.execute("INSERT INTO events FORMAT CSV \"d91d1c90\",\"['1','2']\",\"2016-10-17\"")
While in clickhouse console:
SELECT count()
FROM tracks
WHERE date = '2019-07-09'
┌─count()─┐
│ 4162001 │
└─────────┘
And Rails log shows
DEBUG -- : SELECT count() FROM tracks WHERE date = '2019-07-09'
DEBUG -- : [{"count()"=>0}]
The gem seems to be abandoned, but I'll drop this here anyway as a warning.
How about ActiveRecord binding?
NoMethodError: undefined method `parse' for Time:Class
It looks like we should require it explicitly.
Does this gem support defining default value as what is descried in clickhouse documentation metioned bellow?
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db.]name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = engine
I saw Clickhouse::Connection::Query::Table#to_sql and think it is not supported.
Recently Yandex added support for Nullable types. But this gem does not support them.
Nullable fields are not documented yet. Some info can be found here
I forked gem and fixed this issue:
https://github.com/workgena/clickhouse/pull/1/files
https://github.com/workgena/clickhouse/pull/2/files
I'll try to write some test on this and create PR later.
When I run this command Clickhouse.connection.databases
it gives an error
Clickhouse::QueryError: Got status 401 (expected 200): Code: 194, e.displayText() = DB::Exception: Password required for user default (version 19.17.6.36 (official build))
how can I give password?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.