Giter Club home page Giter Club logo

Comments (8)

ruslandoga avatar ruslandoga commented on June 24, 2024 2

Seems to be more of a ClickHouse bug, I'll try opening an issue.

from analytics.

luizkowalski avatar luizkowalski commented on June 24, 2024 1

sure, but how do I do that? Plausible's app console? I'm not very familiar with all this, it is my first time playing around with self-hosting

nevermind, got it

ec50295d4672 :) show create plausible_events.events_v2;

SHOW CREATE TABLE plausible_events.events_v2

Query id: 9a41190e-bdeb-4d87-9d59-67ba4d8af518

β”Œβ”€statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
β”‚ CREATE TABLE plausible_events.events_v2
(
    `timestamp` DateTime CODEC(Delta(4), LZ4),
    `name` LowCardinality(String),
    `site_id` UInt64,
    `user_id` UInt64,
    `session_id` UInt64,
    `hostname` String,
    `pathname` String CODEC(ZSTD(3)),
    `referrer` String,
    `referrer_source` String,
    `country_code` FixedString(2),
    `screen_size` LowCardinality(String),
    `operating_system` LowCardinality(String),
    `browser` LowCardinality(String),
    `utm_medium` String,
    `utm_source` String,
    `utm_campaign` String,
    `meta.key` Array(String),
    `meta.value` Array(String),
    `browser_version` LowCardinality(String),
    `operating_system_version` LowCardinality(String),
    `subdivision1_code` LowCardinality(String),
    `subdivision2_code` LowCardinality(String),
    `city_geoname_id` UInt32,
    `utm_content` String,
    `utm_term` String,
    `transferred_from` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
PRIMARY KEY (site_id, toDate(timestamp), name, user_id)
ORDER BY (site_id, toDate(timestamp), name, user_id, timestamp)
SAMPLE BY user_id
SETTINGS index_granularity = 8192 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

1 row in set. Elapsed: 0.002 sec.
ec50295d4672 :) show create plausible_events.sessions_v2;

SHOW CREATE TABLE plausible_events.sessions_v2

Query id: 0b562490-b1eb-4575-896e-cd6f578a96ac

β”Œβ”€statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
β”‚ CREATE TABLE plausible_events.sessions_v2
(
    `session_id` UInt64,
    `sign` Int8,
    `site_id` UInt64,
    `user_id` UInt64,
    `hostname` String,
    `timestamp` DateTime CODEC(DoubleDelta, LZ4),
    `start` DateTime CODEC(DoubleDelta, LZ4),
    `is_bounce` UInt8,
    `entry_page` String,
    `exit_page` String,
    `pageviews` Int32,
    `events` Int32,
    `duration` UInt32,
    `referrer` String,
    `referrer_source` String,
    `country_code` LowCardinality(FixedString(2)),
    `screen_size` LowCardinality(String),
    `operating_system` LowCardinality(String),
    `browser` LowCardinality(String),
    `utm_medium` String,
    `utm_source` String,
    `utm_campaign` String,
    `browser_version` LowCardinality(String),
    `operating_system_version` LowCardinality(String),
    `subdivision1_code` LowCardinality(String),
    `subdivision2_code` LowCardinality(String),
    `city_geoname_id` UInt32,
    `utm_content` String,
    `utm_term` String,
    `transferred_from` String,
    `entry_meta.key` Array(String),
    `entry_meta.value` Array(String)
)
ENGINE = CollapsingMergeTree(sign)
PARTITION BY toYYYYMM(start)
PRIMARY KEY (site_id, toDate(start), user_id, session_id)
ORDER BY (site_id, toDate(start), user_id, session_id)
SAMPLE BY user_id
SETTINGS index_granularity = 8192 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

1 row in set. Elapsed: 0.003 sec.

from analytics.

luizkowalski avatar luizkowalski commented on June 24, 2024 1

Thanks a lot, @ruslandoga! I will subscribed to that issue

from analytics.

dustinmoris avatar dustinmoris commented on June 24, 2024 1

FYI, I tried the workaround in the linked ClickHouse issue and it worked for me by setting query_plan_optimize_prewhere to 0

from analytics.

ruslandoga avatar ruslandoga commented on June 24, 2024

πŸ‘‹ @luizkowalski

Can you please run show create on events_v2 and sessions_v2 tables? I'll try out ClickHouse 24.2 on my instance later today.

from analytics.

ruslandoga avatar ruslandoga commented on June 24, 2024

Thank you! I've been able to reproduce it but I'm not sure what's the cause yet. _sample_factor is still available in ClickHouse 24.2:

7eeb32dddd7a :) select sum(s.pageviews * s._sample_factor) from sessions_v2 s sample 100;

-- β”Œβ”€sum(multiply(pageviews, _sample_factor))─┐
-- β”‚                       390594.55999999976 β”‚
-- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

-- 1 row in set. Elapsed: 0.006 sec. Processed 20.73 thousand rows, 248.77 KB (3.22 million rows/s., 38.60 MB/s.)
-- Peak memory usage: 0.00 B.

7eeb32dddd7a :) select version();

-- β”Œβ”€version()───┐
-- β”‚ 24.2.1.2248 β”‚
-- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

If the fix would involve changing the queries, it'd be unlikely to be merged until Plausible Cloud switches to the version requiring it.

from analytics.

ruslandoga avatar ruslandoga commented on June 24, 2024

Hm, this query seems to work in clickhouse client

SELECT
    toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin')),
    toUInt64(round(uniq(e0.user_id) * any(_sample_factor)))
FROM events_v2 AS e0
SAMPLE 20000000
GROUP BY toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin'))
ORDER BY toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin')) ASC

-- β”Œβ”€toStartOfHour(toTimeZone(timestamp, 'Europe/Berlin'))─┬─toUInt64(round(multiply(uniq(user_id), any(_sample_factor))))─┐
-- β”‚                                   2022-08-10 17:00:00 β”‚                                                             1 β”‚
-- β”‚                                   2022-08-10 18:00:00 β”‚                                                             1 β”‚
-- β”‚                                   2022-08-11 11:00:00 β”‚                                                             1 β”‚
-- β”‚                                   2022-08-11 12:00:00 β”‚                                                             1 β”‚
-- etc.

-- 981 rows in set. Elapsed: 0.039 sec. Processed 599.36 thousand rows, 7.19 MB (15.34 million rows/s., 184.13 MB/s.)

but adding a WHERE clause makes it fail

SELECT
    toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin')),
    toUInt64(round(uniq(e0.user_id) * any(_sample_factor)))
FROM events_v2 AS e0
SAMPLE 20000000
WHERE (e0.site_id = 2) AND ((e0.timestamp >= '2024-03-01 23:00:00') AND (e0.timestamp < '2024-03-02 23:00:00'))
GROUP BY toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin'))
ORDER BY toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin')) ASC

-- Received exception from server (version 24.2.1):
-- Code: 10. DB::Exception: Received from localhost:9000. DB::Exception: Not found column _sample_factor:  in block timestamp DateTime UInt32(size = 0), site_id UInt64 UInt64(size = 0), user_id UInt64 UInt64(size = 0). (NOT_FOUND_COLUMN_IN_BLOCK)

from analytics.

ruslandoga avatar ruslandoga commented on June 24, 2024

ClickHouse/ClickHouse#60685

I'll re-open this issue if it turns out to not be a bug.

from analytics.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.