Comments (8)
Seems to be more of a ClickHouse bug, I'll try opening an issue.
from analytics.
sure, but how do I do that? Plausible's app console? I'm not very familiar with all this, it is my first time playing around with self-hosting
nevermind, got it
ec50295d4672 :) show create plausible_events.events_v2;
SHOW CREATE TABLE plausible_events.events_v2
Query id: 9a41190e-bdeb-4d87-9d59-67ba4d8af518
ββstatementβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β CREATE TABLE plausible_events.events_v2
(
`timestamp` DateTime CODEC(Delta(4), LZ4),
`name` LowCardinality(String),
`site_id` UInt64,
`user_id` UInt64,
`session_id` UInt64,
`hostname` String,
`pathname` String CODEC(ZSTD(3)),
`referrer` String,
`referrer_source` String,
`country_code` FixedString(2),
`screen_size` LowCardinality(String),
`operating_system` LowCardinality(String),
`browser` LowCardinality(String),
`utm_medium` String,
`utm_source` String,
`utm_campaign` String,
`meta.key` Array(String),
`meta.value` Array(String),
`browser_version` LowCardinality(String),
`operating_system_version` LowCardinality(String),
`subdivision1_code` LowCardinality(String),
`subdivision2_code` LowCardinality(String),
`city_geoname_id` UInt32,
`utm_content` String,
`utm_term` String,
`transferred_from` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
PRIMARY KEY (site_id, toDate(timestamp), name, user_id)
ORDER BY (site_id, toDate(timestamp), name, user_id, timestamp)
SAMPLE BY user_id
SETTINGS index_granularity = 8192 β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 row in set. Elapsed: 0.002 sec.
ec50295d4672 :) show create plausible_events.sessions_v2;
SHOW CREATE TABLE plausible_events.sessions_v2
Query id: 0b562490-b1eb-4575-896e-cd6f578a96ac
ββstatementβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β CREATE TABLE plausible_events.sessions_v2
(
`session_id` UInt64,
`sign` Int8,
`site_id` UInt64,
`user_id` UInt64,
`hostname` String,
`timestamp` DateTime CODEC(DoubleDelta, LZ4),
`start` DateTime CODEC(DoubleDelta, LZ4),
`is_bounce` UInt8,
`entry_page` String,
`exit_page` String,
`pageviews` Int32,
`events` Int32,
`duration` UInt32,
`referrer` String,
`referrer_source` String,
`country_code` LowCardinality(FixedString(2)),
`screen_size` LowCardinality(String),
`operating_system` LowCardinality(String),
`browser` LowCardinality(String),
`utm_medium` String,
`utm_source` String,
`utm_campaign` String,
`browser_version` LowCardinality(String),
`operating_system_version` LowCardinality(String),
`subdivision1_code` LowCardinality(String),
`subdivision2_code` LowCardinality(String),
`city_geoname_id` UInt32,
`utm_content` String,
`utm_term` String,
`transferred_from` String,
`entry_meta.key` Array(String),
`entry_meta.value` Array(String)
)
ENGINE = CollapsingMergeTree(sign)
PARTITION BY toYYYYMM(start)
PRIMARY KEY (site_id, toDate(start), user_id, session_id)
ORDER BY (site_id, toDate(start), user_id, session_id)
SAMPLE BY user_id
SETTINGS index_granularity = 8192 β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 row in set. Elapsed: 0.003 sec.
from analytics.
Thanks a lot, @ruslandoga! I will subscribed to that issue
from analytics.
FYI, I tried the workaround in the linked ClickHouse issue and it worked for me by setting query_plan_optimize_prewhere
to 0
from analytics.
π @luizkowalski
Can you please run show create on events_v2 and sessions_v2 tables? I'll try out ClickHouse 24.2 on my instance later today.
from analytics.
Thank you! I've been able to reproduce it but I'm not sure what's the cause yet. _sample_factor
is still available in ClickHouse 24.2:
7eeb32dddd7a :) select sum(s.pageviews * s._sample_factor) from sessions_v2 s sample 100;
-- ββsum(multiply(pageviews, _sample_factor))ββ
-- β 390594.55999999976 β
-- ββββββββββββββββββββββββββββββββββββββββββββ
-- 1 row in set. Elapsed: 0.006 sec. Processed 20.73 thousand rows, 248.77 KB (3.22 million rows/s., 38.60 MB/s.)
-- Peak memory usage: 0.00 B.
7eeb32dddd7a :) select version();
-- ββversion()ββββ
-- β 24.2.1.2248 β
-- βββββββββββββββ
If the fix would involve changing the queries, it'd be unlikely to be merged until Plausible Cloud switches to the version requiring it.
from analytics.
Hm, this query seems to work in clickhouse client
SELECT
toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin')),
toUInt64(round(uniq(e0.user_id) * any(_sample_factor)))
FROM events_v2 AS e0
SAMPLE 20000000
GROUP BY toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin'))
ORDER BY toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin')) ASC
-- ββtoStartOfHour(toTimeZone(timestamp, 'Europe/Berlin'))ββ¬βtoUInt64(round(multiply(uniq(user_id), any(_sample_factor))))ββ
-- β 2022-08-10 17:00:00 β 1 β
-- β 2022-08-10 18:00:00 β 1 β
-- β 2022-08-11 11:00:00 β 1 β
-- β 2022-08-11 12:00:00 β 1 β
-- etc.
-- 981 rows in set. Elapsed: 0.039 sec. Processed 599.36 thousand rows, 7.19 MB (15.34 million rows/s., 184.13 MB/s.)
but adding a WHERE
clause makes it fail
SELECT
toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin')),
toUInt64(round(uniq(e0.user_id) * any(_sample_factor)))
FROM events_v2 AS e0
SAMPLE 20000000
WHERE (e0.site_id = 2) AND ((e0.timestamp >= '2024-03-01 23:00:00') AND (e0.timestamp < '2024-03-02 23:00:00'))
GROUP BY toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin'))
ORDER BY toStartOfHour(toTimeZone(e0.timestamp, 'Europe/Berlin')) ASC
-- Received exception from server (version 24.2.1):
-- Code: 10. DB::Exception: Received from localhost:9000. DB::Exception: Not found column _sample_factor: in block timestamp DateTime UInt32(size = 0), site_id UInt64 UInt64(size = 0), user_id UInt64 UInt64(size = 0). (NOT_FOUND_COLUMN_IN_BLOCK)
from analytics.
I'll re-open this issue if it turns out to not be a bug.
from analytics.
Related Issues (20)
- Check to determine if Clickhouse tables should be clustered is catastrophically flawed HOT 1
- It should be possible to bring up a Plausible self-hosted instance on a new, clean clustered Clickhouse HOT 2
- Proxied events being dropped? HOT 3
- cookie named "_plausible_key" exceeds maximum size of 4096 bytes HOT 7
- Browsing site with VPN gets events dropped HOT 7
- Logging plausible event results in CORS error HOT 6
- Docker self-host plausible/analytics:v2.0 doesn't track referer HOT 2
- Websocket connection failed on community edition HOT 13
- Issue when using SMTP mail (Bamboo SMTPAdapter) HOT 1
- Accept email for viewers link to settings which the viewer do not have access to HOT 4
- Pipe | is seen as "OR" when click for filtering
- /api/v1/sites/goals PUT endpoint results in 404 on Plausible v2.1.0-rc.0 HOT 3
- SVG map in shared link is not correctly aligned HOT 2
- Script is no longer <1kb HOT 2
- Cloudflare Workers not work with Countries HOT 23
- `file-types` causes an error in W3C Markup Validation Service
- Plausible logo in emails not visible HOT 1
- Mini Chart on Dashboard does not reset if the data has been resetted HOT 3
- Cannot log on to dashboard if using VPN HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from analytics.