Comments (30)

adubovikov commented on September 26, 2024 2

@Hubbitus d12a4ae

Done. Please test it.

from promcasa.

adubovikov commented on September 26, 2024 1

Probably yes, let me do it.

adubovikov commented on September 26, 2024 1

OK, it will require a few more changes. V2 doesn't support custom log output. I will check it tomorrow.

lmangani commented on September 26, 2024 1

Does this need documentation?

adubovikov commented on September 26, 2024 1

You are welcome. Does it work now with units in the DSN?

adubovikov commented on September 26, 2024 1

Either alias the count column so that it matches "counter_name":

"counter_name": "counter",

"query": "SELECT 'cluster' as cluster, 'region' as region, COUNT(*) as counter FROM system.clusters",

or change "counter_name" to "amount".
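
Why the alias matters: the gauge value appears to be read from the result column named by "counter_name", so a query that aliases the count as amount while "counter_name" is "counter" leaves nothing to read, which would be consistent with the value 0 reported elsewhere in this thread. A minimal Go sketch of such a lookup, assuming a result row is held as a map from column name to value. The readCounter helper and the map layout are hypothetical and are not promcasa's actual code:

```go
package main

import "fmt"

// readCounter returns the numeric value stored under counterName in a result
// row. The row-as-map layout and this helper are assumptions made for
// illustration; they are not taken from promcasa.
func readCounter(row map[string]interface{}, counterName string) (float64, bool) {
	v, ok := row[counterName]
	if !ok {
		return 0, false // no column with that name: nothing to export
	}
	switch n := v.(type) {
	case float64:
		return n, true
	case uint64:
		return float64(n), true
	case int64:
		return float64(n), true
	default:
		return 0, false // non-numeric column cannot be a counter value
	}
}

func main() {
	// A row shaped like the result of "... COUNT(*) as amount ...".
	row := map[string]interface{}{
		"cluster": "cluster",
		"region":  "region",
		"amount":  uint64(26),
	}

	v, ok := readCounter(row, "counter") // counter_name does not match the alias
	fmt.Println(v, ok)

	v, ok = readCounter(row, "amount") // counter_name matches the alias
	fmt.Println(v, ok)
}
```

In this sketch only the second lookup finds a value, because only then does the configured name match a column that the query actually returns.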

lmangani commented on September 26, 2024 1

Something is wrong with your query. You should not use the value as a label, or your metric cardinality will easily explode.
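
To see why: in Prometheus, every distinct combination of label values is a separate time series. The following minimal Go sketch counts the series produced when the measured value is used as a label versus when a stable label is used. The seriesKey and countSeries helpers and the table name db.events are hypothetical; the readings are like the ones reported elsewhere in this thread:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds an identifier of the form name{k="v",...}, mimicking how a
// metric name plus its label set identifies one time series. Hypothetical
// helper for illustration only.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

// countSeries returns how many distinct series a sequence of samples creates.
func countSeries(name string, samples []map[string]string) int {
	seen := make(map[string]struct{})
	for _, labels := range samples {
		seen[seriesKey(name, labels)] = struct{}{}
	}
	return len(seen)
}

func main() {
	values := []float64{2, 3, 6, 7, 15} // successive readings of the same count

	var valueAsLabel, stableLabel []map[string]string
	for _, v := range values {
		// Anti-pattern: the reading itself becomes a label value.
		valueAsLabel = append(valueAsLabel, map[string]string{"amount": fmt.Sprintf("%f", v)})
		// Stable label: the reading is only the sample value, not a label.
		stableLabel = append(stableLabel, map[string]string{"table_name": "db.events"})
	}

	fmt.Println("value used as label:", countSeries("rows_count", valueAsLabel))
	fmt.Println("stable label:", countSeries("rows_count", stableLabel))
}
```

With the value as a label, each new reading adds a series; with a stable label, the same series is simply updated.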

lmangani commented on September 26, 2024 1

@aitudorm the config seems generally correct at first sight but you should confirm by looking at the metrics. Does it produce the intended results? And sure, you can generate multiple metrics by adding multiple query blocks in the JSON array.
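
A minimal sketch of what "multiple query blocks in the JSON array" could look like, checked by parsing it in Go. The field names are the ones used in the configs in this thread; the second metric (parts_count and its query) and the Go structs are hypothetical and only illustrate the shape:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metric mirrors the fields seen in the database_metrics entries in this
// thread. The struct itself is an illustration, not promcasa's own type.
type metric struct {
	Name        string   `json:"name"`
	Help        string   `json:"help"`
	Query       string   `json:"query"`
	Labels      []string `json:"labels"`
	CounterName string   `json:"counter_name"`
	Refresh     string   `json:"refresh"`
	Type        string   `json:"type"`
}

type config struct {
	DatabaseMetrics []metric `json:"database_metrics"`
}

// Two entries in one array, separated by a comma. The second entry is made up.
const sample = `{
  "database_metrics": [
    {
      "name": "rows_count",
      "help": "Amount of writes",
      "query": "select database || '.' || table as table_name, sum(rows) as qnty from system.parts where active group by table_name",
      "labels": ["table_name"],
      "counter_name": "qnty",
      "refresh": "10s",
      "type": "gauge"
    },
    {
      "name": "parts_count",
      "help": "Number of active parts",
      "query": "select database || '.' || table as table_name, count() as parts from system.parts where active group by table_name",
      "labels": ["table_name"],
      "counter_name": "parts",
      "refresh": "60s",
      "type": "gauge"
    }
  ]
}`

// parseConfig decodes the JSON text into the illustrative config struct.
func parseConfig(raw string) (config, error) {
	var c config
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	c, err := parseConfig(sample)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	for _, m := range c.DatabaseMetrics {
		fmt.Printf("%s reads column %q every %s\n", m.Name, m.CounterName, m.Refresh)
	}
}
```

Note that in each entry the "counter_name" matches the alias of the numeric column in its query, and the labels list names only non-numeric columns.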

Hubbitus commented on September 26, 2024

@adubovikov, awesome! Thank you.

But it looks like that is not enough; I get a "the node is offline:" error:

{"level":"info","msg":"init logging system","time":"2023-06-17T00:50:26+03:00"}
{"level":"info","msg":"Connecting to Proto: [http] Host: [10.220.1.10], User:[biuser], Name:[datamart], Node:[local], Port:[8123], Timeout: [30s, 30s]\n","time":"2023-06-17T00:50:26+03:00"}
....
{"level":"debug","msg":"Starting config database check","time":"2023-06-17T00:50:33+03:00"}
{"level":"debug","msg":"RunWatcherConfigDatabaseStats: CHECK DataDB: datamart","time":"2023-06-17T00:50:33+03:00"}
[clickhouse]host(s)=10.220.1.10:8123, database=datamart, username=biuser
{"level":"debug","msg":"Starting queries check","time":"2023-06-17T00:50:33+03:00"}
{"level":"error","msg":"the node is offline:","time":"2023-06-17T00:50:34+03:00"}
[clickhouse][dial] secure=false, skip_verify=false, strategy=random, ident=1, server=0 -> 10.220.1.10:8123

The relevant part of my configuration looks like:

{
  "database_data": [
    {
      "host": "10.220.1.10",
      "port": 8123,
      "help": "Settings for Clickhouse Database (data)",
      "user": "biuser",
      "pass": "******",
      "name": "datamart",
      "proto": "http",
      "primary": true,
      "debug": true

Maybe it is worth updating the clickhouse-go driver to version 2.10.1?

Hubbitus commented on September 26, 2024

What is also bad: I get only "the node is offline:" without any details, trace, or other useful information to understand what is going wrong.

And that is even with "debug": true in the config file.

adubovikov commented on September 26, 2024

b0b1806

lmangani commented on September 26, 2024

New release ready: https://github.com/metrico/promcasa/releases/tag/v20230617150137

Hubbitus commented on September 26, 2024

Thank you very much!

Documentation would indeed be very useful.

E.g., what does the option database_data[*].node mean? I have not found an equivalent in JDBC.

With the promcasa binary from the release I get the same error:

{"level":"debug","msg":"ping db data clickhouse [dsn parse]:read timeout: time: missing unit in duration \"30\"","time":"2023-06-18T00:21:11+03:00"}
{"level":"debug","msg":"Starting queries check","time":"2023-06-18T00:21:11+03:00"}
{"level":"error","msg":"the node is offline:","time":"2023-06-18T00:21:11+03:00"}
{"level":"debug","msg":"Starting queries check","time":"2023-06-18T00:21:21+03:00"}
{"level":"error","msg":"the node is offline:","time":"2023-06-18T00:21:21+03:00"}
{"level":"debug","msg":"Starting queries check","time":"2023-06-18T00:21:31+03:00"}
{"level":"error","msg":"the node is offline:","time":"2023-06-18T00:21:31+03:00"}

I also tried to build it locally, but got an error there too:

$ make
#CGO_ENABLED=0 GOOS=linux go build -ldflags "-s -w" -o promcasa
CGO_ENABLED=1 GOOS=linux CGO_LDFLAGS="-lm -ldl" go build -a -ldflags '-extldflags "-static"' -tags netgo -installsuffix netgo -o promcasa
go: github.com/cortexproject/[email protected] requires
        github.com/thanos-io/[email protected] requires
        github.com/grpc-ecosystem/go-grpc-middleware/providers/kit/[email protected] requires
        github.com/grpc-ecosystem/go-grpc-middleware/[email protected]: invalid version: unknown revision 9a95f0fdbfea
go: github.com/cortexproject/[email protected] requires
        github.com/thanos-io/[email protected] requires
        github.com/grpc-ecosystem/go-grpc-middleware/providers/kit/[email protected] requires
        github.com/grpc-ecosystem/go-grpc-middleware/[email protected]: invalid version: unknown revision 9a95f0fdbfea
make: *** [Makefile:5: all] Error 1

Sorry for the possibly newbie questions; I am not a Go programmer.

adubovikov commented on September 26, 2024

@Hubbitus, what is your Go version?

adubovikov commented on September 26, 2024

@Hubbitus 1.20.2

git clone https://github.com/metrico/promcasa
cd promcasa
make
#CGO_ENABLED=0 GOOS=linux go build -ldflags "-s -w" -o promcasa
CGO_ENABLED=1 GOOS=linux CGO_LDFLAGS="-lm -ldl" go build -a -ldflags '-extldflags "-static"' -tags netgo -installsuffix netgo -o promcasa
go: downloading github.com/jmoiron/sqlx v1.3.5
go: downloading github.com/gofiber/fiber/v2 v2.46.0
go: downloading github.com/ansrivas/fiberprometheus/v2 v2.6.0
go: downloading github.com/mcuadros/go-defaults v1.2.0
go: downloading github.com/mitchellh/mapstructure v1.5.0
go: downloading github.com/spf13/viper v1.13.0
go: downloading gopkg.in/go-playground/validator.v9 v9.31.0
go: downloading github.com/prometheus/client_golang v1.15.0
go: downloading github.com/golang/snappy v0.0.4
go: downloading github.com/pkg/errors v0.9.1
go: downloading github.com/valyala/bytebufferpool v1.0.0
go: downloading golang.org/x/sync v0.1.0
go: downloading gopkg.in/yaml.v2 v2.4.0
go: downloading github.com/lestrrat-go/file-rotatelogs v2.4.0+incompatible
go: downloading github.com/sirupsen/logrus v1.9.0
go: downloading github.com/manifoldco/promptui v0.9.0
go: downloading github.com/ClickHouse/clickhouse-go/v2 v2.10.1
go: downloading github.com/Jeffail/gabs/v2 v2.7.0
go: downloading github.com/fsnotify/fsnotify v1.6.0
go: downloading github.com/spf13/afero v1.9.3
go: downloading github.com/spf13/cast v1.5.0
go: downloading github.com/spf13/jwalterweatherman v1.1.0
go: downloading github.com/spf13/pflag v1.0.5
go: downloading github.com/go-playground/universal-translator v0.18.1
go: downloading github.com/leodido/go-urn v1.2.4
go: downloading github.com/gofiber/adaptor/v2 v2.1.31
go: downloading github.com/chzyer/readline v0.0.0-20180603132655-2972be24d48e
go: downloading golang.org/x/sys v0.8.0
go: downloading github.com/lestrrat-go/strftime v1.0.6
go: downloading github.com/prometheus/common v0.42.0
go: downloading github.com/beorn7/perks v1.0.1
go: downloading github.com/cespare/xxhash/v2 v2.2.0
go: downloading github.com/prometheus/client_model v0.3.0
go: downloading github.com/prometheus/procfs v0.9.0
go: downloading google.golang.org/protobuf v1.30.0
go: downloading github.com/subosito/gotenv v1.4.1
go: downloading github.com/hashicorp/hcl v1.0.0
go: downloading gopkg.in/ini.v1 v1.67.0
go: downloading github.com/magiconair/properties v1.8.6
go: downloading github.com/pelletier/go-toml/v2 v2.0.5
go: downloading gopkg.in/yaml.v3 v3.0.1
go: downloading golang.org/x/text v0.9.0
go: downloading github.com/go-playground/locales v0.14.1
go: downloading github.com/pelletier/go-toml v1.9.5
go: downloading github.com/valyala/fasthttp v1.47.0
go: downloading github.com/matttproud/golang_protobuf_extensions v1.0.4
go: downloading github.com/golang/protobuf v1.5.3
go: downloading github.com/google/uuid v1.3.0
go: downloading github.com/mattn/go-colorable v0.1.13
go: downloading github.com/mattn/go-isatty v0.0.18
go: downloading github.com/mattn/go-runewidth v0.0.14
go: downloading github.com/savsgio/dictpool v0.0.0-20221023140959-7bf2e61cea94
go: downloading github.com/ClickHouse/ch-go v0.52.1
go: downloading github.com/andybalholm/brotli v1.0.5
go: downloading go.opentelemetry.io/otel/trace v1.14.0
go: downloading go.opentelemetry.io/otel v1.14.0
go: downloading github.com/klauspost/compress v1.16.3
go: downloading github.com/rivo/uniseg v0.2.0
go: downloading github.com/valyala/tcplisten v1.0.0
go: downloading github.com/paulmach/orb v0.9.0
go: downloading github.com/shopspring/decimal v1.3.1
go: downloading github.com/savsgio/gotils v0.0.0-20230208104028-c358bd845dee
go: downloading github.com/tinylib/msgp v1.1.8
go: downloading github.com/go-faster/city v1.0.1
go: downloading github.com/go-faster/errors v0.6.1
go: downloading github.com/pierrec/lz4/v4 v4.1.17
go: downloading github.com/segmentio/asm v1.2.0
go: downloading github.com/philhofer/fwd v1.1.2
#go build -a -ldflags '-extldflags "-static"' -o promcasa

ldd promcasa 
	not a dynamic executable

adubovikov commented on September 26, 2024

{"level":"debug","msg":"ping db data clickhouse [dsn parse]:read timeout: time: missing unit in duration \"30\"","time":"2023-06-18T00:21:11+03:00"}
This seems to be a bug in the clickhouse driver: it is no longer fully compatible with db.Ping. The parameter should be a time.Duration, but clickhouse expects a value with units. Let me temporarily enable/disable keep-alive.
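
The error text in that log line is the one Go's standard time.ParseDuration returns for a bare number. A minimal sketch reproducing it follows. The parseStrict, parseLenient, and withUnit helpers are hypothetical; withUnit only shows one way a bare-seconds value could be normalized and is not what the promcasa fix actually does:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// withUnit appends "s" when the value is a bare integer, so that it can be
// parsed as a duration. Hypothetical helper for illustration.
func withUnit(v string) string {
	if _, err := strconv.Atoi(v); err == nil {
		return v + "s"
	}
	return v
}

// parseStrict parses the value exactly as given; a bare number is rejected.
func parseStrict(v string) (time.Duration, error) {
	return time.ParseDuration(v)
}

// parseLenient treats a bare integer as a number of seconds.
func parseLenient(v string) (time.Duration, error) {
	return time.ParseDuration(withUnit(v))
}

func main() {
	// A bare number has no unit, so parsing fails.
	_, err := parseStrict("30")
	fmt.Println("strict:", err)

	// With a unit, the same value parses.
	d, err := parseLenient("30")
	fmt.Println("lenient:", d, err)
}
```

Writing the unit explicitly in the configuration or DSN (for example 30s) avoids the ambiguity altogether.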

adubovikov commented on September 26, 2024

@Hubbitus OK, they changed the DSN to require units. Fixed and pushed. Please recheck this PR:

de38322

Hubbitus commented on September 26, 2024

@adubovikov:

$ go version
go version go1.20.4 linux/amd64

And now the build completes. Thank you.

Hubbitus commented on September 26, 2024

At least I now see the node connect on the second attempt (I tried on another cluster):

{"level":"debug","msg":"Worker is waiting for jobs","time":"2023-06-18T14:25:27+03:00"}
{"level":"debug","msg":"Starting config database check","time":"2023-06-18T14:25:27+03:00"}
{"level":"debug","msg":"RunWatcherConfigDatabaseStats: CHECK DataDB: raw","time":"2023-06-18T14:25:27+03:00"}
{"level":"debug","msg":"Starting queries check","time":"2023-06-18T14:25:27+03:00"}
{"level":"error","msg":"the node is offline:","time":"2023-06-18T14:25:27+03:00"}
{"level":"debug","msg":"node is online: raw","time":"2023-06-18T14:25:29+03:00"}
{"level":"debug","msg":"Starting queries check","time":"2023-06-18T14:25:37+03:00"}
{"level":"debug","msg":"Execute query: 0SELECT COUNT(*) as amount FROM system.clusters","time":"2023-06-18T14:25:37+03:00"}
{"level":"debug","msg":"Execute Async process on node: 0 0","time":"2023-06-18T14:25:37+03:00"}
panic: interface conversion: interface {} is float64, not string

goroutine 16 [running]:
github.com/metrico/promcasa/service.(*InsertService).DoMetricsQueries(0xc0001fc240)
        /home/pasha/@Projects/@DevOps/promcasa/promcasa.git/service/insertService.go:175 +0x1c9b
github.com/metrico/promcasa/aggregator.doMetricsScheduler(0x0?)
        /home/pasha/@Projects/@DevOps/promcasa/promcasa.git/aggregator/casaAggregator.go:47 +0x7e
created by github.com/metrico/promcasa/aggregator.ActivateTimer
        /home/pasha/@Projects/@DevOps/promcasa/promcasa.git/aggregator/casaAggregator.go:27 +0x10d

But then it also failed, with an unclear error...

My first test metric is defined like this:

  "database_metrics": [
    {
      "_info": "Query to database. Refresh takes unit sign: (ns, ms, s, m, h). $refresh - this is a reference for 'refresh' param ",
      "name": "custom_cluster_nodes_count",
      "help": "My Status",
      "query": "SELECT COUNT(*) as amount FROM system.clusters",
      "labels": ["amount"],
      "counter_name": "counter",
      "refresh": "60s",
      "type":"gauge"
    }
  ],

Is there any documentation on how to write probes?
In particular, what does type mean there, and what is the difference between name and counter_name?

adubovikov commented on September 26, 2024

OK, I have added string validation, just to protect promcasa.
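
The panic reported above is what Go raises when a value held in an interface{} is asserted to the wrong type without a check. A minimal sketch of the guarded form, assuming label values arrive as interface{}. The labelString helper is hypothetical, and the validation actually added to promcasa may differ:

```go
package main

import "fmt"

// labelString returns the value as a string only if it really is one.
// It uses the comma-ok form of the type assertion, which never panics.
func labelString(v interface{}) (string, bool) {
	s, ok := v.(string)
	return s, ok
}

func main() {
	// A numeric column listed under "labels" arrives as a float64.
	var amount interface{} = float64(26)

	// Unchecked form: s := amount.(string) would panic with
	// "interface conversion: interface {} is float64, not string".

	if _, ok := labelString(amount); !ok {
		fmt.Println("skipping label: value is not a string")
	}

	var cluster interface{} = "cluster"
	if s, ok := labelString(cluster); ok {
		fmt.Println("label value:", s)
	}
}
```

In the failing configuration the numeric column amount was listed under "labels", which is the kind of value such a check rejects.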

The example config shows how to do it:

"query": "SELECT status, group, count(*) FROM my_index FINAL PREWHERE (datetime >= toDateTime(now()-$refresh)) AND (datetime < toDateTime(now()) ) group by status, group",

That is, a label, a group, and a counter.

In your case:
"query": "SELECT 'cluster', 'region', COUNT(*) as amount FROM system.clusters",

Hubbitus commented on September 26, 2024

I've changed my metric to:

    {
      "_info": "Query to database. Refresh takes unit sign: (ns, ms, s, m, h). $refresh - this is a reference for 'refresh' param ",
      "name": "custom_cluster_nodes_count",
      "help": "Amount of cluster nodes",
      "query": "SELECT 'cluster' as cluster, 'region' as region, COUNT(*) as amount FROM system.clusters",
      "labels": ["cluster", "region"],
      "counter_name": "counter",
      "refresh": "60s",
      "type":"gauge"
    }

It looks like it starts scraping:

{"level":"debug","msg":"Started the job queue","time":"2023-06-18T20:50:49+03:00"}

 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.46.0                   │ 
 │               http://127.0.0.1:3215               │ 
 │       (bound on host 0.0.0.0 and port 3215)       │ 
 │                                                   │ 
 │ Handlers ............. 0  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ............. 36868 │ 
 └───────────────────────────────────────────────────┘ 

{"level":"debug","msg":"Worker is waiting for jobs","time":"2023-06-18T20:50:49+03:00"}
{"level":"debug","msg":"Starting config database check","time":"2023-06-18T20:50:49+03:00"}
{"level":"debug","msg":"RunWatcherConfigDatabaseStats: CHECK DataDB: raw","time":"2023-06-18T20:50:49+03:00"}
{"level":"debug","msg":"Starting queries check","time":"2023-06-18T20:50:49+03:00"}
{"level":"error","msg":"the node is offline:","time":"2023-06-18T20:50:49+03:00"}
{"level":"debug","msg":"node is online: raw","time":"2023-06-18T20:50:49+03:00"}
{"level":"debug","msg":"Starting queries check","time":"2023-06-18T20:50:59+03:00"}
{"level":"debug","msg":"Execute query: 0SELECT 'cluster' as cluster, 'region' as region, COUNT(*) as amount FROM system.clusters","time":"2023-06-18T20:50:59+03:00"}
{"level":"debug","msg":"Execute Async process on node: 0 0","time":"2023-06-18T20:50:59+03:00"}

But I do not see anything over HTTP:

$ http http://127.0.0.1:3215/
HTTP/1.1 404 Not Found
Content-Length: 12
Content-Type: text/plain; charset=utf-8
Date: Sun, 18 Jun 2023 17:51:38 GMT

Cannot GET /


$ http http://127.0.0.1:3215/metrics
HTTP/1.1 404 Not Found
Content-Length: 19
Content-Type: text/plain; charset=utf-8
Date: Sun, 18 Jun 2023 17:51:41 GMT

Cannot GET /metrics

What is wrong at this step?

adubovikov commented on September 26, 2024

By default it PUSHes to Prometheus. Can you show me your "prometheus_client" section?

Hubbitus commented on September 26, 2024

Hello.
That is disabled:

  "prometheus_client": {
    "help": "Settings for internal Prometheus Client (optional)",
    "allow_ip": ["127.0.0.1"],
    "metrics_path": "/metrics",
    "service_name": "prometheus",
    "push_url": "",
    "push_name": "",
    "push_interval": "60s",
    "enable": false
  },

adubovikov commented on September 26, 2024

Set "enable" to true ;-)

Hubbitus commented on September 26, 2024

Thank you very much!
Now it works and I can continue to play with it.

adubovikov commented on September 26, 2024

super! Enjoy!

Hubbitus commented on September 26, 2024

Hm, I get a result on the HTTP interface, but it is wrong:

$ http http://127.0.0.1:3215/metrics | grep custom_cluster_nodes_count
# HELP custom_cluster_nodes_count Amount of cluster nodes
# TYPE custom_cluster_nodes_count gauge
custom_cluster_nodes_count{cluster="cluster",region="region"} 0

But the direct query returns:

SELECT 'cluster', 'region', COUNT(*) as amount FROM system.clusters
cluster region amount
cluster region 26

aitudorm commented on September 26, 2024

Hello.

I have deployed promcasa and started it successfully. Now I am struggling with the number of metrics.

I have configured one metric to count the number of rows for the last hour.
But when I fetch "http://127.0.0.1:3215/metrics" and grep for rows_count, it initially shows one row like:
rows_count{amount="2.000000"} 2
After inserting some rows, the result becomes:

rows_count{amount="2.000000"} 2
rows_count{amount="3.000000"} 3

and so on.

I am new to CentOS and I haven't deployed Prometheus on the instance, because the OS has no GNOME desktop.
So I want to know: which of the series below goes to Prometheus?

rows_count{amount="15.000000"} 15
rows_count{amount="2.000000"} 2
rows_count{amount="3.000000"} 3
rows_count{amount="6.000000"} 6
rows_count{amount="7.000000"} 7 

aitudorm commented on September 26, 2024

> Something is wrong with your query. You should not use the value as a label, or your metric cardinality will easily explode.

Yes, it was a related issue. Thank you.

aitudorm commented on September 26, 2024

Now my config file looks like:

  "database_metrics": [
    {
      "_info": "Query to database. Refesh takes unit sign: (ns, ms, s, m, h). $refresh - this is a reference for 'refresh' param ",
      "name": "rows_count",
      "help": "Amount of writes",
      "query": "select database || '.' || table as table_name, sum(rows) as qnty from system.parts where database not in ('INFORMATION_SCHEMA', 'default', 'information_schema', 'system') and active group by table_name",
      "labels": ["table_name"],
      "counter_name": "qnty",
      "refresh": "10s",
      "type":"gauge"
    }
  ],
  "

Is it possible to add a second query? Or should it be another JSON block inside database_metrics, separated by a comma?
