

BupycHuk commented on July 24, 2024

Hello @artemsafiyulin, could you run pmm-admin summary on the PMM client and share the logs from qan-mongodb-profiler?

artemsafiyulin commented on July 24, 2024

Hello @BupycHuk, when I ran pmm-admin summary I got an error:

[root@mongo-01 asafiiulin]# pmm-admin summary
[GET /logs.zip][401] Logs default  &{Code:7 Error:Access denied. Message:Access denied.}
summary_mongo-01_2023_10_04_07_23_12.zip created.

However, the archive was generated. I checked the QAN_MONGODB_PROFILER_AGENT logs and found many identical warnings:

WARN[2023-09-28T00:45:27.463+00:00] couldn't retrieve data from cursor (CappedPositionLost) Executor error during getMore :: caused by :: CollectionScan died due to failure to restore tailable cursor position. Last seen record id: RecordId(731735296)  agentID=/agent_id/84387772-8197-494a-b789-b7c314231b3a component=agent-builtin db=my_database type=qan_mongodb_profiler_agent

I thought my oplog might be too small, but I checked it and it looks OK:

rs0 [direct: primary] test> rs.printReplicationInfo()
actual oplog size
'17769.6298828125 MB'
---
configured oplog size
'17769.6298828125 MB'
---
log length start to end
'501441 secs (139.29 hrs)'
---
oplog first event time
'Thu Sep 28 2023 12:32:32 GMT+0000 (Coordinated Universal Time)'
---
oplog last event time
'Wed Oct 04 2023 07:49:53 GMT+0000 (Coordinated Universal Time)'
---
now
'Wed Oct 04 2023 07:49:53 GMT+0000 (Coordinated Universal Time)'
rs0 [direct: primary] test>
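
The reported window can be checked against the first and last event times above: the difference is 501441 seconds, about 139.29 hours. Note, though, that the warning quoted earlier names db=my_database, whereas the oplog lives in the local database. If the profiler agent tails my_database's own profiling collection (an assumption on my part, most likely system.profile, which is sized separately from the oplog), the oplog window would not be the limiting factor here. A quick check of the arithmetic:

```python
from datetime import datetime

# Event times copied from the rs.printReplicationInfo() output above (all in UTC).
FMT = "%a %b %d %Y %H:%M:%S"
first_event = datetime.strptime("Thu Sep 28 2023 12:32:32", FMT)
last_event = datetime.strptime("Wed Oct 04 2023 07:49:53", FMT)

# Length of the oplog window: how far back the oplog currently reaches.
window_secs = int((last_event - first_event).total_seconds())
window_hrs = round(window_secs / 3600, 2)
print(window_secs, window_hrs)  # 501441 139.29
```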

BupycHuk commented on July 24, 2024

@artemsafiyulin could you please try to restart pmm-agent?

artemsafiyulin commented on July 24, 2024

@BupycHuk Yes, it helped. As I understand it, pmm-agent didn't have time to read the information from the oplog before it was overwritten, and after that pmm-agent didn't try to fetch any new information. After the restart I can see in the PMM ClickHouse that we have no data between the time pmm-agent first got this error and the time I restarted the agent. Can we change this behavior? For example, if pmm-agent fails to get some information from the oplog, it could just skip it and continue collecting the newest information.
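
To make the proposed behavior concrete, here is a toy model, assuming the agent reads a capped collection through a tailable cursor. A capped collection keeps only its newest documents, so when writes outpace the reader, the record the cursor last saw is overwritten and resuming fails; that is what the CappedPositionLost warning above reports. The poll function below skips the lost records, counts them, and continues from the oldest record that still exists. CappedStore, poll and the stats dictionary are hypothetical names; this is not pmm-agent's code, and a real agent would reopen a MongoDB cursor rather than read from a deque.

```python
from collections import deque


class CappedPositionLost(Exception):
    """The record the cursor was positioned on has been overwritten."""


class CappedStore:
    """Toy capped collection: keeps only the newest max_docs records."""

    def __init__(self, max_docs):
        self.docs = deque(maxlen=max_docs)  # appending past maxlen evicts the oldest
        self.next_id = 0

    def insert(self, doc):
        self.docs.append((self.next_id, doc))
        self.next_id += 1

    def read_after(self, last_seen_id):
        """Return records newer than last_seen_id (None means a fresh cursor)."""
        if last_seen_id is not None and self.docs and last_seen_id < self.docs[0][0]:
            raise CappedPositionLost(f"last seen record id: {last_seen_id}")
        return [(rid, d) for rid, d in self.docs
                if last_seen_id is None or rid > last_seen_id]


def poll(store, last_seen_id, stats):
    """One polling step that skips lost records instead of giving up."""
    try:
        batch = store.read_after(last_seen_id)
    except CappedPositionLost:
        # Records between our position and the oldest survivor are gone for good;
        # count them so the gap is visible, then restart from what is left.
        stats["skipped"] += store.docs[0][0] - last_seen_id - 1
        batch = store.read_after(None)
    if batch:
        last_seen_id = batch[-1][0]
    stats["read"] += len(batch)
    return last_seen_id
```

For example, with max_docs=3, after reading ids 0 to 2 and then inserting five more records before the next poll, that poll adds 2 to stats["skipped"] (ids 3 and 4 were overwritten) and resumes with ids 5 to 7. Counting the skipped records keeps the data loss visible instead of silent.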

BupycHuk commented on July 24, 2024

Yes, we'll discuss it internally to decide how to prioritize it.

BupycHuk commented on July 24, 2024

@artemsafiyulin I checked the code, and it seems it is supposed to work the way you described. Could you check the logs for any repeating errors other than the one you shared?
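
One way to answer this from the journal is to group entries by level and message and count them. The sketch below assumes the logrus-style layout visible in the pmm-agent lines quoted in this thread (LEVEL[timestamp] message, then two or more spaces before the first key=value field) and masks digit runs so that messages differing only in record IDs or timestamps are counted together. The function name tally is my own; the input could be, for example, the saved output of journalctl -u pmm-agent.

```python
import re
from collections import Counter

# Assumed entry layout, as in the pmm-agent journal lines quoted in this thread:
#   LEVEL[timestamp] message   key=value key=value ...
# The message ends where 2+ spaces are followed by the first key=value field.
ENTRY = re.compile(r"\b(WARN|ERRO|INFO|DEBU)\[[^\]]*\]\s+(.*?)(?:\s{2,}\w+=|$)")


def tally(lines):
    """Count log entries by (level, message), masking digit runs as 'N'
    so messages that differ only in IDs or timestamps group together."""
    counts = Counter()
    for line in lines:
        m = ENTRY.search(line)
        if m:  # stack-trace continuation lines have no level prefix and are skipped
            level, message = m.groups()
            counts[(level, re.sub(r"\d+", "N", message.strip()))] += 1
    return counts
```

For example, tally(open("pmm-agent.log")).most_common(10) would list the ten most frequent entries in a saved journal dump.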

artemsafiyulin commented on July 24, 2024

@BupycHuk great news! That means the problem is on my side and can be fixed.
After yesterday's restart I hit this problem again. I looked through the systemd logs of pmm-agent on the MongoDB master server and saw the following:

At 13:51 pmm-agent was restarted:

Oct 04 13:51:18 prod-mongodb-build-01 systemd[1]: Started pmm-agent.

From 13:54 to 14:31 I got many identical "couldn't add document to aggregator" warnings and one "Action terminated" error:

Oct 04 13:54:27 prod-mongodb-build-01 pmm-agent[2321084]: WARN[2023-10-04T13:54:27.420+00:00] couldn't add document to aggregator           agentID=/agent_id/84387772-8197-494a-b789-b7c314231b3a component=agent-builtin type=qan_mongodb_profiler_agent
.
.
.
Oct 04 14:10:43 prod-mongodb-build-01 pmm-agent[2321084]: WARN[2023-10-04T14:10:43.693+00:00] Action terminated with error: cannot explain this type of query
Oct 04 14:10:43 prod-mongodb-build-01 pmm-agent[2321084]: (Location40413) BSON field 'findAndModify.$db' is a duplicate field
Oct 04 14:10:43 prod-mongodb-build-01 pmm-agent[2321084]: github.com/percona/pmm/agent/runner/actions.(*mongodbExplainAction).Run
Oct 04 14:10:43 prod-mongodb-build-01 pmm-agent[2321084]:         /tmp/go/src/github.com/percona/pmm/agent/runner/actions/mongodb_explain_action.go:99
Oct 04 14:10:43 prod-mongodb-build-01 pmm-agent[2321084]: github.com/percona/pmm/agent/runner.(*Runner).handleAction.func1
Oct 04 14:10:43 prod-mongodb-build-01 pmm-agent[2321084]:         /tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:238
Oct 04 14:10:43 prod-mongodb-build-01 pmm-agent[2321084]: runtime/pprof.Do
Oct 04 14:10:43 prod-mongodb-build-01 pmm-agent[2321084]:         /usr/local/go/src/runtime/pprof/runtime.go:40
Oct 04 14:10:43 prod-mongodb-build-01 pmm-agent[2321084]: runtime.goexit
Oct 04 14:10:43 prod-mongodb-build-01 pmm-agent[2321084]:         /usr/local/go/src/runtime/asm_amd64.s:1594  component=runner id=/action_id/28a169b1-90a7-4989-8ddc-9eb539649a38 type=mongodb-explain
.
.
.
Oct 04 14:31:18 prod-mongodb-build-01 pmm-agent[2321084]: WARN[2023-10-04T14:31:18.854+00:00] couldn't add document to aggregator           agentID=/agent_id/84387772-8197-494a-b789-b7c314231b3a component=agent-builtin type=qan_mongodb_profiler_agent

From 14:36 to 14:37 I got a few of the following errors:

Oct 04 14:36:08 prod-mongodb-build-01 pmm-agent[2321084]: WARN[2023-10-04T14:36:08.118+00:00] time="2023-10-04T14:36:08Z" level=warning msg="cannot create metrics for oplog: connection(127.0.0.1:27017[-4]) incomplete read of message header: context canceled"  agentID=/agent_id/dc060cf8-8221-4bbc-a774-4fb649c24873 component=agent-process type=mongodb_exporter
Oct 04 14:36:08 prod-mongodb-build-01 pmm-agent[2321084]: ERRO[2023-10-04T14:36:08.119+00:00] time="2023-10-04T14:36:08Z" level=error msg="Cannot get node type to check if this is a mongos: canceled while checking out a connection from connection pool: context canceled; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 0" component=diagnosticDataCollector  agentID=/agent_id/dc060cf8-8221-4bbc-a774-4fb649c24873 component=agent-process type=mongodb_exporter

From 14:37 to 23:59:08 there were again many "couldn't add document to aggregator" warnings, plus a few of the "cannot create metrics for oplog" and "Cannot get node type to check if this is a mongos" errors from the previous block.

From 00:01:06 to 01:11:30 there were many identical errors (the first line below is the last aggregator warning, at 23:59:08):

Oct 04 23:59:08 prod-mongodb-build-01 pmm-agent[2321084]: WARN[2023-10-04T23:59:08.803+00:00] couldn't add document to aggregator           agentID=/agent_id/84387772-8197-494a-b789-b7c314231b3a component=agent-builtin type=qan_mongodb_profiler_agent
Oct 05 00:01:06 prod-mongodb-build-01 pmm-agent[2321084]: WARN[2023-10-05T00:01:06.996+00:00] couldn't retrieve data from cursor (CappedPositionLost) Executor error during getMore :: caused by :: CollectionScan died due to failure to restore tailable cursor position. Last seen record id: RecordId(801320721)  agentID=/agent_id/84387772-8197-494a-b789-b7c314231b3a component=agent-builtin db=my_database type=qan_mongodb_profiler_agent
.
.
.
Oct 05 01:11:30 prod-mongodb-build-01 pmm-agent[2321084]: WARN[2023-10-05T01:11:30.557+00:00] couldn't retrieve data from cursor (CappedPositionLost) Executor error during getMore :: caused by :: CollectionScan died due to failure to restore tailable cursor position. Last seen record id: RecordId(801324404)  agentID=/agent_id/84387772-8197-494a-b789-b7c314231b3a component=agent-builtin db=my_database type=qan_mongodb_profiler_agent

After 01:11:30 the log is empty. And in ClickHouse on the PMM server, the last metric for my database is from that same minute:

58f15d02fc35 :) select period_start from pmm.metrics where database = 'my_database' ORDER BY period_start desc limit 1;

SELECT period_start
FROM pmm.metrics
WHERE database = 'my_database'
ORDER BY period_start DESC
LIMIT 1

Query id: cbb423e9-217f-4010-91fc-9abd9b7e3d2a

┌────────period_start─┐
│ 2023-10-05 01:11:00 │
└─────────────────────┘

1 rows in set. Elapsed: 0.030 sec. Processed 2.97 million rows, 8.16 MB (98.16 million rows/s., 269.30 MB/s.) 

58f15d02fc35 :)
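
The query above returns only the latest bucket. To see where collection stopped and resumed, one could instead fetch the distinct period_start values for the database (for example SELECT DISTINCT period_start FROM pmm.metrics WHERE database = 'my_database' ORDER BY period_start) and look for holes between consecutive buckets. The sketch below assumes one-minute buckets, which the 01:11:00 value suggests but which I have not verified; find_gaps is a hypothetical helper, and a gap can also simply mean that no queries were recorded in that interval.

```python
from datetime import datetime, timedelta


def find_gaps(period_starts, bucket=timedelta(minutes=1)):
    """Return (before, after) pairs of consecutive buckets that are more than
    one bucket apart, i.e. intervals with no QAN rows at all."""
    ordered = sorted(set(period_starts))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > bucket]


# Illustrative input only (not real data): buckets at 00:00-00:02, then 00:10.
sample = [datetime(2023, 1, 1, 0, m) for m in (0, 1, 2, 10)]
print(find_gaps(sample))
# → [(datetime.datetime(2023, 1, 1, 0, 2), datetime.datetime(2023, 1, 1, 0, 10))]
```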

artemsafiyulin commented on July 24, 2024

@BupycHuk do you have any ideas?

BupycHuk commented on July 24, 2024

@artemsafiyulin, no, I have no idea why this might happen. Just a question: how many DBs do you have?

artemsafiyulin commented on July 24, 2024

@BupycHuk I have 3 MongoDB instances (1 master and 2 replicas). The MongoDB cluster has 1 non-default database.
