We've written a custom hubot to interact with AWS and report back to users. One

Hubot responds 4 times to one message. But only some times? (hubotio/hubot)

Comments (22)

joeyguerra commented on July 18, 2024

Can you start Hubot with HUBOT_LOG_LEVEL=debug to see what line of code the execution is getting to?

from hubot.

johnseekins-pathccm commented on July 18, 2024

Sure. The batch happens here:

    const chunkSize = 10
    for (let i = 0; i < serviceNames.length; i += chunkSize) {
      let chunk = serviceNames.slice(i, i + chunkSize)
      const ignoredFromChunk = chunk.filter((service) => ignoredServices.includes(service))
      ignored.push.apply(ignored, ignoredFromChunk)
      chunk = chunk.filter((service) => !ignoredServices.includes(service))
      if (chunk.length < 1) {
        continue
      }
      let input = {
        cluster,
        services: chunk,
        include: []
      }
      let command = new DescribeServicesCommand(input)
      let serviceData
      try {
        serviceData = await ecsClient.send(command)
        serviceData = serviceData.services
      } catch (err) {
        robot.logger.error(`Request to AWS failed: ${err}`)
      }

Let's expand that a bit:

Instead of doing

    for (let i = 0; i < serviceNames.length; i++) {
      const service = serviceNames[i]
      if (ignoredServices.includes(service)) {
        ignored.push(service)
        continue
      }
      let input = {
        cluster,
        services: [service],
        include: []
      }
      let command = new DescribeServicesCommand(input)
      let serviceData
      try {
        serviceData = await ecsClient.send(command)
        serviceData = serviceData.services[0]
      } catch (err) {
        robot.logger.error(`Request to AWS failed: ${err}`)
      }

I now loop through the list of service names in groups of 10 (and do some filtering). This means that I send a request like ['service1', 'service2', ..., 'service10'] instead of ['service1'], ['service2'], etc., reducing the time taken collecting data from AWS by roughly a factor of 10.
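To make the batching step concrete, here is a minimal sketch of the chunking on its own, separated from the AWS calls. The helper name `chunkArray` and the service names are hypothetical and used only for illustration; the group size of 10 comes from the code above.

```javascript
// Split a list into consecutive groups of at most `size` items.
// `chunkArray` is a hypothetical helper, not part of Hubot or the AWS SDK.
function chunkArray (items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer')
  }
  const chunks = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Hypothetical service names, for illustration only.
const names = Array.from({ length: 23 }, (_, n) => `service${n + 1}`)
const batches = chunkArray(names, 10)

// 23 services become 3 requests (10 + 10 + 3) instead of 23.
console.log(batches.map((batch) => batch.length))
```

With one request per chunk, the number of round trips to AWS drops from one per service to one per group of 10, which is where the speed-up comes from.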

I think perhaps surfacing the request timeout (somehow) would be amazing. Just so we know it's there.


joeyguerra commented on July 18, 2024

I see. The new code changes

let input = {
  cluster,
  services: [service],
  include: []
}

to

let input = {
  cluster,
  services: services,
  include: []
}

where services is an array of service names without the ignored ones.


johnseekins-pathccm commented on July 18, 2024

Closing this as it seems to be more an issue with timeouts within adapters. Thanks for the help!


joeyguerra commented on July 18, 2024

Nothing in the code immediately stands out to me as the culprit. I have a few probing questions:

* how is Hubot hosted? i.e. in kubernetes, on an EC2 instance, ????
* how many instances of Hubot are running?
* Does a single instance of Hubot have access to Prod, Dev and Stage?
* What version of Hubot is running?


johnseekins-pathccm commented on July 18, 2024

Nothing in the code immediately stands out to me as the culprit. I have a few probing questions:
* how is Hubot hosted? i.e. in kubernetes, on an EC2 instance, ????

Docker container in ECS

* how many instances of Hubot are running?

* Does a single instance of Hubot have access to Prod, Dev and Stage?

Yes. Read-only access. And importantly, it's to the ECS Clusters. Not separate accounts/environments/etc.

* What version of Hubot is running?

11.1


joeyguerra commented on July 18, 2024

Does it only respond 4 times when the value is "Production"?


johnseekins-pathccm commented on July 18, 2024

Or when I leave it to "default". So when cluster === Production.


joeyguerra commented on July 18, 2024

what chat adapter are you using?
Does it respond 4 times with the same answer?


johnseekins-pathccm commented on July 18, 2024

what chat adapter are you using? Does it respond 4 times with the same answer?

https://github.com/hubot-friends/hubot-slack
Yep. Exact same response, 4 times. Also takes about 20 minutes to get all four replies.

(Updated all that in the initial question, too)


joeyguerra commented on July 18, 2024

Ok. I've seen this behavior before during development. The issue was that the code failed to acknowledge the message. In that situation, the Slack system will "retry sending the message". Here's where the code is supposed to acknowledge the message.

I also see an issue in the Slack Adapter. It's not awaiting robot.receive. I'm unsure what that will cause, but I'll have to push a fix for that.
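As an illustration of the ordering being discussed here (acknowledge first, then run the possibly slow handler), the following is a minimal sketch. It is not the hubot-slack adapter's actual code: `onEnvelope`, `ack`, and `handle` are hypothetical names, and the real adapter may structure this differently.

```javascript
// Hypothetical shape, for illustration only:
//   ack    - async function that confirms receipt to the chat platform
//   handle - async function that does the (possibly slow) work
async function onEnvelope (envelope, ack, handle, logger = console) {
  // Acknowledge before doing any slow work, so the platform does not
  // treat the delivery as timed out and send it again.
  await ack()
  try {
    await handle(envelope)
  } catch (err) {
    // A failing handler should not turn into an unhandled rejection.
    logger.error(`Handler failed: ${err}`)
  }
}
```

If the acknowledgement only happens after the handler finishes, a handler that spends several seconds talking to AWS could miss the acknowledgement window, which would match the retries described in this thread.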


joeyguerra commented on July 18, 2024

I've also added the await call in the Slack Adapter.


johnseekins-pathccm commented on July 18, 2024

Seems to just...receive the message multiple times? To be clear...I definitely only typed it once, but this pattern (and I'm hesitant to give you full log messages...) looks like it's just...getting the message again.


johnseekins-pathccm commented on July 18, 2024

Updated to the new adapter and I still get the duplicate messages. :(


joeyguerra commented on July 18, 2024

Another thought is to await res.send because it's async.


johnseekins-pathccm commented on July 18, 2024

await res.send() also doesn't help.


joeyguerra commented on July 18, 2024

Is it odd that the envelope_id is different for each of those messages?


joeyguerra commented on July 18, 2024

Can you run a Hubot instance locally on your machine and reproduce the behavior?


xurizaemon commented on July 18, 2024

It sounds like you might have a plausible cause, so add several grains of salt to anything in this comment :)

When I've observed Hubot get into a repeats-replies state, I had a suspicion it related to functionality such as remind-her or polling plugins (eg watch statuspage, report when status changes). It seemed like the use of setTimeout() or setInterval() could create concurrent threads. (The fact that you see it reply four times specifically suggests to me this doesn't quite fit ... but maybe there's a magic number in that system I don't know about.)

If the current best theory doesn't pan out, maybe consider which plugins could be disabled to isolate the behaviour?


johnseekins-pathccm commented on July 18, 2024

There is a timeout in the slack response! Because this query to AWS is relatively slow, that doesn't entirely surprise me:

{"level":20,"time":1709307623089,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"Text = @hubot ecs list stale tasks"}
{"level":20,"time":1709307623089,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"Event subtype = undefined"}
{"level":20,"time":1709307623089,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"Received generic message: message"}
{"level":20,"time":1709307623090,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"Message '@hubot ecs list stale tasks' matched regex //^\\s*[@]?Hubot[:,]?\\s*(?:ecs list stale tasks( in )?([A-Za-z0-9-]+)?)/i/; listener.options = { id: null }"}
{"level":20,"time":1709307626395,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"eventHandler {
 \"envelope_id\": \"bd22596e-ee19-4201-8250-792f91fc96d7\",
 \"body\": {
    \"token\": \"<>\",
    \"team_id\": \"<>\",
    \"context_team_id\": \"<>\",
    \"context_enterprise_id\": null,
    \"api_app_id\": \"<>\",
    \"event\": {
      \"client_msg_id\": \"<>\",
      \"type\": \"message\",
      \"text\": \"<@hubot> ecs list stale tasks\",
      \"user\": \"<>\",
      \"ts\": \"1709307622.850469\",
      \"blocks\": [
        {
          \"type\": \"rich_text\",
          \"block_id\": \"5X8EE\",
          \"elements\": [
            {
              \"type\": \"rich_text_section\",
              \"elements\": [
                {
                  \"type\": \"user\",
                  \"user_id\": \"<>\"
                },
                {
                  \"type\": \"text\",
                  \"text\": \" ecs list stale tasks\"
                }
              ]
            }
          ]
        }
      ],
      \"team\": \"<>\",
      \"channel\": \"<>\",
      \"event_ts\": \"1709307622.850469\",
      \"channel_type\": \"channel\"
    },
    \"type\": \"event_callback\",
    \"event_id\": \"<>\",
    \"event_time\": 1709307622,
    \"authorizations\": [
      {
        \"enterprise_id\": null,
        \"team_id\": \"<>\",
        \"user_id\": \"<>\",
        \"is_bot\": true,
        \"is_enterprise_install\": false
      }
    ],
    \"is_ext_shared_channel\": false,
    \"event_context\": \"<>\"
  },
  \"event\": {
    \"client_msg_id\": \"<>\",
    \"type\": \"message\",
    \"text\": \"<@hubot> ecs list stale tasks\",
    \"user\": \"<>\",
    \"ts\": \"1709307622.850469\",
    \"blocks\": [
      {
        \"type\": \"rich_text\",
        \"block_id\": \"5X8EE\",
        \"elements\": [
          {
            \"type\": \"rich_text_section\",
            \"elements\": [
              {
                \"type\": \"user\",
                \"user_id\": \"<>\"
              },
              {
                \"type\": \"text\",
                \"text\": \" ecs list stale tasks\"
              }
            ]
          }
        ]
      }
    ],
    \"team\": \"<>\",
    \"channel\": \"<>\",
    \"event_ts\": \"1709307622.850469\",
    \"channel_type\": \"channel\"
  },
  \"retry_num\": 1,
  \"retry_reason\": \"timeout\",
  \"accepts_response_payload\": false
}"
}
{"level":20,"time":1709307626395,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"event {
  \"envelope_id\": \"<>\",
  \"body\": {
    \"token\": \"<>\",
    \"team_id\": \"<>\",
    \"context_team_id\": \"<>\",
    \"context_enterprise_id\": null,
    \"api_app_id\": \"<>\",
    \"event\": {
      \"client_msg_id\": \"<>\",
      \"type\": \"message\",
      \"text\": \"<@hubot> ecs list stale tasks\",
      \"user\": \"<>\",
      \"ts\": \"1709307622.850469\",
      \"blocks\": [
        {
          \"type\": \"rich_text\",
          \"block_id\": \"5X8EE\",
          \"elements\": [
            {
              \"type\": \"rich_text_section\",
              \"elements\": [
                {
                  \"type\": \"user\",
                  \"user_id\": \"<>\"
                },
                {
                  \"type\": \"text\",
                  \"text\": \" ecs list stale tasks\"
                }
              ]
            }
          ]
        }
      ],
      \"team\": \"<>\",
      \"channel\": \"<>\",
      \"event_ts\": \"1709307622.850469\",
      \"channel_type\": \"channel\"
    },
    \"type\": \"event_callback\",
    \"event_id\": \"<>\",
    \"event_time\": 1709307622,
    \"authorizations\": [
      {
        \"enterprise_id\": null,
        \"team_id\": \"<>\",
        \"user_id\": \"<>\",
        \"is_bot\": true,
        \"is_enterprise_install\": false
      }
    ],
    \"is_ext_shared_channel\": false,
    \"event_context\": \"<>\"
  },
  \"event\": {
    \"client_msg_id\": \"<>\",
    \"type\": \"message\",
    \"text\": \"<@hubot> ecs list stale tasks\",
    \"user\": \"<>\",
    \"ts\": \"1709307622.850469\",
    \"blocks\": [
      {
        \"type\": \"rich_text\",
        \"block_id\": \"5X8EE\",
        \"elements\": [
          {
            \"type\": \"rich_text_section\",
            \"elements\": [
              {
                \"type\": \"user\",
                \"user_id\": \"<>\"
              },
              {
                \"type\": \"text\",
                \"text\": \" ecs list stale tasks\"
              }
            ]
          }
        ]
      }
    ],
    \"team\": \"<>\",
    \"channel\": \"<>\",
    \"event_ts\": \"1709307622.850469\",
    \"channel_type\": \"channel\"
  },
  \"retry_num\": 1,
  \"retry_reason\": \"timeout\",
  \"accepts_response_payload\": false
} user = <>"
}
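The log shows the fields that mark a redelivery: `retry_num` and `retry_reason` on the envelope, plus `envelope_id` and `body.event_id`. One possible way to guard against answering the same message twice is to remember recently seen event ids. The sketch below assumes that a retry carries the same `body.event_id` as the original delivery, which the redacted log cannot confirm; `createDeduper` and its parameters are hypothetical names, not part of Hubot or the Slack adapter.

```javascript
// Remember event ids for `ttlMs` milliseconds and report repeats.
// Assumption: retries reuse body.event_id (envelope_id differs per delivery).
function createDeduper (ttlMs = 60 * 60 * 1000, now = () => Date.now()) {
  const seen = new Map() // event_id -> time it was first seen

  return function isDuplicate (envelope) {
    const id = envelope?.body?.event_id
    if (!id) return false // nothing to key on, so let the message through

    const current = now()
    // Drop entries older than the TTL so the map does not grow forever.
    for (const [key, firstSeen] of seen) {
      if (current - firstSeen > ttlMs) seen.delete(key)
    }

    if (seen.has(id)) return true
    seen.set(id, current)
    return false
  }
}

// Usage sketch with made-up ids:
const isDuplicate = createDeduper()
console.log(isDuplicate({ envelope_id: 'a', body: { event_id: 'Ev123' } })) // first delivery
console.log(isDuplicate({ envelope_id: 'b', body: { event_id: 'Ev123' }, retry_num: 1 })) // retry
```

This only hides the symptom. The handler still needs to be acknowledged in time, or made faster, for the retries to stop.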


johnseekins-pathccm commented on July 18, 2024

It's definitely me racing a timeout! I changed the code to batch AWS requests more efficiently and I'm no longer getting duplicate messages!

Relevant code:

  /*
   * Stale Deploys
   */
  robot.respond(/ecs list stale tasks( in )?([A-Za-z0-9-]+)?/i, async res => {
    const cluster = res.match[2] || defaultCluster
    const services = await paginateServices(ecsClient, cluster)
    // no need to sort these results
    const serviceNames = services.map((x) => x.split('/')[x.split('/').length - 1])
    const staleDateShort = new Date(Date.now() - shortExpireSecs)
    const staleDateLong = new Date(Date.now() - longExpireSecs)
    const expiredDate = new Date(Date.now() - expiredSecs)
    let ignored = []
    let shortExp = []
    let longExp = []
    let exp = []
    /*
     * Collect service data
     */
    const chunkSize = 10
    for (let i = 0; i < serviceNames.length; i += chunkSize) {
      let chunk = serviceNames.slice(i, i + chunkSize)
      const ignoredFromChunk = chunk.filter((service) => ignoredServices.includes(service))
      ignored.push.apply(ignored, ignoredFromChunk)
      chunk = chunk.filter((service) => !ignoredServices.includes(service))
      if (chunk.length < 1) {
        continue
      }
      let input = {
        cluster,
        services: chunk,
        include: []
      }
      let command = new DescribeServicesCommand(input)
      let serviceData
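      try {
        serviceData = await ecsClient.send(command)
        serviceData = serviceData.services
      } catch (err) {
        robot.logger.error(`Request to AWS failed: ${err}`)
        // serviceData is still undefined here, so skip this chunk
        // instead of failing on serviceData.length below
        continue
      }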
      for (let idx = 0; idx < serviceData.length; idx++) {
        const deployDate = new Date(serviceData[idx].deployments[0].createdAt)
        // skip any service newer than our longest expiration window
        if (deployDate > staleDateLong) {
          continue
        }

        const servString = `\`${serviceData[idx].serviceName}\` (deployed ${deployDate.toISOString()})`
        if (deployDate < expiredDate) {
          exp.push(servString)
        } else if (deployDate < staleDateShort) {
          shortExp.push(servString)
        } else {
          longExp.push(servString)
        }
      }
    }
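The handler above still awaits each chunk in turn. If more speed were ever needed, a different technique (not what the code above does) would be to send the chunk requests concurrently. The following is a minimal sketch using `Promise.allSettled`; `describeAll` and the `describeChunk` callback are hypothetical, with `describeChunk` standing in for a function that sends one DescribeServices request and returns its `services` array. Sending many requests at once may run into AWS rate limits, so treat this as a sketch rather than a recommendation.

```javascript
// Run one request per chunk concurrently and collect whatever succeeds.
// `describeChunk(chunk)` is a hypothetical async callback that returns
// an array of service descriptions for the given chunk of names.
async function describeAll (chunks, describeChunk, logger = console) {
  const results = await Promise.allSettled(
    chunks.map((chunk) => describeChunk(chunk))
  )
  const services = []
  results.forEach((result, idx) => {
    if (result.status === 'fulfilled') {
      services.push(...result.value)
    } else {
      // A failed chunk is logged and skipped; the others are kept.
      logger.error(`Request for chunk ${idx} failed: ${result.reason}`)
    }
  })
  return services
}
```

`Promise.allSettled` is used instead of `Promise.all` so that one failed request does not discard the results of the others, mirroring the log-and-carry-on behaviour of the original loop.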


joeyguerra commented on July 18, 2024

Well done tracking down this bug.

I don't see the code that "batches the AWS requests". Would you mind pointing it out for me? I'd love to see how you solved it.

I'm also curious whether there's a change I can make to the Slack Adapter to either prevent this situation or make it very visible when it happens.
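One way to make a slow handler visible, sketched here under assumptions, is to wrap listeners in a timer that logs a warning when they run longer than a threshold. In the debug log above, the retry arrived about 3.3 seconds after the message was first received (1709307626395 - 1709307623089 = 3306 ms), so a threshold of around 3000 ms is used in the example. `warnIfSlow` is a hypothetical helper, not an existing Hubot or adapter API, and it expects a logger that has a `warn` method.

```javascript
// Wrap an async handler and warn when it runs longer than thresholdMs.
// `warnIfSlow` is a hypothetical helper; `logger` needs a warn() method.
function warnIfSlow (handler, thresholdMs, logger = console, now = () => Date.now()) {
  return async function timedHandler (...args) {
    const started = now()
    try {
      return await handler(...args)
    } finally {
      const elapsed = now() - started
      if (elapsed > thresholdMs) {
        logger.warn(
          `Handler took ${elapsed}ms (threshold ${thresholdMs}ms); ` +
          'the chat platform may redeliver the message'
        )
      }
    }
  }
}

// Usage sketch: a stand-in handler that resolves after a short delay.
const handler = warnIfSlow(
  () => new Promise((resolve) => setTimeout(() => resolve('done'), 20)),
  3000
)
handler().then((value) => console.log(value))
```

The same wrapper could be placed around the callback passed to `robot.respond`, so that a warning shows up in the logs before users start seeing duplicate replies.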
