Comments (22)

joeyguerra avatar joeyguerra commented on July 18, 2024 1

Can you start Hubot with HUBOT_LOG_LEVEL=debug to see what line of code the execution is getting to?

from hubot.

johnseekins-pathccm avatar johnseekins-pathccm commented on July 18, 2024 1

Sure. The batch happens here:

    const chunkSize = 10
    for (let i = 0; i < serviceNames.length; i += chunkSize) {
      let chunk = serviceNames.slice(i, i + chunkSize)
      const ignoredFromChunk = chunk.filter((service) => ignoredServices.includes(service))
      ignored.push.apply(ignored, ignoredFromChunk)
      chunk = chunk.filter((service) => !ignoredServices.includes(service))
      if (chunk.length < 1) {
        continue
      }
      let input = {
        cluster,
        services: chunk,
        include: []
      }
      let command = new DescribeServicesCommand(input)
      let serviceData
      try {
        serviceData = await ecsClient.send(command)
        serviceData = serviceData.services
      } catch (err) {
        robot.logger.error(`Request to AWS failed: ${err}`)
      }

Let's expand that a bit:

Instead of doing

    for (let i = 0; i < serviceNames.length; i++) {
      const service = serviceNames[i]
      if (ignoredServices.includes(service)) {
        ignored.push(service)
        continue
      }
      let input = {
        cluster,
        services: [service],
        include: []
      }
      let command = new DescribeServicesCommand(input)
      let serviceData
      try {
        serviceData = await ecsClient.send(command)
        serviceData = serviceData.services[0]
      } catch (err) {
        robot.logger.error(`Request to AWS failed: ${err}`)
      }

I now loop through the list of service names in groups of 10 (and do some filtering). This means that I send one request for ['service1', 'service2', ..., 'service10'] instead of separate requests for ['service1'], ['service2'], etc. That cuts the number of requests to AWS, and roughly the time spent collecting data, by a factor of 10.
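
The batching idea can be sketched in isolation. This is a minimal illustration, not the code from the script: `chunkArray` and `planRequests` are hypothetical helper names and the service names are made up. It only shows how the list is split and filtered into batches, without calling AWS.

```javascript
// Hypothetical helper: split an array into consecutive groups of `size`.
function chunkArray (items, size) {
  const chunks = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Hypothetical helper: build one `services` list per request,
// dropping ignored names and skipping batches that end up empty.
function planRequests (serviceNames, ignoredServices, size = 10) {
  const ignored = []
  const batches = []
  for (const chunk of chunkArray(serviceNames, size)) {
    ignored.push(...chunk.filter((name) => ignoredServices.includes(name)))
    const wanted = chunk.filter((name) => !ignoredServices.includes(name))
    if (wanted.length > 0) batches.push(wanted)
  }
  return { batches, ignored }
}

// Made-up data: 25 services, two of them ignored.
const names = Array.from({ length: 25 }, (_, i) => `service${i + 1}`)
const plan = planRequests(names, ['service3', 'service12'])
console.log(plan.batches.length) // 3 batches instead of 23 single-service requests
```

Each entry in `plan.batches` would become the `services` array of one request.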

I think perhaps surfacing the request timeout (somehow) would be amazing. Just so we know it's there.

joeyguerra avatar joeyguerra commented on July 18, 2024 1

I see. The new code changes

let input = {
  cluster,
  services: [service],
  include: []
}

to

let input = {
  cluster,
  services: services,
  include: []
}

where services is an array of service names without the ignored ones.

johnseekins-pathccm avatar johnseekins-pathccm commented on July 18, 2024 1

Closing this as it seems to be more an issue with timeouts within adapters. Thanks for the help!

joeyguerra avatar joeyguerra commented on July 18, 2024

Nothing in the code immediately stands out to me as the culprit. I have a few probing questions:

  • How is Hubot hosted (e.g., in Kubernetes, on an EC2 instance)?
  • How many instances of Hubot are running?
  • Does a single instance of Hubot have access to Prod, Dev and Stage?
  • What version of Hubot is running?

johnseekins-pathccm avatar johnseekins-pathccm commented on July 18, 2024

Nothing in the code immediately stands out to me as the culprit. I have a few probing questions:

* how is Hubot hosted? i.e. in kubernetes, on an EC2 instance, ????

Docker container in ECS

* how many instances of Hubot are running?

1

* Does a single instance of Hubot have access to Prod, Dev and Stage?

Yes. Read-only access. And importantly, it's to the ECS Clusters. Not separate accounts/environments/etc.

* What version of Hubot is running?

11.1

joeyguerra avatar joeyguerra commented on July 18, 2024

Does it only respond 4 times when the value is "Production"?

johnseekins-pathccm avatar johnseekins-pathccm commented on July 18, 2024

Or when I leave it to "default". So when cluster === Production.

joeyguerra avatar joeyguerra commented on July 18, 2024

What chat adapter are you using?
Does it respond 4 times with the same answer?

johnseekins-pathccm avatar johnseekins-pathccm commented on July 18, 2024

what chat adapter are you using? Does it respond 4 times with the same answer?

https://github.com/hubot-friends/hubot-slack
Yep. Exact same response, 4 times. Also takes about 20 minutes to get all four replies.

(Updated all that in the initial question, too)

joeyguerra avatar joeyguerra commented on July 18, 2024

Ok. I've seen this behavior before during development. The issue was that the code failed to acknowledge the message. In that situation, the Slack system will "retry sending the message". Here's where the code is supposed to acknowledge the message.

I also see an issue in the Slack Adapter. It's not awaiting robot.receive. I'm unsure what that will cause, but I'll have to push a fix for that.
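
To illustrate why a missing or late acknowledgement can lead to repeats, here is a minimal sketch with no Slack dependency. `handleEnvelope`, `ack`, and `slowWork` are hypothetical names, not the adapter's API. The point is only the ordering: acknowledge first, then start the slow work.

```javascript
// Hypothetical sketch: acknowledge an incoming envelope before doing slow work,
// so the sender has no reason to time out and redeliver it.
async function handleEnvelope (envelope, ack, slowWork) {
  await ack(envelope.envelope_id) // acknowledge immediately
  return slowWork(envelope) // slow work starts only after the ack
}

// Stand-ins for a real transport and a slow handler (both made up).
const order = []
const ack = async (id) => { order.push(`ack:${id}`) }
const slowWork = async (envelope) => {
  await new Promise((resolve) => setTimeout(resolve, 50))
  order.push(`done:${envelope.envelope_id}`)
}

const finished = handleEnvelope({ envelope_id: 'abc' }, ack, slowWork)
finished.then(() => console.log(order))
```

If the acknowledgement were sent only after `slowWork` resolved, a slow handler could outlast the sender's timeout, which matches the retry behavior described above.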

joeyguerra avatar joeyguerra commented on July 18, 2024

I've also added the await call in the Slack Adapter.

johnseekins-pathccm avatar johnseekins-pathccm commented on July 18, 2024

Seems to just...receive the message multiple times? To be clear...I definitely only typed it once, but this pattern (and I'm hesitant to give you full log messages...) looks like it's just...getting the message again.
Screenshot 2024-02-29 at 4 15 02 PM

johnseekins-pathccm avatar johnseekins-pathccm commented on July 18, 2024

Updated to the new adapter and I still get the duplicate messages. :(

joeyguerra avatar joeyguerra commented on July 18, 2024

Another thought is to await res.send because it's async.
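
As a rough sketch of what that looks like in a script (the `/ping/` listener and its reply are made up for illustration, and the script is written in CommonJS form as an assumption about the project setup):

```javascript
// Sketch of a Hubot script whose listener awaits the reply.
// The `/ping/` pattern and the 'pong' text are illustrative only.
const script = (robot) => {
  robot.respond(/ping/i, async (res) => {
    await res.send('pong') // wait for the adapter to finish sending
  })
}

module.exports = script
```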

johnseekins-pathccm avatar johnseekins-pathccm commented on July 18, 2024

await res.send() also doesn't help.

joeyguerra avatar joeyguerra commented on July 18, 2024

Is it odd that the envelope_id is different for each of those messages?

joeyguerra avatar joeyguerra commented on July 18, 2024

Can you run a Hubot instance locally on your machine and replicate the behavior?

xurizaemon avatar xurizaemon commented on July 18, 2024

It sounds like you might have a plausible cause, so add several grains of salt to anything in this comment :)

When I've observed Hubot get into a repeats-replies state, I had a suspicion it related to functionality such as remind-her or polling plugins (eg watch statuspage, report when status changes). It seemed like the use of setTimeout() or setInterval() could create concurrent threads. (The fact that you see it reply four times specifically suggests to me this doesn't quite fit ... but maybe there's a magic number in that system I don't know about.)

If the current best theory doesn't pan out, maybe consider which plugins could be disabled to isolate the behaviour?

johnseekins-pathccm avatar johnseekins-pathccm commented on July 18, 2024

There is a timeout in the slack response! Because this query to AWS is relatively slow, that doesn't entirely surprise me:

{"level":20,"time":1709307623089,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"Text = @hubot ecs list stale tasks"}
{"level":20,"time":1709307623089,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"Event subtype = undefined"}
{"level":20,"time":1709307623089,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"Received generic message: message"}
{"level":20,"time":1709307623090,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"Message '@hubot ecs list stale tasks' matched regex //^\\s*[@]?Hubot[:,]?\\s*(?:ecs list stale tasks( in )?([A-Za-z0-9-]+)?)/i/; listener.options = { id: null }"}
{"level":20,"time":1709307626395,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"eventHandler {
 \"envelope_id\": \"bd22596e-ee19-4201-8250-792f91fc96d7\",
 \"body\": {
    \"token\": \"<>\",
    \"team_id\": \"<>\",
    \"context_team_id\": \"<>\",
    \"context_enterprise_id\": null,
    \"api_app_id\": \"<>\",
    \"event\": {
      \"client_msg_id\": \"<>\",
      \"type\": \"message\",
      \"text\": \"<@hubot> ecs list stale tasks\",
      \"user\": \"<>\",
      \"ts\": \"1709307622.850469\",
      \"blocks\": [
        {
          \"type\": \"rich_text\",
          \"block_id\": \"5X8EE\",
          \"elements\": [
            {
              \"type\": \"rich_text_section\",
              \"elements\": [
                {
                  \"type\": \"user\",
                  \"user_id\": \"<>\"
                },
                {
                  \"type\": \"text\",
                  \"text\": \" ecs list stale tasks\"
                }
              ]
            }
          ]
        }
      ],
      \"team\": \"<>\",
      \"channel\": \"<>\",
      \"event_ts\": \"1709307622.850469\",
      \"channel_type\": \"channel\"
    },
    \"type\": \"event_callback\",
    \"event_id\": \"<>\",
    \"event_time\": 1709307622,
    \"authorizations\": [
      {
        \"enterprise_id\": null,
        \"team_id\": \"<>\",
        \"user_id\": \"<>\",
        \"is_bot\": true,
        \"is_enterprise_install\": false
      }
    ],
    \"is_ext_shared_channel\": false,
    \"event_context\": \"<>\"
  },
  \"event\": {
    \"client_msg_id\": \"<>\",
    \"type\": \"message\",
    \"text\": \"<@hubot> ecs list stale tasks\",
    \"user\": \"<>\",
    \"ts\": \"1709307622.850469\",
    \"blocks\": [
      {
        \"type\": \"rich_text\",
        \"block_id\": \"5X8EE\",
        \"elements\": [
          {
            \"type\": \"rich_text_section\",
            \"elements\": [
              {
                \"type\": \"user\",
                \"user_id\": \"<>\"
              },
              {
                \"type\": \"text\",
                \"text\": \" ecs list stale tasks\"
              }
            ]
          }
        ]
      }
    ],
    \"team\": \"<>\",
    \"channel\": \"<>\",
    \"event_ts\": \"1709307622.850469\",
    \"channel_type\": \"channel\"
  },
  \"retry_num\": 1,
  \"retry_reason\": \"timeout\",
  \"accepts_response_payload\": false
}"
}
{"level":20,"time":1709307626395,"pid":11932,"hostname":"John-Seekins-MacBook-Pro-16-inch-2023-","name":"Hubot","msg":"event {
  \"envelope_id\": \"<>\",
  \"body\": {
    \"token\": \"<>\",
    \"team_id\": \"<>\",
    \"context_team_id\": \"<>\",
    \"context_enterprise_id\": null,
    \"api_app_id\": \"<>\",
    \"event\": {
      \"client_msg_id\": \"<>\",
      \"type\": \"message\",
      \"text\": \"<@hubot> ecs list stale tasks\",
      \"user\": \"<>\",
      \"ts\": \"1709307622.850469\",
      \"blocks\": [
        {
          \"type\": \"rich_text\",
          \"block_id\": \"5X8EE\",
          \"elements\": [
            {
              \"type\": \"rich_text_section\",
              \"elements\": [
                {
                  \"type\": \"user\",
                  \"user_id\": \"<>\"
                },
                {
                  \"type\": \"text\",
                  \"text\": \" ecs list stale tasks\"
                }
              ]
            }
          ]
        }
      ],
      \"team\": \"<>\",
      \"channel\": \"<>\",
      \"event_ts\": \"1709307622.850469\",
      \"channel_type\": \"channel\"
    },
    \"type\": \"event_callback\",
    \"event_id\": \"<>\",
    \"event_time\": 1709307622,
    \"authorizations\": [
      {
        \"enterprise_id\": null,
        \"team_id\": \"<>\",
        \"user_id\": \"<>\",
        \"is_bot\": true,
        \"is_enterprise_install\": false
      }
    ],
    \"is_ext_shared_channel\": false,
    \"event_context\": \"<>\"
  },
  \"event\": {
    \"client_msg_id\": \"<>\",
    \"type\": \"message\",
    \"text\": \"<@hubot> ecs list stale tasks\",
    \"user\": \"<>\",
    \"ts\": \"1709307622.850469\",
    \"blocks\": [
      {
        \"type\": \"rich_text\",
        \"block_id\": \"5X8EE\",
        \"elements\": [
          {
            \"type\": \"rich_text_section\",
            \"elements\": [
              {
                \"type\": \"user\",
                \"user_id\": \"<>\"
              },
              {
                \"type\": \"text\",
                \"text\": \" ecs list stale tasks\"
              }
            ]
          }
        ]
      }
    ],
    \"team\": \"<>\",
    \"channel\": \"<>\",
    \"event_ts\": \"1709307622.850469\",
    \"channel_type\": \"channel\"
  },
  \"retry_num\": 1,
  \"retry_reason\": \"timeout\",
  \"accepts_response_payload\": false}
 user = <>"
}

johnseekins-pathccm avatar johnseekins-pathccm commented on July 18, 2024

It's definitely me racing a timeout! I changed the code to batch AWS requests more efficiently and I'm no longer getting duplicate messages!

Relevant code:

  /*
   * Stale Deploys
   */
  robot.respond(/ecs list stale tasks( in )?([A-Za-z0-9-]+)?/i, async res => {
    const cluster = res.match[2] || defaultCluster
    const services = await paginateServices(ecsClient, cluster)
    // no need to sort these results
    const serviceNames = services.map((x) => x.split('/')[x.split('/').length - 1])
    const staleDateShort = new Date(Date.now() - shortExpireSecs)
    const staleDateLong = new Date(Date.now() - longExpireSecs)
    const expiredDate = new Date(Date.now() - expiredSecs)
    let ignored = []
    let shortExp = []
    let longExp = []
    let exp = []
    /*
     * Collect service data
     */
    const chunkSize = 10
    for (let i = 0; i < serviceNames.length; i += chunkSize) {
      let chunk = serviceNames.slice(i, i + chunkSize)
      const ignoredFromChunk = chunk.filter((service) => ignoredServices.includes(service))
      ignored.push.apply(ignored, ignoredFromChunk)
      chunk = chunk.filter((service) => !ignoredServices.includes(service))
      if (chunk.length < 1) {
        continue
      }
      let input = {
        cluster,
        services: chunk,
        include: []
      }
      let command = new DescribeServicesCommand(input)
      let serviceData
      try {
        serviceData = await ecsClient.send(command)
        serviceData = serviceData.services
      } catch (err) {
        robot.logger.error(`Request to AWS failed: ${err}`)
        // skip this chunk: serviceData is undefined when the request fails
        continue
      }
      for (let idx = 0; idx < serviceData.length; idx++) {
        const deployDate = new Date(serviceData[idx].deployments[0].createdAt)
        // skip any service newer than our longest expiration window
        if (deployDate > staleDateLong) {
          continue
        }

        const servString = `\`${serviceData[idx].serviceName}\` (deployed ${deployDate.toISOString()})`
        if (deployDate < expiredDate) {
          exp.push(servString)
        } else if (deployDate < staleDateShort) {
          shortExp.push(servString)
        } else {
          longExp.push(servString)
        }
      }
    }

joeyguerra avatar joeyguerra commented on July 18, 2024

Well done tracking down this bug.

I don't see the code that "batches the AWS requests". Would you mind pointing it out for me? I'd love to see how you solved it.

I'm also curious if there's a move I can make to the Slack Adapter to either not let this situation happen or make it very visible that it's happening.
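
One way to make it visible, sketched under assumptions: the debug log earlier in this thread shows the redelivered envelope carrying `retry_num` and `retry_reason`, so a handler could log a warning when those fields are present. `describeRetry` is a hypothetical helper, not part of the Slack Adapter, and the envelopes below are trimmed, made-up examples shaped like that log.

```javascript
// Hypothetical helper: return a warning string for a redelivered envelope, or null.
function describeRetry (envelope) {
  const retryNum = envelope.retry_num ?? 0
  if (retryNum < 1) return null
  const reason = envelope.retry_reason ?? 'unknown'
  return `Slack redelivered envelope ${envelope.envelope_id} (retry ${retryNum}, reason: ${reason})`
}

// Made-up envelopes; only the retry fields matter here.
const first = { envelope_id: 'a1', retry_num: 0 }
const retried = { envelope_id: 'b2', retry_num: 1, retry_reason: 'timeout' }

console.log(describeRetry(first))
console.log(describeRetry(retried))
```

A non-null result could be passed to `robot.logger.error`, so a slow listener shows up in the logs instead of only as duplicate replies.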
