Giter Club home page Giter Club logo

Comments (10)

UniqMartin avatar UniqMartin commented on May 21, 2024 1

👏 Thanks for not only tackling this issue, but also for documenting the process so nicely! Reading this issue was very instructive and insightful.

from brew.

apjanke avatar apjanke commented on May 21, 2024

Looks like there are a couple other issues besides the pattern for valid XML characters.

The output from the test steps is buffered one byte at a time, because the IO streams for the IO.pipe are encoded as BINARY.

      begin
        pid = fork do
          read.close
          $stdout.reopen(write)
          $stderr.reopen(write)
          write.close
          working_dir.cd { exec(*@command) }
        end
        write.close
        while buf = read.read(1)
          if verbose
            print buf
            $stdout.flush
          end
          @output << buf
        end
      ensure
        read.close
      end

Since buf is a BINARY aka ASCII-8BIT stream, I think that means each individual byte will be coerced to a UTF-8 character, not code unit, when it's appended to @output, which is initialized with @output = "" and thus has UTF-8 encoding. So multibyte UTF-8 characters will be transcoded wrong. The output should be buffered as binary and only converted when fully collected (or converted in chunks with support for "remainders").

Then later on, long output is truncated. At this point, it's a UTF-8 string, so slice() takes offsets in character counts. But the bounds are calculated in terms of bytes, so they may be off, and it's possible that output with lots of multibyte characters could cause the offsets to fall outside the bounds of the string, erroneously producing nil instead of a substring. Or if there are lots of multibyte characters, the resulting truncated string could still be significantly longer than 1 MB, which could be a problem if that 1 MB is a hard limit.

            if output.bytesize > BYTES_IN_1_MEGABYTE
              output = "truncated output to 1MB:\n" \
                + output.slice(-BYTES_IN_1_MEGABYTE, BYTES_IN_1_MEGABYTE)
            end

Working on fixing those up as part of this.

from brew.

apjanke avatar apjanke commented on May 21, 2024

I think I have a fix for this – apjanke@eb8db20. But I don't know how to write a unit test for it, since test-bot is already a testing scaffold, and (I think) is dependent on the state of your local git repos.

Seems to work when I run it locally on my 10.9 and 10.11 boxes. And I found why the issue might be occurring for this case: one of the file names in the akka distribution has spaces and an oddball multibyte character in its name: akka-2.4.2/lib/akka/i have sp+�ces.jar. Looks like it's there to exercise encoding and escaping issues just like this.

When I look at the diff in the output between the current code and my patch, it shows up as different. But I'm having trouble pulling out exactly the difference, because the text editors and stuff I'm working with are probably doing their own invalid-character-replacement stuff.

4784c4784
<   inflating: akka-2.4.2/lib/akka/i have sp+�ces.jar
---
>   inflating: akka-2.4.2/lib/akka/i have sp+�ces.jar

Next up: finding a simple command-line XML validator that I can pass the "before" and "after" brew-test-bot.xml output to, to see if validation is fixed. Preferably one using the same Java XML libraries that Jenkins does. Seems like I ought to know one, but my XML-fu all involves developing big Java programs.

from brew.

apjanke avatar apjanke commented on May 21, 2024

Oh, ha ha: it's just xmllint, included with OS X.

It likes my changes.

[~/tmp/test-bot]
$ xmllint --noout brew-test-bot-orig.xml
brew-test-bot-orig.xml:4784: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xF1 0x63 0x65 0x73
  inflating: akka-2.4.2/lib/akka/i have sp+�ces.jar
                                           ^
[✘ ~/tmp/test-bot]
$ xmllint --noout brew-test-bot-after.xml
[~/tmp/test-bot]
$

from brew.

MikeMcQuaid avatar MikeMcQuaid commented on May 21, 2024

@apjanke Nice work on this. When you're happy with it I think it's fine to just push to master for the test-bot.rb to run and just keep an eye on it for a few jobs.

from brew.

apjanke avatar apjanke commented on May 21, 2024

Thank you!

If @xu-cheng is awake, would like to hear his feedback before pushing, since he knows test-bot.

from brew.

xu-cheng avatar xu-cheng commented on May 21, 2024

I think you can push it. Looks good to me.

from brew.

apjanke avatar apjanke commented on May 21, 2024

Thanks! I'm about to go to bed. I'll push it in the morning when I'll be around to monitor the test jobs for a while.

from brew.

apjanke avatar apjanke commented on May 21, 2024

Pushed it. I'll be watching the test-bot for the next few hours, and running some additional builds to exercise it.

from brew.

apjanke avatar apjanke commented on May 21, 2024

I pushed the revised version. Since then, some Jenkins runs have been failing with "output not available", but they're Jenkins-level errors saying "GitHub is down for maintenance", so I'm pretty sure it's a coincidence. Will keep an eye on it.

from brew.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.