Comments (10)
👏 Thanks for not only tackling this issue, but also for documenting the process so nicely! Reading this issue was very instructive and insightful.
from brew.
Looks like there are a couple other issues besides the pattern for valid XML characters.
The output from the test steps is buffered one byte at a time, because the IO streams for the IO.pipe
are encoded as BINARY
.
begin
pid = fork do
read.close
$stdout.reopen(write)
$stderr.reopen(write)
write.close
working_dir.cd { exec(*@command) }
end
write.close
while buf = read.read(1)
if verbose
print buf
$stdout.flush
end
@output << buf
end
ensure
read.close
end
Since buf
is a BINARY
aka ASCII-8BIT
stream, I think that means each individual byte will be coerced to a UTF-8
character, not code unit, when it's appended to @output
, which is initialized with @output = ""
and thus has UTF-8 encoding. So multibyte UTF-8 characters will be transcoded wrong. The output should be buffered as binary and only converted when fully collected (or converted in chunks with support for "remainders").
Then later on, long output is truncated. At this point, it's a UTF-8 string, so slice()
takes offsets in character counts. But the bounds are calculated in terms of bytes, so they may be off, and it's possible that output with lots of multibyte characters could cause the offsets to fall outside the bounds of the string, erroneously producing nil
instead of a substring. Or if there are lots of multibyte characters, the resulting truncated string could still be significantly longer than 1 MB, which could be a problem if that 1 MB is a hard limit.
if output.bytesize > BYTES_IN_1_MEGABYTE
output = "truncated output to 1MB:\n" \
+ output.slice(-BYTES_IN_1_MEGABYTE, BYTES_IN_1_MEGABYTE)
end
Working on fixing those up as part of this.
from brew.
I think I have a fix for this – apjanke@eb8db20. But I don't know how to write a unit test for it, since test-bot
is already a testing scaffold, and (I think) is dependent on the state of your local git repos.
Seems to work when I run it locally on my 10.9 and 10.11 boxes. And I found why the issue might be occurring for this case: one of the file names in the akka
distribution has spaces and an oddball multibyte character in its name: akka-2.4.2/lib/akka/i have sp+�ces.jar
. Looks like it's there to exercise encoding and escaping issues just like this.
When I look at the diff in the output between the current code and my patch, it shows up as different. But I'm having trouble pulling out exactly the difference, because the text editors and stuff I'm working with are probably doing their own invalid-character-replacement stuff.
4784c4784
< inflating: akka-2.4.2/lib/akka/i have sp+�ces.jar
---
> inflating: akka-2.4.2/lib/akka/i have sp+�ces.jar
Next up: finding a simple command-line XML validator that I can pass the "before" and "after" brew-test-bot.xml
output to, to see if validation is fixed. Preferably one using the same Java XML libraries that Jenkins does. Seems like I ought to know one, but my XML-fu all involves developing big Java programs.
from brew.
Oh, ha ha: it's just xmllint
, included with OS X.
It likes my changes.
[~/tmp/test-bot]
$ xmllint --noout brew-test-bot-orig.xml
brew-test-bot-orig.xml:4784: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xF1 0x63 0x65 0x73
inflating: akka-2.4.2/lib/akka/i have sp+�ces.jar
^
[✘ ~/tmp/test-bot]
$ xmllint --noout brew-test-bot-after.xml
[~/tmp/test-bot]
$
from brew.
@apjanke Nice work on this. When you're happy with it I think it's fine to just push to master
for the test-bot.rb
to run and just keep an eye on it for a few jobs.
from brew.
Thank you!
If @xu-cheng is awake, would like to hear his feedback before pushing, since he knows test-bot
.
from brew.
I think you can push it. Looks good to me.
from brew.
Thanks! I'm about to go to bed. I'll push it in the morning when I'll be around to monitor the test jobs for a while.
from brew.
Pushed it. I'll be watching the test-bot for the next few hours, and running some additional builds to exercise it.
from brew.
I pushed the revised version. Since then, some Jenkins runs have been failing with "output not available", but they're Jenkins-level errors saying "GitHub is down for maintenance", so I'm pretty sure it's a coincidence. Will keep an eye on it.
from brew.
Related Issues (20)
- Support GitHub App installation access tokens HOT 2
- Command "brew update" shows "/dev/stdin: No such file or directory - /tmp/sh-thd-2853475760 (deleted) (Errno::ENOENT)" HOT 8
- How should we store installed cask information? HOT 18
- Integrating attestation verification into `brew install` (and `brew verify`) HOT 17
- `brew bump --open-pr` fails to fetch repology data HOT 5
- Add a RuboCop rule for sorting EnvConfig keys
- Make `update <formula>` actually `upgrade formula` HOT 2
- `brew info` should always print whether formula installed HOT 5
- `cask upgrade` will be downloaded and installed repeatedly HOT 10
- `brew log` regression after port to `abstract_command` HOT 1
- option to install package to home directory or a custom directory HOT 1
- Running the macOS installer changes permissions on `~/.gitconfig` HOT 5
- Introduce additional versioning information for casks HOT 6
- Using kickstart? HOT 4
- Add `brew asdf-sync`; equivalent to `brew nodenv-sync` / `brew pyenv-sync` / `brew rbenv-sync` HOT 2
- simple command to list all user installed items (not installed only as dependency) HOT 4
- Add new url scheme for the GitDownloadStrategy
- `eval "$(/usr/bin/env brew shellenv)"` causes duplicate `path`, `man path`, `info path` HOT 4
- `brew tests` should pass with `homebrew/core` untapped HOT 4
- Remove `FormulaTextAuditor`. HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from brew.