Comments (18)
Okay, glad that PYTHONUTF8=1 solves the immediate problem.
I think I need to modify flawfinder to note this as an option - so don't close this yet.
Take care!
from flawfinder.
This looks like the text isn't actually UTF-8 in the file being analyzed. Have you verified that the file being examined actually complies with UTF-8?
If it doesn't comply with UTF-8 (seems likely), see the documentation on various options. Sadly, Python3 doesn't provide good tools for handling non-UTF-8 text files.
from flawfinder.
Notepad++ thinks the file is UTF-8.
VS Code thinks the file is UTF-8.
Notepad thinks the file is UTF-8.
I think the file does comply with UTF-8.
from flawfinder.
Please run "iconv" or some other tool that does byte-by-byte checking.
I think the editors just look at a few lines, and they may accept badly formatted data anyway. Python3 is extremely picky and immediately fails any time the text isn't perfect. Workarounds are documented.
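For anyone without iconv at hand, a strict byte-by-byte check can be sketched in a few lines of Python. This is only an illustration; the helper name `first_invalid_utf8_offset` is made up here and is not part of flawfinder.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        # errors="strict" rejects any byte sequence that is not well-formed UTF-8
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical usage; substitute the path of the file under suspicion.
# with open("cmCTestBuildHandler.cxx", "rb") as fh:
#     print(first_invalid_utf8_offset(fh.read()))
```

Reading the file in binary mode matters: it bypasses whatever default text encoding the platform applies, so the result reflects only the bytes on disk.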
from flawfinder.
Also: if the character is just a literal 0x81 byte, that is not valid UTF-8.
from flawfinder.
My bad, it's character U+0441, I have updated the title.
Tried "iconv", no difference between original file and the converted one.
from flawfinder.
Huh. That doesn't make sense to me at all. The sequence 0xd1 0x81 seems like perfectly fine UTF-8 to me, it shouldn't give you that error message.
So we agree it shouldn't happen. But clearly it's happening anyway :-).
Can you send me a URL for a mishandled file so I can just use curl/wget to get it? Ideally make the test file as small as possible while still causing the problem. I want to reproduce the problem with the smallest possible failing test. If I can reproduce it, I should be able to fix it, or at least explain it & suggest a workaround.
from flawfinder.
Oh, weird thought: This is on Windows. Is it possible it's actually being stored as UTF-16? I doubt that's what is going on, but I'm grasping at straws and maybe this is the straw I needed :-).
from flawfinder.
Here is the link to that file: https://github.com/Kitware/CMake/blob/master/Source/CTest/cmCTestBuildHandler.cxx
I am not sure how to check if it's stored as UTF-16, I mean it's just a plain text file, I don't see any header when viewing it in hex.
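One rough way to check is to look at the first bytes for a byte order mark (BOM) and to count NUL bytes, since UTF-16 text that is mostly ASCII has a zero byte next to nearly every character. The sketch below uses only the standard library; the function names are invented for illustration and the NUL count is a heuristic, not proof.

```python
import codecs


def sniff_bom(data: bytes):
    """Return a codec name if data starts with a known BOM, else None."""
    # UTF-32 is tested before UTF-16 because the UTF-32-LE BOM
    # begins with the same two bytes as the UTF-16-LE BOM.
    candidates = (
        ("utf-8-sig", codecs.BOM_UTF8),
        ("utf-32-le", codecs.BOM_UTF32_LE),
        ("utf-32-be", codecs.BOM_UTF32_BE),
        ("utf-16-le", codecs.BOM_UTF16_LE),
        ("utf-16-be", codecs.BOM_UTF16_BE),
    )
    for name, bom in candidates:
        if data.startswith(bom):
            return name
    return None


def nul_fraction(data: bytes):
    """Fraction of bytes that are NUL; near 0.5 hints at UTF-16 ASCII text."""
    return data.count(0) / len(data) if data else 0.0
```

A file with no BOM and no NUL bytes is very unlikely to be UTF-16.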
from flawfinder.
This character is what you need to reproduce the issue.
from flawfinder.
You posted an image showing the file. However, I need the file contents itself. Can you post it somewhere (ideally shortened) & share the URL to it? A small snippet would be best for my purposes.
from flawfinder.
Oh, whups, you did provide a link. Thank you.
I ran it on MacOS and it worked just fine. Below is the output.
Ugh, it seems to be a Windows 10 specific thing. I don't have any of those platforms.
I want to fix it, but I have to be able to reproduce it. Any ideas?
python3 ./flawfinder.py cmCTestBuildHandler.cxx
Flawfinder version 2.0.19, (C) 2001-2019 David A. Wheeler.
Number of rules (primarily dangerous function names) in C/C++ ruleset: 222
Examining cmCTestBuildHandler.cxx
FINAL RESULTS:
cmCTestBuildHandler.cxx:6204: [2] (misc) open:
Check when opening files - can an attacker redirect it (via symlinks),
force the opening of special file type (e.g., device files), move things
around to create a race condition, control its ancestors, or change its
contents? (CWE-362).
ANALYSIS SUMMARY:
Hits = 1
Lines analyzed = 6240 in approximately 0.28 seconds (22118 lines/second)
Physical Source Lines of Code (SLOC) = 364
Hits@level = [0] 0 [1] 0 [2] 1 [3] 0 [4] 0 [5] 0
Hits@level+ = [0+] 1 [1+] 1 [2+] 1 [3+] 0 [4+] 0 [5+] 0
Hits/KSLOC@level+ = [0+] 2.74725 [1+] 2.74725 [2+] 2.74725 [3+] 0 [4+] 0 [5+] 0
Minimum risk level = 1
Not every hit is necessarily a security vulnerability.
You can inhibit a report by adding a comment in this form:
// flawfinder: ignore
Make *sure* it's a false positive!
You can use the option --neverignore to show these.
There may be other security vulnerabilities; review your code!
See 'Secure Programming HOWTO'
(https://dwheeler.com/secure-programs) for more information.
from flawfinder.
Yeah, I couldn't repro it using WSL Ubuntu. Looks like this issue is not easy to tackle. I wonder if detecting the encoding using chardet before opening the file is an acceptable solution?
https://stackoverflow.com/questions/36303919/python-3-0-open-default-encoding
https://peps.python.org/pep-0597/
https://chardet.readthedocs.io/en/latest/usage.html#example-using-the-detect-function
from flawfinder.
Hmm, it appears that on Windows the default encoding isn't what files use. That seems like a bug in the Windows implementation. I'd prefer flawfinder to NOT always assume UTF-8, because some systems don't use UTF-8. See: https://peps.python.org/pep-0597/ - it seems the "solution" is that people writing code are supposed to magically know what the file encoding is from users. That's ridiculous. I have no magic available. I need users to tell me what the encoding is, and use the default if they don't specify something.
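The policy described here (use what the user says, otherwise the platform default) could look roughly like the following. This is a sketch of the idea only, not flawfinder's actual code; `resolve_encoding` and `open_source_file` are hypothetical names.

```python
import locale


def resolve_encoding(user_choice=None):
    """Prefer an explicit user-supplied encoding, else the platform default."""
    if user_choice:
        return user_choice
    # False means: report the preferred encoding without changing the locale.
    return locale.getpreferredencoding(False)


def open_source_file(path, user_choice=None):
    """Open a source file for reading using the resolved encoding."""
    return open(path, "r", encoding=resolve_encoding(user_choice))
```

The design choice is that nothing is guessed from file contents: a wrong default produces a visible decode error rather than silently misread text.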
from flawfinder.
Hmm, it appears you're trying to process UTF-8 files, but the Windows default is NOT UTF-8, and that's the mismatch.
Try this:
python3 -X utf8 flawfinder.py ....
or set PYTHONUTF8 to 1. In a shell do this.
On Linux / macOS:
export PYTHONUTF8=1
On Windows (cmd.exe, which does not treat "#" as a comment, so keep the line bare):
set PYTHONUTF8=1
... then run flawfinder.
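A minimal sketch of the mismatch, assuming the Windows default in question is cp1252 (the thread does not state which code page was active): the two bytes 0xd1 0x81 are valid UTF-8 for U+0441, but cp1252 has no character assigned to 0x81, so decoding with that code page fails on exactly that byte.

```python
# U+0441 (Cyrillic small letter es) is two bytes in UTF-8.
raw = "\u0441".encode("utf-8")


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(raw)                         # the bytes 0xd1 0x81
print(decodes_as(raw, "utf-8"))    # valid UTF-8
print(decodes_as(raw, "cp1252"))   # 0x81 is unassigned in cp1252 (assumed default)
```

With UTF-8 mode enabled, Python stops using the code page as its default text encoding, which is consistent with the error going away.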
from flawfinder.
That CMake repo has Tests/RunCMake/CommandLine/cmake_depends/test_UTF-16LE.h in UTF-16 encoding; if I force UTF-8 with set PYTHONUTF8=1, I get an encoding error on that file.
Is there a way to exclude certain folders that contain non-product code, i.e., in this case the Tests folder?
from flawfinder.
There isn't an --exclude option, though that's not a bad idea. However, you can expressly list just the files and/or directories to scan, so just be more explicit about it.
However: can you tell me if PYTHONUTF8=1 resolves the problem with cmCTestBuildHandler.cxx ? If it does, then we're at least making progress.
Flawfinder doesn't have a way of scanning different files with different encodings. Most software developers wouldn't want to do that. If you have to do that, I suggest making a copy, changing all the source files to some consistent encoding, and then analyzing them.
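Listing directories explicitly can be scripted. The sketch below builds the argument list from the top-level directories of a checkout while skipping named ones; `scan_targets` is a hypothetical helper, and the "Tests" name is taken from the CMake layout discussed in this thread.

```python
import os


def scan_targets(root, excluded):
    """Return top-level directories under root, skipping excluded names."""
    targets = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # Directories only; top-level files and hidden dirs are ignored here.
        if not os.path.isdir(path) or name.startswith("."):
            continue
        if name in excluded:
            continue
        targets.append(path)
    return targets


# Hypothetical usage: build a command line that leaves out the Tests folder.
# command = ["flawfinder", *scan_targets("CMake", excluded={"Tests"})]
```

This only filters at the top level; a nested directory with an odd encoding would still need to be handled separately.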
from flawfinder.
Yes, PYTHONUTF8=1 resolves the problem with cmCTestBuildHandler.cxx, thanks. The suggestion to make a copy and have a consistent encoding would not work for me because test_UTF-16LE.h is meant to validate that CMake can handle UTF-16, just like compilers can handle inconsistent encoding of source files.
I guess I can work around it by analyzing only the Source folder instead of the entire repo. Thanks a lot for your help.
from flawfinder.