Comments (29)
Did you really want BGRA? ARGB is more likely what you want. ARGB means A is in
the high byte of a register, or B, G, R, A order in memory.
Is this 480x270? Is the buffer at least 518400 bytes, with a stride of 1920?
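To make the byte-order point concrete, here is a minimal sketch of how an "ARGB" word lays out in memory when stored little-endian, plus the buffer-size check for a 480x270 frame. The helper names (PackArgbWord, StoreLittleEndian, MinPackedFrameBytes) are made up for illustration and are not libyuv functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack one pixel as a 32-bit ARGB word: A in the high byte, B in the low byte.
// (Hypothetical helper, not part of libyuv.)
inline std::uint32_t PackArgbWord(std::uint8_t a, std::uint8_t r,
                                  std::uint8_t g, std::uint8_t b) {
  return (static_cast<std::uint32_t>(a) << 24) |
         (static_cast<std::uint32_t>(r) << 16) |
         (static_cast<std::uint32_t>(g) << 8) |
         static_cast<std::uint32_t>(b);
}

// Store the word least-significant byte first, as a little-endian CPU would.
// Written with shifts so the result does not depend on the host byte order.
inline void StoreLittleEndian(std::uint32_t word, std::uint8_t out[4]) {
  for (int i = 0; i < 4; ++i) {
    out[i] = static_cast<std::uint8_t>((word >> (8 * i)) & 0xff);
  }
}

// Minimum number of bytes a packed frame needs for a given stride and height.
inline std::size_t MinPackedFrameBytes(int stride_bytes, int height) {
  assert(stride_bytes >= 0 && height >= 0);
  return static_cast<std::size_t>(stride_bytes) *
         static_cast<std::size_t>(height);
}
```

With a stride of 1920 bytes and 270 rows, MinPackedFrameBytes gives the 518400 bytes mentioned above, and the stored bytes of an ARGB word come out in B, G, R, A order.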
Original comment by [email protected]
on 12 Mar 2012 at 5:54
- Changed state: Accepted
from libyuv.
I am capturing in 480x360 32BGRA. The length in bytes is 691208. Stride is
1920. So the conversion to I420 is OK.
I think you are correct that I actually want ARGB to render to screen;
however, when I try that it crashes in I420ToARGBRow_NEON.
The I420 frame is 259200 bytes in length: 480 x 360 with a stride of 544.
The arguments to I420ToARGBRow_NEON are:
y_buf const uint8 * 0x46e9000
u_buf const uint8 * 0x4713300
v_buf const uint8 * 0x471dbc0
rgb_buf uint8 * 0x4729000
width int 480
Original comment by [email protected]
on 12 Mar 2012 at 5:57
from libyuv.
If you capture 480x360 I420, you should have 480*360=172800 bytes for the Y
plane, 43200 bytes for U plane and 43200 bytes for V plane.
0x4713300 - 0x46E9000 = 0x2A300 = 172800. Looks good.
0x471dbc0 - 0x4713300 = 0xA8C0 = 43200. Also good.
You should be passing 480 for y_stride and 240 for u_stride and v_stride.
Recent versions take a dst_sample_stride = 480 * 4 = 1920. Sounds good.
The buffer should be 480 * 360 * 4 = 691200 bytes. You provided 8 extra bytes, but your
pointer is aligned, so I don't see an issue.
Try the C version instead, to see if it's a Neon-specific issue:
libyuv::MaskCpuFlags(~libyuv::kCpuHasNEON);
Update to a recent version; 214 is preferred. The old ConvertFromI420 didn't
support stride. Or try calling I420ToARGB() directly.
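The plane arithmetic in this comment can be reproduced with a short sketch. It assumes tightly packed planes (stride equal to width for Y and half width for U and V); the struct and function names are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Sizes and strides of a tightly packed I420 frame.
// (Hypothetical helper type, not part of libyuv.)
struct I420Layout {
  int y_stride;
  int uv_stride;
  std::size_t y_size;
  std::size_t u_size;
  std::size_t v_size;
  std::size_t total() const { return y_size + u_size + v_size; }
};

// Compute plane sizes for a frame whose rows have no padding.
// Chroma planes are subsampled by 2 in both directions, rounding up.
inline I420Layout ComputeI420Layout(int width, int height) {
  assert(width > 0 && height > 0);
  const int half_width = (width + 1) / 2;
  const int half_height = (height + 1) / 2;
  I420Layout layout;
  layout.y_stride = width;
  layout.uv_stride = half_width;
  layout.y_size = static_cast<std::size_t>(width) * height;
  layout.u_size = static_cast<std::size_t>(half_width) * half_height;
  layout.v_size = layout.u_size;
  return layout;
}

// Bytes needed for a 4-byte-per-pixel destination with no row padding.
inline std::size_t ArgbBufferSize(int width, int height) {
  assert(width > 0 && height > 0);
  return static_cast<std::size_t>(width) * 4 * height;
}
```

For 480x360 this yields 172800 bytes of Y, 43200 bytes each of U and V (259200 in total), and a 691200-byte ARGB destination, which matches the pointer differences worked out above.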
Original comment by [email protected]
on 15 Mar 2012 at 10:11
from libyuv.
I tried the C version and it works. Slow but it works.
Original comment by [email protected]
on 16 Mar 2012 at 12:07
from libyuv.
Do you know what version of libyuv you're on?
Original comment by [email protected]
on 16 Mar 2012 at 2:07
from libyuv.
The latest from SVN.
Original comment by [email protected]
on 17 Mar 2012 at 5:52
from libyuv.
Looking at the code, I don't immediately spot any issues.
See line 24 in
http://code.google.com/p/libyuv/source/browse/trunk/source/row_neon.cc
It depends on the ability to vector load into the 2nd word of a register. I'm
wondering whether the clang you use assembles that differently from gcc. Is there
a way you can do a gcc build to confirm?
Original comment by [email protected]
on 20 Mar 2012 at 8:54
from libyuv.
Here are the compilers supported by Xcode 4.3.1:
(Default) Apple LLVM compiler 3.1
LLVM GCC 4.2
I've done a build using LLVM GCC 4.2 and I get EXC_BAD_ACCESS at line 86 of
row_neon.cc.
y_buf uint8 * 0x56db000
u_buf uint8 * 0x5705300
v_buf uint8 * 0x570fbc0
rgb_buf uint8 * 0x571b000
width int 480
Original comment by [email protected]
on 22 Mar 2012 at 11:08
from libyuv.
Line 86 is );
so it's just saying the crash is somewhere in that assembly function. Do you get a
disassembly of the instruction that crashed?
Is it one of these?
vld1.u32 {d2[0]}, [%1]!
vld1.u32 {d2[1]}, [%2]!
The idea is to load 4 bytes of U and 4 bytes of V and post-increment.
If the assembler thinks the load is 8 or 16 bytes, it might post-increment off the
end of the buffers.
As a test you could remove the !
The color would be wrong, but it would confirm whether it's reading/incrementing too
much.
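The over-read hypothesis can be illustrated with a scalar model of the row loop. This is only a sketch of the pointer bookkeeping (8 pixels per iteration, with a configurable U/V post-increment); ModelRowLoop and OverrunsChroma are hypothetical names, and the 8- and 16-byte steps represent the suspected mis-assembly, not confirmed behavior.

```cpp
#include <cassert>

// Bytes consumed from each plane by one call to the row function.
struct RowConsumption {
  long y_bytes;
  long u_bytes;
  long v_bytes;
};

// Model of the loop structure: each iteration handles 8 pixels, advances the
// Y pointer by 8 and the U and V pointers by uv_step bytes each.
// uv_step == 4 is the intended behavior; 8 or 16 models a wider post-increment.
inline RowConsumption ModelRowLoop(int width, int uv_step) {
  assert(width > 0 && uv_step > 0);
  RowConsumption c = {0, 0, 0};
  int remaining = width;
  do {
    c.y_bytes += 8;
    c.u_bytes += uv_step;
    c.v_bytes += uv_step;
    remaining -= 8;          // mirrors "subs %4, %4, #8"
  } while (remaining > 0);   // mirrors "bgt 1b"
  return c;
}

// True if the loop would step past the end of a chroma row that holds
// (width + 1) / 2 bytes.
inline bool OverrunsChroma(int width, int uv_step) {
  return ModelRowLoop(width, uv_step).u_bytes > (width + 1) / 2;
}
```

With width 480 and a 4-byte step the model consumes exactly 240 chroma bytes per row; with an 8-byte step it would consume 480, which is the kind of overrun the test of removing the ! is meant to detect.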
Any Neon / clang experts out there?
This is the function in question:
#define YUVTORGB \
"vld1.u8 {d0}, [%0]! \n" \
"vld1.u32 {d2[0]}, [%1]! \n" \
"vld1.u32 {d2[1]}, [%2]! \n" \
"veor.u8 d2, d26 \n"/*subtract 128 from u and v*/\
"vmull.s8 q8, d2, d24 \n"/* u/v B/R component */\
"vmull.s8 q9, d2, d25 \n"/* u/v G component */\
"vmov.u8 d1, #0 \n"/* split odd/even y apart */\
"vtrn.u8 d0, d1 \n" \
"vsub.s16 q0, q0, q15 \n"/* offset y */\
"vmul.s16 q0, q0, q14 \n" \
"vadd.s16 d18, d19 \n" \
"vqadd.s16 d20, d0, d16 \n" \
"vqadd.s16 d21, d1, d16 \n" \
"vqadd.s16 d22, d0, d17 \n" \
"vqadd.s16 d23, d1, d17 \n" \
"vqadd.s16 d16, d0, d18 \n" \
"vqadd.s16 d17, d1, d18 \n" \
"vqrshrun.s16 d0, q10, #6 \n" \
"vqrshrun.s16 d1, q11, #6 \n" \
"vqrshrun.s16 d2, q8, #6 \n" \
"vmovl.u8 q10, d0 \n"/* set up for reinterleave*/\
"vmovl.u8 q11, d1 \n" \
"vmovl.u8 q8, d2 \n" \
"vtrn.u8 d20, d21 \n" \
"vtrn.u8 d22, d23 \n" \
"vtrn.u8 d16, d17 \n" \

void I420ToARGBRow_NEON(const uint8* y_buf,
                        const uint8* u_buf,
                        const uint8* v_buf,
                        uint8* rgb_buf,
                        int width) {
asm volatile(
"vld1.u8 {d24}, [%5] \n"
"vld1.u8 {d25}, [%6] \n"
"vmov.u8 d26, #128 \n"
"vmov.u16 q14, #74 \n"
"vmov.u16 q15, #16 \n"
"1: \n"
YUVTORGB
"vmov.u8 d21, d16 \n"
"vmov.u8 d23, #255 \n"
"vst4.u8 {d20, d21, d22, d23}, [%3]! \n"
"subs %4, %4, #8 \n"
"bgt 1b \n"
: "+r"(y_buf), // %0
"+r"(u_buf), // %1
"+r"(v_buf), // %2
"+r"(rgb_buf), // %3
"+r"(width) // %4
: "r"(kUVToRB), // %5
"r"(kUVToG) // %6
: "cc", "memory", "q0", "q1", "q2", "q3", "q8", "q9",
"q10", "q11", "q12", "q13", "q14", "q15"
);
}
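For readers who do not follow Neon, the per-pixel arithmetic of this routine can be approximated in scalar code. The constants 74, 16, 128 and the shift by 6 come from the assembly above; the U/V coefficients are NOT shown in this thread, so the values below (127, -25, -52, 102) are an assumption for illustration, and ConvertPixel is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

struct ArgbPixel {
  std::uint8_t b, g, r, a;  // memory order of libyuv-style ARGB
};

// From the assembly: Y is offset by 16 and scaled by 74.
const int kYScale = 74;
// ASSUMED contents of kUVToRB / kUVToG; the thread does not list them.
const int kUToB = 127;
const int kUToG = -25;
const int kVToG = -52;
const int kVToR = 102;

// Rounding shift right by 6 with saturation to 0..255, like vqrshrun #6.
inline std::uint8_t RoundShiftClamp(int value) {
  if (value <= 0) return 0;           // avoids shifting a negative number
  const int shifted = (value + 32) >> 6;
  return static_cast<std::uint8_t>(shifted > 255 ? 255 : shifted);
}

// Scalar approximation of one pixel of the conversion.
inline ArgbPixel ConvertPixel(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
  const int y1 = (static_cast<int>(y) - 16) * kYScale;
  const int us = static_cast<int>(u) - 128;  // the veor with 128 in the asm
  const int vs = static_cast<int>(v) - 128;
  ArgbPixel out;
  out.b = RoundShiftClamp(y1 + us * kUToB);
  out.g = RoundShiftClamp(y1 + us * kUToG + vs * kVToG);
  out.r = RoundShiftClamp(y1 + vs * kVToR);
  out.a = 255;                               // the vmov.u8 d23, #255
  return out;
}
```

The model ignores the 16-bit saturation of the intermediate adds, which should not change the result after the final clamp. It is meant as a reading aid for the assembly, not as a replacement for the C row functions in libyuv.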
Original comment by [email protected]
on 22 Mar 2012 at 11:53
from libyuv.
Would it be possible to get your object file? (attach a file)
I'd like to check the disassembly (otool -tV) matches the source.
Original comment by [email protected]
on 23 Mar 2012 at 12:09
from libyuv.
Unix Executable File
Original comment by [email protected]
on 23 Mar 2012 at 6:05
Attachments:
from libyuv.
I was unable to obtain a disassembly.
Original comment by [email protected]
on 24 Mar 2012 at 12:11
from libyuv.
Disassembling your binary worked, but it contains no Neon?
Are you able to point to the assembly instruction that failed?
Original comment by [email protected]
on 28 Mar 2012 at 11:50
from libyuv.
My apologies but that was an older binary. I've just built this one.
Original comment by [email protected]
on 29 Mar 2012 at 2:20
Attachments:
from libyuv.
The latest binary EXC_BAD_ACCESS lands at row_neon.cc line 64 using the latest
source from svn.
y_buf const uint8 * 0x528a000
u_buf const uint8 * 0x52a2c00
v_buf const uint8 * 0x52a8f00
rgb_buf uint8 * 0x52b0000
width int 352
I was unable to obtain a disassembly.
Original comment by [email protected]
on 29 Mar 2012 at 2:25
from libyuv.
arm7? Does your iPhone/iPod have NEON?
An iPhone 3GS or an iPad is required.
The same libyuv library will work without it, but you need to disable Neon.
Original comment by [email protected]
on 29 Mar 2012 at 2:37
from libyuv.
Yes, it is ARM7. Yes, it is for iPhone 4 and iPad 2. I'm attaching a new binary
in which I've disabled NEON:
//cpu_info_ = kCpuHasNEON | kCpuInitialized;
cpu_info_ = kCpuInitialized;
So the previous binary is the same as this one except that it has NEON support
built in. The first binary is problematic and the second works.
Original comment by [email protected]
on 29 Mar 2012 at 2:52
Attachments:
from libyuv.
Is there anything I can do to provide more information to help resolve this?
Interestingly ConvertToI420 and I420ToBGRA are the most expensive calls in my
iOS test app. Oddly VP8(webm) encoding and decoding are less expensive.
I have to assume this is because of the lack of use of NEON.
Original comment by [email protected]
on 2 Apr 2012 at 4:24
from libyuv.
Hi J... sorry for slow response.
> Re: Is there anything I can do to provide more information to help resolve this?
Yes, the exact assembly instruction that fails and a register dump would help.
> Re: Interestingly ConvertToI420 and I420ToBGRA are the most expensive calls in my iOS test app. Oddly VP8(webm) encoding and decoding are less expensive.
This isn't entirely unexpected. In my apps, color conversion can be the single
most expensive operation. As a whole, encoding takes more time, but when you
profile, the conversions are the slowest and least useful functions. That is
why it's good to optimize them, but better to avoid conversions entirely. Stick
with YUV for capture and rendering if you can.
> Re: I have to assume this is because of the lack of use of NEON.
No, but that's why I asked. Any iPad, or iPhone 3GS or better, has Neon.
Since your CPU has Neon, the problem is elsewhere.
I asked for your binary because you're using a different compiler, and I've
seen issues with assembly between gcc and clang on a couple of low-level details:
1. The @ notation for alignment
   Solution: I've avoided it
   Downside: potential performance loss, since aligned loads may be faster
2. Passing of multimedia arrays from C to inline assembly
   Solution: typedef for sized vectors
   Downside: none
3. The [] notation for accessing register offsets
   Unknown
Original comment by [email protected]
on 14 Apr 2012 at 1:01
from libyuv.
Because I've heard rumors of people using libyuv with NEON successfully on iOS,
and because I don't believe maintaining the conversion routines is of any benefit,
I'm altering my code to capture in YUV and render in YUV so as to keep
conversion to a minimum.
This effectively means I will only require libyuv for rotation operations.
I really appreciate your help!
Original comment by [email protected]
on 16 Apr 2012 at 6:34
from libyuv.
Reproduced.
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 7f7f7f7f
Stack frame #00 pc 000cb67c
/data/data/org.webrtc.videoengineapp/lib/libwebrtc-video-demo-jni.so: Routine
I420ToARGBRow_NEON in third_party/libyuv/source/row_neon.cc:86
Stack frame #01 pc 000c75e4
/data/data/org.webrtc.videoengineapp/lib/libwebrtc-video-demo-jni.so: Routine
I420ToRGB565 in third_party/libyuv/source/convert_from.cc:950
"vld1.u8 {d24}, [%5] \n"
: "r"(kUVToRB), // %5
It was passing the vector itself rather than its address, and the load then
treated that value as an address.
This also affected rotate.
r246 has a fix.
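The fault address 7f7f7f7f is what one would expect if the first bytes of the constant ended up in the register instead of a pointer to it. The sketch below shows that effect without any inline assembly. The Vec8 type, the table contents (four leading bytes of 127) and the function names are assumptions chosen to be consistent with the crash report, not code copied from libyuv.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A 16-byte, 16-byte-aligned vector constant, standing in for libyuv's
// sized vector typedef. (Hypothetical type for this example.)
struct alignas(16) Vec8 {
  std::int8_t lanes[16];
};

// ASSUMED table contents; only the leading 127s matter for this illustration.
static const Vec8 kExampleUVToRB = {
    {127, 127, 127, 127, 102, 102, 102, 102, 0, 0, 0, 0, 0, 0, 0, 0}};

// What a 32-bit register would hold if the vector's first four bytes were
// handed to the asm by value (the buggy "r"(kUVToRB) form).
inline std::uint32_t FirstWordOfValue(const Vec8& v) {
  std::uint32_t word = 0;
  std::memcpy(&word, v.lanes, sizeof(word));
  return word;
}

// What the register should hold instead: the address of the constant
// (the "r"(&kUVToRB) form described in the fix).
inline const void* AddressForLoad(const Vec8& v) {
  return &v;
}
```

Here FirstWordOfValue(kExampleUVToRB) evaluates to 0x7f7f7f7f, the same value as the reported fault address, while AddressForLoad returns a valid, 16-byte-aligned pointer that a vld1 could safely dereference.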
Original comment by [email protected]
on 21 Apr 2012 at 1:08
from libyuv.
Original comment by [email protected]
on 21 Apr 2012 at 1:18
- Changed state: Fixed
from libyuv.
A related issue was found in scale. Vector constants are treated as values, so
& is needed to get their address for a vld.
On x86, "m" is used to refer to constants, and the instructions operate
directly on memory.
r247 fixes the issue in scale for Neon.
Original comment by [email protected]
on 22 Apr 2012 at 3:06
- Changed state: Started
from libyuv.
r247 checked into chromium.
Original comment by [email protected]
on 25 Apr 2012 at 2:59
from libyuv.
Original comment by [email protected]
on 1 May 2012 at 2:58
- Changed state: Fixed
from libyuv.