Giter Club home page Giter Club logo

Comments (29)

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Did you really want BGRA?  ARGB is more likely what you want?  - ARGB is a in 
the high byte of a register, or b,g,r,a in memory.

Is this 480x270?  Is the buffer at least 518400 bytes, and stride 1920?

Original comment by [email protected] on 12 Mar 2012 at 5:54

  • Changed state: Accepted

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
[deleted comment]

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
[deleted comment]

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
I am capturing in 480x360 32BGRA. The length in bytes is 691208.  Stride is 
1920. So the conversion to I420 is OK.

I think that you are correct that I actually want ARGB to render to screen 
however I try that and it crashes in I420ToARGBRow_NEON.

The I420 frame is 259200 bytes in length. 480 x 360 with a stride of 544.

The variables to I420ToARGBRow_NEON are:

y_buf   const uint8 *   0x46e9000
u_buf   const uint8 *   0x4713300
v_buf   const uint8 *   0x471dbc0
rgb_buf uint8 * 0x4729000
width   int 480

Original comment by [email protected] on 12 Mar 2012 at 5:57

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
[deleted comment]

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
If you capture 480x360 I420, you should have 480*360=172800 bytes for the Y 
plane, 43200 bytes for U plane and 43200 bytes for V plane.
0x4713300 - 0x46E9000 = 0x2A300 = 172800.  Looks good.
0x471dbc0 - 0x4713300 = 0xA8C0 = 43200.  Also good.
You should be passing 480 for y_stride and 240 for u_stride and v_stride.
Recent versions take a dst_sample_stride=480*4 = 1920.  Sounds good.
Buffer should be 480 * 360 * 4 = 691200.  You provided 8 extra bytes, but your 
pointer is aligned, so I don't see an issue.

Try the C version instead, to see if its a Neon specific issue?
libyuv::MaskCpuFlags(~libyuv::kCpuHasNEON);

Update to a recent version - 214 preferred.  The old ConvertFromI420 didn't 
support stride.  Or try calling I420ToARGB() directly.

Original comment by [email protected] on 15 Mar 2012 at 10:11

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
I tried the C version and it works. Slow but it works.

Original comment by [email protected] on 16 Mar 2012 at 12:07

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Do you know what version of libyuv you're on?

Original comment by [email protected] on 16 Mar 2012 at 2:07

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
The latest from SVN.

Original comment by [email protected] on 17 Mar 2012 at 5:52

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Looking at the code, I don't immediately spot any issues.
At line 24 in 
http://code.google.com/p/libyuv/source/browse/trunk/source/row_neon.cc
It depends on the ability to vector load into the 2nd word of a register.  I'm 
wondering if clang that you use assembles that differently than gcc?  Is there 
a way you can do a gcc build to confirm?

Original comment by [email protected] on 20 Mar 2012 at 8:54

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Here's the supported compilers as used by Xcode 4.3.1.

(Default) Apple LLVM compiler 3.1
LLVM GCC 4.2

I've done a build using LLVM GCC 4.2 and I get EXC_BAD_ACCESS at line 86 of 
row_neon.cc.

y_buf   uint8 * 0x56db000
u_buf   uint8 * 0x5705300
v_buf   uint8 * 0x570fbc0
rgb_buf uint8 * 0x571b000
width   int 480

Original comment by [email protected] on 22 Mar 2012 at 11:08

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
line 86 is   );
... its just saying its somewhere in that assembly function.  Do you get a 
disassembly of the instruction that crashed?
Is it these ones?
vld1.u32   {d2[0]}, [%1]! 
vld1.u32   {d2[1]}, [%2]!

The idea is to load 4 bytes of U and 4 bytes of V and post increment.
If it thinks its 8 or 16, it might post increment off the end of the buffers.
As a test you could remove the !
The color would be wrong but it would confirm if its reading/incrementing too 
much.

Any Neon / clang experts out there?
This is the function in question:
#define YUVTORGB                                                               \
    "vld1.u8    {d0}, [%0]!                    \n"                             \
    "vld1.u32   {d2[0]}, [%1]!                 \n"                             \
    "vld1.u32   {d2[1]}, [%2]!                 \n"                             \
    "veor.u8    d2, d26                        \n"/*subtract 128 from u and v*/\
    "vmull.s8   q8, d2, d24                    \n"/*  u/v B/R component      */\
    "vmull.s8   q9, d2, d25                    \n"/*  u/v G component        */\
    "vmov.u8    d1, #0                         \n"/*  split odd/even y apart */\
    "vtrn.u8    d0, d1                         \n"                             \
    "vsub.s16   q0, q0, q15                    \n"/*  offset y               */\
    "vmul.s16   q0, q0, q14                    \n"                             \
    "vadd.s16   d18, d19                       \n"                             \
    "vqadd.s16  d20, d0, d16                   \n"                             \
    "vqadd.s16  d21, d1, d16                   \n"                             \
    "vqadd.s16  d22, d0, d17                   \n"                             \
    "vqadd.s16  d23, d1, d17                   \n"                             \
    "vqadd.s16  d16, d0, d18                   \n"                             \
    "vqadd.s16  d17, d1, d18                   \n"                             \
    "vqrshrun.s16 d0, q10, #6                  \n"                             \
    "vqrshrun.s16 d1, q11, #6                  \n"                             \
    "vqrshrun.s16 d2, q8, #6                   \n"                             \
    "vmovl.u8   q10, d0                        \n"/*  set up for reinterleave*/\
    "vmovl.u8   q11, d1                        \n"                             \
    "vmovl.u8   q8, d2                         \n"                             \
    "vtrn.u8    d20, d21                       \n"                             \
    "vtrn.u8    d22, d23                       \n"                             \
    "vtrn.u8    d16, d17                       \n"                             \
void I420ToARGBRow_NEON(const uint8* y_buf,
                        const uint8* u_buf,
                        const uint8* v_buf,
                        uint8* rgb_buf,
                        int width) {
  asm volatile(
    "vld1.u8    {d24}, [%5]                    \n"
    "vld1.u8    {d25}, [%6]                    \n"
    "vmov.u8    d26, #128                      \n"
    "vmov.u16   q14, #74                       \n"
    "vmov.u16   q15, #16                       \n"
  "1:                                          \n"
YUVTORGB
    "vmov.u8    d21, d16                       \n"
    "vmov.u8    d23, #255                      \n"
    "vst4.u8    {d20, d21, d22, d23}, [%3]!    \n"
    "subs       %4, %4, #8                     \n"
    "bgt        1b                             \n"
    : "+r"(y_buf),    // %0
      "+r"(u_buf),    // %1
      "+r"(v_buf),    // %2
      "+r"(rgb_buf),  // %3
      "+r"(width)     // %4
    : "r"(kUVToRB),   // %5
      "r"(kUVToG)     // %6
    : "cc", "memory", "q0", "q1", "q2", "q3", "q8", "q9",
                      "q10", "q11", "q12", "q13", "q14", "q15"
  );
}


Original comment by [email protected] on 22 Mar 2012 at 11:53

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Would it be possible to get your object file?   (attach a file)
I'd like to check the disassembly (otool -tV) matches the source.

Original comment by [email protected] on 23 Mar 2012 at 12:09

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
[deleted comment]

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Unix Executable File

Original comment by [email protected] on 23 Mar 2012 at 6:05

Attachments:

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
I was unable to obtain a disassembly.

Original comment by [email protected] on 24 Mar 2012 at 12:11

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Disassembling you binary worked, but it contains no Neon?
Are you able to point to the assembly instruction that failed?

Original comment by [email protected] on 28 Mar 2012 at 11:50

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
My apologies but that was an older binary. I've just built this one.

Original comment by [email protected] on 29 Mar 2012 at 2:20

Attachments:

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
The latest binary EXC_BAD_ACCESS lands at row_neon.cc line 64 using the latest 
source from svn. 

y_buf   const uint8 *   0x528a000
u_buf   const uint8 *   0x52a2c00
v_buf   const uint8 *   0x52a8f00
rgb_buf uint8 * 0x52b0000
width   int 352

I was unable to obtain a disassembly.

Original comment by [email protected] on 29 Mar 2012 at 2:25

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
arm7?  Does your iPhone/iPod have NEON?
iPhone3GS or iPad are required.
The same libyuv library will work, but you need to disable Neon.

Original comment by [email protected] on 29 Mar 2012 at 2:37

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Yes it is ARM7. Yes it is for iPhone 4 and iPad 2. I'm attaching a new binary 
which I've disabled NEON.

  //cpu_info_ = kCpuHasNEON | kCpuInitialized;
  cpu_info_ = kCpuInitialized;

So the last binary is the same as this one except it has NEON support built in. 
The first binary is problematic and the second works.

Original comment by [email protected] on 29 Mar 2012 at 2:52

Attachments:

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Is there anything I can do to provide more information to help resolve this?

Interestingly ConvertToI420 and I420ToBGRA are the most expensive calls in my 
iOS test app. Oddly VP8(webm) encoding and decoding are less expensive.

I have to assume this is because of the lack of use of NEON.


Original comment by [email protected] on 2 Apr 2012 at 4:24

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Hi J... sorry for slow response.

> Re  Is there anything I can do to provide more information to help resolve 
this?

Yes, the exact assembly instruction that fails and a register dump would help.

> Re  Interestingly ConvertToI420 and I420ToBGRA are the most expensive calls 
in my iOS test app. Oddly VP8(webm) encoding and decoding are less expensive.

This isn't entirely unexpected.  In my apps, color conversion can be the single 
most expensive operation.  As a whole, encoding would take more, but when you 
profile, the conversions are the slowest, and least useful functions.  Which is 
why its good to optimize them, but better to avoid conversions entirely.  Stick 
with YUV for capture and renderning if you can.

>Re I have to assume this is because of the lack of use of NEON.

No, but thats why I asked. Any iPad, or iPhone3GS or better has Neon.
Since your CPU has Neon, the problem is elsewhere.

I asked for your binary because you're using a different compiler, and I've 
seen issues with assembly between gcc and clang on a couple low level details:
1. the @ notation for alignment
Solution: I've avoided it
Downside: Potential performance loss - aligned loads may be faster
2. passing of multimedia arrays from C to inline
Solution: typedef for sized vectors 
Downside: none
3. the [] notation for accessing register offsets
unknown

Original comment by [email protected] on 14 Apr 2012 at 1:01

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Because I've heard rumor of people using libyuv with NEON successfully on iOS 
and that I don't believe maintaining the conversion routines is of any benefit 
I'm altering my code to perform capture in YUV and render in YUV as to keep 
conversion to a minimal. 

This effectively means I will only require libyuv for rotation operations.

I really appreciate your help!

Original comment by [email protected] on 16 Apr 2012 at 6:34

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Reproduced.
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 7f7f7f7f
Stack frame #00 pc 000cb67c 
/data/data/org.webrtc.videoengineapp/lib/libwebrtc-video-demo-jni.so: Routine 
I420ToARGBRow_NEON in third_party/libyuv/source/row_neon.cc:86
Stack frame #01 pc 000c75e4 
/data/data/org.webrtc.videoengineapp/lib/libwebrtc-video-demo-jni.so: Routine 
I420ToRGB565 in third_party/libyuv/source/convert_from.cc:950

    "vld1.u8    {d24}, [%5]                    \n"
  : "r"(kUVToRB),   // %5

It was passing the vector itself, rather than its address and then tried to 
load from that address.
Also affected rotate.
r246 has a fix.

Original comment by [email protected] on 21 Apr 2012 at 1:08

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024

Original comment by [email protected] on 21 Apr 2012 at 1:18

  • Changed state: Fixed

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
A related issue was found in scale.  vector constants are treated as values, so 
& is needed to get their pointer for a vld.
On x86 "m" is used to refer to constants.  The instructions will operate 
directly on memory.

r247 fixes the scale for Neon issue.

Original comment by [email protected] on 22 Apr 2012 at 3:06

  • Changed state: Started

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
r247 checked into chromium.

Original comment by [email protected] on 25 Apr 2012 at 2:59

from libyuv.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024

Original comment by [email protected] on 1 May 2012 at 2:58

  • Changed state: Fixed

from libyuv.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.