question about test.c (vicuna, closed)

schwa1z commented on July 28, 2024

question about test.c

Comments (4)

michael-platzer commented on July 28, 2024

Hi @schwa1z,

> How can vicuna identify the for-loop with vector instructions and unroll it? How does the for-loop work?

The for loop is not unrolled. Since the array is rather short, a single iteration can be enough to process the entire array, depending on the size of the vector registers. As you can see in the output, vl is set to 16, which corresponds to the length of the array, so all 16 elements are processed at once. If you increase the size of the array, it eventually won't fit into the vector registers as a whole, and then you will see multiple iterations of the loop.
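The loop in test.c is a strip-mined loop: each iteration asks vsetvl how many elements (vl) it may process, then advances the pointers by vl. Real hardware bounds vl by VLMAX = VLEN × LMUL / SEW. The host-side sketch below counts iterations using a hypothetical helper `sim_vsetvl` that assumes vl = min(n, VLMAX). The spec also lets an implementation pick a different vl when VLMAX < n < 2 × VLMAX, so treat this as an approximation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vsetvl: ASSUMES vl = min(avl, vlmax). */
static size_t sim_vsetvl(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}

/* Counts how many strip-mined iterations are needed for n elements. */
size_t strip_mine_iterations(size_t n, size_t vlmax)
{
    assert(vlmax > 0);              /* vlmax == 0 would never make progress */
    size_t iterations = 0;
    while (n > 0) {
        size_t vl = sim_vsetvl(n, vlmax);   /* elements handled this pass */
        n -= vl;                            /* the real loop also bumps src/dst by vl */
        iterations++;
    }
    return iterations;
}
```

With VLMAX = 16, a 16-element array needs one iteration, while 1024 elements need 64.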

> Is the way I compare normal C code and vector-mode C code correct?

Yes, your implementation without vector instructions appears to be equivalent to the vector code and you should get identical results.

schwa1z commented on July 28, 2024

Hi! @michael-platzer Thanks for your patient reply, you are very nice. There are some other questions I want to ask.

I want to implement a C algorithm with vector instructions to accelerate it, but I don't know how to use vector instructions properly. Here is an example I want to use to ask my questions.

Here is the normal C code:

    for (int i = 0; i < 1024; i++) {
        greydata[i] = (RGBdata_B[i] * 28 + RGBdata_G[i] * 151 + RGBdata_R[i] * 77) >> 8;
    }

As you can see, it's a simple RGB2grey C algorithm.
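A scalar reference function is useful for checking the vector version against. The sketch below wraps the loop above in a function; the name `rgb2grey_ref` is hypothetical. The weights sum to 256 (28 + 151 + 77), so shifting right by 8 gives a weighted average in [0, 255]. In C, the int16_t operands are promoted to int before the multiply, so on RISC-V (32-bit int) the intermediate sum, which is at most 255 × 256 = 65280, cannot overflow here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: grey = (28*B + 151*G + 77*R) >> 8.
 * Operands are promoted to int, so the sum (<= 65280) does not overflow. */
void rgb2grey_ref(const int16_t *B, const int16_t *G, const int16_t *R,
                  int16_t *grey, size_t n)
{
    for (size_t i = 0; i < n; i++)
        grey[i] = (int16_t)((B[i] * 28 + G[i] * 151 + R[i] * 77) >> 8);
}
```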
I found some documents, such as riscv-v-spec-0.10 and RISC-V "V" Vector Extension Intrinsics. I tried to imitate the way you wrote test.c, combined with those documents, to implement the C code above. Here it is:

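    size_t vl;
    // sizeof gives bytes; divide by the element size to get the element count
    for (int n = sizeof(greydata) / sizeof(greydata[0]); n > 0; n -= vl, src_B += vl, src_G += vl, src_R += vl, dst += vl) {
        vl            = vsetvl_e16m1(n);//set the vl

        vint16m1_t vec_B = vle16_v_i16m1(src_B, vl);//src_B is the data of Blue channel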
        vint16m1_t vec_G = vle16_v_i16m1(src_G, vl);//Green
        vint16m1_t vec_R = vle16_v_i16m1(src_R, vl);//Red

        vec_B           = vmul_vx_i16m1(vec_B, 28, vl);//multiply vector with scalar
        vec_G           = vmul_vx_i16m1(vec_G, 151, vl);
        vec_R           = vmul_vx_i16m1(vec_R, 77, vl);

        vec_B           = vadd_vv_i16m1(vec_B, vec_G, vl);//add vec_G and vec_R to vec_B
        vec_B           = vadd_vv_i16m1(vec_B, vec_R, vl);

        vec_B           = vsrl_vi_i16m1(vec_B, 8, vl);//logical shift right 8 bit
        vse16_v_i16m1(dst, vec_B, vl);
    }

The data range is [0, 255]. I initialize the arrays like this:

    int16_t RGBdata_B[1024]__attribute__ ((aligned (8)));
    int16_t RGBdata_G[1024]__attribute__ ((aligned (8)));
    int16_t RGBdata_R[1024]__attribute__ ((aligned (8)));

    int16_t *src_B = RGBdata_B, *src_G = RGBdata_G, *src_R = RGBdata_R;

    int16_t greydata[1024]__attribute__ ((aligned (8)));

    int16_t *dst = greydata;

Since the results will be out of the int8_t range, I used the int16_t data type and replaced the '8' in the vector intrinsics you used in test.c with '16', for example vle8_v_i8m1 became vle16_v_i16m1.
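One thing to watch with 16-bit lanes is that vector multiplies and adds keep only the low 16 bits of each element, unlike the scalar C code, which promotes to int. Since 255 × (28 + 151 + 77) = 65280, the sum exceeds INT16_MAX (32767) but still fits in uint16_t. The host-side sketch below emulates a single lane to show the effect. The helper names are hypothetical, and the code models the arithmetic rather than calling RVV intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates one 16-bit lane: the multiply-adds keep only the low 16 bits,
 * like vmul/vadd on SEW=16 elements. */
static uint16_t lane_sum16(uint16_t b, uint16_t g, uint16_t r)
{
    return (uint16_t)(b * 28u + g * 151u + r * 77u);   /* wraps modulo 2^16 */
}

/* Unsigned lanes + logical shift (vsrl): correct for inputs in [0, 255]. */
int grey_u16_lane(uint16_t b, uint16_t g, uint16_t r)
{
    return lane_sum16(b, g, r) >> 8;
}

/* Signed lanes + arithmetic shift (vsra): same bits read as int16_t. */
int grey_i16_lane(uint16_t b, uint16_t g, uint16_t r)
{
    int32_t s = lane_sum16(b, g, r);
    if (s >= 32768)                 /* two's-complement reinterpretation */
        s -= 65536;
    /* floor division by 256, i.e. an arithmetic right shift by 8 */
    return s >= 0 ? s / 256 : -((-s + 255) / 256);
}
```

For a white pixel (255, 255, 255), the unsigned lane yields 255, whereas the signed lane yields -1.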

So, my questions:

1. Is the way I initialize the arrays correct? Especially __attribute__ ((aligned (8))). I have learned that it's used to align memory addresses, but I don't know how it affects how the vector instructions work.
2. Is the vector-mode code I wrote correct? In particular, is it correct to replace vle8_v_i8m1 with vle16_v_i16m1?
3. I have encountered an error with the vsrl instruction:

test.c:76:25: error: assigning to 'vint16m1_t' (aka '__rvv_int16m1_t') from incompatible type 'int'
        vec_B           = vsrl_vi_i16m1(vec_B, 8, vl);
                        ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maybe I made a mistake in the code. I referred to vsrl_16.S in /vicuna/test/alu/. How can I fix it?

Thanks a lot!

michael-platzer commented on July 28, 2024

> Is the way I initialize the arrays correct? Especially __attribute__ ((aligned (8))). I have learned that it's used to align memory addresses, but I don't know how it affects how the vector instructions work.

Memory alignment is a performance optimization. Memory base addresses used in vector load and store instructions should be aligned to the width of the memory interface. Note that the aligned attribute takes an alignment value in bytes, whereas the width of the memory interface is specified in bits.
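To illustrate the bytes-versus-bits point, a 64-bit memory interface corresponds to aligned(8), and a 128-bit one corresponds to aligned(16). The sketch below assumes a hypothetical `MEM_IFACE_BITS` macro that you would set to your configuration's interface width. It also includes a small helper that checks whether a pointer meets that alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL interface width in bits; set it to match your configuration. */
#define MEM_IFACE_BITS  64
/* aligned() expects bytes, so convert from bits. */
#define MEM_IFACE_BYTES (MEM_IFACE_BITS / 8)

int16_t RGBdata_B[1024] __attribute__((aligned(MEM_IFACE_BYTES)));

/* Returns nonzero if p sits on a memory-interface-width boundary. */
int is_iface_aligned(const void *p)
{
    return ((uintptr_t)p % MEM_IFACE_BYTES) == 0;
}
```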

> Is the vector-mode code I wrote correct? In particular, is it correct to replace vle8_v_i8m1 with vle16_v_i16m1?

At first glance the code appears correct. Test it and you will see whether it does what you want. Note that you could avoid the vadd_vv_i16m1 instructions by using fused multiply-add (vmacc). Also, since your data is only in the range 0-255, you could use 8-bit data and do widening multiplication/multiply-add (vwmulu/vwmaccu) followed by a narrowing right shift (vnsrl).
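Below is a sketch of the 8-bit variant suggested above. It is written against the ratified v1.0 RVV C intrinsics, which use the `__riscv_` prefix. Toolchains that implement older drafts, like the one in this thread, spell the same intrinsics without the prefix. Each iteration does the following:

- loads 8-bit pixels at e8/m1;
- accumulates 16-bit results in an m2 register group, using vwmulu for the first product and vwmaccu for the multiply-adds;
- narrows back to 8 bits with vnsrl.

The function name `rgb2grey_u8` is hypothetical. The scalar `#else` branch is there only so the file also builds on hosts without the V extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __riscv_vector
#include <riscv_vector.h>
#endif

/* grey = (28*B + 151*G + 77*R) >> 8 on uint8_t pixels. */
void rgb2grey_u8(const uint8_t *src_B, const uint8_t *src_G,
                 const uint8_t *src_R, uint8_t *dst, size_t n)
{
#ifdef __riscv_vector
    size_t vl;
    for (; n > 0; n -= vl, src_B += vl, src_G += vl, src_R += vl, dst += vl) {
        vl = __riscv_vsetvl_e8m1(n);
        vuint8m1_t vec_B = __riscv_vle8_v_u8m1(src_B, vl);
        vuint8m1_t vec_G = __riscv_vle8_v_u8m1(src_G, vl);
        vuint8m1_t vec_R = __riscv_vle8_v_u8m1(src_R, vl);

        /* widening: 8-bit x 8-bit -> 16-bit elements in an m2 group */
        vuint16m2_t acc = __riscv_vwmulu_vx_u16m2(vec_B, 28, vl);   /* acc  = 28 * B  */
        acc = __riscv_vwmaccu_vx_u16m2(acc, 151, vec_G, vl);        /* acc += 151 * G */
        acc = __riscv_vwmaccu_vx_u16m2(acc, 77, vec_R, vl);         /* acc += 77 * R  */

        /* narrowing shift: 16-bit acc >> 8, truncated to 8 bits (fits, <= 255) */
        vuint8m1_t grey = __riscv_vnsrl_wx_u8m1(acc, 8, vl);
        __riscv_vse8_v_u8m1(dst, grey, vl);
    }
#else
    /* scalar fallback for hosts without the V extension */
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((src_B[i] * 28 + src_G[i] * 151 + src_R[i] * 77) >> 8);
#endif
}
```

Because both inputs and output are uint8_t, this also moves half as many bytes to and from memory as the int16_t version.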

> I have encountered an error with the vsrl instruction:

I cannot find that particular intrinsic; are you certain that it is actually defined? A logical right shift on signed data does not appear to make much sense. You are probably looking for vsra_vx_i16m1 if you want to stick with signed data, or vsrl_vx_u16m1 for unsigned data.
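The "incompatible type 'int'" message is what a C compiler reports when a function is called without a declaration, because it is then implicitly assumed to return int. The RVV intrinsics API has no separate `_vi` forms; the compiler selects the immediate encoding when the shift amount is a constant.

Below is a sketch of the 16-bit loop using unsigned elements, since the weighted sum can reach 65280, which only fits in uint16_t. It uses vmacc to fold the additions into the multiplies and vsrl_vx for the shift. The intrinsic names are the v1.0 ones with the `__riscv_` prefix; older toolchains spell them without it, for example vsrl_vx_u16m1. The function name `rgb2grey_u16` is hypothetical, and the `#else` branch is a scalar fallback for hosts without the V extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __riscv_vector
#include <riscv_vector.h>
#endif

/* grey = (28*B + 151*G + 77*R) >> 8 on uint16_t data in [0, 255]. */
void rgb2grey_u16(const uint16_t *src_B, const uint16_t *src_G,
                  const uint16_t *src_R, uint16_t *dst, size_t n)
{
#ifdef __riscv_vector
    size_t vl;
    for (; n > 0; n -= vl, src_B += vl, src_G += vl, src_R += vl, dst += vl) {
        vl = __riscv_vsetvl_e16m1(n);
        vuint16m1_t vec_B = __riscv_vle16_v_u16m1(src_B, vl);
        vuint16m1_t vec_G = __riscv_vle16_v_u16m1(src_G, vl);
        vuint16m1_t vec_R = __riscv_vle16_v_u16m1(src_R, vl);

        vuint16m1_t acc = __riscv_vmul_vx_u16m1(vec_B, 28, vl);  /* acc  = 28 * B  */
        acc = __riscv_vmacc_vx_u16m1(acc, 151, vec_G, vl);       /* acc += 151 * G */
        acc = __riscv_vmacc_vx_u16m1(acc, 77, vec_R, vl);        /* acc += 77 * R  */

        acc = __riscv_vsrl_vx_u16m1(acc, 8, vl);                 /* logical shift right by 8 */
        __riscv_vse16_v_u16m1(dst, acc, vl);
    }
#else
    /* scalar fallback for hosts without the V extension */
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)((src_B[i] * 28 + src_G[i] * 151 + src_R[i] * 77) >> 8);
#endif
}
```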

schwa1z commented on July 28, 2024

Thank you very much! With your advice I have fixed the problem and learned a lot. So I am closing the issue. Thank you again!
