Comments (4)

michael-platzer commented on July 28, 2024

Hi @schwa1z,

How can vicuna identify the for-loop with vector instructions and unroll it? How does the for-loop work?

The for loop is not unrolled. The array is short enough, relative to the size of the vector registers, that a single iteration processes the entire array. As you can see in the output, vl is set to 16, which corresponds to the length of the array, so all 16 elements are processed at once. If you increase the size of the array, it will eventually no longer fit into the vector registers as a whole, and you will then see multiple iterations of the loop.
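To make the iteration count concrete, here is a minimal scalar sketch of that loop structure. It assumes the common policy vl = min(remaining, VLMAX); the V specification gives implementations some latitude in how vl is chosen, so this is a model, not Vicuna's actual behaviour. The names `model_vsetvl` and `count_loop_iterations` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Model of vsetvl: grant at most vlmax elements per pass.
 * Assumption: vl = min(remaining, vlmax). */
size_t model_vsetvl(size_t remaining, size_t vlmax)
{
    return remaining < vlmax ? remaining : vlmax;
}

/* Count how many passes a strip-mined loop makes over n elements. */
size_t count_loop_iterations(size_t n, size_t vlmax)
{
    assert(vlmax > 0);
    size_t iterations = 0;
    while (n > 0) {
        size_t vl = model_vsetvl(n, vlmax); /* elements handled this pass */
        n -= vl;
        iterations++;
    }
    return iterations;
}
```

With a hypothetical VLMAX of 16, a 16-element array takes one pass, while a 1024-element array takes 64.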

Is the way I compare normal C code and vector-mode C code correct?

Yes, your implementation without vector instructions appears to be equivalent to the vector code and you should get identical results.
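One way to check this automatically is to compare the two output arrays element by element. The helper below is a sketch only: `first_mismatch` is a hypothetical name, and the `int8_t` element type is an assumption based on the 8-bit intrinsics mentioned for test.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first element that differs between the scalar
 * and the vector result, or -1 if all n elements match. */
ptrdiff_t first_mismatch(const int8_t *scalar_out, const int8_t *vector_out, size_t n)
{
    assert(scalar_out != NULL && vector_out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (scalar_out[i] != vector_out[i]) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```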

from vicuna.

schwa1z commented on July 28, 2024

Hi @michael-platzer! Thanks for your patient reply, that is very kind of you. I have a few more questions.

I want to accelerate a C algorithm with vector instructions, but I don't know how to use them properly. I would like to use the following example to ask about it.

Here is the normal C code:

    for (int i = 0; i < 1024; i++) {
        greydata[i] = (RGBdata_B[i] * 28 + RGBdata_G[i] * 151 + RGBdata_R[i] * 77) >> 8;
    }

As you can see, it's a simple RGB-to-grey conversion.
I found some documents, such as riscv-v-spec-0.10 and the RISC-V "V" Vector Extension Intrinsics. Following the style of your test.c and those documents, I tried to implement the C code above with vector intrinsics. Here it is:

    for (int n = sizeof(greydata); n > 0; n -= vl, src_B += vl, src_G += vl, src_R += vl, dst += vl) {
        vl            = vsetvl_e16m1(n); // set the vector length

        vint16m1_t vec_B = vle16_v_i16m1(src_B, vl); // src_B holds the blue channel data
        vint16m1_t vec_G = vle16_v_i16m1(src_G, vl); // green channel
        vint16m1_t vec_R = vle16_v_i16m1(src_R, vl); // red channel

        vec_B           = vmul_vx_i16m1(vec_B, 28, vl); // multiply vector by scalar
        vec_G           = vmul_vx_i16m1(vec_G, 151, vl);
        vec_R           = vmul_vx_i16m1(vec_R, 77, vl);

        vec_B           = vadd_vv_i16m1(vec_B, vec_G, vl); // add vec_G and vec_R to vec_B
        vec_B           = vadd_vv_i16m1(vec_B, vec_R, vl);

        vec_B           = vsrl_vi_i16m1(vec_B, 8, vl); // logical shift right by 8 bits
        vse16_v_i16m1(dst, vec_B, vl); // store the result
    }

The data range is [0, 255]. I initialize the arrays like this:

    int16_t RGBdata_B[1024] __attribute__ ((aligned (8)));
    int16_t RGBdata_G[1024] __attribute__ ((aligned (8)));
    int16_t RGBdata_R[1024] __attribute__ ((aligned (8)));

    int16_t *src_B = RGBdata_B, *src_G = RGBdata_G, *src_R = RGBdata_R;

    int16_t greydata[1024] __attribute__ ((aligned (8)));

    int16_t *dst = greydata;

Since the intermediate values exceed the int8_t range, I used the int16_t data type and replaced the '8' in the vector intrinsics from your test.c with '16', for example vle8_v_i8m1 became vle16_v_i16m1.

So, my questions:

  1. Is the way I initialize the arrays correct, especially '__attribute__ ((aligned (8)))'? I have learned that it is used to align memory addresses, but I don't know how it affects the vector instructions.
  2. Is the vector-mode code I wrote correct? In particular, is it correct to replace vle8_v_i8m1 with vle16_v_i16m1?
  3. I have encountered an error with the vsrl instruction. The error:

    test.c:76:25: error: assigning to 'vint16m1_t' (aka '__rvv_int16m1_t') from incompatible type 'int'
            vec_B           = vsrl_vi_i16m1(vec_B, 8, vl);
                            ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maybe I made a mistake in the code. I referred to vsrl_16.S in /vicuna/test/alu/. How can I fix it?

Thanks a lot!


michael-platzer commented on July 28, 2024

  1. Is the way I initialize the arrays correct, especially '__attribute__ ((aligned (8)))'? I have learned that it is used to align memory addresses, but I don't know how it affects the vector instructions.

Memory alignment is a performance optimization. Memory base addresses used in vector load and store instructions should be aligned to the width of the memory interface. Note that the aligned attribute takes an alignment value in bytes, whereas the width of the memory interface is specified in bits.
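As a sketch of the bits-versus-bytes conversion: the example below assumes a 64-bit memory interface, which would correspond to `aligned (8)`. `MEM_WIDTH_BITS`, `samples`, and `is_aligned_to_interface` are hypothetical names for illustration, not Vicuna configuration parameters, and the `__attribute__` syntax is specific to GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed memory interface width in bits; check your actual configuration. */
#define MEM_WIDTH_BITS 64
/* The aligned attribute expects bytes, so divide by 8. */
#define MEM_ALIGN_BYTES (MEM_WIDTH_BITS / 8)

int16_t samples[1024] __attribute__ ((aligned (MEM_ALIGN_BYTES)));

/* Return 1 if p is aligned to a memory interface of width_bits bits. */
int is_aligned_to_interface(const void *p, unsigned width_bits)
{
    assert(width_bits >= 8 && width_bits % 8 == 0);
    return ((uintptr_t)p % (width_bits / 8)) == 0;
}
```

For a 32-bit interface the same reasoning would give `aligned (4)`.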

  2. Is the vector-mode code I wrote correct? In particular, is it correct to replace vle8_v_i8m1 with vle16_v_i16m1?

At first glance the code appears correct. Test it and you will see whether it does what you want. Note that you could avoid the vadd_vv_i16m1 instructions by using the fused multiply-add. Also, since your data is only in the range 0-255, you could use 8-bit data and do widening multiplication/multiply-add followed by a narrowing right shift.
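Two details are worth checking while testing. First, `sizeof(greydata)` yields a size in bytes (2048 for 1024 `int16_t` elements), whereas the loop counter passed to `vsetvl` and the pointer increments count elements, so an element count such as the `ARRAY_LEN` macro below is probably what the loop needs. Second, the 8-bit variant can be modelled in plain C. The function below is a scalar sketch of what a widening multiply, two widening multiply-accumulates, and a narrowing right shift would compute per element (roughly the roles of `vwmulu`, `vwmaccu`, and `vnsrl`). `ARRAY_LEN` and `rgb2grey_widening_model` are hypothetical names, and this is a reference model, not vector code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements in an array (not its size in bytes). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Scalar model: 8-bit inputs, 16-bit accumulator, 8-bit output. */
void rgb2grey_widening_model(const uint8_t *src_b, const uint8_t *src_g,
                             const uint8_t *src_r, uint8_t *dst, size_t n)
{
    assert(src_b && src_g && src_r && dst);
    for (size_t i = 0; i < n; i++) {
        uint16_t acc = (uint16_t)(src_b[i] * 28u);  /* widening multiply */
        acc = (uint16_t)(acc + src_g[i] * 151u);    /* widening multiply-add */
        acc = (uint16_t)(acc + src_r[i] * 77u);     /* widening multiply-add */
        dst[i] = (uint8_t)(acc >> 8);               /* narrowing right shift */
    }
}
```

Because the weights sum to 256, the accumulator never exceeds 255 * 256 = 65280, so it fits in an unsigned 16-bit element.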

  3. I have encountered an error with the vsrl instruction.

I cannot find that particular intrinsic. Are you certain that it is actually defined? A logical right shift on signed data does not appear to make much sense. You are probably looking for vsra_vx_i16m1 if you want to stick with signed data, or vsrl_vx_u16m1 for unsigned data.
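For this particular algorithm the choice between the two matters. With inputs in 0-255, the weighted sum can reach 65280, which fits in an unsigned 16-bit element but reads as a negative number in a signed one. The portable C sketch below models one 16-bit lane under each shift. All function names are hypothetical, and the arithmetic shift is written out explicitly so the model does not depend on implementation-defined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Weighted sum as a 16-bit lane would hold it (wraps modulo 2^16). */
uint16_t weighted_sum_lane(uint8_t b, uint8_t g, uint8_t r)
{
    return (uint16_t)(b * 28u + g * 151u + r * 77u);
}

/* Logical right shift by 8, as on unsigned 16-bit elements. */
uint16_t grey_logical_shift(uint8_t b, uint8_t g, uint8_t r)
{
    return (uint16_t)(weighted_sum_lane(b, g, r) >> 8);
}

/* Arithmetic right shift by 8, as on signed 16-bit elements. */
int16_t grey_arithmetic_shift(uint8_t b, uint8_t g, uint8_t r)
{
    uint16_t bits = weighted_sum_lane(b, g, r);
    /* Reinterpret the 16 bits as a two's-complement value. */
    int32_t value = (bits & 0x8000u) ? (int32_t)bits - 65536 : (int32_t)bits;
    /* Floor division by 256 equals an arithmetic shift right by 8. */
    int32_t shifted = value >= 0 ? value / 256 : -((-value + 255) / 256);
    assert(shifted >= -128 && shifted <= 127);
    return (int16_t)shifted;
}

/* Reference: the original scalar expression, evaluated in int. */
int grey_reference(uint8_t b, uint8_t g, uint8_t r)
{
    return (b * 28 + g * 151 + r * 77) >> 8;
}
```

In this model a white pixel (255, 255, 255) gives 255 with the logical shift, matching the scalar reference, but -1 with the arithmetic shift. The two agree only while the sum stays below 32768, so the unsigned variant looks like the safer match for the scalar loop.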


schwa1z commented on July 28, 2024

Thank you very much! With your advice I have fixed the problem, and I learned a lot from it. I have closed the issue. Thanks again!
