Comments (4)
Hi @schwa1z,
How can Vicuna identify the for-loop with vector instructions and unroll it? How does the for-loop work?
The for loop is not unrolled. However, since the array is rather short relative to the size of the vector registers, a single iteration is enough to process the entire array. As you can see in the output, vl is set to 16, which corresponds to the length of the array, so all 16 elements are processed at once. If you increase the size of the array, it will eventually no longer fit into the vector registers as a whole, and you will then see multiple iterations of the loop.
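The behaviour described above is the usual strip-mining pattern: each pass requests the remaining element count and is granted a vl no larger than what the vector registers can hold. Below is a minimal scalar sketch in plain C, without vector intrinsics. The names `model_vsetvl` and `count_passes` are hypothetical, and the rule vl = min(remaining, VLMAX) is a simplifying assumption rather than a description of Vicuna's exact behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of vsetvl: grant at most vlmax elements per pass.
 * Assumes the simple rule vl = min(remaining, vlmax). */
static size_t model_vsetvl(size_t remaining, size_t vlmax)
{
    return remaining < vlmax ? remaining : vlmax;
}

/* Count how many passes a strip-mined loop needs for n elements. */
static size_t count_passes(size_t n, size_t vlmax)
{
    assert(vlmax > 0); /* a zero-length register would never make progress */
    size_t passes = 0;
    while (n > 0) {
        size_t vl = model_vsetvl(n, vlmax);
        n -= vl;   /* the real loop also advances the pointers by vl */
        passes++;
    }
    return passes;
}
```

With an assumed capacity of 16 elements, `count_passes(16, 16)` is 1, which matches the single iteration seen in the output, while `count_passes(1024, 16)` is 64.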
Is the way I compare the normal C code and the vector-mode C code correct?
Yes, your implementation without vector instructions appears to be equivalent to the vector code and you should get identical results.
from vicuna.
Hi! @michael-platzer Thanks for your patient reply, you are very kind. I have some other questions I would like to ask.
I want to accelerate a C algorithm with vector instructions, but I don't know how to use them properly. Here is an example that I would like to ask about.
Here is the normal C code:
for (int i = 0; i < 1024; i++) {
    greydata[i] = (RGBdata_B[i] * 28 + RGBdata_G[i] * 151 + RGBdata_R[i] * 77) >> 8;
}
As you can see, it's a simple RGB2grey C algorithm.
I found some documents, such as riscv-v-spec-0.10 and RISC-V "V" Vector Extension Intrinsics. Combining those documents with the way you wrote test.c, I tried to implement the C code above. Here it is:
for (int n = sizeof(greydata); n > 0; n -= vl, src_B += vl, src_G += vl, src_R += vl, dst += vl) {
    vl = vsetvl_e16m1(n);                        // set the vl
    vint16m1_t vec_B = vle16_v_i16m1(src_B, vl); // src_B is the data of the blue channel
    vint16m1_t vec_G = vle16_v_i16m1(src_G, vl); // green
    vint16m1_t vec_R = vle16_v_i16m1(src_R, vl); // red
    vec_B = vmul_vx_i16m1(vec_B, 28, vl);        // multiply vector with scalar
    vec_G = vmul_vx_i16m1(vec_G, 151, vl);
    vec_R = vmul_vx_i16m1(vec_R, 77, vl);
    vec_B = vadd_vv_i16m1(vec_B, vec_G, vl);     // add vec_G and vec_R to vec_B
    vec_B = vadd_vv_i16m1(vec_B, vec_R, vl);
    vec_B = vsrl_vi_i16m1(vec_B, 8, vl);         // logical shift right by 8 bits
    vse16_v_i16m1(dst, vec_B, vl);
}
The data range is [0, 255]. I initialize the arrays like this:
int16_t RGBdata_B[1024] __attribute__ ((aligned (8)));
int16_t RGBdata_G[1024] __attribute__ ((aligned (8)));
int16_t RGBdata_R[1024] __attribute__ ((aligned (8)));
int16_t *src_B = RGBdata_B, *src_G = RGBdata_G, *src_R = RGBdata_R;
int16_t greydata[1024] __attribute__ ((aligned (8)));
int16_t *dst = greydata;
Considering that the intermediate results will exceed the int8_t range, I used the int16_t data type and replaced the '8' in the vector intrinsics used in test.c with '16', for example vle8_v_i8m1 became vle16_v_i16m1.
So, my questions are:
- Is the way I initialize the arrays correct? Especially '__attribute__ ((aligned (8)))'. I have learned that it is used to align memory addresses, but I don't know how it affects the way vector instructions work.
- Is the vector-mode code I wrote correct? In particular, is replacing vle8_v_i8m1 with vle16_v_i16m1 correct?
- I have encountered an error with the vsrl instruction. The error is:
test.c:76:25: error: assigning to 'vint16m1_t' (aka '__rvv_int16m1_t') from incompatible type 'int'
vec_B = vsrl_vi_i16m1(vec_B, 8, vl);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maybe I made a mistake in the code. I referred to vsrl_16.S in /vicuna/test/alu/. How can I fix it?
Thanks a lot!
- Is the way I initialize the arrays correct? Especially '__attribute__ ((aligned (8)))'. I have learned that it is used to align memory addresses, but I don't know how it affects the way vector instructions work.
Memory alignment is a performance optimization. Memory base addresses used in vector load and store instructions should be aligned to the width of the memory interface. Note that the aligned attribute takes an alignment value in bytes, whereas the width of the memory interface is specified in bits.
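To make the bits-versus-bytes point concrete, here is a small sketch in plain C. The 32-bit interface width is an assumed value for illustration only, since the real width depends on how the core is configured; `MEM_W_BITS`, `MEM_ALIGN_BYTES` and `is_mem_aligned` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed memory interface width in bits; the real value depends on the
 * configuration of the core. */
#define MEM_W_BITS 32
#define MEM_ALIGN_BYTES (MEM_W_BITS / 8)

static_assert(MEM_W_BITS % 8 == 0,
              "interface width must be a whole number of bytes");

/* aligned() takes bytes, so a 32-bit interface needs aligned(4). */
static int16_t RGBdata_B[1024] __attribute__((aligned(MEM_ALIGN_BYTES)));

/* Return 1 if the pointer sits on a MEM_ALIGN_BYTES boundary. */
static int is_mem_aligned(const void *p)
{
    return ((uintptr_t)p % MEM_ALIGN_BYTES) == 0;
}
```

By the same arithmetic, the aligned (8) used in the question corresponds to a 64-bit boundary.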
- Is the vector-mode code I wrote correct? In particular, is replacing vle8_v_i8m1 with vle16_v_i16m1 correct?
At first glance the code appears correct. Test it and you will see whether it does what you want. Note that you could avoid the vadd_vv_i16m1 instructions by using fused multiply-add. Also, since your data is only in the range 0-255, you could use 8-bit data and do a widening multiplication/multiply-add followed by a narrowing right shift.
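The 8-bit scheme suggested above can be modelled in scalar C to see which widths are involved at each step. This is only a sketch of the data flow, not vector code: the function name `rgb2grey_u8` is hypothetical, and unsigned types are an assumption made here because the inputs are in the range 0-255.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the suggested scheme: 8-bit inputs, 16-bit accumulation
 * (widening multiply and multiply-add), then a narrowing right shift back
 * to 8 bits. */
static void rgb2grey_u8(const uint8_t *b, const uint8_t *g, const uint8_t *r,
                        uint8_t *grey, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t acc = (uint16_t)(b[i] * 28);   /* widening multiply */
        acc = (uint16_t)(acc + g[i] * 151);     /* widening multiply-add */
        acc = (uint16_t)(acc + r[i] * 77);      /* widening multiply-add */
        /* 28 + 151 + 77 = 256, so acc <= 255 * 256 and acc >> 8 fits in 8 bits */
        assert((acc >> 8) <= UINT8_MAX);
        grey[i] = (uint8_t)(acc >> 8);          /* narrowing right shift */
    }
}
```

One detail to check in the loop from the question: the count handed to vsetvl is a number of elements, so `sizeof(greydata)`, which is 2048 bytes for an `int16_t[1024]` array, would need to be divided by `sizeof(greydata[0])`.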
- I have encountered an error with the vsrl instruction. The error is:
I cannot find that particular intrinsic. Are you certain that it is actually defined? A logical right shift on signed data does not appear to make much sense. You are probably looking for vsra_vx_i16m1 if you want to stick with signed data, or vsrl_vx_u16m1 for unsigned data.
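One thing worth checking when choosing between the two: with inputs in 0-255, the weighted sum can reach 255 * 256 = 65280, which fits in 16 unsigned bits but not in 16 signed bits. The scalar sketch below models both paths under the assumption that each lane behaves like a 16-bit two's-complement integer. The function names are hypothetical, and the signed variant relies on implementation-defined C behaviour, as noted in the comments.

```c
#include <assert.h>
#include <stdint.h>

static_assert(28 + 151 + 77 == 256, "weights sum to 256");

/* Unsigned 16-bit lane with a logical shift (models the vsrl / u16 path). */
static int grey_unsigned(uint16_t b, uint16_t g, uint16_t r)
{
    uint16_t acc = (uint16_t)(b * 28 + g * 151 + r * 77);
    return acc >> 8;
}

/* Signed 16-bit lane with an arithmetic shift (models the vsra / i16 path).
 * The conversion to int16_t and the shift of a negative value are
 * implementation-defined in C; GCC and Clang wrap modulo 2^16 and
 * sign-extend, which is the behaviour assumed here. */
static int grey_signed(int16_t b, int16_t g, int16_t r)
{
    int16_t acc = (int16_t)(uint16_t)(b * 28 + g * 151 + r * 77);
    return acc >> 8;
}
```

Under these assumptions both functions agree for darker pixels, for example 100 for an input of (100, 100, 100). For (255, 255, 255) the unsigned path returns 255, while the signed path goes negative because the sum has wrapped, so the unsigned variant looks like the safer fit for this data range.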
Thank you very much! With your advice I have fixed the problem, and I learned a lot from it. I have closed the issue. Thank you again.