parallella / pal
An optimized C library for math, parallel processing and data movement
License: Apache License 2.0
Hi,
I'm trying to run the examples, but for instance:
./simple_example
Running p_wait
Running p_close (0xffffffea)
Running p_finalize(0xffffffda)
Do you know how to make it work?
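For what it's worth, those hex values look like negative errno-style return codes printed as unsigned 32-bit integers. This is my guess at the convention, not something the PAL docs confirm, but the arithmetic is easy to check:

```c
#include <stdint.h>

/* Reinterpret an unsigned hex return code as a signed value.
   0xffffffea == -22 (EINVAL) and 0xffffffda == -38 (ENOSYS on Linux)
   under two's complement; the errno mapping is an assumption. */
static int pal_err(uint32_t code)
{
    return (int32_t)code;   /* two's-complement reinterpretation */
}
```

If that reading is right, p_close failed with an invalid-argument error and p_finalize with a not-implemented error.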
It would be better to have a faster/approximate math functions section in PAL, specially designed for running math kernels on Epiphany with selectable precision.
For example:
- expf() from newlib
- e_approx_exp()
- e_fast_exp()
I think this design is much more applicable for actual application programmers.
And here's a validated approximate exp() implementation for Epiphany:
https://github.com/syoyo/parallella-playground/blob/master/math_exp/e_fast_exp.c
25 ~ 74 clocks for each exp(x) evaluation, within 1.0e-5 relative error.
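To give a flavor of the speed/precision trade-off, here is a generic range-reduction-plus-Taylor sketch. This is not the linked e_fast_exp; the constants, term count, and accuracy are my own and only illustrative:

```c
#include <math.h>

/* Illustrative fast expf: split x = k*ln2 + r, so exp(x) = 2^k * e^r,
   with e^r from a short Taylor polynomial over |r| <= ln2/2.
   Accuracy is roughly 1e-4 relative; more terms buy more precision. */
static float fast_expf(float x)
{
    const float ln2 = 0.69314718f;
    /* round x/ln2 to the nearest integer k */
    int k = (int)(x / ln2 + (x >= 0.0f ? 0.5f : -0.5f));
    float r = x - (float)k * ln2;               /* |r| <= ln2/2 */
    /* 5-term Taylor series for e^r */
    float p = 1.0f + r * (1.0f + r * (0.5f
                + r * (1.0f / 6.0f + r * (1.0f / 24.0f))));
    return ldexpf(p, k);                        /* scale by 2^k */
}
```

The selectable-precision idea then amounts to choosing how many polynomial terms (or which implementation) a given precision tier uses.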
Issue:
There's 0.0f as the starting comparison value.
Fix:
Substitute the current value with the maximum float value.
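In other words, the pattern should look something like this (a minimal sketch of the fix, not the actual PAL source):

```c
#include <float.h>

/* Seed the minimum search with FLT_MAX, not 0.0f, so arrays containing
   only positive values still report the correct minimum. */
static float min_f32(const float *a, int n)
{
    float m = FLT_MAX;          /* was 0.0f: wrong for all-positive input */
    for (int i = 0; i < n; i++)
        if (a[i] < m)
            m = a[i];
    return m;
}
```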
The headers of those functions declare just float*, while the implementations use const float* for the input vector.
In response to build fail in PR #9
Hi, I would like to implement the mentioned functions, but the description of the arguments is unclear to me. My guess (for the 8x8 case; the 16x16 case is analogous):
SAD works on an input source image x with dimensions rows x cols. There is a second reference image with the same dimensions as x. m is a pointer to the upper left corner of an 8x8 block within this second image. (Therefore m is not an 8x8 array.) The result vector r is a 2D array with dimensions (rows - 7) x (cols - 7), holding the sum of absolute differences for each possible offset of the 8x8 block within the source image.
Please tell me if these assumptions are correct. Thank you.
Greetings, Alexander
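To make the assumptions above concrete, here is a plain-C sketch that matches that reading. The name and prototype are mine, not PAL's actual p_sad8x8 signature:

```c
#include <math.h>

/* Sketch matching the interpretation above (8x8 case): m points at the
   top-left of an 8x8 block inside a reference image that shares the
   source's row stride (cols). r has (rows-7) x (cols-7) entries. */
static void sad8x8(const float *x, const float *m,
                   float *r, int rows, int cols)
{
    for (int oy = 0; oy < rows - 7; oy++)
        for (int ox = 0; ox < cols - 7; ox++) {
            float sad = 0.0f;
            for (int i = 0; i < 8; i++)
                for (int j = 0; j < 8; j++)
                    sad += fabsf(x[(oy + i) * cols + (ox + j)]
                                 - m[i * cols + j]);
            r[oy * (cols - 7) + ox] = sad;
        }
}

/* tiny self-check: an 8x8 image compared against itself gives SAD 0 */
static float demo_sad(void)
{
    float img[64], r[1];
    for (int i = 0; i < 64; i++)
        img[i] = (float)i;
    sad8x8(img, img, r, 8, 8);
    return r[0];
}
```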
I suggest adding the keyword "const" to the type specifiers for parameters, like the following.
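For instance (the exact parameter list is assumed here, not copied from pal.h):

```c
/* Marking the read-only inputs const documents intent and lets the
   compiler reject accidental writes through a or b. */
static void p_add_f32(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* tiny usage check */
static float demo_add(void)
{
    float c[2];
    p_add_f32((const float[]){1.0f, 2.0f}, (const float[]){3.0f, 4.0f}, c, 2);
    return c[0] + c[1];
}
```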
Would you like to apply the advice from the article to more places in your source files?
It's all about inconsistencies between header and implementation.
Of the 43 implementation files, 38 suffer from this issue.
One of them even has '32f' in both header and implementation, which is rather puzzling.
Would you like to replace more defines for constant values by enumerations to stress their relationships?
Actually, it's a bug: two constants are both named M_DIV15 and have the same value.
Issue:
Scalar values are being used as vectors, e.g. iterated through.
Fix:
Stop treating them like vectors 😉
This Issue tracks my contribution of p_median() to the math library.
p_sin_f32 uses 6 terms to approximate the sine function, while p_cos_f32 uses 5 terms. This results in p_cos_f32 being significantly less accurate than p_sin_f32.
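A quick way to see the effect of the term count (an illustrative Taylor evaluation, not PAL's actual polynomial or coefficients):

```c
#include <math.h>

/* Evaluate cos(x) with a given number of Taylor terms:
   cos(x) = sum_{k} (-1)^k x^(2k) / (2k)!  -- illustrative only. */
static float cos_terms(float x, int terms)
{
    float x2 = x * x, t = 1.0f, s = 1.0f;
    for (int k = 1; k < terms; k++) {
        t *= -x2 / (float)((2 * k - 1) * (2 * k));  /* next Taylor term */
        s += t;
    }
    return s;
}
```

At x = 2, for example, the 5-term sum is off by a few 1e-4 while the 6-term sum is off by under 1e-5, which is roughly the gap being described.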
When you run ./bootstrap, you get a bunch of generated files that git should ignore; they should be added to .gitignore.
I passionately dislike libraries that put the user API documentation into the .c files. It clearly belongs in the header files, where the user (or the user's IDE) can find it, even when a precompiled lib + headers are used. In addition, there might be several implementations for different architectures in the future, each having its own .c file. You still want a common interface definition for them, maintained in a single set of headers.
This is of course a kind suggestion, but with a strong opinion ;-)
Task:
-Improve documentation
Guidelines:
-documentation at the beginning of each function source file
-descriptions should be as short as possible but not shorter
-doxygen compatible
-explain basics and anything unusual about function
For:
-anyone
There should be no dependencies in the PAL math library. A first priority is to remove those dependencies. Once we have a clean starting point, we will start with optimization.
One function at a time....
Would this be possible?
-Build and report code size for ARM, x86, Epiphany
-Would require developers to install the ARM and Epiphany tool chain dependencies.
Some of the current contributions are too x86-centric; this might help bring everyone down by one level?
In OpenCV, the output image is the same size as the input image. The indices x,y in the output image correspond to x,y in the input.
But in PAL, the 3x3 methods have output sizes that are two pixels smaller in width and height than the input size. This is more economical on memory, but x,y locations no longer correspond. Is this the preferred method?
The size of the output images should be added to the comments and documentation for these methods.
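To spell out the convention in question, here is a valid-mode 3x3 averaging sketch (not PAL's actual filter code; the shape rule is what matters):

```c
/* Valid-mode 3x3 filtering: input rows x cols -> output (rows-2) x (cols-2).
   Output pixel (i-1, j-1) corresponds to input pixel (i, j), i.e. the
   output is shifted by one in each dimension relative to the input. */
static void box3x3(const float *in, float *out, int rows, int cols)
{
    for (int i = 1; i < rows - 1; i++)
        for (int j = 1; j < cols - 1; j++) {
            float s = 0.0f;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    s += in[(i + di) * cols + (j + dj)];
            out[(i - 1) * (cols - 2) + (j - 1)] = s / 9.0f;
        }
}

/* tiny self-check: a constant 4x4 input yields a constant 2x2 output */
static float demo_box(void)
{
    float in[16], out[4];
    for (int i = 0; i < 16; i++)
        in[i] = 2.0f;
    box3x3(in, out, 4, 4);
    return out[0] + out[1] + out[2] + out[3];
}
```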
Refactoring includes the 32f to f32 suffix change.
Is p_a_inv needed? It has many branches and maybe not so good precision.
p_inv is already fast.
Some platforms have hardware division that is faster than software division.
I think we can either make an inline function like this:
static inline float PAL_DIV(float a, float b)
{
#ifdef HAVE_DIV
    return a / b;
#else
    /* compute the inverse 1/b */
    union {
        float f;
        uint32_t x;
    } u = { b };
    /* First approximation */
    u.x = 0x7EEEEBB3 - u.x;
    /* Refine with Newton-Raphson: x' = x * (2 - b * x) */
    u.f = u.f * (2.0f - b * u.f);
    u.f = u.f * (2.0f - b * u.f);
    u.f = u.f * (2.0f - b * u.f);
    return a * u.f;
#endif
}
(plus #include <stdint.h> for uint32_t)
and use it when division is needed
or use division in every case and let the compiler do the division.
Do we really need this function?
And can't we just let the compiler decide the rounding mode?
Reference:
https://github.com/parallella/pal/blob/master/src/math/p_ftoi.c
Task:
-create API reference manual using doxygen
Questions:
-example of best practices?
-one per dir or one per project?
-ability to create a well-structured, linked PDF automatically?
For:
-anyone
🚧 Work in progress 🚧
Following on from the discussion in pull request #105: should those 2 functions have a hard limit on how many iterations they run, or should they just keep iterating until convergence?
Personally, I think there should instead be a pair of functions for each: one with a fixed iteration limit (lower accuracy) and one that keeps iterating until an accuracy set by the user is reached (e.g. iterate until each iteration changes the result by 1% or less).
@aolofsson , it'd also be great if you could tell us which one would be better
Most of my own work will be focused on the Epiphany, but clearly the goal for the project is to make PAL something universal. There is really nothing equivalent out there, so it does make sense.
Guidelines:
-POSIX (not OpenMP) to maximize portability
-Start with "good enough" C but assume that you will eventually need to use assembly...
One at a time....
Besides grabbing the lowest-hanging fruit (#82), I actually have a criticism of the design of the math library. The majority of the library is functions that output multiple values, but there are a few that output single values from multiple inputs (min, max, sum, etc.). Especially given that this is a parallel math library, I'd argue that these single-output functions are wasting valuable names better given to fully parallel operations.
For example, min and max. Presently, these functions return a single min or max value for all values in the input array. This contradicts other functions like abs, add, mul, and many others that output multiple values.
I'd recommend renaming functions like p_min_f32 to p_min_value_f32 or p_minv_f32. That way you know from the name that it outputs a single value instead of many.
This would then free up p_min_f32 and p_max_f32 to operate on two input arrays and output an array of min or max values, consistent with the rest of the functions. It also means the namespace is more open to adding other useful functions like clamp, sign, floor, and ceil, to name a few.
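The proposed elementwise behaviour would look something like this (the signature shape is assumed, and I've used a neutral name rather than claiming PAL's):

```c
/* Elementwise minimum of two arrays, matching the shape of abs/add/mul:
   c[i] = min(a[i], b[i]). This is what the proposal would give p_min_f32. */
static void elementwise_min_f32(const float *a, const float *b,
                                float *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] < b[i] ? a[i] : b[i];
}

/* tiny usage check */
static float demo_min(void)
{
    float c[2];
    elementwise_min_f32((const float[]){1.0f, 5.0f},
                        (const float[]){2.0f, 4.0f}, c, 2);
    return c[0] * 10.0f + c[1];
}
```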
I can `make runtest`, but runtest requires an argument, which is another program, and what that should be is unclear. I don't know how all the gold.h stuff comes into play.
Our goal is continuous integration in terms of build, test, and measure.
As part of that goal, we need to build a framework for measuring and displaying (tabulating) the code size and performance of all of the functions across all of the supported platforms.
The results should be published and readable in markdown in the root directory of PAL.
During code development I found it very annoying to have the compiler-generated files alongside the tracked ones.
As I don't know how to achieve this with autoconf, my suggestion is to use CMake. With it, all generated files can live in a separate directory. Additionally, and this is the best part, it can generate all kinds of project files (Eclipse, VS, Code::Blocks, Makefile, etc.) on demand. You can also glob all *.c files in your project instead of declaring each one in a list.
Claims to normalize to [-pi..pi]
Bad example:
M_NORMALIZE_RADIANS(3.141593)=-21.991148
In PR #117 I have an alternate function.
First of all, p_sincos has one input and two outputs, which is different from most functions in PAL, so it must have a proper test implementation.
Secondly, the current (single) reference vector for testing purposes holds entirely wrong values, since it was calculated using the log10f function:
#include <math.h>
#include "simple.h"

void generate_ref(float *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = log10f(ai[i]); /* copy-paste mistake */
}

Namely in this line.
Issue:
Every value change is written through the global memory pointer.
Fix:
Introduce a temporary variable that is written to the global memory pointer only once, after the function finishes its work.
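The fix looks something like this (a hedged sketch; the real PAL function's signature may differ):

```c
/* Accumulate into a local variable that stays in a register, and store
   through the (possibly remote) output pointer exactly once at the end. */
static void sum_f32(const float *a, float *r, int n)
{
    float acc = 0.0f;            /* local accumulator */
    for (int i = 0; i < n; i++)
        acc += a[i];             /* no per-iteration store through r */
    *r = acc;                    /* single write to the output pointer */
}

/* tiny usage check */
static float demo_sum(void)
{
    float r;
    sum_f32((const float[]){1.0f, 2.0f, 3.0f}, &r, 3);
    return r;
}
```

On Epiphany this matters doubly, since the output pointer may point at slow external or remote-core memory.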
This Issue tracks my contribution of p_mode() to the math library.
Would you like to replace any double quotes with angle brackets around file names in include statements?
pal_math.h: defined constants to be used
The current test environment is quite primitive. It works with a set of golden vectors in a tabular text format.
Example:
pal/src/math/test/p_log10_f32.dat
Currently some of the functions are missing test vectors, and certainly all functions need more exhaustive testing...open to suggestions with respect to framework. (I am sure there is a lot out there).
I have had great success with plain text based unit testing in the past. (at least as an intermediate format).
The current pal/src/test_main.c is VERY primitive. What I would prefer not to have is a personalized test function for each function, copy-pasted or auto-generated by some other program (been there, done that...). Since all of the functions are quite similar and math-oriented, it seems that a common single test framework with input data and expected data is the way to go...
🚧 Work in progress 🚧
Refers to p_popcount.
In the function definition one argument is missing (compared to the declaration and the docs).
Issue to keep track of my contribution of p_median to the math library.
In many functions there are parameters of type int whose value will always be non-negative (processor count, width, height, etc.). In order to reduce the overhead of parameter validity checks (which, btw, I have not seen anywhere), I suggest changing these to unsigned int.
For further bit-level optimization it might also be nice to use fixed-size types from stdint.h everywhere, e.g. uint32_t. This should improve the portability of low-level optimizations.
As a basic building block, we need a very fast optimized linear algebra call that runs single threaded. The parallel framework will be built on top of this basic building block.
A great starting point is BLIS from the University of Texas:
Major tasks:
-Create the optimized assembly macro needed at the base (basically a 4x4 matrix multiply)
-Run BLIS through the Epiphany tool chain to create the library. (sounds easy, doesn't it...)
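As a reference for what the base macro has to compute, here is the 4x4 multiply in plain C. The optimized Epiphany version would be hand-scheduled assembly; this sketch only pins down the contract (row-major, C += A * B):

```c
#include <math.h>

/* 4x4 micro-kernel contract: row-major matrices, accumulate C += A * B.
   Plain-C reference; the real kernel would use fused multiply-adds. */
static void mm4x4(const float *a, const float *b, float *c)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            float s = c[i * 4 + j];
            for (int k = 0; k < 4; k++)
                s += a[i * 4 + k] * b[k * 4 + j];
            c[i * 4 + j] = s;
        }
}

/* tiny self-check: identity * B == B */
static float demo_mm(void)
{
    float a[16] = {0}, b[16], c[16] = {0}, d = 0.0f;
    for (int i = 0; i < 4; i++)
        a[i * 4 + i] = 1.0f;
    for (int i = 0; i < 16; i++)
        b[i] = (float)i;
    mm4x4(a, b, c);
    for (int i = 0; i < 16; i++)
        d += fabsf(c[i] - b[i]);
    return d;
}
```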
I'm opening this issue to track my contribution of p_sort()
to the math library. PR will follow shortly.
Issue:
There's a magic number as the starting comparison value.
Fix:
Substitute the current value with the minimum float value.
p_memcpy(void*, void*, int, int) needs to be implemented.
There are a lot of implementations of memcpy -- newlib has a fantastically huge one for each platform.
Some notes from the source:
* 0. User space
* 1. Specific to bsb/device/os
* 2. Should be lightning fast
* 3. Add "safety" compile switch for error checking
* 4. Function should not call any
* 5. Need a different call per chip, board, O/S?
and some personal observations:
Some would name the "safety" switch FAIL_FAST, but others would name it __SAFE_MEMCPY.
memcpy isn't a terribly special function, but it gets a lot of press for being a pusher of bits.
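For discussion, here is a minimal sketch of points 2-3 from the notes. The issue's prototype has four parameters and the fourth's meaning isn't stated, so this keeps a conventional three-argument shape; the real p_memcpy may well differ:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch only. A naive byte loop with an optional "safety"
   compile switch; a fast version would move aligned words instead. */
static void *p_memcpy_sketch(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
#ifdef P_MEMCPY_SAFE
    if (dst == NULL || src == NULL)   /* "safety" switch (note 3 above) */
        return dst;
#endif
    while (n--)
        *d++ = *s++;
    return dst;
}

/* tiny usage check */
static int demo_cpy(void)
{
    char dst[6] = {0};
    p_memcpy_sketch(dst, "hello", 6);
    return dst[0] == 'h' && dst[4] == 'o' && dst[5] == '\0';
}
```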