pal's Issues

Running examples

Hi,
I'm trying to run the examples, but for instance:

./simple_example
Running p_wait
Running p_close (0xffffffea)
Running p_finalize(0xffffffda)

Do you know how to make it work?

Fast/Approximate math function section

It would be good to have a fast/approximate math function section in PAL, specially designed for running math kernels on Epiphany with selectable precision.

For example,

  • Full precision / low performance (reference): e.g., expf() from newlib.
  • Middle precision / middle performance (20 ~ 100 clocks, general use): e.g., e_approx_exp()
  • Low precision / faster performance (~20 clocks, DSP use): e.g., e_fast_exp()
  • Short vectorized versions for the middle and low precision variants.

I think this design would be much more useful for actual application programmers.

And here's a validated approximate exp() implementation for Epiphany:

https://github.com/syoyo/parallella-playground/blob/master/math_exp/e_fast_exp.c

25 ~ 74 clocks per exp(x) evaluation, within 1.0e-5 relative error.
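For illustration, here is a minimal sketch of the kind of low-precision/high-speed variant being proposed, using the well-known IEEE-754 bit-trick approximation of exp(x). It is far cruder than the linked implementation (roughly percent-level relative error), and the constants and function name are illustrative, not part of PAL:

#include <stdint.h>

/* Low-precision exp(x) sketch: exp(x) = 2^(x/ln 2), built by writing a
 * scaled-and-biased value directly into the bit pattern of a float.
 * Accuracy is only a few percent; it illustrates the fast end of the
 * precision/performance trade-off described above. */
static inline float fast_expf_sketch(float x)
{
    union { int32_t i; float f; } u;
    u.i = (int32_t)(12102203.0f * x + 1064866805.0f);
    return u.f;
}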

please help: unclear interface of p_sad8x8_f32, p_sad16x16_f32

Hi, I would like to implement the mentioned functions, but the description of the arguments is unclear to me. My guess (for the 8x8 case; the 16x16 case is analogous):

SAD works on an input source image x with dimensions rows x cols. There is a second reference image with the same dimensions as x. m is a pointer to the upper-left corner of an 8x8 block within this second image. (Therefore m is not an 8x8 array.) The result vector r is a 2D array with dimensions (rows - 7) x (cols - 7) holding the sums of absolute differences for the possible offsets of the 8x8 block within the source image.
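In code, that interpretation would look roughly like the sketch below (my own illustration of the guessed semantics, not the actual PAL definition):

#include <math.h>

/* Guessed semantics: x is the rows x cols source image, m points to the
 * top-left of an 8x8 block inside a second image with the same row
 * stride (cols), and r receives (rows - 7) x (cols - 7) SAD values,
 * one per candidate block offset in the source image. */
void sad8x8_guess(const float *x, const float *m, float *r,
                  int rows, int cols)
{
    for (int oy = 0; oy < rows - 7; oy++) {
        for (int ox = 0; ox < cols - 7; ox++) {
            float sum = 0.0f;
            for (int i = 0; i < 8; i++)
                for (int j = 0; j < 8; j++)
                    sum += fabsf(x[(oy + i) * cols + (ox + j)] - m[i * cols + j]);
            r[oy * (cols - 7) + ox] = sum;
        }
    }
}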

Please tell me if these assumptions are correct. Thank you.
Greetings, Alexander

math: Change all '32f' suffixes to 'f32'

It's all about inconsistencies between header and implementation.
Of the 43 implementation files, 38 suffer from this issue.
One of them even has '32f' in both the header and the implementation, which is rather puzzling.

move function documentation to header

I passionately dislike libraries that put the user API documentation into the .c files. It clearly belongs in the header files, where the user (or the user's IDE) can find it, even when a precompiled lib plus headers is used. In addition, there might be several implementations for different architectures in the future, each with its own .c file. You still want a common interface definition for them, maintained in a single set of headers.

This is of course a friendly suggestion, but with a strong opinion ;-)

HELP: Improve documentation

Task:
-Improve documentation

Guidelines:
-documentation at the beginning of each function source file
-descriptions should be as short as possible, but not shorter
-doxygen compatible
-explain the basics and anything unusual about the function

For:
-anyone
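As a point of reference, a doxygen-compatible block for one function might look like the sketch below (the p_add_f32 prototype shown is an assumption made for illustration, not necessarily the exact PAL signature):

/**
 * Element-wise addition of two vectors: c[i] = a[i] + b[i].
 *
 * @param a  Pointer to the first input vector
 * @param b  Pointer to the second input vector
 * @param c  Pointer to the output vector
 * @param n  Number of elements to process
 */
void p_add_f32(const float *a, const float *b, float *c, int n);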

Completing math/dsp function list

There should be no dependencies in the PAL math library. A first priority is to remove those dependencies. Once we have a clean starting point, we will start with optimization.

One function at a time....

Expand report code size reporting?

Would this be possible?

-Build and report code size for ARM, x86, and Epiphany.
-Would require developers to install the ARM and Epiphany tool chain dependencies.

Some of the current contributions are too x86-centric; this might help bring everyone down by one level?

Image filter methods, size of output

In OpenCV, the output image is the same size as the input image. The x,y indices in the output image correspond to x,y in the input.

But in PAL, the 3x3 methods have output sizes that are two pixels smaller in width and height than the input. This is more economical on memory, but the x,y locations no longer correspond. Is this the preferred method?
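For clarity, this is a "valid region only" convention; a hedged sketch of a 3x3 box filter under that convention (hypothetical helper, not the PAL code):

void box3x3_valid(const float *in, float *out, int rows, int cols)
{
    /* Output is (rows - 2) x (cols - 2); output (0,0) corresponds to
     * input (1,1), the first pixel with a full 3x3 neighborhood. */
    for (int y = 1; y < rows - 1; y++) {
        for (int x = 1; x < cols - 1; x++) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    sum += in[(y + dy) * cols + (x + dx)];
            out[(y - 1) * (cols - 2) + (x - 1)] = sum / 9.0f;
        }
    }
}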

The size of the output images should be added to the comments and documentation for these methods.

Is p_a_inv needed ?

Is p_a_inv needed? It has many branches and perhaps not very good precision.
p_inv is already fast.

Some platforms have hardware division that is faster than software division.
I think we can either make an inline function like this:

#include <stdint.h>

static inline float PAL_DIV(float a, float b)
{
#ifdef HAVE_DIV
    return a / b;
#else
    /* compute the inverse of b */
    union {
        float f;
        uint32_t x;
    } u = {b};
    /* First approximation */
    u.x = 0x7EEEEBB3 - u.x;
    /* Refine with Newton-Raphson: y = y * (2 - b*y) */
    u.f = u.f * (2.0f - u.f * b);
    u.f = u.f * (2.0f - u.f * b);
    u.f = u.f * (2.0f - u.f * b);
    return a * u.f;
#endif
}

and use it when division is needed
or use division in every case and let the compiler do the division.

HELP: Fix core/doxygen.cfg

Task:
-create API reference manual using doxygen

Questions:
-example of best practices?
-one per dir or one per project?
-ability to create a well-structured, linked PDF automatically?
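Regarding the PDF question, a sketch of the standard doxygen options that usually produce a linked PDF via LaTeX (the option names are doxygen's own; the values here are only illustrative for this project):

# doxygen.cfg sketch: LaTeX output with hyperlinks, built into a PDF
GENERATE_LATEX   = YES
USE_PDFLATEX     = YES
PDF_HYPERLINKS   = YES
RECURSIVE        = YES
INPUT            = include src
OUTPUT_DIRECTORY = doc

Running doxygen and then make in the generated latex/ directory typically yields refman.pdf.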

For:
-anyone

When should sqrt and invsqrt stop iterating?

Following on from the discussion in pull request #105, should those two functions have a hard limit on the number of iterations, or should they just keep iterating until convergence?

Personally, I think there should instead be a pair of functions for each of them: one with a hard limit (lower accuracy) and one that keeps iterating until an accuracy set by the user is reached (e.g. iterate until each iteration changes the result by 1% or less).
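To make the two options concrete, here is a hedged sketch of both stopping strategies for an iterative inverse square root (hypothetical helpers, not the current PAL code; valid for x > 0):

#include <stdint.h>

/* Common starting point: the well-known 0x5f3759df bit-trick guess,
 * refined by Newton-Raphson: y <- y * (1.5 - 0.5 * x * y * y). */
static float invsqrt_guess(float x)
{
    union { float f; uint32_t i; } u = {x};
    u.i = 0x5f3759df - (u.i >> 1);
    return u.f;
}

/* Variant 1: hard iteration limit -- bounded cost, bounded accuracy. */
static float invsqrt_fixed(float x, int iterations)
{
    float y = invsqrt_guess(x);
    for (int i = 0; i < iterations; i++)
        y = y * (1.5f - 0.5f * x * y * y);
    return y;
}

/* Variant 2: iterate until the relative change drops below a
 * user-supplied tolerance (e.g. 0.01f for "1% or less"). */
static float invsqrt_tol(float x, float tol)
{
    float y = invsqrt_guess(x);
    for (;;) {
        float next = y * (1.5f - 0.5f * x * y * y);
        float change = next > y ? next - y : y - next;
        y = next;
        if (change <= tol * next)
            break;
    }
    return y;
}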

@aolofsson, it'd also be great if you could tell us which one would be better.

Create ARM/x86 Neon/SMP optimized versions of the compute functions

Most of my own work will be focused on the Epiphany, but clearly the goal for the project is to make PAL something universal. There is really nothing equivalent out there, so it does make sense.

Guidelines:
-POSIX threads (not OpenMP) to maximize portability; see the sketch below
-Start with "good enough" C, but assume that you will eventually need to use assembly...

One at a time....
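A minimal sketch of what the POSIX-threads guideline looks like in practice for an element-wise operation (hypothetical helper and thread count, not a PAL function):

#include <pthread.h>
#include <stddef.h>

struct add_chunk { const float *a, *b; float *c; size_t n; };

/* Worker: plain element-wise add over one chunk. */
static void *add_worker(void *arg)
{
    struct add_chunk *ch = arg;
    for (size_t i = 0; i < ch->n; i++)
        ch->c[i] = ch->a[i] + ch->b[i];
    return NULL;
}

#define NTHREADS 4

/* Split c[i] = a[i] + b[i] across NTHREADS POSIX threads. */
void add_f32_smp(const float *a, const float *b, float *c, size_t n)
{
    pthread_t tid[NTHREADS];
    struct add_chunk ch[NTHREADS];
    size_t chunk = n / NTHREADS;

    for (int t = 0; t < NTHREADS; t++) {
        size_t off = (size_t)t * chunk;
        ch[t].a = a + off;
        ch[t].b = b + off;
        ch[t].c = c + off;
        ch[t].n = (t == NTHREADS - 1) ? n - off : chunk;
        pthread_create(&tid[t], NULL, add_worker, &ch[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(&tid[t], NULL);
}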

Math functions that output single vs multiple values should be named differently

Besides grabbing the lowest-hanging fruit (#82), I actually have a criticism of the design of the math library. The majority of the library consists of functions that output multiple values, but there are a few that output a single value from multiple inputs (min, max, sum, etc.). Especially given that this is a parallel math library, I'd argue that these single-output functions are wasting valuable names better given to fully parallel operations.

For example, min and max: presently these functions return a single min or max value over all values in the input array. This contradicts other functions like abs, add, mul, and many others that output multiple values.

I'd recommend renaming functions like p_min_f32 to p_min_value_f32 or p_minv_f32. That way you know from the name that a function outputs a single value instead of many.

This will then free up p_min_f32 and p_max_f32 to operate on two input arrays and output an array of min or max values, consistent with the rest of the functions. This also means the namespace is more open to adding other useful functions like clamp, sign, floor, and ceil, to name a few.
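In prototype form, the proposal would look something like this (hypothetical signatures for illustration, not the current PAL API):

/* Reduction: one output value computed over a single input array. */
void p_minv_f32(const float *a, float *min_out, int n);

/* Element-wise: per-element minima of two input arrays, consistent in
 * shape with p_add_f32, p_abs_f32, and the other vector functions. */
void p_min_f32(const float *a, const float *b, float *c, int n);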

Script for measuring performance and code size

Our goal is continuous integration in terms of build, test, and measure.

As part of that goal, we need to build a framework for measuring and displaying (tabulating) the code size and performance of all of the functions across all of the supported platforms.

The results should be published and readable in markdown in the root directory of PAL.

Suggestion: CMake

During code development I found it very annoying to have the compiler-generated files sitting alongside the tracked ones.
As I don't know how to improve this with autoconf, my suggestion is to use CMake. With it, all generated files can live in a separate directory. Additionally, and this is the best part, it can generate all kinds of project files (Eclipse, VS, Code::Blocks, Makefiles, etc.) on demand. You can also glob all *.c files in the project instead of declaring each one in a list.
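A minimal sketch of what the suggestion could look like (hypothetical top-level CMakeLists.txt; the directory names are assumptions):

cmake_minimum_required(VERSION 3.5)
project(pal C)

# Pick up all sources instead of listing each file explicitly.
file(GLOB_RECURSE PAL_SOURCES src/*.c)

add_library(pal ${PAL_SOURCES})
target_include_directories(pal PUBLIC include)

An out-of-source build (mkdir build && cd build && cmake .. && make) then keeps every generated file under build/.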

test:p_sincos Fix reference test function implementation

First of all, p_sincos has one input and two outputs, which is different from most functions in PAL, so it needs a proper test implementation.
Secondly, the current (single) reference vector for testing purposes holds entirely wrong values, since it is calculated using the log10f function:

#include <math.h>
#include "simple.h"

void generate_ref(float *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = log10f(ai[i]); // copy paste mistake
}

Namely in this line.
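For what it's worth, a hedged sketch of what a corrected reference generator could look like, assuming the harness is extended to pass the input array and two output buffers explicitly (the names here are illustrative, not the existing simple.h conventions):

#include <math.h>
#include <stddef.h>

void generate_sincos_ref(const float *ai, float *out_sin, float *out_cos,
                         size_t n)
{
    size_t i;

    /* One input, two reference outputs per element. */
    for (i = 0; i < n; i++) {
        out_sin[i] = sinf(ai[i]);
        out_cos[i] = cosf(ai[i]);
    }
}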

Improve math test environment and vectors

The current test environment is quite primitive. It works with a set of golden vectors in a tabular text format.

Example:
pal/src/math/test/p_log10_f32.dat

Currently some of the functions are missing test vectors, and certainly all functions need more exhaustive testing... open to suggestions with respect to the framework. (I am sure there is a lot out there.)

I have had great success with plain-text-based unit testing in the past (at least as an intermediate format).

The current pal/src/test_main.c is VERY primitive. What I would prefer not to have is a personalized test function for each function, copy-pasted or auto-generated by some other program (been there, done that...). Since all of the functions are quite similar and math-oriented, it seems that a common single test framework with input data and expected data is the way to go...
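As one possible shape for that common framework, a hedged sketch of a data-driven check against a golden-vector text file (the "input expected" line format, the tolerance handling, and the scalar-function dispatch are all assumptions made for illustration):

#include <math.h>
#include <stdio.h>

/* Read "input expected" pairs from a golden-vector file such as
 * p_log10_f32.dat and compare a function's output against a tolerance.
 * Returns the number of failing samples, or -1 if the file is missing. */
int check_golden(const char *path, float (*fn)(float), float tol)
{
    FILE *fp = fopen(path, "r");
    float in, expected;
    int failures = 0;

    if (!fp)
        return -1;
    while (fscanf(fp, "%f %f", &in, &expected) == 2) {
        if (fabsf(fn(in) - expected) > tol)
            failures++;
    }
    fclose(fp);
    return failures;
}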

tighten constraints on parameters

In many functions there are parameters that are of type int but whose value will always be non-negative (processor count, width, height, etc.). In order to reduce the overhead of parameter validity checks (which, by the way, I have not seen anywhere), I suggest changing these to unsigned int.

For further optimization at the bit level it might also be nice to use fixed-size types from stdint.h everywhere, e.g. uint32_t. This should improve the portability of low-level optimizations.
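As a small illustration of the proposal (the prototype shape is hypothetical, not quoted from PAL):

#include <stdint.h>

/* Before: inherently unsigned quantities passed as plain int, e.g.
 *     void p_filter3x3_f32(const float *x, float *r, int rows, int cols);
 * After: fixed-size unsigned types from <stdint.h>: */
void p_filter3x3_f32(const float *x, float *r, uint32_t rows, uint32_t cols);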

Create a single threaded BLAS library using BLIS

As a basic building block, we need a very fast, optimized linear algebra routine that runs single-threaded. The parallel framework will be built on top of this basic building block.

A great starting point is BLIS from the University of Texas:

https://github.com/flame/blis

Major tasks:
-Create the optimized assembly macro needed at the base (basically a 4x4 matrix multiply); see the sketch below.
-Run BLIS through the Epiphany tool chain to create the library (sounds easy, doesn't it...).
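For reference, the 4x4 building block in plain C looks roughly like this (the real BLIS micro-kernel is hand-written assembly over BLIS's packed-panel layout; the row-major layout and the name here are assumptions):

/* Compute C += A * B for 4x4 row-major blocks. */
void gemm_ukernel_4x4_sketch(const float *a, const float *b, float *c)
{
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float acc = c[i * 4 + j];
            for (int k = 0; k < 4; k++)
                acc += a[i * 4 + k] * b[k * 4 + j];
            c[i * 4 + j] = acc;
        }
    }
}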

p_memcpy implementation

p_memcpy(void*, void*, int, int) needs to be implemented.

There are a lot of implementations of memcpy -- newlib has a fantastically huge one for each platform.

Some notes from the source:

 * 0. User space
 * 1. Specific to bsp/device/os
 * 2. Should be lightning fast
 * 3. Add "safety" compile switch for error checking
 * 4. Function should not call any
 * 5. Need a different call per chip, board, O/S?

and some personal observations:

  1. Of course it's userspace.
  2. Why? How? (If it's specific to a bsp/dev/os, shouldn't it be dev_ops?)
  3. This needs to be clearly defined; newlib variants can be found with fantastical memcpy implementations optimizing for all sorts of things, and OpenBSD has it in libkern.
  4. How should this be named? Kernel style doesn't dictate those names (as far as I can tell). I would rather have a compile flag along the lines of FAIL_FAST, but others would name it __SAFE_MEMCPY.
  5. Did Candlejack come b--
  6. Should this be a special thing? (Also, this somewhat contradicts item 1.)

memcpy isn't a terribly special function, but it gets a lot of press for being a pusher of bits.
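To get the ball rolling, a hedged plain-C sketch; the (dst, src, nbytes, flags) reading of the prototype above is my guess, and the return type is illustrative. A real version would add alignment handling, wide transfers, and possibly DMA on Epiphany:

/* Byte-by-byte baseline; 'flags' is a placeholder for the proposed
 * "safety" / behavior switch. */
void *p_memcpy_sketch(void *dst, const void *src, int n, int flags)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    (void)flags;
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}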
