ilgpu.algorithms's Introduction

ILGPU.Algorithms (! MOVED !)

Please note that the ILGPU.Algorithms library has been merged with the ILGPU repository. Refer to the ILGPU repository (https://github.com/m4rs-mt/ILGPU) for updates and new releases.

ILGPU.Algorithms Library (pre 1.0 Version)

Real-world applications typically require a standard library and a set of standard algorithms that "simply work". The ILGPU Algorithms library meets these requirements by offering a set of auxiliary functions and high-level algorithms (e.g. sorting or prefix sum). All algorithms can be run on all supported accelerator types. The CPU accelerator support is especially useful for kernel debugging.

Build instructions

ILGPU.Algorithms requires Visual Studio 2019 or higher.

Make sure to init/update the ILGPU git submodule using git submodule update --init before building the algorithms library.

License information

ILGPU.Algorithms is licensed under the University of Illinois/NCSA Open Source License. Detailed license information can be found in LICENSE.txt.

Copyright (c) 2019-2020 ILGPU Algorithms Project. All rights reserved.
Copyright (c) 2016-2018 ILGPU Lightning Project. All rights reserved.

License information of required dependencies

Different parts of ILGPU.Algorithms require different third-party libraries.

Detailed copyright and license information of these dependencies can be found in LICENSE-3RD-PARTY.txt.

Credits

This work was supported by the Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI; German Research Center for Artificial Intelligence).

ilgpu.algorithms's People

Contributors

albosc, dfki-mako, ifzen, jgiannuzzi, m4rs-mt, moftz

ilgpu.algorithms's Issues

ILGPU 0.8 compatibility issue

Hi,
I encountered an error with the latest ILGPU 0.8 release when using ILGPU.Algorithms.
Compatibility with ILGPU.Algorithms is broken by a missing method, PTXCodeGenerator.AllocatePrimitive().

When can ILGPU provide a parallel syntax function similar to the Alea GPU framework?

When can ILGPU provide a parallel syntax function similar to the one in the Alea GPU framework, which supports implicit conversion between variables and ArrayView? For example:
Alea.Parallel.GpuExtension.For(gpu, 0, Points.Count, i =>
{
    xComponent[i] = xComponent[i] - minX;
    yComponent[i] = yComponent[i] - minY;
    zComponent[i] = zComponent[i] - minZ;
});

Custom reduce

I'm trying to write a custom reducer which basically should compute: y = x1*x1 + x2*x2 + etc.

The reducer code looks like:

public readonly struct MyReducer : IScanReduceOperation<int>
{
	public string CLCommand { get; }
	public int Identity { get => 0; }

	public int Apply(int first, int second)
	{
		return first + second * second;
	}
	//
	public void AtomicApply(ref int target, int value)
	{
		Atomic.Add(ref target, value);
	}
}

accl.Reduce<int, MyReducer>(accl.DefaultStream, buffer.View, target.View);

I worked out how to implement this by reading the source code, so it could be incorrect. It works when I test it with the CPU accelerator.

  • CPU: for buffer = [0, 1, 2, 3] it returns 14
  • Cuda: (GeForce card) it returns 6740
  • OpenCL: (Intel card) it crashes - exception message "An internal compiler error has been detected"

Could you help me write the custom reducer correctly?

Many thanks! :-)

PS: Finally a good GPU C# library :-)
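
A possible explanation for the differing results, offered as a hedged sketch and not a confirmed diagnosis: the Apply operation in the reducer above, first + second * second, is not associative, so its result depends on the order in which partial results are combined. The plain C# below does not use ILGPU. The names ReduceOrderDemo, SequentialReduce, TreeReduce and SumOfSquares are hypothetical, and the pairwise tree order is only an assumption about how a parallel reduction might group elements.

```csharp
using System;
using System.Linq;

public static class ReduceOrderDemo
{
    // Same operation as MyReducer.Apply in the issue above.
    public static int Apply(int first, int second) => first + second * second;

    // Left-to-right fold starting from the identity 0.
    public static int SequentialReduce(int[] data)
    {
        int acc = 0;
        foreach (int x in data)
            acc = Apply(acc, x);
        return acc;
    }

    // Pairwise (tree) reduction, combining neighbours level by level.
    // Assumes data is non-empty and its length is a power of two.
    public static int TreeReduce(int[] data)
    {
        int[] level = (int[])data.Clone();
        for (int n = level.Length; n > 1; n /= 2)
            for (int i = 0; i < n / 2; ++i)
                level[i] = Apply(level[2 * i], level[2 * i + 1]);
        return level[0];
    }

    // Square each element first, then combine with plain addition,
    // which is associative, so the grouping no longer matters.
    public static int SumOfSquares(int[] data) => data.Select(x => x * x).Sum();
}
```

For the input [0, 1, 2, 3], SequentialReduce returns 14 (the CPU result reported above), while TreeReduce returns 122, because intermediate sums get squared again. Squaring in a separate step and then reducing with an ordinary add reduction avoids this dependence on grouping.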

Feature request: Histograms and distinct lists

Histograms are quite commonly used for image analysis.
Creating distinct lists (that is, sorted or unsorted lists in which every element is unique) from an arbitrary list is also useful for everything from lossless image encoding to data analysis.

Cublas Scaling Linearly With Streams

Hi,

Just wondering if CUBLAS is really streaming correctly? The run time of the code below scales linearly with the number of streams (1 to 100), and it does so whether the matrices are 2,2,2 or 3000,200,200, so irrespective of load.


System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
Parallel.ForEach(streamPackages, p =>
{
    using (var blas = new CuBlas(accelerator))
    {
        blas.Stream = p.Stream;
        blas.Gemm(CuBlasOperation.NonTranspose, CuBlasOperation.NonTranspose, m, n, k, 1, p.A, m, p.B, k, 0, p.C, m);
    }
});

// wait for finished
foreach (var p in streamPackages)
{
    p.Stream.Synchronize();
}
accelerator.Synchronize();

sw.Stop();
System.Diagnostics.Debug.WriteLine(sw.ElapsedMilliseconds + "ms");

XMath.Pow, XMath.Exp etc. don't work

@m4rs-mt

If XMath.Pow, Math.Pow, XMath.Exp, etc. are used in a kernel, a "too many resources requested" error is thrown when there are too many threads (e.g. 1024 in my case). No error is thrown if I decrease the number of threads (say, to 256), but the kernel then hangs forever. There seems to be something wrong with these complex math functions in ILGPU.

Attached is code to reproduce the problem.

WindowsFormsApp1.zip

TanF, TanhF

Hi, I'm trying to add Tan, Tanh, Log and Exp to my code, but I get the error:

"The function 'TanF' does not have an intrinsic implementation for this backend. 'EnableAlgorithms' from the Algorithms library not invoked?"

EnableAlgorithms has been invoked, and Sin/Cos work.

I tried looking at the code for ILGPU.Algorithms, but getting it to compile is a challenge. According to the CUDA documentation, however, tanf seems to be natively supported:

// https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__SINGLE.html
__device__ float tanf ( float x )
Calculate the tangent of the input argument.

cheers,
/m

Min reduction on unsigned integer produces incorrect result.

When using the Reduce algorithm with the MinUInt32 or MinUInt64 reduction, the output is not the expected value.

The following code should output Reduced[0] = 0, but instead outputs Reduced[0] = 4294967295.

using ILGPU;
using ILGPU.Algorithms;
using ILGPU.Algorithms.ScanReduceOperations;
using ILGPU.Algorithms.Sequencers;
using ILGPU.Runtime.OpenCL;
using System;
using System.Linq;

namespace AlgorithmsReduce
{
    class Program
    {
        static void Main()
        {
            using var context = new Context();
            context.EnableAlgorithms();
            using var accelerator = new CLAccelerator(context, CLAccelerator.CLAccelerators.First());
            using var buffer = accelerator.Allocate<uint>(64);
            using var target = accelerator.Allocate<uint>(1);

            accelerator.Sequence(accelerator.DefaultStream, buffer.View, new UInt32Sequencer());
            accelerator.Reduce<uint, MinUInt32>(
                accelerator.DefaultStream,
                buffer.View,
                target.View);

            var data = target.GetAsArray(accelerator.DefaultStream);
            for (int i = 0, e = data.Length; i < e; ++i)
                Console.WriteLine($"Reduced[{i}] = {data[i]}");
        }
    }
}

NOTE: Only applies to the OpenCL accelerator, affecting ILGPU.Algorithms v0.9.2 and v0.10.0-beta1.

Delete the i-th element of an ArrayView

Hi @m4rs-mt ,

I am wondering whether there is a way to delete the i-th element of an ArrayView. For instance, my method scale() converts the captured data frame information to meters. However, the returned value can also be (0,0,0). In this case, I want to delete the i-th element so that this zero information is not forwarded to the CPU. I couldn't find a delete function. Is there any workaround?

private static void ApplyKernel(
    Index index, /* The global thread index (1D in this case) */
    ArrayView<CameraSpacePoint> pixelArray, /* A view to a chunk of memory (1D in this case) */
    ArrayView<Point3d> pixelArray_pt /* A view to a chunk of memory (1D in this case) */
    )
{
    Point3d tmp = scale(pixelArray[index]);
    if (XMath.Abs(tmp.X) > 0.0001 || XMath.Abs(tmp.Y) > 0.0001 || XMath.Abs(tmp.Z) > 0.0001)
    {
        pixelArray_pt[index] = tmp;
    }
    else
    {
        pixelArray_pt.delete(index); // desired operation; I could not find such a method
    }
}
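
Since a view refers to a fixed-size chunk of memory, one general workaround (a standard technique, not an ILGPU API) is stream compaction: flag the elements to keep, compute an exclusive prefix sum over the flags to obtain each kept element's output position, then scatter the kept elements into a smaller array. The sequential plain C# below is only a sketch of that idea. CompactionDemo, Compact and keep are hypothetical names, and on a GPU each loop would presumably become a kernel or a scan call.

```csharp
using System;

public static class CompactionDemo
{
    // Returns a new array containing only the elements for which keep() is true,
    // preserving their original order.
    public static T[] Compact<T>(T[] input, Func<T, bool> keep)
    {
        int n = input.Length;

        // Step 1: flag each element (1 = keep, 0 = drop).
        int[] flags = new int[n];
        for (int i = 0; i < n; ++i)
            flags[i] = keep(input[i]) ? 1 : 0;

        // Step 2: exclusive prefix sum of the flags gives the output index
        // of every kept element; the running total is the output length.
        int[] offsets = new int[n];
        int total = 0;
        for (int i = 0; i < n; ++i)
        {
            offsets[i] = total;
            total += flags[i];
        }

        // Step 3: scatter kept elements to their computed positions.
        T[] output = new T[total];
        for (int i = 0; i < n; ++i)
            if (flags[i] == 1)
                output[offsets[i]] = input[i];

        return output;
    }
}
```

For example, CompactionDemo.Compact(new[] { 0.0, 1.5, 0.0, 2.5 }, x => Math.Abs(x) > 0.0001) yields an array holding 1.5 and 2.5, so only the non-zero entries would need to be copied back to the CPU.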

Problem with Algorithm.ScanExtensions

The "ScanInclusive" algorithm is not stable. If the input array has more than 20000 elements, repeating the inclusive scan on the same input several times can give different results. It therefore can't be used for scanning large arrays. Attached is sample code; I am using ILGPU beta2 and an Nvidia Quadro P1000.

SimpleStructures.zip

Use Tanh PTX intrinsic on SM_75 or higher

The Tanh PTX intrinsic for float32 is available on SM_75 or higher. ILGPU.Algorithms should be modified to NOT register XMath.Tanh(float) as a replacement intrinsic.
