5.Hybridizer HOWTO — Printf and Builtins

Intrinsics extend the code generator and give finer control over the code it emits. We apply the same concept to existing methods whose source code we do not control. Since we cannot put an attribute on such methods, the equivalent mapping is described in a builtin file. This is what allows a call to Console.Out.Write to generate printf, and a call to System.Math.Exp to be replaced by exp from cmath.


[EntryPoint("TestPrintf")]
public void testPrintf()
{
    Console.Out.WriteLine(
        "Comment from Thread {0} Block {1}",
        threadIdx.x, blockIdx.x);
}

[EntryPoint("TestExp")]
public void TestExp()
{
    // exp is assumed to be a double field of the enclosing class.
    exp = System.Math.Exp(1.0);
}
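To make the mapping concrete, here is a minimal C++ sketch of what the two entry points above could correspond to once the builtins are applied. It is an illustration written for this HOWTO, not actual Hybridizer output: the function names comment_line and test_exp are hypothetical, and the thread and block indices are passed as plain parameters so that the sketch compiles with a host compiler.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical C++ counterpart of testPrintf. The indices are parameters here;
// in CUDA code they would come from threadIdx.x and blockIdx.x.
std::string comment_line(int thread, int block)
{
    char buffer[64];
    // The "{0}" and "{1}" placeholders become printf-style "%d" conversions.
    std::snprintf(buffer, sizeof(buffer),
                  "Comment from Thread %d Block %d\n", thread, block);
    return std::string(buffer);
}

// Hypothetical C++ counterpart of TestExp: System.Math.Exp maps to exp from <cmath>.
double test_exp()
{
    return std::exp(1.0);
}
```

The sketch formats into a string rather than printing so that the result can be checked; device code would call printf directly.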

6.Hybridizer HOWTO — Resident Memory

Between kernel calls, we may want to keep some data resident on the device (this mainly applies when device memory is physically separate from host memory). This is done with the IResidentArray interface and a few attributes. Memory copies are then limited to the minimum needed, while memory management remains automated.
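The idea behind resident memory can be sketched in a few lines of C++. This is not the IResidentArray API, whose members are not described here: the ResidentBuffer class, its State enum and the scale_kernel function are hypothetical names, and a second std::vector stands in for device memory. The sketch only shows the bookkeeping: a copy happens when the side being accessed is stale, and not otherwise.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the Hybridizer API: a buffer that remembers which
// side holds the latest data, so copies happen only when that side is stale.
class ResidentBuffer
{
public:
    explicit ResidentBuffer(std::size_t n) : host_(n, 0.0), device_(n, 0.0) {}

    // Host-side access: downloads only if the device holds newer data.
    std::vector<double>& host()
    {
        if (state_ == State::DeviceNewer) { host_ = device_; ++copies_; }
        state_ = State::HostNewer;
        return host_;
    }

    // Device-side access for a kernel: uploads only if the host changed the data.
    std::vector<double>& device()
    {
        if (state_ == State::HostNewer) { device_ = host_; ++copies_; }
        state_ = State::DeviceNewer;
        return device_;
    }

    int copies() const { return copies_; }

private:
    enum class State { HostNewer, DeviceNewer };
    std::vector<double> host_;
    std::vector<double> device_;  // stands in for device memory
    State state_ = State::HostNewer;
    int copies_ = 0;
};

// Stand-in for a kernel: doubles every element "on the device".
void scale_kernel(ResidentBuffer& buf)
{
    for (double& x : buf.device()) x *= 2.0;
}
```

With this bookkeeping, calling scale_kernel several times in a row costs a single upload, and the result is downloaded only when the host reads it.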

7.Hybridizer HOWTO — Virtual Functions

Hybridizer supports virtual functions. If an implementation or override of a virtual function needs to be available on the GPU, it has to be flagged with the Kernel attribute.


public interface ISimple
{
    int f();
}

public class Answer : ISimple
{
    [Kernel]
    public int f()
    {
        return 42;
    }
}

public class Other : ISimple
{
    [Kernel]
    public int f()
    {
        return 12;
    }
}
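For readers more familiar with C++, the interface and its two implementations above have a direct counterpart in terms of virtual functions. The following is a hand-written sketch, not code generated by Hybridizer, and call_f is a hypothetical helper added to show a call through the interface.

```cpp
// Hypothetical C++ counterpart of ISimple, Answer and Other.
struct ISimple
{
    virtual ~ISimple() = default;
    virtual int f() const = 0;
};

struct Answer : ISimple
{
    int f() const override { return 42; }
};

struct Other : ISimple
{
    int f() const override { return 12; }
};

// The caller only sees the interface; the implementation is selected at run time.
int call_f(const ISimple& s)
{
    return s.f();
}
```

Because the target of s.f() is only known at run time, the compiler generally cannot inline it.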

8.Hybridizer HOWTO — Generics

Virtual functions come with a significant performance penalty: in the Expm1 measurements below, the Dispatch row reaches 478 GFLOPS against 975 GFLOPS for the Local row. To overcome this, we map generics to C++ templates. The generated code can then be inlined, so the flexibility of objects remains available without giving up performance.

Expm1      GFLOPS   GCFLOPS   usage
Local         975       538     92%
Dispatch      478       263     45%
peak         1174       587

In C++, the concept a template expects is traditionally not expressed in the code: the compiler determines whether a type is compliant when the template is instantiated. In .NET, the concept is expressed by constraints on the generic type parameter, as the following example illustrates.


[HybridTemplateConcept]
public interface IMyArray {
    double this[int index] { get; set; }
}

[HybridRegisterTemplate(Specialize=typeof(MyAlgorithm<MyArray>))]
public struct MyArray : IMyArray
{
    double[] _data;
    [Kernel] public double this[int index] {
        get { return _data[index]; }
        set { _data[index] = value; }
    }
}

public class MyAlgorithm<T> where T : struct, IMyArray
{
    T a, b;
    [Kernel] public void Add(int n) {
        for (int k = threadIdx.x + blockDim.x * blockIdx.x;
            k < n; k += blockDim.x * gridDim.x)
            a[k] += b[k];
    }
}

With this approach, performance returns to a level very close to what we obtain without any polymorphism.

Expm1      GFLOPS   GCFLOPS   usage
Local         975       538     92%
Dispatch      478       263     45%
Generics      985       544     93%
peak         1174       587
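
The mapping from generics to templates described in this section can be pictured with the following C++ sketch of MyArray and MyAlgorithm<T>. It is written by hand for illustration and is not Hybridizer output: std::vector replaces the C# array, and the hypothetical start and stride parameters replace the CUDA index expressions so that the code runs on a host compiler.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counterpart of MyArray: a thin wrapper with an index operator.
struct MyArray
{
    std::vector<double> data;
    double& operator[](std::size_t i) { return data[i]; }
};

// Counterpart of MyAlgorithm<T>. No constraint is written on T: the compiler
// checks at instantiation that T provides operator[]. The call a[k] is
// resolved at compile time, so it can be inlined.
template <typename T>
struct MyAlgorithm
{
    T a, b;

    // Grid-stride loop; start and stride stand in for
    // threadIdx.x + blockDim.x * blockIdx.x and blockDim.x * gridDim.x.
    void Add(std::size_t n, std::size_t start, std::size_t stride)
    {
        for (std::size_t k = start; k < n; k += stride)
            a[k] += b[k];
    }
};
```

Instantiating MyAlgorithm<MyArray> plays the role of the Specialize argument in the C# example: it fixes T at compile time, so no virtual dispatch is involved.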