![](http://www.altimesh.com/wp-content/uploads/2016/03/gtc-2017-poster-thumb-1.png)
When small tasks are submitted to the GPU, grid scheduling and synchronization costs can far exceed the time the computation itself would take, even on a CPU. In that case, the benefit of GPU computing is lost. Leveraging runtime compilation, we illustrate an approach that generates source code to replace a list of library API calls with a single kernel […]
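To make the idea concrete, here is a minimal sketch of the code-generation step only. It is not the poster's implementation: the function `fuse_elementwise`, the kernel name `fused`, and the restriction to elementwise `float` operations written as C expressions over a variable `v` are all assumptions made for illustration. It shows how a list of small operations, each of which would otherwise be a separate launch, can be turned into the source text of one kernel.

```python
def fuse_elementwise(ops, kernel_name="fused"):
    """Return CUDA C source for ONE kernel that applies `ops` in order.

    Hypothetical helper: each op is a C expression in terms of `v`,
    the running value of the current element (e.g. "v * 2.0f").
    """
    # One assignment per operation, emitted in the order given.
    body = "\n".join(f"        v = {expr};" for expr in ops)
    return (
        f'extern "C" __global__ void {kernel_name}(float* x, int n)\n'
        "{\n"
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < n) {\n"
        "        float v = x[i];\n"
        f"{body}\n"
        "        x[i] = v;\n"
        "    }\n"
        "}\n"
    )


# Three operations that would otherwise be three separate kernel launches.
source = fuse_elementwise(["v * 2.0f", "v + 1.0f", "sqrtf(v)"])
print(source)
```

The resulting string is what a runtime compiler would receive; with CUDA, for example, it could be passed to NVRTC (`nvrtcCreateProgram`, `nvrtcCompileProgram`) and launched once, so scheduling and synchronization costs are paid a single time rather than once per operation.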