OpenCL to CUDA Translator Guide

The document provides instructions for installing and using SnuCL-Tr, an OpenCL to CUDA translator. It describes downloading and compiling the required software components. The translator works by taking OpenCL device code and converting it to CUDA device code. It implements OpenCL host APIs as wrapper functions around CUDA driver APIs. The document includes a sample Makefile template and shows the original and translated kernel code for a sample application.

Uploaded by

xieliwei

SnuCL-Tr: OpenCL to CUDA

Quick Start Guide

Requirements
The OpenCL to CUDA translator in SnuCL-Tr requires that you have the latest CUDA Toolkit installed. You
can download the CUDA Toolkit from [Link]

How to Install
Perform the following steps to install the OpenCL to CUDA translator in SnuCL-Tr and verify the installation.

1. Untar Source Code

Download the SnuCL-Tr source code from [Link] and untar it in your preferred location.

$ tar xvzf [Link]

2. LLVM

Configure the LLVM compiler first, and then build it. Compiling LLVM may take a long time.

$ cd opencl2cuda/build
$ ../[Link]/configure
$ make BUILD_EXAMPLES=1

Note: If Clang is already installed on your system, configure and build the LLVM compiler
manually with the following flags so that GCC is used instead.

CC=gcc CXX=g++

Example:

$ ../[Link]/configure CC=gcc CXX=g++

$ make CC=gcc CXX=g++ BUILD_EXAMPLES=1

3. The Runtime Library

Build the runtime library (i.e., wrapper functions) at the location below.

$ cd opencl2cuda/common/common/
$ make

As a result, the static library will be created in the following location:

• opencl2cuda/common/lib/libsnuclOC.a
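To confirm that this step succeeded before moving on, a quick check along the following lines may help. The function name `runtime_lib_built` is hypothetical and not part of SnuCL-Tr; the default path is the location named above, relative to the directory that contains opencl2cuda.

```shell
# Hypothetical helper (not part of SnuCL-Tr): succeed only if the runtime
# library archive exists and is non-empty.
runtime_lib_built() {
    lib="${1:-opencl2cuda/common/lib/libsnuclOC.a}"
    [ -s "$lib" ]
}

if runtime_lib_built; then
    echo "runtime library found"
else
    echo "runtime library missing; rerun make in opencl2cuda/common/common/"
fi
```

Pass a different path as the first argument if you run the check from another directory.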

4. Set Environment Variables

Two environment variables must be set to use the OpenCL to CUDA translator: OPENCL_TO_CUDA (the
path to the opencl2cuda directory) and OPENCL_TO_CUDA_GPU_ARCH (your GPU’s compute capability). You
can check the GPU’s compute capability at [Link]

Open your .bashrc for editing.

$ vi $HOME/.bashrc

At the bottom of the file, insert the two lines shown below. Adjust the first path if you did not
untar the source code in your home directory.

export OPENCL_TO_CUDA=$HOME/opencl2cuda
export OPENCL_TO_CUDA_GPU_ARCH=compute_xx

For example, if your GPU’s compute capability is 3.0, then modify the
OPENCL_TO_CUDA_GPU_ARCH variable as shown below.

export OPENCL_TO_CUDA_GPU_ARCH=compute_30
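
The mapping from a compute capability to the variable's value can be sketched as a small shell helper. The function name `arch_from_capability` is hypothetical and not part of SnuCL-Tr; it only follows the pattern of the example above, dropping the dot from a version such as 3.0 and adding the `compute_` prefix.

```shell
# Hypothetical helper (not part of SnuCL-Tr): turn a compute capability
# such as "3.0" into the compute_xx form used by OPENCL_TO_CUDA_GPU_ARCH.
arch_from_capability() {
    capability="$1"
    case "$capability" in
        # Reject empty input, stray characters, and misplaced or repeated dots.
        "" | *[!0-9.]* | *.*.* | .* | *.)
            echo "expected MAJOR.MINOR, got '$capability'" >&2
            return 1 ;;
        *.*) ;;   # exactly one dot between digits, e.g. 3.0
        *)
            echo "expected MAJOR.MINOR, got '$capability'" >&2
            return 1 ;;
    esac
    # Join the part before the dot and the part after it: 3.0 -> 30.
    echo "compute_${capability%.*}${capability#*.}"
}

arch_from_capability 3.0   # prints compute_30
```

The printed value is what you would place after `OPENCL_TO_CUDA_GPU_ARCH=` in your .bashrc.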

To apply the modified environment variables, go to your home directory and execute the following
command.

$ source .bashrc
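
After sourcing the file, you may want to confirm that both variables are visible in the current shell. The following is a minimal sketch; the function name `check_translator_env` is hypothetical and not part of SnuCL-Tr, and the `compute_` pattern it checks is taken from the example above.

```shell
# Hypothetical check (not part of SnuCL-Tr): confirm that both variables
# from this step are set and look plausible.
check_translator_env() {
    rc=0
    if [ -z "${OPENCL_TO_CUDA:-}" ]; then
        echo "OPENCL_TO_CUDA is not set" >&2
        rc=1
    elif [ ! -d "$OPENCL_TO_CUDA" ]; then
        echo "OPENCL_TO_CUDA is not a directory: $OPENCL_TO_CUDA" >&2
        rc=1
    fi
    case "${OPENCL_TO_CUDA_GPU_ARCH:-}" in
        compute_[0-9]*) ;;   # e.g. compute_30
        *)
            echo "OPENCL_TO_CUDA_GPU_ARCH should look like compute_30" >&2
            rc=1 ;;
    esac
    return "$rc"
}

if check_translator_env; then
    echo "translator environment looks OK"
fi
```

If the check reports a problem, reopen .bashrc, correct the two export lines, and source the file again.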

Understanding the OpenCL to CUDA Translator

In OpenCL, the host code and the device code are separate, so we translate them separately. The
OpenCL device code (e.g., [Link]) is translated to CUDA device code (e.g., [Link]) by our
source-to-source translator. The OpenCL host API functions are implemented as wrapper functions
built on the CUDA driver API.

How to Build a Program Using the OpenCL to CUDA Translator

• Makefile Template

The sample directory provides a Makefile template for the OpenCL to CUDA translator, together with
a sample application. You can write your Makefile as you see fit, but it must satisfy four
requirements to use the translator.

▪ Use the g++ compiler

CC = g++

▪ Add the CUDA and runtime library paths to the library search path

-L$(CUDA_INSTALLED_PATH)/lib64 -L$(OPENCL_TO_CUDA)/common/lib

▪ Link the CUDA and runtime libraries

-lcuda -lcudart -lsnuclOC -lpthread

▪ Add the CUDA header file path

-I$(CUDA_INSTALLED_PATH)/include
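
Taken together, the four requirements correspond to a compile-and-link command like the one sketched below. The function name `build_command`, the source file name `host.cpp`, the output name `app`, and the `/usr/local/cuda` default for `CUDA_INSTALLED_PATH` are assumptions for illustration only; the Makefile template in the sample directory remains the reference.

```shell
# Sketch only: assemble the g++ command implied by the four rules above.
# Arguments: source file, output name (both illustrative).
build_command() {
    if [ -z "${OPENCL_TO_CUDA:-}" ]; then
        echo "OPENCL_TO_CUDA is not set" >&2
        return 1
    fi
    # Assumed default; set CUDA_INSTALLED_PATH to your actual CUDA location.
    cuda="${CUDA_INSTALLED_PATH:-/usr/local/cuda}"
    printf 'g++ -I%s/include %s -o %s -L%s/lib64 -L%s/common/lib -lcuda -lcudart -lsnuclOC -lpthread\n' \
        "$cuda" "$1" "$2" "$cuda" "$OPENCL_TO_CUDA"
}

# Print the command for inspection rather than running it.
if cmd="$(build_command host.cpp app)"; then
    echo "$cmd"
fi
```

Printing the command first makes it easy to compare the flags against your own Makefile. Paths containing spaces would need quoting before the printed line is executed.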

Translation Result for the Sample Application

When you build your program with the OpenCL to CUDA translator, the translated source files are named
“*.cu”. For example, building the sample application generates “__temp_kernel.cu”. You can open it in
a text editor to see how the kernel was translated. The listings below show the original device code
and the translated device code of the sample application.

➢ [Link] (original kernel code)

__kernel void vecAdd(__global int* A, __global int* B, __global int* C, const int N) {
    int i = get_global_id(0);   /* one work-item per element */
    C[i] = A[i] + B[i];
}

➢ __temp_kernel.cu (translated kernel code)

__constant__ char __snucl_const_mem[16384];
extern __shared__ char __snucl_shared_mem[];
__device__ unsigned int __snucl_group_id_offs[2] = {0, 0};

__device__ int get_global_id(int index) {
    switch (index) {
    case 0: return (blockIdx.x + __snucl_group_id_offs[0]) * blockDim.x + threadIdx.x;
    case 1: return blockIdx.y * blockDim.y + threadIdx.y;
    case 2: return blockIdx.z * blockDim.z + threadIdx.z;
    default: return -1;
    }
}

extern "C" __global__ void vecAdd(int* A, int* B, int* C, const int N) {
    int i = get_global_id(0);
    C[i] = A[i] + B[i];
}
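
The index arithmetic in case 0 of the translated get_global_id can be checked by hand. The sketch below reproduces it with shell integer arithmetic; the function name `global_id_x` and the sample values are assumptions chosen for illustration, not values taken from SnuCL-Tr.

```shell
# Mirrors case 0 of get_global_id above:
#   (blockIdx.x + __snucl_group_id_offs[0]) * blockDim.x + threadIdx.x
# Arguments: block index, group-id offset, block size, thread index.
global_id_x() {
    echo $(( ($1 + $2) * $3 + $4 ))
}

global_id_x 2 0 256 5   # prints 517
global_id_x 2 4 256 5   # prints 1541
```

With a block size of 256, thread 5 of block 2 handles element 517 when the offset is 0. A nonzero offset shifts the block index before it is scaled by the block size.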
