Important: OpenCL was deprecated in macOS 10.14. To create high-performance code on GPUs, use the Metal framework instead. See Metal.
Support built into Xcode in OS X v10.7 and later makes developing OpenCL applications much easier than it used to be. This chapter describes how to create an OpenCL project in Xcode. (You don’t have to regenerate OpenCL projects that are already working.)
Creating An Application That Uses OpenCL In Xcode
Note: The OpenCL C++ wrapper header (cl.hpp) isn't included on OS X systems by default, so you'll have to download it directly from the Khronos OpenCL Registry (make sure you get the version listed under OpenCL 1.2). You then just need to include this header from your code.
OpenCL™ (Open Computing Language) is a low-level API for heterogeneous computing that runs on both CPUs and GPUs. Using the OpenCL API, developers can launch compute kernels written using a limited subset of the C programming language.
To create a project that uses OpenCL in OS X v10.7 or later:
- Create your OpenCL project in Xcode as a new OS X project (empty is fine).
- Place your kernel code in one or more .cl files in your Xcode project. You can place all your kernels into a single .cl file, or you can separate them as you choose. You can also include non-kernel code that will run on the same OpenCL device as the kernel in each .cl file. Each .cl file is compiled by default into three files containing bitcode for the i386, x86_64, and gpu_32 architectures. (You can change which bitcodes are generated using the OpenCL Architectures build setting.) At runtime your host application discovers the available devices and determines which of the compiled kernels to enqueue and execute. Figure 1-1 shows a very simple OpenCL project in Xcode.
- You can set the following build settings for your OpenCL apps:
- OpenCL—Build
- OpenCL Architectures. The default is that the product is built for all three architectures. The dropdown allows you to choose up to three of `-triple i386-applecl-darwin`, `-triple x86_64-applecl-darwin`, and `-triple gpu_32-applecl-darwin`.
- OpenCL Compiler Version. The default is OpenCL C 1.1.
- OpenCL—Code Generation
- Auto-vectorizer. Choose `Yes` to turn the autovectorizer on or `No` to turn it off. This setting takes effect only for the CPU.
- Double as single. If you set this parameter to `Yes`, the compiler treats double-precision floating-point expressions as single-precision floating-point expressions. This option is available for GPUs only. The default is `No`.
- Flush denorms to zero. This Boolean controls how single- and double-precision denormalized numbers are handled. If you set this parameter to `Yes`, the compiler may flush single-precision denormalized numbers to zero; it may also flush double-precision denormalized numbers to zero if the optional double-precision extension is supported. This is intended as a performance hint, and the OpenCL compiler can choose not to flush denorms to zero if the device supports single-precision (or double-precision) denormalized numbers (that is, if the `CL_FP_DENORM` bit is not set in `CL_DEVICE_SINGLE_FP_CONFIG`). This flag applies only to scalar and vector single-precision floating-point variables and computations on those variables inside a program. It does not apply to reading from or writing to image objects. The default is `No`.
- Optimization Level. You can choose among several levels of optimization, from fastest performance to smallest code size. The default is to optimize for fastest performance.
- Relax IEEE Compliance. If you set this parameter to `Yes`, the compiler allows optimizations for floating-point arithmetic that may violate the IEEE 754 standard and the OpenCL numerical compliance requirements defined in section 7.4 for single-precision floating-point, section 9.3.9 for double-precision floating-point, and edge case behavior as defined in section 7.5 of the OpenCL 1.1 specification. This is intended to be a performance optimization. This option causes the preprocessor macro `__FAST_RELAXED_MATH__` to be defined in the OpenCL program. The default is `No`.
- Use MAD. If you set this parameter to `Yes`, you allow expressions of the type `a * b + c` to be replaced by a Multiply-Add (MAD) instruction. If MAD is enabled, multistep instructions in the form `a * b + c` are performed in a single step, but the accuracy of the results may be compromised. For example, to optimize performance, some OpenCL devices implement MAD by truncating the result of the `a * b` operation before adding it to `c`. The default for this parameter is `No`.
- OpenCL—Preprocessing. You can enter a space-separated list of preprocessor macros of the form `foo` or `foo=bar` here if you wish.
- Place your host code in one or more .c files in your Xcode project. Compiling From the Command Line shows host code in an Xcode project. Note: When you first include your Xcode-generated header that contains the kernel declaration, your kernel will not have been compiled yet, so the `mykernel.cl.h` file will be flagged as missing. The `mykernel.cl.h` file is generated by Xcode when you build the application.
- Link to the OpenCL framework in your project. See Adding an Existing Framework to a Project.
- Build. Because you are compiling your host and your kernel code before you run them, you can see compiler errors and warnings before you run your application.
- Run. Results are shown in the Xcode output pane, as shown in Figure 1-3.
See Basic Programming Sample for a line-by-line description of the host and kernel code in the Hello World sample project.
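For orientation, a .cl file added in step 2 might contain a kernel like the following sketch (the kernel name and arguments here are illustrative, not taken from the Hello World sample):

```opencl
// mykernel.cl -- illustrative kernel source. Xcode compiles this file into
// bitcode for each selected architecture and generates a mykernel.cl.h
// header declaring the kernel for the host code to include.
kernel void square(global const float *input,
                   global float *output)
{
    // Each work item handles the one element selected by its global ID.
    size_t i = get_global_id(0);
    output[i] = input[i] * input[i];
}
```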
Debugging
Here are a few hints to help you debug your OpenCL application:
- Run your kernel on the CPU first. There is no memory protection on some GPUs. If an index goes out of bounds on the GPU, it is likely to take the whole system down. If an index goes out of bounds on the CPU, it may crash the program that’s running, but it will not take the whole system down.
- You can use the `printf` function from within your kernel.
- You can use the gdb debugger to look at the assembly code once you’ve built your program. See the GDB website.
- On the GPU, use explicit address range checks to look for out-of-range address accesses. (Remember: there is no memory protection on some GPUs.)
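The last hint amounts to a guard at the top of the kernel. A sketch (the kernel name, buffer, and count parameter are illustrative):

```opencl
kernel void scale(global float *data,
                  const unsigned int count,  // number of valid elements
                  const float factor)
{
    size_t i = get_global_id(0);
    // Explicit range check: if the global size was rounded up past the
    // buffer length, out-of-range work items return instead of writing
    // outside the buffer. (Remember: no memory protection on some GPUs.)
    if (i >= count)
        return;
    data[i] *= factor;
}
```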
Compiling From the Command Line
You can also compile and run your OpenCL application outside of Xcode.
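As a sketch, an invocation might look like the following; the file names are illustrative, and the clang-style `-c`/`-o` flags are an assumption, with only the `-triple` and `-cl-std` flags being the ones documented below:

```shell
# Compile mykernel.cl to 32-bit Intel bitcode with the default settings.
openclc -c -triple i386-applecl-darwin -cl-std=CL1.1 mykernel.cl -o mykernel.bc
```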
To compile from the command line, call `openclc`. You can set the following compile line parameters:
- OpenCL Compiler Version. The OpenCL C compiler version supported by the platform. To set this parameter from the command line, use `-cl-std=CL1.1`. The default is OpenCL C 1.1.
- OpenCL—Architectures. A `StringList` specifying the list of the architectures for which the product will be built. This is usually set to a predefined build setting provided by the platform. To set this parameter from the command line, list up to three of the following, separated by whitespace: `-triple i386-applecl-darwin`, `-triple x86_64-applecl-darwin`, and `-triple gpu_32-applecl-darwin`. For example, to compile for i386 and x86_64, your list would look like this: `-triple i386-applecl-darwin -triple x86_64-applecl-darwin`. The default is that the product is built for all three architectures.
- Auto-vectorizer. This switch enables or disables autovectorization of OpenCL kernels compiled for the CPU. To set this parameter from the command line, use `-cl-auto-vectorize-enable` to enable the autovectorizer or `-cl-auto-vectorize-disable` to disable it. The default is `-cl-auto-vectorize-enable`.
- Double as single. If enabled, double-precision floating-point expressions are treated as single-precision floating-point expressions. This option is available for GPUs only. To enable this parameter from the command line, use `-cl-double-as-single`. By default, this parameter is disabled.
- Flush denorms to zero. This Boolean controls how single- and double-precision denormalized numbers are handled. If specified as a build option, single-precision denormalized numbers may be flushed to zero; double-precision denormalized numbers may also be flushed to zero if the optional double-precision extension is supported. This is intended as a performance hint, and the OpenCL compiler can choose not to flush denorms to zero if the device supports single-precision (or double-precision) denormalized numbers (that is, if the `CL_FP_DENORM` bit is not set in `CL_DEVICE_SINGLE_FP_CONFIG`). This flag applies only to scalar and vector single-precision floating-point variables and computations on those variables inside a program. It does not apply to reading from or writing to image objects. To enable this parameter from the command line, use `-cl-denorms-are-zero`. By default, this parameter is disabled.
- Optimization Level. Specifies whether to optimize for fastest performance or smallest code size. To set this parameter from the command line, use one of: `-O0` (do not optimize), `-O1` (optimize for fast performance), `-O2` (optimize for faster performance), `-O3` (optimize for fastest performance), or `-Os` (optimize for smallest code size). The default is fast (`-O1`) optimization.
- Relax IEEE Compliance. If enabled, allows optimizations for floating-point arithmetic that may violate the IEEE 754 standard and the OpenCL numerical compliance requirements defined in section 7.4 for single-precision floating-point, section 9.3.9 for double-precision floating-point, and edge case behavior as defined in section 7.5 of the OpenCL 1.1 specification. This is intended to be a performance optimization. This option causes the preprocessor macro `__FAST_RELAXED_MATH__` to be defined in the OpenCL program. To enable this parameter from the command line, use `-cl-fast-relaxed-math`. By default, this parameter is disabled.
- Use MAD. If enabled, allows expressions of the type `a * b + c` to be replaced by a Multiply-Add (MAD) instruction. If MAD is enabled, multistep instructions in the form `a * b + c` are performed in a single step, but the accuracy of the results may be compromised. For example, to optimize performance, some OpenCL devices implement MAD by truncating the result of the `a * b` operation before adding it to `c`. To enable this parameter from the command line, use `-cl-mad-enable`. By default, this parameter is disabled.
- OpenCL—Preprocessing. A space-separated list of preprocessor macros of the form `foo` or `foo=bar`. To specify preprocessor macros from the command line, prefix the string of macros with `-D`.
Copyright © 2018 Apple Inc. All Rights Reserved. Terms of Use | Privacy Policy | Updated: 2018-06-04
Tools provided on OS X let you include OpenCL kernels as resources in Xcode projects, compile them along with the rest of your application, invoke kernels by passing them parameters just as if they were typical functions, and use Grand Central Dispatch (GCD) as the queuing API for executing OpenCL commands and kernels on the CPU and GPU.
If you need to create OpenCL programs at runtime, with source loaded as a string or from a file, or if you want API-level control over queueing, see The OpenCL Specification, available from the Khronos Group at http://www.khronos.org/registry/cl/.
Concepts
In the OpenCL specification, computational processors are called devices. An OpenCL device has one or more compute units. A workgroup executes on a single compute unit. A compute unit is composed of one or more processing elements and local memory.
A Mac computer always has a single CPU. It may not have any GPUs or it may have several. The CPU on a Mac has multiple compute units, which is why it is called a multicore CPU. The number of compute units in a CPU limits the number of workgroups that can execute concurrently.
CPUs usually contain between two and eight compute units, sometimes more. A graphics processing unit (GPU) typically contains many compute units; GPUs in current Mac systems feature tens of compute units, and future GPUs may contain hundreds. To OpenCL, the number of compute units is irrelevant: OpenCL considers a CPU with eight compute units and a GPU with 100 compute units each to be a single device.
The OS X v10.7 implementation of the OpenCL API facilitates designing and coding data parallel programs to run on both CPU and GPU devices. In a data parallel program, the same program (or kernel) runs concurrently on different pieces of data and each invocation is called a work item and given a work item ID. The work item IDs are organized in up to three dimensions (called an N-D range).
A kernel is essentially a function written in the OpenCL language, which enables it to be compiled for execution on any device that supports OpenCL. However, a kernel differs from an ordinary function call: when you invoke a kernel, many instances of the kernel actually execute, each of which processes a different chunk of data.
The program that calls OpenCL functions to set up the context in which kernels run and enqueue the kernels for execution is known as the host application. The host application is run by OS X on the CPU. The device on which the host application executes is known as the host device. Before it runs the kernels, the host application typically:
- Determines what compute devices are available, if necessary.
- Selects compute devices appropriate for the application.
- Creates dispatch queues for selected compute devices.
- Allocates the memory objects needed by the kernels for execution. (This step may occur earlier in the process, as convenient.)
Note: The host device (the CPU) can itself be an OpenCL device. Both the host application and kernels may run on the same CPU.
The host application can enqueue commands to read from and write to memory objects that are also accessible by kernels. See Memory Objects in OS X OpenCL. Memory objects are used to manipulate device memory. There are two types of memory objects used in OpenCL: buffer objects and image objects. Buffer objects can contain any type of data; image objects contain data organized into pixels in a given format.
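For example, a host application using the C API might create a buffer object as in the following sketch; the function name, flags, and the assumption that a valid `cl_context` already exists are illustrative, and error handling is minimal (see the OpenCL specification for the full `clCreateBuffer` contract):

```c
#include <OpenCL/opencl.h> /* OpenCL framework header on OS X */

/* Creates a read-only buffer initialized by copying count floats from
 * host memory. Assumes context is a valid cl_context. */
cl_mem create_input_buffer(cl_context context, const float *host_data, size_t count)
{
    cl_int err = CL_SUCCESS;
    cl_mem buffer = clCreateBuffer(context,
                                   CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                   count * sizeof(float),
                                   (void *)host_data,
                                   &err);
    return (err == CL_SUCCESS) ? buffer : NULL;
}
```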
Although kernels are enqueued for execution by host applications written in C, C++, or Objective-C, a kernel must be compiled separately to be customized for the device on which it is going to run. You can write your OpenCL kernel source code in a separate file or include it inline in your host application source code.
OpenCL kernels can be:
- Compiled at compile time, then run when queued by the host application, or
- Compiled and then run at runtime when queued by the host application, or
- Run from a previously built binary.
A work item is a parallel execution of a kernel on some data. It is analogous to a thread. Each kernel is executed upon hundreds of thousands of work items.
A workgroup is a set of work items that execute concurrently and share data. Each workgroup is executed on a compute unit.
Workgroup dimensions determine how kernels operate upon input in parallel. The application usually specifies the dimensions based on the size of the input. There are constraints; for example, there may be a maximum number of work items that can be launched for a certain kernel on a certain device.
Essential Development Tasks
As of OS X v10.7, the OpenCL development process includes these major steps:
- Identify the tasks to be parallelized. Determining how to parallelize your program effectively is often the hardest part of developing an OpenCL program. See Identifying Parallelizable Routines.
- Write your kernel functions.
- See How the Kernel Interacts With Data in OS X OpenCL.
- The Basic Kernel Code Sample shows how you can store your kernel code in a file that can be compiled using Xcode.
- Write the host code that will call the kernel(s).
- See Using Grand Central Dispatch With OpenCL for information about how the host can use GCD to enqueue the kernel.
- See Memory Objects in OS X OpenCL for information about how the host passes parameters to and retrieves results from the kernel.
- See Sharing Data Between OpenCL and OpenGL for information about how the OpenCL host can share data with OpenGL applications.
- See Controlling OpenCL / OpenGL Interoperation With GCD for information about how the OpenCL host can synchronize processing with OpenGL applications using GCD.
- See Using IOSurfaces With OpenCL for information about how the OpenCL host can use IOSurfaces to exchange data with a kernel.
- The Basic Host Code Sample shows how you can store your host code in a file that can be compiled with Xcode.
- Compile using Xcode. See Hello World!.
- Execute.
- Debug (if necessary). See Debugging.
- Improve performance (if necessary):
- If your kernel(s) will be running on a CPU, see Autovectorizer and, for suggestions about additional optimizations, see Improving Performance On the CPU.
- If your kernel(s) will be running on a GPU, see Tuning Performance On the GPU.