Image acquisition and graphics acceleration on a GPU

In the past, JPEG image compression was performed on either host PCs or
digital signal processors (DSPs). Today, with the advent of graphics
processors such as the TITAN and GeForce series from NVIDIA (Santa Clara,
CA, USA; www.nvidia.com), which contain hundreds of processor cores, image
compression can be performed much faster. Using the company's Compute
Unified Device Architecture (CUDA), developers can use an application
programming interface (API) to build image compression applications in
C/C++.

Because CUDA provides a software abstraction of the GPU's underlying
hardware, and because baseline JPEG compression can be partially
parallelized, the compression process can be split into threads that act as
individual programs, working in the same memory space and executing
concurrently.
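
Baseline JPEG operates on independent 8 x 8 pixel blocks, which is what
makes this mapping natural. The following is a minimal sketch of how those
blocks map onto CUDA threads; it shows only the level shift that precedes
the discrete cosine transform (DCT), and the kernel name, buffer names, and
image size are assumptions for this example, not part of any particular
library.

// Sketch: one CUDA thread block per 8x8 JPEG block, one thread per pixel.
// Only the level shift (subtract 128) that precedes the DCT is shown;
// all names and sizes here are illustrative.
__global__ void levelShift8x8(const unsigned char *in, float *out,
                              int width, int height)
{
    int x = blockIdx.x * 8 + threadIdx.x;   // pixel column within the image
    int y = blockIdx.y * 8 + threadIdx.y;   // pixel row within the image
    if (x < width && y < height)
        out[y * width + x] = (float)in[y * width + x] - 128.0f;
}

// Example launch for a 1920x1080 image (d_in and d_out are device buffers):
//   dim3 threads(8, 8);
//   dim3 blocks((1920 + 7) / 8, (1080 + 7) / 8);
//   levelShift8x8<<<blocks, threads>>>(d_in, d_out, 1920, 1080);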

Before an image can be compressed, however, it must be transferred to the
host CPU's memory and then to the GPU memory. To capture image data,
standards such as GenICam from the European Machine Vision Association
(Barcelona, Spain; http://www.emva.org/) provide a generic interface for
GigE Vision, USB3 Vision, CoaXPress, Camera Link HS, Camera Link, and 1394
DCAM-based cameras, allowing such data to be acquired easily. When
available, a GenTL Consumer interface can be used to link to the camera
manufacturer's GenTL Producer.

On Microsoft Windows, the GenTL Producer is provided as a Dynamic-Link
Library (DLL) with a .cti file extension. If there is no GenTL Producer, it
is usually necessary to use the native camera or frame grabber
manufacturer's application programming interface (API) to retrieve camera
data.
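
As a rough illustration, a GenTL Consumer on Windows loads the Producer's
.cti module like any other DLL and resolves the standard C entry points
defined by the GenICam GenTL specification. The .cti path below is
hypothetical and error handling is abbreviated.

/* Sketch: load a GenTL Producer (.cti) on Windows and open the transport
   layer. GCInitLib and TLOpen are entry points defined by the GenTL
   specification; the .cti path is a placeholder for a real vendor module. */
#include <windows.h>
#include <stdint.h>
#include <stdio.h>

typedef int32_t GC_ERROR;                        /* 0 == GC_ERR_SUCCESS */
typedef void   *TL_HANDLE;
typedef GC_ERROR (__stdcall *PGCInitLib)(void);
typedef GC_ERROR (__stdcall *PTLOpen)(TL_HANDLE *phTL);

int main(void)
{
    HMODULE cti = LoadLibraryA("C:\\Vendor\\Producer.cti"); /* hypothetical */
    if (!cti) { fprintf(stderr, "could not load .cti module\n"); return 1; }

    PGCInitLib pGCInitLib = (PGCInitLib)GetProcAddress(cti, "GCInitLib");
    PTLOpen    pTLOpen    = (PTLOpen)GetProcAddress(cti, "TLOpen");

    TL_HANDLE hTL = NULL;
    if (pGCInitLib && pTLOpen && pGCInitLib() == 0 && pTLOpen(&hTL) == 0)
        printf("GenTL Producer opened\n");

    FreeLibrary(cti);
    return 0;
}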
Once captured, image data can be transferred to the GPU card's
multi-gigabyte memory using the cudaMemcpy function. An example of how this
can be achieved is given at
https://www.microway.com/hpc-tech-tips/cuda-host-to-device-transfers-and-data-movement/.
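
In outline, such a transfer looks as follows; the frame size and buffer
names are assumptions for this sketch.

// Sketch: copy one captured 8-bit grayscale frame from pageable host
// memory to device memory with cudaMemcpy. Sizes and names are
// illustrative only.
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void)
{
    const size_t nbytes = 1920 * 1080;        // one 8-bit grayscale frame

    unsigned char *h_img = (unsigned char *)malloc(nbytes); // pageable host buffer
    unsigned char *d_img = NULL;
    cudaMalloc((void **)&d_img, nbytes);      // buffer in GPU memory

    // ... fill h_img with captured frame data ...

    cudaMemcpy(d_img, h_img, nbytes, cudaMemcpyHostToDevice); // synchronous copy

    cudaFree(d_img);
    free(h_img);
    return 0;
}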

Using the cudaMemcpy function with pageable memory, however, is not the
fastest method of transferring image data to the GPU. Higher bandwidth
between CPU memory and GPU memory can be achieved by using page-locked (or
"pinned") memory. The GPU cannot access data directly from pageable host
memory, because it has no control over when the host operating system may
choose to move that data. As a result, when a transfer from pageable host
memory to device memory is invoked, the CUDA driver must first allocate a
temporary page-locked, or "pinned", host array, copy the host data to the
pinned array, and then transfer the data from the pinned array to device
memory. This extra copy can be avoided by allocating pinned host memory
directly in CUDA C/C++ (see
https://devblogs.nvidia.com/parallelforall/how-optimize-data-transfers-cuda-cc/).
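
A pinned-memory variant of the previous sketch, again with illustrative
names and sizes, uses cudaMallocHost so that cudaMemcpy can read directly
from page-locked host memory without the driver's intermediate staging copy.

// Sketch: allocate pinned (page-locked) host memory with cudaMallocHost
// so the host-to-device transfer avoids the driver's staging copy.
#include <cuda_runtime.h>

int main(void)
{
    const size_t nbytes = 1920 * 1080;        // one 8-bit grayscale frame

    unsigned char *h_pinned = NULL;
    cudaMallocHost((void **)&h_pinned, nbytes); // pinned host buffer
    unsigned char *d_img = NULL;
    cudaMalloc((void **)&d_img, nbytes);

    // ... fill h_pinned with captured frame data ...

    cudaMemcpy(d_img, h_pinned, nbytes, cudaMemcpyHostToDevice);

    cudaFree(d_img);
    cudaFreeHost(h_pinned);                   // pinned memory needs cudaFreeHost
    return 0;
}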

There is also a third approach to overcoming the data transfer bottleneck
between host and GPU memory. Allowing a frame grabber and the GPU to share
the same system memory eliminates the CPU-to-GPU memory copy time, and can
be achieved using NVIDIA's Direct-for-Video (DVP) technology. However,
because NVIDIA DVP is not currently available for general use, the frame
grabber manufacturer BitFlow (Woburn, MA, USA; http://www.bitflow.com/) has
published BFDVP (BitFlow Direct-for-Video Protocol;
http://www.bitflow.com/technology/support-for-gpu-direct-for-video/), a
wrapper of NVIDIA DVP designed to enable integration of its frame grabber
API with NVIDIA GPUs.
