Color space conversion on a GPU

Color data can be reduced by transforming the RGB images into the YUV color
space (or, more accurately, the Y'CbCr color space, where Y' represents
luminance and the Cb and Cr components represent chroma, or color-difference,
values).
The Cb and Cr values can be derived with a simple formula by subtracting the
Y' value from the B and R components of the original image, respectively.
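One common form of this formula uses the full-range ITU-R BT.601 coefficients (the variant used by JPEG/JFIF; the source does not say which variant it assumes), sketched here in C++:

```cpp
#include <cstdint>
#include <algorithm>

// Full-range ITU-R BT.601 conversion. Y' is a weighted sum of R, G and B;
// Cb and Cr are scaled versions of (B - Y') and (R - Y'), offset by 128
// so they fit the 0..255 range.
struct YCbCr { uint8_t y, cb, cr; };

static uint8_t clamp8(double v) {
    return static_cast<uint8_t>(std::min(255.0, std::max(0.0, v + 0.5)));
}

YCbCr rgbToYCbCr(uint8_t r, uint8_t g, uint8_t b) {
    double y  = 0.299 * r + 0.587 * g + 0.114 * b;
    double cb = (b - y) * 0.564 + 128.0;  // 0.564 = 0.5 / (1 - 0.114)
    double cr = (r - y) * 0.713 + 128.0;  // 0.713 = 0.5 / (1 - 0.299)
    return { clamp8(y), clamp8(cb), clamp8(cr) };
}
```

Note that a neutral gray (R = G = B) maps to Cb = Cr = 128, i.e. zero color difference.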

YUV supports different conversion and subsampling modes, known as YUV (4:4:4),
YUV (4:2:2), YUV (4:2:0) and YUV (4:1:1).
Unfortunately, the notation used to describe these modes is not intuitively
obvious, since the subsampling patterns are not described directly in terms of
how pixels are subsampled in the horizontal and vertical directions.

Generally, the YUV (4:4:4) format, which samples each YUV component equally,
is not used in lossy JPEG image compression schemes since the chrominance
difference channels (Cr and Cb) can be sampled at half the sample rate of
the luminance without any noticeable image degradation.
In YUV (4:2:2), Cb and Cr are sampled horizontally at half the rate of the
luminance, while in YUV (4:2:0), Cb and Cr are subsampled by a factor of 2
in both the horizontal and vertical directions.

Perhaps the most commonly used mode for JPEG image compression, YUV (4:2:2),
can thus reduce the amount of data in the image by one-third relative to the
original RGB image before any image compression occurs, while introducing
little visible difference from the original image. Source code to perform
this conversion in C++ can be found here:
https://www.codeproject.com/Articles/402391/RGB-to-YUV-conversion-with-different-chroma-sampli
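The one-third figure follows from counting samples per pixel, which can be checked with a little arithmetic:

```cpp
#include <cstddef>

// Sample counts for a width x height image (width and height assumed even).
// RGB and YUV (4:4:4) store 3 samples per pixel; 4:2:2 halves the two chroma
// planes horizontally; 4:2:0 halves them in both directions.
size_t samples444(size_t w, size_t h) { return w * h * 3; }
size_t samples422(size_t w, size_t h) { return w * h + 2 * (w / 2) * h; }
size_t samples420(size_t w, size_t h) { return w * h + 2 * (w / 2) * (h / 2); }
```

For a 640 x 480 image, 4:2:2 needs 614,400 samples versus 921,600 for RGB, i.e. exactly two-thirds of the data; 4:2:0 halves it.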

To perform this color space conversion (and the other functions associated
with baseline JPEG compression) on a GPU, some knowledge of the architecture
of the graphics processor is required.
Suffice it to say that threads (the smallest sequences of programmed
instructions that can be managed independently by the scheduler) are
organized into blocks.
These blocks are then executed by a multiprocessing unit (MPU).
To perform high-speed calculations, data must be organized efficiently in
memory so that each block of threads can operate autonomously.
Thus, fetching data between blocks will slow performance, as will the use of
IF branches, since threads that take different branches can no longer execute
in lockstep.
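As a rough illustration of this block organization, assuming a simple one-thread-per-pixel mapping (an illustrative assumption, not necessarily the layout the article's implementation uses), the number of blocks the scheduler must dispatch is a ceiling division:

```cpp
#include <cstddef>

// Launch-geometry sketch: given a pixel count and a fixed number of threads
// per block, compute how many blocks are needed, rounding up so that every
// pixel is covered by some thread.
size_t blocksNeeded(size_t pixels, size_t threadsPerBlock) {
    return (pixels + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
}
```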

Bayer interpolation, color balancing and color space conversion can all be
performed on the GPU.
To perform these tasks, the image in the GPU is split into a number of
blocks, during which the pixel data are unpacked from 8-bit to 32-bit
integer format (the native CUDA data type).
For each block, pixels are organized into 32 banks of 8 pixels, as this fits
the shared memory architecture of the GPU.
This allows 4 pixels to be fetched simultaneously by the GPU's processor
cores (stream processors), with each thread reading 4 pixels and processing
one pixel at a time.
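The unpacking step can be sketched on the host side: each 32-bit word fetched from memory carries four packed 8-bit pixels, which are widened to 32-bit integers for processing (little-endian byte order is assumed here):

```cpp
#include <cstdint>
#include <array>

// Unpack one 32-bit word holding four packed 8-bit pixels into four
// separate 32-bit integers, mirroring the 8-bit to 32-bit widening
// described in the text. Assumes little-endian packing.
std::array<uint32_t, 4> unpackPixels(uint32_t word) {
    return {
        word & 0xFFu,
        (word >> 8) & 0xFFu,
        (word >> 16) & 0xFFu,
        (word >> 24) & 0xFFu,
    };
}
```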

In such implementations, it can be assumed that the color balance required
will be consistent over time especially if lighting conditions are
controlled.
By capturing a template image into host memory, a look-up table can be
created and then uploaded to GPU memory; it applies a scaling factor to each
pixel so that the resultant image contains an equal amount of each color.
After this task is performed by the GPU, each pixel is converted to a 32-bit
floating-point value and then converted to the YUV color space.
Blocks are then saved to the GPU global memory.
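One way such a look-up table could be built (a sketch under the assumption that the per-channel mean of the template image is scaled to a common target; the article does not specify the exact balancing rule):

```cpp
#include <cstdint>
#include <array>
#include <algorithm>

// Hypothetical white-balance LUT: derive a scale factor that maps a
// channel's measured mean (from the template image) onto a common target
// mean, then precompute the scaled result for all 256 possible 8-bit
// inputs. On the GPU, balancing then costs one table lookup per sample.
std::array<uint8_t, 256> buildBalanceLut(double channelMean, double targetMean) {
    std::array<uint8_t, 256> lut{};
    double scale = targetMean / channelMean;
    for (int v = 0; v < 256; ++v) {
        lut[v] = static_cast<uint8_t>(std::min(255.0, v * scale + 0.5));
    }
    return lut;
}
```

Three such tables (one per channel) would be uploaded to GPU memory, with the template recaptured only if lighting conditions change.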
