Y:/Rsh/Matlab/cuda/cuda_hist.h File Reference

Contains definition of C functions for 1D and 2D histogram calculation implemented on the GPU.


Classes

struct  cudaHistOptions
 The structure defines execution options and is used by histogram functions.

Functions

double cudaHista (float *src, float *hist, int length, int bins, cudaHistOptions *p_options=NULL, bool device=false)
 Calculates a 1D histogram based on the first method described in:
R. Shams and R. A. Kennedy, "Efficient histogram algorithms for NVIDIA CUDA compatible devices," Proc. Int. Conf. on Signal Processing and Communications Systems (ICSPCS), Gold Coast, Australia, Dec. 2007, pp. 418-422.
double cudaHistb (float *src, float *hist, int length, int bins, cudaHistOptions *p_options=NULL, bool device=false)
 Calculates a 1D histogram based on the second method described in:
R. Shams and R. A. Kennedy, "Efficient histogram algorithms for NVIDIA CUDA compatible devices," Proc. Int. Conf. on Signal Processing and Communications Systems (ICSPCS), Gold Coast, Australia, Dec. 2007, pp. 418-422.
double cudaHist_Approx (float *src, float *hist, int length, int bins, cudaHistOptions *p_options=NULL, bool device=false)
 Calculates a 1D histogram based on the method described in:
R. Shams and N. Barnes, "Speeding up mutual information computation using NVIDIA CUDA hardware," Proc. Digital Image Computing: Techniques and Applications (DICTA), Adelaide, Australia, Dec. 2007, pp. 555-560.
void cudaHist2Da (float *src1, float *src2, float *hist, int length, int xbins, int ybins, cudaHistOptions *p_options=NULL, bool device=false)
 Calculates a 2D (joint) histogram based on the first method described in:
R. Shams and R. A. Kennedy, "Efficient histogram algorithms for NVIDIA CUDA compatible devices," Proc. Int. Conf. on Signal Processing and Communications Systems (ICSPCS), Gold Coast, Australia, Dec. 2007, pp. 418-422.
void cudaHist2Db (float *src1, float *src2, float *hist, int length, int xbins, int ybins, cudaHistOptions *p_options=NULL, bool device=false)
 Calculates a 2D (joint) histogram based on the second method described in:
R. Shams and R. A. Kennedy, "Efficient histogram algorithms for NVIDIA CUDA compatible devices," Proc. Int. Conf. on Signal Processing and Communications Systems (ICSPCS), Gold Coast, Australia, Dec. 2007, pp. 418-422.
void cudaHist2D_Approx (float *src1, float *src2, float *hist, int length, int xbins, int ybins, cudaHistOptions *p_options=NULL, bool device=false)
 Calculates a 2D (joint) histogram based on the second method described in:
R. Shams and N. Barnes, "Speeding up mutual information computation using NVIDIA CUDA hardware," Proc. Digital Image Computing: Techniques and Applications (DICTA), Adelaide, Australia, Dec. 2007, pp. 555-560.

Detailed Description

cuda_hist.h

Contains definition of C functions for 1D and 2D histogram calculation implemented on the GPU. The methods are based on the following two publications:
R. Shams and R. A. Kennedy, "Efficient histogram algorithms for NVIDIA CUDA compatible devices," Proc. Int. Conf. on Signal Processing and Communications Systems (ICSPCS), Gold Coast, Australia, Dec. 2007, pp. 418-422.
R. Shams and N. Barnes, "Speeding up mutual information computation using NVIDIA CUDA hardware," Proc. Digital Image Computing: Techniques and Applications (DICTA), Adelaide, Australia, Dec. 2007, pp. 555-560.


Function Documentation

void cudaHist2D_Approx ( float *  src1,
float *  src2,
float *  hist,
int  length,
int  xbins,
int  ybins,
cudaHistOptions *  p_options = NULL,
bool  device = false 
)

Calculates a 2D (joint) histogram based on the second method described in:
R. Shams and N. Barnes, "Speeding up mutual information computation using NVIDIA CUDA hardware," Proc. Digital Image Computing: Techniques and Applications (DICTA), Adelaide, Australia, Dec. 2007, pp. 555-560.

Parameters:
src1 Pointer to the input array where the data is stored.
src2 Pointer to the input array where the data is stored.
hist Pointer to the output array where the computed histogram is to be stored. Output array must be allocated and freed by the caller.
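length Number of the input array's elements. Both input arrays must have the same length.
xbins Number of the histogram bins along the x-axis of the 2D histogram.
ybins Number of the histogram bins along the y-axis of the 2D histogram.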
p_options A structure which defines the execution configuration.
device A flag which indicates whether input/output arrays are allocated on the host (CPU) memory or the device (GPU) memory.
Remarks:
Integrating the 2D histogram along the x-axis gives the 1D histogram of src1 and along the y-axis gives the 1D histogram of src2.

When the function is being called inside a loop or multiple times, for best performance allocate the input and output arrays on device memory to avoid unnecessary allocation, memory transfers and deallocation by the function.

When device flag is set, the caller needs to allocate memory for hist but does not need to initialize the array. The initialization will be done by the function itself.

Input data must be normalized between 0 and 1. The behavior of the function for values outside this range is undefined and is most likely to cause memory corruption.

Returns:
The execution time in milliseconds, excluding any time spent transferring input data from host memory to device global memory and copying the results back to host memory. The time spent creating and initializing any internal objects is included.
See also:
cudaHist_Approx
As the name suggests, this function calculates an approximation to the histogram. The method approximates the ratios between the histogram bins rather than the actual number of elements in each bin. As such, the result is an approximation of the probability mass function (pmf) of the input data. If you need the exact histogram, use cudaHist2Da or cudaHist2Db instead. Refer to the paper for more information.

void cudaHist2Da ( float *  src1,
float *  src2,
float *  hist,
int  length,
int  xbins,
int  ybins,
cudaHistOptions *  p_options = NULL,
bool  device = false 
)

Calculates a 2D (joint) histogram based on the first method described in:
R. Shams and R. A. Kennedy, "Efficient histogram algorithms for NVIDIA CUDA compatible devices," Proc. Int. Conf. on Signal Processing and Communications Systems (ICSPCS), Gold Coast, Australia, Dec. 2007, pp. 418-422.

Parameters:
src1 Pointer to the input array where the data is stored.
src2 Pointer to the input array where the data is stored.
hist Pointer to the output array where the computed histogram is to be stored. Output array must be allocated and freed by the caller.
length Number of the input array's elements. Both input arrays must have the same length.
xbins Number of the histogram bins along the x-axis of the 2D histogram.
ybins Number of the histogram bins along the y-axis of the 2D histogram.
p_options A structure which defines the execution configuration.
device A flag which indicates whether input/output arrays are allocated on the host (CPU) memory or the device (GPU) memory.
Remarks:
Integrating the 2D histogram along the x-axis gives the 1D histogram of src1 and along the y-axis gives the 1D histogram of src2.

Size of hist, allocated by the caller, must be sizeof(float)*xbins*ybins.

When the function is being called inside a loop or multiple times, for best performance allocate the input and output arrays on device memory to avoid unnecessary allocation, memory transfers and deallocation by the function.

When device flag is set, the caller needs to allocate memory for hist but does not need to initialize the array. The initialization will be done by the function itself.

C wrapper function that calculates a 2D histogram with any number of bins based on cudaHista.

See also:
cudaHista
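
The remarks on the size of hist and on integrating the joint histogram can be illustrated with a small CPU reference. This is only a sketch and not part of cuda_hist.h: the names ref_hist2d, marginal_src1, marginal_src2 and bin_index are hypothetical, and the code assumes that src1 maps to the x-axis, src2 to the y-axis, that the output is stored as hist[y * xbins + x], and that a value v falls in bin floor(v * bins). The actual layout and binning rule used by the GPU functions may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps v in [0, 1] to a bin index (assumed rule:
// floor(v * bins), with v == 1 folded into the last bin).
static std::size_t bin_index(float v, int bins) {
    assert(v >= 0.0f && v <= 1.0f);
    int b = static_cast<int>(v * static_cast<float>(bins));
    if (b >= bins) b = bins - 1;
    return static_cast<std::size_t>(b);
}

// CPU reference joint histogram. The result holds xbins * ybins floats,
// stored as hist[y * xbins + x] (layout is an assumption of this sketch).
std::vector<float> ref_hist2d(const std::vector<float>& src1,
                              const std::vector<float>& src2,
                              int xbins, int ybins) {
    assert(src1.size() == src2.size());
    assert(xbins > 0 && ybins > 0);
    const std::size_t nx = static_cast<std::size_t>(xbins);
    const std::size_t ny = static_cast<std::size_t>(ybins);
    std::vector<float> hist(nx * ny, 0.0f);
    for (std::size_t i = 0; i < src1.size(); ++i) {
        const std::size_t x = bin_index(src1[i], xbins);
        const std::size_t y = bin_index(src2[i], ybins);
        hist[y * nx + x] += 1.0f;
    }
    return hist;
}

// Sums over y for every x bin, which yields the 1D histogram of src1.
std::vector<float> marginal_src1(const std::vector<float>& hist,
                                 int xbins, int ybins) {
    const std::size_t nx = static_cast<std::size_t>(xbins);
    const std::size_t ny = static_cast<std::size_t>(ybins);
    assert(hist.size() == nx * ny);
    std::vector<float> m(nx, 0.0f);
    for (std::size_t y = 0; y < ny; ++y)
        for (std::size_t x = 0; x < nx; ++x)
            m[x] += hist[y * nx + x];
    return m;
}

// Sums over x for every y bin, which yields the 1D histogram of src2.
std::vector<float> marginal_src2(const std::vector<float>& hist,
                                 int xbins, int ybins) {
    const std::size_t nx = static_cast<std::size_t>(xbins);
    const std::size_t ny = static_cast<std::size_t>(ybins);
    assert(hist.size() == nx * ny);
    std::vector<float> m(ny, 0.0f);
    for (std::size_t y = 0; y < ny; ++y)
        for (std::size_t x = 0; x < nx; ++x)
            m[y] += hist[y * nx + x];
    return m;
}
```

A reference like this may be useful for checking the output of the GPU functions on small inputs before moving to large data sets.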

void cudaHist2Db ( float *  src1,
float *  src2,
float *  hist,
int  length,
int  xbins,
int  ybins,
cudaHistOptions *  p_options = NULL,
bool  device = false 
)

Calculates a 2D (joint) histogram based on the second method described in:
R. Shams and R. A. Kennedy, "Efficient histogram algorithms for NVIDIA CUDA compatible devices," Proc. Int. Conf. on Signal Processing and Communications Systems (ICSPCS), Gold Coast, Australia, Dec. 2007, pp. 418-422.

Parameters:
src1 Pointer to the input array where the data is stored.
src2 Pointer to the input array where the data is stored.
hist Pointer to the output array where the computed histogram is to be stored. Output array must be allocated and freed by the caller.
length Number of the input array's elements. Both input arrays must have the same length.
xbins Number of the histogram bins along the x-axis of the 2D histogram.
ybins Number of the histogram bins along the y-axis of the 2D histogram.
p_options A structure which defines the execution configuration.
device A flag which indicates whether input/output arrays are allocated on the host (CPU) memory or the device (GPU) memory.
Remarks:
Integrating the 2D histogram along the x-axis gives the 1D histogram of src1 and along the y-axis gives the 1D histogram of src2.

Size of hist, allocated by the caller, must be sizeof(float)*xbins*ybins.

When the function is being called inside a loop or multiple times, for best performance allocate the input and output arrays on device memory to avoid unnecessary allocation, memory transfers and deallocation by the function.

When device flag is set, the caller needs to allocate memory for hist but does not need to initialize the array. The initialization will be done by the function itself.

Input data must be normalized between 0 and 1. The behavior of the function for values outside this range is undefined and is most likely to cause memory corruption.

C wrapper function that calculates a 2D histogram with any number of bins based on cudaHistb.

See also:
cudaHistb
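
Because values outside the range 0 to 1 lead to undefined behavior, it may be worth rescaling the input on the host before calling the histogram functions. The following is a minimal sketch of min-max normalization. The helper name normalize01 is hypothetical and not part of cuda_hist.h, and mapping a constant input to all zeros is a choice made for this example only. NaN values are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: rescales data linearly so that its minimum maps to 0
// and its maximum maps to 1. The result is clamped to guard against rounding.
std::vector<float> normalize01(const std::vector<float>& data) {
    std::vector<float> out(data.size(), 0.0f);
    if (data.empty()) return out;

    const auto mm = std::minmax_element(data.begin(), data.end());
    const float lo = *mm.first;
    const float range = *mm.second - lo;
    if (range <= 0.0f) return out;  // constant input: every element maps to 0

    for (std::size_t i = 0; i < data.size(); ++i) {
        const float v = (data[i] - lo) / range;
        out[i] = std::min(1.0f, std::max(0.0f, v));
    }
    return out;
}
```

For a 2D histogram, each of the two input arrays would be normalized separately in this way.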

double cudaHist_Approx ( float *  src,
float *  hist,
int  length,
int  bins,
cudaHistOptions *  p_options = NULL,
bool  device = false 
)

Calculates a 1D histogram based on the method described in:
R. Shams and N. Barnes, "Speeding up mutual information computation using NVIDIA CUDA hardware," Proc. Digital Image Computing: Techniques and Applications (DICTA), Adelaide, Australia, Dec. 2007, pp. 555-560.

Parameters:
src Pointer to the input array where the data is stored. The input values must be between 0 and 1.
hist Pointer to the output array where the computed histogram is to be stored. Output array must be allocated and freed by the caller.
length Number of the input array's elements.
bins Number of the histogram bins or output array's elements.
p_options A structure which defines the execution configuration.
device A flag which indicates whether input/output arrays are allocated on the host (CPU) memory or the device (GPU) memory.
Remarks:
When the function is being called inside a loop or multiple times, for best performance allocate the input and output arrays on device memory to avoid unnecessary allocation, memory transfers and deallocation by the function.

When device flag is set, the caller needs to allocate memory for hist but does not need to initialize the array. The initialization will be done by the function itself.

Input data must be normalized between 0 and 1. The behavior of the function for values outside this range is undefined and is most likely to cause memory corruption.

Returns:
The execution time in milliseconds, excluding any time spent transferring input data from host memory to device global memory and copying the results back to host memory. The time spent creating and initializing any internal objects is included.
See also:
cudaHista, cudaHistb
As the name suggests, this function calculates an approximation to the histogram. The method approximates the ratios between the histogram bins rather than the actual number of elements in each bin. As such, the result is an approximation of the probability mass function (pmf) of the input data. If you need the exact histogram, use cudaHista or cudaHistb instead. Refer to the paper for more information.
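
Since the approximate method estimates bin ratios rather than counts, one way to compare its output with an exact histogram is to convert the exact counts to a pmf first. The sketch below shows that conversion and a simple L1 distance between two pmfs. Both helper names (to_pmf, l1_distance) are hypothetical and not part of cuda_hist.h, and whether the output of cudaHist_Approx is already scaled to sum to 1 is not stated here, so it may need the same normalization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: divides every bin by the total so the bins sum to 1.
// An all-zero histogram is returned unchanged.
std::vector<float> to_pmf(const std::vector<float>& counts) {
    double total = 0.0;
    for (float c : counts) total += c;
    std::vector<float> pmf(counts.size(), 0.0f);
    if (total <= 0.0) return pmf;
    for (std::size_t i = 0; i < counts.size(); ++i)
        pmf[i] = static_cast<float>(counts[i] / total);
    return pmf;
}

// Hypothetical helper: sum of absolute bin-wise differences between two pmfs.
double l1_distance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        d += std::fabs(static_cast<double>(a[i]) - static_cast<double>(b[i]));
    return d;
}
```

A small distance between the normalized exact histogram and the approximate result would suggest the approximation is adequate for the data at hand.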

double cudaHista ( float *  src,
float *  hist,
int  length,
int  bins,
cudaHistOptions *  p_options = NULL,
bool  device = false 
)

Calculates a 1D histogram based on the first method described in:
R. Shams and R. A. Kennedy, "Efficient histogram algorithms for NVIDIA CUDA compatible devices," Proc. Int. Conf. on Signal Processing and Communications Systems (ICSPCS), Gold Coast, Australia, Dec. 2007, pp. 418-422.

Parameters:
src Pointer to the input array where the data is stored. The input values must be between 0 and 1.
hist Pointer to the output array where the computed histogram is to be stored. Output array must be allocated and freed by the caller.
length Number of the input array's elements.
bins Number of the histogram bins or output array's elements.
p_options A structure which defines the execution configuration.
device A flag which indicates whether input/output arrays are allocated on the host (CPU) memory or the device (GPU) memory.
Remarks:
When the function is being called inside a loop or multiple times, for best performance allocate the input and output arrays on device memory to avoid unnecessary allocation, memory transfers and deallocation by the function.

When device flag is set, the caller needs to allocate memory for hist but does not need to initialize the array. The initialization will be done by the function itself.

Input data must be normalized between 0 and 1. The behavior of the function for values outside this range is undefined and is most likely to cause memory corruption.

Returns:
The execution time in milliseconds, excluding any time spent transferring input data from host memory to device global memory and copying the results back to host memory. The time spent creating and initializing any internal objects is included.
C wrapper function that calculates a histogram with any number of bins using a method for synchronizing updates to histogram memory by simulating atomic operations in the GPU's shared memory. The function outperforms cudaHistb if the input data has a uniform or normal distribution and performs worse than cudaHistb if the distribution is degenerate (i.e. all or most of the elements are the same). Refer to the paper for more information.
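
To see why out-of-range input is dangerous, consider a plain CPU reference histogram. The sketch below assumes that a value v falls in bin floor(v * bins), which is an assumption of this example and not confirmed for the GPU implementation. Under that rule, a value above 1 or below 0 would produce an index outside the hist array. The function name ref_hist is hypothetical and not part of cuda_hist.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a 1D histogram of values in [0, 1].
// Assumed binning rule: bin = floor(v * bins), with v == 1 folded into
// the last bin.
std::vector<float> ref_hist(const std::vector<float>& src, int bins) {
    assert(bins > 0);
    std::vector<float> hist(static_cast<std::size_t>(bins), 0.0f);
    for (float v : src) {
        // Without this precondition the computed index could fall outside hist.
        assert(v >= 0.0f && v <= 1.0f);
        int b = static_cast<int>(v * static_cast<float>(bins));
        if (b >= bins) b = bins - 1;  // handles v == 1
        hist[static_cast<std::size_t>(b)] += 1.0f;
    }
    return hist;
}
```

The counts are stored as float to match the type of the hist parameter in the functions documented here.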

double cudaHistb ( float *  src,
float *  hist,
int  length,
int  bins,
cudaHistOptions *  p_options = NULL,
bool  device = false 
)

Calculates a 1D histogram based on the second method described in:
R. Shams and R. A. Kennedy, "Efficient histogram algorithms for NVIDIA CUDA compatible devices," Proc. Int. Conf. on Signal Processing and Communications Systems (ICSPCS), Gold Coast, Australia, Dec. 2007, pp. 418-422.

Parameters:
src Pointer to the input array where the data is stored. The input values must be between 0 and 1.
hist Pointer to the output array where the computed histogram is to be stored. Output array must be allocated and freed by the caller.
length Number of the input array's elements.
bins Number of the histogram bins or output array's elements.
p_options A structure which defines the execution configuration.
device A flag which indicates whether input/output arrays are allocated on the host (CPU) memory or the device (GPU) memory.
Remarks:
When the function is being called inside a loop or multiple times, for best performance allocate the input and output arrays on device memory to avoid unnecessary allocation, memory transfers and deallocation by the function.

When device flag is set, the caller needs to allocate memory for hist but does not need to initialize the array. The initialization will be done by the function itself.

Input data must be normalized between 0 and 1. The behavior of the function for values outside this range is undefined and is most likely to cause memory corruption.

Returns:
The execution time in milliseconds, excluding any time spent transferring input data from host memory to device global memory and copying the results back to host memory. The time spent creating and initializing any internal objects is included.
C wrapper function that calculates a histogram with any number of bins using a method for separating updates to histogram memory into multiple partial histograms. The function is slower than cudaHista unless the input data distribution is degenerate or close to degenerate (i.e. many inputs are the same or fall within the same bin). Refer to the paper for more information.
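
The idea of splitting updates across partial histograms can be sketched on the CPU as follows. This is an illustration of the general technique only, not the GPU kernel from the paper: each chunk of the input updates its own private histogram, so no two chunks ever write to the same memory, and the partial results are summed at the end. The function name partial_hist, the parts parameter and the binning rule floor(v * bins) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU illustration: builds `parts` private histograms over
// contiguous chunks of src, then reduces them into one histogram.
std::vector<float> partial_hist(const std::vector<float>& src,
                                int bins, int parts) {
    assert(bins > 0 && parts > 0);
    const std::size_t n = src.size();
    const std::size_t nb = static_cast<std::size_t>(bins);
    const std::size_t np = static_cast<std::size_t>(parts);

    // One private histogram per chunk, stored back to back.
    std::vector<float> partial(np * nb, 0.0f);
    const std::size_t chunk = (n + np - 1) / np;  // ceil(n / parts)
    for (std::size_t p = 0; p < np; ++p) {
        const std::size_t begin = std::min(n, p * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        for (std::size_t i = begin; i < end; ++i) {
            assert(src[i] >= 0.0f && src[i] <= 1.0f);
            const int b = std::min(
                static_cast<int>(src[i] * static_cast<float>(bins)), bins - 1);
            partial[p * nb + static_cast<std::size_t>(b)] += 1.0f;
        }
    }

    // Reduction step: sum the private histograms bin by bin.
    std::vector<float> hist(nb, 0.0f);
    for (std::size_t p = 0; p < np; ++p)
        for (std::size_t b = 0; b < nb; ++b)
            hist[b] += partial[p * nb + b];
    return hist;
}
```

Because each chunk owns its histogram, a degenerate input where every element lands in the same bin causes no contention between chunks, which is consistent with the behavior described above for cudaHistb.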


Generated on Fri Aug 8 16:17:38 2008 for mi_example by  doxygen 1.5.4