This convention enables the individual developer to make smart choices about memory management that minimize the number of memory transfers. It also gives the user maximum flexibility in choosing among the memory-transfer mechanisms offered by the CUDA runtime, e.g. synchronous or asynchronous memory transfers, zero-copy and pinned memory, etc.
The most basic steps involved in using NPP to process data are as follows:

1. Allocate the required buffers in device memory, e.g. via cudaMalloc(...).
2. Transfer the input data from the host to the device: cudaMemcpy(...)
3. Invoke the NPP primitive, which processes the data directly in device memory.
4. Transfer the result from the device back to the host: cudaMemcpy(...)
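The steps above can be sketched in host code. The example below applies the signal-processing primitive nppsAbs_32f to a small array; the particular primitive is chosen only for illustration, error handling of the NppStatus and cudaError_t return codes is elided, and the same allocate/copy/invoke/copy-back pattern applies to any NPP primitive.

```cpp
// Sketch: the basic NPP workflow (allocate, copy in, invoke, copy out).
// Return-code checking is omitted for brevity.
#include <npp.h>
#include <cuda_runtime.h>

int main(void)
{
    const int nLength = 4;
    Npp32f hSrc[nLength] = {-1.0f, 2.0f, -3.0f, 4.0f};
    Npp32f hDst[nLength];

    // 1. Allocate device memory.
    Npp32f *dSrc, *dDst;
    cudaMalloc((void **)&dSrc, nLength * sizeof(Npp32f));
    cudaMalloc((void **)&dDst, nLength * sizeof(Npp32f));

    // 2. Transfer the input data to the device.
    cudaMemcpy(dSrc, hSrc, nLength * sizeof(Npp32f), cudaMemcpyHostToDevice);

    // 3. Invoke the primitive; it operates entirely on device memory.
    nppsAbs_32f(dSrc, dDst, nLength);

    // 4. Transfer the result back to the host.
    cudaMemcpy(hDst, dDst, nLength * sizeof(Npp32f), cudaMemcpyDeviceToHost);

    cudaFree(dSrc);
    cudaFree(dDst);
    return 0;
}
```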
Throughout NPP there are a number of functions that require the use of host pointers, e.g. the various &lt;Primitive&gt;GetBufferSize(...) functions. These functions compute the minimum size of the scratch-memory buffer that some primitives require, and return that size via a host pointer. Since the application allocates such buffers via CUDA runtime functions (e.g. cudaMalloc(...)), which take the size as a host-side argument, it would make no sense to place those size values in device memory by default.
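As an illustration, consider nppsSum_32f, whose scratch-buffer size is reported by nppsSumGetBufferSize_32f through a host pointer. The sketch below (error handling again elided) shows how that host-side value feeds directly into cudaMalloc(...):

```cpp
// Sketch: querying a scratch-buffer size via a host pointer, then
// allocating the buffer in device memory with the CUDA runtime.
#include <npp.h>
#include <cuda_runtime.h>

void sum_signal(const Npp32f *dSrc, Npp32f *dSum, int nLength)
{
    // The minimum scratch-buffer size is written to a plain host variable...
    int hBufferSize = 0;
    nppsSumGetBufferSize_32f(nLength, &hBufferSize);

    // ...because it is consumed on the host by cudaMalloc(...).
    Npp8u *dScratch;
    cudaMalloc((void **)&dScratch, hBufferSize);

    // The sum itself is written to device memory (dSum), per NPP convention.
    nppsSum_32f(dSrc, nLength, dSum, dScratch);

    cudaFree(dScratch);
}
```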
In addition to the flavor suffix, all NPP functions are prefixed with the letters "npp". Primitives belonging to NPP's image-processing module add the letter "i" to the "npp" prefix, i.e. are prefixed by "nppi". Similarly, signal-processing primitives are prefixed with "npps".
The general naming scheme is:
npp<module info><PrimitiveName>_<data-type info>[_<additional flavor info>](<parameter list>)
The data-type information uses the same names as the Basic NPP Data Types. For example the data-type information "8u" would imply that the primitive operates on Npp8u data.
If a primitive consumes data of a different type from the one it produces, both types are listed in order from consumed to produced. For example, nppiConvert_8u32f_C1R consumes Npp8u data and produces Npp32f data.
Details about the "additional flavor information" are provided for each of the NPP modules, since each problem domain uses different flavor suffixes.
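Putting the scheme together: in a name such as nppiAdd_32f_C1R, "nppi" identifies the image-processing module, "Add" is the primitive name, "32f" says it operates on Npp32f data, and the "C1R" flavor suffix indicates a single-channel operation restricted to a region of interest (ROI). A sketch of its use (data transfers and error handling elided; all image pointers are device pointers):

```cpp
// Sketch: invoking a flavored image-processing primitive.
// nppiAdd_32f_C1R = image module (nppi) + Add + 32-bit float data (_32f)
//                   + single channel, operating on an ROI (_C1R).
#include <npp.h>

void add_images(const Npp32f *dSrc1, int nSrc1Step,
                const Npp32f *dSrc2, int nSrc2Step,
                Npp32f *dDst, int nDstStep,
                int width, int height)
{
    // The ROI describes the rectangle of pixels to process.
    NppiSize oSizeROI = {width, height};

    // The step arguments give each image's line pitch in bytes.
    nppiAdd_32f_C1R(dSrc1, nSrc1Step, dSrc2, nSrc2Step,
                    dDst, nDstStep, oSizeROI);
}
```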