Real-Time Data Assessment with Parallel Computing

For online data inspection, evaluation, and control, it is desirable to process the raw data as fast as possible, which places challenging demands on bandwidth and computing power. We propose to address this issue by establishing an appropriate parallel computing infrastructure based on (a) CPU clusters and (b) GPUs, together with an extensible set of library functions that meets the requirements of the different application fields. The goal is a stack of software tools that allows rapid development and deployment of a wide range of parallel data processing algorithms. A candidate for a universal implementation language might be OpenCL, a standard for programming parallel architectures.
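To give a flavor of why OpenCL is attractive here, the following minimal kernel sketches the kind of pixel-parallel primitive such a library could expose; the same source compiles for both CPU and GPU OpenCL devices. The operation (a flat-field correction, a common step in online image assessment), the kernel name, and the data layout are illustrative assumptions, not part of the proposal.

```c
/* Illustrative OpenCL kernel: per-pixel flat-field correction of a raw
   detector frame. One work-item processes one pixel. All names and the
   flat 1-D layout are assumptions made for this sketch. */
__kernel void flat_field_correct(__global const float *raw,   /* raw frame      */
                                 __global const float *dark,  /* dark-field     */
                                 __global const float *flat,  /* flat-field     */
                                 __global float       *out,   /* corrected      */
                                 const unsigned int    n)     /* pixel count    */
{
    size_t i = get_global_id(0);
    if (i < n)
        out[i] = (raw[i] - dark[i]) / (flat[i] - dark[i] + 1e-6f);
}
```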

We propose a three-layer structure. The first layer implements the core of the stack: a set of APIs that automates the use of the defined data streams and computing platforms, including access to standard data formats, transport of data to the processing units, decompression, and display of results. The second layer is a collection of parallel primitives for image processing and numerical computation on parallel architectures. The top layer structures the development process: a task scheduler distributes the computation tasks among the available GPU and CPU cores, and the developer is expected to provide only the processing, visualization, and quality-check plugins, while everything else is automated by the framework (a hypothetical sketch of such a plugin contract follows at the end of this section).

In parallel, this task will identify suitable parallel computing platforms. The hardware must be highly modular to meet the very different resource requirements of the application fields. If data transfer is the limiting factor, a solution within a single standard PC is preferable. An alternative to CPU clusters are modern graphics processing units (GPUs), which are increasingly used in scientific computing. A single GPU provides up to 240 cores with a combined peak performance of about 1 TFlop/s, and systems with up to 13 GPUs and 12 TFlop/s in a single PC have been published. For comparison, conventional CPUs such as AMD's Opterons reach about 4 GFlop/s per core, so a conventional 16-core setup peaks at roughly 16 × 4 GFlop/s ≈ 60 GFlop/s; a multi-GPU node can thus deliver a nominal speedup of a factor of 100 or more. The challenge will be to apply the outlined parallel computing framework to the task of real-time data assessment.
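The plugin contract referenced above could take the following shape: the developer fills in three callbacks, and the framework's scheduler pulls frames from the input stream and dispatches the processing callback to an idle CPU core or GPU device. This is a minimal sketch under assumed names and types; none of the identifiers are defined by the proposal.

```c
/* Hypothetical plugin interface for the top framework layer. The
   developer supplies process/visualize/check callbacks; data access,
   transport, decompression, and scheduling are handled by the stack. */
#include <stddef.h>

typedef struct {
    void  *pixels;          /* decompressed frame, provided by layer 1 */
    size_t width, height;   /* frame dimensions in pixels              */
} frame_t;

typedef struct {
    int (*process)(const frame_t *in, frame_t *out);          /* CPU/GPU work   */
    int (*visualize)(const frame_t *result);                  /* display hook   */
    int (*check)(const frame_t *result, double *quality);     /* quality metric */
} plugin_t;

/* Example quality-check callback: reports the mean intensity of a
   float frame. Purely illustrative. */
static int check_mean_intensity(const frame_t *result, double *quality)
{
    const float *p = result->pixels;
    double sum = 0.0;
    size_t n = result->width * result->height;
    for (size_t i = 0; i < n; ++i)
        sum += p[i];
    *quality = sum / (double)n;
    return 0;
}
```

Keeping the callbacks free of any I/O or scheduling logic is what lets the framework move work transparently between CPU cores and GPUs.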