Today’s systems often present a massive amount of very fast data at the front end that must be sampled and decimated quickly. This is typical of systems with many data channels in play, such as satellite radio or cable head-end systems. Sample rates run into the gigahertz range, which puts them beyond FPGA clock speeds if the constraint is one sample per clock.
Parallelism comes to the rescue in a Synopsys FFT IP implementation. In state-of-the-art FPGAs, there is plenty of room to create parallel computational blocks, coordinating operations across multiple inputs. Instead of forcing a single block to run faster to keep up with data, a parallel approach allows data to be sampled and processed faster.
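As a rough back-of-the-envelope sketch, the number of samples that must be consumed per clock follows from dividing the input sample rate by the fabric clock. The function names and the example figures (a 2 GS/s input and a 400 MHz fabric clock) are illustrative assumptions, not numbers from Synopsys, and rounding up to a power of two is only an assumption that suits a radix-2 structure.

```python
import math

def lanes_required(sample_rate_hz: float, fabric_clock_hz: float) -> int:
    """Smallest lane count such that lanes * fabric clock >= sample rate."""
    if sample_rate_hz <= 0 or fabric_clock_hz <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(sample_rate_hz / fabric_clock_hz)

def round_up_pow2(n: int) -> int:
    """Round a positive integer up to the next power of two (assumed lane granularity)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()

# Hypothetical figures: 2 GS/s input, 400 MHz fabric clock.
minimum = lanes_required(2e9, 400e6)
print(minimum, round_up_pow2(minimum))
```

With these assumed figures, one sample per clock is not an option: at least five samples must be accepted every clock, which a power-of-two design would round up to eight parallel inputs.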
The key to this is the radix-2 multipath delay commutator (MDC), a modular architecture that keeps the pipeline in sync between data elements. Flow control is implemented without a big timing penalty or reduction in throughput.
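For readers who have not worked with radix-2 structures, the sketch below shows the underlying arithmetic in software: the input is split into two paths, each path is transformed, and the results are recombined with twiddle factors and butterflies. This is the textbook recursive decimation-in-time algorithm, not the Synopsys implementation; it does not model the delay lines, commutator switches, or flow control of the hardware pipeline. The function names and sample values are made up for illustration.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Split into two paths: even-indexed and odd-indexed samples.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Apply the twiddle factor, then one butterfly combines the two paths.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dft_reference(x):
    """Direct O(N^2) DFT, used only to check the fast version."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]

# Arbitrary 8-point test vector (illustrative values).
samples = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
fast = fft_radix2(samples)
slow = dft_reference(samples)
print(all(abs(a - b) < 1e-9 for a, b in zip(fast, slow)))
```

In hardware, the two paths at each stage run side by side rather than recursively, which is where the opportunity to accept several samples per clock comes from.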
The following chart illustrates a simple case of what parallelism can achieve on a relatively small FFT. When I asked Chris Eddington what the architecture is capable of, he mentioned this IP can do a 16K-point FFT operating on 32 parallel inputs. Keep in mind that there is some reduction in system clock frequency as more parallel channels are stitched together with flow control.
Besides sampling faster, parallelism also decreases computational latency. The cost of this approach is of course area and multiplier utilization, but with 1120 to 3600 DSP slices in a Xilinx Virtex-7, this still fits very comfortably.
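One contribution to that latency is easy to estimate: the time needed just to clock a full frame into the block. The sketch below counts only that input time and ignores pipeline depth, which depends on the implementation. The function names and the 300 MHz fabric clock are assumptions for illustration; the 16K-point size and 32 inputs are the figures quoted above.

```python
import math

def frame_ingest_clocks(n_points: int, lanes: int) -> int:
    """Clock cycles needed to accept one N-point frame at `lanes` samples per clock."""
    if n_points < 1 or lanes < 1:
        raise ValueError("n_points and lanes must be positive")
    return math.ceil(n_points / lanes)

def frame_ingest_seconds(n_points: int, lanes: int, fabric_clock_hz: float) -> float:
    """Same quantity expressed as time, for an assumed fabric clock."""
    return frame_ingest_clocks(n_points, lanes) / fabric_clock_hz

N_POINTS = 16 * 1024          # 16K-point FFT
ASSUMED_CLOCK_HZ = 300e6      # hypothetical fabric clock, not a Synopsys figure

for lanes in (1, 32):
    clocks = frame_ingest_clocks(N_POINTS, lanes)
    micros = frame_ingest_seconds(N_POINTS, lanes, ASSUMED_CLOCK_HZ) * 1e6
    print(f"{lanes:2d} lane(s): {clocks} clocks, about {micros:.2f} us")
```

At a fixed clock, going from 1 to 32 inputs cuts the frame input time from 16384 clocks to 512, before any reduction in clock frequency from the added flow control is taken into account.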
Of course, both Altera and Xilinx have capable FFT IP blocks, but there are a few key differences in the Synopsys implementation beyond the parallelism features. The Synphony MC tools are vendor independent and can target various FPGAs. The flow integrates with MATLAB Simulink for designers who prefer to work at a high architectural level. Not every designer is an FFT expert, so being able to work in high-level tools is a plus.
The flow also instantiates RTL and SystemC, giving designers the flexibility and visibility needed to integrate the code into their system and tune things if necessary. Because this is not a “black box” implementation, designers can simulate performance and power using other tools.
Speaking of tuning and power, one other way to use a parallel approach is to consume reasonably fast data at lower power. Instead of going for the peak sample rates possible, parallel channels allow a good sample rate at a lower FPGA clock speed.
You can read more insights on this FFT architecture from Chris and the Synopsys team in their article:
“Multi-Gigahertz FPGA Signal Processing”.