I/O Bandwidth with Tensilica Cores
by Paul McLellan on 08-17-2012 at 3:00 pm
Categories: Uncategorized

It is a truism that somewhere in every SoC something is limiting a further increase in performance. This is especially noticeable when a Tensilica core is used to create a processor that is highly optimized for a particular purpose. The core performance may be boosted by a factor of 10, or even as much as 100. Once the core itself is no longer the limiting factor, the I/O bandwidth needed to get data to and from the core often comes to the head of the line. Traditional bus-centric design just cannot handle the resulting increase in data traffic.

A long time ago processors had a single bus for everything. Modern processors separate that so that they have one or more local buses to access ROM and RAM and perhaps other memories, leaving a common bus to access peripherals. But that shared bus to access the peripherals becomes the bottleneck if the processor performance is high.

Tensilica’s Xtensa processors can have direct port I/O and FIFO queue interfaces to offload overused buses. There can be up to 1024 ports, and each can have up to 1024 signals, boosting I/O bandwidth by thousands of times relative to a few conventional 32- or 64-bit buses.
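To see where "thousands of times" comes from, the arithmetic can be sketched in a few lines of Python. The port limits are the ones quoted above; the bus mixes are assumed examples, and the comparison counts raw signals only, ignoring clock rates and protocol overhead.

```python
# Back-of-the-envelope comparison of raw signal counts.
# The port limits come from the article; the bus mixes are assumed examples.
MAX_PORTS = 1024
MAX_SIGNALS_PER_PORT = 1024


def port_signal_count(ports=MAX_PORTS, signals_per_port=MAX_SIGNALS_PER_PORT):
    """Total number of signals available through direct ports."""
    return ports * signals_per_port


def bus_signal_count(bus_widths):
    """Total data width, in bits, of a set of conventional buses."""
    return sum(bus_widths)


def width_ratio(bus_widths):
    """How many times wider the maximum port configuration is than the buses."""
    return port_signal_count() / bus_signal_count(bus_widths)


if __name__ == "__main__":
    # Hypothetical bus configurations: one 32-bit bus, two or four 64-bit buses.
    for buses in ([32], [64, 64], [64, 64, 64, 64]):
        print(buses, f"{width_ratio(buses):,.0f}x")
```

Even against four 64-bit buses, the fully configured port interface is 4096 times wider, which is consistent with the claim above.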

But wait, there’s more. Tensilica’s flexible length instruction extension (FLIX) allows designers to add separate parallel execution units to handle concurrent computational tasks. Each user-defined execution unit can have its own direct I/O without affecting the bandwidth available to other parts of the processor.

While plain I/O ports are ideal for fast transfer of control and status information, Xtensa also allows designers to add FIFO-like queues. These allow data to be transferred between the processor and other parts of the system that may be producing or consuming data at different speeds. To the programmer, queues look just like traditional processor registers, but without the bandwidth limitations of shared memory buses. Each queue can sustain data rates as high as one transfer per clock cycle, or 350 Gb/s. Custom instructions can perform multiple queue operations per cycle, so even this is not the cap on overall bandwidth from the processor core. This allows Xtensa processors to be used not just for computationally intensive tasks but also for applications with extreme data rates.
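The way a queue decouples a fast producer from a slow consumer can be illustrated with a small cycle-by-cycle model. This is only a behavioral sketch in Python, not Tensilica's implementation: the function names, the queue depth, and the rates are all hypothetical. The second helper shows the arithmetic behind a per-queue bandwidth figure, using an assumed queue width and clock frequency.

```python
from collections import deque


def simulate_queue(depth, cycles, producer_period, consumer_period):
    """Cycle-by-cycle model of a bounded FIFO between two blocks.

    The producer tries to push once every `producer_period` cycles and the
    consumer tries to pop once every `consumer_period` cycles. A push to a
    full queue or a pop from an empty queue is counted as a stall.
    """
    fifo = deque()
    stats = {"pushed": 0, "popped": 0, "push_stalls": 0, "pop_stalls": 0}
    for cycle in range(cycles):
        # The consumer goes first so a slot freed this cycle can be reused.
        if cycle % consumer_period == 0:
            if fifo:
                fifo.popleft()
                stats["popped"] += 1
            else:
                stats["pop_stalls"] += 1
        if cycle % producer_period == 0:
            if len(fifo) < depth:
                fifo.append(cycle)
                stats["pushed"] += 1
            else:
                stats["push_stalls"] += 1
    return stats


def queue_bandwidth_bits_per_s(width_bits, clock_hz, transfers_per_cycle=1):
    """Peak bandwidth of one queue: width x clock x transfers per cycle."""
    return width_bits * clock_hz * transfers_per_cycle


if __name__ == "__main__":
    # Producer pushes every cycle, consumer pops every fourth cycle.
    print(simulate_queue(depth=4, cycles=16, producer_period=1, consumer_period=4))
    # Assumed example: a 1024-bit queue clocked at 350 MHz.
    print(queue_bandwidth_bits_per_s(1024, 350_000_000) / 1e9, "Gb/s")
```

With the mismatched rates in the first call, the queue fills after four cycles and the producer then stalls until the consumer frees a slot, which is the back-pressure behavior a FIFO interface provides. With matched rates, data moves on every cycle, so peak bandwidth is simply queue width times clock frequency.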

It is no good adding powerful capabilities if they are too hard to use. I/O ports are declared with simple one-line declarations (or a check-box configuration option). A check-box configuration option is also used to define a basic queue interface, although a handful of commands can be used to create a special-function queue.

Ports and queues are automatically added to the processor by the Xtensa processor generator and, of course, are completely modeled: they are reflected in the custom software development tools, instruction set simulator (ISS), bus functional model and EDA scripts.

A white paper with more details is here.
