
  • HLS with ARM and FPGA Technologies Boosts SoC Performance

    As SoC size and complexity increase, development and verification methods are evolving too, with innovative automated tools and environments for SoC development and optimization. IP-based SoC development methodology has proved to be the most efficient for large SoCs. It requires collaboration among multiple players, including IP developers, SoC vendors, EDA tool providers, foundries, FPGA providers, and others.

    The ARM Connected Community has more than 1200 partners, and ARM TechCon is one of the best forums to learn about new innovations in IP development and SoC integration. Although I couldn’t attend the conference, I came across a presentation by Hardent on how to boost performance from ‘C’ software to an extremely high (“sky-high”) level with hardware acceleration. The methodology uses an SoC with ARM’s ACP (Accelerator Coherency Port) and ACE (AXI Coherency Extensions) interfaces and Xilinx FPGA technologies.

    [Figure: c_to_sky.jpg]

    Hard-IP acceleration targets particular applications and comes from co-processors and accelerators fixed in silicon. Soft-IP acceleration is more generic and scalable; it is achieved through programmable logic customized to the application's needs. Both hard and soft IP are needed to optimize the SoC.

    [Figure: code_annotate.jpg]

    Above is an example of a code annotation in a ‘C’ program that directs the HLS (High-Level Synthesis) tool to synthesize the ‘for’ loop into a pipelined architecture. Pipelining increases throughput and resource utilization, thus enhancing the performance of the function. Similarly, there are various types of memory that trade off throughput against capacity. An appropriate memory is chosen for each application to interface between software and hardware.
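
    As a rough sketch of this kind of annotation, assuming Vivado HLS-style pragma syntax, a loop might be marked for pipelining as follows. The function name `dot_product`, the array size `N`, and the initiation interval `II=1` are illustrative assumptions, not taken from Hardent's presentation.

```c
#include <stddef.h>
#include <stdint.h>

#define N 8  /* hypothetical fixed array size, chosen for illustration */

/* Multiply-accumulate over two fixed-size arrays.
 * The pragma below is a Vivado HLS-style directive asking the tool to
 * pipeline the loop body, aiming to start a new iteration every clock
 * cycle (II=1). An ordinary C compiler ignores it, possibly with a
 * warning, so the same function still runs unchanged in software. */
int32_t dot_product(const int16_t a[N], const int16_t b[N])
{
    int32_t acc = 0;
    for (size_t i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

    Keeping the annotation as a pragma rather than restructuring the code means one source file can serve both as the software reference and as the input to synthesis.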

    At the top level, the system is a combination of a processing system, programmable logic, and an interface between them, which is best implemented with ARM’s AMBA (Advanced Microcontroller Bus Architecture) for efficient data movement. An important consideration for high-performance, high-throughput data movement is low latency with coherent interfaces.
    [Figure: coherency.jpg]
    This is an example of a soft IP interfacing directly with cache memory through the AMBA cache-coherent interfaces, ACP or ACE. The benefit of using coherent interfaces to access data in cache is limited by the capacity of the cache sub-system. Regular (non-cache-coherent) AXI interfaces will still provide comparable latencies for sufficiently large data sets. Several mechanisms are used to avoid cache misses and to save power. The function to accelerate must be chosen carefully, so that the performance gain in data processing is not offset by a bottleneck in data movement.
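
    The trade-off between processing gain and data movement can be illustrated with a back-of-the-envelope model. This is only a sketch under simplifying assumptions (transfer and compute do not overlap, and the interface sustains a fixed bandwidth); the function and parameter names are hypothetical and do not come from the presentation or from any Xilinx or ARM tool.

```c
/* Estimated run time when a function is moved to programmable logic:
 * the time to move its data across the interface plus the accelerated
 * compute time. Assumes transfer and compute do not overlap. */
double accelerated_time_s(double bytes_moved,
                          double bandwidth_bytes_per_s,
                          double sw_compute_s,
                          double hw_speedup)
{
    double transfer_s = bytes_moved / bandwidth_bytes_per_s;
    double compute_s = sw_compute_s / hw_speedup;
    return transfer_s + compute_s;
}

/* Net speedup over the software-only version. A value below 1.0 means
 * data movement costs more than the acceleration gains. */
double net_speedup(double bytes_moved,
                   double bandwidth_bytes_per_s,
                   double sw_compute_s,
                   double hw_speedup)
{
    return sw_compute_s / accelerated_time_s(bytes_moved,
                                             bandwidth_bytes_per_s,
                                             sw_compute_s,
                                             hw_speedup);
}
```

    For instance, under this model a function that takes 10 ms in software and runs 10 times faster in logic, but must move 1 MB over a 1 GB/s interface, ends up only about 5 times faster overall, because the transfer takes as long as the accelerated computation.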

    The above procedure is described in the most simplistic manner. In practice it requires a lot of hardware as well as software expertise. The availability of ARM processors and interfaces, HLS tools, and tools for partitioning software and hardware has reduced the development effort to a large extent. Yet other pieces, such as drivers and additional hardware to handle data movement, still need to be integrated to complete the SoC.

    Xilinx has a new SDSoC development environment that makes it much easier to optimize and deploy programmable SoCs. The SDSoC front-end is an Eclipse-based C/C++ IDE, and the back-end can call upon many hardware design tools.
    [Figure: sdsoc.jpg]
    Hardent recommends this flow using SDSoC to quickly optimize the custom hardware components. The application can run on any ARM Cortex-A/R processor; both bare-metal and Linux applications are supported. The profiling tool integrated into the IDE interfaces with non-invasive ARM debug components built into the SoC. ARM CoreSight technology provides an excellent debug and trace system. The hardware estimates can be optimized by iterating over the micro-architecture and macro-architecture.

    This flow provides an environment in which a complete system can be described in C/C++, appropriate functions can be migrated to soft IP using HLS, and the soft IP can be integrated into the system. The processor and the programmable logic are tightly integrated into the SoC through AMBA interfaces.

    Hardent is an active member of VESA (Video Electronics Standards Association) and the MIPI (Mobile Industry Processor Interface) Alliance and provides IP for display. Hardent also provides several training courses for SoC development based on ARM processors, Xilinx Zynq, and HLS. The latest offering is “Embedded C/C++ SDSoC Development Environment and Methodology”.

    SDSoC appears to be an excellent, efficient environment for SoC development and optimization. See this video, less than 4 minutes long, on the Xilinx website HERE.


    Pawan Kumar Fangaria
    Founder & President at www.fangarias.com