You are currently viewing SemiWiki as a guest which gives you limited access to the site. To view blog comments and experience other SemiWiki features you must be a registered member. Registration is fast, simple, and absolutely free so please, join our community today!

  • Verifying a RISC-V in 1 Page of Code!

    This is it! This is the single page of code that Ńkos Hadnagy wrote this summer to formally verify WARP-V, an open-source RISC-V CPU core.

    Article: Building Energy-Efficient ICs from the Ground Up-risc-v-verification.jpg

    Iíve been told the average software developer writes just five lines of code per day. As my mentee this summer in Google Summer of Code, Ńkos wrote much less. You might think, as his mentor, I wouldnít be too happy about this. But I am happy. Iím ecstatic! Had he written 20K lines of code, no one would care about his work. CODE IS BAD! It carries bugs. You write it because you have to in order to get stuff done, and Ńkos got a lot done with this little bit of code. He formally verified not just a single CPU core, but a whole family of them. And thatís why, between exams, you can find Ńkos presenting his work at conferences like ORConf (video) and VSDOpen.

    In this post, Iíll explain the magic behind this page of code--a page which corresponds to 20(!!!) in SystemVerilog. Thatís for one particular implementation. This page of code results in unique SystemVerilog code for any number of different configurations of WARP-V! I guess, in a sense, Ńkos did write more than five lines per day.

    The Origins of Yet Another RISC-V
    I am the proud founder of Redwood EDA. Weíre the guys behind Our attention is focused on Transaction-Level Verilog (TL-Verilog), the technology that makes this work possible. Itís a follow-on to SystemVerilog that represents a giant leap forward in both simplicity and capability. Itís a refreshing change of scenery in a domain that has essentially lain dormant for the past 30 years.

    Back in December (2017), I needed a good showcase for TL-Verilog. So I put together a RISC-V model, and I scrambled for the Jan. 15 submission deadline for DAC (the Design Automation Conference). Iíve never gotten anything accepted into a conference before based on a week and a half of work, but everyone loves RISC-V, and the fact that I could implement one in a week and a half was kinda the point. And, as I mentioned, it wasnít just a single CPU core implementation, it was a flexible CPU core generator. More flexible than any other Iíve been able to find. That was also the point. Startups have been funded with less to show. (My slides from DAC are here: ďOvercoming RTL: The Most-Adaptable Open-Source RISC-V Core.Ē)

    Ńkosís work this summer takes this showcase to the next level, showing that TL-Verilog isnít only beneficial for flexible modeling of logic; itís also great for flexible verification modeling.

    Letís take a quick look at the design. The microarchitecture of WARP-V is nothing special. Itís pretty much straight from any college textbook. CPU design was my career, and I find it amusing that technology scaling trends and power are driving us back to these simple cores. But I digress. Hereís a high-level block diagram of WARP-V:

    Article: Building Energy-Efficient ICs from the Ground Up-risc-v-verification-2.jpg

    Whatís most unique about WARP-V is its flexible pipeline. The stages labeled in green are actually ďvirtual pipeline stages.Ē A specific implementation can control the depth of the pipeline by mapping these virtual stages into physical ones. In RISC-V, decode is very simple, so we might want decode, branch target calculation, and register read to all be done in the same stage. So we map the virtual stages @dec, @tgt, and @rd to the same physical stage. Finer-grained tweaks can be made safely with minor code changes. TL-Verilog tools turn this partitioned logic into RTL, providing the necessary flip-flops and staged signals to implement the pipeline. At a faster frequency, or for a different ISA we might choose, instead, to stretch this logic across different stages. Oh, right, yes I did say, ďfor a different ISA.Ē Yeah, you can do that. Thatís another unique thing about WARP-V, enabled by TL-Verilog, but letís not confuse matters with that.

    Formal First
    Now, lest you get the wrong impression, Ńkos wasnít just twiddling his thumbs all summer, writing half a line of code each day. He was helping to transform my slapped-together demonstration into a robust design. And to do that, we skipped traditional simulation altogether and jumped straight to formal verification, specifically, bounded model checking. In fairness, thereís more than a page of code involved here, but we didnít have to write most of it. Ńkos used an infrastructure called riscv-formal. This is from an amazing guy named Clifford Wolf, who is nearly single-handedly driving a revolution toward open-source EDA.

    Article: Building Energy-Efficient ICs from the Ground Up-risc-v-verification-3.jpg

    Riscv-formal provides the checkers to ensure conformance to the RISC-V ISA. It is depicted by the red box in the diagram. Heading into this box is the complete characterization of each retiring instruction.

    Ńkosís one page of code is the verification harness that provides this input. It collects together information about each instruction from signals that exist in different stages of the flexible pipeline. Signals must be staged the proper number of times to align them for the checks. (This staging is represented by the upper red path in the diagram.) In RTL, the staging of these signals is significant and leaves plenty of room for error, especially when the pipeline is changed and the staging must be changed to match. Here, TL-Verilog tools manage this staging. All that is left to provide is the hookup to the riscv-formal checks through a small amount of combinational logic to present the inputs in their expected form. Simply using a signal in this hookup is enough to pull it through the pipeline. The pipeline can be adjusted all you like, and the verification harness doesnít change at all!

    A Headache Averted--Loads
    But Iíve glossed over a wrinkle here. Thereís a big red loop in the diagram through the memory that I havenít explained. This is for load instructions, which are a bit problematic. Loads go off to memory and take as long as they need to find their data and return it to the core (though the current memory implementation has a parameterized fixed latency). And we canít present the load instruction to riscv-formal without the load data. While we wait for the data, other independent instructions in the shadow of the load are allowed to retire.

    Fortunately, Clifford is a bright guy, so he built riscv-formal with the flexibility to accept instructions out of order. Instructions behind the load are presented to riscv-formal as they retire, and when the load returns, we present it. In order to present it, we need to have all of the signals to present. Either we store them somewhere while we wait for the load data, or we carry them along with the load as it goes to memory and back. Our approach is the latter. In RTL, dragging these signals around is a bug-prone pain-in-the-rear. Youíll see that in TL-Verilog, itís practically free.

    In RTL, your design is just a jumble of logic and wires. A TL-Verilog model has more structure. Thereís a construct for a pipeline with stages. Instructions in WARP-V are ďtransactionsĒ that traverse a CPU pipeline. But the transactions need not be confined to a single pipeline. For loads, in WARP-V, a ďtransaction flowĒ is defined from the CPU pipeline into the memory and back. Memory access happens along this flow, which is easily defined by the statements below. First, we feed the instruction in the CPU (|fetch) pipeline into the memory pipeline with this statement in the memory pipeline:

    Article: Building Energy-Efficient ICs from the Ground Up-risc-v-verification-4.png

    Then, from the end of the memory pipeline, we feed back into the CPU pipeline into a new context within the CPU pipeline, called /original_ld.

    Article: Building Energy-Efficient ICs from the Ground Up-risc-v-verification-5.png

    I wonít interpret every letter, but the point is itís simple to define this transaction flow. And now, when load data returns into the CPU pipeline, we have direct access to any signals needed from the original load instruction that produced the data. (If I were at a podium, I would pause at this point to allow the significance of this to sink in.)

    Letís see how this is used in WARP-V. For the returning load, we must write the load data into the register file. To do this, we need the load data and the destination register index of the original load. We access these as /original_ld$ld_data and /original_ld$dest_reg. As a result of these references, $ld_data is pulled from the memory pipeline, and $dest_reg is pulled the full way around the purple path shown below from the original load instruction. Just a few statements produce all the necessary flops and staged signals; our design is well organized and less likely to contain bugs; and the pipeline, er... pipelines, remain flexible.

    Article: Building Energy-Efficient ICs from the Ground Up-risc-v-verification-6.jpg

    This same loopback flow that is valuable for the logic is even more valuable for the verification harness. While the design only needed two signals from this flow, the verification harness needs to present all information about the load instruction to riscv-formal via this flow. The first four lines of Ńkosís code (two of which only exist because of a current limitation in the tool) provide the MUX in red. (If youíd like a closer look, you can open the design. The harness is at the end.) The MUX output is the instruction presented to riscv-formal, either straight from the pipeline, or recirculated through memory for loads. The remaining code, once again, is just the hookup--purely combinational logic that sits at the input to riscv-formal. Any signals that are needed are pulled in automatically through the defined flow.

    Well, thatís it. Thatís the magic. Once you get it, RTL seems archaic. If I wasnít clear or youíd like to try this for yourself, Iíve been busy putting together tutorials and other resources in Makerchip.

    Wow. Whatís Next?
    Well, personally, Iíve got a hundred or so very exciting directions to go with this technology. Iím not sure how much of my focus will be on WARP-V going forward. Thereís a business to be made out of it for sure, and if you say ďRISC-V,Ē VCs will pop up, but for me, itís likely to remain a showcase. Iíll probably make it into a multicore to show off transaction flows even more. There are several folks planning to explore the physical implementation, among other things. The license is permissive, so youíre free to do with it as you wish (but letís collaborate). And, if you are a student, I expect there will be some great opportunities again in Google Summer of Code for 2019.

    Here are some related links:
    Ńkosís ORConf 2018 presentation video and slides
    Our VSDOpen 2018 slides and paper, in case you want to see these benefits quantified
    Clifford's riscv-formal repo
    The WARP-V repo:
    My DAC 2018 slides
    WARP-V in Makerchip:
    Makerchip (for all things TL-Verilog):