Caches are notoriously hard to get right. Too much cache and you waste area and leakage power; too little and performance falls short of what you expect. Power suffers too, since a cache miss is a lot more expensive than a hit from a power point of view. Caches are very complex these days, with multiple masters, GPUs, snooping for coherency and so on. They also interact closely, and in non-obvious ways, with decisions made about the interconnect (buses, NoCs, etc.).
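To make the sizing tradeoff concrete, here is a minimal sketch of a toy cache model. Everything in it is an assumption made for illustration: the cache is fully associative with LRU replacement, the workload is a synthetic loop over a 32 KB buffer, and the hit and miss energy costs are arbitrary units rather than measurements. A real analysis would run real software on a virtual platform instead.

```python
from collections import OrderedDict

# Assumed, arbitrary energy units: a miss is taken to cost 20x a hit.
HIT_ENERGY = 1.0
MISS_ENERGY = 20.0


def simulate_lru_cache(addresses, cache_bytes, line_bytes=64):
    """Toy fully associative LRU cache. Returns (hits, misses)."""
    capacity = cache_bytes // line_bytes  # number of cache lines available
    resident = OrderedDict()              # line number -> None, oldest first
    hits = misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in resident:
            hits += 1
            resident.move_to_end(line)    # mark as most recently used
        else:
            misses += 1
            resident[line] = None
            if len(resident) > capacity:
                resident.popitem(last=False)  # evict least recently used
    return hits, misses


def synthetic_trace(working_set_bytes, passes, stride=64):
    """Hypothetical workload: walk the same buffer several times."""
    one_pass = list(range(0, working_set_bytes, stride))
    return one_pass * passes


trace = synthetic_trace(32 * 1024, passes=4)
for size_kb in (8, 16, 32, 64):
    hits, misses = simulate_lru_cache(trace, size_kb * 1024)
    energy = hits * HIT_ENERGY + misses * MISS_ENERGY
    print(f"{size_kb:3d} KB: miss rate {misses / len(trace):.2f}, "
          f"relative energy {energy:.0f}")
```

With this particular trace, the caches smaller than the working set miss on every access, the 32 KB cache misses only on the first pass, and doubling to 64 KB buys nothing further: it just costs area and leakage. Real workloads are far less tidy, which is exactly why the sweep needs to be run on realistic software.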
Another difficult area is software/hardware tradeoffs. In a prior version of a design, a special handcrafted RTL block might have been necessary to achieve the required performance. But in a later process node, the same function might be better implemented in software, either on the main control processor or on a specialized offload processor.
So how do you make these decisions? Obviously it is too complicated to put together all the RTL for the entire chip just to decide whether that is really the RTL you need. Besides, RTL is too slow to run a full load of software (Android, for example). These days the purpose of many SoCs is just to run the software as efficiently as possible, so you cannot do a meaningful analysis by looking closely at the hardware alone, without running realistic software scenarios.
The answer is to use virtual platforms, which can quickly be configured to swap IP blocks in and out, vary the size of the cache, switch from an ARM Cortex-A15 to a Cortex-A9 and so on. And all this while running fast enough that you can boot the operating system, run apps, run standard benchmarks, run test software and generally perform analysis at whatever depth you want.
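The shape of such an exploration can be sketched in a few lines. This is only an illustration, not any vendor's API: the `explore` helper, the parameter names and the stand-in cost function are all hypothetical. In a real flow, the `evaluate` callback would launch the virtual platform with the given configuration, run the software and return a measured metric.

```python
from itertools import product


def explore(design_space, evaluate):
    """Evaluate every combination of parameters; return (score, config)
    pairs sorted with the lowest score first."""
    names = list(design_space)
    results = []
    for values in product(*(design_space[name] for name in names)):
        config = dict(zip(names, values))
        results.append((evaluate(config), config))
    results.sort(key=lambda pair: pair[0])
    return results


# Hypothetical design space; names and values are illustrative only.
design_space = {
    "cpu": ["Cortex-A15", "Cortex-A9"],
    "l2_cache_kb": [256, 512, 1024],
    "interconnect": ["bus", "noc"],
}


def stand_in_cost(config):
    # Placeholder metric, NOT a real measurement: it simply ranks by cache
    # size. A real callback would run a benchmark on the virtual platform.
    return config["l2_cache_kb"]


results = explore(design_space, stand_in_cost)
print(len(results), "configurations evaluated")
print("lowest-cost configuration:", results[0][1])
```

Keeping the evaluation behind a single callback means the sweep logic stays the same whether the metric is benchmark runtime, estimated power or some weighted mix of the two.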
Then, when you have made all your decisions, you have a virtual platform ready to deliver to the software team so that they can start work in parallel with the SoC design. Since there are typically more software engineers than IC design engineers on a project these days, this is especially important. Without having a virtual platform, it is easy for software engineers to "pretend to program" since it is impossible to be effective without being able to run the code immediately.
Carbon CTO Bill Neifert's blog on this subject is here. Andy Meier's blog on CPU selection is here.