Traditionally, David Letterman style, we should really have the top 10 reasons for wasting power in semiconductor design, but here are the five big ones.
Starting with reason #5: Lack of a power gating strategy
Leakage power is a huge proportion of total power and the only way to save leakage power (apart from low-leakage cells when they can be used) is to turn off the power. Of course this doesn't just save leakage power, it saves dynamic power too. Your cell-phone battery wouldn't last very long if the transmit/receive logic was kept powered up all the time even when you weren't making a call. This is not something that can easily be automated. The design needs to be partitioned into power regions, and control signals need to be created (usually under software control) to handle power-down and restore (and to retain register values if necessary). CPF (Common Power Format) and UPF (Unified Power Format) devote much of their specifications to making sure the boundaries of blocks like this are correctly handled.
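To get a feel for why this matters, here is a rough back-of-the-envelope sketch in Python. Every number in it is made up for illustration (a block that is busy 2% of the time, 200 mW when active, 20 mW of leakage when idle), and the function name and the `residual_leakage_fraction` parameter are my own assumptions, not anything taken from CPF, UPF or a real tool.

```python
def average_power_mw(active_mw, leakage_mw, duty_cycle, gated,
                     residual_leakage_fraction=0.05):
    """Average power of a block that is busy for `duty_cycle` of the time.

    active_mw: total power (dynamic plus leakage) while the block is working
    leakage_mw: leakage power while the block is idle but still powered
    gated: if True, the block's supply is switched off while it is idle
    residual_leakage_fraction: assumed share of leakage that remains when
        gated (power switches and retention registers still leak a little)
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    idle_mw = leakage_mw * residual_leakage_fraction if gated else leakage_mw
    return duty_cycle * active_mw + (1.0 - duty_cycle) * idle_mw


# Hypothetical radio block: busy 2% of the time.
always_on = average_power_mw(200.0, 20.0, 0.02, gated=False)
power_gated = average_power_mw(200.0, 20.0, 0.02, gated=True)
print(f"always on: {always_on:.2f} mW, power gated: {power_gated:.2f} mW")
```

With a low duty cycle the idle leakage dominates the average, which is why turning the block off pays so well. The sketch ignores the energy and time spent powering down and restoring state, which a real power gating strategy has to account for.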
Reason #4: Poor local register enable conditions
Synthesis tools will replace recirculating muxes with clock gates. But often a register can be gated much more frequently since either the value in the register will never be used or else it is clear from some other aspect of the design that the value in the register will not change. In both these cases power will be saved by gating the clock to the register. As always, the easiest way to waste power is to do work that is not required to be done.
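As a toy illustration of the second case (the value will not change), the following Python sketch counts how many clock edges a register would see under three strategies: no gating, gating on the mux enable only, and gating whenever the incoming data equals what is already stored. The trace format and the function name are hypothetical, and this is a simple cycle count, not a model of any synthesis tool.

```python
def count_clock_edges(trace, initial=0):
    """Count register clock edges under three gating strategies.

    trace: list of (enable, data) tuples, one per clock cycle
    initial: value held by the register before the first cycle
    """
    value = initial
    enable_gated = 0   # clocked only when the enable is asserted
    change_gated = 0   # clocked only when the stored value would change
    for enable, data in trace:
        if enable:
            enable_gated += 1
            if data != value:
                change_gated += 1
            value = data
    return {
        "ungated": len(trace),   # clocked on every cycle
        "enable_gated": enable_gated,
        "change_gated": change_gated,
    }


# Hypothetical trace: the enable is often asserted with unchanged data.
trace = [(True, 5), (False, 7), (True, 5), (True, 5),
         (False, 0), (True, 9), (True, 9), (False, 1)]
counts = count_clock_edges(trace)
print(counts)
```

The gap between the enable-gated and change-gated counts is the opportunity that the recirculating-mux replacement alone leaves on the table. In real hardware, comparing the data costs power too, so the stronger condition usually has to come from knowledge of the design rather than from a comparator on every register.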
Reason #3: Inefficient design architecture
It is widely known that tradeoffs made at the higher levels of abstraction can have the largest impact on performance, power and area. Choosing the number of pipeline stages in a datapath, for example, can have a major impact on power. Having one part of the chip that forces the clock frequency higher than required for the rest of the chip can waste a lot of power. Almost any aspect of memory organization (size, number, type) has a big impact on power.
Reason #2: Inefficient design implementation
This is a combination of user problems and tool problems. There are many suboptimal ways to implement things, such as having high-frequency nets longer than necessary (and thus with excess capacitance). Excessively tight timing constraints during synthesis can result in higher-power cells than necessary being selected. There is almost always a tradeoff between performance and power, so demanding unnecessarily high performance or specifying unnecessarily tight constraints wastes power.
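The point about long high-frequency nets follows from the usual first-order estimate of switching power, activity × C × Vdd² × f. Here is a small sketch of that arithmetic in Python. The capacitance, voltage and frequency values are invented for illustration, and the function name is my own.

```python
def switching_power_uw(activity, cap_ff, vdd, freq_mhz):
    """First-order switching power, activity * C * Vdd^2 * f, in microwatts.

    activity: average number of 0-to-1 transitions per clock cycle
              (1.0 for a net that toggles like a clock)
    cap_ff:   switched capacitance in femtofarads
    vdd:      supply voltage in volts
    freq_mhz: clock frequency in megahertz
    """
    cap_f = cap_ff * 1e-15
    freq_hz = freq_mhz * 1e6
    watts = activity * cap_f * vdd ** 2 * freq_hz
    return watts * 1e6


# Hypothetical clock-like net at 1 GHz and 1.0 V, routed two ways.
short_route = switching_power_uw(1.0, 200.0, 1.0, 1000.0)   # assumed 200 fF
long_route = switching_power_uw(1.0, 400.0, 1.0, 1000.0)    # assumed 400 fF
print(f"short: {short_route:.1f} uW, long: {long_route:.1f} uW")
```

Power scales linearly with capacitance, so doubling the wire load on a net that toggles every cycle doubles what that net burns. The same formula shows why over-sized cells hurt: larger cells add input capacitance to every net that drives them.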
And, drum roll please, the top reason for wasting power: Missed global clock gating opportunities
Local register-level clock gating has been automated in synthesis tools (replacing recirculating muxes with a clock gate). But there are more opportunities than this, although they require that you understand the design intent and thus know when clocks must run and when they can be stopped. For example, redundant memory reads and writes (reading the same address or writing the same data to the same address) are huge wastes of power.
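To make the memory example concrete, here is a small Python sketch that scans an access trace and counts accesses that could have been suppressed. The trace format is hypothetical, and the definition of "redundant" is my own simplification: a read is redundant if it repeats the previous read address with no intervening write that changed that location, and a write is redundant if it stores the data already held at that address.

```python
def count_redundant_accesses(trace):
    """Count memory accesses that do no useful work.

    trace: list of ("read", addr) or ("write", addr, data) tuples
    """
    contents = {}      # last data known to be stored at each address
    last_read = None   # address whose data is still held at the output
    redundant_reads = 0
    redundant_writes = 0
    for op in trace:
        if op[0] == "read":
            addr = op[1]
            if addr == last_read:
                redundant_reads += 1
            last_read = addr
        elif op[0] == "write":
            _, addr, data = op
            if addr in contents and contents[addr] == data:
                redundant_writes += 1   # nothing changes, so skip the update
                continue
            contents[addr] = data
            if addr == last_read:
                last_read = None        # previously read data is now stale
        else:
            raise ValueError(f"unknown operation: {op[0]!r}")
    return {"redundant_reads": redundant_reads,
            "redundant_writes": redundant_writes}


# Hypothetical trace with one repeated read and one repeated write.
trace = [("write", 0x10, 7), ("read", 0x10), ("read", 0x10),
         ("write", 0x10, 7), ("write", 0x10, 8), ("read", 0x10),
         ("read", 0x20)]
result = count_redundant_accesses(trace)
print(result)
```

Each redundant access found this way is a cycle in which the memory's clock or enable could have been held off. Spotting these from a simulation trace is easy; knowing that it is safe to gate them in hardware is the part that requires understanding the design intent.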
See Will Ruby's more extensive discussion of these issues here.