Greetings Readers!!!
Now that we are familiar with the basic concepts of clock tree synthesis (refer to the CTS - Part 1 and CTS - Part 2 blogs), it is time to discuss the various types of clock tree structures and to understand which structure is most suitable for building a clock tree. In this blog, readers can learn about the types of clock tree structures and how to implement a clock tree in a design. Finally, there is a section describing which reports to check on completion of the CTS stage.
Clock Structure Considerations
Before discussing the types of clock structures, let us look at the factors that decide a good clock structure. Below are a few parameters that help in choosing an appropriate clock structure.
i. Skew
Ideally, a clock structure should be built to achieve zero skew.
In practice, zero skew is not achievable, so choose a clock structure that attains the minimum possible skew.
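As a minimal illustration, assume the clock arrival time (latency) at each sink pin is already known. Global skew is then the difference between the largest and smallest latency. Local skew only compares sink pairs that share a timing path. The sink names, latencies and launch/capture pairs below are hypothetical, and local skew is taken here as an absolute difference for simplicity.

```python
def global_skew(latency):
    """Global skew: largest minus smallest clock latency over all sinks."""
    values = latency.values()
    return max(values) - min(values)


def local_skew(latency, paths):
    """Local skew: worst |latency difference| over timing-related sink pairs."""
    return max(abs(latency[a] - latency[b]) for a, b in paths)


# Hypothetical sink latencies in picoseconds.
latency = {"ff1/CK": 410, "ff2/CK": 425, "ff3/CK": 398, "ff4/CK": 440}
# Hypothetical launch/capture pairs that share a data path.
paths = [("ff1/CK", "ff2/CK"), ("ff2/CK", "ff3/CK")]

print(global_skew(latency), local_skew(latency, paths))
```

Here the global skew is 440 - 398 = 42 ps. The local skew is only 27 ps because ff4 is not part of any listed path.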
ii. Delay Balancing
An uneven clock structure can impose different delays on sink pins belonging to the same skew group, which adds uncertainty to the skew calculation.
Clock path delay, routing resource requirements, and power consumption all increase with larger insertion delay.
It is therefore important to build a clock structure with balanced fanout loading and insertion delay in order to meet the clock latency target.
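To see why unbalanced fanout loading hurts, the sketch below models the insertion delay of a sink as the sum of per-stage buffer delays from the clock root to that sink. The linear delay model (a fixed intrinsic delay plus a term per driven load) and all numbers are illustrative assumptions, not a real library characterisation.

```python
def stage_delay(fanout, intrinsic=20.0, per_load=5.0):
    """Illustrative buffer delay (ps): intrinsic delay plus a term per driven load."""
    return intrinsic + per_load * fanout


def insertion_delay(path_fanouts):
    """Insertion delay (ps) of one sink: sum of stage delays on its root-to-sink path."""
    return sum(stage_delay(f) for f in path_fanouts)


# Each list holds the fanout of every buffer stage on the path to one sink.
balanced = {"A": [2, 4, 8], "B": [2, 4, 8]}
unbalanced = {"A": [2, 2, 16], "B": [2, 4, 2]}

for name, tree in (("balanced", balanced), ("unbalanced", unbalanced)):
    delays = {sink: insertion_delay(p) for sink, p in tree.items()}
    skew = max(delays.values()) - min(delays.values())
    print(name, delays, "skew =", skew)
```

In the balanced case both sinks see 130 ps and the skew is zero. In the unbalanced case one heavily loaded stage pushes sink A to 160 ps while sink B sees 100 ps.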
iii. On-Chip Variation (OCV)
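OCV plays a very important role in skew calculation, as it introduces uncertainty into clock delays.
Clock path OCV can be controlled by implementing a clock structure that has the minimum number of clock cells in the uncommon clock path.
The uncommon clock path is the portion of the clock path after the point where the launch clock path and the capture clock path diverge.
In short, a longer common path reduces the effect of OCV. Any variation on the common portion shifts the launch and capture clocks equally, and timing tools remove the derating pessimism applied to it (common path pessimism removal, CPPR). As a result, only the uncommon portion adds OCV-induced skew.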
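A small worked example shows why a longer common path helps. It assumes a setup check with a late derate of 1.05 on the launch clock path and an early derate of 0.95 on the capture clock path. It also assumes CPPR credits back the derate difference on the shared segment. The derate values and the delay splits are hypothetical.

```python
def ocv_pessimism(common, launch_uncommon, capture_uncommon, late=1.05, early=0.95):
    """Clock pessimism (ps) left in a setup check after CPPR.

    The launch path is derated late and the capture path early. The shared
    segment cannot be both slow and fast at once, so CPPR credits that
    difference back. What remains comes only from the uncommon segments.
    """
    launch = late * (common + launch_uncommon)        # derated launch latency
    capture = early * (common + capture_uncommon)     # derated capture latency
    cppr_credit = (late - early) * common             # pessimism on the shared segment
    nominal_gap = launch_uncommon - capture_uncommon  # skew without any derate
    return (launch - capture) - cppr_credit - nominal_gap


# Both trees have 500 ps total latency on each path, split differently.
short_common = ocv_pessimism(common=100, launch_uncommon=400, capture_uncommon=400)
long_common = ocv_pessimism(common=400, launch_uncommon=100, capture_uncommon=100)
print(round(short_common, 2), round(long_common, 2))
```

With the same total latency, moving 300 ps from the uncommon to the common path cuts the remaining pessimism from 40 ps to 10 ps. Algebraically, the result reduces to (late - 1) x launch_uncommon + (1 - early) x capture_uncommon.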
iv. Routability and Power Consumption of Clock Structure
The clock structure should allow the tool to route the clock nets without any shorts or opens.
The clock tree consumes around 5% of the total routing resources and around 30%-40% of the total power.
Thus, the routability and power consumption of the clock structure cannot be ignored when deciding on the clock implementation.
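The large power share follows from the switching power relation P = a x C x Vdd^2 x f. A clock net charges and discharges once every cycle, so its activity factor a is 1, while data nets usually toggle far less often. Every extra wire or buffer in the clock structure therefore adds capacitance that switches on every cycle. The capacitance, voltage and frequency below are hypothetical.

```python
def clock_dynamic_power(c_total_farad, vdd_volt, freq_hz, activity=1.0):
    """Switching power in watts: P = activity * C * Vdd^2 * f (activity = 1 for a clock)."""
    return activity * c_total_farad * vdd_volt ** 2 * freq_hz


# Hypothetical: 200 pF of total clock-net capacitance at 0.8 V and 1 GHz.
p = clock_dynamic_power(200e-12, 0.8, 1e9)
print(f"{p * 1e3:.1f} mW")
```

Doubling the clock-net capacitance, for example by adding a dense mesh, doubles this figure at the same voltage and frequency.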
Clock Tree Structures
Various types of clock tree structures have been proposed in research to date, and each has its own pros and cons. In this section, we discuss the clock tree structures most widely used in industry.
i. Clock Mesh:
The clock mesh structure can be divided into three sections, namely the global tree (mesh drivers), the clock mesh network, and the local tree. An example of a clock mesh is shown in figure 1. In a clock mesh structure, the clock path is common through the global tree up to the mesh, which makes this structure more robust against on-chip variations.
The main advantages of the clock mesh structure are listed below.
Most accurate skew and delay balancing
All sink pins have almost the same route lengths
More robust to OCV, as the common clock path (up to the input of the clock mesh) is longer
The major disadvantages of the clock mesh structure are:
It requires relatively more routing resources, which increases congestion in the design.
Power consumption is higher, as many mesh buffers are required to drive the mesh.
ii. H-Tree Clock Structure:
This is the most popular clock structure used in industry. The connections between clock cells form the shape of the letter H, hence the name H-tree clock structure. An ideal H-tree structure is capable of achieving a zero-skew clock tree. A typical H-tree structure is shown in figure 2.
As can be seen from figure 2, a large central buffer drives the rest of the tree. The central buffer is connected to four other buffers, and the tree continues this way down to the sink pins. In an H-tree structure, each sub-tree branch is shorter than its parent branch by a factor of about √2, so branch length halves every two levels. Also, the metal wire width is larger near the central buffer and reduces as we move towards the sink pins.
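The recursive shape of an ideal H-tree can be sketched in a few lines. The sketch assumes the common textbook construction in which each level's segment is 1/√2 the length of the previous one and the orientation alternates. It computes every leaf tap and its wire length from the root, which shows why a pure H-tree with equal loads gives zero skew. The starting trunk length and depth are hypothetical.

```python
import math


def h_tree(x, y, length, depth, horizontal=True, wire=0.0):
    """Return (x, y, wire_length_from_root) for every leaf tap of an ideal H-tree.

    Each level draws one segment centred on (x, y) and recurses from both ends
    with the orientation flipped and the length divided by sqrt(2). Two levels
    together form one "H" with four taps.
    """
    if depth == 0:
        return [(x, y, wire)]
    half = length / 2
    if horizontal:
        ends = [(x - half, y), (x + half, y)]
    else:
        ends = [(x, y - half), (x, y + half)]
    leaves = []
    for ex, ey in ends:
        leaves += h_tree(ex, ey, length / math.sqrt(2), depth - 1,
                         not horizontal, wire + half)
    return leaves


# Hypothetical 4-level tree (two nested H shapes) from a 1000 um trunk.
taps = h_tree(0.0, 0.0, 1000.0, depth=4)
wires = [w for _, _, w in taps]
print(len(taps), "taps; root-to-tap wire min/max:", min(wires), max(wires))
```

All 16 taps see exactly the same root-to-tap wire length. In a real design, uneven register placement and blockages break this symmetry.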
It is impractical to achieve an ideal or pure H-tree structure because, in reality, registers are not evenly placed. Also, routing blockages and route guides used in the design make it impossible to create a pure H-tree. A pure H-tree means that all branches of the tree are of equal length and each sub-branch has the same number of buffers. The major advantages of the H-tree clock structure are listed below.
A well-balanced tree structure
A pure H-tree with an evenly balanced load can achieve zero skew
Requires relatively fewer routing resources than the clock mesh structure
Lower power consumption than the clock mesh structure
A few disadvantages of the H-tree structure are:
Less resistant to on-chip variations
If the tree grows larger, insertion delay increases and skew is likely to degrade
The large central driver requires more area and power
The drawbacks of the H-tree can be overcome to a great extent by integrating a fish-bone structure with the H-tree structure. In a fish-bone structure, a single buffer drives multiple registers, instead of a single buffer driving a single register as in the case of the H-tree. A typical fish-bone structure is shown in figure 3.
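The "one buffer drives many registers" idea can be sketched as a simple grouping step. This is a hypothetical simplification, not a tool's actual algorithm. It assumes one rib per placement row: registers on each row are sorted along the rib and split into groups of at most `max_fanout`, and each group is driven by one rib buffer. The register names, coordinates and fanout limit are made up.

```python
def fishbone_ribs(registers, max_fanout):
    """Group registers into rib buffers of a simple fish-bone sketch.

    registers: list of (name, x, y). Registers on the same row (same y) share a
    rib. Each rib is sorted along x and split into buffers that drive at most
    max_fanout registers. Returns a list of (row_y, [register names]).
    """
    rows = {}
    for name, x, y in registers:
        rows.setdefault(y, []).append((x, name))
    buffers = []
    for y in sorted(rows):
        row = [name for _, name in sorted(rows[y])]
        for i in range(0, len(row), max_fanout):
            buffers.append((y, row[i:i + max_fanout]))
    return buffers


regs = [("ff0", 5, 0), ("ff1", 1, 0), ("ff2", 3, 0), ("ff3", 9, 0),
        ("ff4", 2, 10), ("ff5", 4, 10), ("ff6", 8, 10)]
for row_y, group in fishbone_ribs(regs, max_fanout=3):
    print(f"rib y={row_y}: buffer drives {group}")
```

Seven registers end up on three rib buffers instead of seven individual drivers. The fanout limit controls the trade-off between buffer count and the load, and hence delay, seen by each buffer.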
The structures shown in figure 2 and figure 3 can be combined to achieve the desired CTS targets. Researchers have proposed other clock trees as well, such as the X-tree structure, in which clock nets are routed diagonally and cross each other, making it impractical to implement. The clock structures most widely used in industry are the H-tree, the fish-bone structure, and the clock mesh structure.
Clock Tree Implementation
Clock trees are built in 3 steps, namely clock tree synthesis, clock tree optimisation, and clock routing. Let us discuss these 3 stages one by one.
i. Clock Tree Synthesis
In this stage, the tool roughly places the clock cells in the core region and estimates the route lengths of the clock network. During this process, the tool estimates the RC delay of the nets and the propagation delay of the clock cells, and calculates the clock skew and insertion delay. The aim of this stage is only to estimate the clock network delay and to create a blueprint of how the clock tree will be implemented. The tool does not try to meet any constraints at this stage, hence this stage has a lower run time.
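A common first-order way to estimate the RC delay of a net is the Elmore delay. The sketch below assumes a driver resistance, a wire split into an RC ladder, and a load capacitance at the far end. Each node's capacitance is multiplied by the total resistance between the driver and that node, and the products are summed. With N segments the wire-only term is R x C x (N + 1) / (2N), which approaches the distributed-wire value R x C / 2 as N grows. Commercial tools use more accurate models, and all parameter values here are hypothetical.

```python
def elmore_wire_delay(r_drv, r_per_um, c_per_um, length_um, c_load, segments=10):
    """Elmore delay (s) of a driver + RC-ladder wire + load (ohms, farads, um)."""
    r_seg = r_per_um * length_um / segments
    c_seg = c_per_um * length_um / segments
    delay = 0.0
    upstream_r = r_drv
    for i in range(segments):
        upstream_r += r_seg          # resistance from driver to this node
        node_c = c_seg
        if i == segments - 1:
            node_c += c_load         # sink pin load at the far end
        delay += upstream_r * node_c
    return delay


# Hypothetical: 300 ohm driver, 200 um wire at 0.5 ohm/um and 0.2 fF/um, 10 fF load.
d = elmore_wire_delay(300.0, 0.5, 0.2e-15, 200.0, 10e-15)
print(f"{d * 1e12:.2f} ps")
```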
ii. Clock Tree Optimisation
In this stage, the tool performs global routing on the clock network and then estimates the RC delay of the clock nets. Based on these delays, the tool calculates skew and insertion delay. If the skew and latency targets are not met, the tool tries to optimise the design by moving registers, sizing clock buffers/inverters, or adding or removing clock buffers/inverters.
Along with skew and latency, the tool also tries to improve setup and hold timing, clock transition, and capacitance. This is the major stage in CTS, as the entire CTS QoR is optimised here. Thus, the designer should thoroughly check the CTS optimisation settings before starting it.
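One of the optimisation moves mentioned above, adding clock buffers, can be illustrated with a deliberately simple greedy loop. This is a hypothetical sketch, not how any commercial CTS engine works. The loop repeatedly pads the earliest-arriving branch with one delay buffer of fixed delay `buf_delay`. It stops when the skew target is met, when padding stops helping, or when the buffer budget runs out. Latencies, target and buffer delay are made up.

```python
def balance_by_padding(latency, target_skew, buf_delay, max_buffers=100):
    """Greedy sketch: pad the earliest branch with delay buffers until skew meets target.

    Returns (new_latencies, buffers_added_per_sink).
    """
    lat = dict(latency)
    added = {sink: 0 for sink in lat}
    for _ in range(max_buffers):
        skew = max(lat.values()) - min(lat.values())
        if skew <= target_skew:
            break
        early = min(lat, key=lat.get)          # earliest-arriving branch
        trial = dict(lat)
        trial[early] += buf_delay
        if max(trial.values()) - min(trial.values()) >= skew:
            break  # padding no longer helps; a tool would resize or restructure instead
        lat = trial
        added[early] += 1
    return lat, added


new_lat, added = balance_by_padding({"a": 100, "b": 130, "c": 145},
                                    target_skew=10, buf_delay=20)
print(new_lat, added)
```

Skew drops from 45 ps to 10 ps, but the worst latency rises from 145 ps to 150 ps. This is the skew versus insertion delay trade-off, and it is why tools also size, move or remove cells rather than only adding them.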
iii. Clock Routing
After optimising the CTS QoR, the tool finally performs detail routing. At this stage, the entire clock network gets connected and the tool again performs an incremental optimisation. Any parameter that degrades after detail routing can be fixed during this incremental optimisation. Incremental optimisation is the final stage in CTS, during which all PPA parameters are optimised to achieve the desired results.
Analysing the CTS QoR
The parameters that should be checked after CTS completion are listed below.
Skew (local and global) and latency values for all the major clocks
Design utilisation and clock buffers/inverters inserted during CTS
Design congestion
Clock tree power consumption
Setup and hold timing
Maximum clock transition and capacitance values
Shorts between clock nets and other nets, and opens in clock nets
The clock skew and latency targets, max transition, max capacitance, and uncertainty set for the CTS stage can be found either in the CTS file or in the clock specification file. The tool treats the values specified in this file as the target values during optimisation and report generation.
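Once the post-CTS numbers are collected, comparing them against the targets from the specification file is a simple check. The sketch below assumes the values have already been extracted from the reports into Python dictionaries. The metric names, targets and reported numbers are all hypothetical, and the sketch makes no claim about any tool's report format.

```python
def qor_violations(report, targets):
    """List (metric, reported, limit) for every metric whose reported value exceeds its limit."""
    return [(metric, report[metric], limit)
            for metric, limit in targets.items()
            if report[metric] > limit]


# Hypothetical targets, as they might be transcribed from a clock specification file.
targets = {"skew_ps": 50, "latency_ps": 600, "max_tran_ps": 120,
           "max_cap_ff": 150, "clock_shorts": 0, "clock_opens": 0}
# Hypothetical post-CTS numbers collected from the tool's reports.
report = {"skew_ps": 42, "latency_ps": 640, "max_tran_ps": 118,
          "max_cap_ff": 155, "clock_shorts": 0, "clock_opens": 0}

for metric, value, limit in qor_violations(report, targets):
    print(f"{metric}: {value} exceeds {limit}")
```

Using `report[metric]` rather than a default value means a metric missing from the report raises an error instead of silently passing.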
Quick Questions
After reading this blog, one can answer the questions below.
What are the deciding parameters for a good CTS?
How is a clock tree implemented?
What is the difference between a clock mesh and a clock tree?
Which checks are performed to validate the clock tree implementation?
Hope you find this post interesting and informative. You can leave your suggestions or queries in the comment section below.
Thank You !!! :)