*Q1:

How can we obtain the area information of components and buffers?


*A1:
We assume each component is an ideal point on the coordinate plane, and its
area is zero. The contestant does not need to take the area of inserted clock
buffers into account during clock tree optimization. However, the coordinates
of an inserted clock buffer must not fall outside the die area.


==========================================================================
*Q2:

1. Are the PINS in the design file (design.def) IO pads or component pins?
   What information does COMPONENTS contain (flip-flops, buffers, gates, ...)?

2. Can s_clk and e_clk in the static timing report file (timing.inf) be
   derived from the information in design.def and clkbuf.lib? Are s_clk and
   e_clk the clock latencies?

3. "The contestant don't need to take the area of inserted clock
   buffers into account during clock tree optimization." Going by this
   statement, may I insert clock buffers at the same location?
                                                                        
*A2:
1.
  a. The pin names specified in the PINS section of design.def are primary
     IO pins. They are not component pins.
  b. The COMPONENTS section contains the locations of all components used,
     including clock buffers. It doesn't show the cell types such as
     flip-flop, buffer, NAND gate, etc.
  c. The contestant can identify the existing clock buffers from (1) the cell
     name (shown in clkbuf.lib), and (2) the net type (shown in the NET section).
  d. The contestant can calculate the net load of a clock tree based on
     component locations, and then use that net load to calculate the clock
     buffer's timing. He/She doesn't need to calculate the timings of
     other cells (flip-flop, gate, ...) because they are already given in
     the STA report (timing.inf).

2.
  Yes, s_clk and e_clk are clock latencies. They can be calculated based
  on design.def and clkbuf.lib.

3.
  Yes, the clock buffers can be placed at the same location.
                                                                        

==========================================================================
*Q3:

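The problem statement does not say whether a buffer's location (coordinates)
may be changed.
Is it legal to change a buffer's location (provided it stays within the die area)?
When calculating delay, should the rise table or the fall table be used?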

*A3:
 a. The contestant can move clock buffers to any desired location within the
    die area.
 b. All flip-flops are triggered by the positive clock edge. Basically,
    the contestant needs to consider only the rise delay table if the
    synthesized clock tree contains only clock buffers and no clock inverters.
    In clkbuf.lib, the timing tables of both clock buffers and clock
    inverters are given.


==========================================================================
*Q4:
Is the clock cycle time known?
If so, should it be derived from the information in timing.inf?


*A4:
The clock cycle time is not given explicitly, but it is a fixed, known value.
The contestant can derive it from timing.inf with the following calculation.

     Clock cycle time = path_delay + setup + slack - (e_clk - s_clk)
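
As a minimal sketch of the formula above, the helper below computes the clock
cycle time from one flip-flop-to-flip-flop row of timing.inf. The function
name and argument names are illustrative assumptions; the sample values are
those of the u1/u10/F1 -> u1/u10/F2 path quoted in A5.

```python
def clock_cycle_time(path_delay, setup, slack, s_clk, e_clk):
    """Clock cycle time = path_delay + setup + slack - (e_clk - s_clk)."""
    return path_delay + setup + slack - (e_clk - s_clk)


# Row u1/u10/F1 -> u1/u10/F2 from A5 (a path between two flip-flops).
cycle = clock_cycle_time(path_delay=10.4, setup=0.1, slack=-0.5,
                         s_clk=2.0, e_clk=2.0)
print(round(cycle, 6))
```

Only a path whose start point and end point are both flip-flops should be
used here, because input and output delays do not appear in the formula.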


==========================================================================
*Q5:
In the introduction you showed the calculation of the slack, and it matches
the third, fourth, and fifth lines of the file "timing.inf".
Can you tell us how to calculate the slack of the second (data_in[0]) and
the final (u1/u10/F2) lines in "timing.inf"?


*A5:
**************************************************************************************************
   #start_point   end_point    path_delay  setup  cap   s_clk  e_clk  slack
   data_in[0]     u0/rg_1      4.1         0.3    0.02  0.0    1.80   2.6
   u1/u10/F1      u1/u10/F2    10.4        0.1    0.03  2.0    2.0    -0.5
   u1/u10/F2      add_out[5]   2.7         0.0    0.0   2.0    0.0    0.6

To calculate the slacks of the first and the last paths shown above, the
contestant first has to calculate the clock cycle time, the input delay, and
the output delay. The clock cycle time can be derived from a path between two
flip-flops as follows.

clock cycle time = path_delay + setup + slack - (e_clk - s_clk)
                 = 10.4 + 0.1 + (-0.5) - (2.0 - 2.0) = 10

The input delay of data_in[0] can be calculated as follows.
Ta_u0/rg_1.D (arrival time at u0/rg_1) = input_delay + s_clk + path_delay
                                       = input_delay + 0.0 + 4.1
                                       = input_delay + 4.1
Tr_u0/rg_1.D (required time at u0/rg_1) = clock_cycle_time + e_clk - setup
                                        = 10 + 1.8 - 0.3 = 11.5
slack = Tr_u0/rg_1.D - Ta_u0/rg_1.D = 11.5 - (input_delay + 4.1) = 2.6
=> input_delay = 4.8

The output delay of add_out[5] can be calculated as follows.
Ta_FF.D (arrival time at an external FF) = s_clk + path_delay + output_delay
                                         = 2.0 + 2.7 + output_delay
                                         = 4.7 + output_delay
Tr_FF.D (required time at an external FF) = clock_cycle_time + e_clk - setup
                                          = 10 + 0 - 0 = 10
slack = Tr_FF.D - Ta_FF.D = 10 - (4.7 + output_delay) = 0.6
=> output_delay = 4.7

Based on the calculated input delays and output delays, the contestant can
calculate the slacks of the desired input-to-FF and FF-to-output paths with
the above equations.
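
The two derivations above rearrange the same slack equation, so they can be
sketched with one helper. This is only an illustration: the name `io_delay`
and its argument names are assumptions, and the clock cycle time of 10 is the
value derived above.

```python
def io_delay(clock_cycle, path_delay, setup, s_clk, e_clk, slack):
    """Solve slack = required - arrival for the external (input or output) delay.

    required = clock_cycle + e_clk - setup
    arrival  = external_delay + s_clk + path_delay
    """
    required = clock_cycle + e_clk - setup
    return required - (s_clk + path_delay) - slack


# data_in[0] -> u0/rg_1 (input-to-FF path)
input_delay = io_delay(clock_cycle=10.0, path_delay=4.1, setup=0.3,
                       s_clk=0.0, e_clk=1.8, slack=2.6)

# u1/u10/F2 -> add_out[5] (FF-to-output path)
output_delay = io_delay(clock_cycle=10.0, path_delay=2.7, setup=0.0,
                        s_clk=2.0, e_clk=0.0, slack=0.6)

print(round(input_delay, 6), round(output_delay, 6))
```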


==========================================================================
*Q6:

Why does the latency I calculate from design.def and clkbuf.lib differ
from the one given in timing.inf? Could you tell me where I went wrong?
(Below is my calculation of the I_124 latency for case 1.)


[Case 1]
Latency of I_124 given in timing.inf
I_124 = 5.635400


Calculated from design.def and clkbuf.lib
CLK_L1_I1:
    Net Load: 172.14 * Φ(0.00015)         transition time: 0
    Pin Capacitance: 0.002988
    Output net capacitance: (Net Load) + (Pin Capacitance) = 0.028809

    From the model given in clkbuf.lib (interpolation and extrapolation) we obtain
    cell_rise: 0.117863657
    rise_transition: 0.078744273

CLK_L2_I1:
    Net Load: 9148.2 * Φ(0.00015)         transition time: 0.078744273
    Pin Capacitance: 0.273776
    Output net capacitance: (Net Load) + (Pin Capacitance) = 1.646006

    From the model given in clkbuf.lib (interpolation and extrapolation) we obtain
    cell_rise: 5.68014608

Calculated latency of I_124
I_124 = (CLK_L1_I1 delay) + (CLK_L2_I1 delay)
      = 0.117863657 + 5.68014608
      = 5.798009737


*A6:
After comparing the result reported by PrimeTime with the manual calculation,
we confirm that the cell_rise and rise_transition of CLK_L1_I1 reported by the
contestant are correct, and that the transition time and output loading of
CLK_L2_I1 are correct as well.
However, the cell_rise of CLK_L2_I1 is incorrect; it should be 5.51749335.
Thus, the clock latency is 0.117863657 + 5.51749335 = 5.635357.

Please note that the cell name of CLK_L2_I1 is CLKBUFX2.


==========================================================================
*Q7:

1. Are the setup time and the clock pin capacitance always the same for a
   given flip-flop? In timing.inf I see two paths whose end_point is I_589,
   yet their setup times differ. Why?
   Line  #start_point  end_point  path_delay  setup     cap       s_clk     e_clk     slack
   20    I_105         I_589      0.564000    0.457000  0.002687  5.635400  5.635400  8.979000
   102   I_173         I_589      3.555000    0.375000  0.002687  5.635400  5.635400  6.070000

2. How are index_1 and index_2 in clkbuf.lib used to look up the delay?
       Example:
                ( index_1 , index_2 ) = ( 0.05 , 0.231 )  ->  1.  values = 1.505177
                                                          or  2.  values = 0.255711
                 Which one is right?
 
 
        index_1 ("0.05, 0.15, 0.6, 1.4, 2.3, 3.3, 4.5");
        index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
        values ( \
          "0.096541, 0.227131, 0.333834, 0.610805, 0.994130, 1.505177, 1.994913", \
          "0.118745, 0.249279, 0.356042, 0.633083, 1.016444, 1.527508, 2.017252", \
          "0.170806, 0.307333, 0.414499, 0.691462, 1.074822, 1.585893, 2.075644", \
          "0.212524, 0.358003, 0.464958, 0.742533, 1.126148, 1.637133, 2.126846", \
          "0.238505, 0.394594, 0.502665, 0.780531, 1.164368, 1.675734, 2.165385", \
          "0.255711, 0.422713, 0.532885, 0.812993, 1.197127, 1.708624, 2.198647", \
          "0.267472, 0.446227, 0.559155, 0.843292, 1.229352, 1.741061, 2.231102");
      }


*A7:

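1. The clock pin capacitance of a given flip-flop is always the same, but the
   data setup time differs from one data pin to another. (The flip-flop in
   this example actually has two data pins, so it has two setup times. Because
   the contestant does not know the data pins of a flip-flop, in practice the
   setup time can be treated as varying with the timing path.)

2. index_1 selects the row and index_2 selects the column. values = 1.505177
   is the correct one.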
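
The lookup convention in A7.2 can be sketched as follows, using the table
quoted in Q7. The grid lookup (index_1 = row, index_2 = column) comes from the
answer above. Handling points between grid entries by bilinear interpolation,
and points outside the grid by extending the nearest segment, is an assumption
about how the interpolation and extrapolation mentioned in Q6 are carried out;
the function names are hypothetical.

```python
from bisect import bisect_right

INDEX_1 = [0.05, 0.15, 0.6, 1.4, 2.3, 3.3, 4.5]                    # rows
INDEX_2 = [0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115]    # columns
VALUES = [
    [0.096541, 0.227131, 0.333834, 0.610805, 0.994130, 1.505177, 1.994913],
    [0.118745, 0.249279, 0.356042, 0.633083, 1.016444, 1.527508, 2.017252],
    [0.170806, 0.307333, 0.414499, 0.691462, 1.074822, 1.585893, 2.075644],
    [0.212524, 0.358003, 0.464958, 0.742533, 1.126148, 1.637133, 2.126846],
    [0.238505, 0.394594, 0.502665, 0.780531, 1.164368, 1.675734, 2.165385],
    [0.255711, 0.422713, 0.532885, 0.812993, 1.197127, 1.708624, 2.198647],
    [0.267472, 0.446227, 0.559155, 0.843292, 1.229352, 1.741061, 2.231102],
]


def _segment(axis, x):
    """Index i of the segment [axis[i], axis[i+1]] used for x (clamped to the
    first or last segment so that out-of-range x is extrapolated linearly)."""
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def table_lookup(x1, x2):
    """Bilinear lookup: x1 is matched against index_1, x2 against index_2."""
    i = _segment(INDEX_1, x1)
    j = _segment(INDEX_2, x2)
    t = (x1 - INDEX_1[i]) / (INDEX_1[i + 1] - INDEX_1[i])
    u = (x2 - INDEX_2[j]) / (INDEX_2[j + 1] - INDEX_2[j])
    row_lo = VALUES[i][j] * (1 - u) + VALUES[i][j + 1] * u
    row_hi = VALUES[i + 1][j] * (1 - u) + VALUES[i + 1][j + 1] * u
    return row_lo * (1 - t) + row_hi * t


# The example from Q7: row 0.05, column 0.231.
print(table_lookup(0.05, 0.231))
```

Swapping the two arguments, that is, reading index_1 as the column, would give
0.255711 instead, which is the wrong alternative from Q7.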

==========================================================================
*Q8:

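In A5, your answer was that for a path from a PI to an FF,
slack = Tr_u0/rg_1.D - Ta_u0/rg_1.D
= (clock_cycle_time + e_clk - setup) - (input_delay + s_clk + path_delay)
However, the slack given in "timing.inf" of the test cases does not seem to be
consistent with this.
For example, timing.inf of case1 contains the path
"N_6 I_173 3.106000 0.412000 0.002687 0.000000 5.635400 12.117400"
(this path goes from a PI to an FF).
Calculated this way, slack = (10 + 5.6354 - 0.412) - (1 + 0 + 3.106) = 11.1174,
which differs from the given slack of 12.1174.
(The input_delay of 1 used here is taken from timing.con.)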

*A8:

We are sorry that the test cases we released did not take the input delay into
account when calculating timing slack. Please find the updated test cases
attached. Again, we apologize for the inconvenience.

==========================================================================
*Q9:

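In a cell library, a buffer with stronger driving ability should have a larger
capacitance, yet the capacitance of CLKBUFX1 is larger than that of CLKBUFX2.
Isn't that strange?
Or is the driving ability in this contest unrelated to the X1, X2 suffix of
the cell name, so that we have to judge it ourselves?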

cell name            capacitance
CLKBUFX1        0.003392
CLKBUFX2        0.002988
CLKBUFX3        0.003306
CLKBUFX4        0.004064
CLKBUFX8        0.007664
CLKBUFX12      0.018513
CLKBUFX16      0.002988
CLKBUFX20      0.030486


*A9:

The driving strength of a cell is independent of its input capacitance in the
timing model.
Basically, the driving strengths of the released clock buffers have the
following relationship.

CLKBUFX1 < CLKBUFX2 < CLKBUFX3 < CLKBUFX4 < CLKBUFX8 < CLKBUFX12 <
CLKBUFX16 < CLKBUFX20

However, the actual relationship should be determined from the timing model.


==========================================================================
*Q10:
Will there be only one Clock_cycle in the timing.con file?

*A10:
Yes. There is only one Clock_cycle in a timing.con file.

==========================================================================
*Q11:
When a clock signal is connected directly from the clock pad to the clock pin
of a flip-flop (with no buffer on the path), is its latency zero?


*A11:
Yes. We assume the resistance of a net is zero (R=0), and thus there is
no interconnect delay.
Only cell delay is taken into account. If there is no buffer on the path,
the latency is zero.
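
As a small sketch of this rule, the latency of a leaf pin is just the sum of
the cell delays of the buffers between the clock pad and that pin. The
function name is an assumption; the two delays in the second call are the
corrected CLK_L1_I1 and CLK_L2_I1 values from A6.

```python
def clock_latency(buffer_delays):
    """Latency = sum of buffer cell delays on the path (nets add nothing, R=0)."""
    return sum(buffer_delays)


# Clock pad wired directly to the flip-flop: no buffers, so zero latency.
direct = clock_latency([])

# I_124 in case 1: CLK_L1_I1 followed by CLK_L2_I1 (values from A6).
i_124 = clock_latency([0.117863657, 5.51749335])

print(direct, round(i_124, 6))
```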

==========================================================================
*Q12:
In timing.inf:
#start_point  end_point  path_delay  setup     cap       s_clk     e_clk     slack
  I_117       N_149      3.400000    0.000000  0.002687  5.635400  0.000000  -0.035400

cap: the clock pin capacitance of the end_point if the end_point is a
flip-flop.
When the end_point is an output pad, what does the cap value represent?


*A12:
If the end point is an output port, the cap should be zero.
We have updated the static timing report files (timing.inf) of
all test cases; please see the attachment.

==========================================================================
*Q13:
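In the contest, the final evaluation criteria are
the sum of all negative slacks,
and the smaller the latency, the better.
Now suppose the following situation occurs:
clock constraint is 10
Before optimization:
    Total slack = 100, with one hundred negative slacks
    Each negative slack is -1, so clock period = 10 + 1 = 11
After optimization:
    Total slack = 50, with 5 negative slacks (hypothetical)
    Each negative slack is -10, so clock period = 10 + 10 = 20

Would this count as a good result?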


*A13:
We have updated the evaluation criteria in the attached file.
The updated criteria require that the maximum negative slack
after optimization be no larger than that before optimization.
Basically, the first two evaluation criteria are equally important.


==========================================================================
*Q14:
In timing.inf of test_case3 there is the path
I_99 I_46 5.096000 0.199000 0.002912 4.578500 4.733800 4.860200

slack = Tr_u0/rg_1.D - Ta_u0/rg_1.D
         = (clock_cycle_time + e_clk - setup) - (s_clk + path_delay)
         = (10 + 4.7338 - 0.199 ) - (4.5785 + 5.096)
         = 4.860300
The calculated value does not equal the 4.860200 given in the input.

Several paths have this problem.
In such cases, should we rely on the calculated value
or on the value given in the input?


*A14:
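This discrepancy is caused by rounding in PrimeTime's calculation.
Please rely on the value you actually calculate. The evaluation will take
this into account and will consider only the first three decimal places.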
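
A sketch of the comparison described above: compute the slack from the other
columns and compare it with the reported value at three decimal places. The
function names are hypothetical, and using rounding (rather than truncation)
for the three-decimal comparison is an assumption, since the answer does not
say which is used.

```python
def setup_slack(clock_cycle, path_delay, setup, s_clk, e_clk, input_delay=0.0):
    """slack = (clock_cycle + e_clk - setup) - (input_delay + s_clk + path_delay)"""
    required = clock_cycle + e_clk - setup
    arrival = input_delay + s_clk + path_delay
    return required - arrival


def agree_to_three_decimals(a, b):
    """True if a and b are equal after rounding to three decimal places."""
    return round(a, 3) == round(b, 3)


# Path I_99 -> I_46 from Q14 (FF-to-FF, so no input delay).
computed = setup_slack(clock_cycle=10.0, path_delay=5.096, setup=0.199,
                       s_clk=4.5785, e_clk=4.7338)
reported = 4.860200

print(round(computed, 6), agree_to_three_decimals(computed, reported))
```

The same helper reproduces the Q8 calculation when `input_delay=1.0` is
passed for the N_6 -> I_173 path.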


==========================================================================
*Q15:
1. In clkbuf.lib, within the same cell (e.g. CLKBUFX1), are the index_1 and
   index_2 values of the following four tables the same?
   cell_rise
   rise_transition
   cell_fall
   fall_transition
        index_1 ("0.05, 0.15, 0.6, 1.4, 2.3, 3.3, 4.5");
        index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");

2. slack = Tr_u0/rg_1.D - Ta_u0/rg_1.D
         = (output_delay + clock_cycle_time + e_clk - setup) - (input_delay + s_clk + path_delay)
   If the start_point is an input pad and the end_point is an output pad, is
   the formula above correct?

*A15:
1. Yes

2. The contestant can ignore this kind of combinational path, because its
timing is independent of clock tree optimization.


==========================================================================
*Q16:
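    In the timing.inf file,
    quite a few slacks are larger than one clock cycle.
    This should not be allowed, should it?
    Should the program's result keep every slack below one clock cycle?
    The evaluation does not seem to mention this...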


*A16:
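The slack may be larger than one clock cycle.
Take Fig2 of the problem statement as an illustration:
the clock cycle time in Fig2 is 10, but because of clock skew2 the cycle time
of F1->F2 is C1 = 10.5.
Strictly speaking, the slack of F1->F2 cannot exceed 10.5, rather than 10.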

==========================================================================
*Q17:
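    Earlier you said that different buffers may share the same location
    (coordinates). May the location (coordinates) of an inserted buffer also
    coincide with the location of a component listed in the COMPONENTS
    section of the design file?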

*A17:
Yes.

==========================================================================
*Q18:
1. Is there a rule that forbids a clock buffer with weak drive from driving a
   clock buffer with strong drive?
   Example:
       CLKBUFX1 -> CLKBUFX20

2. For the net load file in the problem, is the net load simply the wire
   length from the driving cell to all the driven cells multiplied by the
   capacitance factor (Φ)?
   Example:
            driving cell location: (x, y)
            driven cells location: (x1, y1)  (x2, y2)  (x3, y3)

            Net load = { ( |x - x1| + |y - y1| ) + ( |x - x2| + |y - y2| )
                       + ( |x - x3| + |y - y3| ) } * Φ


*A18:
1. No limitation.

2. Yes.
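
A minimal sketch of the net load formula confirmed in A18.2. The function name
and the sample coordinates are made up for illustration; the capacitance
factor Φ = 0.00015 is the value quoted in Q6.

```python
PHI = 0.00015  # capacitance factor (value quoted in Q6)


def net_load(driver, sinks, phi=PHI):
    """Sum of Manhattan distances from the driving cell to every driven cell,
    multiplied by the capacitance factor."""
    x, y = driver
    wire_length = sum(abs(x - sx) + abs(y - sy) for sx, sy in sinks)
    return wire_length * phi


# Hypothetical placement: one driver and three driven cells.
load = net_load((0, 0), [(1, 2), (3, -4), (-5, 0)])
print(load)
```

In Q6 the buffer's output capacitance was then obtained by adding the pin
capacitance to this net load, e.g. 172.14 * 0.00015 + 0.002988 = 0.028809.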
==========================================================================
*Q19:
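1. In Q2 you answered "Yes, the clock buffers can be placed at
   the same location." If two clock buffers are at the same location
   (coordinates), how should the delay be calculated? Should one of them be
   used, or should the two delays be added together?

2. If only clock buffers are used and no clock inverters, is it enough to
   look at (use) only the cell rise and rise transition values of the buffer?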

*A19:
1. Whether the delays add up has nothing to do with physical location; it
   depends on whether the two buffers are logically connected in series.

2. Yes.
==========================================================================
*Q20:
  After the submission on 6/2, you will run the 3-5 reserved test cases.
  Will the library file (clkbuf.lib) you use be exactly the same as the
  library file we currently have?

*A20:
The answer is YES. We will use the same library for all test cases.

==========================================================================
*Q21:
The Evaluation section contains the item
"The maximum negative slack of setup time violations"
The maximum negative slack after optimization should be less than or equal to that before optimization.

In case 2,
the value obtained before optimization is    -0.495000
If, after optimization,
this value grows in magnitude (e.g. -0.600000) but the number of setup time violation paths decreases,
is this allowed?


*A21:
The answer is NO.
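
The rule can be sketched as a simple check on the slack lists before and
after optimization. The function names are hypothetical, and the extra slack
values in the example are made up; only -0.495 and -0.6 come from the
question above.

```python
def worst_negative_slack(slacks):
    """Most negative slack in the list, or 0.0 if there is no violation."""
    negatives = [s for s in slacks if s < 0]
    return min(negatives) if negatives else 0.0


def keeps_max_negative_slack(before, after):
    """True if the worst violation after optimization is no worse than before."""
    return worst_negative_slack(after) >= worst_negative_slack(before)


before = [-0.495, -0.2, -0.1, 1.3]   # three violating paths
after = [-0.600, 0.4, 0.9, 1.3]      # one violating path, but a worse one

print(keeps_max_negative_slack(before, after))
```

Here the number of violating paths drops from three to one, yet the check
fails because -0.6 is worse than -0.495, which is the situation A21 rejects.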
==========================================================================
*Q22:

Can the design and net load output after optimization be imported directly
into PrimeTime to generate a static timing report?

If not, how can we verify that the output files are correct?


*A22:

We will use the procedure below to verify the output results.
1. Translate design_opt.def (design file with optimized clock tree) to
   Verilog netlist format.
2. Translate net_load.rpt (net load file) to set_load format.
3. Translate timing.con (timing constraint file) to PrimeTime format.
4. Import the timing models of all library cells (not released) into PrimeTime.
5. Import the above files into PrimeTime.
6. Import the SDF file (not released) into PrimeTime. (The SDF file excludes
   the timing data of the clock tree.)
7. Use PrimeTime to get the clock latency of each leaf pin and the timing
   slack.

Because the timing models of all library cells and the SDF file are not
released, the contestant cannot do timing verification with the above
procedure. However, the contestant can still use PrimeTime to verify the
output results. For example, one method is as follows:
1. Translate the extracted clock tree to Verilog netlist format.
2. Translate the net loads of the clock tree, including the pin_cap of the
   leaf pins, to set_load format.
3. Translate timing.con (timing constraint file) to PrimeTime format.
4. Import the above files into PrimeTime.
5. Use PrimeTime to get the clock latency of each end point.

The contestant can thus calculate the clock latencies of the new clock tree
and compare them to the original clock latencies. This helps the contestant
check timing correctness.


==========================================================================