*Q1: How can we find out the area of the components and buffers?

*A1: We assume each component is an ideal point on the coordinate plane with zero area. Contestants do not need to take the area of inserted clock buffers into account during clock tree optimization. However, the coordinates of an inserted clock buffer must not lie outside the die area.

==========================================================================

*Q2: 1. Are the PINS in the design file (design.def) IO pads or component pins? What information does COMPONENTS contain (flip-flops, buffers, gates, ...)?
     2. Can s_clk and e_clk in the static timing report file (timing.inf) be derived from the information in design.def and clkbuf.lib? Are s_clk and e_clk the clock latencies?
     3. "Contestants do not need to take the area of inserted clock buffers into account during clock tree optimization." Based on this sentence, may I insert several clock buffers at the same location?

*A2: 1. a. The pin names specified in the PINS section of design.def are primary IO pins. They are not component pins.
        b. The COMPONENTS section contains the locations of all components used, including clock buffers. It does not show cell types such as flip-flop, buffer, NAND gate, etc.
        c. Contestants can identify the existing clock buffers from (1) the cell name (shown in clkbuf.lib) and (2) the net type (shown in the NET section).
        d. Contestants can calculate the net load of a clock tree from the component locations, and then use that load to calculate the timing of the clock buffers. They do not need to calculate the timing of other cells (flip-flops, gates, ...) because it is already given in the STA report (timing.inf).
     2. Yes, s_clk and e_clk are clock latencies. They can be calculated from design.def and clkbuf.lib.
     3. Yes, clock buffers can be placed at the same location.

==========================================================================

*Q3: The problem statement does not say whether the location (coordinates) of a buffer may be changed. Is it legal to change a buffer's location, provided it stays within the die area?
     When calculating delay, should the rise table or the fall table be used?

*A3: a. Contestants can move clock buffers to any desired location within the die area.
     b.
        All flip-flops are triggered by the positive clock edge. Contestants need to consider only the rise delay table if the synthesized clock tree contains clock buffers only, with no clock inverters. clkbuf.lib gives the timing tables of both clock buffers and clock inverters.

==========================================================================

*Q4: Is the clock cycle time known? If so, does it have to be derived from the information in timing.inf?

*A4: The clock cycle time is not given directly, but it is a known value. Contestants can find it from timing.inf with the following calculation.

     Clock cycle time = path_delay + setup + slack - (e_clk - s_clk)

==========================================================================

*Q5: In the introduction you showed the calculation of the slack, and it matches the third, fourth, and fifth lines of the file "timing.inf". Can you tell us how to calculate the slack of the second (data_in[0]) and the final (u1/u10/F2) lines of "timing.inf"?

*A5:
**************************************************************************************************
#start_point  end_point   path_delay  setup  cap   s_clk  e_clk  slack
data_in[0]    u0/rg_1     4.1         0.3    0.02  0.0    1.80   2.6
u1/u10/F1     u1/u10/F2   10.4        0.1    0.03  2.0    2.0    -0.5
u1/u10/F2     add_out[5]  2.7         0.0    0.0   2.0    0.0    0.6

To calculate the slacks of the first and the last paths shown above, contestants first have to calculate the clock cycle time, the input delay, and the output delay. The clock cycle time can be derived from a path between two flip-flops as follows.

clock cycle time = path_delay + setup + slack - (e_clk - s_clk)
                 = 10.4 + 0.1 + (-0.5) - (2.0 - 2.0)
                 = 10

The input delay of data_in[0] can be calculated as follows.
Ta_u0/rg_1.D (arrival time at u0/rg_1)  = input_delay + s_clk + path_delay = input_delay + 4.1
Tr_u0/rg_1.D (required time at u0/rg_1) = clock_cycle_time + e_clk - setup = 10 + 1.8 - 0.3 = 11.5
slack = Tr_u0/rg_1.D - Ta_u0/rg_1.D = 11.5 - (input_delay + 4.1) = 2.6
=> input_delay = 4.8

The output delay of add_out[5] can be calculated as follows.

Ta_FF.D (arrival time at an external FF)  = s_clk + path_delay + output_delay = 2.0 + 2.7 + output_delay = 4.7 + output_delay
Tr_FF.D (required time at an external FF) = clock_cycle_time + e_clk - setup = 10 + 0 - 0 = 10
slack = Tr_FF.D - Ta_FF.D = 10 - (4.7 + output_delay) = 0.6
=> output_delay = 4.7

Based on the calculated input delays and output delays, contestants can calculate the slacks of any input-to-FF and FF-to-output paths with the equations above.

==========================================================================

*Q6: Why does the latency I calculated from design.def and clkbuf.lib differ from the one given in timing.inf? Could you tell me where I went wrong?
     (Below is my calculation of the latency of I_124 in case 1.)

     [Case 1]
     Latency of I_124 given in timing.inf:
       I_124 = 5.635400

     Calculated from design.def and clkbuf.lib:
       CLK_L1_I1:
         Net load: 172.14 * Φ (0.00015)
         Transition time: 0
         Pin capacitance: 0.002988
         Output net capacitance: (net load) + (pin capacitance) = 0.028809
         From the model given in clkbuf.lib (interpolation and extrapolation):
           cell_rise: 0.117863657
           rise_transition: 0.078744273
       CLK_L2_I1:
         Net load: 9148.2 * Φ (0.00015)
         Transition time: 0.078744273
         Pin capacitance: 0.273776
         Output net capacitance: (net load) + (pin capacitance) = 1.646006
         From the model given in clkbuf.lib (interpolation and extrapolation):
           cell_rise: 5.68014608

     Resulting latency of I_124:
       I_124 = (CLK_L1_I1 delay) + (CLK_L2_I1 delay) = 0.117863657 + 5.68014608 = 5.798009737

*A6: We compared the result reported by PrimeTime with the manual calculation. The cell_rise and rise_transition of CLK_L1_I1 reported by the contestant are correct, and the transition time and output loading of CLK_L2_I1 are correct as well. However, the cell_rise of CLK_L2_I1 is incorrect; it should be 5.51749335.
Thus, the clock latency is 0.117863657 + 5.51749335 = 5.635357. Please note that the cell name of CLK_L2_I1 is CLKBUFX2.

==========================================================================

*Q7: 1. Are the setup time and the clock pin capacitance of a given flip-flop always the same? In timing.inf I see two paths whose end_point is I_589, yet their setup times differ. Why?

        Line  #start_point  end_point  path_delay  setup     cap       s_clk     e_clk     slack
        20    I_105         I_589      0.564000    0.457000  0.002687  5.635400  5.635400  8.979000
        102   I_173         I_589      3.555000    0.375000  0.002687  5.635400  5.635400  6.070000

     2. How are index_1 and index_2 in clkbuf.lib used to look up a delay?
        Example: (index_1, index_2) = (0.05, 0.231) → 1. values = 1.505177 or 2. values = 0.255711. Which one is right?

        index_1 ("0.05, 0.15, 0.6, 1.4, 2.3, 3.3, 4.5");
        index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
        values ( \
          "0.096541, 0.227131, 0.333834, 0.610805, 0.994130, 1.505177, 1.994913", \
          "0.118745, 0.249279, 0.356042, 0.633083, 1.016444, 1.527508, 2.017252", \
          "0.170806, 0.307333, 0.414499, 0.691462, 1.074822, 1.585893, 2.075644", \
          "0.212524, 0.358003, 0.464958, 0.742533, 1.126148, 1.637133, 2.126846", \
          "0.238505, 0.394594, 0.502665, 0.780531, 1.164368, 1.675734, 2.165385", \
          "0.255711, 0.422713, 0.532885, 0.812993, 1.197127, 1.708624, 2.198647", \
          "0.267472, 0.446227, 0.559155, 0.843292, 1.229352, 1.741061, 2.231102");
        }

*A7: 1. The clock pin capacitance of a given flip-flop is always the same, but the data setup time differs from one data pin to another. (The flip-flop in this example actually has two data pins, so it has two setup times. Because contestants do not know the data pins of a flip-flop, in practice the setup time can be treated as depending on the timing path.)
     2. index_1 is the row index and index_2 is the column index. values = 1.505177 is correct.
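The row/column convention in A7 can be sketched in Python as follows. This is only an illustrative sketch, not the organizers' code: the names `segment` and `lookup` are hypothetical, and the bilinear interpolation (with linear extrapolation outside the grid) is an assumption based on the "interpolation and extrapolation" mentioned in Q6. The table is the one quoted in Q7.

```python
from bisect import bisect_right

# Table quoted in Q7: index_1 selects the row, index_2 selects the column.
INDEX_1 = [0.05, 0.15, 0.6, 1.4, 2.3, 3.3, 4.5]
INDEX_2 = [0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115]
VALUES = [
    [0.096541, 0.227131, 0.333834, 0.610805, 0.994130, 1.505177, 1.994913],
    [0.118745, 0.249279, 0.356042, 0.633083, 1.016444, 1.527508, 2.017252],
    [0.170806, 0.307333, 0.414499, 0.691462, 1.074822, 1.585893, 2.075644],
    [0.212524, 0.358003, 0.464958, 0.742533, 1.126148, 1.637133, 2.126846],
    [0.238505, 0.394594, 0.502665, 0.780531, 1.164368, 1.675734, 2.165385],
    [0.255711, 0.422713, 0.532885, 0.812993, 1.197127, 1.708624, 2.198647],
    [0.267472, 0.446227, 0.559155, 0.843292, 1.229352, 1.741061, 2.231102],
]


def segment(axis, x):
    """Index i of the axis interval [axis[i], axis[i+1]] used for x.

    Points outside the axis reuse the first or last interval, which
    turns the interpolation below into linear extrapolation.
    """
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup(values, index_1, index_2, x1, x2):
    """Bilinear interpolation (assumed model): x1 along rows, x2 along columns."""
    i = segment(index_1, x1)
    j = segment(index_2, x2)
    t = (x1 - index_1[i]) / (index_1[i + 1] - index_1[i])
    u = (x2 - index_2[j]) / (index_2[j + 1] - index_2[j])
    upper = values[i][j] + u * (values[i][j + 1] - values[i][j])
    lower = values[i + 1][j] + u * (values[i + 1][j + 1] - values[i + 1][j])
    return upper + t * (lower - upper)


# Grid point from Q7: row for 0.05, column for 0.231.
print(lookup(VALUES, INDEX_1, INDEX_2, 0.05, 0.231))  # 1.505177, the value confirmed in A7
```

Reading the table the other way round (column for 0.05, row for 0.231) would give 0.255711, the value A7 rules out.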
==========================================================================

*Q8: In A5, you answered that when a path goes from a PI to an FF,
       slack = Tr_u0/rg_1.D - Ta_u0/rg_1.D = (clock_cycle_time + e_clk - setup) - (input_delay + s_clk + path_delay)
     However, the slack given in "timing.inf" of the test files seems inconsistent with this. For example, one path in the timing.inf of case 1 is
       "N_6 I_173 3.106000 0.412000 0.002687 0.000000 5.635400 12.117400" (this path goes from a PI to an FF)
     Calculated this way, slack = (10 + 5.6354 - 0.412) - (1 + 0 + 3.106) = 11.1174, which differs from the given slack of 12.1174.
     (The input_delay used here is 1, taken from timing.con.)

*A8: We are sorry that the test cases we released did not take the input delay into account when calculating the timing slack. Please find the updated test cases attached. Again, we apologize for the inconvenience.

==========================================================================

*Q9: For buffers in a cell library, a stronger driver should have a larger capacitance, yet the capacitance of CLKBUFX1 is larger than that of CLKBUFX2. Isn't that strange? Or is the driving strength in this contest unrelated to the X1, X2 suffix of the cell name, so that we have to judge it ourselves?

     cell name   capacitance
     CLKBUFX1    0.003392
     CLKBUFX2    0.002988
     CLKBUFX3    0.003306
     CLKBUFX4    0.004064
     CLKBUFX8    0.007664
     CLKBUFX12   0.018513
     CLKBUFX16   0.002988
     CLKBUFX20   0.030486

*A9: In the timing model, the driving strength of a cell is independent of its input capacitance. Basically, the driving strengths of the released clock buffers have the following relationship.

     CLKBUFX1 < CLKBUFX2 < CLKBUFX3 < CLKBUFX4 < CLKBUFX8 < CLKBUFX12 < CLKBUFX16 < CLKBUFX20

     However, the actual relationship should be determined from the timing model.

==========================================================================

*Q10: Is there only one Clock_cycle in the timing.con file?

*A10: Yes. There is only one Clock_cycle in a timing.con file.

==========================================================================

*Q11: When a clock signal is connected directly from the clock pad to the clock pin of a flip-flop (with no buffer on the path), is its latency zero?

*A11: Yes. We assume the resistance of a net is zero (R = 0), so there is no interconnect delay. Only cell delay is taken into account. If there is no buffer on the path, the latency is zero.
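The latency rule in A11 and the PI-to-FF slack formula quoted in Q8 can be checked numerically with a short Python sketch. The function names `clock_latency` and `setup_slack` are hypothetical and chosen only for illustration; the numbers are the ones quoted in Q6/A6 and Q8.

```python
def clock_latency(cell_delays):
    """Clock latency under the R = 0 assumption of A11.

    With no interconnect delay, the latency of a leaf pin is just the
    sum of the cell delays of the buffers between clock root and pin.
    """
    return sum(cell_delays)


def setup_slack(cycle, e_clk, setup, s_clk, path_delay, input_delay=0.0):
    """Slack = required time - arrival time, as written in Q8."""
    required = cycle + e_clk - setup
    arrival = input_delay + s_clk + path_delay
    return required - arrival


# No buffer between the clock pad and the flip-flop: latency is zero.
print(clock_latency([]))  # 0

# Two buffers in series, cell delays as corrected in A6.
print(round(clock_latency([0.117863657, 5.51749335]), 6))

# Path N_6 -> I_173 from Q8, with input_delay = 1 taken from timing.con.
print(round(setup_slack(10, 5.6354, 0.412, 0.0, 3.106, input_delay=1), 4))
```

Passing `input_delay=0` for the same path reproduces the 12.1174 of the original test case, which is the discrepancy A8 acknowledges.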
==========================================================================

*Q12: In timing.inf:

      #start_point  end_point  path_delay  setup     cap       s_clk     e_clk     slack
      I_117         N_149      3.400000    0.000000  0.002687  5.635400  0.000000  -0.035400

      cap: the clock pin capacitance of the end_point if the end_point is a flip-flop.
      When the end_point is an output pad, what does the cap value represent?

*A12: If the end point is an output port, the cap should be zero. We have updated the static timing report files (timing.inf) of all test cases; see the attachment.

==========================================================================

*Q13: The final scoring criteria in the contest are the sum of all negative slacks, together with the latency, which should be as small as possible. Suppose the following situation occurs.

      The clock constraint is 10.
      Before optimization: total slack = 100, with one hundred negative slacks of -1 each, so clock period = 10 + 1 = 11.
      After optimization: total slack = 50, with 5 negative slacks (hypothetically) of -10 each, so clock period = 10 + 10 = 20.

      Does this count as a good result?

*A13: We have updated the evaluation criteria in the attached file. The updated criteria require that the maximum negative slack after optimization be no larger than that before optimization. Basically, the first two evaluation criteria are equally important.

==========================================================================

*Q14: In the timing.inf of test_case3,

      I_99 I_46 5.096000 0.199000 0.002912 4.578500 4.733800 4.860200

      slack = Tr_u0/rg_1.D - Ta_u0/rg_1.D
            = (clock_cycle_time + e_clk - setup) - (s_clk + path_delay)
            = (10 + 4.7338 - 0.199) - (4.5785 + 5.096)
            = 4.860300

      The calculated value does not equal the 4.860200 given in the input. Several paths have this problem. Should we go by the calculated value or by the value given in the input?

*A14: This discrepancy is caused by rounding in PrimeTime's calculation. Please go by the value you actually calculate. Scoring will take this into account and keep only three digits after the decimal point.

==========================================================================

*Q15: 1. In clkbuf.lib, within the same cell (CLKBUFX1), are the values of index_1 and index_2 the same for the four tables cell_rise, rise_transition, cell_fall, and fall_transition?

         index_1 ("0.05, 0.15, 0.6, 1.4, 2.3, 3.3, 4.5");
         index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");

      2.
         slack = Tr_u0/rg_1.D - Ta_u0/rg_1.D = (output_delay + clock_cycle_time + e_clk - setup) - (input_delay + s_clk + path_delay)
         If the start_point is an input pad and the end_point is an output pad, is the formula above correct?

*A15: 1. Yes.
      2. Contestants can ignore this kind of combinational path, because its timing is independent of clock tree optimization.

==========================================================================

*Q16: In the timing.inf files, quite a few slacks are larger than one clock cycle. Surely that should not be allowed? Should the program's result keep every slack smaller than one clock cycle? The evaluation does not seem to mention this...

*A16: A slack can be larger than one clock cycle. Take Fig. 2 of the problem statement as an example: the clock cycle time in Fig. 2 is 10, but because of clock skew2, the cycle time of F1->F2 is C1 = 10.5. Strictly speaking, the slack of F1->F2 cannot exceed 10.5, rather than 10.

==========================================================================

*Q17: Earlier you said that different buffers may share the same location (coordinates). May the location (coordinates) of an inserted buffer also coincide with the location (coordinates) of an entry in COMPONENTS of the design file?

*A17: Yes.

==========================================================================

*Q18: 1. Is there a rule that a clock buffer with weak drive may not drive a clock buffer with strong drive?
         Example: CLKBUFX1 -> CLKBUFX20
      2. For the net load file, is the net load simply the wire length from the driving cell to all of the driven cells, multiplied by the capacitance factor (Φ)?
         Example:
           driving cell location: (x, y)
           driven cell locations: (x1, y1) (x2, y2) (x3, y3)
           Net load = { ( |x - x1| + |y - y1| ) + ( |x - x2| + |y - y2| ) + ( |x - x3| + |y - y3| ) } * Φ

*A18: 1. No limitation.
      2. Yes.

==========================================================================

*Q19: 1. In Q2 you answered "Yes, the clock buffers can be placed at the same location." If two clock buffers are at the same location (coordinates), how should the delay be calculated? Should one of them be used, or should the two be added together?
      2. If only clock buffers are used and no clock inverters, is it enough to look at (use) only the values of the cell_rise and rise_transition tables of the buffer?

*A19: 1. Whether delays add up has nothing to do with physical location. It depends on whether the two buffers are logically connected in series.
      2. Yes.

==========================================================================

*Q20: After the submission on 6/2, you will test with the 3-5 reserved test files. Will the library file (clkbuf.lib) you use be exactly the same as the one we currently have?

*A20: Yes. We will use the same library for all test cases.
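Returning to the net load formula confirmed in A18: it can be sketched in Python as below. The function name `net_load` and the example coordinates are hypothetical; the capacitance factor Φ = 0.00015 and the CLK_L1_I1 numbers are the ones quoted in Q6.

```python
PHI = 0.00015  # capacitance factor Φ as quoted in Q6


def net_load(driver, sinks, phi=PHI):
    """Net load per A18: total Manhattan wire length times Φ.

    driver is the (x, y) location of the driving cell and sinks is a
    list of (x, y) locations of the driven cells.
    """
    x, y = driver
    wire_length = sum(abs(x - sx) + abs(y - sy) for sx, sy in sinks)
    return wire_length * phi


# Hypothetical placement: one driver and three driven cells.
# Wire length = (10 + 20) + (30 + 5) + (7 + 8) = 80.
load = net_load((0, 0), [(10, 20), (30, -5), (-7, 8)])
print(round(load, 6))

# Output capacitance as computed in Q6 for CLK_L1_I1:
# net load (wire length 172.14 times Φ) plus the pin capacitance 0.002988.
print(round(172.14 * PHI + 0.002988, 6))
```

Because components are treated as ideal points (A1) and buffers may share a location (A2, A17), a driven cell placed exactly on its driver contributes zero wire length in this formula.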
==========================================================================

*Q21: The Evaluation section includes the item "The maximum negative slack of setup time violations":
      the maximum negative slack after optimization should be less than or equal to that before optimization.
      In case 2, the value obtained before optimization is -0.495000. If, after optimization, this value becomes worse (for example -0.600000) but the number of setup time violation paths decreases, is that allowed?

*A21: No.

==========================================================================

*Q22: Can the design and net load output after optimization be imported directly into PrimeTime to generate a static timing report? If not, how can we verify that the output files are correct?

*A22: We will use the procedure below to verify the output results.

      1. Translate design_opt.def (the design file with the optimized clock tree) to Verilog netlist format.
      2. Translate net_load.rpt (the net load file) to set_load format.
      3. Translate timing.con (the timing constraint file) to PrimeTime format.
      4. Import the timing models of all library cells (not released) into PrimeTime.
      5. Import the above files into PrimeTime.
      6. Import the SDF file (not released) into PrimeTime. (The SDF file excludes the timing data of the clock tree.)
      7. Use PrimeTime to get the clock latency of each leaf pin and the timing slack.

      Because the timing models of all library cells and the SDF file are not released, contestants cannot run the verification procedure above. However, they can still use PrimeTime to verify their output results. For example, one method is as follows.

      1. Translate the extracted clock tree to Verilog netlist format.
      2. Translate the net loads of the clock tree, including the pin_cap of the leaf pins, to set_load format.
      3. Translate timing.con (the timing constraint file) to PrimeTime format.
      4. Import the above files into PrimeTime.
      5. Use PrimeTime to get the clock latency of each end point.

      Contestants can thus calculate the clock latencies of the new clock tree and compare them to the original clock latencies. This helps check timing correctness.
==========================================================================