PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests. It has one cluster of 4 x
Cortex-A53 CPUs and one cluster of 2 x Cortex-A57 CPUs, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.
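
Raw PMF timestamps are counts of this 50 MHz counter, so a measured interval is
converted to microseconds by dividing the tick delta by the tick rate. A
minimal, illustrative sketch of that conversion (the helper and the example
tick count are not part of PMF):

.. code:: c

    #include <stdint.h>
    #include <stdio.h>

    /* Juno's generic counter ticks at 50 MHz, i.e. 50 ticks per microsecond. */
    #define JUNO_CNT_FREQ_HZ 50000000ULL

    static double ticks_to_us(uint64_t start, uint64_t end)
    {
        return (double)(end - start) * 1000000.0 / (double)JUNO_CNT_FREQ_HZ;
    }

    int main(void)
    {
        /* A 5229-tick interval corresponds to ~104.58 us. */
        printf("%.2f us\n", ticks_to_us(0, 5229));
        return 0;
    }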

The following source trees and binaries were used:

- TF-A [`v2.9-rc0`_]
- TFTF [`v2.9-rc0`_]

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>`
page for more details.

Procedure
---------

#. Build TFTF with runtime instrumentation enabled:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            TESTS=runtime-instrumentation all

#. Fetch Juno's SCP binary from TF-A's archive:

    .. code:: shell

        curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin

#. Build TF-A with the following build options:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
            ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip

#. Load the following images onto the development board: ``fip.bin``,
   ``scp_bl2.bin``.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.9)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   104.58  | 241.20 |     5.26    |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   384.24  | 22.50  |    138.76   |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   244.56  | 22.18  |     5.16    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   670.56  | 18.58  |     4.44    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   809.36  | 269.28 |     4.44    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   984.96  | 219.70 |    79.62    |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.10)

    +---------+------+-------------------+--------+-------------+
    | Cluster | Core |     Powerdown     | Wakeup | Cache Flush |
    +---------+------+-------------------+--------+-------------+
    |    0    |  0   | 242.66 (+132.03%) | 245.1  |     5.4     |
    +---------+------+-------------------+--------+-------------+
    |    0    |  1   |  522.08 (+35.87%) | 26.24  |    138.32   |
    +---------+------+-------------------+--------+-------------+
    |    1    |  0   |  104.36 (-57.33%) |  27.1  |     5.32    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  1   |  382.56 (-42.95%) | 23.34  |     4.42    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  2   |       807.74      | 271.54 |     4.64    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  3   |       981.36      | 221.8  |    79.48    |
    +---------+------+-------------------+--------+-------------+
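
The bracketed percentages in the v2.10 tables are the change relative to the
corresponding v2.9 measurement. A small illustrative sketch of that arithmetic
(not part of the test suite):

.. code:: c

    #include <stdio.h>

    /* Relative change of a v2.10 measurement against its v2.9 baseline. */
    static double pct_change(double old_us, double new_us)
    {
        return (new_us - old_us) / old_us * 100.0;
    }

    int main(void)
    {
        /* Cluster 0, core 0 powerdown: 104.58 us -> 242.66 us is +132.03%. */
        printf("%+.2f%%\n", pct_change(104.58, 242.66));
        return 0;
    }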

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.9)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   236.56  | 23.24  |    138.18   |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   236.86  | 23.28  |    138.10   |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   281.04  | 22.80  |    77.24    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   100.28  | 18.52  |     4.54    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   100.12  | 18.78  |     4.50    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   100.36  | 18.94  |     4.44    |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.10)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   236.84  |  27.1  |    138.36   |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   236.96  |  27.1  |    138.32   |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   280.06  | 26.94  |     77.5    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   100.76  | 23.42  |     4.36    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   100.02  | 23.42  |     4.44    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   100.08  |  23.2  |     4.4     |
    +---------+------+-----------+--------+-------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.9)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   662.34  | 15.22  |     8.08    |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   802.00  | 15.50  |     8.16    |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   385.22  | 15.74  |     7.88    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   106.16  | 16.06  |     7.44    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   524.38  | 15.64  |     7.34    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   246.00  | 15.78  |     7.72    |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.10)

    +---------+------+-------------------+--------+-------------+
    | Cluster | Core |     Powerdown     | Wakeup | Cache Flush |
    +---------+------+-------------------+--------+-------------+
    |    0    |  0   |       801.04      | 18.66  |     8.22    |
    +---------+------+-------------------+--------+-------------+
    |    0    |  1   |       661.28      | 19.08  |     7.88    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  0   |  105.9 (-72.51%)  |  20.3  |     7.58    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  1   | 383.58 (+261.32%) |  20.4  |     7.42    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  2   |       523.52      |  20.1  |     7.74    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  3   |       244.5       | 20.16  |     7.56    |
    +---------+------+-------------------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.9)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   99.80   | 15.94  |     5.42    |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   99.76   | 15.80  |     5.24    |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   278.26  | 16.16  |     4.58    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   96.88   | 16.00  |     4.52    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   96.80   | 16.12  |     4.54    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   96.88   | 16.12  |     4.54    |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.10)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   99.84   | 18.86  |     5.54    |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   100.2   | 18.82  |     5.66    |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   278.12  | 20.56  |     4.48    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   96.68   | 20.62  |     4.3     |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   96.94   | 20.14  |     4.42    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   96.68   | 20.46  |     4.32    |
    +---------+------+-----------+--------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.9)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   235.76  | 26.14  |    137.80   |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   235.40  | 25.72  |    137.62   |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   174.70  | 22.40  |    77.26    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   100.92  | 24.04  |     4.52    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   100.68  | 22.44  |     4.36    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   101.36  | 22.70  |     4.52    |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.10)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   236.04  | 30.02  |    137.9    |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   235.38  |  29.7  |    137.72   |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   175.18  | 26.96  |    77.26    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   100.56  | 28.34  |     4.32    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   100.38  | 26.82  |     4.3     |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   100.86  | 26.98  |     4.42    |
    +---------+------+-----------+--------+-------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.9)

    +-------------+--------+-------------+
    |   Cluster   |  Core  |   Latency   |
    +-------------+--------+-------------+
    |      0      |   0    |     1.48    |
    +-------------+--------+-------------+
    |      0      |   1    |     1.04    |
    +-------------+--------+-------------+
    |      1      |   0    |     0.56    |
    +-------------+--------+-------------+
    |      1      |   1    |     0.92    |
    +-------------+--------+-------------+
    |      1      |   2    |     0.96    |
    +-------------+--------+-------------+
    |      1      |   3    |     0.96    |
    +-------------+--------+-------------+

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.10)

    +-------------+--------+----------------------+
    |   Cluster   |  Core  |       Latency        |
    +-------------+--------+----------------------+
    |      0      |   0    |    1.1 (-25.68%)     |
    +-------------+--------+----------------------+
    |      0      |   1    |         1.06         |
    +-------------+--------+----------------------+
    |      1      |   0    |         0.58         |
    +-------------+--------+----------------------+
    |      1      |   1    |         0.88         |
    +-------------+--------+----------------------+
    |      1      |   2    |         0.92         |
    +-------------+--------+----------------------+
    |      1      |   3    |         0.9          |
    +-------------+--------+----------------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.
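
In terms of the raw instrumentation timestamps, each of these figures is the
difference between a pair of capture points on the suspend, wakeup and
cache-flush paths. A hedged sketch of that bookkeeping is shown below; the
structure and field names are illustrative, not the actual TF-A/TFTF
definitions:

.. code:: c

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative per-CPU capture points, in ticks of the 50 MHz counter. */
    struct rt_timestamps {
        uint64_t enter_psci;       /* SMC handled, powerdown path entered */
        uint64_t enter_hw_low_pwr; /* about to enter the low-power state */
        uint64_t exit_hw_low_pwr;  /* first capture point after wakeup */
        uint64_t exit_psci;        /* about to return to the caller */
        uint64_t enter_cflush;     /* start of the data cache flush */
        uint64_t exit_cflush;      /* end of the data cache flush */
    };

    #define TICKS_PER_US 50.0 /* 50 MHz generic counter */

    int main(void)
    {
        struct rt_timestamps t = {
            .enter_psci = 100, .enter_hw_low_pwr = 5329,
            .exit_hw_low_pwr = 6000, .exit_psci = 7000,
            .enter_cflush = 200, .exit_cflush = 463,
        };

        printf("PSCI_ENTRY      %.2f us\n",
               (t.enter_hw_low_pwr - t.enter_psci) / TICKS_PER_US);
        printf("PSCI_EXIT       %.2f us\n",
               (t.exit_psci - t.exit_hw_low_pwr) / TICKS_PER_US);
        printf("CFLUSH_OVERHEAD %.2f us\n",
               (t.exit_cflush - t.enter_cflush) / TICKS_PER_US);
        return 0;
    }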

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache for the big cluster (2MB) is much larger than that of the
little cluster (1MB).

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0 but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead cluster
are large because all other CPUs in the cluster are powered down during the
test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring a
flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache for the big cluster (2MB) is much larger than that of
the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
those for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to execute
on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program wake up timer and suspend the lead CPU to the deepest power level.

3. Call ``CPU_ON`` on each non-lead CPU to get the timestamps from each CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache for the big cluster (2MB) is much larger than that of
the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
those for CPUs in the little cluster due to greater CPU performance. These times
are generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round trip latency for handling a fast SMC at EL3 in TF.
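
For context, the sketch below issues ``PSCI_VERSION`` with a raw SMC from a
normal-world AArch64 caller, which is essentially the operation being timed
here. It is an illustrative snippet using the standard SMCCC function ID, not
the TFTF test code:

.. code:: c

    #include <stdint.h>

    #define PSCI_VERSION_FID 0x84000000u /* SMC32 PSCI_VERSION function ID */

    /* Issue a fast SMC and return x0. Assumes an AArch64 caller that is
     * permitted to execute SMC (e.g. a bare-metal test payload at EL1/EL2). */
    static uint64_t smc_call(uint64_t fid)
    {
        register uint64_t x0 __asm__("x0") = fid;

        __asm__ volatile("smc #0" : "+r"(x0) : : "x1", "x2", "x3", "memory");
        return x0;
    }

    uint32_t psci_version(void)
    {
        /* Return value encodes (major << 16) | minor, e.g. 0x10001 for 1.1. */
        return (uint32_t)smc_call(PSCI_VERSION_FID);
    }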

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are less than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nanosecond level.

--------------

*Copyright (c) 2019-2023, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _v2.9-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.9-rc0