PSCI Performance Measurements on Arm Juno Development Platform
===============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has a cluster of 4 x
Cortex-A53 cores and a cluster of 2 x Cortex-A57 cores running at the following
frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Since runtime instrumentation using PMF is invasive, it adds a small
(unquantified) overhead to the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.

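
Raw PMF timestamps are therefore expressed in ticks of this 50MHz counter
(20ns per tick) rather than in time units. A minimal sketch of the conversion
behind the microsecond figures quoted throughout this document (the tick count
used here is illustrative, not a value captured from a particular run):

.. code:: shell

    # 50 generic counter ticks correspond to 1 microsecond on Juno.
    ticks=5229
    echo "scale=2; ${ticks} / 50" | bc    # prints 104.58 (microseconds)
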

The following source trees and binaries were used:

- TF-A [`v2.9-rc0`_]
- TFTF [`v2.9-rc0`_]

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>` page for more details.

Procedure
---------

#. Build TFTF with runtime instrumentation enabled:

   .. code:: shell

      make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
          TESTS=runtime-instrumentation all

#. Fetch Juno's SCP binary from TF-A's archive:

   .. code:: shell

      curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
          https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin

#. Build TF-A with the following build options:

   .. code:: shell

      make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
          BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
          ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip

#. Load the following images onto the development board: ``fip.bin`` and
   ``scp_bl2.bin``. A possible approach is sketched below.

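
The method used to transfer the images depends on how the board is set up. As
one possible approach (the mount point here is an assumption, and the default
``images.txt`` entries on the motherboard's configuration microSD are assumed
to already reference these file names), the binaries can be copied into the
``SOFTWARE`` directory and the board rebooted:

.. code:: shell

    # Assumes the Juno configuration microSD is mounted at /mnt/juno and that
    # the existing images.txt entries already point at these file names.
    cp fip.bin scp_bl2.bin /mnt/juno/SOFTWARE/
    sync
    umount /mnt/juno    # then reboot the board so the MCC reloads the images
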

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
    parallel (v2.9)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    | 0       | 0    | 104.58    | 241.20 | 5.26        |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 384.24    | 22.50  | 138.76      |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 244.56    | 22.18  | 5.16        |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 670.56    | 18.58  | 4.44        |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 809.36    | 269.28 | 4.44        |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 984.96    | 219.70 | 79.62       |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
    parallel (v2.10)

    +---------+------+-------------------+--------+-------------+
    | Cluster | Core | Powerdown         | Wakeup | Cache Flush |
    +---------+------+-------------------+--------+-------------+
    | 0       | 0    | 242.66 (+132.03%) | 245.1  | 5.4         |
    +---------+------+-------------------+--------+-------------+
    | 0       | 1    | 522.08 (+35.87%)  | 26.24  | 138.32      |
    +---------+------+-------------------+--------+-------------+
    | 1       | 0    | 104.36 (-57.33%)  | 27.1   | 5.32        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 1    | 382.56 (-42.95%)  | 23.34  | 4.42        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 2    | 807.74            | 271.54 | 4.64        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 3    | 981.36            | 221.8  | 79.48       |
    +---------+------+-------------------+--------+-------------+

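
In the v2.10 tables, the figures in parentheses are the relative change with
respect to the corresponding v2.9 measurement for the same cluster and core.
For example, the cluster 0, core 0 powerdown delta above can be reproduced as
follows:

.. code:: shell

    # Relative change of the v2.10 powerdown latency vs. v2.9 (cluster 0, core 0).
    echo "scale=2; (242.66 - 104.58) * 100 / 104.58" | bc    # prints 132.03, i.e. +132.03%
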

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
    serial (v2.9)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    | 0       | 0    | 236.56    | 23.24  | 138.18      |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 236.86    | 23.28  | 138.10      |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 281.04    | 22.80  | 77.24       |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 100.28    | 18.52  | 4.54        |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 100.12    | 18.78  | 4.50        |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 100.36    | 18.94  | 4.44        |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
    serial (v2.10)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    | 0       | 0    | 236.84    | 27.1   | 138.36      |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 236.96    | 27.1   | 138.32      |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 280.06    | 26.94  | 77.5        |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 100.76    | 23.42  | 4.36        |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 100.02    | 23.42  | 4.44        |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 100.08    | 23.2   | 4.4         |
    +---------+------+-----------+--------+-------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
    parallel (v2.9)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    | 0       | 0    | 662.34    | 15.22  | 8.08        |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 802.00    | 15.50  | 8.16        |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 385.22    | 15.74  | 7.88        |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 106.16    | 16.06  | 7.44        |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 524.38    | 15.64  | 7.34        |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 246.00    | 15.78  | 7.72        |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
    parallel (v2.10)

    +---------+------+-------------------+--------+-------------+
    | Cluster | Core | Powerdown         | Wakeup | Cache Flush |
    +---------+------+-------------------+--------+-------------+
    | 0       | 0    | 801.04            | 18.66  | 8.22        |
    +---------+------+-------------------+--------+-------------+
    | 0       | 1    | 661.28            | 19.08  | 7.88        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 0    | 105.9 (-72.51%)   | 20.3   | 7.58        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 1    | 383.58 (+261.32%) | 20.4   | 7.42        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 2    | 523.52            | 20.1   | 7.74        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 3    | 244.5             | 20.16  | 7.56        |
    +---------+------+-------------------+--------+-------------+


.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.9)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    | 0       | 0    | 99.80     | 15.94  | 5.42        |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 99.76     | 15.80  | 5.24        |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 278.26    | 16.16  | 4.58        |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 96.88     | 16.00  | 4.52        |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 96.80     | 16.12  | 4.54        |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 96.88     | 16.12  | 4.54        |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.10)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    | 0       | 0    | 99.84     | 18.86  | 5.54        |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 100.2     | 18.82  | 5.66        |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 278.12    | 20.56  | 4.48        |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 96.68     | 20.62  | 4.3         |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 96.94     | 20.14  | 4.42        |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 96.68     | 20.46  | 4.32        |
    +---------+------+-----------+--------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.


.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.9)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    | 0       | 0    | 235.76    | 26.14  | 137.80      |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 235.40    | 25.72  | 137.62      |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 174.70    | 22.40  | 77.26       |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 100.92    | 24.04  | 4.52        |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 100.68    | 22.44  | 4.36        |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 101.36    | 22.70  | 4.52        |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.10)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    | 0       | 0    | 236.04    | 30.02  | 137.9       |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 235.38    | 29.7   | 137.72      |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 175.18    | 26.96  | 77.26       |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 100.56    | 28.34  | 4.32        |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 100.38    | 26.82  | 4.3         |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 100.86    | 26.98  | 4.42        |
    +---------+------+-----------+--------+-------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.9)

    +-------------+--------+-------------+
    | Cluster     | Core   | Latency     |
    +-------------+--------+-------------+
    | 0           | 0      | 1.48        |
    +-------------+--------+-------------+
    | 0           | 1      | 1.04        |
    +-------------+--------+-------------+
    | 1           | 0      | 0.56        |
    +-------------+--------+-------------+
    | 1           | 1      | 0.92        |
    +-------------+--------+-------------+
    | 1           | 2      | 0.96        |
    +-------------+--------+-------------+
    | 1           | 3      | 0.96        |
    +-------------+--------+-------------+

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.10)

    +-------------+--------+----------------------+
    | Cluster     | Core   | Latency              |
    +-------------+--------+----------------------+
    | 0           | 0      | 1.1 (-25.68%)        |
    +-------------+--------+----------------------+
    | 0           | 1      | 1.06                 |
    +-------------+--------+----------------------+
    | 1           | 0      | 0.58                 |
    +-------------+--------+----------------------+
    | 1           | 1      | 0.88                 |
    +-------------+--------+----------------------+
    | 1           | 2      | 0.92                 |
    +-------------+--------+----------------------+
    | 1           | 3      | 0.9                  |
    +-------------+--------+----------------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.


``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and
release the lock before proceeding.

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache size for the big cluster is a lot larger (2MB) than that
of the little cluster (1MB).


``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0, but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and more consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).


``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead
cluster are large because all other CPUs in the cluster are powered down during
the test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring
a flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the
little CPUs because the L2 cache size for the big cluster is a lot larger (2MB)
than that of the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down
to level 0, which only requires an L1 cache flush.


``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to
execute on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program wake up timer and suspend the lead CPU to the deepest power level.

3. Call ``CPU_ON`` on each non-lead CPU to get the timestamps from each CPU.


+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only
requires an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the
little CPUs because the L2 cache size for the big cluster is a lot larger (2MB)
than that of the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times are
generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round trip latency for handling a fast SMC at EL3 in TF.

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are lower than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nano-second level.

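
For scale, combining these round-trip times with the core frequencies listed in
the Method section gives a rough, illustrative cycle-count estimate
(cycles = ns × MHz / 1000):

.. code:: shell

    # Approximate CPU cycles per SMC round trip, using the figures above and the
    # core frequencies from the Method section (illustrative estimate only).
    echo "520 * 900 / 1000" | bc     # lead big core (A57 @ 900MHz):  468 cycles
    echo "3020 * 650 / 1000" | bc    # little core (A53 @ 650MHz):   1963 cycles
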

--------------

*Copyright (c) 2019-2023, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _v2.9-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.9-rc0