NVIDIA Tegra
============

-  .. rubric:: T194
      :name: t194

T194 has eight NVIDIA Carmel CPU cores in a coherent multi-processor
configuration. The Carmel cores support the Armv8.2 architecture, executing
both 64-bit AArch64 code and 32-bit AArch32 code. The Carmel processors are
organized as four dual-core clusters, each with a dedicated 2 MiB Level 2
unified cache. A high-speed coherency fabric connects these processor
complexes and allows multi-processing with all eight cores if required.

-  .. rubric:: T186
      :name: t186

The NVIDIA® Parker (T186) series system-on-chip (SoC) delivers a heterogeneous
multi-processing (HMP) solution designed to optimize performance and
efficiency.

T186 has two NVIDIA Denver 2 Arm® CPU cores plus four Arm Cortex®-A57 cores
in a coherent multiprocessor configuration. The Denver 2 and Cortex-A57 cores
support Armv8, executing both 64-bit AArch64 code and 32-bit AArch32 code,
including legacy Armv7 applications. The Denver 2 processors each have 128 KB
Instruction and 64 KB Data Level 1 caches, and share a 2 MB Level 2 unified
cache. The Cortex-A57 processors each have 48 KB Instruction and 32 KB Data
Level 1 caches, and also share a 2 MB Level 2 unified cache. A high-speed
coherency fabric connects these two processor complexes and allows
heterogeneous multi-processing with all six cores if required.

Denver is NVIDIA's own custom-designed, 64-bit, dual-core CPU, which is fully
compatible with the Armv8-A architecture. Each of the two Denver cores
implements a 7-way superscalar microarchitecture (up to 7 concurrent micro-ops
can be executed per clock) and has a 128 KB 4-way L1 instruction cache and a
64 KB 4-way L1 data cache. A 2 MB 16-way L2 cache is shared by both cores.

Denver implements a technique called Dynamic Code Optimization, which
optimizes frequently used software routines at runtime into dense, highly
tuned microcode-equivalent routines. These are stored in a dedicated 128 MB
optimization cache in main memory. After being read into the instruction
cache, the optimized micro-ops are executed, and are re-fetched and executed
from the instruction cache for as long as needed and as capacity allows.

Effectively, this reduces the need to re-optimize the software routines.
Instead of using hardware to extract the instruction-level parallelism (ILP)
inherent in the code each time it runs, Denver extracts the ILP once using
software techniques and then executes the optimized routines repeatedly,
amortizing the cost of ILP extraction over many executions.
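
The amortization argument can be written as a simple illustrative cost model.
The symbols are introduced here for explanation only and are not taken from
NVIDIA documentation: :math:`C_{\mathrm{opt}}` is the one-time cost of
extracting ILP from a routine in software, :math:`C_{\mathrm{exec}}` is the
cost of one execution of the optimized routine, and :math:`N` is the number of
times the routine runs from the optimization cache. The average cost per
execution is then

.. math::

   \bar{C}(N) = \frac{C_{\mathrm{opt}}}{N} + C_{\mathrm{exec}},
   \qquad
   \lim_{N \to \infty} \bar{C}(N) = C_{\mathrm{exec}}

so the optimization overhead per execution shrinks as a routine is reused,
which is why the approach favors frequently executed code.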

Denver also features low-latency power-state transitions, in addition to
extensive power gating and workload-based dynamic voltage and clock scaling.

-  .. rubric:: T210
      :name: t210

T210 has four Arm® Cortex®-A57 cores in a switched configuration with a
companion set of four Arm Cortex-A53 cores. The Cortex-A57 and Cortex-A53
cores support Armv8-A, executing both 64-bit AArch64 code and 32-bit AArch32
code, including legacy Armv7-A applications. The Cortex-A57 processors each
have 48 KB Instruction and 32 KB Data Level 1 caches, and share a 2 MB Level 2
unified cache. The Cortex-A53 processors each have 32 KB Instruction and 32 KB
Data Level 1 caches, and share a 512 KB Level 2 unified cache.

Directory structure
-------------------

-  ``plat/nvidia/tegra/common`` - Common code for all Tegra SoCs
-  ``plat/nvidia/tegra/soc/txxx`` - Chip-specific code

Trusted OS dispatcher
---------------------

Tegra supports multiple Trusted OSes.

- Trusted Little Kernel (TLK): To include the ``tlkd`` dispatcher in the
  image, pass ``SPD=tlkd`` on the command line when building the BL31 image.
- Trusty: To include the ``trusty`` dispatcher in the image, pass
  ``SPD=trusty`` on the command line when building the BL31 image.

Because the dispatcher is selected on the command line, other Trusted OS
vendors can use the upstream code and include their dispatchers in the image
without changing any makefiles.

The Trusted OSes supported by each Tegra platform are:

- Tegra210: TLK and Trusty
- Tegra186: Trusty
- Tegra194: Trusty

Scatter files
-------------

Tegra platforms currently support both scatter files and ``ld.S`` linker
scripts. The scatter files allow the ARMLINK linker to generate BL31 binaries.
At present, a single common scatter file, ``plat/nvidia/tegra/scat/bl31.scat``,
is used for all Tegra SoCs. For the scatter file to be used, the ``LINKER``
build variable must point to the ARMLINK binary. BL31 image generation with
ARMCLANG (compilation) and ARMLINK (linking) has been verified on Tegra186
platforms.

Preparing the BL31 image to run on Tegra SoCs
---------------------------------------------

.. code:: shell

    CROSS_COMPILE=<path-to-aarch64-gcc>/bin/aarch64-none-elf- make PLAT=tegra \
    TARGET_SOC=<target-soc e.g. t194|t186|t210> SPD=<dispatcher e.g. trusty|tlkd> \
    bl31

Platforms that need a different TZDRAM base address can add
``TZDRAM_BASE=<value>`` to the build command line.

The Tegra platform code expects BL2 to pass, in register ``x1``, a pointer to
the following platform-specific structure. The
``bl31_early_platform_setup()`` handler uses it to extract the base and size
of the TZDRAM carveout into which the Trusted OS is loaded, and the ID of the
UART port to use. The Tegra memory controller driver programs this base and
size to restrict non-secure (NS) accesses to the carveout.

.. code:: c

    typedef struct plat_params_from_bl2 {
        /* TZ memory size */
        uint64_t tzdram_size;
        /* TZ memory base */
        uint64_t tzdram_base;
        /* UART port ID */
        int uart_id;
        /* L2 ECC parity protection disable flag */
        int l2_ecc_parity_prot_dis;
        /* SHMEM base address for storing the boot logs */
        uint64_t boot_profiler_shmem_base;
    } plat_params_from_bl2_t;
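
The following C sketch illustrates how BL31 might consume this structure. It
is not the upstream Tegra implementation: the ``struct tegra_bl31_params``
type and the ``tegra_copy_bl2_params()`` and ``tegra_addr_in_tzdram()``
helpers are hypothetical. The sketch treats the value received in ``x1`` as a
pointer to ``plat_params_from_bl2_t``, copies the carveout and UART fields
after basic sanity checks, and tests whether an address falls inside the
carveout that the memory controller protects from NS accesses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as plat_params_from_bl2_t documented above. */
typedef struct plat_params_from_bl2 {
	uint64_t tzdram_size;              /* TZ memory size */
	uint64_t tzdram_base;              /* TZ memory base */
	int uart_id;                       /* UART port ID */
	int l2_ecc_parity_prot_dis;        /* L2 ECC parity protection disable flag */
	uint64_t boot_profiler_shmem_base; /* SHMEM base for boot logs */
} plat_params_from_bl2_t;

/* Hypothetical BL31-side copy of the fields it needs. */
struct tegra_bl31_params {
	uint64_t tzdram_base;
	uint64_t tzdram_size;
	int uart_id;
};

/*
 * Hypothetical helper: treat the value BL2 left in x1 as a pointer to
 * plat_params_from_bl2_t and copy out the carveout and UART fields.
 * Rejects a NULL pointer, an empty carveout, and a carveout whose end
 * would wrap past the top of the 64-bit address space.
 */
static bool tegra_copy_bl2_params(uintptr_t x1, struct tegra_bl31_params *out)
{
	const plat_params_from_bl2_t *p = (const plat_params_from_bl2_t *)x1;

	if ((p == NULL) || (out == NULL))
		return false;
	if ((p->tzdram_size == 0U) ||
	    (p->tzdram_base > UINT64_MAX - p->tzdram_size))
		return false;

	out->tzdram_base = p->tzdram_base;
	out->tzdram_size = p->tzdram_size;
	out->uart_id = p->uart_id;
	return true;
}

/*
 * Hypothetical helper: true if addr lies inside [base, base + size), the
 * range the memory controller is programmed to protect from NS accesses.
 */
static bool tegra_addr_in_tzdram(const struct tegra_bl31_params *prm,
				 uint64_t addr)
{
	return (addr >= prm->tzdram_base) &&
	       ((addr - prm->tzdram_base) < prm->tzdram_size);
}
```

For example, with the hypothetical values ``tzdram_base = 0x80000000`` and
``tzdram_size = 0x01000000``, ``tegra_addr_in_tzdram()`` returns true for
``0x80FFFFFF`` and false for ``0x81000000``. The range check is written as
``addr - base < size`` so that ``base + size`` is never computed and cannot
overflow.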

Power Management
----------------

The PSCI implementation expects each platform to expose the power state
parameter to be used for the SYSTEM_SUSPEND call. The StateID field of this
parameter is implementation defined on Tegra SoCs and is preferably defined in
``tegra_def.h``.
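
As a sketch of where the implementation-defined StateID sits, the C code below
packs a value in the PSCI "original" power_state format: StateID in bits
[15:0], StateType in bit 16 and PowerLevel in bits [25:24]. The macro and
function names, and the StateID value ``0x2C``, are hypothetical; on Tegra the
real StateID values are expected to come from ``tegra_def.h``, and the PSCI
extended power_state format lays the fields out differently.

```c
#include <stdint.h>

/* PSCI "original" power_state format, per the PSCI specification. */
#define PS_STATE_ID_MASK     0xFFFFU /* bits [15:0]: StateID (implementation defined) */
#define PS_STATE_TYPE_SHIFT  16U     /* bit 16: StateType, 0 = standby/retention, 1 = powerdown */
#define PS_PWR_LVL_SHIFT     24U     /* bits [25:24]: PowerLevel */
#define PS_PWR_LVL_MASK      0x3U

#define PS_TYPE_POWERDOWN    1U

/* Hypothetical StateID; a Tegra port would take its value from tegra_def.h. */
#define HYPOTHETICAL_PSTATE_ID_SOC_POWERDN 0x2CU

/* Pack PowerLevel, StateType and StateID into a 32-bit power_state value. */
static uint32_t make_power_state(uint32_t pwr_lvl, uint32_t type,
				 uint32_t state_id)
{
	return ((pwr_lvl & PS_PWR_LVL_MASK) << PS_PWR_LVL_SHIFT) |
	       ((type & 1U) << PS_STATE_TYPE_SHIFT) |
	       (state_id & PS_STATE_ID_MASK);
}

/* Extract the implementation-defined StateID field. */
static uint32_t power_state_id(uint32_t power_state)
{
	return power_state & PS_STATE_ID_MASK;
}
```

With power level 2 (chosen purely as an example), StateType powerdown and
StateID ``0x2C``, ``make_power_state()`` returns ``0x0201002C``, and
``power_state_id()`` recovers ``0x2C`` from it.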

Tegra configs
-------------

-  ``tegra_enable_l2_ecc_parity_prot``: This flag enables the L2 ECC and parity
   protection bit for Arm Cortex-A57 CPUs during CPU boot. Tegra SoCs enable
   this flag on cluster power-up and on exit from system suspend.
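
The enable policy can be sketched in C as a pure function on the Cortex-A57
``L2CTLR_EL1`` value. Two assumptions are labeled in the code: the ECC and
parity enable bit is taken to be bit 21 of ``L2CTLR_EL1`` (check the
Cortex-A57 Technical Reference Manual), and the ``l2_ecc_parity_prot_dis``
field passed from BL2 is taken to be what suppresses the setting. The function
name is hypothetical, and real firmware would access the register with
``mrs``/``msr`` rather than passing the value around.

```c
#include <stdint.h>

/* Assumption: bit 21 of Cortex-A57 L2CTLR_EL1 is "ECC and parity enable". */
#define A57_L2CTLR_ECC_PARITY_EN (UINT64_C(1) << 21)

/*
 * Hypothetical helper: given the current L2CTLR_EL1 value and the
 * l2_ecc_parity_prot_dis flag from BL2 (assumed to be the opt-out),
 * return the value to write back. ECC/parity protection is turned on
 * unless BL2 asked for it to be disabled; in that case the register
 * value is returned unchanged.
 */
static uint64_t l2ctlr_apply_ecc_policy(uint64_t l2ctlr,
					int l2_ecc_parity_prot_dis)
{
	if (l2_ecc_parity_prot_dis != 0)
		return l2ctlr;
	return l2ctlr | A57_L2CTLR_ECC_PARITY_EN;
}
```

For example, an ``L2CTLR_EL1`` value of ``0x3`` becomes ``0x200003`` when
``l2_ecc_parity_prot_dis`` is 0, and is returned unchanged when it is 1.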