November, 2010

Multicore and More
Freescale's Multicore Technologies

Alex Peck
Field Applications Engineering
Agenda

► Power Architecture Multicore Roadmap

► E5500 64-bit core Architecture

► Data Path Acceleration Architecture

► Starcore DSP Roadmap

► Software and Tools

► Green Hills Presentation

► Don’t Miss Jeff Logan’s Migration to QorIQ session!
Power Architecture® and Communications Processor Roadmap
Multicore Solutions in the Heart of Our Connected World

- Stagnating CapEx drives increased CapEx efficiency
- Ability to deliver ‘more services’ at lower CapEx …
- Service density and data deluge of network traffic drives significant opportunities in Multicore SoC
- Freescale closes “The Gap” with “Balanced, Application Driven Architecture”:
  - Smart multicore devices
  - Targeted application acceleration
  - Hardware assisted virtualization
  - Aggressive process technology
  - Extensive ecosystem and VortiQa multicore optimized software

“The Gap”
Exponentially increasing performance demands cannot be met by Moore’s Law alone.

[Figure: growth relative to 2008 (2008 = 1) of normalized IP traffic, normalized subscribers, normalized CapEx, and normalized SAM, compared with Moore’s Law]
Continued innovation in hardware architectures
- QorIQ™: Broadest scalable family of processors in the market
  - Evolution from PowerQUICC® family
  - Dual core @ 800 MHz at < 5 Watts
  - Eight cores @ 1.5 GHz/core at 30 Watts
- StarCore® DSP solutions
  - Up to 1.0 GHz in 3-6 core configurations with advanced accelerators
- Industry leading integration and Communication Engines

Increasing software investment
- Optimized Multicore Solutions
- Hybrid software simulation environment and debug tools
  - Production ready software with VortiQa solutions
- Fast time to market
  - Simplified migration to multicore architecture
  - More flexibility to create a uniquely differentiated product

45nm high-performance technology in production

Service Provider
Enterprise
Consumer Access
Industrial and Aerospace
IBM and Freescale Collaboration on Power Architecture within Power.org

► Power Architecture Advisory Council – PAAC - IBM and Freescale
  • Maintain integrity of the ISA over its evolution – open architecture
  • Collaboration on technology innovations

► Recent Innovations - ISA 2.04/2.05/2.06
  • Added support for multi-core, virtualization and hypervisor
  • Additional instructions: Write and pre-fetch instruction for improved performance

► Technical Working Groups
  • Common debug methodology – single industry wide approach
  • Hypervisor
    ▪ Full Virtual CPU Virtualization
    ▪ Para-virtualization, API H-call interface for embedded PAPR
  • Simulation modeling – framework for compatibility between simulation tools
  • ABI – Application Binary Interface - Ecosystem enablement

► Future Innovations within the Architecture
  • Power management
  • Virtual CPU/Hypervisor
  • 64b Architecture
  • Multi-core
Power Architecture™ Cores Portfolio

QorIQ P5

- e500mc plus:
  - 64b ISA2.06
  - 36 bit addressing (64GB) per process
  - Full speed FPU
  - Extended Branch Predictor for 64b mode
  - Additional Integer/FP instructions
  - Support 512KB BS L2
  - Supports 32-bit mode for software legacy
  - Support Hypervisor/ Trust Architecture (secure Boot, anti-Tamper/Detection)

QorIQ P3/P4

- e500v2 plus:
  - Support Hypervisor/ Trust Architecture (secure Boot, anti-Tamper/Detection)
  - Support DP FPU (classic), decorated L/S instructions
  - Support 128K/256K BS L2, 64 entries MMU TLB variable size, 64B CL
  - Designed for CoreNet Coherency Fabric, double snoop BW

PowerQuicc, QorIQ P2/P1

- 32b PPC-E
- OOO, Dual-Issue, 7-stage pipeline
- Support SPE and EFPU
- Designed for Shared Bus, supports SC/DC

Freescale, the Freescale logo, AltiVec, C-5, CodeTEST, CodeWarrior, ColdFire, C-Ware, mobileGT, PowerQUICC, StarCore, and Symphony are trademarks of Freescale Semiconductor, Inc., Reg. U.S. Pat. & Tm. Off. BeeKit, BeeStack, CoreNet, the Energy Efficient Solutions logo, Flexis, MXC, Platform in a Package, Processor Expert, QorIQ, QUICC Engine, SMARTMOS, TurboLink and VortiQa are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. © 2010 Freescale Semiconductor, Inc.
Roadmap: Freescale Processors Built on Power Architecture® Technology

First Generation: 45nm
- PowerQUICC I
- PowerQUICC II
- e600 + SoC

Next Generation: 45nm & 28nm
- QorIQ – P1: P1020/P1021/E500V2
- QorIQ – P2: P2020/P2011/E500V2
- QorIQ – P3: P3041/E500MC
- QorIQ – P4: P4080/P4040/E500MC
- QorIQ – P5: P5020/P5010/E5500

Decreasing Power
- High Performance within Embedded Power Budget of 30W
- Performance at Reasonable Power
- Value Priced for Power/Performance Applications
- Power Sensitive Applications

Increasing Performance
- Continuous enhancement of application performance
- Enhanced Virtualization Support
- Trust Architecture & H/W Accelerators
- Increasing # of cores
- Application performance H/W enhancements
- Power Sensitive Multicore Trust Architecture & H/W Accelerators
- Increasing # of cores
- High core frequency

Next Generation:
- Next Gen – T5: Higher core frequency
- Next Gen – T4: Application H/W accelerators
- Next Gen - T3: Increasing # of cores
- Next Gen - P1: Power Sensitive Multicore Trust Architecture & H/W Accelerators


### Embedded Challenge

<table>
<thead>
<tr>
<th>Embedded Challenge</th>
<th>Bus Architecture</th>
<th>Fabric Arch</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>Core performance</td>
<td>Power Architecture e500 and e600 cores</td>
<td>Power Architecture e500 core w Front and backside cache</td>
<td>Moving to e500 cores across the family, common ISA from 1 to 8 cores</td>
</tr>
<tr>
<td>System Performance</td>
<td>Classic interrupt shared BUS architecture</td>
<td>Point to point Cross-bar Fabric</td>
<td>Balanced architecture between cores/IO/accelerators</td>
</tr>
<tr>
<td>Embedded Power Budgets</td>
<td>&lt;4 W to 30 W max power</td>
<td>&lt;4 W to 30 W max power</td>
<td>Year-on-year improvement of system performance within the embedded power budget</td>
</tr>
<tr>
<td>Trusted computing and Virtualization</td>
<td>Software based virtualization</td>
<td>Hardware/software Virtualization/Trusted Computing</td>
<td>Highest level of Secure Boot and anti-threat protection in the industry</td>
</tr>
<tr>
<td>Quick time to market</td>
<td>Simple debug</td>
<td>The Multicore challenge: Multiple flows with multiple points of failure</td>
<td>Advanced hardware debug support and software modeling capabilities</td>
</tr>
</tbody>
</table>

### Freescale’s Multicore System Architecture

#### PowerQUICC/QorIQ P1, P2

- **CPU**: Shared Bus, Bridge, I/O, Accel
- **DRAM**: Bridge, DRAM

#### QorIQ P3, P4

- **CPU**: Shared Bus, Cross-bar, Bridge, Accel, DRAM
- **DRAM**: Bridge, DRAM
Platform Interconnect is Critical to Delivering Multicore Scalability

- Multicore interconnects must address:
  - **Scalability** of CPU cores, memory and I/O bandwidth
  - Flexible inter-processor communication **programming models**
  - **QoS** differentiation for control/data plane and network traffic
  - **Efficient** memory subsystem including caching and hardware coherency

- The CoreNet™ interconnect fabric on the QorIQ™ P4080 addresses the scalability needs of multicore processors

---

Multicore Communications Processors: QorIQ™ P4080

- **8x e500 Superscalar Cores**
- **Tri-level Cache Hierarchy**
  - CoreNet On-chip Fabric
    - Eliminates shared bus contention and supports dramatically higher address issue bandwidth to ‘feed’ multiple cores
- **Hardware Virtualization Support**
- **On-demand Application Acceleration – DPAA**
- **Data Path Acceleration Architecture**
- **Industry-leading Performance, Process**
- **Advanced 45nm process technology**

**Application driven, balanced multicore architecture**

High performance cores, coupled with on demand application acceleration, on chip fabric interconnect and high speed interconnect.
QorIQ P1 Platform – P1020

- **Dual e500 Power Architecture™ core**
  - 533 – 800 MHz
  - 256KB Frontside L2 cache w/ECC, HW cache coherent
  - 36 bit physical addressing, DP-FPU

- **System Unit**
  - 32-bit DDR2/DDR3 with ECC to 800MHz datarate
  - Integrated SEC 3.3 Security Engine
  - Open-PIC Interrupt Controller, Perf Mon, 2x I2C, Timers, 16 GPIOs, DUART
  - 16-bit Enhanced Local Bus supports booting from NAND Flash
  - Two USB 2.0 Controllers Host/Device support
  - SPI controller supporting booting from SPI serial Flash
  - SD/MMC card controller supporting booting from Flash cards
  - TDM interface
  - Three 10/100/1000 Ethernet Controllers (eTSEC) w/ Jumbo Frame support, SGMII interface
    - Enhanced features: Parser/Filer, QOS, IP-Checksum Offload, Lossless Flow Control
    - IEEE1588v2 Support
  - Two PCI Express 1.0a Controllers operating at up to 2.5 Gbps
  - Power Management

- **Process & Package**
  - 45nm SOI, XX +/- XX, 0C to 125C Tj
    - with -40C to 125C Tj option
  - 689-pin TePBGAII, 31x31mm, 1.0mm pitch
QorIQ™ P1 First Derivative – P1021

- Dual e500 core; 533 - 800 MHz
  - 256KB Frontside L2 cache w/ECC, HW cache coherent
  - 36 bit physical addressing, DP-FPU
- System Unit
  - 32-bit DDR2/DDR3, 800 MHz data rate w/ECC
  - Integrated SEC 3.3 Security Engine
  - Open-PIC Interrupt Controller, Perf Mon, 2x I2C, Timers, 16 GPIOs, DUART
  - 16-bit Enhanced Local Bus supports booting from NAND Flash
  - USB 2.0 Controllers Host/Device support
  - SPI controller supporting booting from SPI serial Flash
  - SD/MMC card controller supporting booting from Flash cards
  - Three 10/100/1000 Ethernet Controllers (eTSEC) w/Jumbo Frame support, SGMII interface
    • IEEE1588v2 Support
  - QUICC Engine for protocol offload and legacy interfaces
    • TDM interfaces with HDLC support
    • UTOPIA-L2 interface for ATM support
  - Two PCI Express 1.0a Controllers operating up to 2.5Gbps
- Power Management
  - 45nm SOI, 0.95V+/−50mV, -40C to 125C Tj
  - 689-pin TePBGAII
What’s New? - QorIQ™ P3 Series P3041 Block Diagram

Quad e500mc Power Architecture®
- 4 cores (up to 1.5GHz)
- Each with 128KB backside L2 cache
- 1MB Shared L3 Cache w/ECC

Memory Controller
- DDR3/3L SDRAM up to 1.3 GHz
- 32/64 bit data bus w/ECC

High Speed Interconnect
- 4 PCIe 2.0 Controllers
- 2 sRapidIO 2.1 Controllers
  - Type 9 and 11 messaging
- 2 SATA 2.0

CoreNet Switch Fabric

Ethernet
- 5 x 10/100/1000 Ethernet Controllers
  - Or 4x 2.5Gb/s SGMII
- 1 x 10GE Controller
- All w/ Classification, H/W Queueing, policing, and Buffer Management, Checksum Offload, QoS, Lossless Flow Control, IEEE 1588
- Up to 1 XAUI, 4 SGMII or 2.5Gb/s SGMII, 2 RGMII

Datapath Acceleration

Device
- 45nm SOI Process
- 1295-pin package, pin compatible with P4040
  - 37.5x37.5mm
Freescale QorIQ Platforms - Trust Architecture

Protection Against

- **Theft of Functionality** - loss of control of the system’s functionality
- **Theft of Data** - where a data protection policy exists, loss of data to an unauthorized party
- **Theft of Uniqueness** - loss of product differentiation through reverse engineering, duplication, and unapproved inter-operability.

Relying on

- **Secure Boot** – start from Trusted code base or don’t start at all
- **Strong Partitioning of the System** – isolation of cores from each other to provide redundancy and data corruption protection between critical functions
- **Threat detection** – internal and external security event detection
- **Secure Debug**
Trusted Boot and Hypervisor

Secure Platform Boot: Configured to boot from on-chip ROM

- CPU#0 begins to boot from on-chip ROM, all other CPUs held in reset

- CPU executing from on-chip trusted boot code (provided by FSL) performs initial SoC configuration and health checks, verifies a signature over the Hypervisor micro-kernel, stored in the NV RAM of OEM’s choice

Secure boot ensures that the system begins executing trusted code. This trusted code can test the trustworthiness of other system code before allowing it to execute.

Note: ‘Trusted’ = passes signature check. Don’t sign it if you don’t trust it!
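
The verify-then-start rule above can be sketched as follows. This is a simplified stand-in, not Freescale's boot ROM code: it compares a SHA-256 digest against a provisioned reference value, whereas the slide describes a signature check over the hypervisor image. The names `verify_image` and `boot`, and the image bytes, are hypothetical.

```python
import hashlib
import hmac


def verify_image(image: bytes, trusted_digest: bytes) -> bool:
    """Return True only if the image hashes to the provisioned reference digest.

    Assumption: a plain SHA-256 comparison stands in for the public-key
    signature check that the on-chip trusted boot code would perform.
    """
    actual = hashlib.sha256(image).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(actual, trusted_digest)


def boot(image: bytes, trusted_digest: bytes) -> str:
    """Start from a trusted code base or do not start at all."""
    if not verify_image(image, trusted_digest):
        return "halt"  # untrusted image: nothing further is released to run
    return "run hypervisor"


# Hypothetical hypervisor image and the digest provisioned for it.
hypervisor = b"hypothetical hypervisor micro-kernel image"
reference = hashlib.sha256(hypervisor).digest()
```

With these definitions, `boot(hypervisor, reference)` proceeds, while any modified image is rejected before it executes.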
## Power Architecture Growing Market Reach - Single Board Computer Partners

*Subset of a comprehensive partner ecosystem*

<table>
<thead>
<tr>
<th></th>
<th>ATCA</th>
<th>AMC</th>
<th>COM-express</th>
<th>Compact-PCI</th>
<th>VME</th>
<th>PMCs</th>
<th>ATX, µATX</th>
</tr>
</thead>
<tbody>
<tr>
<td>Freescale</td>
<td>✓</td>
<td>✓</td>
<td>✓</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Curtiss Wright</td>
<td></td>
<td></td>
<td></td>
<td>✓</td>
<td></td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>KONTRON</td>
<td>✓</td>
<td></td>
<td></td>
<td></td>
<td>✓</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Emerson Network Power</td>
<td>✓</td>
<td>✓</td>
<td></td>
<td>✓</td>
<td>✓</td>
<td>✓</td>
<td></td>
</tr>
<tr>
<td>EuroTech</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>✓</td>
<td></td>
</tr>
<tr>
<td>GE Intelligent Platforms</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>✓</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Interphase</td>
<td>✓</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>✓</td>
<td></td>
</tr>
<tr>
<td>Mercury</td>
<td>✓</td>
<td>✓</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>RadiSys</td>
<td>✓</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>TQ Embedded</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>✓</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
What’s New? - Easing the “Make vs Buy” Decision

- Freescale Development Systems with Production ready COMe
  - QorIQ products P4080, P2020, P1022
  - Linux BSP from Mentor
  - CodeWarrior 90-day license

- Availability - $1499/System
  - P2020COME-DS-PB  November, 2010
  - P4080COME-DS-PB  December, 2010
  - P1022COME-DS-PB  January, 2011

- Emerson Network Power production ready COMe boards
  - COMX-P2020
    - P2020 – dual core @ 1.2GHz/core
    - X4 USB, PCIe and 3 GigE ports
    - 2 GB of DDR3 – 800 MHz (not Included)
    - 2-3D Graphics Processor Unit
  - COMX-P4080
    - P4080 – 8 cores @1.5GHz/core
    - X8 USB, 3 GigE and PCIe ports
    - Local Bus
    - Dual banks of 2 GB of DDR3 - 1333 MHz (not included)
  - COMX-P1022
    - P1022 – Dual core @ 1.067 GHz/core
    - w/Integrated Digital Display Output
    - X4 USB, x2 PCIe, dual GigE ports
    - X2 SATA
    - 2 GB DDR3-800 MHz
In the News: Freescale Announced Strategic Alliance

Freescale has signed new strategic partnerships with Enea, Green Hills and Mentor Graphics for Freescale’s QorIQ, PowerQUICC and StarCore portfolios

These deep partnerships call for unprecedented levels of collaboration across the entire silicon lifecycle
  • IP sharing
  • Joint investments in technology and product roadmaps
  • Go-to-market partnership

Establishes extremely comprehensive enablement support for QorIQ, PowerQUICC and StarCore devices

Plans call for adding more strategic partners over time
The Ecosystem to Enable the Connected World

World Class Alliances
Strategic Technology Collaboration

Architecture
Alliance

Hardware
Hardware Partners

Power Architecture®
Technology

Applications
Secure Networks

Tools/OS
Software Partners

Value Partners: Enable faster time to market and longer time in market

Optimize application specific stacks for continual improvement in network security solutions

Development and production systems
in standard industry form factors

Partner logos: Wind River, Mocana, D2, ARINC, CriticalBlue, Kaspersky, CodeWarrior

e200 e300 e500 e600
SOC integrated devices
Embedded power budgets
Networking life cycles
Networking/security IP
Content Aware Packet Processing

Freescale’s Product Longevity Program

► Freescale has a longstanding track record of providing long-term production support for our products

► Freescale is pleased to provide a formal product longevity program for the market segments we serve
  
  • For all market segments in which Freescale participates, Freescale will make a broad range of devices available for a minimum of 10 years
  
  • Life cycles begin at the time of launch

► A list of participating Freescale products is available at: www.freescale.com/productlongevity
e5500 Overview

P5020 Core Architecture
QorIQ™ 64-bit Core

It’s a smarter approach to multicore. Freescale’s e5500 Core

► Next Generation 64-bit Core Architecture for higher performance, computational intensive applications.
  • 64-bit ISA support (Power Architecture v2.06 compliant)
  • Increased addressable memory space
  • Supports up to 2GHz CPU frequency

► High Performance Classic Floating Point Unit (FPU) for Industrial applications.
  • Supports IEEE Std. 754™ FPU Double Precision Floating Point

► Hybrid 32-bit mode to support legacy software and transition to 64-bit architecture.
  • Register settings allow users to utilize 32-bit mode or 64-bit mode, easing transition to 64-bit architecture

Introducing e5500

• Based on the e500mc Architecture with 64-bit ISA
• Core frequency up to 2GHz
• Up to 64GB addressable memory space
• Supports up to 512KB backside L2 cache
• High performance classic FPU
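
The 64GB figure follows directly from the 36-bit physical address width quoted elsewhere in this deck. A minimal worked check, with `addressable_bytes` as a hypothetical helper name:

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits


GIB = 1 << 30  # bytes per GiB

physical_gib = addressable_bytes(36) // GIB  # 36-bit physical addressing
legacy_gib = addressable_bytes(32) // GIB    # 32-bit addressing, for comparison
```

Here `physical_gib` is 64 and `legacy_gib` is 4, so the four extra address bits give sixteen times the reach of a 32-bit address.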
e500mc Improvements

Memory Subsystem

- Double cache line size (32 bytes to 64 bytes)
  - Implicit prefetching
  - Reduces address and snoop bandwidth
- Improved snoop capabilities for multicore systems
  - Snoops require half as many LSU cycle slots compared to e500v2
  - Snoop-misses (common case) can be sustained every other core cycle (every platform cycle) indefinitely
  - Protocol improvements have removed the need for snooping instruction accesses
  - Backside L2 reduces bus traffic relative to no backside L2
- Improved MMU
  - L2 TLB now supports 64 variable size pages (up from 16)
  - L1 TLB now supports 8 variable size pages (up from 4)
  - Maintains 512 4KB entries
  - Most embedded applications will never suffer an MMU miss
- Improved flow of data between datapath subsystem and cores
  - Highly optimal, low-latency method keeps cores busy
- Improved lock/mutex support
  - Removed one bus access for most lock/mutex operations (lwarx/stwcx)
- Improved statistics support
  - “Decorated Storage” APU provides fire-and-forget atomic updates of up to two 64-bit quantities with a single access
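
The effect of doubling the cache line size on address and snoop traffic can be estimated with simple arithmetic. The sketch below assumes that each cache line touched costs one address transaction, and uses a hypothetical 1500-byte buffer; neither assumption comes from the slide.

```python
def lines_touched(buffer_bytes: int, line_bytes: int, start_offset: int = 0) -> int:
    """Count the cache lines covered by a buffer.

    start_offset is the buffer's byte offset from a line-aligned address.
    """
    if buffer_bytes <= 0:
        return 0
    first = start_offset // line_bytes
    last = (start_offset + buffer_bytes - 1) // line_bytes
    return last - first + 1


packet = 1500  # hypothetical buffer size in bytes

lines_32 = lines_touched(packet, 32)  # 32-byte lines (e500v2)
lines_64 = lines_touched(packet, 64)  # 64-byte lines (e500mc)
```

For this buffer the count drops from 47 lines to 24, roughly halving the number of address and snoop transactions under the stated assumption.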
ISA Improvements

- Hypervisor provides protection and partitioning guarantees for multicore systems
- Special purpose “statistics instructions” (a.k.a. decorated stores)
- “Classic” floating point in place of SPE floating point
  - Compatible with e300 and e600 cores

Private backside L2

- Provides low-latency access to private cache
- Provides up to 4x more private cache resources for locking cache lines into the L2 (as well as L1), to facilitate determinism, fast interrupt handlers, etc
- Flexible allocation modes:
  - Unified: all 8 ways can be used for instruction or data
  - D-only: all 8 ways are reserved for data
  - I-only: all 8 ways are reserved for instructions
  - Per-way: N ways are reserved for data, 8-N ways are reserved for instructions
- Reduces snoop traffic in the system
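
The per-way allocation mode can be illustrated with a small capacity calculation. This sketch assumes the cache is split evenly across its 8 ways; `l2_partition` is a hypothetical helper, not a register-level description.

```python
L2_WAYS = 8  # the backside L2 is 8-way, per the allocation modes above


def l2_partition(l2_kib: int, data_ways: int) -> dict:
    """Split an 8-way L2 into data and instruction capacity for per-way mode."""
    if not 0 <= data_ways <= L2_WAYS:
        raise ValueError("data_ways must be between 0 and 8")
    way_kib = l2_kib // L2_WAYS  # capacity of a single way
    return {
        "data_kib": data_ways * way_kib,
        "instruction_kib": (L2_WAYS - data_ways) * way_kib,
    }


# 128KB L2 with 6 ways reserved for data and 2 for instructions.
split = l2_partition(128, 6)
```

With a 128KB L2, each way holds 16KB, so N = 6 reserves 96KB for data and 32KB for instructions. D-only and I-only correspond to N = 8 and N = 0.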
# Core Comparison (e500)

<table>
<thead>
<tr>
<th></th>
<th>e500v1 and v2</th>
<th>e500mc</th>
<th>e5500<sup>4</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Max Frequency</strong></td>
<td>1.5GHz</td>
<td>1.5 GHz</td>
<td>2GHz</td>
</tr>
<tr>
<td><strong>Dhrystone (DMIPS/MHz)</strong></td>
<td>2.4</td>
<td>2.5</td>
<td>3.0</td>
</tr>
<tr>
<td><strong>Pipeline depth / Width</strong></td>
<td>7 / 2</td>
<td>7 / 2</td>
<td>7 / 2</td>
</tr>
<tr>
<td><strong>Integer Units</strong></td>
<td>3</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td><strong>GFLOPs</strong></td>
<td>SP FP = 2 OP/cycle<sup>5</sup></td>
<td>SP FP = 1 OP/cycle<sup>2</sup></td>
<td>SP FP = 2 OP/cycle<sup>2</sup></td>
</tr>
<tr>
<td></td>
<td>DP FP = 1 OP/cycle<sup>5</sup></td>
<td>DP FP = 0.5 OP/cycle<sup>2</sup></td>
<td>DP FP = 2 OP/cycle<sup>2</sup></td>
</tr>
<tr>
<td><strong>Floating-Point</strong></td>
<td>Embedded</td>
<td>Classic</td>
<td>Classic</td>
</tr>
<tr>
<td><strong>Vector support</strong></td>
<td>SPE</td>
<td>&lt;none&gt;</td>
<td>&lt;none&gt;</td>
</tr>
<tr>
<td><strong>Cache line size</strong></td>
<td>32 bytes</td>
<td>64 bytes</td>
<td>64 bytes</td>
</tr>
<tr>
<td><strong>L1 I and D caches</strong></td>
<td>32K 8-way PLRU</td>
<td>32K 8-way PLRU</td>
<td>32K 8-way PLRU</td>
</tr>
<tr>
<td><strong>Backside private cache</strong></td>
<td>&lt;none&gt;</td>
<td>128KB 8-way backside L2 per core, PLRU replacement</td>
<td>512KB 8-way backside L2 per core, PLRU replacement</td>
</tr>
<tr>
<td><strong>Frontside shared cache</strong></td>
<td>256-1024KB 8-way L2</td>
<td>2MB 32-way CPC<sup>1</sup></td>
<td>2MB 32-way CPC<sup>1</sup></td>
</tr>
<tr>
<td><strong>Branch direction prediction</strong></td>
<td>512-entry, two-bit</td>
<td>512-entry, two-bit</td>
<td>512-entry, two-bit</td>
</tr>
</tbody>
</table>

1. P4080 Implementation
2. Pre-silicon calculation
3. Includes private backside cache
4. 64-bit core with 36-bit physical addressing
5. V2 core with SPE only
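
The per-cycle figures in the comparison table convert to peak throughput by multiplying by the maximum core frequency. The sketch below uses only the table's values and assumes the Dhrystone row is expressed in DMIPS/MHz. These are theoretical peaks, not measured results.

```python
# Values copied from the comparison table; "sp" and "dp" are FP ops per cycle.
CORES = {
    "e500v2": {"ghz": 1.5, "sp": 2.0, "dp": 1.0, "dmips_per_mhz": 2.4},
    "e500mc": {"ghz": 1.5, "sp": 1.0, "dp": 0.5, "dmips_per_mhz": 2.5},
    "e5500": {"ghz": 2.0, "sp": 2.0, "dp": 2.0, "dmips_per_mhz": 3.0},
}


def peak_gflops(core: str, precision: str) -> float:
    """Peak GFLOPS = operations per cycle x frequency in GHz."""
    spec = CORES[core]
    return spec[precision] * spec["ghz"]


def peak_dmips(core: str) -> float:
    """Peak DMIPS = DMIPS/MHz x frequency in MHz (assumed unit for the Dhrystone row)."""
    spec = CORES[core]
    return spec["dmips_per_mhz"] * spec["ghz"] * 1000
```

For example, the e5500 at 2 GHz works out to a peak of 4 GFLOPS in double precision, against 0.75 GFLOPS for the e500mc at 1.5 GHz.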
## Advantages of e5500

<table>
<thead>
<tr>
<th>Features</th>
<th>Benefits</th>
</tr>
</thead>
<tbody>
<tr>
<td>64-bit ISA Support</td>
<td>Lets the core operate on twice as much data per CPU cycle (64-bit vs. 32-bit), which increases performance for computationally intensive applications with large data sets.<br>Increased addressable memory space makes programming easier because a single process can have a larger address space, and it enables more complex applications that need more memory.</td>
</tr>
<tr>
<td>7-stage Pipeline with Out-of-Order Execution</td>
<td>Allows the core to continue doing productive work in the event of a stalled instruction or a wrong branch prediction.</td>
</tr>
<tr>
<td>Floating Point Unit</td>
<td>Classic double-precision floating point is supported, which allows for faster, more accurate computation.</td>
</tr>
<tr>
<td>Backside L2 Cache</td>
<td>Provides a lower-latency cache with higher bandwidth to the core, enabling higher performance, and reduces the transactions on the shared interconnect and DDR memory.</td>
</tr>
<tr>
<td>Up to 2GHz CPU Frequency</td>
<td>Higher frequency provides additional performance for 32-bit and 64-bit applications. Applications with complex numerical algorithms will particularly benefit from the combination of 64-bit support and higher frequency.</td>
</tr>
</tbody>
</table>
## e5500 Ecosystem Overview

<table>
<thead>
<tr>
<th>Ecosystem Partner</th>
<th>Solution Offering for e5500</th>
</tr>
</thead>
<tbody>
<tr>
<td>Enea</td>
<td>Real Time Operating System support</td>
</tr>
<tr>
<td>Green Hills</td>
<td>Complete portfolio of software &amp; hardware development tools, trace tools and real-time operating systems</td>
</tr>
<tr>
<td>Mentor Graphics</td>
<td>Commercial grade Linux solution</td>
</tr>
<tr>
<td>CodeSourcery</td>
<td>Tool chain support for new core technology</td>
</tr>
<tr>
<td>Virtutech</td>
<td>Provides Simics model of core technology to enable early 64-bit development.</td>
</tr>
<tr>
<td>Power.org</td>
<td>Power.org supports the Power Architecture™ core technology using the new ISA v2.06</td>
</tr>
</table>
What’s New? - QorIQ™ P5 Series P5020 Block Diagram

- **Dual e500mc-64 Power Architecture®**
  - 2x 64-bit e500mc cores (up to 2 GHz)
  - Each with 512 KB backside L2 cache
  - Dual 1MB Shared L3 Cache w/ECC
  - Supports up to 64GB addressability (36 bit physical addressing)

- **Memory Controller**
  - Dual DDR3, 3L up to 1.3 GHz
  - 32/64 bit data bus w/ECC

- **High Speed Interconnect**
  - 4 PCIe 2.0 Controllers
  - 2 SRIO 2.1 Controllers
    - Type 9 and 11 messaging
  - 2 SATA 3Gb/s
  - 2 USB 2.0 with PHY

- **CoreNet Switch Fabric**

- **Ethernet**
  - 5 x 10/100/1000 Ethernet Controllers
  - 1 x 10GE Controller (XAUI)
  - All w/ Classification, H/W Queuing, Policing, Buffer Management, Checksum Offload, QoS, Lossless Flow Control, IEEE 1588v2
  - 4 SGMII

- **Datapath Acceleration**
  - SEC 4
  - PME 2
  - RapidIO Messaging

- **Device**
  - 45nm SOI Process
  - 1295-pin package: Pin compatible with P4080 and P3041

AltiVec Technology on the e5500 Core

► Moving the AltiVec technology to the QorIQ processor family
  • Aligns with the hardware accelerator strategy – offload processing to dedicated functions/applications
  • Utilizing the QorIQ platform power management architecture to manage power of all functions on the device

► The initial core will be e5500 + AltiVec
  • 64-bit core with next-generation Floating Point Unit (increase over e500mc)
  • AltiVec 128-bit SIMD unit which operates independent of Scalar Integer and Floating Point Units
  • Improved functionality
    - Vector Absolute Difference function – a single-cycle operation that previously took multiple lines of code
    - Improved Load and Store instructions – resolve cumbersome alignment issues and improve performance
    - Gated clocks to minimize dynamic power

► Freescale software enablement support of internally-developed and externally-supplied libraries
An Introduction to the QorIQ™ Data Path Acceleration Architecture (DPAA)
What is DPAA?
Multicore Data-Path Issues
DPAA Components
- FMAN
- QMAN
- BMAN
- Hardware Accelerators (SEC, PME)
Use-case scenarios
- A Day in the Life of a Packet
- Handling Packets from External I/O
Leveraging DPAA Performance
Summary
What is the Datapath Acceleration Architecture (DPAA)?

The QorIQ™ DPAA is a comprehensive architecture which integrates all aspects of packet processing in the SoC, addressing issues and requirements resulting from the multicore nature of QorIQ™ SoCs.

► The DPAA includes:
  • Cores
  • Network and packet I/O
  • Hardware offload accelerators
  • The infrastructure required to facilitate the flow of packets between the above

The DPAA also addresses various performance related requirements especially those created by the high speed network I/O found on multicore SoCs such as the P4080
Multicore Datapath Issues and Requirements

Multicore SoCs, like the P4080, have a number of new requirements related to packet processing when compared to single core SoCs:

- Load spreading of arriving packets across pools of cores
- Packet ordering issues after processing
- Pipelined processing of packets using cores
- “Virtualization” or sharing of hardware accelerators and network I/O
- Inter-core communication
“Infrastructure” components
- Queue Manager (QMan)
- Buffer Manager (BMan)

Network I/O
- Frame Manager (FMan)

Hardware accelerators
- SEC – cryptographic accelerator
- PME – Pattern matching engine

Cores

CoreNet is not part of the DPAA but it provides the interconnect between the cores and the DPAA infrastructure as well as access to memory (DRAM)
DPAA infrastructure replaces descriptor rings:

- Queueing is split from buffer management and from the passing of frames to/from cores
- Queues can be shared by multiple cores
- Data reception is not throttled by how fast software can service ring entries
- Data can be stashed into cache just before it is processed, not when it is received
Queue Manager (QMan) supports:

► Low latency, prioritized queuing of descriptors between cores, network I/O and accelerators

► Lockless shared queues for load spreading and device “virtualization”

► Order restoration as well as order preservation through queue affinity

► Active queue management (WRED)

► Optimized core interface which can pre-position data/context/descriptors in core’s cache

► Delivery of per-queue accelerator specific commands and context information to offload accelerators along with dequeued descriptors
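
Order restoration can be pictured as a reorder buffer keyed by a sequence number assigned when each frame is handed to a core. The sketch below is a hypothetical software model of that idea, not a description of QMan's hardware mechanism; `OrderRestorer` and its methods are illustrative names.

```python
class OrderRestorer:
    """Release frames in their original sequence even if cores finish out of order."""

    def __init__(self) -> None:
        self.next_seq = 0   # sequence number of the next frame allowed to leave
        self.pending = {}   # finished frames still waiting for earlier ones

    def complete(self, seq: int, frame: str) -> list:
        """Record a finished frame and return the frames that can now be released."""
        self.pending[seq] = frame
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


restorer = OrderRestorer()
first = restorer.complete(1, "frame-1")   # finished early: held back
second = restorer.complete(0, "frame-0")  # unblocks both frames, in order
```

Order preservation through queue affinity avoids the problem instead: if one flow is always served by the same core, its frames never overtake each other.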
Frame Queues (FQs) are the basic queuing structure supported by QMan
- FIFO lists of Frame Descriptors (FDs)
- Each FD describes a frame which is a delineated piece of data (e.g. a packet) in buffer(s) in memory
- Multi-buffer frames are described using Scatter/Gather Tables
- FQs are in turn enqueued on Work Queues (WQs)

A channel is a collection of 8 WQs which have priority relative to each other
- Class scheduling is performed at a channel
- FQs are an ordered list of frames which need to be processed in the same way
- WQs are an ordered list of FQs which all have the same priority

A portal is a hardware interface used to access QMan facilities (e.g. enqueue or dequeue), possibly for multiple channels
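
The FQ, WQ and channel hierarchy can be modelled with nested FIFOs. The sketch below is a simplification under stated assumptions: it serves the 8 work queues in strict priority order with index 0 highest, whereas the actual class scheduling at a channel may be more elaborate. `Channel`, `schedule` and `dequeue` are hypothetical names.

```python
from collections import deque

WQS_PER_CHANNEL = 8


class Channel:
    """A channel holding 8 work queues; each WQ is a FIFO of frame queues."""

    def __init__(self) -> None:
        # Assumption: index 0 is the highest-priority work queue.
        self.work_queues = [deque() for _ in range(WQS_PER_CHANNEL)]

    def schedule(self, frame_queue: deque, priority: int) -> None:
        """Enqueue a frame queue (a FIFO of frame descriptors) on one WQ."""
        self.work_queues[priority].append(frame_queue)

    def dequeue(self):
        """Return the next frame descriptor, or None if the channel is empty."""
        for wq in self.work_queues:
            while wq:
                fq = wq[0]
                if not fq:
                    wq.popleft()  # drop an FQ that has no frames left
                    continue
                fd = fq.popleft()
                if not fq:
                    wq.popleft()  # FQ drained by this dequeue
                return fd
        return None


channel = Channel()
channel.schedule(deque(["low-1", "low-2"]), priority=5)
channel.schedule(deque(["high-1"]), priority=1)
order = [channel.dequeue() for _ in range(4)]
```

In this model the frame on the higher-priority WQ is served first, then the two frames of the lower-priority FQ in FIFO order, and finally `None` once the channel is empty.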
DPAA Infrastructure: BMan

Buffer Manager (BMan) supports:

► 64 pools of buffer pointers
  • All buffers in a pool have “like” characteristics
  • BMan places no restrictions on these characteristics

► Hardware (and software) acquire and release of buffer pointers from/to pools
  • BMan is primarily intended to reduce the buffer management load on SW

► BMan keeps a small per-pool stockpile of buffer pointers in internal memory
  • Absorbs “bursts” of acquire/release without external memory access
  • Reduces acquire latency

► Pools (list of pointers) overflow into DRAM

► Pool depletion thresholds for pool replenishment and lossless flow control
  • All thresholds have hysteresis
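
Depletion thresholds with hysteresis can be sketched as a pool that enters the depleted state at one free-buffer count and leaves it only at a higher one. This is a hypothetical software model; the class name, threshold parameters and buffer values are assumptions, not BMan register definitions.

```python
class BufferPool:
    """Free-buffer pool with separate depletion entry and exit thresholds."""

    def __init__(self, buffers, enter_depletion: int, exit_depletion: int) -> None:
        if exit_depletion <= enter_depletion:
            raise ValueError("exit threshold must be above entry threshold")
        self.free = list(buffers)
        self.enter_depletion = enter_depletion
        self.exit_depletion = exit_depletion
        self.depleted = False
        self._update_state()

    def _update_state(self) -> None:
        count = len(self.free)
        if not self.depleted and count <= self.enter_depletion:
            self.depleted = True    # signal replenishment or flow control
        elif self.depleted and count >= self.exit_depletion:
            self.depleted = False   # only clears once well above the entry level

    def acquire(self):
        """Take one buffer pointer from the pool, or None if it is empty."""
        buf = self.free.pop() if self.free else None
        self._update_state()
        return buf

    def release(self, buf) -> None:
        """Return one buffer pointer to the pool."""
        self.free.append(buf)
        self._update_state()


pool = BufferPool(range(4), enter_depletion=1, exit_depletion=3)
taken = [pool.acquire() for _ in range(3)]  # one buffer left: pool is depleted
```

The gap between the two thresholds keeps the depleted signal from toggling on every acquire and release near the boundary.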
Frame Manager (FMan) supports:
► One 10GE MAC and 4 GE MACs
  • Max 12xGE parse+classify
► L2/L3/L4 protocol parsing and validation
  • User defined protocols supported
► Hash based queue selection
► Exact match classification queue selection
► IEEE 1588 timestamping
► RMON/ifMIB stats
► Color aware dual rate, 3 color policing
► “Right size” buffer acquisition from BMan buffer pools
  • Picks buffer based on RX’ed frame size
► Per port egress rate limiting
► TCP/UDP TX checksum calculation
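
Hash based queue selection can be illustrated as follows. The slides do not specify the hash FMan uses or which header fields feed it, so this sketch uses CRC32 over a 5-tuple purely as a stand-in; the function name and parameters are hypothetical.

```python
import zlib


def select_frame_queue(src_ip, dst_ip, src_port, dst_port, proto,
                       base_fqid, num_queues):
    """Map a flow's 5-tuple onto one of num_queues consecutive frame queues."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return base_fqid + zlib.crc32(key) % num_queues


# Every packet of the same flow lands on the same queue, which keeps the
# flow in order while different flows spread across queues (and cores).
fq_first = select_frame_queue("10.0.0.1", "10.0.0.2", 1234, 80, 6, 0x100, 8)
fq_again = select_frame_queue("10.0.0.1", "10.0.0.2", 1234, 80, 6, 0x100, 8)
```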
Core Interface: QMan Software Portals

- Software portals provide the DPAA interface to cores and software
  - Portal per core
  - Can be used by a core to access multiple channels or queues directly

- Low latency lock free dequeue and enqueue of descriptors

- Portals can work closely with a core to (optionally) position:
  - Descriptors
  - Packet data
  - Software defined per queue context or state information in L1 or L2 cache

- Queues can be “held” on a portal to ensure temporary affinity for order preservation
Security Engine SEC 4.0 supports
- Public key cryptography
- Random number generation
- Cryptographic authentication
  - SHA-1, SHA-2, MD5
- Encryption and decryption
  - DES, 3DES, ARC, AES, Kasumi, Snow…
- From ~2 Gbps to >10 Gbps depending on algorithm
- Advanced protocol support
  - IPsec, SSL/TLS, LinkSec/MacSec…
- Run time integrity checking
Pattern Matching Engine (PME) 2.x

- Regex support plus significant extensions:
  - Patterns can be split into 256 sets each of which can contain 16 subsets
  - 32K patterns of up to 128B length
  - 9.6 Gbps raw performance

- Combined hash/NFA technology
  - No “explosion” in number of patterns due to wildcards
  - Low system memory utilization
  - Fast pattern database compiles and incremental updates

- Matching across “work units” finds patterns in streamed data

- The Pattern Matching Engine utilizes a pipeline of processing blocks to provide a complete pattern matching solution
RAID5/6 Engine

- RAID5/RAID6 Parity generation
- Configurable GF polynomial
- Dual parity generation in single pass
- Up to 16 sources
- Scatter gather support
- Descriptor pre-fetch
- Data Integrity Field (DIF) support
  - Called “Protection Information” in T10 SBC-3r17 spec
  - On-the-fly DIF add, check and remove
- Block sizes of 512 B, 1 KB, 2 KB and 4 KB
- Extensive command set
- Supports 2 x 10Gbps host BW for RAID5/6
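
Dual parity generation in a single pass with a configurable GF polynomial can be sketched in software as below. The sketch assumes the commonly used RAID6 construction (P is the XOR of the data blocks, Q is the sum of g^i · D_i in GF(2^8) with g = 2) and the polynomial 0x11D as a default; the slides do not state which polynomial or construction the engine uses, and the function names are hypothetical.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8), reducing by the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def pq_parity(blocks, poly=0x11D):
    """Compute P and Q parity over equally sized blocks in one pass."""
    size = len(blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    coeff = 1  # g^0 for the first source
    for block in blocks:
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] ^= gf_mul(coeff, byte, poly)
        coeff = gf_mul(coeff, 2, poly)  # next power of the generator
    return bytes(p), bytes(q)


data = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
p, q = pq_parity(data)
```

A single lost data block can be rebuilt from P alone by XORing P with the surviving blocks; Q is what makes recovery from two simultaneous losses possible.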
RapidIO Message Manager

- RapidIO Rev 1.3 Compliant with 2.1 features
- Dual controllers
  - 1.25/2.5/3.125/5GBaud operation
    - 1x, 2x, 4x operation
- Extensive Transaction Type support
  - Type 9 Data Streaming
  - Type 10 Doorbells
  - Type 11 messaging
  - NWRITE/SWRITE
  - Port-write
- Support for hundreds of ingress/egress queues
- Robust QoS
- Direct interworking between Ethernet and RapidIO in hardware
  - No runtime CPU intervention required
“Virtualized” Accelerator Interface

- SEC and PME are integrated into the DPAA
  - Acquire/release buffer pointers from/to BMan
  - Dequeue and enqueue frames from QMan
- QMan “virtualizes” these HW accelerators
- QMan provides processing “context” and instructions with dequeued frames
  - e.g. crypto keys, IVs, ciphersuite
  - Simplifies software’s use of accelerators
FMan/QMan Ingress Packet Processing

1. Packets arrive
2. Buffer acquisition request
3. Packet data written to the main memory subsystem
4. Packet data stored in hardware-managed buffers

[Diagram: classification-driven enqueue distribution of references to packets; frontside cache; DDR SDRAM]
A Day in the Life of a Packet
## Leveraging Data Path Acceleration Support

<table>
<thead>
<tr>
<th>Offload</th>
<th>Feature</th>
<th>Advantage</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Ingress</strong></td>
<td>Hash calculation</td>
<td>Packet distribution to multiple cores, flow-pinning, table lookup</td>
</tr>
<tr>
<td></td>
<td>Coarse classification</td>
<td>Offload stateless ACL processing</td>
</tr>
<tr>
<td></td>
<td>Packet parsing</td>
<td>Avoid software overhead</td>
</tr>
<tr>
<td><strong>Generic</strong></td>
<td>Hardware buffer management</td>
<td>No buffer alloc/free operations in software</td>
</tr>
<tr>
<td></td>
<td>Hardware queue management</td>
<td>Simpler packet Rx/Tx, efficient stashing (to L1/L2), leaves room in cache for other data</td>
</tr>
<tr>
<td><strong>Egress</strong></td>
<td>Hardware QoS</td>
<td>Avoid software overhead, mitigate DoS attacks, prioritize CPU cycles</td>
</tr>
<tr>
<td><strong>Core</strong></td>
<td>Backside L2 cache</td>
<td>Faster access for multiple flow tables</td>
</tr>
<tr>
<td><strong>Look-Aside</strong></td>
<td>Protocol-aware cryptography</td>
<td>Offload protocol encapsulation/decapsulation, sequence tracking etc.</td>
</tr>
</tbody>
</table>
Data Path Acceleration Advantage

Data Path Acceleration provides up to 2x-3x improvement

Each added feature incrementally reduces cycles and increases throughput.
The QorIQ Datapath Acceleration Architecture components include:

- Queue Manager
- Buffer Manager
- Frame Manager
- Hardware accelerators such as SEC and PME

Together these components address multicore requirements including:

- Load spreading
- Packet ordering
- Device virtualization
- Inter-core communication
- HW buffer management
Freescale High Performance Multicore StarCore DSPs
Recognizing Performance

Freescale SC3850 DSP core earns highest BDTImark2000™ score to date

Freescale StarCore® SC3850 core technology used in the MSC8156 multicore DSP has garnered leading benchmark results from independent signal-processing technology analysis firm, Berkeley Design Technology, Inc. (BDTI)

Winning Customers

20+ design wins at basestation customers in 2009

8 out of the top 10 basestation customers chose Freescale in 2009

Beat out Texas Instruments in all cases.
MSC8156 Block Diagram

6x SC3850 Core Subsystems (6 GHz/48 GMACS total), each with:
- SC3850 DSP core at up to 1 GHz (8 GMACS, 16-bit or 8-bit)
- 512 KB unified L2 cache / M2 memory
- 32 Kbyte I-cache, 32Kbyte D-cache, WBB, WTB, MMU, PIC
- Fully Programmable

Internal/External Memories/Caches
- 1056 KByte M3 shared memory (SRAM)
- Two DDR 2/3 64-bit SDRAM interfaces at up to 800 MHz

CLASS – Chip-Level Arbitration & Switching Fabric
- Non-Blocking, fully pipelined, low latency
- Full fabric 12 masters to 8 slaves, up to 512 Gbps throughput

MAPLE-B – Accelerator Block
- Turbo/Viterbi Decoder up to 200/115 Mbps
- Fourier Transform accelerator up to 350 Msps FFT and 175 Msps DFT

Security Engine (Talitos 3.1) (Optional)
- Data and Code Protection (AES, SHA, RC-4, Kasumi, SNOW)

High Speed Interconnects
- Dual 4x/1x Serial RapidIO at 1.25/2.5/3.125 Gbaud
- PCI-e 4x/1x

Dual RISC QUICCEngine® supporting
- Dual SGMII/RGMII Gigabit Ethernet ports
- Eth. Protocols, Talitos control and sRIO offload

TDM Highway
- 1024 ch., 400Mbps, divided into 4 ports of 256

DMA Engine 16 bi-directional channels
8 hardware semaphores

Other Peripheral Interfaces
- SPI, UART, I2C, 32 GPIO, 16 Timers, 96KB boot ROM, JTAG/SAP, 8WDT

Technology
- Process: 45nm SOI
- Voltage: 1 V core; 2.5 V, 1.8 V/1.5 V I/O
- Package: FC-PBGA (29x29), 1 mm pitch, RoHS
MSC8256 Block Diagram

6x SC3850 Core Subsystems (6 GHz/48 GMACS total), each with:
- SC3850 DSP core at up to 1 GHz (8 GMACS, 16-bit or 8-bit)
- 512 KB unified L2 cache / M2 memory
- 32 Kbyte I-cache, 32Kbyte D-cache, WBB, WTB, MMU, PIC
- Fully Programmable

Internal/External Memories/Caches
- 1056 KByte M3 shared memory (SRAM)
- Two DDR 2/3 64-bit SDRAM interfaces at up to 800 MHz

CLASS – Chip-Level Arbitration & Switching Fabric
- Non-Blocking, fully pipelined, low latency
- Full fabric 12 masters to 8 slaves, up to 512 Gbps throughput

High Speed Interconnects
- Dual 4x/1x Serial RapidIO at 1.25/2.5/3.125 Gbaud
- PCI-e 4x/1x

Dual RISC QUICCEngine® supporting
- Dual SGMII/RGMII Gigabit Ethernet ports
- Eth. Protocols, Talitos control and sRIO offload

Ethernet
- Dual Gigabit Ethernet ports (SGMII/RGMII)

TDM Highway
- 1024 ch., 400Mbps, divided into 4 ports of 256

DMA Engine 16 bi-directional channels

Other Peripheral Interfaces
- SPI, UART, I2C, 32 GPIO, 16 Timers, 96KB boot ROM, JTAG/SAP, 8WDT

Technology
- Process: 45nm SOI
- Voltage: 1 V core; 2.5 V, 1.8 V/1.5 V I/O
- Package: FC-PBGA (29x29), 1 mm pitch, RoHS
## New Product Rack and Stack

<table>
<thead>
<tr>
<th>Device</th>
<th>8156</th>
<th>8154</th>
<th>8152</th>
<th>8151</th>
<th>8256</th>
<th>8254</th>
<th>8252</th>
<th>8251</th>
</tr>
</thead>
<tbody>
<tr>
<td>SC3850 DSP Cores</td>
<td>6</td>
<td>4</td>
<td>2</td>
<td>1</td>
<td>6</td>
<td>4</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>Core Speed</td>
<td>1GHz</td>
<td>1GHz</td>
<td>1GHz</td>
<td>1GHz</td>
<td>1GHz</td>
<td>1GHz</td>
<td>1GHz</td>
<td>1GHz</td>
</tr>
<tr>
<td>Core Performance (16-bit MMACs)</td>
<td>Up to 48000</td>
<td>Up to 32000</td>
<td>Up to 16000</td>
<td>Up to 8000</td>
<td>Up to 48000</td>
<td>Up to 32000</td>
<td>Up to 16000</td>
<td>Up to 8000</td>
</tr>
<tr>
<td>Shared M3 Memory</td>
<td>1MB</td>
<td>1MB</td>
<td>1MB</td>
<td>1MB</td>
<td>1MB</td>
<td>1MB</td>
<td>1MB</td>
<td>1MB</td>
</tr>
<tr>
<td>I-Cache (per core)</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
</tr>
<tr>
<td>D-Cache (per core)</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
<td>32 KB</td>
</tr>
<tr>
<td>L2 Cache / M2 Memory (per core)</td>
<td>512KB</td>
<td>512KB</td>
<td>512KB</td>
<td>512KB</td>
<td>512KB</td>
<td>512KB</td>
<td>512KB</td>
<td>512KB</td>
</tr>
<tr>
<td>DDR2/3</td>
<td>2 (800MHz)</td>
<td>2 (800MHz)</td>
<td>2 (800MHz)</td>
<td>2 (800MHz)</td>
<td>2 (800MHz)</td>
<td>2 (800MHz)</td>
<td>2 (800MHz)</td>
<td>2 (800MHz)</td>
</tr>
<tr>
<td>PCIe</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>GEMAC (RGMII, SGMII)</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>sRIO</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>TDM</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>SPI</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>UART</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>I²C</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>FFT/DFT Accelerators</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>Security</td>
<td>AES, SHA, RC4, Kasumi, SNOW</td>
<td>AES, SHA, RC4, Kasumi, SNOW</td>
<td>AES, SHA, RC4, Kasumi, SNOW</td>
<td>AES, SHA, RC4, Kasumi, SNOW</td>
<td>AES, SHA, RC4, Kasumi, SNOW</td>
<td>AES, SHA, RC4, Kasumi, SNOW</td>
<td>AES, SHA, RC4, Kasumi, SNOW</td>
<td></td>
</tr>
<tr>
<td>Proc. Tech.</td>
<td>45nm SOI</td>
<td>45nm SOI</td>
<td>45nm SOI</td>
<td>45nm SOI</td>
<td>45nm SOI</td>
<td>45nm SOI</td>
<td>45nm SOI</td>
<td>45nm SOI</td>
</tr>
<tr>
<td>Package</td>
<td>783 Ball FC-PBGA</td>
<td>783 Ball FC-PBGA</td>
<td>783 Ball FC-PBGA</td>
<td>783 Ball FC-PBGA</td>
<td>783 Ball FC-PBGA</td>
<td>783 Ball FC-PBGA</td>
<td>783 Ball FC-PBGA</td>
<td>783 Ball FC-PBGA</td>
</tr>
</tbody>
</table>
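
The "Core Performance" row follows directly from the core count and clock. As a small worked check, assuming 8 MACs per cycle per core (implied by the 8 GMACS at 1 GHz quoted for the SC3850) and a hypothetical helper name:

```python
def peak_mmacs(cores, clock_mhz=1000, macs_per_cycle=8):
    """Peak 16-bit MMACS for a device: cores x clock (MHz) x MACs per cycle."""
    return cores * clock_mhz * macs_per_cycle


# Core counts of the 1-, 2-, 4- and 6-core variants in the table above.
table = {cores: peak_mmacs(cores) for cores in (1, 2, 4, 6)}
```

These are peak figures; the table's "Up to" wording reflects that sustained throughput depends on the workload.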
### MSC825x DSP Power Consumption

<table>
<thead>
<tr>
<th>Device</th>
<th>Power Consumption</th>
</tr>
</thead>
<tbody>
<tr>
<td>MSC8256@1GHz</td>
<td>6W</td>
</tr>
<tr>
<td>MSC8256@800MHz</td>
<td>5.5W</td>
</tr>
<tr>
<td>MSC8254@1GHz</td>
<td>4.7W</td>
</tr>
<tr>
<td>MSC8254@800MHz</td>
<td>4.4W</td>
</tr>
<tr>
<td>MSC8252@1GHz</td>
<td>3.5W</td>
</tr>
<tr>
<td>MSC8251@1GHz</td>
<td>2.9W</td>
</tr>
</tbody>
</table>

Typical power values were estimated assuming DSP cores running at 1 V, each at 75% utilization; a single 64-bit DDR3 interface running at 800 MHz with 50% utilization; M3 memory 50% utilized; TDM enabled with 20% loading; and one RGMII port at 1 Gbps with 50% loading, at a junction temperature of 60°C.
# Multicore High Performance DSP comparison

<table>
<thead>
<tr>
<th></th>
<th colspan="2">Freescale multicore parts</th>
<th>TI multicore part</th>
</tr>
<tr>
<th>Device</th>
<th>MSC8156</th>
<th>MSC8256</th>
<th>C6472-7</th>
</tr>
</thead>
<tbody>
<tr>
<td>ASP</td>
<td>$150.94</td>
<td>$134.77</td>
<td>$210.00</td>
</tr>
<tr>
<td>Performance (BDTImark2000™)</td>
<td>92,520</td>
<td>92,520</td>
<td>79,020</td>
</tr>
<tr>
<td>Performance (GHz)</td>
<td>6 x 1.0GHz</td>
<td>6 x 1.0GHz</td>
<td>6 x 700MHz</td>
</tr>
<tr>
<td>PCI</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>SRIO</td>
<td>2 (4 lanes)</td>
<td>2 (4 lanes)</td>
<td>1 (2 lanes)</td>
</tr>
<tr>
<td>DDR</td>
<td>2xDDR3 (800MHz)</td>
<td>2xDDR3 (800MHz)</td>
<td>DDR2 (500MHz)</td>
</tr>
<tr>
<td>TCP/VCP</td>
<td>1</td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>DFT/FFT</td>
<td>1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Process Technology</td>
<td>45nm</td>
<td>45nm</td>
<td>65nm</td>
</tr>
</tbody>
</table>

FSL advantage:

## Optimized MAPLE-B
- FFT (128-2048)
- DFT (12-1536)
- Viterbi coprocessor & Turbo coprocessor

## Scalable solution
- Pin compatible 1,2,4,6 Core versions
- Full code compatibility between all DSPs

## CLASS fast switch fabric
- Implemented as a single module, so data transfers behave uniformly
- Non-blocking, full-fabric interconnect
- Supplemented by dual FAST DDR2/3 (800MHz) controller
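
Returning to the figures in the comparison table, a benchmark-score-per-dollar ratio makes the price/performance gap explicit. This is a back-of-the-envelope sketch that reads the table's three value columns as MSC8156, MSC8256 and C6472-7; the dictionary layout and function name are illustrative only.

```python
# ASP and BDTImark2000 values as quoted in the comparison table.
parts = {
    "MSC8156": {"asp_usd": 150.94, "bdtimark2000": 92520},
    "MSC8256": {"asp_usd": 134.77, "bdtimark2000": 92520},
    "C6472-7": {"asp_usd": 210.00, "bdtimark2000": 79020},
}


def score_per_dollar(part):
    """Benchmark score divided by average selling price."""
    return part["bdtimark2000"] / part["asp_usd"]


ratios = {name: score_per_dollar(p) for name, p in parts.items()}
ranking = sorted(ratios, key=ratios.get, reverse=True)
```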
Easy-to-use Development Tools and Training

**MSC825x/815x ADS Board**
- $3900 (includes 1 Year Free CodeWarrior Tools Subscription)
- On-board Emulation

**CodeWarrior Software Development Tools**
- New Eclipse IDE
- Trace and Profile, SmartDSP OS, Debugger, C/C++ Compiler

**Software Migration Tools**
- Texas Instruments C64x+ to FSL SC3850 Migration Tools
- DSP libraries

**Documentation and Available Support**

www.freescale.com/dsp
- Device and Tool Fact Sheets
- Product Data Sheets
- Freescale DSP forums
- System block diagrams on all target end equipments
StarCore Easy to Use Development Tools

CodeWarrior® IDE
- Eclipse-based

StarCore® Build Tools
- ‘C’ & ‘C++’ Optimizing Compilers, Linker, ASM, Utilities

Debugger
- Multicore and Multi-DSP support
- Full access and control
- USB and Ethernet TAP probes for silicon debug

Trace & Profile
- Support of advanced debug & profiling capabilities/analysis
- MSC8256 silicon & simulator targets

Software Simulators
- Core Platform Cycle Accurate
- Device Functional Accurate

SmartDSP-OS RTOS
- Field deployed
- Fully pre-emptive
- Royalty free
- Built-in device drivers for MAPLE-B, Serial RapidIO, PCIe, VIC, Eth, TDM, DMA, SPI, I2C
Multicore Software Technology
Hardware Multicore Implementations

**Single Core with Hardware Accelerators**

[Diagram: CPU, bridge, DRAM, I/O and accelerator on a shared bus]

- Sequential operations that cannot be multi-threaded
- Hardware acceleration provides more power/performance efficiency than software

**Homogeneous Multi/Many Core**

**With or Without accelerators**
**Shared or Distributed Memory**

[Diagram: four CPUs with accelerators and I/O]

- Easier programming environment
- Easier migration of legacy code
- Lack of specialized hardware for differing tasks

**Heterogeneous Multi/Many Core**

[Diagram: CPU, GPU, DSP and CPU with accelerator, I/O and FPU]

- Specialized hardware for different tasks
- Most power/performance efficient
- Software complexity and portability

---

**Increasing Software Complexity**
Varied Multicore Programming Models Required

### Symmetric Multi Processing
- **Single OS on all cores**
- Applications can run on any core
- Common implementation in Desktops

### Virtualization
- **Many OS instances on a core**
- Common implementation in Servers
- **Goal** – consolidate servers, increase utilization

### Asymmetric Multi Processing
- **Many OSes on dedicated cores**
- Common implementation in embedded markets
Multicore Software Solution Model

► MC Applications:
  • VortiQa security Applications
  • SMP And AMP Models
  • Component Model For Scalability

► Com Stacks/APIs:
  • Si Optimized
  • Open And Scalable

► Linux:
  • Control Plane Processing
  • SMP Support

► Light Weight Executive (LWE):
  • Data Path Acceleration Library
  • Run To Completion

► BSP:
  • Si Optimized
  • Full Featured
  • Open Source

► HyperVisor:
  • Security & Separation
  • Messaging among Cores
  • System-level Event Handling
  • Debug Support

Freescale, the Freescale logo, AltiVec, C-5, CodeTEST, CodeWarrior, ColdFire, C-Ware, mobileGT, PowerQUICC, StarCore, and Symphony are trademarks of Freescale Semiconductor, Inc., Reg. U.S. Pat. & Tm. Off. BeeKit, BeeStack, CoreNet, the Energy Efficient Solutions logo, Flexis, MXC, Platform in a Package, Processor Expert, QorIQ, QUICC Engine, SMARTMOS, TurboLink and VortiQa are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. © 2010 Freescale Semiconductor, Inc.
QorIQ™ P4080/4040 Multicore Programming Paradigm

- **Support a variety of customer use-cases**
  - Multiple operating systems utilized across cores on a single device
  - Proprietary, 3rd party and Open Source multicore operating systems
  - Symmetric Multi-Processing (SMP) and Asymmetric Multi-Processing (AMP), often running concurrently
  - Often bare-metal, or engineered light-OS, used on forwarding/data plane cores

- **Freescale has developed a reference development platform:**
  - Freescale embedded reference Hypervisor
  - Freescale boot standards, including u-boot
  - Leverage open boot protocol and API standards (e.g. Power.org™)
  - Freescale Light Weight Executive (LWE) for run to completion data plane processing
  - Demonstrate performance and provide reference example for customers

Light Weight Executive Concept

- **Set of C-libraries** needed to support data plane applications (C++ planned)
- **Run-to-completion** software model
  - Processes do not pre-empt each other: each process runs to completion before any other process gets a chance to run, as scheduled by QMan (the implicit work scheduler)
  - IRQs are supported; software is responsible for deferring the actual processing to an SWI or for implementing a proper protection/sharing mechanism
- Device Trees for LWE configuration
- Runs in supervisor state
- Dependency on Hypervisor
- Hypercalls used to access Hypervisor functionality
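
The run-to-completion model described above can be sketched as a loop that takes one frame at a time from ingress frame queues, processes it fully, and enqueues the result on an egress frame queue. This is a conceptual illustration only: the real LWE is a set of C libraries, a data-plane loop would normally poll indefinitely, and the function name, queue representation and egress mapping here are assumptions.

```python
from collections import deque


def run_to_completion(ingress_fqs, egress_fqs, handler):
    """Drain the ingress queues; each frame is fully handled before the next.

    No handler invocation is pre-empted by another: work is only picked up
    between frames, which stands in for QMan acting as the work scheduler.
    """
    processed = 0
    while any(ingress_fqs):
        for idx, fq in enumerate(ingress_fqs):
            if fq:
                frame = fq.popleft()
                result = handler(frame)                           # runs to completion
                egress_fqs[idx % len(egress_fqs)].append(result)  # assumed mapping
                processed += 1
    return processed


ingress = [deque([1, 2]), deque([3])]
egress = [deque(), deque()]
count = run_to_completion(ingress, egress, lambda frame: frame * 10)
```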

[Diagram: Ingress Channel (FQs) → Function → Egress Channel (FQs)]
Hypervisor Architecture Overview

- Hypervisor partitions system “spatially” into separate domains
- Guests run in separate partitions
- Separation of domains enforced by virtualization capabilities of e500-mc core and P4080 SoC
Hypervisor Features

Operating System sees a virtual core plus hypervisor services

- Virtual CPU (like e500mc minus hypervisor features)
- Services via hypercall
  - Interrupt controller
  - IOMMU
  - Inter-partition doorbells
  - Partition management
  - Byte-channels
  - Power management
  - Error management
  - HA Failover
- Debug stub interface for debugging guest operating systems
Partition Management

Capabilities

- Copy data to/from another partition’s memory (e.g. loading OS images)
- Starting, rebooting other partitions
- Notifications—watchdog expiration, guest requests reboot, state change
- Linux `partman` command implements basic partition management features

[Diagram: a hypervisor running on multicore system hardware (CPUs, shared cache, interrupt controller, memory, I/O) and hosting three partitions, running Linux®, an RTOS and a legacy OS, each with its own applications]
Each partition has a guest event queue for partition specific errors.

A global error queue is owned by a partition designated to be an “error manager”.

The guests implement policies specific to their needs.
Debugging

- Debug of guest operating systems is supported using hypervisor-resident debug agents
- Transport over multiplexed serial interface
- CodeWarrior and GDB supported
- Plug-in architecture for creating stubs

Byte-channel: a hypercall-based character I/O channel

Flexible endpoint configuration

- A physical UART on the QorIQ™ P4080
- Another byte-channel endpoint
- A byte-channel to UART multiplexer
- A hypervisor debug stub
- The hypervisor console

Multicore Software Development Kit (SDK)
P4080 SDK Architecture Today

Linux User Space
- PME Compiler
- FM Config Script
- Other Linux Apps
- Partition Manager
- LWE/Apps Image
- LWE CP Apps

LWE Applications
- IPFwd
- Pktwire
- PME
- IPSec
- Bridging
- QM tester
- Crypto
- FM Tester

Linux Kernel Space
- PME Driver
- BM Driver
- Legacy Drivers
- QM Driver
- FM Config Driver
- SEC Driver
- Ethernet Driver

LWE
- Mem Mgt
- SEC Driver
- Initialization
- BM Driver
- PME Driver
- Statistics
- QM Driver
- Atomic Calls
- Timer
- Byte Channel
- Inter Process Communication

Hypervisor
- Virtual CPU
- Interrupt controller
- Error Mgmt
- Boot services
- IPI
- U-Boot
- Secure Boot
- GNU Tools
- IOMMU
- Byte-channels
- Guest debugging
- Power Mgmt
- Partition Mgmt
- Integration / Packaging
Future High End Multicore Software Architecture

Linux User Space
- PME Tools
- DPAA Tools
- Std commands/libs
- Debug support
- FM Enhanced CfgDriver

Linux Kernel
- FM Basic Cfg Driver
- DPAA Ethernet Driver
- perfmon
- QM Driver
- BM Driver
- SEC Driver
- PME Driver

System Configuration and Control
- pthreads
- Statistics
- Stats/state access
- perfmon control

Hypervisor
- Virtual CPU
- Interrupt controller
- Error Mgmt
- Boot services
- IPI
- IOMMU
- Byte-channels
- Guest debugging
- Power Mgmt
- Partition Mgmt

LWE Applications
- IPFwd/IPSEC
- Focused Performance Examples
- Crypto
- PME

LWE
- Mem Mgt
- SEC Driver
- Initialization
- BM Driver
- PME Driver
- Statistics
- QM Driver
- Atomic Calls
- Timer
- Legacy Drivers

LWE Applications
- U-Boot
- GNU Tools
- Secure Boot
- MG System Builder

NSD Software and Enablement Technologies

- Advanced SW Development Tools
- Full Application Visibility/Control

- Si Optimized SW components
- Scalable Robust SW Architectures

- Compiler Friendly Cores
- Advanced Debug IP

Applications
Comm Stacks and APIs
Run Time, Schedulers, Virtualization
Optimized Software Drivers/BSPs/HAL

Cores Accelerators Peripherals

App Profile Comm Events Run Time Events Instrumentation IP Events, Trace

Freescale is focused on developing high performance, full enablement multicore software components

Freescale’s multicore software strategy supports various customer application programming models

Freescale’s multicore software strategy supports both high performance and low cost multicore devices