A Deep Dive into Image Processing for i.MX 6 Application Processors

Oliver Brown | Senior Software Engineer

APR. 10. 2014
Agenda

• Introduction
• IPUv3 – system overview
• IPUv3 – fundamentals
• IPU & the i.MX Linux BSP
• Use case examples / tips
Introduction

- The IPU (Image Processing Unit) is present on most i.MX products
- IPUv1 was first introduced on the i.MX31 and upgraded on the i.MX35
- IPUv3 is a family of IPs present on the i.MX37, i.MX51, i.MX53 and i.MX6Q/D/DL
Introduction

• This presentation will describe IPUv3 on i.MX5 and i.MX6.
• The slides are based on i.MX6
• IPUv3 architecture is common for all products
• The differences from one product to another are in:
  – Processing speed (from 133 MHz to 264 MHz)
  – The modules included (CSI, VDI, ISP, etc.)
  – The connectivity options and interfaces (HDMI, LVDS, MIPI, etc.)
The following slide set is based on past versions of IPUv3

- The slides used for the i.MX51’s NPI training can be found on:
  - http://compass.freescale.net/doc/195331621/0201_IPUv3EX_In_MX51.ppt

- The slides used for the i.MX53’s NPI training can be found on:

- The slides used for the i.MX6’s NPI training can be found on:
IPUv3 resources

Freescale Internal Links

• IPU on Compass
  - http://compass.freescale.net/go/ipu
  - http://compass.freescale.net/go/ipudes

• IPUv3 code examples for MX6/Q
  - http://compass.freescale.net/livelink/livelink?func=ll&objId=222977460&objAction=browse&viewType=1

• IPUv3 code examples
  - http://compass.freescale.net/go/189478969

• IPUv3 users mail list
  - IPUFORUM@freescale.com

External Links

- https://community.freescale.com/community/imx
Video & Graphics System in i.MX6 D/Q

Full HW Support -> Multiple Advantages
- The CPU does not have to touch pixels
  -> available to run application
- Optimized data path -> reduced DDR load
  -> complex use cases with only 32-bit DDR memories
- Lower power consumption
  (because of both aspects above)

VPU
(video processing unit)
Video encoding and decoding

IPUs
(image processing unit)
- Connectivity to relevant devices
- Image processing: conversions, enhancement...
- Synchronization and control

GPUs
(graphics processing units)
Graphics generation

DCICs
(display content integrity check)

Interface Bridges
LVDS, HDMI, MIPI

Video Sources
Displays
Video/Graphics Subsystem in i.MX6 D/Q
Multimedia Processing Chain

- **Image Signal Processing**
  - Bayer -> YUV conversion
  - Image quality enhancement
  - Camera corrections

- **Image Conversions**
  - De-interlacing
  - Resizing (resolution adjustment)
  - Rotation & Inversion
  - Color Space Conversion
  - Pixel Format Conversion (packing…)

- **Display Enhancement**
  - Color adjustments and gamut mapping
  - Gamma correction and contrast stretching
  - Compensation for low-light conditions and backlight reduction
Multimedia Processing Chain – Implementation

- Comprehensive HW support:
  - Video/graphics fully handled by IPU, VPU and GPU.
  - The CPU does not have to touch pixels

- HW Accelerated

- Combining with Audio

- Communication Network

- Separation from Audio
Display Support
How to calculate the display pixel clock?

- FW = Frame Width (pixels)
- FH = Frame Height (lines)
- FPS = Frame rate (frames per second)
- BI = Blanking interval factor
  - Taken from the display’s datasheet (DS); up to 35% overhead (a factor of 1.35).
  - Use the minimum values

The pixel clock F is calculated according to the following formula (the result is in Hz; divide by 10^6 for MHz):

\[ F = FW \times FH \times FPS \times BI \]

- A few things to consider:
  - Data format (pixels per clock?)
  - Display’s clock source (DI#_CLK_EXT bit)
  - The load on the display controller (DC)
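
As a rough illustration of the display pixel clock formula, the Python sketch below evaluates F = FW x FH x FPS x BI for one mode. The function name `display_pixel_clock_hz` and the example mode (1920x1080 at 60 fps with a blanking factor of 1.30) are assumptions chosen for illustration and are not taken from a particular display datasheet.

```python
def display_pixel_clock_hz(frame_width, frame_height, fps, blanking_factor):
    """Return the pixel clock F = FW x FH x FPS x BI in Hz.

    blanking_factor is the BI term: 1.0 means no blanking overhead,
    1.35 means 35% overhead on top of the active pixels.
    """
    if blanking_factor < 1.0:
        raise ValueError("blanking factor must be >= 1.0")
    return frame_width * frame_height * fps * blanking_factor


if __name__ == "__main__":
    # Assumed example mode: 1920x1080 at 60 fps, 30% blanking overhead.
    f_hz = display_pixel_clock_hz(1920, 1080, 60, 1.30)
    print(f"Required pixel clock: {f_hz / 1e6:.1f} MHz")
```

For this assumed mode the result is roughly 161.7 MHz. The real figure depends on the blanking values in the display's datasheet.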
### i.MX37, i.MX51, i.MX53, i.MX6 D/Q – Display Support

<table>
<thead>
<tr>
<th>Feature</th>
<th>i.MX51 (IPUv3EX)</th>
<th>i.MX37 (IPUv3D)</th>
<th>i.MX53 (IPUv3M)</th>
<th>i.MX6 D/Q (2 x IPUv3H)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Throughput</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td># of outputs</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td><strong>Pixel clock rate</strong></td>
<td>Up to 133 MHz</td>
<td>Up to 200 MHz</td>
<td>Up to 266 MHz per IPU</td>
<td></td>
</tr>
<tr>
<td><strong>Resolution</strong></td>
<td>WXGA+ (1600x900)</td>
<td>720p (1280x720) + SVGA (800x600)</td>
<td>WUXGA (1920x1200)</td>
<td>1080p (1920x1080) + WVGA (800x480)</td>
</tr>
<tr>
<td><strong>Interfaces</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Parallel</td>
<td>Two ports 24 bits + 18 bits</td>
<td>Two ports 24 bits + 24 bits</td>
<td>Synchronous (for display refresh) and asynchronous (to memory)</td>
<td>Very flexible - glue-less connection to RAM-less displays, display controllers, and TV encoders.</td>
</tr>
<tr>
<td>LVDS</td>
<td>No</td>
<td>Two channels; consumer version (multiple pairs); 2x 85 MHz or 170MHz</td>
<td>No</td>
<td>Two channels; consumer version (multiple pairs); 2x 85 MHz or 170MHz</td>
</tr>
<tr>
<td>HDMI</td>
<td>No</td>
<td>No</td>
<td>One port</td>
<td>One port, 2 lanes x 1 Gbps (non-Automotive)</td>
</tr>
<tr>
<td>MIPI/DSI</td>
<td>No</td>
<td>No</td>
<td>One port, 2 lanes x 1 Gbps (non-Automotive)</td>
<td>One port, 2 lanes x 1 Gbps (non-Automotive)</td>
</tr>
<tr>
<td>Analog</td>
<td>One port; TV-out</td>
<td>Also VGA rate increased to 1080p60</td>
<td>None (phased out)</td>
<td>None (phased out)</td>
</tr>
<tr>
<td>Display Content Authentication (CRC)</td>
<td>No</td>
<td>No</td>
<td>Yes, for 2 displays</td>
<td>Yes, for 2 displays</td>
</tr>
<tr>
<td><strong>Processing</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>On-the-fly combining</td>
<td>2 planes</td>
<td>2 planes</td>
<td>For 2 displays, 2 planes for each (6 more planes for lower resolutions)</td>
<td></td>
</tr>
<tr>
<td>(for high resolution displays)</td>
<td>(up to 3 more planes for lower resolutions)</td>
<td>(up to 3 more planes for lower resolutions)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Off-line combining</td>
<td>Up to 20 MP/sec</td>
<td>Up to 200+ MP/sec</td>
<td>Up to 500+ MP/sec</td>
<td></td>
</tr>
<tr>
<td>Display enhancement</td>
<td>Color adjustment and smart gamut mapping; gamma correction and contrast enhancement</td>
<td>Supporting effective proprietary algorithms</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Backlight power optimization</td>
<td>Yes; Supporting efficient proprietary algorithms</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Interfaces:**
- Parallel: Two ports, 24 bits + 18 bits.
- LVDS: No.
- HDMI: No.
- MIPI/DSI: No.
- Analog: One port; TV-out.

**Display Content Authentication (CRC):**
- No.

**Processing:**
- On-the-fly combining: 2 planes (up to 3 more planes for lower resolutions).
- Off-line combining: Up to 20 MP/sec.
- Display enhancement: Color adjustment and smart gamut mapping; gamma correction and contrast enhancement.
- Backlight power optimization: Yes; Supporting efficient proprietary algorithms.
### Capabilities
- Maximal display resolution: 4096x4096 pixels
- Maximal pixel rate: 264 MP/sec (200 MP/sec on i.MX53, 133 MP/sec on i.MX51)

### Display refresh rate
- The maximal refresh rate is: \( \frac{264M}{W \times H \times B} \)
  - \( W \times H \) is the display resolution
  - \( B > 1 \) reflecting blanking overhead, e.g. as specified by VESA, CEA-861-D, etc.
- The table provides the maximal refresh rates for some typical resolutions
- Usually, the refresh rate is required to be at least 60 Hz, to prevent flicker.
- The blanking overhead factor assumed for the calculation is 1.3.
  - The actual factor depends on the display and is often closer to 1, allowing higher resolutions @ 60 Hz (e.g. HD1440).
  - For example, for HD1080, the standard specifies \( B \approx 1.2 \)
- This is the capability of each of the two IPUs, so the total capability of the processor is doubled.
- Note: these rates refer only to screen refresh, limited by the capabilities of the display port. A full use case typically includes additional activities. To confirm that it is supported at a given refresh rate, additional aspects (video processing capabilities, capacity of the memory system, etc.) should also be analyzed carefully.
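
The refresh-rate bound above can be checked with a few lines of Python. This is only a sketch: the function name `max_refresh_hz` is made up for illustration, and the pixel rate is left as a parameter because the tabulated values that follow appear to match a 266 MHz pixel rate (the per-IPU figure in the feature table) rather than the 264 MHz used in the formula.

```python
def max_refresh_hz(width, height, blanking_factor=1.3, pixel_rate_hz=264e6):
    """Upper bound on the refresh rate: pixel_rate / (W x H x B)."""
    if blanking_factor < 1.0:
        raise ValueError("blanking factor B must be >= 1.0")
    return pixel_rate_hz / (width * height * blanking_factor)


if __name__ == "__main__":
    # A few resolutions from the table, evaluated for both pixel rates.
    for name, w, h in [("VGA", 640, 480), ("HD1080", 1920, 1080), ("HD1440", 2560, 1440)]:
        at_264 = max_refresh_hz(w, h)
        at_266 = max_refresh_hz(w, h, pixel_rate_hz=266e6)
        print(f"{name}: {at_264:.1f} Hz at 264 MHz, {at_266:.1f} Hz at 266 MHz")
```

With B = 1.3, HD1440 falls below 60 Hz, while a smaller assumed factor such as B = 1.1 brings it above 60 Hz. This matches the remark that displays with less blanking allow higher resolutions at 60 Hz.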

### Maximal Resolution & Refresh Rate Table

<table>
<thead>
<tr>
<th>Name</th>
<th>Resolution</th>
<th>Width</th>
<th>x</th>
<th>Height</th>
<th>Total [MP]</th>
<th>Maximal Refresh Rate [Hz]</th>
</tr>
</thead>
<tbody>
<tr>
<td>VGA</td>
<td>640 x 480</td>
<td>640</td>
<td>x</td>
<td>480</td>
<td>0.31</td>
<td>666</td>
</tr>
<tr>
<td>NTSC</td>
<td>720 x 480</td>
<td>720</td>
<td>x</td>
<td>480</td>
<td>0.35</td>
<td>592</td>
</tr>
<tr>
<td>WVGA</td>
<td>800 x 480</td>
<td>800</td>
<td>x</td>
<td>480</td>
<td>0.38</td>
<td>533</td>
</tr>
<tr>
<td>PAL</td>
<td>720 x 576</td>
<td>720</td>
<td>x</td>
<td>576</td>
<td>0.41</td>
<td>493</td>
</tr>
<tr>
<td>SVGA</td>
<td>800 x 600</td>
<td>800</td>
<td>x</td>
<td>600</td>
<td>0.48</td>
<td>426</td>
</tr>
<tr>
<td>WSVGA</td>
<td>1024 x 600</td>
<td>1024</td>
<td>x</td>
<td>600</td>
<td>0.61</td>
<td>333</td>
</tr>
<tr>
<td>XGA</td>
<td>1024 x 768</td>
<td>1024</td>
<td>x</td>
<td>768</td>
<td>0.79</td>
<td>260</td>
</tr>
<tr>
<td>HD720</td>
<td>1280 x 720</td>
<td>1280</td>
<td>x</td>
<td>720</td>
<td>0.92</td>
<td>222</td>
</tr>
<tr>
<td>WXGA</td>
<td>1366 x 768</td>
<td>1366</td>
<td>x</td>
<td>768</td>
<td>1.05</td>
<td>195</td>
</tr>
<tr>
<td>WXGA+</td>
<td>1440 x 900</td>
<td>1440</td>
<td>x</td>
<td>900</td>
<td>1.30</td>
<td>158</td>
</tr>
<tr>
<td>SXGA</td>
<td>1280 x 1024</td>
<td>1280</td>
<td>x</td>
<td>1024</td>
<td>1.31</td>
<td>156</td>
</tr>
<tr>
<td>SXGA+</td>
<td>1400 x 1050</td>
<td>1400</td>
<td>x</td>
<td>1050</td>
<td>1.47</td>
<td>139</td>
</tr>
<tr>
<td>WSXGA+</td>
<td>1680 x 1050</td>
<td>1680</td>
<td>x</td>
<td>1050</td>
<td>1.76</td>
<td>116</td>
</tr>
<tr>
<td>UXGA</td>
<td>1600 x 1200</td>
<td>1600</td>
<td>x</td>
<td>1200</td>
<td>1.92</td>
<td>107</td>
</tr>
<tr>
<td>HD1080</td>
<td>1920 x 1080</td>
<td>1920</td>
<td>x</td>
<td>1080</td>
<td>2.07</td>
<td>99</td>
</tr>
<tr>
<td>WUXGA</td>
<td>1920 x 1200</td>
<td>1920</td>
<td>x</td>
<td>1200</td>
<td>2.30</td>
<td>89</td>
</tr>
<tr>
<td>9VGA</td>
<td>1920 x 1440</td>
<td>1920</td>
<td>x</td>
<td>1440</td>
<td>2.76</td>
<td>74</td>
</tr>
<tr>
<td>4XGA</td>
<td>2048 x 1536</td>
<td>2048</td>
<td>x</td>
<td>1536</td>
<td>3.15</td>
<td>65</td>
</tr>
<tr>
<td>HD1440</td>
<td>2560 x 1440</td>
<td>2560</td>
<td>x</td>
<td>1440</td>
<td>3.69</td>
<td>56</td>
</tr>
<tr>
<td>4WXGA</td>
<td>2560 x 1600</td>
<td>2560</td>
<td>x</td>
<td>1600</td>
<td>4.10</td>
<td>50</td>
</tr>
<tr>
<td>4K x 2K</td>
<td>4096 x 2048</td>
<td>4096</td>
<td>x</td>
<td>2048</td>
<td>8.39</td>
<td>25</td>
</tr>
</tbody>
</table>
**IPU in i.MX6 D/Q (IPUv3H) – Dual-Display Capabilities**

### Notes
- **This is the capability of each of the two IPUs, so the total capability of the processor is doubled.**
- The maximal pixel clock rate supported by the display ports
  - Each display: 220 MHz
  - Total: 240 MHz
- For a TV, the clock rate is fixed by the corresponding standards
- For other displays
  - The assumed screen refresh rate is 60 Hz
  - The blanking overhead – impacting the pixel clock rate – may vary between displays.
  - The table refers – for concreteness – to the VESA CVT (Coordinated Video Timing) specification
    - "Full support": allowing full blanking (which is typically required for CRTs)
    - "Partial support": allowing only reduced blanking (which is still typically sufficient for digital displays, e.g. LCDs)
- The above table describes only the capabilities of the display ports to perform screen refresh. A full use case typically includes additional activities. To confirm that it is supported with a given display configuration, additional aspects (video processing capabilities, capacity of the memory system, etc.) should also be analyzed carefully.
i.MX6 D/Q Display Ports Muxing

- Six ports
  - Two parallel - driven directly by the IPU
  - Two LVDS channels - driven by the LVDS bridge
  - One HDMI – driven by the HDMI transmitter
  - One MIPI-DSI – driven by the MIPI-DSI transmitter
- Four simultaneous outputs
  - Each IPU has two display ports (DI0 and DI1)
  - Therefore, up to four external ports can be active at any given time.
  - Additional asynchronous data flows can be sent through the parallel ports and the MIPI-DSI port
- Display Content Integrity Check (DCIC)
  - For parallel interfaces: probes the I/O loopback (essentially equivalent to probing the external wires)
  - For other integrated interfaces (e.g. LVDS): probes the IPU output (essentially equivalent to the inputs to the serializers)
Max Display Port Resolutions on i.MX6Q/D

- MIPI DSI, 2 lanes
  - WXGA (1366 x 768) or 720p (1280 x 720)

- RGB
  - Port 1 – 4XGA (2048 x 1536)
  - Port 2 – 4XGA (2048 x 1536)

- LVDS
  - Single channel – WXGA (1366 x 768) or 720p (1280 x 720)
  - Dual channel – UXGA (1600 x 1200) or 1080p (1920 x 1080)

- HDMI
  - 1080p (1920 x 1080) or 4XGA (2048 x 1536)

*Note: Assuming 30% blanking intervals overhead, 24bpp, 60fps*
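
The LVDS entries above can be sanity-checked against the note's assumptions (30% blanking overhead, 60 fps). The sketch below is illustrative only: the helper names are hypothetical, and the limits of 85 MHz for a single LVDS channel and 170 MHz for dual channel are the figures quoted in the display support table of these slides, so they should be confirmed against the reference manual before use.

```python
# Pixel clock limits in Hz, as quoted in the display support table
# (assumption: they apply directly as an upper bound on the pixel clock).
LVDS_SINGLE_HZ = 85e6
LVDS_DUAL_HZ = 170e6


def required_pixel_clock_hz(width, height, fps=60, blanking_factor=1.30):
    """Pixel clock needed for a mode: W x H x FPS x blanking factor."""
    return width * height * fps * blanking_factor


def fits(width, height, limit_hz, fps=60, blanking_factor=1.30):
    """True if the mode's pixel clock does not exceed the given limit."""
    return required_pixel_clock_hz(width, height, fps, blanking_factor) <= limit_hz


if __name__ == "__main__":
    modes = [("WXGA", 1366, 768), ("720p", 1280, 720),
             ("UXGA", 1600, 1200), ("1080p", 1920, 1080)]
    for name, w, h in modes:
        clock_mhz = required_pixel_clock_hz(w, h) / 1e6
        print(f"{name}: {clock_mhz:.1f} MHz, "
              f"single={fits(w, h, LVDS_SINGLE_HZ)}, dual={fits(w, h, LVDS_DUAL_HZ)}")
```

Under these assumptions WXGA and 720p fit in a single channel, while UXGA and 1080p need the dual-channel configuration.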
## Connecting a display on the parallel interface

### i.MX51 LCD

<table>
<thead>
<tr>
<th>Port Name (x=1,2)</th>
<th>RGB, Signal name (General)</th>
<th>16 bit RGB</th>
<th>18 bit RGB</th>
<th>24 bit RGB</th>
<th>8 bit YCrCb</th>
<th>16 bit YCrCb</th>
<th>20 bit YCrCb</th>
<th>Signal name</th>
</tr>
</thead>
<tbody>
<tr>
<td>DISPx_DAT0</td>
<td>DAT[0]</td>
<td>B[0]</td>
<td>B[0]</td>
<td>B[0]</td>
<td>Y/C[0]</td>
<td>C[0]</td>
<td>C[0]</td>
<td>DAT[0]</td>
</tr>
</tbody>
</table>

### Smart Port Name (x=1,2)

- Di_x_DISP_CLK: PixCLK
- Di_x_PIN1: VSYNC_IN
- Di_x_PIN2: HSYNC
- Di_x_PIN3: VSYNC
- Di_x_PIN4: DRDY/DV
- Di_x_PIN5: CS0
- Di_x_PIN6: CS1
- Di_x_PIN7: WR
- Di_x_PIN8: RD
- Di_x_PIN9: RS1
- Di_x_PIN10: RS2
- Di_x_PIN11: DRDY
- Di_x_PIN12
- Di_x_PIN13
- Di_x_PIN14
- Di_x_PIN15
- Di_x_PIN16
- Di_x_PIN17
IPUv3H – Video In Support
## i.MX51, i.MX53, i.MX6 D/Q – Video In Support

<table>
<thead>
<tr>
<th>Feature</th>
<th>i.MX51 (IPUv3EX)</th>
<th>i.MX53 (IPUv3M)</th>
<th>i.MX6 D/Q (2 x IPUv3H)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Video Input Interfaces</strong></td>
<td>Parallel</td>
<td>Two ports, 20 bits + 8 bits</td>
<td>180 MHz; e.g. 9 MP @ 15 fps</td>
</tr>
<tr>
<td></td>
<td></td>
<td>120 MHz; e.g. 6 MP @ 15 fps</td>
<td>180 MHz; e.g. 9 MP @ 15 fps</td>
</tr>
<tr>
<td></td>
<td>MIPI/CSI-2</td>
<td>No</td>
<td>One port, 4 lanes x 1 Gbps</td>
</tr>
<tr>
<td><strong>Video Rate</strong></td>
<td>Playback</td>
<td>720p30 (1280x720) @ 30 fps</td>
<td>1080i/p (1920x1080) @ 30 fps</td>
</tr>
<tr>
<td></td>
<td>Record</td>
<td>D1 (720x480@30 fps or 720x576@25 fps)</td>
<td>720p30 (1280x720) @ 30 fps</td>
</tr>
<tr>
<td></td>
<td>2-way</td>
<td>720p @ 20 fps</td>
<td>720p @ 20 fps</td>
</tr>
<tr>
<td><strong>Video Processing</strong></td>
<td>De-interlacing</td>
<td>High-quality motion adaptive algorithm</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Resizing</td>
<td>Yes – fully flexible</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Rotation/inversion</td>
<td>Yes</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Color conversion</td>
<td>Yes – fully flexible</td>
<td></td>
</tr>
<tr>
<td><strong>Memory Interface</strong></td>
<td>Protocol</td>
<td>AXI – Including split transaction</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Throughput</td>
<td>64-bit, 133 MHz</td>
<td>64-bit, 200 MHz</td>
</tr>
<tr>
<td><strong>Efficient memory bus utilization</strong></td>
<td>Selective read for combining</td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Control capabilities</strong></td>
<td></td>
<td>Display controller, DMA controller, Internal synchronization</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Autonomous operations: display refresh/update, view-finder</td>
<td></td>
</tr>
<tr>
<td><strong>Synchronization</strong></td>
<td>(to prevent tearing)</td>
<td>Double/triple buffering</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Frame-by-frame or tight – sub-frame (utilizing internal memory)</td>
<td></td>
</tr>
</tbody>
</table>
Video Input Ports In i.MX6 D/Q

- Three ports; up to six input channels
  - Two parallel – connected directly to the IPUs; independent clock and format setting
  - One MIPI/CSI-2 – can transfer up to four concurrent channels
  - Each port: up to 150Mpxl/s @200MHz, e.g. 10Mpxl @ 15fps

- Four concurrent channels
  - Each IPU has two input ports (CSI0 and CSI1), each can process an input channel from one of the external ports.
  - The MIPI/CSI-2 bridge sends all its channels to all the IPU input ports and each port can select for processing a different channel, identified by its DI (Data Identifier).
  - Additional channels can be transferred through a CSI transparently – as generic data – directly to the system memory.

- Formats supported:
  - BT.656
  - BT.1120
  - YUV422, RGB888, YUV444 – over an 8-bit bus
  - RAW format up to 16bpp which will be translated to 8 bit using companding
  - Generic data up to 20bit
IPUv3H – The camera port

[Block diagram of the camera port. Blocks shown: Cameras; CSI (Camera Sensor I/F), two instances; SMFC (Sensor Multi FIFO Ctrl.); VDI (Video De-Interlacer); IC (Image Converter); IDMAC (Image DMA Controller); 64-bit AXI; Memory; display]
The Camera Sensor Interface - CSI

• Role: controls the camera port
  - Provides direct connectivity to relevant image sensors and connectivity bridges: CSI-2, HDMI receiver, TV decoder…

• Data bus – up to 20 bits
  - Single value – up to 16 bits
  - Two values – up to 10 bits each; e.g. HDTV YUV 4:2:2 input

• Variety of data formats
  - Main (with on-the-fly processing): YUV 4:2:2/4:4:4, RGB 16/24 bpp
  - Other: as generic data, including compressed streams
  - All primary CSI-2 formats

• Frame resolution
  - Up to 8192 x 4096 pixels

• Input rate
  - 240M values/sec peak (@ 264 MHz internal clock)

• Additional features
  - Frame rate reduction – by skipping (reduction ratio: m:n, m<=n<=12)
  - Window-of-interest selection – by cropping
How to calculate the video in pixel clock?

- **FW** = Frame Width
- **FH** = Frame Height
- **FPS** = Frame rate (fps)
- **BI** = Blanking interval
  - Provided in the device’s DS up to 35% (1.35).
- **D** = Data format
  - How the data is arranged on the bus – (cycles/pixel)

The pixel clock [MHz] is calculated according to:

\[
F = FW \times FH \times FPS \times BI \times D
\]
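
A worked example may help here. The sketch below applies the video-in formula to a 625-line BT.656 source (720x576 active pixels at 25 fps, carried on an 8-bit bus with D = 2 cycles per pixel). The function name is hypothetical. The blanking factor is derived from the standard's total raster of 864 samples per line and 625 lines per frame, which is about 1.30.

```python
def video_in_pixel_clock_hz(frame_width, frame_height, fps, blanking_factor,
                            cycles_per_pixel):
    """Return F = FW x FH x FPS x BI x D in Hz."""
    return frame_width * frame_height * fps * blanking_factor * cycles_per_pixel


if __name__ == "__main__":
    # 625-line BT.656: active 720x576, total raster 864x625, 25 frames/s.
    active_w, active_h = 720, 576
    total_w, total_h = 864, 625
    bi = (total_w * total_h) / (active_w * active_h)  # about 1.30
    f_hz = video_in_pixel_clock_hz(active_w, active_h, 25, bi, cycles_per_pixel=2)
    print(f"BI = {bi:.3f}, F = {f_hz / 1e6:.1f} MHz")
```

The result is the familiar 27 MHz BT.656 clock, which is a useful cross-check that BI and D are being applied as intended.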
16-bit camera support

- **16-bit YUV422**
  - CSI receives 2 components per cycle.
  - CSI [16 bit generic data.] => SMFC => MEM [16bit generic data] => IPU[YUV422]

- **16-bit RGB as generic data**
  - CSI receives 3 components per cycle.
  - Use a 16 bit sample of it (such as RGB565)

- **16-bit RGB565**
  - On the fly processing of 16 bit data.
  - CSI is programmed to receive 16 bit generic data.
  - The interface is restricted to be in "non-gated mode" and the CSI#_DATA_SOURCE bit has to be set
  - If the external device is 24bit - the user can connect a 16 bit sample of it (RGB565 format).
  - The IPU has to be configured in the same way as the case of CSI#_SENS_DATA_FORMAT=RGB565
BT.1120/BT.656 support

- BT.656
  - CCIR progressive/interlaced modes
- BT.1120
  - CCIR progressive/interlaced mode
  - SDR/DDR mode
- The timing reference signals (Sync events) are embedded in the data.
- The CCIR codes are defined in the standard. IPU can support non standard codes using the CCIR registers.
- On the fly processing is supported in both modes
BT.1120/BT.656 support

- IPUv3 is an 8-bit-per-component system.
- For 10-bit data inputs there are a few ways to handle the data:
  - Companding – a programmable piecewise-linear map
  - Regular and tight packing to memory

<table>
<thead>
<tr>
<th>BT.1120 mode</th>
<th>connectivity</th>
<th>companded</th>
<th>regular packing</th>
<th>tight packing</th>
</tr>
</thead>
<tbody>
<tr>
<td>20bit, YUV422-10</td>
<td>D[19:0]</td>
<td>{8DC,8DC,8DC,8R}</td>
<td>{16DE,16DE,16DE,16R }</td>
<td>{10DE,10DE,10DE,2R}</td>
</tr>
<tr>
<td>20bit, YUV422-8</td>
<td>D[19:12], D[9:2]</td>
<td>{8DC,8DC,8DC,8R}</td>
<td>{8D,8D,8D,8R}</td>
<td>NA</td>
</tr>
</tbody>
</table>

DC - data after being companded
DE - data after being extended.
R - reserved bits
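
To make the idea of companding concrete, here is a minimal Python sketch of a piecewise-linear map from 10-bit input codes to 8-bit output codes. The knot points in `KNOTS` are invented for illustration and are not the IPU's register values or defaults. The actual curve is programmed through the CSI companding registers described in the reference manual.

```python
from bisect import bisect_right

# Hypothetical knot points (10-bit input code, 8-bit output code).
# More output codes are spent on dark values than on bright ones.
KNOTS = [(0, 0), (64, 64), (256, 160), (1023, 255)]


def compand_10_to_8(value, knots=KNOTS):
    """Map a 10-bit sample to 8 bits with a piecewise-linear curve."""
    if not 0 <= value <= 1023:
        raise ValueError("expected a 10-bit sample (0..1023)")
    xs = [x for x, _ in knots]
    i = bisect_right(xs, value) - 1      # index of the segment's left knot
    if i >= len(knots) - 1:              # value sits on the last knot
        return knots[-1][1]
    (x0, y0), (x1, y1) = knots[i], knots[i + 1]
    # Integer linear interpolation within the segment.
    return y0 + (value - x0) * (y1 - y0) // (x1 - x0)


if __name__ == "__main__":
    for sample in (0, 32, 64, 160, 256, 1023):
        print(sample, "->", compand_10_to_8(sample))
```

A plain truncation (dropping the two least significant bits) would be the special case of a single straight segment from (0, 0) to (1023, 255).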
i.MX37, i.MX51, i.MX53, i.MX6 D/Q – Video in Support

<table>
<thead>
<tr>
<th>Data Format</th>
<th>CSI Bus width</th>
<th>D (cycles/pixel)</th>
<th>comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>RGB888/YUV444</td>
<td>8</td>
<td>3</td>
<td>D[19:12]</td>
</tr>
<tr>
<td>YUV422</td>
<td>8</td>
<td>2</td>
<td>D[19:12]</td>
</tr>
<tr>
<td>YUV422</td>
<td>16</td>
<td>1</td>
<td>D[19:4]</td>
</tr>
<tr>
<td>Generic data</td>
<td>16</td>
<td>1</td>
<td>D[19:4]</td>
</tr>
<tr>
<td>RGB565</td>
<td>8</td>
<td>2</td>
<td>D[19:12]</td>
</tr>
<tr>
<td>RGB565</td>
<td>16</td>
<td>1</td>
<td>D[19:4]</td>
</tr>
<tr>
<td>BT.1120</td>
<td>20</td>
<td>1</td>
<td>D[19:0]</td>
</tr>
<tr>
<td>BT.1120</td>
<td>16</td>
<td>1</td>
<td>D[19:12], D[9:2]</td>
</tr>
<tr>
<td>BT.656</td>
<td>8</td>
<td>2</td>
<td>D[19:12]</td>
</tr>
<tr>
<td>Bayer</td>
<td>16</td>
<td>1</td>
<td>D[19:4]</td>
</tr>
</tbody>
</table>

\[
F = FW \times FH \times FPS \times BI \times D
\]
## CSI data mapping

<table>
<thead>
<tr>
<th>Pin</th>
<th>RGB888/YUV444</th>
<th>RGB565 8 bit</th>
<th>RGB565 16 bit</th>
<th>YUV4:2:2 (8 bit)</th>
<th>Generic data</th>
<th>BT.656</th>
<th>YUV422 16 bit</th>
<th>BT.1120 (YUV422-10)</th>
<th>BT.1120 (YUV422-8)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CSI1_D19</td>
<td>D7</td>
<td>D7</td>
<td>R4</td>
<td>D7</td>
<td>MSB</td>
<td>D7</td>
<td>Y7</td>
<td>Y9</td>
<td>Y7</td>
</tr>
<tr>
<td>CSI1_D18</td>
<td>D6</td>
<td>D6</td>
<td>R3</td>
<td>D6</td>
<td>MSB-1</td>
<td>D6</td>
<td>Y6</td>
<td>Y8</td>
<td>Y6</td>
</tr>
<tr>
<td>CSI1_D17</td>
<td>D5</td>
<td>D5</td>
<td>R2</td>
<td>D5</td>
<td>MSB-2</td>
<td>D5</td>
<td>Y5</td>
<td>Y7</td>
<td>Y5</td>
</tr>
<tr>
<td>CSI1_D16</td>
<td>D4</td>
<td>D4</td>
<td>R1</td>
<td>D4</td>
<td>MSB-3</td>
<td>D4</td>
<td>Y4</td>
<td>Y6</td>
<td>Y4</td>
</tr>
<tr>
<td>CSI1_D15</td>
<td>D3</td>
<td>D3</td>
<td>R0</td>
<td>D3</td>
<td>MSB-4</td>
<td>D3</td>
<td>Y3</td>
<td>Y5</td>
<td>Y3</td>
</tr>
<tr>
<td>CSI1_D14</td>
<td>D2</td>
<td>D2</td>
<td>G5</td>
<td>D2</td>
<td>MSB-5</td>
<td>D2</td>
<td>Y2</td>
<td>Y4</td>
<td>Y2</td>
</tr>
<tr>
<td>CSI1_D13</td>
<td>D1</td>
<td>D1</td>
<td>G4</td>
<td>D1</td>
<td>MSB-6</td>
<td>D1</td>
<td>Y1</td>
<td>Y3</td>
<td>Y1</td>
</tr>
<tr>
<td>CSI1_D12</td>
<td>D0</td>
<td>D0</td>
<td>G3</td>
<td>D0</td>
<td>MSB-7</td>
<td>D0</td>
<td>Y0</td>
<td>Y2</td>
<td>Y0</td>
</tr>
<tr>
<td>CSI1_D11</td>
<td></td>
<td></td>
<td>G2</td>
<td></td>
<td>MSB-8</td>
<td></td>
<td>CrCb7</td>
<td>Y1</td>
<td>0</td>
</tr>
<tr>
<td>CSI1_D10</td>
<td></td>
<td></td>
<td>G1</td>
<td></td>
<td>MSB-9</td>
<td></td>
<td>CrCb6</td>
<td>Y0</td>
<td>0</td>
</tr>
<tr>
<td>CSI1_D9</td>
<td></td>
<td></td>
<td>G0</td>
<td></td>
<td>MSB-10</td>
<td></td>
<td>CrCb5</td>
<td>CrCb9</td>
<td>CrCb7</td>
</tr>
<tr>
<td>CSI1_D8</td>
<td></td>
<td></td>
<td>B4</td>
<td></td>
<td>MSB-11</td>
<td></td>
<td>CrCb4</td>
<td>CrCb8</td>
<td>CrCb6</td>
</tr>
<tr>
<td>CSI1_D7</td>
<td></td>
<td></td>
<td>B3</td>
<td></td>
<td>MSB-12</td>
<td></td>
<td>CrCb3</td>
<td>CrCb7</td>
<td>CrCb5</td>
</tr>
<tr>
<td>CSI1_D6</td>
<td></td>
<td></td>
<td>B2</td>
<td></td>
<td>MSB-13</td>
<td></td>
<td>CrCb2</td>
<td>CrCb6</td>
<td>CrCb4</td>
</tr>
<tr>
<td>CSI1_D5</td>
<td></td>
<td></td>
<td>B1</td>
<td></td>
<td>MSB-14</td>
<td></td>
<td>CrCb1</td>
<td>CrCb5</td>
<td>CrCb3</td>
</tr>
<tr>
<td>CSI1_D4</td>
<td></td>
<td></td>
<td>B0</td>
<td></td>
<td>MSB-15</td>
<td></td>
<td>CrCb0</td>
<td>CrCb4</td>
<td>CrCb2</td>
</tr>
<tr>
<td>CSI1_D3</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>CrCb3</td>
<td>CrCb1</td>
</tr>
<tr>
<td>CSI1_D2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>CrCb2</td>
<td>CrCb0</td>
</tr>
<tr>
<td>CSI1_D1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>CrCb1</td>
<td></td>
</tr>
<tr>
<td>CSI1_D0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>CrCb0</td>
<td></td>
</tr>
</tbody>
</table>
IPUv3H – Fundamentals
The Image Processing Unit

- Functions: comprehensive support for the flow of data from an image sensor and/or to a display device.
  - Connectivity to relevant devices
  - Related image processing and manipulation
  - Synchronization and control capabilities
**IPUv3H – Internal Structure**

![Diagram of IPUv3H with labels for each component: CSI (Camera Sensor I/F), SMFC (Sensor Multi FIFO Ctrl.), VDI (Video De-Interlacer), IC (Image Converter), DP (Display Processor), DMFC (Display Multi FIFO Ctrl.), CM (Control Module), IRT (Image Rotator), DC (Display Contr.), DI (Display I/F). The diagram also shows connections to Cameras, Displays, 32-bit AHB, MCU, 64-bit AXI, and Memory.]
IPUv3 Fundamentals - The display port

• The display port handles all the IPUv3 features targeted for controlling and sending data to the display.
• The display port consists of 4 modules:
  • DC - a display controller
  • DP - a display processor
  • DMFC - a display multi-FIFO controller
  • DI - a display interface. The DI is instantiated twice to provide two symmetrical display interfaces.
IPUv3 Fundamentals - Supported display interfaces

- IPUv3 supports a total of 4 displays.
- The display port has 2 DI interfaces.
- Each interface can handle up to 3 displays.
- Each DI can handle up to 2 asynchronous interfaces (e.g. smart LCD, graphics accelerator); only one of them can be a serial interface.
- Each DI can handle one synchronous interface (e.g. TV, dumb LCD).
IPUv3 Fundamentals – display channels’ mapping

• The display port supports multiple flows that may have different characteristics
• In order to configure the IPU we need to identify some of the flow’s characteristics and allocate the IPUv3’s resources that will participate in that flow.
• First we need to understand how the channels are distributed.
## IPUv3 Fundamentals – display channels’ mapping

<table>
<thead>
<tr>
<th>Ch #</th>
<th>Destination</th>
<th>DMFC/DC numbering</th>
<th>Flow’s nature</th>
<th>Alpha channel</th>
<th>comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>21</td>
<td>DC</td>
<td></td>
<td>SYNC or ASYNC</td>
<td>NA</td>
<td>Direct flow via IC. If this flow is used it replaces one of the DMFC channels.</td>
</tr>
<tr>
<td>23</td>
<td>DP - primary</td>
<td>5B/5</td>
<td>SYNC</td>
<td>51</td>
<td>Ch 23 is associated with ch27. When there’s only one plane in the flow – this channel should be used.</td>
</tr>
<tr>
<td>24</td>
<td>DP - Primary</td>
<td>6B/6</td>
<td>ASYNC</td>
<td>52</td>
<td>Ch 24 is associated with ch29. 2 ASYNC flows can use this channel via alternate flow. When there’s only one plane in the flow – this channel should be used.</td>
</tr>
<tr>
<td>27</td>
<td>DP - Secondary</td>
<td>5F/5</td>
<td>SYNC</td>
<td>31</td>
<td></td>
</tr>
<tr>
<td>28</td>
<td>DC</td>
<td>1/1</td>
<td>SYNC or ASYNC</td>
<td>NA</td>
<td>if ch28 is connected to DI0 then ch23 must be connected to DI1</td>
</tr>
<tr>
<td>29</td>
<td>DP - Secondary</td>
<td>6F/6</td>
<td>ASYNC</td>
<td>33</td>
<td></td>
</tr>
</tbody>
</table>
## IPUv3 Fundamentals – display channels’ mapping

<table>
<thead>
<tr>
<th>Ch #</th>
<th>Destination</th>
<th>DMFC/DC numbering</th>
<th>Flow’s nature</th>
<th>Alpha channel</th>
<th>comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>40</td>
<td>DC</td>
<td>0/0</td>
<td>Read</td>
<td>NA</td>
<td></td>
</tr>
<tr>
<td>41</td>
<td>DC</td>
<td>2/2</td>
<td>ASYNC</td>
<td>NA</td>
<td></td>
</tr>
<tr>
<td>42</td>
<td>DC</td>
<td>1C</td>
<td>Command</td>
<td>NA</td>
<td>Refer to the spec for the command channel restrictions</td>
</tr>
<tr>
<td>43</td>
<td>DC</td>
<td>2C</td>
<td>Command</td>
<td>NA</td>
<td>Refer to the spec for the command channel restrictions</td>
</tr>
<tr>
<td>44</td>
<td>DC</td>
<td>3</td>
<td>Mask</td>
<td>NA</td>
<td>Mask channel can be associated with ch 23 or ch 28</td>
</tr>
</tbody>
</table>
IPUv3 Fundamentals - DI

• The DI is responsible for the timing waveforms of each signal in the display’s interface.

• The DI is composed of
  - 8 sets of waveform generators controlling signals associated with the DI’s clock; These signals drive PIN1-PIN8. These pins can be used for signals like VSYNC, HSYNC
  - 12 sets of waveform generators controlling signals associated with the data; These signals drive PIN11-PIN17 + 2 CS signals. These pins can be used for signals like DRDY, CS, RS
  - The DI generates the clock to the display
    ▪ The DI clock can be derived from the IPUv3 hsp_clk
    ▪ The DI clock can be derived from a clock external to the IPU (PLL or pin)
This waveform describes how the display clock’s parameters are set.
IPUv3 Fundamentals - DI

• This waveform describes how the DI’s PIN parameters are set.
IPUv3 Fundamentals - DI

- This waveform provides an example of waveform concatenation.
IPUv3 Fundamentals - DC

• The DC (Display Controller) is responsible for:
  • Activation of a flow
    – When there’s new content to be displayed (asynchronous flow)
    – Upon an internal timer (synchronous flow)
  • Linkage between the microcode and
    – the mapping units (within the DC)
    – the timing units (within the DI)
[Diagram labels: Events (NL, NF, New Data…); DI’s waveform generators (timing characteristics); Mapping unit (DC) (data handling)]
The DMFC is a multi FIFO controller utilizing a single memory to serve the DC and DP channels.

The FIFO is partitioned into 8 equal segments. The segments can be allocated asymmetrically to the channels.

The memory allocation for a specific channel must not be greater than a certain number of rows. The exact number is different from one channel to another (see the spec).

When the direct path from the IC is used, this channel replaces one of the existing DMFC channels.
The Image DMA Controller – IDMAC

- **Role:** control the memory ports; transfer data to/from system memory
- **Memory ports - AXI**
  - IDMAC: 1 read, 1 write
- **Throughput:**
  - External: 64 bits @ 264 MHz
  - Internal: up to 2 pixels/cycle @ 264 MHz (through each port)
  - Shared by all DMA channels: input from sensor, output to display and off-line processing
    -> efficient utilization of the bandwidth in different use cases
  - Efficient pipelining: 4 AXI ID’s; multiple outstanding transactions: read – 8; write – 6
- **Data arrangement in memory**
  - Row-after-row, with flexible line-stride – as needed for a window in a video/graphics buffer
- **Access order**
  - Block-by-block – for rotation
  - Row-by-row – used for all other channels
    - Needed for output to display or input from sensor
    - Decreases the memory bus load by increasing its utilization efficiency
The Image DMA Controller – IDMAC (cont.)

- A variety of pixel formats
  - YUV 4:2:0/4:2:2/4:4:4 – for video
  - Conventionally packed RGB pixels – 8/16/32 bpp – for graphics
  - Tightly packed RGB pixels – 12/18/24 bpp
    - For reduced load on the memory bus and for power-efficient screen refresh
  - Optional independent alpha (translucency) input
    - For planes that do not have interleaved alpha
  - Coded color (using a LUT; 4/8 bpp)
    - An additional option to reduce bus load and power
  - Gray scale
  - Generic data

- Additional features
  - Conditional read (for combining) – transparent pixels are not read
    - Pixel transparency – identified from the independent alpha input
  - Scrolling and panning
  - Uniform programming model for all channels (stored in the CPMEM – channel parameter memory)
IPUv3 Fundamentals - IDMAC

- The IDMAC of IPUv3 is connected to the AXI bus.
  - Full separation between read and write
  - All the channels are symmetric
  - 64-bit AXI bus, internal bus of 128-bit (4 pixels)
- Each channel uses 2x160 words of the CPMEM memory, holding the channel's settings
- Ability to use alternate rows in the CPMEM for alternate flows
- Usage of alpha located in separate buffers (ATC)
- Dynamic and static arbitration between channels.

- Prioritizing screen refresh channels over other IPU channels and other DDR masters
DP (Display Processor)

- The DP has the following features:
  - Supports input formats YUVA and RGBA
  - Combining 2 video/graphics planes
  - Color conversion (YUV <-> RGB, YUV<->YUV) & Correction (gamut-mapping)
  - Gamma correction and Contrast stretching
  - Supports output formats YUV and RGB
  - Dynamic task switching between async and sync flows
IPUv3 Fundamentals - DP

- The DP handles the content of the frame.
- It performs image processing on the way to the display (combining, CSC, gamma correction).
- The DP supports one synchronous flow and two asynchronous flows.
- All the DP configuration is done only via the SRM. The control module handles the automatic switch between DP settings when the DP switches from one flow to another.
**IC (Image Converter)**

- **Resizing**
  - Fully flexible resizing ratio
  
    - Maximal downsizing ratio: 8:1
    - Maximal upsizing ratio: 1:8192
  
  - Independent horizontal and vertical resizing ratios

- **Color conversion/correction**
  - YUV <-> RGB, YUV <-> YUV conversion

- **Combining with a graphic plane**

- **Max output width 1024 pixels. Larger images are processed in stripes**
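The two limits on this slide can be checked with simple integer arithmetic. The sketch below is only an illustration: the helper names are hypothetical, the stripe count is a plain ceiling division that ignores any overlap between stripes, and the 8:1 limit is assumed to be inclusive.

```c
/* Output width limit of the IC, as quoted on the slide. */
#define IC_MAX_OUT_WIDTH 1024u

/* Number of stripes needed when each stripe is at most 1024 pixels wide. */
unsigned ic_stripe_count(unsigned out_width)
{
    return (out_width + IC_MAX_OUT_WIDTH - 1u) / IC_MAX_OUT_WIDTH;
}

/* Nonzero if downsizing in_size to out_size stays within the 8:1 limit. */
int ic_downsize_ok(unsigned in_size, unsigned out_size)
{
    return out_size != 0 && in_size <= 8u * out_size;
}
```

For example, a 1920-pixel-wide output needs two stripes under this simple model.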
The Image Rotator - IRT

- **Role**: performs rotation and inversion
  - Rotation: 90, 180, 270 degrees
  - Inversion: horizontal and vertical

- **Rate**: up to 100M pixels/sec
  - (depends on use case)

- **Additional features**
  - Acts on 8x8 blocks
  - Multi-tasking: up to three tightly time-shared tasks – block-by-block
  - Pixel format: 24-bit
The Video De-Interlacer or combiner - VDIC

• Role 1: performs **de-interlacing** – converting interlaced video to progressive

• Method: a high-quality motion adaptive filter
  - For slow motion – retains the full resolution (of both top and bottom fields), by using temporal interpolation
  - For fast motion – prevents motion artifacts, by using vertical interpolation

• Resolution: field size up to 968x1024 pixels on i.MX6 and 720x1024 pixels on i.MX5. Larger frames are processed in stripes (split mode).
• Output rate: up to 120M pixels/sec

• Additional features
  - Uses three input fields for each output frame (the minimum needed for a reliable motion detection)
  - Vertical interpolation – 4-tap filter; using an internal row buffer
  - Single concurrent flow
  - Input may come from a video decoder (VPU) or directly from the CSI
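The general idea of a motion-adaptive filter can be sketched per pixel as a blend between a temporal and a vertical estimate. This is not the VDIC's actual filter: the function name, the 8-bit motion measure and the 2-tap averages are assumptions made for illustration (the slide states that the hardware uses a 4-tap vertical filter).

```c
#include <stdint.h>

/*
 * Illustrative motion-adaptive estimate for one missing pixel.
 * above/below : neighbours in the current field (vertical interpolation)
 * prev/next   : same position in the previous and next fields (temporal)
 * motion      : 0 = static (use temporal), 255 = fast (use vertical)
 */
uint8_t deinterlace_pixel(uint8_t above, uint8_t below,
                          uint8_t prev, uint8_t next, uint8_t motion)
{
    unsigned spatial = (above + below + 1u) / 2u;
    unsigned temporal = (prev + next + 1u) / 2u;

    /* Rounded weighted average of the two estimates. */
    return (uint8_t)((spatial * motion + temporal * (255u - motion) + 127u) / 255u);
}
```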
The Video De-Interlacer or combiner - VDIC

- Role 2: performs **combining** – overlaying of 2 frames at the same color space

- As an alternative to the de-interlacing function, the VDIC HW can perform combining

  - Combining of 2 planes
  - The planes do not have to be the same size
  - The planes must be in the same color space (no CSC)
  - Processes 1 pixel per cycle
  - Color keying, alpha blending
IPUv3H – Combining
IPUv3 – Basic Combining Capabilities

Combining in the Display Processor (DP)
Two planes
• One plane may have any size and location
• The other one must be “full-screen” (cover the full output area)
Maximal rate: i.MX37/51 – 133 MP/sec, i.MX53 – 200 MP/sec, i.MX6 Dual/Quad – 264 MP/sec

Combining in the Image Converter (IC)
Two planes; both “full-screen” (cover the full output area)
Maximal rate: i.MX37/51 – 20 MP/sec, i.MX53 – 30 MP/sec, i.MX6 Dual/Quad – 40 MP/sec

• Combining methods (in both cases)
  – Color keying and/or alpha blending
  – Alpha: global or per-pixel; interleaved with the pixels (upper plane) or as a separate input

Note: This is the capability per IPU, so the total capability of the processor is doubled in i.MX6DQ.
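The two combining methods can be illustrated in software for a single RGB888 pixel. This is a sketch of the general technique only; the function names are hypothetical and the rounding is not claimed to match the hardware.

```c
#include <stdint.h>

/* Blend one 8-bit component; alpha = 255 shows only the upper plane. */
uint8_t blend_channel(uint8_t upper, uint8_t lower, uint8_t alpha)
{
    return (uint8_t)((upper * alpha + lower * (255 - alpha) + 127) / 255);
}

/*
 * Combine one pixel of the upper plane with the lower plane.
 * Color keying: an upper pixel equal to key is fully transparent.
 * Otherwise a global alpha is applied to each component.
 */
uint32_t combine_pixel(uint32_t upper, uint32_t lower,
                       uint32_t key, uint8_t alpha)
{
    uint32_t out = 0;
    int shift;

    if ((upper & 0xFFFFFFu) == (key & 0xFFFFFFu))
        return lower & 0xFFFFFFu;

    for (shift = 0; shift <= 16; shift += 8) {
        uint8_t u = (uint8_t)((upper >> shift) & 0xFFu);
        uint8_t l = (uint8_t)((lower >> shift) & 0xFFu);
        out |= (uint32_t)blend_channel(u, l, alpha) << shift;
    }
    return out;
}
```

Per-pixel alpha would replace the global `alpha` argument with a value read from the interleaved or separate alpha input.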
IPUv3 – Off-Line Combining

- Unlimited number of planes combined sequentially

**Note:** This is the capability per IPU, so the total capability of the processor is doubled in i.MX6DQ.

**Combining in the VDIC (Video De-Interlacer & Combiner) – i.MX53/6 Dual/Quad only**
- Available when de-interlacing is not needed
- Two planes; each may have any size and location (supplemented by a “background color”)
- Maximal rate: i.MX53 – 180 MP/sec, i.MX6 Dual/Quad – 240 MP/sec
- Combining method – as in the DP and IC
IPUv3 – Maximal On-The-Fly Combining To A Single Display

3-planes
i.MX37/51 – up to 20 MP/sec

4-planes
i.MX53 – up to 30 MP/sec
i.MX6 Dual/Quad – up to 40 MP/sec

Note: the bottom plane may be a result of additional off-line combining of several planes

Note: This is the capability per IPU, so the total capability of the processor is doubled in i.MX6DQ.
i.MX6 Dual/Quad: On-The-Fly Combining Using 2x IPUv3

[Figure: three block diagrams of the i.MX6 Dual/Quad. Each shows IPUv3 instances (DI, DC, IC, VDIC) combining planes read from external memory, stacked from the bottom plane to the top plane.]

- **Example 1: 2x 4 planes**
- **Example 2: 1x 7 planes**
- **Example 3: 4x 2 planes**

**Note:** Some planes may be a result of additional off-line combining of several planes. Such combining may be performed either with the IPUs or the GPUs.
# IPUv3 combining capabilities – summary

<table>
<thead>
<tr>
<th></th>
<th>DP</th>
<th>IC</th>
<th>VDIC</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Output</strong></td>
<td>Display</td>
<td>Memory*</td>
<td>Memory</td>
</tr>
<tr>
<td><strong>Relations between planes</strong></td>
<td>PlaneA &lt;= PlaneB</td>
<td>PlaneA = PlaneB</td>
<td>PlaneA &lt;= PlaneB</td>
</tr>
<tr>
<td><strong>Color space conversion</strong></td>
<td>Yes</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td><strong>Performance</strong></td>
<td>1 cycle/pixel</td>
<td>4 cycles/pixel</td>
<td>1 cycle/pixel</td>
</tr>
<tr>
<td><strong>HW cursor</strong></td>
<td>32x32 unified color</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td><strong>Output Image size</strong></td>
<td>FW up to 2048</td>
<td>FW up to 1024</td>
<td>FW up to 1920</td>
</tr>
<tr>
<td><strong>Color keying</strong></td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td><strong>Alpha blending</strong></td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

*The output of the IC can be sent directly to a smart display*
IPUv3 Fundamentals – programming steps

This is a high level example of IPUv3 programming flow.

1. What are the displays connected in the use case?
   a. Allocate the displays to each DI
   b. Define the timing characteristics of each signal for each display.

2. Define each flow in the DC
   a. sync/async
   b. Define the events that trigger the flow – and what to do upon their arrival
   c. Allocate mapping unit, and mapping scheme
   d. Allocate waveform generator in the DI

3. Configure the DP for each flow

4. Configure the IDMAC
   a. How the data is arranged in the memory (interleaved/not interleaved)
   b. What is the data's format (PFS, BPP) and mapping

5. Processing: VDIC, IC (Resizing, CSC) and rotation settings

6. Control module configuration for activation of a flow
   a. Define the trigger to start a flow
   b. Define the processing chain
IPUv3H – Control
The control Module - CM

• The control module is responsible for the flow management within the IPU.
• The module is composed of
  
  - General control Registers (GCR)
  - Frame synchronization unit (FSU)
  - Shadow registers module (SRM)
  - Interrupt controller
  - Low Power modes controller
  - Debug unit
FSU – double buffering

- Similar to IPUv1 the data is tightly pipelined using double buffering
FSU – task chaining
Peripherals
The DSI MIPI Interface is a digital core accompanied with a multi-lane D-PHY that implements all protocol functions defined in the MIPI DSI Specification, providing an interface between the System and MIPI DSI compliant Display.

**Features of the MIPI DSI complex:**

**Supported standard version:**
- MIPI DSI Compliant
- DSI Version 1.01
- DPI Version 2.0
- DBI Version 2.0
- DSC Version 1.02
- PPI for D-PHY
- MIPI D-PHY Version 1.0

**Configuration:** one clock lane, two data lanes

**Speed:** Up to 1Gb/s per lane (fast speed). Low speed/low power signaling supported

DSI can support both command and video modes and up to four virtual channels to accommodate multiple displays.
- Command and video mode support (type 1, 2, 3, and 4 display architecture)
- Mode switching: low power and ultra low power
- Burst mode/Non-burst mode
- Bus turnaround
- Fault error recovery scheme

Both DPI and DBI coexist in the system, but only one of them can be active at a given time.
MIPI CSI-2

The CSI-2 MIPI Interface is a digital core accompanied with multi-lane D-PHY that implements all protocol functions defined in the MIPI CSI-2 Specification, providing an interface between the System and MIPI CSI-2 compliant Camera Sensor.

The features of the MIPI CSI-2 complex:

- **Supported standard version:** MIPI CSI-2 Version 1.0
- **Configuration:** one clock lane, four data lanes
- **Speed:** Up to 1Gb/s per lane
- **Throughput:** 250MB/sec

- Timing accurate signaling of Frame and Line synchronization packets;
- Support for several frame formats such as:
  - General Frame or Digital Interlaced Video with or without accurate sync timing
  - Data type (Packet or Frame level) and Virtual Channel interleaving
- 32-bit Image Data Interface delivering data formatted as recommended in CSI-2 Specification;
  - Directly supports all primary data formats conversion to IPU input. Some secondary formats are treated as “generic” data
  - RGB, YUV and RAW color space definitions;
  - From 24-bit down to 6-bit per pixel;
  - Generic or user-defined byte-based data types
LVDS Interface in i.MX53 & i.MX6 D/Q – Key Features

- Structure
  - Two Channels
    - Each channel contains 4 data pairs + 1 clock pair
- Data
  - 18 bpp pixels – using 3 LVDS data pairs
  - 24 bpp pixels – using 4 LVDS data pairs
- Control signals: HSYNC, VSYNC, DE

- Pixel clock rate
  - Single Channel: up to 85 MHz; e.g. WXGA @ 60 fps or 720p60
  - Dual Channel: up to 170 MHz; e.g. UXGA @ 60 Hz or 1080p60

- Relevant Standards
  - PHY Standard: ANSI EIA-644A
  - Display Protocol Standards:
    - VESA PSWG – Panel Standardization Working Group – set of standards for panels using LVDS.
    - JEIDA/JEITA DISM Standard JEIDA-59-1999
    - OpenLDI (National) – Revision 0.95 13/May/1999. *Only* Unbalanced operating mode supported (aligned with vast majority of LCD vendors).
LVDS – what is supported?

• Single Channel configuration
  - Pixel clock: up to 85 MHz; e.g. WXGA @ 60 fps or 720p60
  - LVDS Clock frequency = Pixel clock x 7 = 85 x 7 = 595 MHz
  - Data :
    ▪ 18 bpp pixels – using 3 LVDS data pairs
    ▪ 24 bpp pixels – using 4 LVDS data pairs

• Dual Channel configuration
  - Pixel clock: up to 170 MHz; e.g. UXGA @ 60 Hz or 1080p60
  - LVDS Clock frequency = Pixel clock x 7/2 = 170 x 7/2 = 595 MHz
  - Data :
    ▪ 18 bpp pixels – using 3 LVDS data pairs per channel
    ▪ 24 bpp pixels – using 4 LVDS data pairs per channel
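The clock arithmetic above can be written as a small helper. The function name is hypothetical; the factors 7 and 7/2 are the ones given on the slide.

```c
/*
 * LVDS clock frequency in kHz for a given pixel clock.
 * Single channel: pixel clock x 7.
 * Dual channel:   pixel clock x 7/2 (pixels are split across two channels).
 */
unsigned long lvds_clock_khz(unsigned long pixel_clock_khz, int dual_channel)
{
    return dual_channel ? pixel_clock_khz * 7ul / 2ul
                        : pixel_clock_khz * 7ul;
}
```

Both maximum configurations (85 MHz single channel, 170 MHz dual channel) land on the same 595 MHz figure.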
LDB Features

- **LDB Structure:**
  - 2 Channels, same/independent data
  - Each channel contains 4 data pairs + 1 clock pair

- **Resolutions/Rates:**
  - Single Channel (up to WXGA): Up to 85 MHz, 3 or 4 data pairs
  - Dual Channel (up to UXGA): Up to 170 MHz, 6 or 8 data pairs
  - For example: can support 1080p60 or UXGA @60fps

- **Pixel Depths:**
  - 18 bpp – 3 LVDS data pairs
  - 24 bpp – 4 LVDS data pairs

- **Control signals:**
  - Supports HSYNC, VSYNC, DE
**HDMI General Features**

- **Description**: High-Definition Multimedia Interface (HDMI) Transmitter including both HDMI TX Controller and PHY
- **Standard Compliance**: HDMI 1.4a, DVI 1.0, HDCP 1.4 (with keys stored in embedded eFuses)
  - Supporting majority of primary 3D Video formats
- **TMDS Core Frequency**: From 25 MHz to 340 MHz
- **Consumer Electronic Control**: Supported
- **Monitor Detection**: Hot plug/unplug detection and link status monitor support
- **Testing Capabilities**: Integrated test module
- **Maximal Power Consumption**: 70mW
- **Temperature Range**: -40°C to +125°C (Tj)
i.MX6 Dual/Quad: HDMI Video/Audio Features

- **Video Standard Compliance**: EIA/CEA-861D
- **Supported Video Resolutions**: Up to 1080p@60Hz and 720p/1080i@120Hz HDTV display; up to QXGA graphics display
- **Pixel Clock Frequency**: From 25 MHz to 240 MHz
- **Video Data Formats**: YCbCr 4:4:4; RGB 4:4:4; YCbCr 4:2:2
- **Internal Video Processing**: Interpolation YCbCr 4:2:2 to 4:4:4; conversion YCbCr to RGB and vice versa
- **Audio Standard Compliance**: IEC60958, IEC61937
- **Supported Audio Formats**: All audio formats as specified by the HDMI Specification Version 1.4a
- **Audio Input Interfaces**: Embedded Audio DMA
- **Audio Sampling Rate**: Up to 192 kHz
IPU & the iMX Linux BSP
IPU Drivers

- IPU drivers are based on code re-used from MX5x IPU driver
  - Key modification: Support for multiple instances provides support for the 2 IPU modules in i.MX 6Quad/Dual
- IPU functionality accessed through multiple interfaces:
  - **IPU framebuffer (FB) driver**: Accessed through the Linux standard FB interface
    - Introduction of MXC Display Driver framework, to manage interaction between IPU and display device drivers (e.g., LCD, LVDS, HDMI, MIPI, etc.)
  - **IPU processing driver**: A custom API exposes IPU processing functionality
    - Resizing
    - Rotation
    - Combining of graphics planes
    - CSC
    - De-interlacing
  - **Video 4 Linux 2 (V4L2) output driver**: Based on V4L2 video API, leverages IPU processing driver
  - **V4L2 capture driver**: Based on V4L2 capture API; leverages IPU processing driver and IPU core driver
IPU Drivers

• **Architecture Design:**
  • IPU sub-module (CSI, IC, DI, DP, IDMAC, etc) functionality provided in a set of IPU Core driver functions
  • Largely unmodified between MX5x and MX 6Quad
• Leverage existing Linux APIs:
  • For framebuffer access (IPU FB driver)
  • Video image processing and display (V4L2 output driver)
  • Image capture (V4L2 capture driver)
• Fill in gaps with custom APIs and API extensions:
  • Extensions to FB interface to control certain IPU functionality – local and global alpha, gamma correction, etc.
  • IPU Processing driver to provide user space access to IPU processing capabilities
• Provide MXC Display Driver *(mxc_dispdrv.h)* framework to simplify connection between display devices and IPU modules.
Overview of IPU Drivers

- **MXC Display Driver**
  - Simple framework to manage MXC display device drivers.
  - Examples: LCD, TVE, MIPI, VGA, HDMI

- **IPU Processing Driver**
  - Manage IPU IC tasks in kernel space.

- **MXC V4L2 Drivers**
  - Based on IPU processing driver.
MXC Display Driver - Files

- MXC Display Driver files
  drivers/video/mxc/mxc_dispdrv.h
  drivers/video/mxc/mxc_dispdrv.c
- IPU framebuffer driver
  drivers/video/mxc/mxc_ipuv3_fb.c
- Display device drivers
  drivers/video/mxc/mxc_lcdif.c
  drivers/video/mxc/mxc_hdmi.c
  drivers/video/mxc/mipi_dsi.c
  ...

struct mxc_dispdrv_driver {
    const char *name;
    int (*init) (struct mxc_dispdrv_handle *, struct mxc_dispdrv_setting *);
    void (*deinit) (struct mxc_dispdrv_handle *);
    /* display driver enable function for extension */
    int (*enable) (struct mxc_dispdrv_handle *, struct fb_info *);
    /* display driver disable function, called at early part of fb_blank */
    void (*disable) (struct mxc_dispdrv_handle *, struct fb_info *);
    /* display driver setup function, called at early part of fb_set_par */
    int (*setup) (struct mxc_dispdrv_handle *, struct fb_info *fbi);
};

struct mxc_dispdrv_setting {
    /* input-feedback parameter */
    struct fb_info *fbi;
    int if_fmt;
    int default_bpp;
    char *dft_mode_str;
    /* feedback parameter */
    int dev_id;
    int disp_id;
};
struct mxc_dispdrv_entry *mxc_dispdrv_register(
    struct mxc_dispdrv_driver *drv);

int mxc_dispdrv_unregister(struct mxc_dispdrv_entry *entry);

struct mxc_dispdrv_handle *mxc_dispdrv_gethandle(char *name,
    struct mxc_dispdrv_setting *setting);

int mxc_dispdrv_setdata(struct mxc_dispdrv_entry *entry,
    void *data);

void *mxc_dispdrv_getdata(struct mxc_dispdrv_entry *entry);
MXC Display Driver – Configuration Flow

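[Diagram: configuration flow between a display device driver (example: LDB, RGB666, LDB-XGA, ipu0, di1), the mxc_dispdrv list, and the IPUv3 fb driver (dev=ldb, mode_str=?, fbi). Numbered steps shown in the diagram:]

1. mxc_dispdrv_register, mxc_dispdrv_setdata
2. mxc_dispdrv_gethandle
3. init
4. fb_add_videomode, fb_set_var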
MXC Display Driver

- Command Line Options:
  first display:
    video=mxcfb0:dev=dispdrv_name,mode_str,if=if_fmt
  second display:
    video=mxcfb1:dev=dispdrv_name,mode_str,if=if_fmt

For example:

    video=mxcfb0:dev=hdmi,1920x1080M@60,if=RGB24
    video=mxcfb1:dev=ldb,LDB-XGA,if=RGB666
    video=mxcfb2:dev=lcd,800x480M@55,if=RGB565
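To make the structure of these options concrete, the sketch below splits one option string into its framebuffer index, display driver name, mode string and interface format. It is a user-space illustration only, not the kernel's parser: `struct mxcfb_option` and `parse_mxcfb_option` are hypothetical names, and only the exact field order shown above is accepted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one video=mxcfbN option. */
struct mxcfb_option {
    int  fb_index;   /* N in mxcfbN */
    char dev[16];    /* display driver name, e.g. "hdmi" */
    char mode[32];   /* mode string, e.g. "1920x1080M@60" */
    char if_fmt[16]; /* interface pixel format, e.g. "RGB24" */
};

/* Returns 0 on success, -1 if the string does not match the pattern. */
int parse_mxcfb_option(const char *arg, struct mxcfb_option *out)
{
    int n = sscanf(arg, "video=mxcfb%d:dev=%15[^,],%31[^,],if=%15s",
                   &out->fb_index, out->dev, out->mode, out->if_fmt);

    return n == 4 ? 0 : -1;
}
```

Parsing `video=mxcfb1:dev=ldb,LDB-XGA,if=RGB666` with this helper yields index 1, driver `ldb`, mode `LDB-XGA` and format `RGB666`.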
MXC Display Driver – Multi-Display Options

hdmi + lvds
   video=mxcfb0:dev=hdmi,1920x1080M@60,if=RGB24
   video=mxcfb1:dev=ldb,LDB-XGA,if=RGB666

lvds + lvds
   video=mxcfb0:dev=ldb,LDB-XGA,if=RGB666
   video=mxcfb1:dev=ldb,LDB-XGA,if=RGB666

lcd + lvds
   video=mxcfb0:dev=lcd,800x480M@55,if=RGB565
   video=mxcfb1:dev=ldb,LDB-XGA,if=RGB666

hdmi + lvds + lvds
   video=mxcfb0:dev=hdmi,1920x1080M@60,if=RGB24
   video=mxcfb1:dev=ldb,LDB-XGA,if=RGB666
   video=mxcfb2:dev=ldb,LDB-XGA,if=RGB666
Example of MXC Display Driver - HDMI

• Software Architecture:
  - HDMI multifunction driver (MFD) manages software resources common to video and audio drivers
  - Audio driver uses ALSA/SoC audio framework.
  - Video driver:
    ▪ MXC Display Driver API to register with IPU FB driver
    ▪ Linux Framebuffer (FB) API to change the video mode and receive notifications of mode changes
HDMI II and the i.MX 6Quad Framebuffer and Display Device Architecture
IPU Processing Driver - Introduction

- Each IPU has two kernel threads for the IC tasks PP and VF
- Each kernel thread performs the tasks on its task queue list
- Each task executes the following sequence:
  `ipu_init_channel → ipu_init_channel_buffer → request_ipu_irq → ipu_enable_channel → wait_irq (task finish) → ipu_disable_channel → ipu_uninit_channel`

- Tasks are based on single buffer mode
- Split mode tasks will be split into 2 tasks per IPU.
- An application only needs to prepare a task and queue it
- Task operations include
  - Setting the task input/overlay/output/rotation/deinterlacing/buffer
  - Call `ioctl IPU_CHECK_TASK` first to adjust parameters according to feedback
  - Call `ioctl IPU_QUEUE_TASK` to queue task
- `IPU_QUEUE_TASK` is a blocking `ioctl`. 
IPU Processing Driver

• Files
  include/linux/ipu.h
  drivers/mxc/ipu3/ipu_device.c

• Structures
  see include/linux/ipu.h

• Ioctls
  #define IPU_CHECK_TASK _IOWR('I', 0x1, struct ipu_task)
  #define IPU_QUEUE_TASK _IOW('I', 0x2, struct ipu_task)
  #define IPU_ALLOC _IOWR('I', 0x3, int)
  #define IPU_FREE _IOW('I', 0x4, int)
IPU Processing Driver

- Example
  
  `linux-test/test/mxc_ipudev_test/mxc_ipudev_test.c`
IPU Processing Driver

• Advantages
  – Easy to use.
  – Provides workaround for IPU suspend/resume issue
    ▪ Cannot suspend when double buffering is enabled
  – All IC tasks may be based on this IPU processing driver, including
    – user applications and V4L2 output and capture drivers
      ▪ Reason: Easier to debug if there is an issue.

• Disadvantages
  – Based on kernel thread, all control is done by software, so Linux scheduler may have undesirable impact on the system.
  – Single-buffer mode doesn’t perform as well as double-buffer mode
V4L2 – Common Kernel API

• What is Video4Linux (V4L)?
  - V4L is the original video capture/overlay API of the Linux kernel. It appeared late in the 2.1.x development cycle in the Linux kernel.

• What about V4L2?
  - V4L2 is the second generation of the video4linux API which fixes a number of design bugs of the first version. It was integrated into the standard kernel in 2.5.x.
  - V4L2 is an interface for analog radio, video capture and output drivers.
  - Hardware acceleration capabilities (IPUv3) are leveraged in V4L2 drivers and provided in the Linux BSP
  - Upper level software that uses the V4L2 API, such as GStreamer source/sink and Android camera HAL, does not need to understand the underlying hardware.

• Documentation / Web Resource / API spec
  - Documentation/video4linux/ subdirectory in kernel tree.
  - http://v4l2spec.bytesex.org - the spec of V4L2
MXC V4L2 – User APIs

- VIDIOC_QUERYCAP
- VIDIOC_G_FMT / VIDIOC_S_FMT
- VIDIOC_REQBUFS
- VIDIOC_QUERYBUF
- VIDIOC_QBUF / VIDIOC_DQBUF
- VIDIOC_STREAMON / VIDIOC_STREAMOFF
- VIDIOC_G_CTRL / VIDIOC_S_CTRL
- VIDIOC_CROPCAP / VIDIOC_G_CROP / VIDIOC_S_CROP
- VIDIOC_ENUMOUTPUT / VIDIOC_G_OUTPUT / VIDIOC_S_OUTPUT

APIs used only for MXC V4L2 capture:
- VIDIOC_ENUMINPUT / VIDIOC_G_INPUT / VIDIOC_S_INPUT

APIs used only for MXC V4L2 TV-in:
- VIDIOC_ENUMSTD / VIDIOC_G_STD / VIDIOC_S_STD
**MXC V4L2 – Internal APIs**

New version of V4L2 framework supports master / slave device drivers:
- Support multiple master devices and multiple slave devices
- `mxc_v4l2_capture` driver is the V4L2 master driver
- Camera drivers and `tv-in` driver are V4L2 internal slave drivers

These `ioctl`s are used in kernel internally for MXC V4L2 capture/tvin especially:
- `ioctl_dev_init` / `ioctl_dev_exit`
- `ioctl_s_power`
- `ioctl_g_ifparm`
- `ioctl_init`
- `ioctl_g_fmt_cap`
- `ioctl_g_parm` / `ioctl_s_parm`
- `ioctl_queryctrl` / `ioctl_g_ctrl` / `ioctl_s_ctrl`
V4L2 – Usage and Examples

• How to use V4L2:
  Generally, programming a V4L2 device consists of these steps:
  - Opening the device
  - Changing device properties, selecting a video and audio input, video standard, picture brightness, etc.
  - Negotiating a data format
  - Negotiating an input/output method
  - Executing the actual input/output loop
  - Closing the device

• Please refer to the following examples:
  - BSP team’s test cases:
    git clone git://sw-git01-tx30.am.freescale.net/linux-test.git (without password)
  - Test team’s VTE test cases
  - Camera HAL in Android source code
  - GStreamer source/sink source code
Features of MXC V4L2 Capture

• **Still capture**: capture a frame in a buffer, users can read the buffer and store the picture in a file.

• **Preview**: show the captured frames directly on the framebuffer. Users can choose the framebuffer number on which the video will be shown.

• **Video capture**: capture frames in allocated buffers. Users can get the frames by calling `VIDIOC_DQBUF` and then send them to VPU for encoding or save them in a file.
Features of MXC V4L2 Capture

- To capture a still image
- No resizing/rotation/CSC can be done.
  - The raw data of an image can be converted to a different pixel format within the same color space by using the CSI->SMFC->MEM IDMAC channel.
Features of MXC V4L2 Capture

- **To preview a captured video on frame buffer**
  - Preview on fb0 – Resizing/rotation/CSC can be done in IC(Prp_VF) channels. Manually control the buffer ready flags in interrupt handler. Use DP(DP_BG) channel to display the captured frames.
  - Preview on fb2 - Resizing/rotation can be done in IC(Prp_VF) channels. CSC can be done in DP(DP_FG) channel. The flow is totally controlled by FSU.
Features of MXC V4L2 Capture

To capture frames in allocated buffers (using IC channel)
- Resizing/rotation/CSC can be done in IC (PRP_ENC) channels. Manually control the buffer ready flags in interrupt handler.
- MXC V4L2 capture maintains a number of buffers.
- Users can get the captured buffer by calling the `VIDIOC_DQBUF` ioctl and return the buffer to the kernel by calling the `VIDIOC_QBUF` ioctl.
- NOTE: Camera preview and capturing frames into buffers can be used at the same time.
Features of MXC V4L2 Capture

To capture frames in allocated buffers (using SMFC channel)

- No resizing/rotation/CSC can be done. Manually control the buffer ready flags in interrupt handler.
- MXC V4L2 capture maintains a number of buffers.
- Users can get the captured buffer by calling the `VIDIOC_DQBUF` ioctl and return the buffer to the kernel by calling the `VIDIOC_QBUF` ioctl.

[Diagram: MXC V4L2 capture system]
Features of MXC V4L2 Output

• Support for playing video using one framebuffer at a time:
  - DP-BG framebuffer
  - DP-FG framebuffer
  - DC framebuffer

• Support for the following modes:
  - IC normal mode – resizing / CSC / rotation, using PP channel
  - IC bypass mode – CSC, using DP or DC channel directly
  - IC horizontal/vertical split mode – resizing / CSC / rotation, support high resolution output, using PP channel
  - VDI-IC video deinterlacing mode - deinterlacing / resizing / CSC / rotation, using PRP_VF channel, including high motion mode and low motion mode.

Note: V4L2 output and V4L2 capture can run at the same time if there is no IC or DP/DC channel conflict.
Features of MXC V4L2 Output

IC normal mode (using PP channel)
- Resizing/rotation/CSC. Manually control the IC output/display input buffer ready flags in interrupt handler and control IC input buffer ready flags in timer handler.
- MXC V4L2 output maintains a number of buffers.
- Users can show a buffer on one framebuffer by calling the `VIDIOC_QBUF` ioctl and get the buffer back from the driver by calling the `VIDIOC_DQBUF` ioctl.
Features of MXC V4L2 Output

IC bypass mode (using DP or DC channel)
- CSC can be done, but no resizing or rotation can be done. Manually control the display output buffer ready flags in the interrupt handler.
- MXC V4L2 output maintains a number of buffers.
- Users can show a buffer on one framebuffer by calling the `VIDIOC_QBUF` ioctl.
Features of MXC V4L2 Output

- **IC horizontal split mode (using PP channel)**
- Resizing/rotation/CSC. Manually control the IC output/IC input (right stripe)/display input buffer ready flags in the interrupt handler and control IC input (left stripe) buffer ready flags in the timer handler.
- MXC V4L2 output maintains a number of buffers.
- Users can show a buffer on one framebuffer by calling the `VIDIOC_QBUF` ioctl and get the buffer back from the driver by calling the `VIDIOC_DQBUF` ioctl.
Features of MXC V4L2 Output

- VDI-IC video deinterlacing mode (using PRP_VF channel)
- Resizing/rotation/CSC can be done. Manually control the IC output/display input buffer ready flags in interrupt handler and control IC input buffer ready flags in timer handler.
- MXC V4L2 output maintains a number of buffers.
- Users can show a buffer on one framebuffer by calling the `VIDIOC_QBUF` ioctl and get the buffer back from the driver by calling the `VIDIOC_DQBUF` ioctl.
How do we integrate IPUv3 into MXC V4L2?

• Based on analysis of the IPUv3 spec
• What channel should we use for the framebuffer?
• What channel should we use for V4L2 capture and V4L2 output?
• IPU low-level API design – enable/disable channel, init/uninit channel, init channel buffer, interrupt handler register interface…
• Invoke IPU low-level APIs from the MXC V4L2 driver.
• Ensure backwards compatibility in the IPU low-level APIs in cases where the hardware has not changed dramatically.
Use case Examples & Tips
IPUv3 tips

• Use VDOA
  – For more efficient DDR access pattern
• Refresh the display at the rate of the content
  – For displays that perform frame rate conversion.
  – Sometimes called 24P cinema
  – Significantly reduces the amount of data read by IPU.
IPUv3 tips

• Buffer management
  - IPU write channel needs a free buffer in the DDR to start writing data.
  - If there is no free buffer, the IPU's internal FIFOs fill up, causing additional latencies
  - The buffer management system should guarantee that there is always a free buffer for the IPU's usage.
  - The IPU can then start writing the data to that free buffer immediately, avoiding unnecessary latency.
IPUv3 tips

• Move load from the IC
  – Perform CSC (Color Space Conversion) in the DP (Display Processor) rather than in the IC. This saves memory bandwidth and lowers the load on the IC.
  – Move combining tasks to the VDIC (if it is not used as a de-interlacer)
  – Consider the IC processing speed, for the tasks
    ▪ Resize – 2 cycles/pixel
    ▪ Combine – 2 cycles/pixel
    ▪ CSC – 3 cycles/pixel

• Flipping an image (a.k.a. 180° rotation)
  – Use H-flip and V-flip transfers, done by the IDMAC and IC, rather than the IRT module.
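The equivalence behind this tip, that a horizontal flip followed by a vertical flip gives a 180° rotation, can be checked in software on a small 8-bit image. The function names are hypothetical and the code only demonstrates the identity; it does not model how the IDMAC or IC perform the flips.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror each row of a w x h image in place (left <-> right). */
void hflip(uint8_t *img, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        uint8_t *row = img + y * w;
        for (size_t x = 0; x < w / 2; x++) {
            uint8_t t = row[x];
            row[x] = row[w - 1 - x];
            row[w - 1 - x] = t;
        }
    }
}

/* Swap rows of a w x h image in place (top <-> bottom). */
void vflip(uint8_t *img, size_t w, size_t h)
{
    for (size_t y = 0; y < h / 2; y++) {
        uint8_t *top = img + y * w;
        uint8_t *bot = img + (h - 1 - y) * w;
        for (size_t x = 0; x < w; x++) {
            uint8_t t = top[x];
            top[x] = bot[x];
            bot[x] = t;
        }
    }
}

/* Reference 180-degree rotation: pixel i moves to position n - 1 - i. */
void rotate180(const uint8_t *src, uint8_t *dst, size_t w, size_t h)
{
    size_t n = w * h;
    for (size_t i = 0; i < n; i++)
        dst[n - 1 - i] = src[i];
}
```

Applying `hflip` and then `vflip` to a buffer produces the same bytes as `rotate180` on the original.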
IPUv3 tips- Optimizing memory accesses

• Optimize Pixel formats
  • The larger the chunks of data, the easier it is on the DDR
  • The fewer the bursts, the better for the memory bus system
  • Choose the mode that works best for the specific use case and avoid the rest

<table>
<thead>
<tr>
<th>Format</th>
<th>Amount of data per macro block</th>
<th>Burst size</th>
<th>DDR3 x64 BL</th>
<th>Amount of bursts per macro block</th>
<th>Target</th>
</tr>
</thead>
<tbody>
<tr>
<td>YUV422 interleaved</td>
<td>256 bytes</td>
<td>16 bytes</td>
<td>2</td>
<td>16</td>
<td>Best: IPU =&gt; IPU; VDOA =&gt; IPU</td>
</tr>
<tr>
<td>YUV422 partial interleaved</td>
<td>256 bytes</td>
<td>8 bytes + 8 bytes</td>
<td>1 + 1</td>
<td>32</td>
<td></td>
</tr>
<tr>
<td>YUV422 non interleaved</td>
<td>256 bytes</td>
<td>8 bytes + 4 bytes + 4 bytes</td>
<td>1 +1+1</td>
<td>48</td>
<td></td>
</tr>
<tr>
<td>YUV420 interleaved</td>
<td>256 bytes</td>
<td>16 bytes</td>
<td>2</td>
<td>16</td>
<td></td>
</tr>
<tr>
<td>YUV420 partial interleaved (NV12)</td>
<td>192 bytes</td>
<td>8 bytes + 8 bytes</td>
<td>1 + 1</td>
<td>32</td>
<td>Best: VPU =&gt; VDOA (decode) Best: IPU =&gt; VPU (encode)</td>
</tr>
<tr>
<td>YUV420 non interleaved</td>
<td>192 bytes</td>
<td>8 bytes + 4 bytes + 4 bytes</td>
<td>1 +1+1</td>
<td>48</td>
<td></td>
</tr>
</tbody>
</table>
IPUv3M tips

• How to work efficiently with the memory system
  – Use real time channels
    ▪ Marking IPU accesses with an AXI ID to bypass the PL301’s arbitration
  – Lock feature
    ▪ Issue a series of IPU bursts that belong to the same channel – better chance for a DDR hit
  – Conditional read
    ▪ If an alpha mask is provided to the overlay plane, transparent pixels are not read from memory.
IPUv3 tips

• Recommended Display Connectivity

<table>
<thead>
<tr>
<th>i.MX51</th>
<th>24-bit RGB</th>
<th>RGB666</th>
<th>RGB565</th>
<th>RGB555</th>
<th>24-bit YCbCr</th>
<th>YCbCr4:4:4</th>
</tr>
</thead>
<tbody>
<tr>
<td>DISP1_DAT0</td>
<td>B0</td>
<td>B0</td>
<td>B0</td>
<td>B0</td>
<td>Y0</td>
<td>Y0</td>
</tr>
<tr>
<td>DISP1_DAT1</td>
<td>B1</td>
<td>B1</td>
<td>B1</td>
<td>B1</td>
<td>Y1</td>
<td>Y1</td>
</tr>
<tr>
<td>DISP1_DAT2</td>
<td>B2</td>
<td>B2</td>
<td>B2</td>
<td>B2</td>
<td>Y2</td>
<td>Y2</td>
</tr>
<tr>
<td>DISP1_DAT3</td>
<td>B3</td>
<td>B3</td>
<td>B3</td>
<td>B3</td>
<td>Y3</td>
<td>Y3</td>
</tr>
<tr>
<td>DISP1_DAT4</td>
<td>B4</td>
<td>B4</td>
<td>B4</td>
<td>B4</td>
<td>Y4</td>
<td>Y4</td>
</tr>
<tr>
<td>DISP1_DAT5</td>
<td>B5</td>
<td>B5</td>
<td>G0</td>
<td>G0</td>
<td>Y5</td>
<td>Y5</td>
</tr>
<tr>
<td>DISP1_DAT6</td>
<td>B6</td>
<td>G0</td>
<td>G1</td>
<td>G1</td>
<td>Y6</td>
<td>Y6</td>
</tr>
<tr>
<td>DISP1_DAT7</td>
<td>B7</td>
<td>G1</td>
<td>G2</td>
<td>G2</td>
<td>Y7</td>
<td>Y7</td>
</tr>
<tr>
<td>DISP1_DAT8</td>
<td>G0</td>
<td>G2</td>
<td>G3</td>
<td>G3</td>
<td>Cb0</td>
<td>Cb0</td>
</tr>
<tr>
<td>DISP1_DAT9</td>
<td>G1</td>
<td>G3</td>
<td>G4</td>
<td>G4</td>
<td>Cb1</td>
<td>Cb1</td>
</tr>
<tr>
<td>DISP1_DAT10</td>
<td>G2</td>
<td>G4</td>
<td>G5</td>
<td>R0</td>
<td>Cb2</td>
<td>Cb2</td>
</tr>
<tr>
<td>DISP1_DAT11</td>
<td>G3</td>
<td>G5</td>
<td>R0</td>
<td>R1</td>
<td>Cb3</td>
<td>Cb3</td>
</tr>
<tr>
<td>DISP1_DAT12</td>
<td>G4</td>
<td>R0</td>
<td>R1</td>
<td>R2</td>
<td>Cb4</td>
<td>Cb4</td>
</tr>
<tr>
<td>DISP1_DAT13</td>
<td>G5</td>
<td>R1</td>
<td>R2</td>
<td>R3</td>
<td>Cb5</td>
<td>Cb5</td>
</tr>
<tr>
<td>DISP1_DAT14</td>
<td>G6</td>
<td>R2</td>
<td>R3</td>
<td>R4</td>
<td>Cb6</td>
<td>Cb6</td>
</tr>
<tr>
<td>DISP1_DAT15</td>
<td>G7</td>
<td>R3</td>
<td>R4</td>
<td></td>
<td>Cb7</td>
<td>Cb7</td>
</tr>
<tr>
<td>DISP1_DAT16</td>
<td>R0</td>
<td>R4</td>
<td></td>
<td></td>
<td>Cr0</td>
<td>Cr0</td>
</tr>
<tr>
<td>DISP1_DAT17</td>
<td>R1</td>
<td>R5</td>
<td></td>
<td></td>
<td>Cr1</td>
<td>Cr1</td>
</tr>
<tr>
<td>DISP1_DAT18</td>
<td>R2</td>
<td></td>
<td></td>
<td></td>
<td>Cr2</td>
<td>Cr2</td>
</tr>
<tr>
<td>DISP1_DAT19</td>
<td>R3</td>
<td></td>
<td></td>
<td></td>
<td>Cr3</td>
<td>Cr3</td>
</tr>
<tr>
<td>DISP1_DAT20</td>
<td>R4</td>
<td></td>
<td></td>
<td></td>
<td>Cr4</td>
<td>Cr4</td>
</tr>
<tr>
<td>DISP1_DAT21</td>
<td>R5</td>
<td></td>
<td></td>
<td></td>
<td>Cr5</td>
<td>Cr5</td>
</tr>
<tr>
<td>DISP1_DAT22</td>
<td>R6</td>
<td></td>
<td></td>
<td></td>
<td>Cr6</td>
<td>Cr6</td>
</tr>
<tr>
<td>DISP1_DAT23</td>
<td>R7</td>
<td></td>
<td></td>
<td></td>
<td>Cr7</td>
<td>Cr7</td>
</tr>
</tbody>
</table>

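<table>
<tbody>
<tr>
<td>DI1_PIN2</td>
<td>HSYNC</td>
</tr>
<tr>
<td>DI1_PIN3</td>
<td>VSYNC</td>
</tr>
<tr>
<td>DI1_PIN15</td>
<td>DRDY</td>
</tr>
<tr>
<td>DI1_DISP_CLK</td>
<td>CLK</td>
</tr>
</tbody>
</table>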
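
The RGB columns of the connectivity table follow one rule: components are packed onto the data lines starting at DISP1_DAT0, blue first, then green, then red, each with its least significant bit first. The following is a minimal sketch of that rule, not driver code. The names `RGB_FORMATS` and `pin_mapping` are hypothetical, and "RGB888" is used here as a label for the "24-bit RGB" column.

```python
# Component widths (blue, green, red) for each parallel RGB format in the table.
# Assumption: components are packed from DISP1_DAT0 upward, blue first, LSB first.
RGB_FORMATS = {
    "RGB888": (8, 8, 8),  # the "24-bit RGB" column
    "RGB666": (6, 6, 6),
    "RGB565": (5, 6, 5),
    "RGB555": (5, 5, 5),
}


def pin_mapping(fmt):
    """Return {data_line_index: signal_name} for one RGB format."""
    mapping = {}
    line = 0
    for name, width in zip("BGR", RGB_FORMATS[fmt]):
        for bit in range(width):
            mapping[line] = f"{name}{bit}"
            line += 1
    return mapping


if __name__ == "__main__":
    for fmt in RGB_FORMATS:
        pins = pin_mapping(fmt)
        print(fmt, ", ".join(f"DAT{i}={s}" for i, s in sorted(pins.items())))
```

Data lines above the last red bit are left out of the returned mapping, which corresponds to the empty cells in the table.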
IPUv3 - debug

• IPU error interrupts & status bits
  - IPU errors are reported in the IPU_INT_STAT_5, IPU_INT_STAT_6, IPU_INT_STAT_9 and IPU_INT_STAT_10 registers. Inspecting these bits should be the first debug step.
    ▪ A flickering display is usually the result of heavy system bus (DDR) load.
    ▪ These errors are reported as “new frame before end of frame” errors in the IDMAC_NFB4EOF register.
    ▪ Bus load that causes errors on the CSI side is reported in the *FRM_LOST* status bits.
    ▪ Some IPU internal signals can be routed to pins and measured using the IPU diagnostics unit. These can be used to capture errors/interrupts and track internal flows (the IOMUX needs to be configured to output the ipu_diagbus signals).
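
As a first debug step, the four error status registers can be dumped and scanned for set bits. The following is a minimal sketch, assuming the register values have already been read by some other means (for example a debugger). The helper names and the snapshot values are hypothetical, and the meaning of each bit position has to be looked up in the reference manual.

```python
# Registers named on this slide as carrying IPU error status bits.
ERROR_STATUS_REGS = (
    "IPU_INT_STAT_5",
    "IPU_INT_STAT_6",
    "IPU_INT_STAT_9",
    "IPU_INT_STAT_10",
)


def set_bits(value):
    """Return the positions of the bits set in a 32-bit register value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


def report_errors(snapshot):
    """Map each error status register to its list of set bit positions.

    Registers that read as zero (or are missing from the snapshot) are omitted.
    """
    report = {}
    for reg in ERROR_STATUS_REGS:
        bits = set_bits(snapshot.get(reg, 0) & 0xFFFFFFFF)
        if bits:
            report[reg] = bits
    return report


if __name__ == "__main__":
    # Hypothetical snapshot; the values are made up for illustration.
    snapshot = {"IPU_INT_STAT_5": 0x00800000, "IPU_INT_STAT_10": 0x5}
    for reg, bits in report_errors(snapshot).items():
        print(f"{reg}: bits {bits} set")
```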
IPUv3 - debug

• IPU diagnostics unit
  - Some IPU internal signals can be routed to pins and measured using the IPU diagnostics unit.
  - These can be used to capture errors/interrupts and track internal flows.
  - The IOMUX needs to be configured to output the ipu_diagbus signals.

• Task status and flow control
  - A frozen display is sometimes the result of incorrect buffer management control within the IPU.
  - The status of each flow controlled by the FSU can be monitored using the TASKS_STAT status registers.
  - In some cases the BUF_RDY and CUR_BUF indications of a flow can be used to track its progress.
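
Tracking BUF_RDY and CUR_BUF over time can be sketched as below for a double-buffered channel. This is a minimal sketch that works on values sampled by some other means. The function name, the sample format and the three-way classification are assumptions for illustration: CUR_BUF is taken to name the buffer currently in use, and BUF_RDY to mark a buffer as ready for the channel.

```python
def diagnose_flow(samples):
    """Classify a double-buffered flow.

    samples: list of (buf0_rdy, buf1_rdy, cur_buf) tuples taken over time.
    """
    if len(samples) < 2:
        return "not enough samples"
    cur_bufs = {cur for _, _, cur in samples}
    if len(cur_bufs) > 1:
        # CUR_BUF changes between samples, so buffers are being consumed.
        return "running"
    if all(not b0 and not b1 for b0, b1, _ in samples):
        # No buffer is ever marked ready, so the channel has nothing to process.
        return "starved"
    # A buffer is marked ready but CUR_BUF never advances.
    return "stalled"


if __name__ == "__main__":
    print(diagnose_flow([(1, 0, 0), (0, 1, 1), (1, 0, 0)]))
    print(diagnose_flow([(0, 0, 0), (0, 0, 0)]))
    print(diagnose_flow([(1, 1, 0), (1, 1, 0)]))
```

A "starved" result points at the software or FSU side not handing buffers over, while "stalled" points at the channel itself not advancing.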
Dual video-in use case example

[Figure: block diagram of the dual video-in data flow]
- Inputs: Vid in YUV (YUV, 20Hz, ITU 656 Rear-View Cam); Vid in RGB (RGB, 60Hz)
- Blocks: IPU, CSI, IC (VF), IC (PP - copy), DI, GPU
- Buffers: Temp, Background, Instrumental Layer
- Output: DISPLAY
- Annotations: "Inverted"; "Inverted, 60Hz ?"; "Bypass path, depending on needs"; "Temp is needed for cases BG freq > Vid one"
Playback, HD1080p H.264 HP → Display

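[Figure: data-flow diagram of the playback use case]
- IPU blocks: CSI, VDI, IC, DC/DI, DP
- Memory buffers: Video 720p YUV 4:2:0; WXGA YUV 4:2:2; GUI RGBA 8888
- Output: DISPLAY
- Rate labels: 60 frames per sec; 30 frames per sec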
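
The buffer formats named in the playback diagram translate into memory traffic with simple arithmetic. The following is a minimal sketch, assuming 12, 16 and 32 bits per pixel for YUV 4:2:0, YUV 4:2:2 and RGBA 8888. The function name is hypothetical, and pairing the 720p video buffer with the 30 frames per second label is an assumption, since the slide does not state which rate belongs to which stream.

```python
# Bits per pixel for the buffer formats named in the diagram.
BITS_PER_PIXEL = {
    "YUV420": 12,
    "YUV422": 16,
    "RGBA8888": 32,
}


def stream_bandwidth(width, height, fmt, fps):
    """Bytes per second for one read or write pass over a frame stream."""
    return width * height * BITS_PER_PIXEL[fmt] * fps // 8


if __name__ == "__main__":
    # Assumption: the 720p YUV 4:2:0 video buffer is written at 30 frames per second.
    video = stream_bandwidth(1280, 720, "YUV420", 30)
    print(f"720p YUV 4:2:0 @ 30 fps: {video / 1e6:.1f} MB/s")
```

Under these assumptions a 720p YUV 4:2:0 stream at 30 frames per second costs about 41.5 MB/s per pass over memory. Each extra read or write of the same buffer adds the same amount again.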
Dual Playback, HD720p H.264 HP → WSVGA Display

- **IPU**: Contains CSI and VDI blocks.
- **Memory**: Stores Video 720p YUV 4:2:0, WSVGA YUV 4:2:2, RGB 888, and GUI RGBA 8888.
- **DISPLAY**: Outputs 60 Frames per sec and 30 Frames per sec.

Connections:
- **CSI**: Connects to VDI and DC/DI.
- **VDI**: Connects to IC and DP.
- **IC**: Connects to VDI and DP.
- **DC/DI**: Connects to CSI and DP.
- **DP**: Connects to VDI, IC, and WSVGA YUV 4:2:2.
Demo