
Thoughts On 4K, HDMI, Signal Bandwidth and the AV Signal Quality Triangle

One thing every marketing plan needs is a number. Big or small, its size doesn't matter; as long as there's a number, ideally a single number, that can be said to correlate somehow with a product's quality, marketers will use it. You know the drill: it's a number someone can point to and claim, "If that number is bigger (or smaller), then the product is better (or worse)." It doesn't seem to matter if the correlation is ambiguous, or even outright misleading.

Sixty years ago, to cite just one example, the audio community adopted total harmonic distortion (THD) as its number. For decades thereafter, advertisers touted 0.1% THD, then 0.01% THD, and even 0.001% THD as the metric one could trust when buying a stereo amplifier. Unfortunately for the purchaser, THD says almost nothing about how an amplifier will sound in use or whether it's a good fit for a specific application. But shoppers wanted an easy way to make comparisons, and THD was an easy number to post and digest. The result was misinformation, and even misdirection. It wasn't until we moved beyond the marketing of that number that we really began to consider the quality of the experience.

Today we're in a similar situation with video display resolution. Identifying a product as HD, Ultra HD, 4K or even 8K tells only a small part of the story. The display's resolution number says very little about its dynamic range, contrast ratio, colorimetry, or temporal performance. It doesn't even say all that much about the signal resolution! Let's take a closer look.

Frame Rate

Video is a series of still images, each exhibited in turn on a screen. The changing series creates the illusion of motion. This effect has a name: the Phi phenomenon. It was identified in 1912 by psychologist Max Wertheimer, and it suggests that our interpretation of motion is a result of alternating light positions. You can experience the Phi phenomenon at its most basic every year during the holiday season. Consider how the lights on Christmas decorations seem to move as if chasing each other around the tree. We know perfectly well that the lights are fixed lamps, and that they're only turning on and off in sequence. Yet we experience apparent motion.
Video, like computer graphics and film, presents still images as a series of frames (or fields, in the case of interlaced content) that are displayed at varying rates. Broadcast television is generally presented at 25 to 60 frames per second (FPS). Early silent films had frame rates between 16 and 26 FPS, finally settling at 24 FPS by 1930 with the introduction of sound synced to the film.
Why are there different frame rates? In the case of film, 24 FPS was selected because it was the lowest frame rate at which sound could be embedded while still delivering a quality visual experience, and it economized on expensive film stock. In video applications, the frame rate was locked to the most ubiquitous electronic clock available: the region's power-line frequency (60 Hz in the US, 50 Hz in EMEA).

The frame rate used for specific content is selected based on the need for the perception of smooth, detailed motion, and is described as the temporal resolution of the display. Human vision perceives roughly 10 to 12 FPS as discrete images, while higher rates are interpreted as motion. As video frame rates move beyond 60 FPS, we experience less blurring and fewer motion artifacts. This link - https://frames-per-second.appspot.com/ - provides an excellent illustration of the visual effects that result from changes in frame rate.

Modern video technology is evolving towards higher frame rates that provide smoother perceived motion with fewer visual artifacts. 50p/60p frame rates are the progressive formats most commonly associated with HDTV systems. Several advanced cameras, including the GoPro HD action cameras, can record content at rates as high as 120 frames per second. Some state-of-the-art smartphones can record at 240 FPS rates, but may reduce resolution to 720 by 1280 at this rate to keep the total payload within the specifications of the device processors. Frame rates as high as 300 FPS are being tested for use in sports broadcasts. 300 FPS also sets the upper limit for the H.265 High Efficiency Video Coding compression standard. It should come as no surprise that frame rate is directly proportional to payload size. The higher the video frame rate, the greater the amount of information that must be processed for display.

Frame rate is just one third of the story. Beyond the temporal elements that affect the perception of motion, we must consider how finely the system can resolve specific values of color. Video displays use an additive color system in which red, green and blue primary colors (RGB) are mixed to create the full palette of colors we see. Essentially, any video display, regardless of the source, requires these three channels to operate.

Contemporary A/V content uses 8-bit words to describe each of the three monochromatic components of the video payload. There are several chroma decimation algorithms that help save bandwidth, but system design decisions should be based upon the maximum bandwidth expected of the most likely uncompressed content (chroma decimation schemes will be explored in a separate paper). In digital A/V, an 8-bit package of information contains 2^8 (256) possible values. This is called the color depth, or bit depth, of the content. In this 8-bit configuration there are 256 possible "shades" of red, green, or blue. 256 x 256 x 256 gives us a maximum of 16.78 million possible combinations, or colors, that can be displayed. This is called the "true color" system, and is sometimes referred to as 24-bit color (24 bits = 8 bits R + 8 bits G + 8 bits B).
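If it helps to see the arithmetic laid out, here is a minimal Python sketch of the 8-bit "true color" math described above:

```python
# A minimal sketch of the 8-bit "true color" arithmetic
bits_per_channel = 8
shades = 2 ** bits_per_channel          # 2^8 = 256 values each for R, G and B
total_colors = shades ** 3              # 256 x 256 x 256 = 16,777,216 colors
print(f"{shades} shades per channel -> {total_colors:,} displayable colors")
```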

There is another side to this story of which we should be aware. While nearly all video links operate on three additive channels, there are times when there are four channels. The fourth channel is called the alpha channel and is used primarily for video mark-up or annotation, where it defines transparency information. The alpha channel is a mask that specifies how the pixels are changed by an overlay. We won't revisit the alpha channel in this article, but will save that discussion for a different day.

Deep Color

Deep color is an emerging trend that supports a billion or more colors. Deep color systems may be identified as xvYCC, extended-gamut or high dynamic range (HDR) and can operate at 10, 12 or 16 bits per channel across the three RGB channels. Deep color may be identified as 30-bit (1.1 billion), 36-bit (68.7 billion) or 48-bit (281.5 trillion) color systems, as described above. Deep color moves us closer to exploiting the full capabilities of human vision, allowing us to take in the maximum amount of information our senses can perceive. Deep color is critical for producing life-like images in VR simulation systems, gaming, art and technical display applications. Once again, it should come as no surprise that color bit depth is directly proportional to payload size. The greater the color depth, the greater the amount of information that must be processed for display, and the greater the amount that must move through system links. Color depth is accurately described as the radiometric resolution of the display.
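The same arithmetic extends directly to deep color; a short sketch, using the per-channel bit depths named above:

```python
# Total displayable colors at deep-color bit depths (three RGB channels)
for bits_per_channel in (10, 12, 16):
    total_bits = 3 * bits_per_channel          # 30-, 36- and 48-bit color
    total_colors = 2 ** total_bits
    print(f"{total_bits}-bit color: {total_colors:,} colors")
# 30-bit: ~1.07 billion, 36-bit: ~68.7 billion, 48-bit: ~281.5 trillion
```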

Resolution

Resolution, in this context, is the pixel count of the display in vertical and horizontal terms, which also inherently expresses the image aspect ratio; it's the spatial resolution of the display.

1080p references an image of 1080 vertical pixels by 1920 horizontal pixels presented in progressive scan mode (also written as 1920 x 1080p60). 1080 x 1920 = 2,073,600 total pixels, or (rounded) two megapixels. 4K Ultra HD (UHD) doubles this pixel count in both directions to 2160 vertical pixels by 3840 (rounded to 4,000, hence 4K) horizontal pixels. 2160 x 3840 = 8,294,400 total pixels, or roughly eight megapixels.
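A quick sketch of that pixel-count math:

```python
# Total pixel counts for the two raster sizes discussed above
formats = {"1080p (Full HD)": (1920, 1080), "4K UHD": (3840, 2160)}
for name, (width, height) in formats.items():
    pixels = width * height
    print(f"{name}: {pixels:,} pixels (~{pixels / 1_000_000:.1f} megapixels)")
# 1080p: 2,073,600 (~2.1 MP); 4K UHD: 8,294,400 (~8.3 MP)
```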

So why is the 4K resolution number important? It might not be in and of itself, but when we start correlating screen size, resolution and viewer position with character legibility, content type and the tasks associated with consuming the displayed visual data, we quickly come to the point where the question becomes: what can we see? Even more importantly, what do we need to see so we can optimize system design, efficiency and return on investment?

The ANSI/INFOCOMM V202.01 Display Image Size for 2D Content in Audiovisual Systems (DISCAS) standard provides a repeatable, quantifiable, scientific method for answering that question. There are many situations where viewers need to make analytical decisions based on what they’re seeing. Ensuring that a video display can deliver as much information as the viewer’s eyes can take in is a valuable benchmark and an important metric for these applications. Even in less critical circumstances, using a standard based on visual acuity assures a positive outcome through proper analysis.

And then there's the reality of the marketplace. Four out of every ten flat-panel displays that ship this year, and, even more importantly, nearly every panel over 50 inches diagonal, will be a 4K set! A recent global survey of media executives forecast 4K Ultra HD attaining mainstream status within five to seven years. More than 76 million 4K screens will ship this year, growing to 96 million in 2018 and more than 120 million per year by the end of the decade!

It should come as no surprise that image resolution is directly proportional to payload size. When the pixel count increases fourfold, the amount of data that must be transported and processed increases in proportion. Changes in content payload have a direct impact on infrastructure design.
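To put rough figures on that, here is an illustrative sketch of uncompressed active-pixel payload. The assumptions (24-bit color, 60 FPS, active pixels only) are mine for the example; real links also carry blanking intervals, audio and protocol overhead.

```python
# Uncompressed active-pixel data rate: width x height x bits-per-pixel x FPS
def payload_gbps(width, height, bits_per_pixel=24, fps=60):
    return width * height * bits_per_pixel * fps / 1e9

print(f"1080p60, 24-bit: {payload_gbps(1920, 1080):.2f} Gbit/s")   # ~2.99 Gbit/s
print(f"2160p60, 24-bit: {payload_gbps(3840, 2160):.2f} Gbit/s")   # ~11.94 Gbit/s, 4x the data
```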

Visual Display Payload


We now see that there are three inter-related measures that affect the payload of a visual display system and, through that, the bandwidth necessary to effectively couple the diverse components required by a fully functional A/V installation. The relationship of the three can be likened to a triangle where each of the angles represents one of these measures. Let's call this the AV Signal Quality Triangle.


In such a triangle, increasing the value of one of the angles necessitates an equal decrease in the others if the total bandwidth is to remain the same. The sum of the three angles is bounded by the laws of geometry, and the total content bandwidth is bounded by the three parameters. If we want to increase resolution and maintain the same link bandwidth or payload size, then we must decrease the frame rate, the color depth, or both. Likewise, if we increase color depth (or frame rate) and want to run things through a link with the same bandwidth constraints (either a physical link, like a length of cable or fiber, or a link node such as a processor or other device), then we must decrease the allocation of bandwidth allotted to the other elements.
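As a rough illustration of the trade-off (again counting only active pixels, and taking the 1080p60 true-color payload as an assumed budget for the example):

```python
# Active-pixel payload in bits per second: width x height x bits-per-pixel x FPS
def payload(width, height, bits_per_pixel, fps):
    return width * height * bits_per_pixel * fps

budget = payload(1920, 1080, 24, 60)            # ~2.99 Gbit/s for 1080p60, 24-bit

# Quadrupling the pixel count to 4K within the same budget forces the
# frame rate (or the color depth) to give way:
max_fps_4k = budget / (3840 * 2160 * 24)
print(f"4K at 24-bit within the 1080p60 budget: {max_fps_4k:.0f} FPS")   # 15 FPS
```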

Check back next week to discover how to calculate the pixel clock rate!