The Advanced TV Problem:

Old vs New Thinking on the Digital Frontier

Why the PC Industry Thinks the TV Industry Is About to Err Seriously

and Cost the American Consumer Billions of Dollars Unnecessarily

The FCC is poised to define the digital TV standard for the U.S.

Congress, eager to balance the budget by 2002, is hungry for chunks of the broadcast spectrum.

The personal computer industry, one of the most dynamic in the U.S. and the world, would love to build new businesses on newly available digital bandwidth.

So why is the PC industry fighting a last-ditch battle to change the FCC's mind about its pending decision? And why isn't the FCC listening?

The Players:

New Players:

The Proposals:

What's at Risk?

Details of the CICATS Proposal

The CICATS technical proposal for the US digital TV standard is briefly this:

In other words, adopt all low-level ACATS standardization proposals, where low-level means all levels except the video data level, which is not to be standardized by the FCC.

CICATS understands that the FCC may find it impossible to honor the second point above (No Video Format), in which case we propose an alternative second point:

As will be explained below, CICATS actually couches this alternative as follows:

This is as opposed to the ACATS proposal of 18 video formats that do not use a base-layer concept and that include interlaced formats. The CICATS alternative proposal is cost-effective for consumers, immediately gives them higher resolution video, ensures smooth and true interoperability with computers, and is ready for improvements, such as even higher resolutions, as digital component costs drop.

The CICATS Reference Decoder

One way to look at the CICATS proposal is that it severs the decision to go digital from the decision to go high-resolution (or "high-definition"). We believe that going digital is the fundamental revolutionary step. We want to concentrate on doing it right. We believe that adding high resolution is straightforward if the groundwork is in place. We encourage a posture that allows this to happen once Moore's Law makes it more economically feasible than it is now, in about 5 years. We re-emphasize, however, that the CICATS base layer alone has higher perceived resolution than today's TV.

The preferred way to think of the CICATS proposal is in terms of a reference decoder. The CICATS Reference Decoder has a memory capable of supporting 1024 horizontal by 512 vertical pixels. This plus the requirement for square pixel spacing implies that the Reference Decoder is capable of decoding any resolution up to and including 1024x512. The following table shows several examples supported by the Reference Decoder on TV displays of various aspect ratios:

Aspect        Horizontal  Vertical  Remarks
1.33:1 (4:3)  640         480       Current TV format
1.78:1        854         480       Approximately the ACATS 16:9 format
1.85:1        944         512       Most popular Hollywood format
2:1           960         480       Acceptable to Hollywood
2:1           1024        512       Acceptable to Hollywood
2.37:1        1024        432       Popular widescreen Hollywood format
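
The table can be checked mechanically. The following is a minimal sketch, assuming only what the text states: a decoder memory of 1024 by 512 pixels and square pixel spacing, so that aspect ratio is simply width divided by height. The function names are hypothetical and are not part of the CICATS proposal.

```python
# Illustrative sketch only; helper names are hypothetical.
# The 1024x512 limit is the Reference Decoder memory size given in the text.
MAX_WIDTH = 1024
MAX_HEIGHT = 512

def fits_reference_decoder(width, height):
    """True if a picture of this size fits in the decoder memory."""
    return 0 < width <= MAX_WIDTH and 0 < height <= MAX_HEIGHT

def aspect_ratio(width, height):
    """With square pixel spacing, aspect ratio is just width / height."""
    return width / height

# (horizontal, vertical) pairs copied from the table above.
TABLE_FORMATS = [(640, 480), (854, 480), (944, 512),
                 (960, 480), (1024, 512), (1024, 432)]

for w, h in TABLE_FORMATS:
    print(f"{w}x{h}: aspect {aspect_ratio(w, h):.2f}:1, "
          f"fits={fits_reference_decoder(w, h)}")
```

Every row of the table passes the check, while a 1080-line picture would not fit.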

Rather than propose a single video format, CICATS proposes that the FCC mandate the Reference Decoder. Then the choice of horizontal resolution becomes a secondary choice. This choice would be left to industry, that is, to market demand.

The CICATS Reference Decoder is a way of specifying a class of video formats acceptable to the computer industry. It is a hardware specification to the same degree that an ACATS video format is a hardware specification. That is, it puts requirements on the hardware but does not specify the implementation that satisfies them. Following are some example uses of the Reference Decoder.

There are arguments for the choice to transmit 640x480 pixels: It is consistent with today's capabilities. Progressively-scanned 640x480 systems have already been demonstrated. The costs and demands for this resolution are well known. The aspect ratio of 4:3 is the current one for which CRT (cathode-ray tube) technology is already well-suited and cost effective. Computer displays are as comfortable with this format as are TV displays. The cost of a converter for one of today's analog sets to receive the new digital signal is minimal for this resolution. CICATS believes this format to be the one most likely to appeal pricewise to consumers now, encouraging them to convert to the new digital standard and thereby release the old analog spectrum.

There are, however, good arguments for other choices within the set allowed by the Reference Decoder. Consider, for example, 1024 by 512 pixels, the maximum allowed by the Reference Decoder (base layer only). The vertical resolution would be higher than today's analog TV because 512 is greater than 480, but more importantly because 512 progressive lines are equivalent to about 780 interlaced lines. And the horizontal resolution (on a TV set with aspect ratio 2:1) would be very much higher than today's analog TV, as well as spread out much wider. 2:1 aspect is considered desirable by Hollywood. Enhancement later (with an enhancement layer added to the base layer) to a nominal 2048x1024 resolution would be straightforward.

But there are serious counterarguments against the 1024 by 512 choice. The most serious is that displays for such an aspect ratio have not been demonstrated. Even if they had been, they would probably be exorbitantly expensive at this time. So the same argument we level against the expensive ACATS array of formats holds against this format too: Only the wealthy would be able to afford it at first. Sets that displayed in the old 4:3 aspect would either have to letterbox the wide aspect ratio, or pan-and-scan within it, or both (MPEG2 supports all of these choices). Both of these are familiar practices in widescreen films broadcast on TV today. All sets would implement the Reference Decoder but only those capable of 2:1 aspect would get full benefit of the signal.
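
To make the two fallback options concrete, here is a rough sketch of the arithmetic for showing a wide picture on a 4:3 set. It assumes square pixel spacing on both source and display and a 640x480 display. The function names are hypothetical and the code says nothing about how MPEG2 signals either choice.

```python
# Illustrative arithmetic only; function names are hypothetical.

def letterbox(src_w, src_h, disp_w, disp_h):
    """Shrink the whole picture to the display width.

    Returns (picture_lines_used, blank_lines_per_bar).
    """
    used = src_h * disp_w // src_w
    bar = (disp_h - used) // 2
    return used, bar

def pan_and_scan(src_w, src_h, disp_w, disp_h):
    """Crop the source to the display's aspect ratio at full height.

    Returns (source_columns_kept, source_columns_dropped).
    """
    kept = min(src_w, src_h * disp_w // disp_h)
    return kept, src_w - kept

# A 960x480 (2:1) picture on a 640x480 (4:3) set:
print(letterbox(960, 480, 640, 480))     # whole picture, with black bars
print(pan_and_scan(960, 480, 640, 480))  # full height, sides cropped
```

For the 960x480 case, letterboxing uses 320 of the 480 display lines, with an 80-line bar above and below. Pan-and-scan keeps a 640-column window and discards the other 320 columns.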

Notice that a format with approximately 16:9 aspect could be chosen within the parameters of the CICATS Reference Decoder. This is one of the ACATS proposed aspect ratios. This aspect ratio has some of the same problems as just discussed for the 2:1 aspect ratio. In particular, sets to display at that ratio are too expensive for the average consumer. It is not an interesting aspect ratio for Hollywood. On the other hand, CRTs of that aspect have been demonstrated. Pan-and-scan or letterboxing would be required for satisfactory display on sets of smaller aspect ratio, as discussed above for the 2:1 ratio.

In any case, the new digital TV sets would implement the Reference Decoder. They would need 4 to 5 times less memory than the equivalent ACATS-compliant set, so they would be far more cost-effective for consumers, and they would avoid the loss in quality implied by the conversions required between the 18 ACATS formats at the receiver. The CICATS proposal would be cheaper and better. Over time the cost differential between the two types of sets would diminish (with Moore's Law again) but in the meantime, US consumers would have paid many billions of dollars for unnecessary conversion and suffered unnecessary loss of quality as well.
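
A back-of-the-envelope check of the memory claim can be sketched as follows. It rests on assumptions that go beyond the text: that the largest ACATS format is 1920 pixels wide by 1080 lines (only the 1080 figure appears in this document), that both decoders store the same number of bits per pixel, and that one frame store is compared against one frame store. Real receivers hold more than a single frame, so the result is only indicative.

```python
# Rough comparison of single-frame storage; all names are hypothetical.

def frame_store_pixels(width, height):
    """Number of pixels one stored frame occupies."""
    return width * height

ACATS_LARGEST = (1920, 1080)   # assumption: the 1920 width is not stated in the text
CICATS_DECODER = (1024, 512)   # Reference Decoder memory, from the text

ratio = (frame_store_pixels(*ACATS_LARGEST)
         / frame_store_pixels(*CICATS_DECODER))
print(f"ACATS frame store is about {ratio:.1f} times larger")
```

Under these assumptions the ratio comes out just under 4, at the low end of the range quoted above. Any additional memory needed for format conversion would push it higher.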

The full CICATS proposal has both a spatial and a temporal layering system:

The notion of a temporal base layer is a new one to these FCC-related discussions and needs some explanation. For example, it might appear that we are proposing three video formats here, one each at 24, 36, and 72 Hz. This is not the case and here's why:

In the case of three separate formats, the broadcaster selects one of the three to transmit, and the receiver detects which one is sent and converts it, if necessary, to its local frame rate. Of all the conversions implied by the ACATS proposal, frame rate conversions are the most difficult to do with quality at a low price.

In the case of a temporal base layer, all sets would implement the base layer (by definition of a base layer), hence all three frame rates would be implemented. Regardless of transmitted frame rate, a set receiving the proposed temporal base layer signal would operate at 72 Hz frame rate. It would select and decode the appropriate MPEG2 frames (I, P, and B frames in MPEG2 terminology) to form the 72 Hz display. The base layer technology makes this simple to do. It is a selection process rather than a conversion process.
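
The difference between selection and conversion can be illustrated with simple arithmetic. The sketch below assumes plain frame repetition on a 72 Hz display, which is a simplification chosen for illustration. It is not the MPEG2 frame-selection mechanism of the proposal, and the function names are hypothetical.

```python
# Illustrative only: frame repetition stands in for the MPEG2 frame
# selection described above. Names are hypothetical.
DISPLAY_HZ = 72

def repeats_per_frame(source_hz, display_hz=DISPLAY_HZ):
    """Display refreshes per source frame, or None if the display rate
    is not a whole multiple of the source rate."""
    if display_hz % source_hz != 0:
        return None
    return display_hz // source_hz

def display_schedule(source_hz, refreshes, display_hz=DISPLAY_HZ):
    """Index of the source frame shown on each of the first `refreshes`
    display refreshes."""
    n = repeats_per_frame(source_hz, display_hz)
    if n is None:
        raise ValueError(f"{source_hz} Hz needs conversion, not selection")
    return [i // n for i in range(refreshes)]

for hz in (24, 36, 72, 30, 60):
    print(hz, repeats_per_frame(hz))
```

The rates 24, 36, and 72 Hz all divide 72 evenly, so each source frame maps onto a whole number of display refreshes (3, 2, and 1 respectively). Neither 30 Hz nor 60 Hz does, so those rates would require true frame rate conversion on a 72 Hz display.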

It is important to note that the CICATS temporal base layer does not support 30 Hz or 60 Hz. 30 Hz is a relic of interlaced scanning and so is not needed in the progressively scanned future. The PC market has determined that 60 Hz is insufficient, so it is not included in the CICATS temporal base layer.

But CICATS, again, understands that the FCC might have to support 60 Hz under pressure from the old analog world. In this case, we propose an alternative to the temporal base layer:

This alternative does extend the CICATS proposal to three video formats, but the three differ only in frame rate. Although we offer this alternative, we want the FCC to understand that it implies conversion hardware and more memory in the receiver, hence more cost to the consumer. Furthermore, the conversions between 60 and 72 Hz are particularly prone to poor quality. Nevertheless, a 60 Hz signal displayed on 60 Hz sets and a 72 Hz signal displayed on 72 Hz sets would suffer no quality loss.

Glossary

ACATS: Advisory Committee on Advanced Television Service, to the FCC.

Aspect ratio: The ratio of the width of a picture to its height. Standard (current) TV has an aspect ratio of 4:3 ("4 to 3") = 1.333. The ACATS proposal mixes 4:3 with 16:9 aspect ratios. 16:9 = 1.777 is a strange aspect ratio that is wider than current TV but is not a Hollywood-compatible aspect ratio. Hollywood films are most often in 1.85 ("academy") aspect or in 2.37 ("scope") for very wide-screen films. Hollywood would apparently be content with a 2:1 aspect ratio, but not with 16:9.

Base Layer: See layering.

CICATS: Computer Industry Coalition on Advanced Television Service, representing 11 leading personal computer companies (hardware and software): Apple, Compaq, Cray, Dell, Hewlett Packard, Intel, Microsoft, Novell, Oracle, Silicon Graphics, and Tandem.

FCC: The Federal Communications Commission.

Frame rate: The number of video pictures displayed per second. The goal is to seem continuous. Film's frame rate is 24 frames per second, where each frame is repeated 2 (or sometimes 3) times to give the equivalent frame rate of 48 frames per second (or sometimes 72). The word Hertz is often used as shorthand for "frames per second". The highest ACATS frame rate is 60 Hz ("60 Hertz" or 60 frames per second), whereas computer consumers rejected 60 Hz years ago in favor of 70 or more frames per second to avoid objectionable flicker. (Looking at a TV or PC screen out of one's peripheral vision reveals the flicker.) 72 Hz is an attractive frame rate because it is computer friendly and an easy multiple of film rate (film is a major source of all TV content).

Hertz (Hz): One Hertz is short for one cycle per second, or one frame per second. Frequencies were formerly expressed in cycles per second. For example, a radio station might broadcast at 98.1 on the radio dial, meaning at 98.1 megacycles per second. Today this would be expressed as 98.1 megahertz (MHz), in honor of electromagnetic pioneer Heinrich Hertz. In a related usage, the "width" of a TV channel is measured in Hz: 6 MHz per channel.

Interlace: Current analog TV scans each frame by first drawing every other horizontal scanline across the face of the TV set, then starting over at the top and drawing all the skipped in-between scanlines. The first set, called a "field", is said to be interlaced with the second set, or second field. Interlaced scanning is opposed to progressive scanning.

Layering: A layered system is a logical system of related frame sizes, rates, and resolutions (as opposed to a grab-bag of unrelated formats as in the ACATS proposal). A layered system has a "base layer" that must be honored plus "enhancement layers" that may be added to the base layer to make it higher resolution. A good example of a layering scheme is that used by Kodak's PhotoCD. Snapshots are taken to a photo house from which they are returned in digital form on a CD, Kodak's PhotoCD. Each of the snapshots will appear on the CD in several resolutions. The base resolution is 768x512 (approximately video resolution), but the CD also contains enhancement layers that are added to the base resolution to make it into 1536x1024 pixels or 3072x2048 pixels. So one CD contains at least these three resolutions. Similarly, a layered TV channel could contain several resolutions simultaneously so long as they were layered logically. ACATS misuses the term "layering" to simply mean a TV picture is layered atop a string of digital bits, which is layered atop a radio frequency modulation technique. Theirs is a much more generic use of the term than the CICATS (or Kodak) use.
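
The resolution ladder in the PhotoCD example can be sketched in a few lines. This assumes, as the numbers in the text suggest, that each enhancement layer doubles both dimensions of the layer below it. The function name is hypothetical, and the sketch covers only the resulting picture sizes, not how the layers are encoded.

```python
# Illustrative only: each enhancement layer is assumed to double both
# dimensions, matching the figures quoted in the text.

def layer_resolutions(base_w, base_h, enhancement_layers):
    """Picture sizes for the base layer alone and for each added layer."""
    return [(base_w * 2 ** k, base_h * 2 ** k)
            for k in range(enhancement_layers + 1)]

print(layer_resolutions(768, 512, 2))   # PhotoCD figures from the text
print(layer_resolutions(1024, 512, 1))  # CICATS base layer plus one enhancement
```

Applied to the 1024x512 CICATS base layer, one enhancement layer gives the nominal 2048x1024 resolution mentioned in the proposal.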

Moore's Law: The "law" that says computers get twice as fast every 18 months. In general, anything digital gets twice as good every 1.5 years. For example, memory doubles or the processor gets twice as fast - for a fixed cost - every 1.5 years. To understand how stunningly fast this is, let's restate it as 10 times faster every 5 years (that's the same as 2 times faster every 1.5 years). During the 8 years that ACATS has been working on its proposal, personal computers have increased in speed and memory by a factor of 50 to 100 times (at the same cost). At the beginning of the ACATS process, PCs weren't powerful enough for TV, but now they are. There is good reason to believe that Moore's Law will continue to operate for another 15 years - thus for another improvement of 1000 times over what we have today! This incredible digital revolution is what makes CICATS encourage the FCC not to freeze any digital standards now that it could better make 5-10 years from now. We are simply incapable of predicting what an "order of magnitude" (10x) change means conceptually. Any standards made now will look foolish 5 years from now, so only the minimum should be done now. (Just think: two years ago there was no Netscape, and Microsoft was not an Internet company. Things change very fast in the digital world. The old analog modes of thinking do not work.)
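
The restatement of Moore's Law as "10 times every 5 years" follows from compounding the 18-month doubling. A minimal sketch, assuming strict doubling every 1.5 years (the function name is hypothetical):

```python
# Compound growth under a fixed doubling period. Illustrative only.
DOUBLING_YEARS = 1.5

def improvement_factor(years, doubling_years=DOUBLING_YEARS):
    """Improvement after `years`, doubling once every `doubling_years`."""
    return 2 ** (years / doubling_years)

print(f"5 years:  about {improvement_factor(5):.1f}x")
print(f"15 years: about {improvement_factor(15):.0f}x")
```

Five years is three and one-third doublings, which is almost exactly a factor of 10. Fifteen years is ten doublings, or 1024 times, which is the "1000 times" figure above.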

Pixel: Short for picture element. In the digital world, a picture is represented by an array of tiny samples or picture elements - so many per line and so many lines. (Pixels, by the way, are single points, not little squares or rectangles as popularly described. We are careful to say square pixel spacing, not "square pixels".)

Progressive: Current PC screens draw each scanline in order from top to bottom. They are said to be "progressively scanned". This is considered opposite to interlaced scanning.

Reference Decoder: CICATS proposes a layered video format scheme by way of a reference decoder, which is a specification of the decoder of the new digital TV signal, separate from the display of that signal. This concept permits a degree of freedom not formerly present in these FCC-related discussions. For example, instead of fixing a particular horizontal resolution, which depends highly on the capabilities of a particular display, the CICATS Reference Decoder says only that the format must have 480 progressive lines (nominally) and square pixel spacing. So, if the display device has electronics and width enough to handle a 2:1 aspect ratio, then the Reference Decoder will honor a width of 960 pixels (assuming the industry has agreed to broadcast this signal). If the display device can only handle 4:3 aspect ratio (as the affordable ones today do), then the Reference Decoder would dictate a horizontal resolution of 640 pixels. The same decoder circuit, at the same parts cost, would handle either situation. The Reference Decoder is not hardware. It is a way of specifying a class of acceptable video formats rather than a single video format. Within this class there is no conversion required, but wide signals (wide aspect ratio) would have to be either letterboxed or pan-and-scanned to a display with smaller aspect ratio.

Resolution: The number of pixels per line and the number of lines - equivalently, the number of horizontal pixels and the number of vertical pixels. Thus a resolution might be given as 2048x1024 pixels, meaning 1024 scanlines with 2048 pixels on each scanline - equivalently, a rectangular array of 2048 times 1024 pixels (about 2 million pixels). Standard (current) TV has 480 lines vertical resolution and about 700 horizontal. But it is interlaced, which brings down its effective resolution to about 320 scanlines. The ACATS "high-definition" format of 1080 vertical lines is really about 700 lines since the 1080 lines are interlaced at the 60 Hz frame rate.
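
The interlace figures quoted in this document imply a roughly constant discount factor. The sketch below simply divides the numbers given in the text. It does not assume any particular published value for the factor, and the names are hypothetical.

```python
# Each entry pairs a count of interlaced lines with the progressive
# (effective) line count the text treats as equivalent.
QUOTED_PAIRS = {
    "current TV": (480, 320),
    "ACATS 1080-line format": (1080, 700),
    "CICATS 512-line base layer": (780, 512),
}

def implied_factor(interlaced_lines, effective_lines):
    """Effective lines per interlaced line implied by a quoted pair."""
    return effective_lines / interlaced_lines

for name, (interlaced, effective) in QUOTED_PAIRS.items():
    print(f"{name}: {implied_factor(interlaced, effective):.2f}")

# The pixel count of the 2048x1024 example:
print(2048 * 1024)
```

All three pairs imply that an interlaced picture delivers roughly two-thirds of its nominal line count. The 2048x1024 array works out to 2,097,152 pixels, the "about 2 million" cited above.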

Spectrum: Simply all the channels used for TV, cable TV, AM radio, FM radio, ham radio, and so forth - treated as a single entity. The full electromagnetic spectrum is vast, including X rays, heat, and even ordinary light. The FCC has dominion over only the "radio frequency" portion, that is, the uses listed in the first sentence of this paragraph. In other words, the radio frequency spectrum is a subset of the full electromagnetic spectrum. One TV channel, whether old analog or new digital, is a slice of the radio frequency spectrum.

Square pixel spacing: This just means that the horizontal spacing between pixels is the same as the vertical spacing between pixels (or between scanlines). Although the conversion from non-square pixel spacing of many of the ACATS formats to square pixel spacing is straightforward, there are over 200 million PC operating systems in existence that assume square pixel spacing and do not have the software for doing the conversion.