The Advanced TV Problem:
Old vs New Thinking on the Digital Frontier
Why the PC Industry Thinks the TV Industry Is About to Err Seriously
and Cost the American Consumer Billions of Dollars Unnecessarily
The FCC is poised to define the digital TV standard for the U.S.
The Congress, desirous to balance the budget by 2002, is hungry for chunks of the broadcast spectrum.
The personal computer industry, one of the most dynamic in the U.S. and the world, would love to make new business using newly available digital bandwidth.
So why is the PC industry busily involved in a last-ditch battle to change the FCC's mind about its pending decision? And why isn't the FCC listening?
ACATS: The advisory committee appointed by the FCC to recommend to it the new digital (advanced) TV standard. It took 8 years to craft a proposal recently submitted to the FCC for approval. This group's proposal is sometimes referred to as the Grand Alliance proposal.
CICATS: The recently organized computer industry coalition fighting the ACATS proposal now before the FCC. This group features Microsoft, Apple, Intel, Compaq, Novell etc., a total of 10 of the hottest companies in the PC business.
Hollywood: Steven Spielberg just spoke out publicly against the ACATS proposal. The DGA (Directors Guild of America) and ASC (American Society of Cinematographers) have voiced complaints too.
Congress: Representatives Vern Ehlers and Jack Fields have begun to question the FCC position, and Senator Larry Pressler has too.
ACATS: A grab-bag of 18 formats including interlaced and progressive scan, current resolution and high-definition formats, current aspect ratio and wide-screen aspect ratios, frame rates of 24 (film rate), 30 (current TV rate), and 60 frames per second, and mixtures of square pixel spacing and non-square pixel spacings. An ACATS compatible TV receiver must be able to handle all 18 formats else it "goes black" when a format is broadcast which it doesn't receive. Not one of the 18 formats is natural to computers!
CICATS: One format, 480-line (nominal) progressive scan, square pixel spacing, computer compatible frame rate. The single format is a "base layer" of a logical family of higher resolution formats, but CICATS does not require the higher resolution formats, only that they be compatible with the base layer which a CICATS compatible receiver must receive. There is no "goes black" problem. How enhancement layers are added is more important than what they are (which we claim is not knowable today, especially compared to 5-10 years from now). The base layer alone has much higher quality than today's TV. The base layer is computer friendly. Our position: Ride out Moore's Law and let the market decide when a new technology is economically feasible. See "Details of the CICATS Proposal" below.
What's at Risk?
Cost: The ACATS proposal requires conversion of all 18 formats to a consumer's single format (else the "goes black" problem). These converters have to handle the highest resolution ACATS formats (approximately 2000x1000 pixels). The cost of such converters, at even a mediocre quality level, is hundreds of dollars more per current TV set, new digital TV set, or PC display than the corresponding CICATS single-format converter. This can be argued simply on the need for 5 times as much memory. There are approximately 20 million TVs sold per year and 20 million PCs sold per year. Thus the ACATS proposal adds a "tax" on the consumer of tens of billions of dollars per year - unnecessarily. High-definition is nice, certainly, but we don't believe consumers, other than the wealthy, will want to spend thousands of dollars for a high-definition TV for which there are only a few hours of content to be broadcast. The remaining consumers have to bear the burden of converters to serve these few. CICATS proposes a system that can easily be enhanced for higher and higher resolutions as the costs of digital memory and computation continue to drop, without burdening current consumers with the excess cost in the meantime.
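The arithmetic behind the "tens of billions of dollars per year" figure can be sketched as follows. The unit volumes are the ones quoted above; the per-unit premiums of $250 and $500 are illustrative assumptions standing in for "hundreds of dollars", not figures taken from either proposal.

```python
# Rough yearly cost of the extra converter hardware, under assumed premiums.
TVS_PER_YEAR = 20_000_000  # from the text
PCS_PER_YEAR = 20_000_000  # from the text


def yearly_premium(extra_cost_per_unit):
    """Total extra consumer spending per year, in dollars."""
    return (TVS_PER_YEAR + PCS_PER_YEAR) * extra_cost_per_unit


for premium in (250, 500):  # assumed per-unit premiums, in dollars
    billions = yearly_premium(premium) / 1e9
    print(f"${premium} per unit -> ${billions:.0f} billion per year")
```

Under these assumptions the yearly total lands between $10 billion and $20 billion; a different assumed premium scales the result linearly.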
Quality: Converting interlaced to progressive scan is basically a hard problem unless quality is sacrificed. Converting, say, 60 frames per second to computer-friendly 72 frames per second (say) is also hard unless quality is sacrificed. The CICATS proposal avoids both these potential losses of quality (and economics will certainly dictate low quality converters in order to reach a mass market price point) by disallowing them from the outset. The ACATS proposal ensures loss of quality to all but the wealthy by maintaining a 50-year-old compression technology (interlace scan) instead of using modern, much more efficient space-time compression techniques. CICATS proposes that existing "legacy" interlaced TV programs be converted at very high quality at a few places (thousands of TV stations) rather than at low quality in a lot of places (millions of consumer TV sets or PCs). The quality "tax" imposed by the ACATS proposal is unnecessary too.
Spectrum: The new digital TV standard for the country has been allotted a new part of the electromagnetic spectrum - that is, there have been new channels set aside for it. Meanwhile the old analog TV system (the current one) continues to use its current channels - its existing part of the spectrum. Congress would like to have this old part of the spectrum returned to other uses (such as to an auction that would raise billions of dollars to help balance the budget by 2002). The ACATS proposal, being so costly to consumers, almost guarantees that the old analog spectrum will not be available for reuse for 15 years or more (waiting for Moore's Law to kick in sufficiently to lower the exorbitant ACATS costs to affordable consumer costs). The CICATS proposal, being more cost effective, promises to return the old channels in much less time, say 5-7 years or sooner. Every additional year it takes to convert the country from analog to digital TV means another year of the "taxes" in dollars and quality mentioned above. We estimate that the cumulative "tax" burden of the ACATS proposal on the consumer will be about $100 billion! And the old analog channels won't be available for auction or other reuse during this time.
Details of the CICATS Proposal
The CICATS technical proposal for the US digital TV standard is briefly this:
Adopt ACATS Low Levels: That the FCC adopt all ACATS proposals for modulation, error correction, data packetization, and compression for the new digital TV channels.
No Video Format: That the FCC not specify a video data format.
In other words, adopt all low-level ACATS standardization proposals, where low-level means all levels except the video data level, which is not to be standardized by the FCC.
CICATS understands that the FCC may find it impossible to honor the second point above (No Video Format), in which case we propose an alternative second point:
One Required Video Format (Alternative): That the FCC specify a single 480-line (nominal), progressive-scan video format with square pixel spacing, utilizing a base-layer technology concept. Others could be implemented but only one would be required.
As will be explained below, CICATS actually couches this alternative as follows:
One Required Video Format (Alternative): That the FCC specify the CICATS Reference Decoder.
This is as opposed to the ACATS proposal of 18 video formats that do not use a base-layer concept and that include interlaced formats. The CICATS alternative proposal is cost-effective for consumers, immediately gives them higher resolution video, ensures smooth and true interoperability with computers, and is ready for improvements - such as even higher resolutions - as digital component costs drop.
The CICATS Reference Decoder
One way to look at the CICATS proposal is that it severs the decision to go digital from the decision to go high-resolution (or "high-definition"). We believe that going digital is the fundamental revolutionary step. We want to concentrate on doing it right. We believe that adding high resolution is straightforward if the groundwork is in place. We encourage adoption of a posture that allows this to happen when Moore's Law makes it more economically feasible than now, in about 5 years. We re-emphasize, however, that the CICATS base layer alone has higher perceived resolution than today's TV.
The preferred way to think of the CICATS proposal is in terms of a reference decoder. The CICATS Reference Decoder has a memory capable of supporting 1024 horizontal by 512 vertical pixels. This plus the requirement for square pixel spacing implies that the Reference Decoder is capable of decoding any resolution up to and including 1024x512. The following table shows several examples supported by the Reference Decoder on TV displays of various aspect ratios:
|Aspect ratio|Horizontal pixels|Vertical pixels|Comment|
|---|---|---|---|
|1.33:1 (4:3)|640|480|Current TV format|
|1.78:1|854|480|Approximately the ACATS 16:9 format|
|1.85:1|944|512|Most popular Hollywood format|
|2:1|960|480|Acceptable to Hollywood|
|2:1|1024|512|Acceptable to Hollywood|
|2.37:1|1024|432|Popular widescreen Hollywood format|
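The relationship between these rows and the Reference Decoder can be checked with a few lines of Python. This is a minimal sketch: the 1024x512 limit and the six resolutions come from the text, while the function names are hypothetical and chosen only for illustration.

```python
MAX_WIDTH, MAX_HEIGHT = 1024, 512  # Reference Decoder memory, from the text


def fits_reference_decoder(width, height):
    """True if a frame of this size fits in the Reference Decoder memory."""
    return width <= MAX_WIDTH and height <= MAX_HEIGHT


def square_pixel_width(aspect_ratio, lines):
    """Pixels per line for a given aspect ratio with square pixel spacing."""
    return round(aspect_ratio * lines)


# The six example resolutions listed in the table.
formats = [(640, 480), (854, 480), (944, 512), (960, 480), (1024, 512), (1024, 432)]

for width, height in formats:
    print(f"{width}x{height}: aspect {width / height:.2f}:1, "
          f"fits={fits_reference_decoder(width, height)}")
```

With square pixel spacing the aspect ratio is simply width divided by height, so choosing the line count and the display shape fixes the horizontal resolution; for example, `square_pixel_width(2.0, 480)` gives 960.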
Rather than propose a single video format, CICATS proposes that the FCC mandate the Reference Decoder. Then the choice of horizontal resolution becomes a secondary choice. This choice would be left to industry - that is, to market demand.
The CICATS Reference Decoder is a way of specifying a class of video formats acceptable to the computer industry. It is a hardware specification to the same degree that an ACATS video format is a hardware specification. That is, it puts requirements on the hardware but does not specify the implementation that satisfies them. Following are some example uses of the Reference Decoder.
There are arguments for the choice to transmit 640x480 pixels: It is consistent with today's capabilities. Progressively-scanned 640x480 systems have already been demonstrated. The costs and demands for this resolution are well known. The aspect ratio of 4:3 is the current one for which CRT (cathode-ray tube) technology is already well-suited and cost effective. Computer displays are as comfortable with this format as are TV displays. The cost of a converter for one of today's analog sets to receive the new digital signal is minimal for this resolution. CICATS believes this format to be the one most likely to appeal pricewise to consumers now, encouraging them to convert to the new digital standard and thereby release the old analog spectrum.
There are, however, good arguments for other choices within the set allowed by the Reference Decoder. Consider, for example, 1024 by 512 pixels, the maximum allowed by the Reference Decoder (base layer only). The vertical resolution would be higher than today's analog TV because 512 is greater than 480, but more importantly because progressive 512 lines is equivalent to about 780 interlaced lines. And the horizontal resolution (on a TV set with aspect ratio 2:1) would be very much higher than today's analog TV, as well as spread out much wider. 2:1 aspect is considered desirable by Hollywood. Enhancement later (with an enhancement layer added to the base layer) to a nominal 2048x1024 resolution would be straightforward.
But there are serious counterarguments against the 1024 by 512 choice. The most serious is that displays for such an aspect ratio have not been demonstrated. Even if they were, they would probably be exorbitantly expensive at this time. So the same argument we levy against the expensive ACATS array of formats holds against this format too: Only the wealthy would be able to afford it at first. Sets that displayed in the old 4:3 aspect would either have to letter-box the wide aspect ratio, or pan-and-scan in it, or both (MPEG2 supports all of these choices). Both of these are familiar practices in widescreen films broadcast on TV today. All sets would implement the Reference Decoder but only those capable of 2:1 aspect would get full benefit of the signal.
Notice that a format with approximately 16:9 aspect could be chosen within the parameters of the CICATS Reference Decoder. This is one of the ACATS proposed aspect ratios. This aspect ratio has some of the same problems as just discussed for the 2:1 aspect ratio. In particular, sets to display at that ratio are too expensive for the average consumer. It is not an interesting aspect ratio for Hollywood. On the other hand, CRTs of that aspect have been demonstrated. Pan-and-scan or letterboxing would be required for satisfactory display on sets of smaller aspect ratio, as discussed above for the 2:1 ratio.
In any case, the new digital TV sets would implement the Reference Decoder. They would need 4-5 times less memory than the equivalent ACATS-compliant set, so they would be optimally cost effective for consumers - and without the loss in quality implied by the conversions required between the 18 ACATS formats at the receiver. The CICATS proposal would be cheaper and better. Over time the cost differential between the two types of sets would diminish (with Moore's Law again) but in the meantime, US consumers would have paid many billions of dollars for unnecessary conversion and suffered unnecessary loss of quality as well.
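The memory comparison can be approximated by counting pixels per stored frame. The sketch below uses the "approximately 2000x1000" figure quoted earlier for the largest ACATS formats and the 1024x512 Reference Decoder memory; it assumes the same number of bits per pixel on both sides, which is an assumption made here for illustration.

```python
def frame_pixels(width, height):
    """Number of pixels in one stored frame."""
    return width * height


ACATS_MAX = (2000, 1000)   # "approximately 2000x1000", from the text
CICATS_MAX = (1024, 512)   # Reference Decoder memory, from the text

ratio = frame_pixels(*ACATS_MAX) / frame_pixels(*CICATS_MAX)
print(f"ACATS needs about {ratio:.1f}x the pixels per stored frame")
```

A real decoder stores more than one frame, and an ACATS receiver would also need buffers for format conversion, so this per-frame ratio is only a rough indication of the 4-5 times figure.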
The full CICATS proposal has both a spatial and a temporal layering system:
Spatial Resolution: A spatial base layer with horizontal resolution determined by the CICATS requirement for square pixel spacing. This is defined by the Reference Decoder above. Future enhancement layers could take the base layer with maximum possible resolution of 1024x512 up to 1536x768 or higher.
Temporal Resolution: A temporal base layer supporting 24, 36, and 72 Hz frame rates.
The notion of a temporal base layer is a new one to these FCC-related discussions and needs some explanation. For example, it might appear that we are proposing three video formats here, one each at 24, 36, and 72 Hz. This is not the case and here's why:
In the case of three separate formats, the broadcaster selects one of the three to transmit and the receiver detects which one is sent and converts, if necessary, to its local frame rate. Of all the conversions implied by the ACATS proposal, frame rate conversions are the most difficult to do with quality at a low price.
In the case of a temporal base layer, all sets would implement the base layer (by definition of a base layer), hence all three frame rates would be implemented. Regardless of transmitted frame rate, a set receiving the proposed temporal base layer signal would operate at 72 Hz frame rate. It would select and decode the appropriate MPEG2 frames (I, P, and B frames in MPEG2 terminology) to form the 72 Hz display. The base layer technology makes this simple to do. It is a selection process rather than a conversion process.
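The reason this is selection rather than conversion is that 24 and 36 divide 72 evenly. The sketch below is a deliberate simplification and not the CICATS mechanism itself: it ignores MPEG2 frame types and only shows which source frame would be on screen at each 72 Hz refresh. The function name is hypothetical.

```python
DISPLAY_HZ = 72  # display rate of a set implementing the temporal base layer


def display_schedule(source_hz, n_source_frames):
    """Index of the source frame shown at each 72 Hz display refresh."""
    if DISPLAY_HZ % source_hz != 0:
        # No whole number of refreshes per frame: real conversion is needed.
        raise ValueError(f"{source_hz} Hz does not divide {DISPLAY_HZ} Hz evenly")
    repeats = DISPLAY_HZ // source_hz
    return [i for i in range(n_source_frames) for _ in range(repeats)]


print(display_schedule(24, 2))  # each film frame shown 3 times
print(display_schedule(36, 2))  # each frame shown 2 times
print(display_schedule(72, 2))  # each frame shown once
```

Calling `display_schedule(60, 2)` raises `ValueError`, because 72 is not a whole multiple of 60; that is the case where frames would have to be synthesized rather than selected.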
It is important to note that the CICATS temporal base layer does not support 30 Hz or 60 Hz. 30 Hz is a relic of interlaced scanning so is not needed in the progressively scanned future. The PC market has determined that 60 Hz is insufficient so it is not included in the CICATS temporal base layer.
But CICATS, again, understands that the FCC might have to support 60 Hz under pressure from the old analog world. In this case, we propose an alternative to the temporal base layer:
Temporal Resolution (Alternative): 24, 60, and 72 Hz frame rates. Not a temporal base layer.
This alternative does extend the CICATS proposal to three video formats, but the three differ only in frame rate. Although we offer this alternative, we want the FCC to understand that it implies conversion hardware and more memory in the receiver, hence more cost to the consumer. Furthermore, the conversions between 60 and 72 Hz are particularly prone to poor quality. Nevertheless, 60 Hz video displayed on 60 Hz sets and 72 Hz video displayed on 72 Hz sets would suffer no quality loss.
ACATS: Advisory Committee on Advanced Television Service, to the FCC.
Aspect ratio: The ratio of the width of a picture to its height. Standard (current) TV has an aspect ratio of 4:3 ("4 to 3") = 1.333. The ACATS proposal mixes 4:3 with 16:9 aspect ratios. 16:9 = 1.777 is a strange aspect ratio that is wider than current TV but is not a Hollywood compatible aspect ratio. Hollywood films are most often in 1.85 ("academy") aspect or in 2.37 ("scope") for very wide-screen films. Hollywood would apparently be content with a 2:1 aspect ratio, but not with 16:9.
Base Layer: See layering.
CICATS: Computer Industry Coalition on Advanced Television Service, representing 10 leading personal computer companies (hardware and software): Apple, Compaq, Cray, Dell, Hewlett Packard, Intel, Microsoft, Novell, Oracle, Silicon Graphics, and Tandem.
FCC: The Federal Communications Commission.
Frame rate: The number of video pictures displayed per second. The goal is to seem continuous. Film's frame rate is 24 frames per second, where each frame is repeated 2 (or sometimes 3) times to give the equivalent frame rate of 48 frames per second (or sometimes 72). The word Hertz is often used as shorthand for "frames per second". The highest ACATS frame rate is 60 Hz ("60 Hertz" or 60 frames per second), whereas computer consumers rejected 60 Hz years ago in favor of 70 or more frames per second to avoid objectionable flicker. (Looking at a TV or PC screen out of one's peripheral vision reveals the flicker.) 72 Hz is an attractive frame rate because it is computer friendly and an easy multiple of film rate (film is a major source of all TV content).
Hertz (Hz): One Hertz is short for one cycle per second, or one frame per second. Frequencies were formerly expressed in cycles per second - for example, a radio station might broadcast at 98.1 on the radio dial, meaning at 98.1 megacycles per second. Today this would be expressed as 98.1 megaHz, in honor of electromagnetic pioneer Heinrich Hertz. In a related usage, the "width" of a TV channel is measured in Hz - 6 megaHz per channel.
Interlace: Current analog TV scans each frame by first drawing every other horizontal scanline across the face of the TV set, then starting over at the top and drawing all the skipped in-between scanlines. The first set, called a "field", is said to be interlaced with the second set, or second field. Interlaced scanning is opposed to progressive scanning.
Layering: A layered system is a logical system of related frame sizes, rates, and resolutions (as opposed to a grab-bag of unrelated formats as in the ACATS proposal). A layered system has a "base layer" that must be honored plus "enhancement layers" that may be added to the base layer to make it higher resolution. A good example of a layering scheme is that used by Kodak's PhotoCD. Snapshots are taken to a photo house from which they are returned in digital form on a CD, Kodak's PhotoCD. Each of the snapshots will appear on the CD in several resolutions. The base resolution is 768x512 (approximately video resolution), but the CD also contains enhancement layers that are added to the base resolution to make it into 1536x1024 pixels or 3072x2048 pixels. So one CD contains at least these three resolutions. Similarly, a layered TV channel could contain several resolutions simultaneously so long as they were layered logically. ACATS misuses the term "layering" to simply mean a TV picture is layered atop a string of digital bits, which is layered atop a radio frequency modulation technique. Theirs is a much more generic use of the term than the CICATS (or Kodak) use.
Moore's Law: The "law" that says computers get twice as fast every 18 months. In general, anything digital gets twice as good every 1.5 years. For example, memory doubles or the processor gets twice as fast - for a fixed cost - every 1.5 years. To understand how stunningly fast this is, let's restate it as 10 times faster every 5 years (that's the same as 2 times faster every 1.5 years). During the 8 years that ACATS has been working on its proposal, personal computers have increased in speed and memory by a factor of 50 to 100 times (at the same cost). At the beginning of the ACATS process, PCs weren't powerful enough for TV, but now they are. There is good reason to believe that Moore's Law will continue to operate for another 15 years - thus for another improvement of 1000 times over what we have today! This incredible digital revolution is what makes CICATS encourage the FCC not to freeze any digital standards now that it could better make 5-10 years from now. We are simply incapable of predicting what an "order of magnitude" (10x) change means conceptually. Any standards made now will look foolish 5 years from now, so only the minimum should be done now. (Just think: two years ago there was no Netscape, and Microsoft was not an Internet company. Things change very fast in the digital world. The old analog modes of thinking do not work.)
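The restatement of "2 times every 1.5 years" as "10 times every 5 years" is just compound doubling, which can be sketched in a few lines. The function name is hypothetical; the 1.5-year doubling period is the one stated above.

```python
def moore_factor(years, doubling_period=1.5):
    """Improvement factor after `years`, doubling every `doubling_period` years."""
    return 2 ** (years / doubling_period)


for years in (1.5, 5, 15):
    print(f"{years} years -> about {moore_factor(years):.0f}x")
```

Five years is a little over three doublings, which gives roughly 10x, and fifteen years is exactly ten doublings, or 1024x, which is the "1000 times" quoted above.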
Pixel: Short for picture element. In the digital world, a picture is represented by an array of tiny samples or picture elements - so many per line and so many lines. (Pixels, by the way, are single points, not little squares or rectangles as popularly described. We are careful to say square pixel spacing, not "square pixels".)
Progressive: Current PC screens draw each scanline in order from top to bottom. They are said to be "progressively scanned". This is considered opposite to interlaced scanning.
Reference Decoder: CICATS proposes a layered video format scheme by way of a reference decoder, which is a specification of the decoder of the new digital TV signal - separate from the display of that signal. This concept permits a degree of freedom not formerly present in these FCC-related discussions. For example, instead of specifying a specific horizontal resolution, which depends highly on the capabilities of a particular display, the CICATS Reference Decoder says only that the format must have 480 progressive lines (nominally) and square pixel spacing. So, if the display device has electronics and width enough to handle a 2:1 aspect ratio, then the Reference Decoder will honor a width of 960 pixels (assuming the industry has agreed to broadcast this signal). If the display device can only handle 4:3 aspect ratio (as the affordable ones today do), then the Reference Decoder would dictate a horizontal resolution of 640 pixels. The same decoder circuit, at the same parts cost, would handle either situation. The Reference Decoder is not hardware. It is a way of specifying a class of acceptable video formats rather than a single video format. Within this class there is no conversion required, but wide signals (wide aspect ratio) would have to be either letterboxed or pan-and-scanned to a display with smaller aspect ratio.
Resolution: The number of pixels per line and the number of lines - equivalently, the number of horizontal pixels and the number of vertical pixels. Thus a resolution might be given as 2048x1024 pixels, meaning 1024 scanlines with 2048 pixels on each scanline - equivalently, a rectangular array of 2048 times 1024 pixels (about 2 million pixels). Standard (current) TV has 480 lines vertical resolution and about 700 horizontal. But it is interlaced, which brings down its effective resolution to about 320 scanlines. The ACATS "high-definition" format of 1080 vertical lines is really about 700 lines since the 1080 lines are interlaced at the 60 Hz frame rate.
Spectrum: Simply all the channels used for TV, cable TV, AM radio, FM radio, ham radio, and so forth - treated as a single entity. The full electromagnetic spectrum is vast, including X rays, heat, and even ordinary light. The FCC has dominion over only the "radio frequency" portion - those uses listed in the first sentence of this paragraph. That is, the radio frequency spectrum is a subset of the full electromagnetic spectrum. One TV channel, whether old analog or new digital, is a slice of the radio frequency spectrum.
Square pixel spacing: This just means that the horizontal spacing between pixels is the same as the vertical spacing between pixels (or between scanlines). Although the conversion from non-square pixel spacing of many of the ACATS formats to square pixel spacing is straightforward, there are over 200 million PC operating systems in existence that assume square pixel spacing and do not have the software for doing the conversion.