ATSC Finding: Relative Timing of Sound and Vision for Broadcast Operations - IS 191



Doc. IS-191
26 June 2003

ATSC Implementation Subcommittee Finding:
Relative Timing of Sound and Vision for Broadcast Operations

Advanced Television Systems Committee
1750 K Street, N.W.
Suite 1200
Washington, D.C. 20006

The Advanced Television Systems Committee, Inc., is an international, non-profit organization developing voluntary standards for digital television. The ATSC member organizations represent the broadcast, broadcast equipment, motion picture, consumer electronics, computer, cable, satellite, and semiconductor industries.

Specifically, ATSC is working to coordinate television standards among different communications media focusing on digital television, interactive systems, and broadband multimedia communications. ATSC is also developing digital television implementation strategies and presenting educational seminars on the ATSC standards.

ATSC was formed in 1982 by the member organizations of the Joint Committee on InterSociety Coordination (JCIC): the Electronic Industries Association (EIA), the Institute of Electrical and Electronic Engineers (IEEE), the National Association of Broadcasters (NAB), the National Cable Television Association (NCTA), and the Society of Motion Picture and Television Engineers (SMPTE). Currently, there are approximately 160 members representing the broadcast, broadcast equipment, motion picture, consumer electronics, computer, cable, satellite, and semiconductor industries.

ATSC Digital TV Standards include digital high definition television (HDTV), standard definition television (SDTV), data broadcasting, multi-channel surround-sound audio, and satellite direct-to-home broadcasting.


The end-to-end DTV audio-video production, distribution, and broadcast system is a complex array of digital processing, compression, decompression, and storage devices. Each component in the system imposes latency on the audio and/or video signals flowing through it. System design goals often call for the relative audio-video latency through each component to be in the sub-millisecond range. In operation, however, unequal delays can be imposed on the audio and video signals, and these unequal delays compromise audio-video synchronization.

One of the overarching goals of the DTV broadcasting system is to deliver audio and video in proper synchronization to the viewer. Because each digital audio and video component in the chain from production to reception imposes some degree of latency on the signals passing through it, and the delays imposed on the audio and video signals are typically unequal, each component harbors the potential to cause an audio-video synchronization error at its output. The overall audio-video synchronization error is the algebraic sum of the individual synchronization errors encountered in the chain. While a given synchronization error may cause either a positive or negative differential shift in audio/video timing, the video signal is typically subjected to greater delay than the audio signal, and the tendency is therefore toward video lagging behind audio. Thus, there is a requirement to monitor audio-video synchronization at various points within the system and to make corrections, where required, in order to deliver to the viewer audio-video synchronization within the required tolerance.
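The algebraic summation described above can be illustrated with a minimal sketch. The device names and offset values are hypothetical, and the sign convention (positive means audio leads video, negative means audio lags video) is an assumption made for this example only.

```python
# Hypothetical per-device audio/video offsets in milliseconds.
# Assumed sign convention: positive = audio leads video,
# negative = audio lags video.
chain_offsets_ms = {
    "frame synchronizer": 33.0,   # video delayed, so audio leads
    "audio processor": -5.0,      # audio delayed, so audio lags
    "distribution codec": 20.0,   # video delayed more than audio
}


def cumulative_av_offset_ms(offsets):
    """Return the algebraic (signed) sum of the per-device offsets."""
    return sum(offsets)


total_offset_ms = cumulative_av_offset_ms(chain_offsets_ms.values())
print(f"Net audio/video offset: {total_offset_ms:+.1f} ms")
```

Because the sum is signed, errors of opposite polarity partly cancel, while errors of the same polarity accumulate. In this illustration the two video-delaying devices dominate, consistent with the tendency toward video lagging behind audio.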

In addition, there are points within the end-to-end chain that require A/V synchronization to be maintained, such as switching points, monitoring points, and transmission/encoding points.

For the purposes of this discussion, the end-to-end DTV system may be divided into four segments: acquisition and production/post-production (contribution system), release facility and distribution system, local broadcast station, and home receiver. The Implementation Subcommittee (IS) finds that steps must be taken to ensure that the audio and video signals delivered at the output stage of each of the four segments are synchronized within a tight tolerance (see below).

At the production/post-production stage, audio-video synchronization errors can occur in the capture stage, in film-to-video transfer, and in editing. The product may be delivered on videotape or by various electronic means, but whatever the delivery medium, IS finds that steps must be taken to ensure that audio-video synchronization in the delivered product falls within the required tolerance.

The release facility segment contains a number of devices through which the DTV audio and video signals pass. These devices variously impose compression and decompression, processing, and storage on the signals, together with the attendant differential delays. The process of distributing the signals to affiliate stations typically requires compression and decompression steps. IS finds that it is incumbent on the release facility to correct the differential audio/video delays that the signals experience within the plant so that the initial timing relationship is restored to a tight tolerance before the signals reach the distribution encoder. IS also finds that synchronization to a tight tolerance should be maintained in any encode/decode process that is involved in delivering the signals to the affiliate station, so that the tight A/V synchronization can be monitored at switching and other points.

The affiliate station segment contains a number of devices that are similar to those encountered in the release facility segment and that generate the same types of differential audio-video delays, including switching and monitoring points. IS finds that audio-video synchronization should be restored to a tight tolerance before the signals are input to the broadcast station’s ATSC audio and video encoding devices to assure that the presentation time stamps placed on the audio and video access units by the encoder faithfully represent correct synchronization.
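The correction of differential delay within a plant can be sketched as follows. This is an illustration only: the function name and the latency figures are hypothetical, and the sketch assumes that the latency through each path has already been measured and that the correction is made by adding delay to the faster path.

```python
def compensating_delays_ms(video_latency_ms, audio_latency_ms):
    """Return (extra_video_delay_ms, extra_audio_delay_ms).

    Delay is added only to the faster path so that both paths end up
    with the same total latency and the original timing relationship
    is restored.
    """
    difference_ms = video_latency_ms - audio_latency_ms
    if difference_ms >= 0:
        # Video is slower: hold the audio back by the difference.
        return 0.0, difference_ms
    # Audio is slower: hold the video back by the difference.
    return -difference_ms, 0.0


# Hypothetical plant: video path 66 ms, audio path 4 ms.
extra_video_ms, extra_audio_ms = compensating_delays_ms(66.0, 4.0)
print(f"Add {extra_video_ms} ms to video and {extra_audio_ms} ms to audio")
```

Since video is typically the slower path, the correction in practice is most often an added audio delay.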

IS finds that under all operational situations, at the inputs to the DTV encoding devices, the sound program should be tightly synchronized to the video program. The sound program should never lead the video program by more than 15 milliseconds, and should never lag the video program by more than 45 milliseconds.
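The tolerance stated above can be expressed as a simple range check. The following is a minimal sketch; the function name and the sign convention (positive values mean sound leads vision, negative values mean sound lags vision) are assumptions made for illustration.

```python
# Limits taken from the finding above, in milliseconds.
MAX_AUDIO_LEAD_MS = 15.0
MAX_AUDIO_LAG_MS = 45.0


def within_encoder_input_tolerance(audio_lead_ms):
    """Check a measured offset at the DTV encoder input.

    Assumed convention: audio_lead_ms > 0 means the sound leads the
    video; audio_lead_ms < 0 means the sound lags the video.
    """
    return -MAX_AUDIO_LAG_MS <= audio_lead_ms <= MAX_AUDIO_LEAD_MS


for measured_ms in (10.0, 20.0, -40.0, -50.0):
    verdict = "pass" if within_encoder_input_tolerance(measured_ms) else "FAIL"
    print(f"{measured_ms:+6.1f} ms: {verdict}")
```

The window is asymmetric: a 20 ms lead fails while a 40 ms lag passes.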

MPEG-2 models the end-to-end delay from an encoder’s signal input to a decoder’s signal output as constant. This end-to-end delay is the sum of the delays from encoding, encoder buffering, multiplexing, transmission, de-multiplexing, decoder buffering, decoding, and presentation. Presentation time stamps are required in the MPEG bit stream at intervals not exceeding 700 milliseconds. The MPEG System Target Decoder model allows a maximum decoder buffer delay of one second. Audio and video presentation units that represent sound and pictures that are to be presented simultaneously may be separated in time within the transport stream by as much as one second. In order to produce synchronized output, IS finds that the receiver must recover the encoder’s System Time Clock (STC) and use the Presentation Time Stamps (PTS) to present the audio-video content to the viewer with a tolerance of +/-15 milliseconds of the time indicated by PTS.
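The receiver-side tolerance can be illustrated with a short sketch. In MPEG-2 Systems, PTS values are 33-bit counts of a 90 kHz clock, so 15 milliseconds corresponds to 1350 ticks. The function names below are hypothetical, and the sketch assumes that the actual presentation instant is available as a 90 kHz count derived from the recovered STC.

```python
PTS_CLOCK_HZ = 90_000             # PTS resolution in MPEG-2 Systems
PTS_MODULUS = 1 << 33             # PTS is a 33-bit counter that wraps
PRESENTATION_TOLERANCE_MS = 15.0  # from the finding above


def pts_difference_ms(later_ticks, earlier_ticks):
    """Signed difference (later - earlier) in ms, allowing for wrap."""
    delta = (later_ticks - earlier_ticks) % PTS_MODULUS
    if delta >= PTS_MODULUS // 2:
        delta -= PTS_MODULUS      # interpret as a negative difference
    return delta * 1000.0 / PTS_CLOCK_HZ


def presented_within_tolerance(pts_ticks, actual_ticks):
    """True if presentation occurred within +/-15 ms of the PTS.

    actual_ticks is assumed to be the recovered STC, in 90 kHz units,
    at the instant the unit was actually presented.
    """
    error_ms = pts_difference_ms(actual_ticks, pts_ticks)
    return abs(error_ms) <= PRESENTATION_TOLERANCE_MS


print(presented_within_tolerance(900_000, 900_900))  # 10 ms late
print(presented_within_tolerance(900_000, 901_800))  # 20 ms late
```

The wrap handling matters because the 33-bit counter rolls over roughly every 26.5 hours, so a naive subtraction near the rollover would report a very large error.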

Although real aural and visual presentation devices typically have finite and different inherent delays, and may have additional delays imposed by post-processing or output functions, the System Target Decoder models these delays as zero. IS finds that such delays must be corrected before the audio and video signals are presented to the viewer.

The IS has undertaken additional work to develop tolerances for system design. Pending that finding, designers should strive for zero differential offset throughout the system.




Note: Recommendation ITU-R BT.1359-1 (1998) was carefully considered and found inadequate for purposes of audio and video synchronization for DTV broadcasting.