12-06-2014, 03:42 PM
The Internet Multimedia Conferencing Architecture
Introduction
This article provides an overview of multimedia conferencing on the Internet. The protocols mentioned
are all specified elsewhere as internet-drafts or RFCs. Each RFC gives details of the protocol itself,
how it works and what it does. This document attempts to provide the reader with an overview of how
the components fit together and some of the assumptions made.
The term “conferencing” is used in two different ways: firstly, to refer to bulletin boards and mail
list style asynchronous exchanges of messages between multiple users; secondly, to refer to synchronous
or so-called “real-time” conferencing, including audio, video, shared whiteboards and other applications.
This document is about the architecture for this latter application, in the Internet. There are other
infrastructures for conferencing in the world: POTS (Plain Old Telephone System) networks often provide
voice conferencing and phone-bridges, while the ISDN provides H.320[1] for small, strictly organised
video-telephony conferencing.
The architecture that has evolved in the Internet is far more general as well as being scalable to
very large groups, and permits the open introduction of new media and new applications as they are
devised.
Multicast Service Model
The IP multicast service model is as follows:
• Senders send datagrams to a multicast group address.
• Receivers express an interest in (join) certain multicast groups.
• Multicast routers conspire to deliver multicast group addressed datagrams from the senders to
  the receivers.
The important factor here is that senders do not have to know who the receivers are in order to be
able to send to them. In fact, in most situations, no single point in the network needs to know who
all the receivers are, and it is this that makes IP multicast scalable to very large groups. In addition,
receivers do not need to know who the senders are in order to be able to receive traffic from them, and
this solves many conference setup and resource location problems without needing explicit machinery.
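The service model above can be sketched with ordinary UDP sockets. This is a minimal illustration, not part of the original architecture description: the group address, port and function names are hypothetical, and only standard socket options are used. Note that the receiver joins a group without naming any sender, and the sender transmits without naming any receiver.

```python
import socket

GROUP = "239.1.2.3"   # hypothetical group address, chosen for illustration
PORT = 5004           # hypothetical port


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure used with IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Join the group; the receiver never learns or names the senders."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send_to_group(payload: bytes, group: str = GROUP, port: int = PORT,
                  ttl: int = 1) -> None:
    """Send one datagram to the group; the sender never names a receiver."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                       socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
```

A receiver would call open_receiver() and then recvfrom() on the returned socket; any host can call send_to_group() without first joining the group.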
There are many multicast routing protocols [2],[3],[4],[5] but all of them satisfy the above service
model. They differ in their mechanisms and in how they scale with number of senders and groups.
On a LAN, group membership is expressed by IGMP[6]. IGMP version 3 allows receivers to express
an interest in only receiving some of the senders to a particular multicast group. Earlier versions
of IGMP only allow a receiver to request to receive all the sources sending to a multicast group.
Address Allocation
How does an application choose a multicast address to use?
In the absence of any other information, we can bootstrap a multicast application by using well-known
multicast addresses. Routing (unicast and multicast) and group membership protocols[6] can
do just that. However, this is not the best way to manage applications of which there is more than one
instance at any one time.
For these, we need a mechanism for allocating group addresses dynamically, and a directory service
which can hold these allocations together with some key (session information for example - see
later), so that users can look up the address associated with the application. The address allocation and
directory functions should be distributed to scale well.
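As a rough sketch of the two functions described here, the following pairs a random address allocator with a lookup table keyed by session. The address pool, the function names and the use of a single in-memory dictionary are all assumptions made for illustration; a real directory would be distributed, as the text requires.

```python
import ipaddress
import random

# Hypothetical pool of dynamically assignable group addresses.
POOL = ipaddress.ip_network("239.192.0.0/24")


def allocate_address(directory: dict, session_key: str, rng=random) -> str:
    """Pick a free group address at random and record it under session_key."""
    in_use = set(directory.values())
    free = [str(addr) for addr in POOL if str(addr) not in in_use]
    if not free:
        raise RuntimeError("address pool exhausted")
    address = rng.choice(free)
    directory[session_key] = address
    return address


def lookup(directory: dict, session_key: str):
    """Return the group address recorded for a session, or None."""
    return directory.get(session_key)
```

Random choice from the unused part of the pool keeps allocation simple, but it only avoids clashes among the allocations this particular directory instance has seen.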
Internet Service Models
Traditionally the Internet has provided best-effort delivery of datagram traffic from senders to receivers.
No guarantees are made regarding when or if a datagram will be delivered to a receiver; however, datagrams
are normally only dropped when a router exceeds a queue size limit due to congestion. The
best-effort internet service model does not assume FIFO queuing, although many routers have implemented
this.
With best-effort service, if a link is not congested, queues will not build at routers, datagrams will
not be discarded in routers, and delays will consist of serialisation delays at each hop plus propagation
delays. With sufficiently fast link speeds, serialisation delays are insignificant compared to propagation
delays.
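The uncongested case can be worked through numerically. The sketch below is illustrative only: the function names are hypothetical, and the default signal speed of 2.0e8 m/s is an assumed round figure for propagation in cable or fibre.

```python
def serialisation_delay(packet_bytes: int, link_bps: float) -> float:
    """Seconds needed to clock one packet onto a link of the given rate."""
    return packet_bytes * 8 / link_bps


def propagation_delay(distance_m: float, signal_speed_mps: float = 2.0e8) -> float:
    """Seconds for a signal to travel the given distance (assumed speed)."""
    return distance_m / signal_speed_mps


def uncongested_delay(packet_bytes: int, link_rates_bps, distance_m: float) -> float:
    """End-to-end delay with empty queues: per-hop serialisation plus propagation."""
    serialisation = sum(serialisation_delay(packet_bytes, rate)
                        for rate in link_rates_bps)
    return serialisation + propagation_delay(distance_m)
```

For a 1500-byte packet, serialisation takes 187.5 ms on a 64 kbit/s link but only 12 microseconds on a 1 Gbit/s link, whereas propagation over 4000 km takes about 20 ms at the assumed speed regardless of link rate.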
If a link is congested, with best-effort service queuing delays will start to influence end-to-end delays,
and packets will start to be lost as queue size limits are exceeded.
Non-Best-Effort Service
Real-time internet traffic is defined as datagrams that are delay sensitive. It could be argued that all
datagrams are delay sensitive to some extent, but for these purposes we refer only to datagrams where
exceeding an end-to-end delay bound of a few hundred milliseconds renders the datagrams useless
for the purpose they were intended. For the purposes of this definition, TCP traffic is normally not
considered to be real-time traffic, although there may be exceptions to this rule.
On congested links, best-effort service queuing delays will adversely affect real-time traffic. This
does not mean that best-effort service cannot support real-time traffic - merely that congested best-effort
links seriously degrade the service provided. For such congested links, a better-than-best-effort
service is desirable.
To achieve this, the service model of the routers can be modified. At a minimum, FIFO queuing can
be replaced by packet forwarding strategies that discriminate different “flows” of traffic. The idea of
a flow is very general. A flow might consist of “all marketing site web traffic”, or “all fileserver traffic
to and from teller machines” or “all traffic from the CEO’s laptop wherever it is”. On the other hand, a
flow might consist of a particular sequence of packets from an application in a particular machine to a
peer application in another particular machine between specific times of a specific day.
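One way to picture how general a flow can be is as a rule with optional fields, where an unset field matches anything. The sketch below is an assumption about how a router might classify packets, not a description of any particular router; the class names, field names and addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str
    dst_port: int


@dataclass(frozen=True)
class FlowRule:
    name: str
    src: Optional[str] = None       # None means "any"
    dst: Optional[str] = None
    proto: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        pairs = ((self.src, pkt.src), (self.dst, pkt.dst),
                 (self.proto, pkt.proto), (self.dst_port, pkt.dst_port))
        return all(want is None or want == got for want, got in pairs)


def classify(pkt: Packet, rules, default: str = "best-effort") -> str:
    """Return the name of the first matching flow, else the default class."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.name
    return default
```

A coarse rule such as FlowRule("ceo-laptop", src="10.0.0.7") captures all traffic from one host, while a rule that sets every field captures a single application-to-application stream.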
Receiver Adaptation
Best-effort traffic is delayed by queues in routers between the sender and the receivers. Even reserved
priority traffic may see small transient queues in routers, and so packets comprising a flow will be
delayed for different times. Such delay variance is known as jitter.
Real-time applications such as audio and video need to be able to buffer real-time data at the receiver
for sufficient time to remove the jitter added by the network and recover the original timing
relationships between the media data. In order to know how long to buffer for, each packet must carry
a timestamp which gives the time at the sender when the data was captured. Note that for audio and
video data timing recovery, it is not necessary to know the absolute time that the data was captured at
the sender, only the time relative to the other data packets.
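The timing recovery described above can be sketched as follows. This is a simplified illustration with a fixed playout delay and hypothetical function names; times are integer milliseconds, and only differences between sender timestamps are used, never the absolute sender clock.

```python
def playout_times(packets, playout_delay_ms: int):
    """Schedule playout so that packet spacing matches the sender's spacing.

    packets: list of (sender_timestamp_ms, arrival_time_ms) pairs, in order.
    The first packet is played playout_delay_ms after it arrives; every
    later packet is offset from it by the difference in sender timestamps.
    """
    packets = list(packets)
    if not packets:
        return []
    first_ts, first_arrival = packets[0]
    base = first_arrival + playout_delay_ms
    return [base + (ts - first_ts) for ts, _arrival in packets]


def late_packets(packets, playout_delay_ms: int):
    """Return the indices of packets that arrive after their playout time."""
    schedule = playout_times(packets, playout_delay_ms)
    return [i for i, ((_ts, arrival), due) in enumerate(zip(packets, schedule))
            if arrival > due]
```

With packets captured 20 ms apart and a 50 ms playout delay, the scheduled playout times are also 20 ms apart whatever jitter the network added; a packet delayed by more than the buffer allows is reported as late.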
RTP
The transport protocol for real-time flows is RTP[7]. This provides a standard format packet header
which gives media specific timestamp data, as well as payload format information and sequence numbering
amongst other things. RTP is normally carried using UDP. It does not provide or require any
connection setup, nor does it provide any enhanced reliability over UDP. For RTP to provide a useful
media flow, there must be sufficient capacity in the relevant traffic class to accommodate the traffic. How
this capacity is ensured is independent of RTP.
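To make the header contents concrete, the sketch below packs and unpacks the 12-byte fixed RTP header, assuming the field layout of the published RTP specification (version 2, no padding, no extension, no contributing sources). The function names are hypothetical and the sketch ignores header extensions and CSRC lists.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type: int, sequence: int, timestamp: int,
                    ssrc: int, marker: bool = False) -> bytes:
    """Pack the fixed RTP header: flags, payload type, sequence, timestamp, SSRC."""
    byte0 = RTP_VERSION << 6                      # P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Decode the first 12 bytes of an RTP packet into named fields."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The resulting bytes would simply be prepended to the media payload and sent in a UDP datagram; nothing in the header sets up a connection or requests retransmission.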
Conference Membership and Reception Feedback
IP multicast allows sources to send to a multicast group without being a receiver of that group. However,
for many conferencing purposes it is useful to know who is listening to the conference, and whether
the media flows are reaching receivers properly. Accurately performing both these tasks restricts the
scaling of the conference. IP multicast means that no-one knows the precise membership of a conference
at a specific time, and this information cannot be discovered, as to try to do so would cause an
implosion of messages, many of which would be lost. Instead, RTCP provides approximate membership
information through periodic multicast of session messages which, in addition to information
about the recipient, also give information about the reception quality at that receiver. RTCP session
messages are restricted in rate, so that as a conference grows, the rate of session messages remains
constant, and each receiver reports less often.
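The scaling rule can be illustrated with a simplified calculation. This is a sketch under stated assumptions, not the full RTCP timing algorithm: it ignores randomisation and the split between senders and receivers, and the 5-second minimum interval and the function names are assumptions chosen for illustration.

```python
def report_interval(members: int, avg_report_bytes: int,
                    control_bps: float, minimum_s: float = 5.0) -> float:
    """Seconds between reports from one member so that all members
    together use no more than control_bps."""
    interval = members * avg_report_bytes * 8 / control_bps
    return max(interval, minimum_s)


def aggregate_rate(members: int, avg_report_bytes: int, interval_s: float) -> float:
    """Total session-message bit rate produced by all members."""
    return members * avg_report_bytes * 8 / interval_s
```

With 100-byte reports and 3200 bit/s set aside for session messages, 100 members each report every 25 seconds and 200 members every 50 seconds; in both cases the aggregate rate stays at 3200 bit/s.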
Session Directories and Invitation
Recent work in the IETF MMUSIC group has produced specifications for the Session Directory, and
now for a Session Invitation Protocol. Until the advent of these, the Mbone has been somewhat akin
to Citizen Band Radio: groups of users somewhat anarchically tuning in and listening or sending on
dynamically randomly allocated addresses.
To provide some level of coordination, the session directory was designed so that users can coordinate
the allocation of addresses to named sessions, and disseminate information about a session (media
types, start and end times, related information on world wide web, and so on can all be sent out in a
session advertisement). Session advertisements are sent out periodically by daemons in UDP multicast
packets to a well-known multicast address, and are cached by the listeners that users run. Users can
create sessions through a GUI, and they can even browse sessions from a Web browser.
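As an illustration of what an advertisement might carry, the sketch below builds a textual session description and caches it the way a listener could. The text above does not name a description format; this example assumes SDP-style lines, and the function names, addresses, port and time values are hypothetical.

```python
def build_advertisement(name: str, group: str, ttl: int, start: int, stop: int,
                        media, url: str = None, origin_user: str = "-",
                        session_id: int = 1, version: int = 1,
                        origin_host: str = "192.0.2.1") -> str:
    """Format a session description (assumed SDP-style field layout).

    media: list of (kind, port, payload_format) tuples, e.g. ("audio", 49170, 0).
    start and stop are hypothetical time values passed through unchanged.
    """
    lines = [
        "v=0",
        f"o={origin_user} {session_id} {version} IN IP4 {origin_host}",
        f"s={name}",
    ]
    if url:
        lines.append(f"u={url}")          # related information on the web
    lines.append(f"c=IN IP4 {group}/{ttl}")
    lines.append(f"t={start} {stop}")
    for kind, port, fmt in media:
        lines.append(f"m={kind} {port} RTP/AVP {fmt}")
    return "\r\n".join(lines) + "\r\n"


def cache_advertisement(cache: dict, text: str) -> None:
    """Store an advertisement keyed by its origin line, replacing repeats."""
    origin = next(line for line in text.splitlines() if line.startswith("o="))
    cache[origin] = text
```

Because announcements are repeated periodically, keying the cache on the origin line means a repeated announcement overwrites its earlier copy rather than creating a duplicate entry.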