06-10-2016, 03:07 PM
Abstract
Instant messaging (IM) and network chat communication have seen an enormous
rise in popularity over the last several years. However, since many of these systems
are proprietary, little has been described about the network technology behind them.
This analysis helps bridge this gap by providing an overview of the available features,
functions, system architectures, and protocol specifications of the three most popular
network IM protocols: AOL Instant Messenger, Yahoo! Messenger, and Microsoft
Messenger. We describe common features across these systems and highlight distinctions
between them. Where possible, we discuss the advantages and disadvantages of
different technical approaches used in these systems to support different features
and functions. We also briefly discuss ongoing efforts to standardize IM and
chat-based protocols in the IETF and other standards bodies.
Instant messaging (IM) and Internet chat communication
have seen enormous growth over the last several years.
IM is the private network communication between two
users, whereas a chat session is the network communication
between two or more users. Chat sessions can either be
private, where each user is invited to join the session, or public,
where anyone can join the session. There are on the order
of 100 million Internet IM users, where a user is defined as a
unique name on one of the major public IM networks —
AOL Instant Messenger (AIM), Microsoft Messenger (MSN),
or Yahoo! Messenger (YMSG). To date, little has been documented
about the network protocols used by these systems.
The protocols are not standardized, many of them are proprietary,
and they are even seen as a control point in this business
by the companies involved. This is demonstrated by the
repeated attempts of the IM services to lock out users of
other systems, in an attempt to keep their customers private.
However, enough information is available to determine the
broad characteristics of these systems. We have also used
packet tracing of IM traffic to glean further details
about these protocols and systems.
In this article we present an overview of IM protocols as
exemplified by the three popular systems: AIM, MSN, and
YMSG. While each has been designed and implemented separately,
the overall group exhibits similar characteristics with
respect to network and system architecture. For example, all
of the IM protocols allow authenticating with a central server,
engaging in private messages, and conversing in public chat
rooms. In addition, some IM systems support file transfers, Web
cam usage, privacy controls, buddy lists,
voice chat sessions, and other options. We discuss these topics
in more detail in the sections to follow. We analyze the most
recent IM clients available. However, all of the major IM protocols
have undergone significant revisions over the years, and
changes to the protocols occur on a regular basis.
As with all networked applications, IM and chat protocols
have a large potential design space. This survey helps expose some of the
dimensions available to a protocol designer and how existing IM systems
resolved them. Where possible,
we describe advantages and disadvantages of each design
choice, especially when the choice affects security.
Features and Functions
Most IM systems, including the three that we analyze herein,
use a client-server architecture. IM providers typically host a
set of servers that customers log in to and exchange messages
with. A fundamental issue faced by IM service providers, and
thus designers of the protocols, is how the systems will scale
with large numbers of users. Ideally, each provider wants to
have millions of customers logged on to its systems at any given
time. This in turn requires that organizations have a system
architecture that can scale with the number of users. Two
approaches are available here: symmetric and asymmetric. In a
symmetric architecture, each server performs identical functions,
so a client need not distinguish which server it
contacts to engage in an activity. In an asymmetric
approach, each server is dedicated to a particular activity such
as logging in, discovering other users on the network, maintaining
a chat room, or forwarding an instant message.
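The difference between the two approaches can be sketched as a server
selection rule. This is only an illustration: the server names, the role
table, and both function names are hypothetical and are not taken from any of
the three systems.

```python
import random

# Hypothetical server names, used only for illustration.
SYMMETRIC_POOL = ["srv-a", "srv-b", "srv-c"]

# Hypothetical mapping from activity to the servers dedicated to it.
ASYMMETRIC_ROLES = {
    "login": ["login-1"],
    "user_search": ["search-1", "search-2"],
    "chat_room": ["chat-1", "chat-2"],
    "instant_message": ["im-1"],
}


def pick_symmetric(activity, rng=random):
    """Every server performs every function, so the activity is ignored."""
    return rng.choice(SYMMETRIC_POOL)


def pick_asymmetric(activity, rng=random):
    """Only servers dedicated to the activity are eligible."""
    return rng.choice(ASYMMETRIC_ROLES[activity])
```

In the symmetric case the client needs no knowledge of server roles; in the
asymmetric case something (here a static table) must tell it where each
activity is handled.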
The client-server architecture allows IM service providers
to keep some degree of control over their users. On the positive
side, it helps overcome some of the technical issues associated
with traversing the firewalls that the clients are often
behind. On the negative side, since both control and data
paths go through the central servers, scaling the service to
millions of users is difficult. The scalability issue is particularly
difficult for voice chat sessions. As IM services are beginning
to support voice-chat communications, peer-to-peer data
paths are being used.
AIM uses a client-server architecture for normal operations
but uses a peer-to-peer approach for voice-chat sessions
where the initiator talks directly to the recipient after coordinating
through the system. Two clients thus communicate directly, without using a
chat room, using a proprietary voice
protocol. YMSG also uses a client-server architecture for normal
operations as well as voice-chat service. YMSG voice traffic
is routed through a centralized voice-chat server. Clients
first contact a setup server “vc.yahoo.com” which then redirects
the client to the voice-chat hosting server. One benefit of
the YMSG centralized voice server approach is that it can
support multiple users within the same voice-chat session and
each user can specify their own voice specification with the
central voice server based on their network speed. MSN uses
a client-server architecture for normal operations and peer-to-peer
connections for voice-chat communication. MSN voice-chat sessions
are also limited to two users.
All three services provide a range of administrative and
management functions. Most IM systems have mechanisms
for maintaining lists of friends (and even enemies). These are
typically called “buddy lists,” “allow lists,” and “block lists.”
These lists are maintained as persistent state on the server,
which the clients synchronize with when they log in. The lists
are used for several purposes. Buddy lists identify people whose
presence a user wishes to monitor (for example, to be
notified when they log in). Block lists identify people that a
user wishes to be isolated from, so that the user is not bothered
or harassed by those people. Block lists are a form of
blacklisting; some systems have the complementary feature of
a whitelist called allow lists, which specify that only people on
the list may communicate with the user. AIM, YMSG, and
MSN all have buddy lists and block lists. AIM and MSN also
have allow lists. MSN even has “reverse forward lists,” which
inform you of those users that have you on their forward
(allow) lists. AIM has an additional feature that specifies a
granularity of blocking, called a warning. Warnings are sent in
response to received messages that the client finds unpleasant
or inappropriate. Recipients of warning messages are penalized
by having their sending rate lowered. Warning levels
degrade slowly over time.
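How a server might consult these lists can be sketched as follows. This is a
minimal model under stated assumptions: the class, its field names, and the
allow_only switch are hypothetical, and the actual services may combine the
lists differently.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyLists:
    """Per-user lists kept as persistent state on the server."""
    buddies: set = field(default_factory=set)
    blocked: set = field(default_factory=set)
    allowed: set = field(default_factory=set)
    allow_only: bool = False  # assumed switch: enforce the allow list


def may_deliver(lists, sender):
    """Decide whether a message from sender may reach the list owner."""
    if sender in lists.blocked:
        return False
    if lists.allow_only:
        return sender in lists.allowed
    return True


def presence_watchers(all_lists, user):
    """Names of users who have `user` on their buddy list."""
    return sorted(name for name, lists in all_lists.items()
                  if user in lists.buddies)
```

The block list is checked first, so a blocked sender is rejected even if the
same name also appears on the allow list.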
A usability feature that some IM systems provide is meta-messages
that indicate that the other user in an IM session is
typing. This improves interactivity, allowing the user to realize
that the other party is in the process of composing a message
and potentially hold off on their own typing. The “typing”
messages are consequently a message type in the IM protocol.
AIM, YMSG, and MSN have such message types. AIM even
has three granularities: typing, not typing, and typed but
erased. One option YMSG provides that the others do not is
the ability to send IMs to users that are not currently logged
on to the system. The system saves the messages on persistent
storage and then delivers them to the recipient when that person
logs on to the service.
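The store-and-deliver behavior described for offline recipients can be
sketched as below. The class and method names are hypothetical, and an
in-memory dictionary stands in for the persistent storage a real service
would use.

```python
from collections import defaultdict, deque


class OfflineStore:
    """Holds messages for users who are not logged on."""

    def __init__(self):
        self.online = set()
        self.pending = defaultdict(deque)  # stand-in for persistent storage

    def send(self, sender, recipient, text):
        """Return "delivered" if the recipient is online, else "stored"."""
        if recipient in self.online:
            return "delivered"
        self.pending[recipient].append((sender, text))
        return "stored"

    def log_on(self, user):
        """Mark the user online and hand over any queued messages in order."""
        self.online.add(user)
        return list(self.pending.pop(user, ()))
```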
An interesting feature offered by AIM is the ability to
engage in secure communications by encrypting the IM session.
Clients can obtain public keys from AOL, as well as the
corresponding certificates to verify them. Secure instant messages
are exchanged using SSL and the two peers’ public keys.
Secure chat rooms are created using a shared 256-bit AES
secret key chosen by the chat room creator; invitations to the
chat room include the secret key. YMSG and MSN do not
have any similar capability. Peer-to-peer text communication
is also offered by some systems using direct TCP connections
between clients, sometimes called “side chats.” AIM and
YMSG have this feature, but MSN does not.
System Architecture
All three commercial systems use server clusters for scalability.
AIM and MSN take the asymmetric approach. AIM defines
several types of servers: login, BOS (basic OSCAR services),
icon, user search, chat room setup, and chat room hosting.
MSN defines three types: dispatch, notification, and switchboard.
We describe how these servers are used in more detail
below.
In contrast, YMSG takes the symmetric approach. Clients
need only contact one type of server and then route all kinds
of activities through that particular server. For example,
YMSG connects to a random server in the
cs#.msg.dcn.yahoo.com domain, where # is a two-digit
decimal number. All subsequent communication is routed
through that server.
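The random server choice can be sketched as follows. The text does not state
which two-digit values are in use, so drawing uniformly from 00 to 99 is an
assumption, as is the function name.

```python
import random


def pick_ymsg_host(rng=random):
    """Build a host name of the form cs#.msg.dcn.yahoo.com."""
    # Assumption: any two-digit value from 00 to 99 is acceptable.
    number = rng.randrange(100)
    return f"cs{number:02d}.msg.dcn.yahoo.com"
```

Because the architecture is symmetric, the client can keep whatever host this
returns for the rest of the session.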
Session Distribution
We now examine in detail how the different systems distribute
sessions across the servers in response to different actions.
The AIM system architecture is depicted in Fig. 1. In AIM,
after the client logs in with the main authentication server
(step 1 in Fig. 1), the client is directed to a BOS server. The
client opens a single TCP connection to the BOS server (step
2), which is effectively the control channel. Most subsequent
communication occurs over this connection, such as basic
instant messages. Persistent connections are also made to the
email server (step 3) and the user interest server (step 4). New
services (checking email status, looking up a user, etc.) require
sending a service request to the BOS server, which replies with
a new IP address and TCP port number to contact for that
particular service. A new connection is then made to that server.
The exception is when a user wishes to join or create a chat
room session. In this case, the client first contacts the BOS
server to get access to the chat room setup server (step 5),
which grants permission to a chat room. The credentials from
the chat room setup server are then presented to the BOS
server (step 6), which then points the client to a particular chat
room server (step 7). Each chat room session is maintained
using a separate TCP connection. The connection to the chat
room setup server persists until several minutes after all chat
room sessions are ended. The BOS server can force a client to
switch to another BOS server through a migration message.
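The redirection pattern described above can be sketched as a small in-memory
model. It is not the OSCAR wire protocol: the class names, the credential
format, the addresses (taken from the 192.0.2.0/24 documentation range), and
the port number are all hypothetical.

```python
class ChatSetupServer:
    """Grants permission to a chat room (step 5)."""

    def grant(self, room):
        # Hypothetical credential format, for illustration only.
        return f"ticket:{room}"


class BosServer:
    """Control-channel server that hands out addresses of other servers."""

    def __init__(self):
        # Hypothetical addresses and port.
        self.services = {
            "email_status": ("192.0.2.10", 5000),
            "user_search": ("192.0.2.11", 5000),
        }
        self.chat_setup = ChatSetupServer()
        self.chat_room_server = ("192.0.2.20", 5000)

    def request_service(self, name):
        """Reply with the IP address and TCP port to contact for a service."""
        return self.services[name]

    def redeem(self, credentials):
        """Accept chat room credentials and point to a chat room server."""
        if not credentials.startswith("ticket:"):
            raise PermissionError("unrecognized credentials")
        return self.chat_room_server


def join_chat_room(bos, room):
    """Follow steps 5 to 7: setup server, back to BOS, then chat server."""
    credentials = bos.chat_setup.grant(room)  # step 5
    return bos.redeem(credentials)            # steps 6 and 7
```

In this model the client would open a new TCP connection to whatever address
request_service or join_chat_room returns, while the BOS connection stays
open as the control channel.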
In 1998, AOL purchased Mirabilis Ltd., the creator of the
ICQ instant-messaging software, and converted the AIM network
to use a version of the ICQ OSCAR protocol. The name OSCAR,
which stands for Open System for Communication in Realtime,
is somewhat misleading, since AOL has never published
the specifications of the protocol. There are some differences
between features supported by ICQ and AIM but overall the
underlying protocol is the same.
The MSN system architecture is shown in Fig. 2. MSN also
has an asymmetric architecture, but with only three types of
servers: dispatch, notification, and switchboard. A client initially
contacts the well-known dispatch server (step 1 in Fig. 2)
if it does not know of any notification servers. The dispatch
server then redirects the client to a notification server. The
client then opens a single connection to the notification server
(step 2) and maintains this connection as long as the client is
logged into the system. This is the control channel in the
MSN architecture. The notification server maintains the presence
of users in the system, and points the client to individual
switchboard servers when a new instant message or chat session
is created (step 4); step 3 will be discussed in the next
subsection. The switchboard server is used both for chat sessions
and instant messages to other clients; this differs from
the other services in that MSN treats instant messages and
private chat rooms identically. Instant messages are actually
chat rooms set up between two users where additional users
can be invited to the chat room. The TCP connection to the
switchboard is open for the lifetime of the chat or IM communication to the
other client. The switchboard server also handles
invitations for file transfers, video, and voice. While MSN
does not have an explicit migration mechanism, the notification
server can close the client connection, forcing the client
to start over.
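The three server roles, and the idea that an instant message is simply a
two-party chat room, can be sketched as below. This is an in-memory model and
not the MSN protocol: the class and method names are hypothetical, and the
rule the dispatch server uses to choose a notification server is an
assumption.

```python
class Switchboard:
    """Hosts one session; an IM is a chat room that starts with two users."""

    def __init__(self):
        self.participants = []

    def join(self, user):
        if user not in self.participants:
            self.participants.append(user)

    def send(self, sender, text):
        """Relay text to every participant except the sender."""
        return {user: text for user in self.participants if user != sender}


class NotificationServer:
    """Tracks presence and points clients to switchboard servers."""

    def __init__(self):
        self.present = set()

    def log_in(self, user):
        self.present.add(user)

    def open_session(self, caller, callee):
        if callee not in self.present:
            raise LookupError(f"{callee} is not logged in")
        switchboard = Switchboard()
        switchboard.join(caller)
        switchboard.join(callee)
        return switchboard


class DispatchServer:
    """Well-known entry point that redirects to a notification server."""

    def __init__(self, notification_servers):
        self.notification_servers = notification_servers

    def redirect(self, user):
        # Assumed selection rule: a simple deterministic spread by name.
        index = sum(user.encode()) % len(self.notification_servers)
        return self.notification_servers[index]
```

Inviting a third user is just another join on the same switchboard, which is
why this design does not need a separate code path for private chat rooms.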
YMSG, on the other hand, is very simple due to its symmetric
architecture, and is shown in Fig. 3. The same connection
is used for all instant messages and chat sessions.
Many corporate environments employ firewalls to screen
unwanted traffic, with a common default to allow HTTP traffic.
Because of this, many IM systems allow tunneling over
HTTP as a way around these firewalls. Interestingly, the three
commercial IM systems all use the same symmetric architecture
when tunneled over HTTP; namely, the client only interacts
with a single HTTP front-end server. The native IM
protocol is effectively encapsulated on top of HTTP, with
commands and responses being multiplexed over HTTP connections.
AIM uses two HTTP connections; one for submitting
requests asynchronously, and the other that blocks
waiting for the responses. YMSG uses a single synchronous
connection, such that each request blocks until a response is
received from the network. MSN also uses a single connection,
but submits requests asynchronously and either receives
a response or polls for a response depending upon the type of
request.
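Two of the request styles described above can be contrasted with a small
in-memory model. No real HTTP is involved: the front-end class, the "ack:"
response format, and the function names are hypothetical, and the second
function models only the submit-then-poll case.

```python
from collections import deque


class HttpFrontEnd:
    """In-memory stand-in for the single HTTP front-end server."""

    def __init__(self):
        self.outbox = deque()

    def handle(self, command):
        # Hypothetical response format, for illustration only.
        return f"ack:{command}"

    def submit(self, command):
        """Accept a command without returning its response."""
        self.outbox.append(self.handle(command))

    def poll(self):
        """Return the next queued response, or None if there is none."""
        return self.outbox.popleft() if self.outbox else None


def synchronous_request(server, command):
    """Each request blocks until its response is received."""
    return server.handle(command)


def asynchronous_requests(server, commands):
    """Submit all commands first, then collect responses by polling."""
    for command in commands:
        server.submit(command)
    responses = []
    while (response := server.poll()) is not None:
        responses.append(response)
    return responses
```

The synchronous form is simpler but lets only one command be in flight at a
time; the polling form allows several outstanding commands at the cost of
extra round trips.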