<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<body>
<p>
SIP                                                         D. Wing, Ed.
Internet-Draft                                                     Cisco
Intended status: Informational                                  S. Fries
Expires: May 21, 2008                                         Siemens AG
                                                           H. Tschofenig
                                                  Nokia Siemens Networks
                                                                F. Audet
                                                                  Nortel
                                                       November 18, 2007
Requirements and Analysis of Media Security Management Protocols
draft-ietf-sip-media-security-requirements-01.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on May 21, 2008.
Copyright Notice
Copyright (C) The IETF Trust (2007).
Abstract
This document describes requirements for a protocol to negotiate a
security context for SIP-signaled SRTP media. In addition to the
natural security requirements, this negotiation protocol must
interoperate well with SIP in certain ways. A number of proposals
have been published and a summary of these proposals is in the
appendix of this document.
Table of Contents
1. Introduction
   1.1. Document Organization
2. Terminology
3. Attack Scenarios
4. Call Scenarios
   4.1. Clipping Media Before Signaling Answer
   4.2. Retargeting and Forking
   4.3. Shared Key Conferencing
   4.4. Recording
   4.5. PSTN gateway
5. Requirements
   5.1. Key Management Protocol Requirements
   5.2. Attack Scenario Requirements
   5.3. Requirements Outside of the Key Management Protocol
6. Security Considerations
7. IANA Considerations
8. Acknowledgements
9. References
   9.1. Normative References
   9.2. Informative References
Appendix A. Overview of Keying Mechanisms
   A.1. Signaling Path Keying Techniques
      A.1.1. MIKEY-NULL
      A.1.2. MIKEY-PSK
      A.1.3. MIKEY-RSA
      A.1.4. MIKEY-RSA-R
      A.1.5. MIKEY-DHSIGN
      A.1.6. MIKEY-DHHMAC
      A.1.7. MIKEY-ECIES and MIKEY-ECMQV (MIKEY-ECC)
      A.1.8. Security Descriptions with SIPS
      A.1.9. Security Descriptions with S/MIME
      A.1.10. SDP-DH (expired)
      A.1.11. MIKEYv2 in SDP (expired)
   A.2. Media Path Keying Technique
      A.2.1. ZRTP
   A.3. Signaling and Media Path Keying Techniques
      A.3.1. EKT
      A.3.2. DTLS-SRTP
      A.3.3. MIKEYv2 Inband (expired)
Appendix B. Evaluation Criteria - SIP
   B.1. Secure Retargeting and Secure Forking
   B.2. Clipping Media Before SDP Answer
   B.3. Centralized Keying
   B.4. SSRC and ROC
Appendix C. Evaluation Criteria - Security
   C.1. Public Key Infrastructure
   C.2. Perfect Forward Secrecy
   C.3. Best Effort Encryption
   C.4. Upgrading Algorithms
Appendix D. Out-of-Scope
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
The work on media security started when the Session Initiation
Protocol (SIP) was still in its infancy. With the increased SIP
deployment and the availability of new SIP extensions and related
protocols, the need for end-to-end security was re-evaluated.
Re-evaluating prior protocol work and design decisions is not
uncommon and is, to some extent, a necessary part of protocol work:
it ensures that the developed protocols indeed meet the previously
envisioned needs of Internet users.
This document summarizes media security requirements, i.e.,
requirements for mechanisms that negotiate a security context, such as
cryptographic keys and parameters for SRTP.
1.1. Document Organization
The organization of this document is as follows: Section 2 introduces
terminology, Section 3 describes various attack scenarios against the
signaling path and media path, Section 4 provides an overview of
possible call scenarios, and Section 5 lists requirements for media
security. The main part of the document concludes with the security
considerations in Section 6, the IANA considerations in Section 7, and
the acknowledgements in Section 8. Appendix A lists and compares
available solution proposals. Appendix B then compares these
approaches regarding their suitability for the SIP signaling
scenarios described in Section 4, while Appendix C provides a
comparison regarding security aspects. Appendix D lists non-goals for
this document.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119], with the
important qualification that, unless otherwise stated, these terms
apply to the design of the media security key management protocol,
not its implementation or application.
Additionally, the following items are used in this document:
AOR (Address-of-Record): A SIP or SIPS URI that points to a domain
with a location service that can map the URI to another URI where
the user might be available. Typically, the location service is
populated through registrations. An AOR is frequently thought of
as the "public address" of the user.
SSRC: The 32-bit value that defines the synchronization source, used
in RTP. These are generally unique, but collisions can occur.
two-time pad: The use of the same key and the same key index to
encrypt different data. For SRTP, a two-time pad occurs if two
senders are using the same key and the same RTP SSRC value.
PKI: Public Key Infrastructure (see [RFC3280])
Perfect Forward Secrecy (PFS): The property that disclosure of the
long-term secret keying material that is used to derive an agreed
ephemeral key does not compromise the secrecy of agreed keys from
earlier runs.
active adversary: An active adversary attempts to alter system
resources or affect their operation (see [RFC4949]).
passive adversary: A passive adversary attempts to learn or make use
of information from a system but does not affect resources of that
system (see [RFC4949]).
signaling path: The signaling path is the route taken by SIP
signaling messages transmitted between the calling and called user
agents. This can be either direct signaling between the calling
and called user agents or, more commonly, signaling through the SIP
proxy servers that were involved in the call setup.
media path: The media path is the route taken by media packets
exchanged by the endpoints. In the simplest case, the endpoints
exchange media directly, and the "media path" is defined by a
quartet of IP addresses and TCP/UDP ports, along with an IP route.
In other cases, this path may include RTP relays, mixers,
transcoders, session border controllers, NATs, or media gateways.
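The two-time pad defined above can be illustrated with a short sketch. It is a toy, not SRTP: a SHA-256 counter-mode keystream stands in for SRTP's AES counter mode, and the real SRTP keystream also depends on the session salt and the packet index, which are omitted here (equivalently, the two packets are assumed to carry the same index). The function names are hypothetical. What carries over is that identical key and SSRC yield an identical keystream, which cancels out when two ciphertexts are XORed.

```python
import hashlib


def keystream(key: bytes, ssrc: int, length: int) -> bytes:
    """Toy counter-mode keystream: SHA-256(key || SSRC || counter).

    Stand-in for SRTP's AES-CM (assumption). Salt and packet index are
    omitted, so identical key and SSRC give an identical keystream.
    """
    out = b""
    counter = 0
    while len(out) < length:
        block = key + ssrc.to_bytes(4, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, ssrc: int, plaintext: bytes) -> bytes:
    return xor(plaintext, keystream(key, ssrc, len(plaintext)))


shared_key = bytes(16)   # both senders were given the same key ...
ssrc = 0x1234ABCD        # ... and both picked the same SSRC

p1 = b"audio from sender A"
p2 = b"audio from sender B"
c1 = encrypt(shared_key, ssrc, p1)
c2 = encrypt(shared_key, ssrc, p2)

# The keystream cancels out: an eavesdropper without the key learns p1 XOR p2,
# and anyone who knows or guesses p1 recovers p2.
print(xor(c1, c2) == xor(p1, p2))   # True
print(xor(xor(c1, c2), p1) == p2)   # True
```

Changing either the key or the SSRC for one sender gives a different keystream, which is why SSRC collisions matter only when the key is also shared.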
3. Attack Scenarios
The discussion in this section refers to requirements R6, R7, R14,
R17, and R27.
This document classifies adversaries according to their access and
their capabilities. An adversary might have access to:
1. only the media path,
2. only the signaling path,
3. both the media path and the signaling path.
An attacker that is located solely along the signaling path, and
does not have access to media (item 2), is not considered in this
document.
There are two different types of adversaries: active and passive. An
active adversary may need to act on information relevant to the key
exchange that travels along the media path or along the signaling
path.
Based on their robustness against the adversary capabilities
described above, we can group security mechanisms using the following
labels, ordered from least secure at the top to most secure at the
bottom:
no-signaling-passive-media:
Access to only the media path is sufficient to reveal the content
of the media traffic. This is how unencrypted RTP functions.
passive-signaling-passive-media:
Passive attack on the signaling and passive attack on the media
path is necessary to reveal the content of the media traffic.
active-signaling-passive-media:
Active attack on the signaling path and passive attack on the
media path is necessary to reveal the content of the media
traffic.
active-signaling-active-media:
Active attack on both the signaling path and the media path is
necessary to reveal the content of the media traffic.
detect-attack:
Active attack on both the signaling and media paths is necessary to
reveal the content of the media traffic
(active-signaling-active-media), but the attack is detectable by
the endpoints when the adversary tampers with the signaling and/or
media messages.
For example, Security Descriptions [RFC4568], when protected by TLS
(as it is commonly implemented and deployed), belongs in the passive-
signaling-passive-media category since the adversary needs to learn
the Security Descriptions key by seeing the SIP signaling message at
a SIP proxy (assuming that the adversary is in control of the SIP
proxy). The media traffic can be decrypted using that learned key.
As another example, DTLS-SRTP falls into active-signaling-active-
media category when DTLS-SRTP is used with a public key based
ciphersuite with self-signed certificates and without SIP-Identity
[RFC4474]. An adversary would have to modify the fingerprint that is
sent along the signaling path and subsequently to modify the
certificates carried in the DTLS handshake that travel along the
media path. If DTLS-SRTP is used with SIP-Identity [RFC4474] and
protects both the offer and the answer, it would belong to the
detect-attack category.
The above discussion of DTLS-SRTP demonstrates how a single security
protocol can be in different classes depending on the mode in which
it is operated. Other protocols can achieve a similar effect by adding
functions outside of the on-the-wire key management protocol itself.
Although it may be appropriate to deploy lower-classed mechanisms in
some cases, the ultimate security requirement for a media security
negotiation protocol is that it have a mode of operation available in
which it falls into the detect-attack category, which provides
protection against passive and active attacks and detection of such
attacks.
That is, there must be a way to use the protocol so that an active
attack is required against both the signaling and media paths, and so
that such attacks are detectable by the endpoints.
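The ordering of the labels and the examples given in this section can be captured in a small table. The following is a sketch only: the enum, the `EXAMPLES` mapping, and the helper functions are hypothetical names, and the table contains only the modes of operation discussed above, not a complete evaluation of any protocol.

```python
from enum import IntEnum


class MediaSecurityClass(IntEnum):
    """Section 3 labels, ordered from least secure (0) to most secure (4)."""
    NO_SIGNALING_PASSIVE_MEDIA = 0
    PASSIVE_SIGNALING_PASSIVE_MEDIA = 1
    ACTIVE_SIGNALING_PASSIVE_MEDIA = 2
    ACTIVE_SIGNALING_ACTIVE_MEDIA = 3
    DETECT_ATTACK = 4


# Only the examples given in the text, keyed by (mechanism, mode of operation).
EXAMPLES = {
    ("RTP", "unencrypted"):
        MediaSecurityClass.NO_SIGNALING_PASSIVE_MEDIA,
    ("Security Descriptions", "over TLS, as commonly deployed"):
        MediaSecurityClass.PASSIVE_SIGNALING_PASSIVE_MEDIA,
    ("DTLS-SRTP", "self-signed certificates, no SIP Identity"):
        MediaSecurityClass.ACTIVE_SIGNALING_ACTIVE_MEDIA,
    ("DTLS-SRTP", "SIP Identity protecting offer and answer"):
        MediaSecurityClass.DETECT_ATTACK,
}


def classes_of(mechanism: str) -> list:
    """Classes reached by the listed modes of one mechanism."""
    return [cls for (mech, _mode), cls in EXAMPLES.items() if mech == mechanism]


def meets_ultimate_requirement(mechanism: str) -> bool:
    """True if at least one listed mode of the mechanism is detect-attack."""
    return MediaSecurityClass.DETECT_ATTACK in classes_of(mechanism)


print(meets_ultimate_requirement("DTLS-SRTP"))              # True
print(meets_ultimate_requirement("Security Descriptions"))  # False
```

The `False` result reflects only the single mode listed for Security Descriptions; as noted above, a protocol can reach a higher class by adding functions outside the on-the-wire key management protocol, which would add a new row to the table.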
4. Call Scenarios
The following subsections describe the call scenarios that pose the
greatest challenges to the key management system for media data in
cooperation with SIP signaling.
4.1. Clipping Media Before Signaling Answer
The discussion in this section refers to requirement R5.
Per the SDP Offer/Answer Model [RFC3264],
"Once the offerer has sent the offer, it MUST be prepared to
receive media for any recvonly streams described by that offer.
It MUST be prepared to send and receive media for any sendrecv
streams in the offer, and send media for any sendonly streams in
the offer (of course, it cannot actually send until the peer
provides an answer with the needed address and port information)."
To meet this requirement with SRTP, the offerer needs to know the
SRTP key for arriving media. If either endpoint receives encrypted
media before it has access to the associated SRTP key, it cannot play
the media -- causing clipping.
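The amount of clipping follows directly from the gap between the arrival of the first media packet and the moment the key becomes known. A minimal sketch, assuming a constant packetization interval (20 ms is a common value for voice, used here only as an example) and a receiver that discards packets it cannot decrypt; the function name and the timing values are hypothetical.

```python
def clipped_packets(first_media_ms: int, key_available_ms: int,
                    ptime_ms: int = 20) -> int:
    """Count packets that arrive before the SRTP key is known.

    Assumes packets arrive every ptime_ms starting at first_media_ms and
    that every packet arriving before key_available_ms is unplayable.
    """
    if key_available_ms <= first_media_ms:
        return 0
    gap = key_available_ms - first_media_ms
    # Packets at first_media_ms, +ptime_ms, ... strictly before the key:
    # that is ceil(gap / ptime_ms), computed with integer arithmetic.
    return -(-gap // ptime_ms)


# Hypothetical timing: the answerer starts sending at t = 1000 ms, but the
# SDP answer carrying its key reaches the offerer only at t = 1300 ms.
lost = clipped_packets(1000, 1300)
print(lost, "packets,", lost * 20, "ms of audio clipped")  # 15 packets, 300 ms of audio clipped
```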
For key exchange mechanisms that send the answerer's key in SDP, a
SIP provisional response [RFC3261], such as 183 (session progress),
is useful. However, 183 messages are not reliable unless both the
calling and called endpoints support PRACK [RFC3262], use TCP across
all SIP proxies, or implement Security Preconditions [RFC5027], or
unless both ends implement ICE [I-D.ietf-mmusic-ice] and the answerer
implements the reliable provisional response mechanism described in
ICE. Unfortunately, none of these techniques is widely deployed, and
there is industry reluctance to mandate any of them merely to avoid
the problem described in this section.
Note that the receipt of an SDP answer is not always sufficient to
allow media to be played to the offerer. Sometimes, the offerer must
send media in order to open up firewall holes or NAT bindings before
media can be received. In this case, even a solution that makes the
key available before the SDP answer arrives will not help.
Fixes to early media might make these requirements obsolete, but at
the time of writing no progress has been made.
4.2. Retargeting and Forking
The discussion in this section relates to requirements R1, R2, and
R3.
In SIP, a request sent to a specific AOR but delivered to a different
AOR is called a "retarget". A typical scenario is a "call
forwarding" feature. In Figure 1, Alice sends an Invite in step 1,
which the proxy forwards to Bob in step 2. Bob responds with a redirect (SIP
response code 3xx) pointing to Carol in step 3. This redirect
typically does not propagate back to Alice but only goes to a proxy
(i.e., the retargeting proxy) that sends the original Invite to Carol
in step 4.
                  +-----+
                  |Alice|
                  +--+--+
                     |
                     | Invite (1)
                     V
                +----+----+
                |  proxy  |
                ++-+-----++
                 | ^     |
    Invite (2)   | |     | Invite (4)
  & redirect (3) | |     |
                 V |     V
                ++-++   ++----+
                |Bob|   |Carol|
                +---+   +-----+

                Figure 1: Retargeting
Using retargeting might lead to situations where the UAC does not
know where its request will be going. This might not immediately
seem like a serious problem; after all, when one places a telephone
call on the PSTN, one never really knows if it will be forwarded to a
different number, who will pick up the line when it rings, and so on.
However, when considering SIP mechanisms for authenticating the
called party, this function can also make it difficult to
differentiate an intermediary that is behaving legitimately from an
attacker. From this perspective, the main problems with retargeting
are:
Not detectable by the caller: The originating user agent has no
means of anticipating that the condition will arise, nor any means
of determining that it has occurred until the call has already
been set up, i.e. the negative consequences have already been
realized.
Not preventable by the caller: There is no existing security
mechanism that might be employed by the originating user agent in
order to guarantee that the call will not be re-targeted.
The mechanism used by SIP for identifying the calling party is SIP
Identity [RFC4474]. However, due to the nature of retargeting, SIP
Identity can only identify the calling party (that is, the party that
initiated the SIP request). Some key exchange mechanisms predate SIP
Identity and include their own identity mechanism. However, those
built-in identity mechanisms also suffer from the SIP retargeting
problem. Going forward, Connected Identity [RFC4916] allows
identifying the called party.
In SIP, 'forking' is the delivery of a request to multiple locations.
This happens when a single AOR is registered more than once. An
example of forking is when a user has a desk phone, PC client, and
mobile handset all registered with the same AOR.
                  +-----+
                  |Alice|
                  +--+--+
                     |
                     | Invite
                     V
               +-----+-----+
               |   proxy   |
               ++---------++
                |         |
         Invite |         | Invite
                V         V
             +--+--+   +--+--+
             |Bob-1|   |Bob-2|
             +-----+   +-----+

                 Figure 2: Forking
With forking, both Bob-1 and Bob-2 might send back SDP answers in SIP
responses. Alice will see those intermediate (18x) and final (200)
responses. It is useful for Alice to be able to associate the SIP
response with the incoming media stream. Although ICE
[I-D.ietf-mmusic-ice] can make this association for RTP, it is not
desirable to require ICE to accomplish it.
Forking and retargeting are often used together. For example, a boss
and secretary might have both phones ring (forking) and rollover to
voice mail if neither phone is answered (retargeting).
To maintain security of the media traffic, only the end point that
answers the call should know the SRTP keys for the session. This is
only an issue when the key management is encrypted with a key
corresponding to the responder. It does not lead to problems with
DH-based approaches. For key exchange mechanisms that do not provide
secure forking or secure retargeting, one workaround is to rekey
immediately after forking or retargeting. However, because the
originator may not be aware that the call forked, this mechanism
requires rekeying immediately after every session is established.
This doubles the number of messages processed by the network.
Retargeting securely introduces a more significant problem. With
retargeting, the actual recipient of the request is not the original
recipient. This means that if the offerer encrypted material (such
as the session key or the SDP) using the original recipient's public
key (or a shared secret established previously), the actual recipient
will not be able to decrypt that material because the recipient won't
have the original recipient's private key. In some cases, this is
the intended behavior, i.e., you wanted to establish a secure
connection with a specific individual. In other cases, it is not
intended behavior (you want all voice media to be encrypted,
regardless of who answers).
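The retargeting failure can be sketched for the "shared secret established previously" case mentioned above; Python's standard library has no public-key encryption, so the public-key case is not shown, although it fails the same way. This is a toy, assuming a hypothetical encrypt-then-MAC wrap built from SHA-256 and HMAC; it is not MIKEY-PSK or any standardized key wrap, and all names are illustrative.

```python
import hashlib
import hmac
import secrets


def _subkeys(kek: bytes):
    """Derive separate encryption and MAC keys from one shared secret."""
    return (hashlib.sha256(b"enc" + kek).digest(),
            hashlib.sha256(b"mac" + kek).digest())


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def wrap(kek: bytes, srtp_key: bytes) -> bytes:
    """Toy encrypt-then-MAC of an SRTP key (not a standardized key wrap)."""
    enc_key, mac_key = _subkeys(kek)
    nonce = secrets.token_bytes(12)
    stream = _keystream(enc_key, nonce, len(srtp_key))
    ct = bytes(a ^ b for a, b in zip(srtp_key, stream))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unwrap(kek: bytes, blob: bytes):
    """Return the SRTP key, or None if kek does not authenticate the blob."""
    enc_key, mac_key = _subkeys(kek)
    nonce, ct, tag = blob[:12], blob[12:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    stream = _keystream(enc_key, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, stream))


alice_bob_secret = secrets.token_bytes(32)  # established earlier with Bob
carol_secret = secrets.token_bytes(32)      # Carol shares nothing with Alice
srtp_master_key = secrets.token_bytes(16)

blob = wrap(alice_bob_secret, srtp_master_key)  # carried in Alice's offer
print(unwrap(alice_bob_secret, blob) == srtp_master_key)  # True: Bob can decrypt
print(unwrap(carol_secret, blob))                         # None: Carol cannot
```

Whether the `None` for Carol is a feature or a failure depends on intent, exactly as described above: it is desired when Alice meant to reach only Bob, and a broken call when any answerer would have been acceptable.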
Further compounding this problem is a particularity of SIP: when
forking is used, only one final error response is ever delivered to
the sender of the request. If forking results in the forking proxy
receiving multiple final error responses, the proxy is responsible
for choosing which one to forward. This means that if a request is
rejected, say with information that the keying information was
rejected and providing the far end's credentials, it is very possible
that the rejection will never reach the sender. This problem, called
the Heterogeneous Error Response Forking Problem (HERFP)
[I-D.mahy-sipping-herfp-fix], is difficult to solve in SIP. Because
we expect the HERFP to continue to be a problem in SIP for the
foreseeable future, a media security system should function even in
the presence of HERFP behavior.
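The following sketch shows how a response carrying keying information can be lost. It uses a simplified reading of the best-response rule in RFC 3261 Section 16.7 (prefer any 6xx, otherwise the lowest response class, and within that class the proxy may pick any response; here the first one received). It ignores the preference for 401, 407, 415, 420, and 484 and the 503 handling. The `FinalResponse` type and the choice of 488 to carry the keying rejection are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FinalResponse:
    branch: str
    code: int
    carries_keying_info: bool = False


def choose_best_response(responses):
    """Simplified best-final-response choice for a forking proxy."""
    six = [r for r in responses if r.code // 100 == 6]
    if six:
        return six[0]
    lowest = min(r.code // 100 for r in responses)
    # Any response in the lowest class is allowed; take the first received.
    return next(r for r in responses if r.code // 100 == lowest)


branches = [
    FinalResponse("Bob-2", 486),                            # Busy Here
    FinalResponse("Bob-1", 488, carries_keying_info=True),  # keying rejected
]
forwarded = choose_best_response(branches)
print(forwarded.branch, forwarded.code)  # Bob-2 486
print(forwarded.carries_keying_info)     # False: Alice never sees Bob-1's data
```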
4.3. Shared Key Conferencing
The consensus on the RTPSEC mailing list was to concentrate on
unicast, point-to-point sessions. Thus, there are no requirements
related to shared key conferencing. This section is retained for
informational purposes.
Large audio and video conference bridges operate most efficiently by
encrypting the current speaker once and distributing that stream to
the conference attendees. Typically, inactive participants receive
the same streams -- they hear (or see) the active speaker(s) -- and
the active speakers receive distinct streams that don't include
themselves. In order to maintain the confidentiality of such
conferences, where listeners share a common key, all listeners must
be rekeyed when a listener joins or leaves a conference.
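The rekeying rule above can be sketched as a small group-key manager. This is a hypothetical illustration, not a mechanism defined by this document: the class and method names are invented, and delivering the new key to each remaining listener (over whatever per-participant channel the keying mechanism provides) is left out.

```python
import secrets


class SharedKeyConference:
    """Hypothetical group-key manager for a shared-key conference bridge."""

    def __init__(self, key_len: int = 16):
        self.key_len = key_len
        self.members = set()
        self.epoch = 0
        self.group_key = secrets.token_bytes(key_len)

    def _rekey(self):
        # A fresh random key on every membership change: a joiner never holds
        # the earlier key, and a leaver never holds the later one.
        # (Distributing group_key to the remaining members is not shown.)
        self.group_key = secrets.token_bytes(self.key_len)
        self.epoch += 1

    def join(self, member: str):
        self.members.add(member)
        self._rekey()

    def leave(self, member: str):
        self.members.discard(member)
        self._rekey()

    def key_for(self, member: str):
        return self.group_key if member in self.members else None


conf = SharedKeyConference()
conf.join("Alice")
conf.join("Bob")
key_before_carol = conf.key_for("Bob")
conf.join("Carol")
carol_key = conf.key_for("Carol")
conf.leave("Carol")
print(carol_key != key_before_carol)     # True: Carol never held the earlier key
print(conf.key_for("Bob") != carol_key)  # True: Carol's key no longer protects media
print(conf.key_for("Carol"))             # None
```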
An important use case for mixers/translators is a conference bridge:
                    +----+
        A --- 1 --->|    |
          <-- 2 ----| M  |
                    | I  |
        B --- 3 --->| X  |
          <-- 4 ----| E  |
                    | R  |
        C --- 5 --->|    |
          <-- 6 ----|    |
                    +----+

           Figure 3: Centralized Keying
In the figure above, 1, 3, and 5 are RTP media contributions from
Alice, Bob, and Carol, and 2, 4, and 6 are the RTP flows to those
devices carrying the 'mixed' media.
Several scenarios are possible:
a. Multiple inbound sessions: 1, 3, and 5 are distinct RTP sessions,
b. Multiple outbound sessions: 2, 4, and 6 are distinct RTP
sessions,
c. Single inbound session: 1, 3, and 5 are just different sources
within the same RTP session,
d. Single outbound session: 2, 4, and 6 are different flows of the
same (multi-unicast) RTP session
If there are multiple inbound sessions and multiple outbound sessions
(scenarios a and b), then every keying mechanism behaves as if the
mixer were an end point and can set up a point-to-point secure
session between the participant and the mixer. This is the simplest
situation, but is computationally wasteful, since SRTP processing has
to be done independently for each participant. The use of multiple
inbound sessions (scenario a) doesn't waste computational resources,
though it does consume additional cryptographic context on the mixer
for each participant and has the advantage of non-repudiation of the
originator of the incoming stream.
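The difference in outbound encryption work between scenarios b and d can be made concrete with a little arithmetic. A minimal sketch, assuming (as described at the start of this section) that each active speaker receives a distinct mix without their own voice and that all inactive participants receive one common mix; the function name and the example numbers are hypothetical.

```python
def encryptions_per_interval(participants: int, active_speakers: int,
                             single_outbound_session: bool) -> int:
    """SRTP encryptions a mixer performs per packetization interval."""
    if not 0 <= active_speakers <= participants:
        raise ValueError("active_speakers must be between 0 and participants")
    if not single_outbound_session:
        # Scenario b: every outbound flow has its own SRTP context.
        return participants
    # Scenario d: one encryption per distinct mix for the active speakers,
    # plus one shared mix for everyone else (if anyone else is listening).
    shared_mix = 1 if participants > active_speakers else 0
    return active_speakers + shared_mix


print(encryptions_per_interval(50, 2, single_outbound_session=False))  # 50
print(encryptions_per_interval(50, 2, single_outbound_session=True))   # 3
```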
To support a single outbound session (scenario d), the mixer has to
dictate its encryption key to the participants. Some keying
mechanisms allow the transmitter to determine its own key, and others
allow the offerer to determine the key for the offerer and answerer.
Depending on how the call is established, the offerer might be a
participant (such as a participant dialing into a conference bridge)
or the offerer might be the mixer (such as a conference bridge
calling a participant). The use of offerless Invites may help some
keying mechanisms reverse the role of offerer/answerer. A
difficulty, however, is knowing a priori if the role should be
reversed for a particular call.
4.4. Recording
The discussion in this section relates to requirement R23.
Some business environments, such as stock brokers, banks, and catalog
call centers, require recording calls with customers. This is the
familiar "this call is being recorded for quality purposes" heard
during calls to these sorts of businesses. In these environments,
media recording is typically performed by an intermediate device
(with RTP, this is typically implemented in a 'sniffer').
When performing such call recording with SRTP, the end-to-end
security is compromised. This is unavoidable but accepted, because
the operation of the business requires such recording. It is
desirable that the media security is not unduly compromised by the
media recording. The endpoint within the organization needs to be
informed that there is an intermediate device and needs to cooperate
with that intermediate device.
This scenario does not place a requirement directly on the key
management protocol. The requirement could be met directly by the
key management protocol (e.g., MIKEY-NULL or [RFC4568]) or through an
external out-of-band mechanism (e.g., [I-D.wing-sipping-srtp-key]).
4.5. PSTN gateway
The discussion in this section relates to requirement R26.
A typical case of using media security is the one where two entities
are having a VoIP conversation over IP capable networks. However,
there are cases where the other end of the communication is not
connected to an IP capable network. In this kind of setting, there
needs to b