Network Working Group                                         P. Hoffman
Internet-Draft                                             March 4, 2009
Updates: RFC 3454, 3490, 3491 (if approved)
Intended status: Standards Track
Expires: September 5, 2009


   Internationalizing Domain Names in Applications (IDNA) version 2
                       draft-hoffman-idna2-02.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.  This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008.  The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process.  Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 5, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Hoffman                 Expires September 5, 2009               [Page 1]

Internet-Draft                    IDNA2                       March 2009


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   IDNA has been a world-wide success since it was introduced over five
   years ago.  However, it has some notable deficiencies, including
   being tied to an old version of the Unicode Standard and having
   needless restrictions that prevent some languages from being used.
   This document describes IDNA version 2, which rectifies those
   problems while making the fewest changes necessary to the original
   protocol.

Table of Contents

   1.  Introduction
     1.1.  Acknowledgements
     1.2.  Conventions Used In This Document
   2.  Changes to RFC 3490 (IDNA v.1)
   3.  Changes to RFC 3454 (Stringprep)
   4.  Changes to RFC 3491 (Nameprep)
   5.  Changes to RFC 3492 (Punycode)
   6.  Suggestions for Registries
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Work Still to be Done
   Appendix B.  Changes between versions
     B.1.  Changes between the -00 and -01 drafts
     B.2.  Changes between the -01 and -02 drafts
   Author's Address

1.  Introduction

   This document describes Internationalizing Domain Names in
   Applications (IDNA) version 2 (hereafter called "IDNAv2"), a direct
   update to IDNA (hereafter called "IDNAv1").  IDNAv1 consists of four
   RFCs:

   o  [RFC3490], "Internationalizing Domain Names in Applications
      (IDNA)", is the main definition of IDNAv1.  It defines the
      processing rules for IDNA and gives the background for how IDNA
      works.

   o  [RFC3454], "Preparation of Internationalized Strings
      ("stringprep")", defines the general framework for processing
      non-ASCII strings that are used in IDNA.

   o  [RFC3491], "Nameprep: A Stringprep Profile for Internationalized
      Domain Names (IDN)", is a short profile of the rules from the
      stringprep framework.

   o  [RFC3492], "Punycode: A Bootstring encoding of Unicode for
      Internationalized Domain Names in Applications (IDNA)", defines
      the encoding used in IDNAv1 labels.

   IDNAv2 is backwards-compatible with IDNAv1, meaning that any DNS
   label that was legal in IDNAv1 has exactly the same representation
   in IDNAv2.  New labels are allowed in IDNAv2 that were not allowed in
   IDNAv1.

   IDNA needs to be updated for many reasons, some of which are covered
   in [RFC4690].  Most obviously, many characters that could appear in
   domain names have been added to Unicode since version 3.2
   [UNICODE32], the version of the Unicode Standard on which IDNAv1 is
   based.

   One explicit goal of this update is to allow labels containing
   characters added since Unicode version 3.2 to be used in IDNA.  To
   that end, IDNAv2 is based on Unicode 5.1 [UNICODE51].  The tables in
   stringprep and Nameprep are updated to reflect this change.

   Another explicit goal of this update is to not change the encoding of
   any label that is legal in IDNAv1.  If an internationalized label in
   IDNAv1 produces an ACE label, IDNAv2 must produce the same ACE label.
   If an internationalized label in IDNAv1 produces an ASCII label,
   IDNAv2 must produce the same ASCII label.

   A third explicit goal is to update the bidirectional ("bidi")
   algorithm used by IDNAv1 to cover more languages, such as Dhivehi and
   Yiddish.  This corrects an oversight in IDNAv1 that was discovered
   after that work was finished.

   This document updates IDNAv1 to reflect Unicode version 5.1.  Of
   course, the Unicode Consortium will not stop at Unicode version 5.1,
   so IDNAv2 will probably need to be updated later to reflect newer
   versions of Unicode.

1.1.  Acknowledgements

   The first serious work on updating IDNAv1 was undertaken by John
   Klensin, Patrik Faltstrom, Harald Alvestrand, and Cary Karp.  It led
   to the formation of the IDNAbis Working Group in the IETF, and they
   produced many revisions of their documents in that WG.  Some of the
   ideas in this IDNAv2 document (most notably, the update to the bidi
   algorithm) are derived from their efforts.

   Many, many people worked on IDNAv1.  In addition to the authors of
   the standards (Marc Blanchet, Adam Costello, Patrik Faltstrom, and
   me), there were literally dozens of active participants in the
   original IDN Working Group in the IETF that began in 2000.  Their
   tireless effort led to IDNAv1.

1.2.  Conventions Used In This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   In sections of this document where changes are made to RFCs, those
   changes are shown with a vertical line character ("|") in the first
   column.

2.  Changes to RFC 3490 (IDNA v.1)

   All references to the Unicode Standard are updated to refer to
   [UNICODE51].  All references to Nameprep are updated to refer to the
   Nameprep in this document.
   Similarly, all references to stringprep are updated to refer to the
   stringprep in this document.

   In section 3.1, the first item ("1) Whenever dots are used...") is
   changed to add "U+2CFE (Coptic full stop)" to the end of its list of
   characters that MUST be recognized as dots.

3.  Changes to RFC 3454 (Stringprep)

   [[[ ============================================================
   NOTE FOR EARLY VERSIONS OF THIS DRAFT

   This section is intentionally incomplete.  The tables in stringprep
   need to be extended with the characters added to the repertoire
   after Unicode 3.2, up to and including Unicode 5.1.  Probably the
   best way to do this is for a few dedicated individuals to go through
   the new characters one by one, and also programmatically, to see
   which tables need to be added to.  I have done a first one-by-one
   pass, but I felt that publishing my results in the first draft would
   cause others to get lazy about this important task.  Future versions
   of this document will reflect the results of that work.

   The character review will be similar to what we did in IDNAv1,
   except that we don't have to create any new buckets.  Basically, we
   have to see whether a particular new character should be mapped to
   nothing, or whether it should be prohibited for one of the reasons
   already listed in RFC 3454.  In my rough first pass, I found very few
   characters that will need to be added to sections 3 or 5.  The case
   mapping will happen algorithmically, with a check that the new map
   does not change any value in the old map.
   ============================================================ ]]]

   RFC 3454 is significantly revised to reflect the use of Unicode
   version 5.1.  All the substantive changes are additions.  There has
   been no effort to "correct" perceived mistakes in RFC 3454.
   (One can argue that extending the bidi rules in section 6 to allow
   more languages to be expressed is such a correction; however, the
   change only allows more strings, and does not cause any string that
   was allowed in RFC 3454 to be disallowed in the new version.)

   Most of the changes to RFC 3454 add characters to the tables in the
   document.  These characters come from Unicode version 5.1, so the
   tables become valid for Unicode version 5.1.  However, the same
   tables are still valid for Unicode version 3.2, because a profile
   that still uses version 3.2 will never use the rows added to the
   updated tables.

   In all places other than Appendix A, references to "[Unicode3.2]"
   are updated to refer to [UNICODE51].  Similarly, all text references
   to "Unicode version 3.2" are updated to "Unicode version 5.1".

   Characters will be added to the tables in section 3.1 to reflect the
   differences between Unicode 3.2 and Unicode 5.1.  For example,
   U+E0100 to U+E01EF will be added to the second list in that section.

   In section 3.2, "CaseFolding-3.txt" is changed to "CaseFolding.txt".

   Characters will be added to the tables in the subsections of section
   5.  For example, U+2064 will be added to the list in section 5.2.

   In section 6, at the end of the fourth paragraph (which currently
   ends with "have bidirectional category "EN"."), the following
   sentence is added: "The Unicode Standard also defines a bidirectional
   category "NSM" for "non-spacing marks"."

   In section 6, the third requirement is changed to read:

|  3) If a string contains any RandALCat character, the first
|  character MUST be a RandALCat character, and the last
|  characters of the string MUST be either a RandALCat
|  character or a RandALCat character followed by one or
|  more NSM characters.

   In the references, the reference for UAX15 is updated, and a
   reference for [UNICODE51] is added.
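
   The revised requirement 3 in section 6 can be sketched in Python
   to show how it differs from the rule as published in RFC 3454.  This
   is a minimal, non-authoritative illustration under stated
   assumptions: the names RANDALCAT, meets_original_requirement_3, and
   meets_revised_requirement_3 are hypothetical; only requirement 3 is
   checked (not the other requirements in section 6); "RandALCat" is
   taken to mean bidi category "R" or "AL"; and categories come from
   Python's unicodedata module, which reflects the interpreter's own
   Unicode version rather than exactly Unicode 5.1.

```python
import unicodedata

# In RFC 3454, "RandALCat" means a character whose bidirectional
# category is "R" or "AL".
RANDALCAT = frozenset({"R", "AL"})


def _bidi_categories(label: str) -> list[str]:
    # Assumption: unicodedata uses the interpreter's Unicode version,
    # which is not pinned to Unicode 5.1.
    return [unicodedata.bidirectional(ch) for ch in label]


def meets_original_requirement_3(label: str) -> bool:
    """Requirement 3 of RFC 3454 section 6 as published."""
    cats = _bidi_categories(label)
    if not any(c in RANDALCAT for c in cats):
        return True  # the requirement only applies when RandALCat is present
    return cats[0] in RANDALCAT and cats[-1] in RANDALCAT


def meets_revised_requirement_3(label: str) -> bool:
    """Requirement 3 as changed by this draft (trailing NSMs allowed)."""
    cats = _bidi_categories(label)
    if not any(c in RANDALCAT for c in cats):
        return True
    if cats[0] not in RANDALCAT:
        return False
    # Skip over any trailing run of non-spacing marks (category "NSM").
    end = len(cats)
    while end > 0 and cats[end - 1] == "NSM":
        end -= 1
    # The character before that run (or the last character, if there
    # was no run) must be RandALCat.
    return end > 0 and cats[end - 1] in RANDALCAT


if __name__ == "__main__":
    # Example label: THAANA LETTER DHAALU followed by THAANA IBIFILI,
    # a vowel sign whose bidi category is NSM.
    label = "\u078b\u07a8"
    print(meets_original_requirement_3(label))  # False
    print(meets_revised_requirement_3(label))   # True
```

   Labels in Dhivehi (written in Thaana script) commonly end in a vowel
   sign, which is why the published rule rejected them.  Any string that
   passes the original check also passes the revised one, because a
   string whose last character is RandALCat has an empty trailing run
   of NSM characters.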

   Appendix A is changed to read:

|  The following are the only repertoires covered in this document:
|
|  - Unicode 3.2, as defined in [UNICODE32]
|
|  - Unicode 5.1, as defined in [UNICODE51]

   A new appendix, "A.2 Unassigned code points in Unicode 5.1", will be
   added.  Entries will be added to the tables in appendixes B, C, and
   D.

4.  Changes to RFC 3491 (Nameprep)

   All references to IDNA and stringprep are updated to refer to the
   IDNA and stringprep in this document.

   In sections 1 and 2, "Unicode 3.2" is changed to "Unicode 5.1".

   In section 10, the last table entry is changed to "This is the second
   version of Nameprep."

5.  Changes to RFC 3492 (Punycode)

   IDNAv2 does not change RFC 3492.

6.  Suggestions for Registries

   This is a placeholder for a short section that will cover new advice
   for registries that was not included in IDNAv1.  It will include
   ideas about multi-script labels and possibly other advice.

7.  IANA Considerations

   IANA is requested to add the following to the stringprep profile
   registry (www.iana.org/assignments/stringprep-profiles).

   Name of this profile: Nameprep

   RFC in which the profile is defined: This document.

   Indicator whether or not this is the newest version of the profile:
   This is the second version of Nameprep.

8.  Security Considerations

   The security considerations from RFCs 3454, 3490, 3491, and 3492 all
   apply to this document.  The changes between IDNAv1 and IDNAv2 are
   not believed to add any new security considerations.

9.  References

9.1.  Normative References

   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3454]    Hoffman, P. and M. Blanchet, "Preparation of
                Internationalized Strings ("stringprep")", RFC 3454,
                December 2002.

   [RFC3490]    Faltstrom, P., Hoffman, P., and A. Costello,
                "Internationalizing Domain Names in Applications
                (IDNA)", RFC 3490, March 2003.

   [RFC3491]    Hoffman, P. and M.
                Blanchet, "Nameprep: A Stringprep Profile for
                Internationalized Domain Names (IDN)", RFC 3491,
                March 2003.

   [RFC3492]    Costello, A., "Punycode: A Bootstring encoding of
                Unicode for Internationalized Domain Names in
                Applications (IDNA)", RFC 3492, March 2003.

   [UNICODE32]  The Unicode Consortium, "The Unicode Standard, Version
                3.2", The Unicode Standard version 3.2.

   [UNICODE51]  The Unicode Consortium, "The Unicode Standard, Version
                5.1", The Unicode Standard version 5.1.

9.2.  Informative References

   [RFC4690]    Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
                and Recommendations for Internationalized Domain Names
                (IDNs)", RFC 4690, September 2006.

Appendix A.  Work Still to be Done

   Figure out exactly how we want the references to Unicode 3.2 and
   Unicode 5.1 to look in the references section, then figure out how
   to wrestle xml2rfc to produce that.

   Fill in all the tables for the updates to stringprep.

   Decide if this entire document should be about Unicode 5.2, which is
   expected out by mid-2009.

Appendix B.  Changes between versions

   (This section is to be removed by the RFC Editor.)

B.1.  Changes between the -00 and -01 drafts

   In section 1, changed the target for backwards-compatibility to be
   for strings that have only visible characters.

   In section 3, removed the first paragraph.

   In section 3 (about Stringprep section 3.1), added the text about
   removing U+200C and U+200D from the mapped-to-nothing list.

   In section 3 (about Stringprep section 6), replaced:

|  3) If a string contains any RandALCat character, a RandALCat
|  character MUST be the first character of the string, and
|  either a RandALCat character or NSM character MUST be the
|  last character of the string.

   with

|  3) If a string contains any RandALCat character, the first
|  character MUST be a RandALCat character, and the last
|  characters of the string must be either a RandALCat
|  character or a RandALCat character followed by one or
|  more NSM characters.

   Added new placeholder section 6 on advice to registries.

   In Appendix A, added the thought about targeting Unicode 5.2 instead
   of Unicode 5.1.

B.2.  Changes between the -01 and -02 drafts

   Reversed the changes made in -01 with respect to U+200C and U+200D.

   Added a paragraph at the end of section 1 acknowledging that IDNAv2
   will eventually need to be updated as well.

Author's Address

   Paul Hoffman

   Email: phoffman@imc.org