42 changes: 19 additions & 23 deletions peps/pep-3131.rst
@@ -57,8 +57,9 @@ an additional policy is necessary, anyway.
Specification of Language Changes
=================================

-The syntax of identifiers in Python will be based on the Unicode standard annex
-UAX-31 [1]_, with elaboration and changes as defined below.
+The syntax of identifiers in Python will be based on the `Unicode standard annex
+UAX-31 <https://www.unicode.org/reports/tr31/>`__, with elaboration and changes
+as defined below.

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
are the same as in Python 2.5. This specification only introduces additional
@@ -69,9 +70,10 @@ the ``unicodedata`` module.
The identifier syntax is ``<XID_Start> <XID_Continue>*``.
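
As a rough illustration of the ``<XID_Start> <XID_Continue>*`` rule, the sketch below relies on ``str.isidentifier()``, which applies the running interpreter's identifier rule. It is an approximation under stated assumptions: it reflects the Unicode version bundled with that interpreter rather than 4.1, it also accepts keywords, and the sample names are hypothetical.

```python
import unicodedata

def looks_like_identifier(name: str) -> bool:
    """Approximate check: delegate to the interpreter's own identifier rule."""
    return name.isidentifier()

# Hypothetical sample names, chosen only for illustration.
samples = ["spam", "löwis", "x1", "1x", "a-b", ""]
results = {name: looks_like_identifier(name) for name in samples}

# The Unicode database version in use varies by interpreter build.
print("Unicode data version:", unicodedata.unidata_version)
for name, ok in results.items():
    print(repr(name), ok)
```

Names that start with a digit, contain a hyphen, or are empty are rejected, while non-ASCII letters such as ``ö`` are accepted.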

The exact specification of what characters have the XID_Start or
-XID_Continue properties can be found in the DerivedCoreProperties
-file of the Unicode data in use by Python (4.1 at the time this
-PEP was written), see [6]_. For reference, the construction rules
+XID_Continue properties can be found in the `DerivedCoreProperties
+file <https://www.unicode.org/Public/4.1.0/ucd/DerivedCoreProperties.txt>`__
+of the Unicode data in use by Python (4.1 at the time this
+PEP was written). For reference, the construction rules
for these sets are given below. The XID_* properties are derived
from ID_Start/ID_Continue, which are derived themselves.

@@ -94,7 +96,7 @@ comparison of identifiers is based on NFKC.

A non-normative HTML file listing all valid identifier characters for
Unicode 4.1 can be found at
-http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.
+https://web.archive.org/web/20081016132748/http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.
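
The PEP text around this hunk states that comparison of identifiers is based on NFKC. A minimal sketch of what that implies, assuming a Python 3 interpreter and using the hypothetical names ``ligature`` and ``plain``: two spellings that differ only by a compatibility character normalize to the same string and therefore bind the same name.

```python
import unicodedata

def normalized(name: str) -> str:
    """Return the NFKC form, the form the PEP uses for identifier comparison."""
    return unicodedata.normalize("NFKC", name)

# U+FB01 (LATIN SMALL LIGATURE FI) followed by "le", versus plain ASCII "file".
ligature = "\ufb01le"
plain = "file"
same_after_nfkc = normalized(ligature) == normalized(plain)

# Assign through the ligature spelling, then look the name up by its NFKC form.
namespace = {}
exec(ligature + " = 1", namespace)
bound_as_plain = plain in namespace

print(same_after_nfkc, bound_as_plain)
```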

Policy Specification
====================
@@ -136,8 +138,9 @@ The following changes will need to be made to the parser:
Open Issues
===========

-John Nagle suggested consideration of Unicode Technical Standard #39,
-[2]_, which discusses security mechanisms for Unicode identifiers.
+John Nagle suggested consideration of `Unicode Technical Standard #39
+<https://www.unicode.org/reports/tr39/>`__,
+which discusses security mechanisms for Unicode identifiers.
It's not clear how that can precisely apply to this PEP; possible
consequences are

@@ -153,7 +156,8 @@ needs two identifiers to compare them for confusion - is it possible
to somehow apply it to a single identifier only, and warn?

In follow-up discussion, it turns out that John Nagle actually
-meant to suggest UTR#36, level "Highly Restrictive", [3]_.
+meant to suggest `UTR#36 <https://www.unicode.org/reports/tr36/>`__,
+level "Highly Restrictive".

Several people suggested to allow and ignore formatting control
characters (general category Cf), as is done in Java, JavaScript, and
@@ -164,15 +168,17 @@ later.
Some people would like to see an option on selecting support
for this PEP at run-time; opinions vary on what precisely
that option should be, and what precisely its default value
-should be. Guido van Rossum commented in [5]_ that a global
-flag passed to the interpreter is not acceptable, as it would
+should be. `Guido van Rossum commented
+<https://mail.python.org/pipermail/python-3000/2007-May/007925.html>`__
+that a global flag passed to the interpreter is not acceptable, as it would
apply to all modules.

Discussion
==========

-Ka-Ping Yee summarizes discussion and further objection
-in [4]_ as such:
+`Ka-Ping Yee summarizes discussion and further objection
+<https://mail.python.org/pipermail/python-3000/2007-June/008161.html>`__
+as such:

A. Should identifiers be allowed to contain any Unicode letter?

@@ -250,16 +256,6 @@ F. Which normalization form should be used, NFC or NFKC?
G. Should source code be required to be in normalized form?


-References
-==========
-
-.. [1] http://www.unicode.org/reports/tr31/
-.. [2] http://www.unicode.org/reports/tr39/
-.. [3] http://www.unicode.org/reports/tr36/
-.. [4] https://mail.python.org/pipermail/python-3000/2007-June/008161.html
-.. [5] https://mail.python.org/pipermail/python-3000/2007-May/007925.html
-.. [6] http://www.unicode.org/Public/4.1.0/ucd/DerivedCoreProperties.txt
-
Copyright
=========
