Commit 3c6780c

Closes #15956: improve documentation of named groups and how to reference them.
1 parent 60e602d commit 3c6780c

1 file changed

+26
-14
lines changed
Doc/library/re.rst

Lines changed: 26 additions & 14 deletions
@@ -242,21 +242,32 @@ The special characters are:
 
 ``(?P<name>...)``
    Similar to regular parentheses, but the substring matched by the group is
-   accessible within the rest of the regular expression via the symbolic group
-   name *name*. Group names must be valid Python identifiers, and each group
-   name must be defined only once within a regular expression. A symbolic group
-   is also a numbered group, just as if the group were not named. So the group
-   named ``id`` in the example below can also be referenced as the numbered group
-   ``1``.
-
-   For example, if the pattern is ``(?P<id>[a-zA-Z_]\w*)``, the group can be
-   referenced by its name in arguments to methods of match objects, such as
-   ``m.group('id')`` or ``m.end('id')``, and also by name in the regular
-   expression itself (using ``(?P=id)``) and replacement text given to
-   ``.sub()`` (using ``\g<id>``).
+   accessible via the symbolic group name *name*. Group names must be valid
+   Python identifiers, and each group name must be defined only once within a
+   regular expression. A symbolic group is also a numbered group, just as if
+   the group were not named.
+
+   Named groups can be referenced in three contexts. If the pattern is
+   ``(?P<quote>['"]).*?(?P=quote)`` (i.e. matching a string quoted with either
+   single or double quotes):
+
+   +---------------------------------------+----------------------------------+
+   | Context of reference to group "quote" | Ways to reference it             |
+   +=======================================+==================================+
+   | in the same pattern itself            | * ``(?P=quote)`` (as shown)      |
+   |                                       | * ``\1``                         |
+   +---------------------------------------+----------------------------------+
+   | when processing match object ``m``    | * ``m.group('quote')``           |
+   |                                       | * ``m.end('quote')`` (etc.)      |
+   +---------------------------------------+----------------------------------+
+   | in a string passed to the ``repl``    | * ``\g<quote>``                  |
+   | argument of ``re.sub()``              | * ``\g<1>``                      |
+   |                                       | * ``\1``                         |
+   +---------------------------------------+----------------------------------+
 
 ``(?P=name)``
-   Matches whatever text was matched by the earlier group named *name*.
+   A backreference to a named group; it matches whatever text was matched by the
+   earlier group named *name*.
 
 ``(?#...)``
    A comment; the contents of the parentheses are simply ignored.
@@ -667,7 +678,8 @@ form.
    when not adjacent to a previous match, so ``sub('x*', '-', 'abc')`` returns
    ``'-a-b-c-'``.
 
-   In addition to character escapes and backreferences as described above,
+   In string-type *repl* arguments, in addition to the character escapes and
+   backreferences described above,
    ``\g<name>`` will use the substring matched by the group named ``name``, as
    defined by the ``(?P<name>...)`` syntax. ``\g<number>`` uses the corresponding
    group number; ``\g<2>`` is therefore equivalent to ``\2``, but isn't ambiguous
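
The three reference contexts listed in the new table can be tried out with a short script. This is only an illustrative sketch and not part of the commit: the sample string `text` and all variable names are assumptions chosen for the demonstration, and the last line shows one case where `\g<2>` is presumably the safer spelling, namely when a literal digit follows the group reference.

```python
import re

# Pattern from the documentation change: a string quoted with ' or ".
pattern = re.compile(r"""(?P<quote>['"]).*?(?P=quote)""")
# The same pattern with the numbered backreference \1 instead of (?P=quote).
numbered = re.compile(r"""(['"]).*?\1""")

# Hypothetical sample input, chosen only for this illustration.
text = """say "hello" or 'bye'"""

# Context 1: referencing the group inside the pattern itself.
by_name = [m.group(0) for m in pattern.finditer(text)]
by_number = [m.group(0) for m in numbered.finditer(text)]

# Context 2: match object methods accept the group name or its number.
m = pattern.search(text)
quote_char = m.group("quote")   # by name
same_char = m.group(1)          # by number
quote_end = m.end("quote")      # index just past the opening quote

# Context 3: the string passed as repl to sub(); all three spellings
# refer to the same group.
replaced = [
    pattern.sub(repl, text)
    for repl in (r"[\g<quote>]", r"[\g<1>]", r"[\1]")
]

# \g<2> followed by a literal "0": group 2, then the character "0".
# Written as \20 this would be read as a reference to group 20 instead.
group_then_zero = re.sub(r"(a)(b)", r"\g<2>0", "ab")

print(by_name, quote_char, quote_end, replaced, group_then_zero)
```

Because a named group is also a numbered group, the name-based and number-based spellings in each context are interchangeable here; the named forms mainly help readability once a pattern has several groups.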
