Commit 587dc4d

bpo-42353: Add prefixmatch APIs to the re module.
These alleviate common confusion around what "match" means, since Python differs from other popular languages in how it uses the term as an API name. The original "match" names are NOT being deprecated. Source tooling such as linters is expected to suggest prefixmatch instead of match, to improve code health and reduce the cognitive burden of working out the intent when reading code. See the documentation changes in this PR for a fuller description.
1 parent bf95ff9 commit 587dc4d

File tree

7 files changed

+317
-112
lines changed

Doc/library/re.rst

Lines changed: 99 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -610,8 +610,8 @@ form.
610610

611611
Compile a regular expression pattern into a :ref:`regular expression object
612612
<re-objects>`, which can be used for matching using its
613-
:func:`~Pattern.match`, :func:`~Pattern.search` and other methods, described
614-
below.
613+
:func:`~Pattern.prefixmatch`, :func:`~Pattern.search` and other methods,
614+
described below.
615615

616616
The expression's behaviour can be modified by specifying a *flags* value.
617617
Values can be any of the following variables, combined using bitwise OR (the
@@ -620,11 +620,11 @@ form.
620620
The sequence ::
621621

622622
prog = re.compile(pattern)
623-
result = prog.match(string)
623+
result = prog.search(string)
624624

625625
is equivalent to ::
626626

627-
result = re.match(pattern, string)
627+
result = re.search(pattern, string)
628628

629629
but using :func:`re.compile` and saving the resulting regular expression
630630
object for reuse is more efficient when the expression will be used several
@@ -753,19 +753,36 @@ form.
753753
point in the string.
754754

755755

756-
.. function:: match(pattern, string, flags=0)
756+
.. function:: prefixmatch(pattern, string, flags=0)
757757

758758
If zero or more characters at the beginning of *string* match the regular
759759
expression *pattern*, return a corresponding :ref:`match object
760760
<match-objects>`. Return ``None`` if the string does not match the pattern;
761761
note that this is different from a zero-length match.
762762

763-
Note that even in :const:`MULTILINE` mode, :func:`re.match` will only match
764-
at the beginning of the string and not at the beginning of each line.
763+
Note that even in :const:`MULTILINE` mode, :func:`re.prefixmatch` will only
764+
match at the beginning of the string and not at the beginning of each line.
765765

766766
If you want to locate a match anywhere in *string*, use :func:`search`
767767
instead (see also :ref:`search-vs-match`).
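As a minimal sketch of the anchoring rules described above: the sample text is made up for illustration, and ``re.match`` is used because it is available on every Python version (the diff describes ``re.prefixmatch`` as the same operation under a more explicit name).

```python
import re

text = "first line\nsecond line"

# Prefix matching only ever tries position 0, even in MULTILINE mode.
at_start = re.match(r"second", text, re.MULTILINE)

# search() with '^' and MULTILINE may match at the start of any line.
at_line_start = re.search(r"^second", text, re.MULTILINE)

# fullmatch() requires the entire string to match the pattern.
whole = re.fullmatch(r"first line", text)

print(at_start)              # None
print(at_line_start.span())  # (11, 17)
print(whole)                 # None
```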
768768

769+
Use :func:`~re.match` when your code needs to support older Python versions.
770+
771+
.. versionadded:: 3.11
772+
773+
774+
.. function:: match(pattern, string, flags=0)
775+
776+
The same as :func:`prefixmatch` documented above. Prefer using that more
777+
explicit name when writing code intended only for Python versions 3.11
778+
and up.
779+
780+
The new name was created in order to be explicit about its behavior
781+
to reduce confusion vs the industry norm for regular expression APIs.
782+
See :ref:`prefixmatch-vs-match`.
783+
784+
.. versionchanged:: 3.11
785+
769786

770787
.. function:: fullmatch(pattern, string, flags=0)
771788

@@ -1041,7 +1058,7 @@ attributes:
10411058
>>> pattern.search("dog", 1) # No match; search doesn't include the "d"
10421059

10431060

1044-
.. method:: Pattern.match(string[, pos[, endpos]])
1061+
.. method:: Pattern.prefixmatch(string[, pos[, endpos]])
10451062

10461063
If zero or more characters at the *beginning* of *string* match this regular
10471064
expression, return a corresponding :ref:`match object <match-objects>`.
@@ -1059,6 +1076,23 @@ attributes:
10591076
If you want to locate a match anywhere in *string*, use
10601077
:meth:`~Pattern.search` instead (see also :ref:`search-vs-match`).
10611078

1079+
Use :meth:`~Pattern.match` when your code needs to support older Python versions.
1080+
1081+
.. versionadded:: 3.11
1082+
1083+
1084+
.. method:: Pattern.match(string[, pos[, endpos]])
1085+
1086+
The same as :meth:`Pattern.prefixmatch` documented above. Prefer using that
1087+
more explicit name when writing code intended only for Python versions 3.11
1088+
and up.
1089+
1090+
The new name was created in order to be explicit about its behavior
1091+
to reduce confusion vs the industry norm for regular expression APIs.
1092+
See :ref:`prefixmatch-vs-match`.
1093+
1094+
.. versionchanged:: 3.11
1095+
10621096

10631097
.. method:: Pattern.fullmatch(string[, pos[, endpos]])
10641098

@@ -1179,7 +1213,7 @@ Match objects support the following methods and attributes:
11791213
If a group is contained in a part of the pattern that matched multiple times,
11801214
the last match is returned. ::
11811215

1182-
>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
1216+
>>> m = re.search(r"(\w+) (\w+)", "Isaac Newton, physicist")
11831217
>>> m.group(0) # The entire match
11841218
'Isaac Newton'
11851219
>>> m.group(1) # The first parenthesized subgroup.
@@ -1196,7 +1230,7 @@ Match objects support the following methods and attributes:
11961230

11971231
A moderately complicated example::
11981232

1199-
>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
1233+
>>> m = re.search(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
12001234
>>> m.group('first_name')
12011235
'Malcolm'
12021236
>>> m.group('last_name')
@@ -1211,8 +1245,8 @@ Match objects support the following methods and attributes:
12111245

12121246
If a group matches multiple times, only the last match is accessible::
12131247

1214-
>>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times.
1215-
>>> m.group(1) # Returns only the last match.
1248+
>>> m = re.search(r"(..)+", "a1b2c3") # Matches 3 times.
1249+
>>> m.group(1) # Returns only the last match.
12161250
'c3'
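When every repetition is needed rather than only the last one, one option is to collect the pieces with ``re.findall``. The following is a small sketch reusing the sample string from the example above; the variable names are illustrative only.

```python
import re

data = "a1b2c3"

# A repeated group keeps only its final repetition.
m = re.search(r"(..)+", data)
last_only = m.group(1)

# findall() returns every non-overlapping match of the inner pattern.
all_pairs = re.findall(r"..", data)

print(last_only)  # c3
print(all_pairs)  # ['a1', 'b2', 'c3']
```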
12171251

12181252

@@ -1221,7 +1255,7 @@ Match objects support the following methods and attributes:
12211255
This is identical to ``m.group(g)``. This allows easier access to
12221256
an individual group from a match::
12231257

1224-
>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
1258+
>>> m = re.search(r"(\w+) (\w+)", "Isaac Newton, physicist")
12251259
>>> m[0] # The entire match
12261260
'Isaac Newton'
12271261
>>> m[1] # The first parenthesized subgroup.
@@ -1240,15 +1274,15 @@ Match objects support the following methods and attributes:
12401274

12411275
For example::
12421276

1243-
>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
1277+
>>> m = re.search(r"(\d+)\.(\d+)", "24.1632")
12441278
>>> m.groups()
12451279
('24', '1632')
12461280

12471281
If we make the decimal place and everything after it optional, not all groups
12481282
might participate in the match. These groups will default to ``None`` unless
12491283
the *default* argument is given::
12501284

1251-
>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
1285+
>>> m = re.search(r"(\d+)\.?(\d+)?", "24")
12521286
>>> m.groups() # Second group defaults to None.
12531287
('24', None)
12541288
>>> m.groups('0') # Now, the second group defaults to '0'.
@@ -1261,7 +1295,7 @@ Match objects support the following methods and attributes:
12611295
the subgroup name. The *default* argument is used for groups that did not
12621296
participate in the match; it defaults to ``None``. For example::
12631297

1264-
>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
1298+
>>> m = re.search(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
12651299
>>> m.groupdict()
12661300
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}
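The *default* argument works the same way for ``groups()`` and ``groupdict()``. A short sketch follows; the group names ``whole`` and ``frac`` are made up for this illustration.

```python
import re

# The fractional part is optional, so its group may not participate.
m = re.search(r"(?P<whole>\d+)\.?(?P<frac>\d+)?", "24")

plain = m.groups()                  # non-participating group is None
filled = m.groups("0")              # ...or the supplied default
named = m.groupdict(default="0")    # same idea, keyed by group name

print(plain)   # ('24', None)
print(filled)  # ('24', '0')
print(named)   # {'whole': '24', 'frac': '0'}
```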
12671301

@@ -1367,38 +1401,38 @@ representing the card with that value.
13671401
To see if a given string is a valid hand, one could do the following::
13681402

13691403
>>> valid = re.compile(r"^[a2-9tjqk]{5}$")
1370-
>>> displaymatch(valid.match("akt5q")) # Valid.
1404+
>>> displaymatch(valid.search("akt5q")) # Valid.
13711405
"<Match: 'akt5q', groups=()>"
1372-
>>> displaymatch(valid.match("akt5e")) # Invalid.
1373-
>>> displaymatch(valid.match("akt")) # Invalid.
1374-
>>> displaymatch(valid.match("727ak")) # Valid.
1406+
>>> displaymatch(valid.search("akt5e")) # Invalid.
1407+
>>> displaymatch(valid.search("akt")) # Invalid.
1408+
>>> displaymatch(valid.search("727ak")) # Valid.
13751409
"<Match: '727ak', groups=()>"
13761410

13771411
That last hand, ``"727ak"``, contained a pair, or two of the same valued cards.
13781412
To match this with a regular expression, one could use backreferences as such::
13791413

1380-
>>> pair = re.compile(r".*(.).*\1")
1381-
>>> displaymatch(pair.match("717ak")) # Pair of 7s.
1414+
>>> pair = re.compile(r"^.*(.).*\1")
1415+
>>> displaymatch(pair.search("717ak")) # Pair of 7s.
13821416
"<Match: '717', groups=('7',)>"
1383-
>>> displaymatch(pair.match("718ak")) # No pairs.
1384-
>>> displaymatch(pair.match("354aa")) # Pair of aces.
1417+
>>> displaymatch(pair.search("718ak")) # No pairs.
1418+
>>> displaymatch(pair.search("354aa")) # Pair of aces.
13851419
"<Match: '354aa', groups=('a',)>"
13861420

13871421
To find out what card the pair consists of, one could use the
13881422
:meth:`~Match.group` method of the match object in the following manner::
13891423

1390-
>>> pair = re.compile(r".*(.).*\1")
1391-
>>> pair.match("717ak").group(1)
1424+
>>> pair = re.compile(r"^.*(.).*\1")
1425+
>>> pair.search("717ak").group(1)
13921426
'7'
13931427

13941428
# Error because re.match() returns None, which doesn't have a group() method:
1395-
>>> pair.match("718ak").group(1)
1429+
>>> pair.search("718ak").group(1)
13961430
Traceback (most recent call last):
13971431
File "<pyshell#23>", line 1, in <module>
1398-
re.match(r".*(.).*\1", "718ak").group(1)
1432+
re.search(r".*(.).*\1", "718ak").group(1)
13991433
AttributeError: 'NoneType' object has no attribute 'group'
14001434

1401-
>>> pair.match("354aa").group(1)
1435+
>>> pair.search("354aa").group(1)
14021436
'a'
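The ``AttributeError`` shown above can be avoided by checking the result before calling ``group()``. A minimal sketch, where ``pair_card`` is a hypothetical helper name and the pattern is the anchored one from this diff:

```python
import re

PAIR = re.compile(r"^.*(.).*\1")

def pair_card(hand):
    """Return the card that forms a pair, or None if the hand has no pair."""
    m = PAIR.search(hand)
    # search() returns None when nothing matches, so test before using it.
    return m.group(1) if m else None

print(pair_card("717ak"))  # 7
print(pair_card("718ak"))  # None
print(pair_card("354aa"))  # a
```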
14031437

14041438

@@ -1456,32 +1490,54 @@ search() vs. match()
14561490
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
14571491

14581492
Python offers two different primitive operations based on regular expressions:
1459-
:func:`re.match` checks for a match only at the beginning of the string, while
1460-
:func:`re.search` checks for a match anywhere in the string (this is what Perl
1461-
does by default).
1493+
:func:`re.prefixmatch` and its older equivalent named :func:`re.match` check
1494+
for a match only at the beginning of the string, while :func:`re.search` checks
1495+
for a match anywhere in the string (this is what Perl does by default).
14621496

14631497
For example::
14641498

1465-
>>> re.match("c", "abcdef") # No match
1466-
>>> re.search("c", "abcdef") # Match
1499+
>>> re.match("c", "abcdef") # No match
1500+
>>> re.prefixmatch("c", "abcdef") # No match
1501+
>>> re.search("c", "abcdef") # Match
14671502
<re.Match object; span=(2, 3), match='c'>
14681503

14691504
Regular expressions beginning with ``'^'`` can be used with :func:`search` to
14701505
restrict the match at the beginning of the string::
14711506

1472-
>>> re.match("c", "abcdef") # No match
1473-
>>> re.search("^c", "abcdef") # No match
1474-
>>> re.search("^a", "abcdef") # Match
1507+
>>> re.match("c", "abcdef") # No match
1508+
>>> re.prefixmatch("c", "abcdef") # No match
1509+
>>> re.search("^c", "abcdef") # No match
1510+
>>> re.search("^a", "abcdef") # Match
14751511
<re.Match object; span=(0, 1), match='a'>
14761512

14771513
Note however that in :const:`MULTILINE` mode :func:`match` only matches at the
14781514
beginning of the string, whereas using :func:`search` with a regular expression
14791515
beginning with ``'^'`` will match at the beginning of each line. ::
14801516

1481-
>>> re.match('X', 'A\nB\nX', re.MULTILINE) # No match
1482-
>>> re.search('^X', 'A\nB\nX', re.MULTILINE) # Match
1517+
>>> re.match('X', 'A\nB\nX', re.MULTILINE) # No match
1518+
>>> re.prefixmatch('X', 'A\nB\nX', re.MULTILINE) # No match
1519+
>>> re.search('^X', 'A\nB\nX', re.MULTILINE) # Match
14831520
<re.Match object; span=(4, 5), match='X'>
14841521

1522+
.. _prefixmatch-vs-match:
1523+
1524+
prefixmatch() vs. match()
1525+
^^^^^^^^^^^^^^^^^^^^^^^^^
1526+
1527+
Why is the :func:`re.match` name being discouraged in favor of the longer
1528+
:func:`re.prefixmatch` as of Python 3.11?
1529+
1530+
Since regular expressions were introduced in Python, many other languages have
1531+
been created and/or gained regex support libraries. However, the most popular
1532+
of those use the term "match" in their APIs to mean the unanchored
1533+
behavior provided in Python by :func:`re.search`. Thus any use of the plain
1534+
term "match" can be confusing to those reading or writing Python who are not
1535+
familiar with its divergence from the collective software industry norm.
1536+
1537+
Quoting from the Zen of Python (``python3 -m this``): *"Explicit is better than
1538+
implicit"*. Anyone reading the name :func:`re.prefixmatch` is likely to
1539+
understand the semantics intended. When reading :func:`re.match` there remains
1540+
a seed of doubt about the author's actual intended behavior.
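One way the ambiguity can surface in practice is input validation. The sketch below uses a made-up digits pattern and sample strings to show the three behaviours side by side; ``match`` is called here so that the code runs on any Python version.

```python
import re

DIGITS = re.compile(r"\d+")

# Prefix match: succeeds because the string *starts* with digits,
# even though trailing characters follow.
prefix = DIGITS.match("123abc")

# Full match: fails because the whole string is not digits.
whole = DIGITS.fullmatch("123abc")

# Unanchored search: finds digits anywhere in the string.
anywhere = DIGITS.search("abc123")

print(prefix.group(0))  # 123
print(whole)            # None
print(anywhere.span())  # (3, 6)
```

A reader who assumes ``match`` means "the whole string matches" or "matches anywhere" would misjudge the first result, which is the kind of doubt the more explicit name is meant to remove.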
14851541

14861542
Making a Phonebook
14871543
^^^^^^^^^^^^^^^^^^
@@ -1600,19 +1656,19 @@ every backslash (``'\'``) in a regular expression would have to be prefixed with
16001656
another one to escape it. For example, the two following lines of code are
16011657
functionally identical::
16021658

1603-
>>> re.match(r"\W(.)\1\W", " ff ")
1659+
>>> re.search(r"\W(.)\1\W", " ff ")
16041660
<re.Match object; span=(0, 4), match=' ff '>
1605-
>>> re.match("\\W(.)\\1\\W", " ff ")
1661+
>>> re.search("\\W(.)\\1\\W", " ff ")
16061662
<re.Match object; span=(0, 4), match=' ff '>
16071663

16081664
When one wants to match a literal backslash, it must be escaped in the regular
16091665
expression. With raw string notation, this means ``r"\\"``. Without raw string
16101666
notation, one must use ``"\\\\"``, making the following lines of code
16111667
functionally identical::
16121668

1613-
>>> re.match(r"\\", r"\\")
1669+
>>> re.search(r"\\", r"\\")
16141670
<re.Match object; span=(0, 1), match='\\'>
1615-
>>> re.match("\\\\", r"\\")
1671+
>>> re.search("\\\\", r"\\")
16161672
<re.Match object; span=(0, 1), match='\\'>
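The equivalence between the raw and escaped spellings can be checked directly by comparing the string objects. A small sketch, with illustrative variable names:

```python
import re

raw = r"\W(.)\1\W"
escaped = "\\W(.)\\1\\W"

# Both literals produce the same pattern string.
same_pattern = raw == escaped

# A string holding exactly one backslash character.
single_backslash = "\\"

# The regex r"\\" matches one literal backslash.
m = re.search(r"\\", single_backslash)

print(same_pattern)           # True
print(len(single_backslash))  # 1
print(m.span())               # (0, 1)
```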
16171673

16181674

Doc/whatsnew/3.11.rst

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -264,6 +264,17 @@ os
264264
(Contributed by Dong-hee Na in :issue:`44611`.)
265265

266266

267+
re
268+
--
269+
270+
* :func:`re.prefixmatch` and a corresponding :meth:`re.Pattern.prefixmatch`
271+
have been added as alternate names for the existing :func:`re.match` and
272+
:meth:`re.Pattern.match` APIs. These are intended to be used to
273+
alleviate confusion around what "match" means by following *"Explicit is
274+
better than implicit"*. Other popular language regular expression libraries
275+
use an API named ``match`` to mean what Python has always called ``search``.
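For code that must run both on interpreters that provide the new name and on those that do not, one possible approach is to look the function up at import time. This is only a sketch: the module-level alias ``prefixmatch`` is a hypothetical local name, and whether ``re.prefixmatch`` exists on a given interpreter is an assumption that the ``getattr`` fallback guards against.

```python
import re

# Use re.prefixmatch when the running interpreter provides it; otherwise
# fall back to re.match, which this change documents as the same operation.
prefixmatch = getattr(re, "prefixmatch", re.match)

hit = prefixmatch(r"\w+", "hello world")
miss = prefixmatch("c", "abcdef")

print(hit.group(0))  # hello
print(miss)          # None
```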
276+
277+
267278
socket
268279
------
269280
