
Commit 7ddea43

committed
up
1 parent 1e2b09b commit 7ddea43

File tree

22 files changed

+375
-336
lines changed


10-regular-expressions-javascript/01-regexp-introduction/article.md

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -102,8 +102,7 @@ There are only 5 of them in JavaScript:
102102
: Enables full unicode support. The flag enables correct processing of surrogate pairs. More about that in the chapter <info:regexp-unicode>.
103103
104104
`y`
105-
: Sticky mode (covered in [todo])
106-
105+
: Sticky mode (covered in the [next chapter](info:regexp-methods#y-flag))
107106
108107
109108
## The "i" flag

10-regular-expressions-javascript/02-regexp-methods/article.md

Lines changed: 42 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -345,6 +345,42 @@ alert( regexp.exec(str).index ); // 34, the search starts from the 30th position
345345
```
346346
````
347347
348+
## The "y" flag [#y-flag]
349+
350+
The `y` flag means that the search should find a match exactly at the position specified by the property `regexp.lastIndex` and only there.
351+
352+
In other words, normally the search is made in the whole string: `pattern:/javascript/` looks for "javascript" everywhere in the string.
353+
354+
But when a regexp has the `y` flag, then it only looks for the match at the position specified in `regexp.lastIndex` (`0` by default).
355+
356+
For instance:
357+
358+
```js run
359+
let str = "I love JavaScript!";
360+
361+
let reg = /javascript/iy;
362+
363+
alert( reg.lastIndex ); // 0 (default)
364+
alert( str.match(reg) ); // null, not found at position 0
365+
366+
reg.lastIndex = 7;
367+
alert( str.match(reg) ); // JavaScript (right, that word starts at position 7)
368+
369+
// for any other reg.lastIndex the result is null
370+
```
371+
372+
The regexp `pattern:/javascript/iy` matches only if we set `reg.lastIndex=7`, because with the `y` flag the engine tries to match at a single place in the string: the `reg.lastIndex` position.
373+
374+
So, what's the point? Where do we apply that?
375+
376+
The reason is performance.
377+
378+
The `y` flag works great for parsers -- programs that need to "read" the text and build an in-memory syntax structure or perform actions based on it. For that we move along the text and apply regular expressions to see what comes next: a string? A number? Something else?
379+
380+
The `y` flag allows us to apply a regular expression (or many of them, one by one) exactly at the given position. Once we understand what's there, we can move on, examining the text step by step.
381+
382+
Without the flag, the regexp engine always searches till the end of the text, which takes time, especially if the text is large. So our parser would be very slow. The `y` flag is exactly the right thing here.
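The parser use case can be sketched as a tiny tokenizer. This is only an illustration under assumptions: the `rules` list, the token types and the `tokenize` function are hypothetical names, not part of the article. Each rule is tried exactly at the current position by setting `lastIndex` before the call.

```js
// Hypothetical token rules for illustration; every regexp has the "y" flag.
const rules = [
  { type: "number", reg: /\d+/y },
  { type: "word", reg: /[a-z]+/iy },
  { type: "space", reg: /\s+/y },
  { type: "operator", reg: /[-+*\/=]/y },
];

function tokenize(text) {
  const tokens = [];
  let pos = 0;

  while (pos < text.length) {
    let matched = false;

    for (const { type, reg } of rules) {
      reg.lastIndex = pos;            // try this rule exactly at pos
      const result = reg.exec(text);  // null if there's no match at pos

      if (result) {
        if (type !== "space") tokens.push({ type, value: result[0] });
        pos = reg.lastIndex;          // exec moved lastIndex past the match
        matched = true;
        break;
      }
    }

    if (!matched) {
      throw new SyntaxError(`Unexpected character at position ${pos}`);
    }
  }

  return tokens;
}

const tokens = tokenize("total = 12 + x");
console.log( tokens.map(t => t.value).join(" ") ); // total = 12 + x
```

No rule ever scans ahead of `pos`, so a failed attempt costs only a check at one position.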
383+
348384
## Summary, recipes
349385
350386
Methods become much easier to understand if we separate them by their use in real-life tasks.
@@ -365,4 +401,9 @@ To search and replace:
365401
To split the string:
366402
: - `str.split(str|reg)`
367403
368-
Now we know the methods and can use regular expressions. But we need to learn their syntax and capabilities, so let's move on.
404+
We also covered two flags:
405+
406+
- The `g` flag to find all matches (global search),
407+
- The `y` flag to search at exactly the given position inside the text.
408+
409+
Now we know the methods and can use regular expressions. But we need to learn their syntax, so let's move on.

10-regular-expressions-javascript/09-regexp-groups/article.md

Lines changed: 30 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# Bracket groups
1+
# Capturing groups
22

33
A part of the pattern can be enclosed in parentheses `pattern:(...)`. That's called a "capturing group".
44

@@ -21,6 +21,35 @@ Without parentheses, the pattern `pattern:/go+/` means `subject:g`, followed by
2121

2222
Parentheses group the word `pattern:(go)` together.
2323

24+
Let's make something more complex -- a regexp to match an email.
25+
26+
Examples of emails:
27+
28+
```
29+
my@mail.com
30+
john.smith@site.com.uk
31+
```
32+
33+
The pattern: `pattern:[-.\w]+@([\w-]+\.)+[\w-]{2,20}`.
34+
35+
- The first part before `@` may include word characters, a dot and a dash `pattern:[-.\w]+`, like `match:john.smith`.
36+
- Then `pattern:@`
37+
- And then the domain. It may be a second-level domain `site.com` or one with subdomains like `host.site.com.uk`. We can match it as "a word followed by a dot" repeated one or more times for subdomains: `match:mail.` or `match:site.com.`, and then "a word" for the last part: `match:.com` or `match:.uk`.
38+
39+
The word followed by a dot is `pattern:(\w+\.)+` (repeated). The last word should not have a dot at the end, so it's just `\w{2,20}`. The quantifier `pattern:{2,20}` limits the length, because domain zones are like `.uk` or `.com` or `.museum`, and here we assume they are no longer than 20 characters.
40+
41+
So the domain pattern is `pattern:(\w+\.)+\w{2,20}`. Now we replace `\w` with `[\w-]`, because dashes are also allowed in domains, and we get the final result.
42+
43+
That regexp is not perfect, but it usually works. It's short and good enough to catch errors or occasional typos.
44+
45+
For instance, here we can find all emails in the string:
46+
47+
```js run
48+
let reg = /[-.\w]+@([\w-]+\.)+[\w-]{2,20}/g;
49+
50+
alert("my@mail.com @ his@site.com.uk".match(reg)); // my@mail.com,his@site.com.uk
51+
```
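As an illustration of how the parentheses behave in this pattern: without the `g` flag, `match` returns the full match together with the contents of the group. Because the group is repeated, only its last repetition is kept. The names `emailReg` and `result` are made up for this sketch.

```js
// The same pattern as above, but without the "g" flag,
// so match() returns the full match plus the capturing group.
let emailReg = /[-.\w]+@([\w-]+\.)+[\w-]{2,20}/;

let result = "his@site.com.uk".match(emailReg);

console.log( result[0] ); // his@site.com.uk (the full match)
console.log( result[1] ); // com. (the last repetition of the group)
```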
52+
2453

2554
## Contents of parentheses
2655

Lines changed: 14 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,33 +1,33 @@
1-
Сначала неправильный способ.
21

3-
Если перечислить языки один за другим через `|`, то получится совсем не то:
2+
The first idea can be to list the languages with `|` in-between.
3+
4+
But that doesn't work right:
45

56
```js run
6-
var reg = /Java|JavaScript|PHP|C|C\+\+/g;
7+
let reg = /Java|JavaScript|PHP|C|C\+\+/g;
78

8-
var str = "Java, JavaScript, PHP, C, C++";
9+
let str = "Java, JavaScript, PHP, C, C++";
910

1011
alert( str.match(reg) ); // Java,Java,PHP,C,C
1112
```
1213

13-
Как видно, движок регулярных выражений ищет альтернации в порядке их перечисления. То есть, он сначала смотрит, есть ли `match:Java`, а если нет -- ищет `match:JavaScript`.
14+
The regular expression engine checks the alternatives one by one. That is: first it checks if we have `match:Java`, and only if that fails does it look for `match:JavaScript`, and so on.
1415

15-
Естественно, при этом `match:JavaScript` не будет найдено никогда.
16+
As a result, `match:JavaScript` can never be found, just because `match:Java` is checked first.
1617

17-
То же самое -- с языками `match:C` и `match:C++`.
18+
The same with `match:C` and `match:C++`.
1819

19-
Есть два решения проблемы:
20+
There are two solutions for that problem:
2021

21-
1. Поменять порядок, чтобы более длинное совпадение проверялось первым: `pattern:JavaScript|Java|C\+\+|C|PHP`.
22-
2. Соединить длинный вариант с коротким: `pattern:Java(Script)?|C(\+\+)?|PHP`.
22+
1. Change the order to check the longer match first: `pattern:JavaScript|Java|C\+\+|C|PHP`.
23+
2. Merge variants with the same start: `pattern:Java(Script)?|C(\+\+)?|PHP`.
2324

24-
В действии:
25+
In action:
2526

2627
```js run
27-
var reg = /Java(Script)?|C(\+\+)?|PHP/g;
28+
let reg = /Java(Script)?|C(\+\+)?|PHP/g;
2829

29-
var str = "Java, JavaScript, PHP, C, C++";
30+
let str = "Java, JavaScript, PHP, C, C++";
3031

3132
alert( str.match(reg) ); // Java,JavaScript,PHP,C,C++
3233
```
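The first solution (reordering, so that the longer variant is checked first) can be tried the same way. A small sketch, with `orderedReg` and `langs` as names chosen for this example only:

```js
// Longer alternatives come before their shorter prefixes.
let orderedReg = /JavaScript|Java|C\+\+|C|PHP/g;

let langs = "Java, JavaScript, PHP, C, C++".match(orderedReg);

console.log( langs.join(",") ); // Java,JavaScript,PHP,C,C++
```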
33-
Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,11 @@
1-
# Найдите языки программирования
1+
# Find programming languages
22

3-
Существует много языков программирования, например Java, JavaScript, PHP, C, C++.
3+
There are many programming languages, for instance Java, JavaScript, PHP, C, C++.
44

5-
Напишите регулярное выражение, которое найдёт их все в строке "Java JavaScript PHP C++ C"
5+
Create a regexp that finds them in the string `subject:Java JavaScript PHP C++ C`:
66

7+
```js
8+
let reg = /your regexp/g;
9+
10+
alert("Java JavaScript PHP C++ C".match(reg)); // Java,JavaScript,PHP,C++,C
11+
```
Lines changed: 10 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,17 +1,17 @@
1-
Решение задачи: `pattern:/"(\\.|[^"\\])*"/g`.
1+
The solution: `pattern:/"(\\.|[^"\\])*"/g`.
22

3-
То есть:
3+
Step by step:
44

5-
- Сначала ищем кавычку `pattern:"`
6-
- Затем, если далее слэш `pattern:\\` (удвоение слэша -- техническое, для вставки в регэксп, на самом деле там один слэш), то после него также подойдёт любой символ (точка).
7-
- Если не слэш, то берём любой символ, кроме кавычек (которые будут означать конец строки) и слэша (чтобы предотвратить одинокие слэши, сам по себе единственный слэш не нужен, он должен экранировать какой-то символ) `pattern:[^"\\]`
8-
- ...И так жадно, до закрывающей кавычки.
5+
- First we look for an opening quote `pattern:"`
6+
- Then if we have a backslash `pattern:\\` (we technically have to double it in the pattern, because it is a special character, so that's a single backslash in fact), then any character is fine after it (a dot).
7+
- Otherwise we take any character except a quote (that would mean the end of the string) and a backslash (to prevent lonely backslashes, the backslash is only used with some other symbol after it): `pattern:[^"\\]`
8+
- ...And so on till the closing quote.
99

10-
В действии:
10+
In action:
1111

1212
```js run
13-
var re = /"(\\.|[^"\\])*"/g;
14-
var str = '.. "test me" .. "Скажи \\"Привет\\"!" .. "\\r\\n\\\\" ..';
13+
let reg = /"(\\.|[^"\\])*"/g;
14+
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';
1515

16-
alert( str.match(re) ); // "test me","Скажи \"Привет\"!","\r\n\\"
16+
alert( str.match(reg) ); // "test me","Say \"Hello\"!","\\ \""
1717
```
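To avoid doubling the backslashes in the test string, one option is `String.raw`, which keeps backslashes exactly as typed. A hedged sketch of the same check; the variable names are assumptions of this example:

```js
let quotedReg = /"(\\.|[^"\\])*"/g;

// String.raw keeps every backslash as typed, so this is the in-memory string itself.
let text = String.raw` .. "test me" .. "Say \"Hello\"!" .. "\\ \"" .. `;

let found = text.match(quotedReg);

for (let quoted of found) {
  console.log(quoted); // prints each quoted string on its own line
}
```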

10-regular-expressions-javascript/11-regexp-alternation/03-match-quoted-string/task.md

Lines changed: 21 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,25 +1,32 @@
1-
# Найдите строки в кавычках
1+
# Find quoted strings
22

3-
Найдите в тексте при помощи регэкспа строки в двойных кавычках `subject:"..."`.
3+
Create a regexp to find strings in double quotes `subject:"..."`.
44

5-
В строке поддерживается экранирование при помощи слеша -- примерно в таком же виде, как в обычных строках JavaScript. То есть, строка может содержать любые символы, экранированные слэшем, в частности: `subject:\"`, `subject:\n`, и даже сам слэш в экранированном виде: `subject:\\`.
5+
The important part is that strings should support escaping, in the same way as JavaScript strings do. For instance, quotes can be inserted as `subject:\"`, a newline as `subject:\n`, and the backslash itself as `subject:\\`.
66

7-
Здесь особо важно, что двойная кавычка после слэша не оканчивает строку, а считается её частью. В этом и состоит основная сложность задачи, которая без этого условия была бы элементарной.
7+
```js
8+
let str = "Just like \"here\".";
9+
```
10+
11+
For us it's important that an escaped quote `subject:\"` does not end a string.
812

9-
Пример совпадающих строк:
13+
So we should look from one quote to the other ignoring escaped quotes on the way.
14+
15+
That's the essential part of the task, otherwise it would be trivial.
16+
17+
Examples of strings to match:
1018
```js
11-
.. *!*"test me"*/!* .. (обычная строка)
12-
.. *!*"Скажи \"Привет\"!"*/!* ... (строка с кавычками внутри)
13-
.. *!*"\r\n\\"*/!* .. (строка со спец. символами и слэшем внутри)
19+
.. *!*"test me"*/!* ..
20+
.. *!*"Say \"Hello\"!"*/!* ... (escaped quotes inside)
21+
.. *!*"\\"*/!* .. (double backslash inside)
22+
.. *!*"\\ \""*/!* .. (double backslash and an escaped quote inside)
1423
```
1524

16-
Заметим, что в JavaScript такие строки удобнее всего задавать в одинарных кавычках, и слеши придётся удвоить (в одинарных кавычках они являются экранирующими символами):
25+
In JavaScript we need to double the backslashes in a string literal to get them into the string, like this:
1726

18-
Пример задания тестовой строки в JavaScript:
1927
```js run
20-
var str = ' .. "test me" .. "Скажи \\"Привет\\"!" .. "\\r\\n\\\\" .. ';
28+
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';
2129

22-
// эта строка будет такой:
23-
alert(str); // .. "test me" .. "Скажи \"Привет\"!" .. "\r\n\\" ..
30+
// the in-memory string
31+
alert(str); // .. "test me" .. "Say \"Hello\"!" .. "\\ \"" ..
2432
```
25-
Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,17 +1,16 @@
11

2-
Начало шаблона очевидно: `pattern:<style`.
2+
The pattern start is obvious: `pattern:<style`.
33

4-
А вот дальше... Мы не можем написать просто `pattern:<style.*?>`, так как `match:<styler>` удовлетворяет этому регэкспу.
4+
...But then we can't simply write `pattern:<style.*?>`, because `match:<styler>` would match it.
55

6-
Нужно уточнить его. После `match:<style` должен быть либо пробел, после которого может быть что-то ещё, либо закрытие тега.
6+
After `match:<style` we need either a space, optionally followed by something else, or the closing `match:>`.
77

8-
На языке регэкспов: `pattern:<style(>|\s.*?>)`.
8+
In the regexp language: `pattern:<style(>|\s.*?>)`.
99

10-
В действии:
10+
In action:
1111

1212
```js run
13-
var re = /<style(>|\s.*?>)/g;
13+
let reg = /<style(>|\s.*?>)/g;
1414

15-
alert( "<style> <styler> <style test>".match(re) ); // <style>, <style test>
15+
alert( '<style> <styler> <style test="...">'.match(reg) ); // <style>, <style test="...">
1616
```
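One caveat: `.` does not match a newline by default, so the pattern above misses a tag whose attributes span several lines. A possible variation, offered as a sketch rather than as the task's intended answer, uses `[^>]*` instead of `.*?`. The names `tagReg`, `html` and `tags` are made up for this example.

```js
// Variation: "[^>]*" also matches newlines, unlike ".".
let tagReg = /<style(\s[^>]*)?>/g;

let html = '<style> <styler> <style type="text/css"\n  id="main">';

let tags = html.match(tagReg);

console.log( tags.length ); // 2

// For comparison, the pattern from the solution finds only the first tag here:
let originalTags = html.match(/<style(>|\s.*?>)/g);

console.log( originalTags.length ); // 1
```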
17-
Lines changed: 6 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,13 @@
1-
# Найдите тег style
1+
# Find the full tag
22

3-
Напишите регулярное выражение, которое будет искать в тексте тег `<style>`. Подходят как обычный тег `<style>`, так и вариант с атрибутами `<style type="...">`.
3+
Write a regexp to find the tag `<style...>`. It should match the full tag: it may have no attributes `<style>` or have several of them `<style type="..." id="...">`.
44

5-
Но регулярное выражение не должно находить `<styler>`!
5+
...But the regexp should not match `<styler>`!
66

7-
Использование:
7+
For instance:
88

99
```js
10-
var re = ваш регэксп
10+
let reg = /your regexp/g;
1111

12-
alert( "<style> <styler> <style test>".match(re) ); // <style>, <style test>
12+
alert( '<style> <styler> <style test="...">'.match(reg) ); // <style>, <style test="...">
1313
```
14-

10-regular-expressions-javascript/12-regexp-ahchors/1-start-end/solution.md

Lines changed: 0 additions & 6 deletions
This file was deleted.
