<p>Original: 2026-01-01. Modified: 2026-02-21.</p>
<p>In this lesson, we will look at additional features and different grouping types.</p>
<h3>Backreferences</h3>
<p>We have a group of symbols from which we choose either ta or tu:</p>
<p>/(ta|tu)/</p>
<p>ta-tu ta-ta tu-tu</p>
<p>Suppose we want to find only those substrings in which the left and right parts are the same: ta-ta and tu-tu.</p>
<p>Let us try adding a second group with the same alternation. This does not give us what we want, because the two groups are independent and ta-tu matches as well:</p>
<p>/(ta|tu)-(ta|tu)/</p>
<p>ta-tu ta-ta tu-tu</p>
<p>This is where <strong>backreferencing</strong> helps. The special notation \1 stands for the exact text that the first group captured, so it matches only a repetition of that text.</p>
<p>Thus, we will find substrings with the same left and right parts:</p>
<p>/(ta|tu)-\1/</p>
<p>ta-tu ta-ta tu-tu</p>
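<p>The difference between two independent groups and a backreference can be checked with a short script. This is a minimal sketch that assumes Python's built-in re module; the variable names are illustrative and not part of the lesson.</p>

```python
import re

text = "ta-tu ta-ta tu-tu"

# Two independent groups: each side may be ta or tu, so every pair matches.
independent = [m.group(0) for m in re.finditer(r"(ta|tu)-(ta|tu)", text)]

# \1 must repeat exactly what group 1 captured, so only equal pairs match.
mirrored = [m.group(0) for m in re.finditer(r"(ta|tu)-\1", text)]

print(independent)  # ['ta-tu', 'ta-ta', 'tu-tu']
print(mirrored)     # ['ta-ta', 'tu-tu']
```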
<p>By default, every group is a capturing group: the text it matches is stored in memory, and the groups are numbered so that we can refer to them as \1 through \9.</p>
<p>Adding a quantifier to the group does not change the result here. The backreference does not repeat the quantifier; it matches a single stored value. In most engines, including JavaScript and Python, that value is the last repetition the group matched:</p>
<p>/(ta|tu)+-\1/</p>
<p>ta-tu ta-ta tu-tu</p>
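<p>To see which value a quantified group keeps, we can try strings in which the group repeats. The sketch below assumes Python's re module, and the test strings tatu-tu and tatu-ta are made up for illustration. In this engine, the backreference sees the last repetition.</p>

```python
import re

pattern = re.compile(r"(ta|tu)+-\1")

# The group matches "ta" and then "tu"; only the last repetition is stored.
m = pattern.search("tatu-tu")
whole, stored = m.group(0), m.group(1)
print(whole, stored)  # tatu-tu tu

# The stored value is "tu", so a right-hand side of "ta" cannot match.
no_match = pattern.search("tatu-ta")
print(no_match)  # None
```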
<h3>Named groups</h3>
<p>When a pattern contains several groups, it is inconvenient to keep track of them by number. It is much easier to use names. To name a group, add ?&lt;name&gt; right after the opening parenthesis, and refer back to it with \k&lt;name&gt;:</p>
<p>/(?&lt;group1&gt;ta|tu)-\k&lt;group1&gt;/</p>
<p>ta-tu ta-ta tu-tu</p>
<p>Now you can refer to the group by the name group1, both inside the pattern and when you process the match in your code.</p>
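<p>Here is a hedged sketch of using a named group from code. It assumes Python's re module, which spells the same idea differently from the pattern above: the group is written (?P&lt;name&gt;...) and the backreference is (?P=name).</p>

```python
import re

# Python spelling of a named group and a named backreference.
pattern = re.compile(r"(?P<group1>ta|tu)-(?P=group1)")
text = "ta-tu ta-ta tu-tu"

matches = [m.group(0) for m in pattern.finditer(text)]
# The captured text is available by name instead of by number.
captured = [m.group("group1") for m in pattern.finditer(text)]

print(matches)   # ['ta-ta', 'tu-tu']
print(captured)  # ['ta', 'tu']
```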
<h3>Disabling backreferencing</h3>
<p>We can turn off capturing for a group by putting ?: right after its opening parenthesis:</p>
<p>/(?:ta|tu)-\1/</p>
<p>ta-tu ta-ta tu-tu</p>
<p>Such a group is not saved to memory. Referring to it with \1 can cause an error, or the pattern may simply fail to match, depending on the engine, because group 1 does not exist.</p>
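<p>The effect of ?: on group numbering can be sketched as follows. The example assumes Python's re module, which rejects a reference to a missing group when the pattern is compiled. Other engines may behave differently.</p>

```python
import re

# Only the second group captures, so it becomes group 1.
pattern = re.compile(r"(?:ta|tu)-(ta|tu)")
group_count = pattern.groups
right_part = pattern.search("ta-tu").group(1)
print(group_count, right_part)  # 1 tu

# With no capturing group at all, \1 refers to nothing.
try:
    re.compile(r"(?:ta|tu)-\1")
    compiled = True
except re.error:
    compiled = False
print(compiled)  # False
```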
<p>Non-capturing groups make the regular expression a little harder to read, but it can work faster because the engine does not have to store the matched text. This approach is useful if:</p>
<ul><li>You have a lot of groups and do not need their contents</li>
<li>You want to save memory and keep unneeded groups from shifting the numbers of the groups you do use</li>
</ul><h3>Atomic grouping</h3>
<p>Another interesting kind of grouping without backreferencing is <strong>atomic grouping</strong>.</p>
<p>JavaScript and some other popular programming languages do not support atomic grouping, and Python's re module supports it only since version 3.11. Where it is missing, you can emulate it with existing constructs, for example a lookahead combined with a backreference.</p>
<p>For atomic grouping, we use > instead of : after the question mark:</p>
<p>/a(?>bc|b|x)cc/</p>
<p>abccaxcc</p>
<p>If we remove ?>, the regex finds two substrings, abcc and axcc:</p>
<p>/a(bc|b|x)cc/</p>
<p>abccaxcc</p>
<p>With the atomic group (?>...), the following happens: the engine matches a, then bc, and then looks for cc, but only one c is left before the next a. A regular group would backtrack at this point: the engine would return to the alternation, try the next alternative b, and then match cc successfully.</p>
<p>With atomic grouping, backtracking into the group is disabled. Once bc has matched, the alternatives b and x are discarded for this attempt, so the attempt that starts at the first a fails. The engine moves on along the string and later matches a, then x, then cc, which gives axcc.</p>
<p>In other words, once one alternative of the atomic group (?>bc|b|x) has matched, the remaining alternatives are no longer considered. If the rest of the pattern then fails, the search starts again from the beginning of the regular expression at the next character of the string.</p>
<p>With atomic grouping, the pattern would match at the beginning of the string only if we added another c there:</p>
<p>/a(?>bc|b|x)cc/</p>
<p>abcccaxcc</p>
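<p>The behaviour described above can be reproduced in engines without atomic groups. The sketch below assumes Python's re module and uses a common emulation rather than the (?>...) syntax itself: a lookahead captures one alternative, and \1 then consumes it. The engine does not backtrack into a lookahead that has already succeeded, so the other alternatives are never retried. The helper all_matches is a hypothetical name used only here.</p>

```python
import re

# Ordinary capturing group: the engine may backtrack and try b after bc.
regular = re.compile(r"a(bc|b|x)cc")

# Emulated atomic group: the lookahead picks one alternative and \1 consumes
# it; the lookahead is not re-entered, so b and x are not retried.
atomic_like = re.compile(r"a(?=(bc|b|x))\1cc")


def all_matches(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


print(all_matches(regular, "abccaxcc"))       # ['abcc', 'axcc']
print(all_matches(atomic_like, "abccaxcc"))   # ['axcc']
print(all_matches(atomic_like, "abcccaxcc"))  # ['abccc', 'axcc']
```

<p>On Python 3.11 or later, the pattern a(?>bc|b|x)cc can be passed to re.compile directly instead of the emulation.</p>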