<p>In this lesson, we will look at additional features and different grouping types.</p>
<h3>Backreferences</h3>
<p>We have a group that matches either ta or tu:</p>
<p>/(ta|tu)/</p>
<p>ta-tu ta-ta tu-tu</p>
<p>Suppose we want to find only those substrings in which the left and right parts match: ta-ta and tu-tu.</p>
<p>Let us try adding a second group with the same "or" condition. This does not give us what we wanted, because the expression also matches ta-tu:</p>
<p>/(ta|tu)-(ta|tu)/</p>
<p>ta-tu ta-ta tu-tu</p>
<p>This is where <strong>backreferencing</strong> helps. The special notation \1 stands for the exact text that the first group matched, so the same characters must appear again at that position.</p>
<p>Thus, we find only the substrings with the same left and right parts, ta-ta and tu-tu:</p>
<p>/(ta|tu)-\1/</p>
<p>ta-tu ta-ta tu-tu</p>
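<p>To make the difference concrete, here is a minimal sketch in Python's re module. The variable names are illustrative and not part of the lesson; the patterns and the test string are the ones used above.</p>

```python
import re

text = "ta-tu ta-ta tu-tu"

# Two independent groups: each side may be ta or tu, so every pair matches.
independent = [m.group(0) for m in re.finditer(r"(ta|tu)-(ta|tu)", text)]

# Backreference: \1 must repeat exactly what group 1 matched.
repeated = [m.group(0) for m in re.finditer(r"(ta|tu)-\1", text)]

print(independent)  # ['ta-tu', 'ta-ta', 'tu-tu']
print(repeated)     # ['ta-ta', 'tu-tu']
```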
<p>By default, every group is a capturing group: the text it matches is stored in memory, and the groups are numbered from left to right so that they can be referenced as \1 to \9.</p>
<p>A quantifier on the group is not part of the backreference. If the group repeats, only the text of its last repetition is stored, and \1 refers to that text:</p>
<p>/(ta|tu)+-\1/</p>
<p>ta-tu ta-ta tu-tu</p>
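<p>The effect of a quantifier on the stored text is easier to see on a string where the group repeats. The sketch below assumes Python's re module; the test strings tatu-tu and tatu-ta are made up for this illustration.</p>

```python
import re

pattern = re.compile(r"(ta|tu)+-\1")

# The group repeats twice (ta, then tu); only the last repetition is kept.
m = pattern.search("tatu-tu")
whole_match = m.group(0)
last_capture = m.group(1)

# The earlier repetition "ta" is not what \1 refers to, so this finds nothing.
no_match = pattern.search("tatu-ta")

print(whole_match, last_capture)  # tatu-tu tu
print(no_match)                   # None
```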
<h3>Named groups</h3>
<p>When an expression has several groups, it is inconvenient to keep track of them by number. It is much easier to use names. To name a group, add ?&lt;name&gt; right after the opening parenthesis, and refer back to it with \k&lt;name&gt;:</p>
<p>/(?&lt;group1&gt;ta|tu)-\k&lt;group1&gt;/</p>
<p>ta-tu ta-ta tu-tu</p>
<p>Now you can also use the name group1 in your code to work with the text that the group matched.</p>
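<p>The exact syntax for named groups depends on the engine. As a rough sketch, Python's re module writes the group as (?P&lt;name&gt;...) and the named backreference as (?P=name), rather than the \k&lt;name&gt; form shown above. The variable names here are illustrative.</p>

```python
import re

# Python spells named groups (?P<name>...) and named backreferences (?P=name).
pattern = re.compile(r"(?P<group1>ta|tu)-(?P=group1)")

# Collect each full match together with the text captured by group1.
matches = [(m.group(0), m.group("group1"))
           for m in pattern.finditer("ta-tu ta-ta tu-tu")]

print(matches)  # [('ta-ta', 'ta'), ('tu-tu', 'tu')]
```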
<h3>Disabling backreferencing</h3>
<p>We can turn off capturing for a group by putting ?: right after its opening parenthesis:</p>
<p>/(?:ta|tu)-\1/</p>
<p>ta-tu ta-ta tu-tu</p>
<p>After that, the group is no longer saved to memory. Referring to it with \1 can cause an error, since no group with that number exists.</p>
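<p>How the missing group is reported varies between engines. The sketch below shows one case, Python's re module, which rejects the pattern at compile time; other engines may behave differently. The variable names are illustrative.</p>

```python
import re

# \1 refers to a group that was never captured, so compilation fails.
try:
    re.compile(r"(?:ta|tu)-\1")
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(error_message)

# Without the backreference, the non-capturing group works normally.
found = re.findall(r"(?:ta|tu)-(?:ta|tu)", "ta-tu ta-ta tu-tu")
print(found)  # ['ta-tu', 'ta-ta', 'tu-tu']
```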
<p>Non-capturing groups make the regular expression somewhat harder to read, but the engine has less to store, so it can work faster. This approach is useful if:</p>
<ul><li>You have a lot of groups and do not need their contents</li>
<li>You want to save memory and keep unneeded groups from shifting the numbers of the groups you do use</li>
</ul><h3>Atomic grouping</h3>
<p>Another interesting kind of grouping without backreferencing is <strong>atomic grouping</strong>.</p>
<p>JavaScript does not support atomic grouping, and Python supports it only from version 3.11. In engines without it, you can emulate it with existing constructs.</p>
<p>For atomic grouping, we write ?&gt; instead of ?: after the opening parenthesis:</p>
<p>/a(?&gt;bc|b|x)cc/</p>
<p>abccaxcc</p>
<p>If we remove ?&gt;, the regex finds two substrings, abcc and axcc:</p>
<p>/a(bc|b|x)cc/</p>
<p>abccaxcc</p>
<p>With the atomic grouping characters ?&gt;, the following happens: the engine first finds a, then bc, and then looks for cc, but only one c is left before the next a. An ordinary group would backtrack at this point: the engine would return to the position after a, try the next alternative, b, and then find cc, so abcc would match.</p>
<p>With atomic grouping, this return into the group is disabled. Once bc has matched, the alternatives b and x are not tried at this position, so the attempt that starts at the first a fails. The search continues further along the string, where the group goes through the alternatives bc -&gt; b -&gt; x, matches x, and then finds cc.</p>
<p>In other words, once the atomic group (?&gt;bc|b|x) finds its first match, the other alternatives from this group are not considered. If the rest of the expression then fails, the engine starts again from the first character of the regular expression at the next character of the string.</p>
<p>With atomic grouping, the first part of the string matches as well only if we add another c to it:</p>
<p>/a(?&gt;bc|b|x)cc/</p>
<p>abcccaxcc</p>
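<p>For engines without (?&gt;...), one common workaround is a lookahead that captures the alternative, followed by a backreference that consumes it. The sketch below uses this emulation in Python's re module so that it also runs on versions before 3.11. It is a substitute technique, not the atomic syntax itself, and the helper name all_matches is made up for this illustration.</p>

```python
import re

# Emulation of a(?>bc|b|x)cc: the lookahead captures one alternative and
# \1 consumes it; the engine does not backtrack into the lookahead.
emulated = re.compile(r"a(?=(bc|b|x))\1cc")
plain = re.compile(r"a(bc|b|x)cc")


def all_matches(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in pattern.finditer(text)]


print(all_matches(plain, "abccaxcc"))      # ['abcc', 'axcc']
print(all_matches(emulated, "abccaxcc"))   # ['axcc']
print(all_matches(emulated, "abcccaxcc"))  # ['abccc', 'axcc']
```

<p>The ordinary group finds abcc by falling back from bc to b, while the emulated atomic group keeps bc and therefore needs the extra c.</p>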