Commit 4e793e8

danmoseley and Copilot committed
Document RegexOptions.AnyNewLine in conceptual docs
Add AnyNewLine mode section to Regular Expression Options article.

Add tips/notes about AnyNewLine in:
- Multiline mode section (as alternative to `\r?$` workaround)
- Anchors doc (`$` and `\Z` sections)
- Character classes doc (Any character: `.` section)
- Quick reference (options table)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 318da46 commit 4e793e8

3 files changed: +53 -1 lines changed

docs/standard/base-types/anchors-in-regular-expressions.md

Lines changed: 4 additions & 0 deletions
@@ -63,6 +63,8 @@ Anchors, or atomic zero-width assertions, specify a position in the string where
6363
The `$` anchor specifies that the preceding pattern must occur at the end of the input string, or before `\n` at the end of the input string.
6464

6565
If you use `$` with the <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType> option, the match can also occur at the end of a line. Note that `$` is satisfied at `\n` but not at `\r\n` (the combination of carriage return and newline characters, or CR/LF). To handle the CR/LF character combination, include `\r?$` in the regular expression pattern. Note that `\r?$` will include any `\r` in the match.
66+
67+
Starting with .NET 11, you can use <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType> to make `$` recognize all common newline sequences instead of only `\n`. Unlike the `\r?$` workaround, `AnyNewLine` treats `\r\n` as an atomic sequence, so `\r` is not included in the match. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).
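As a rough illustration of the pre-.NET 11 workaround described above, the following sketch shows that plain `$` skips a line ending in `\r\n`, that `\r?$` pulls `\r` into the match, and that a capturing group can keep `\r` out of the extracted text. The `EndOfLineDemo` class and its helper names are hypothetical and aren't part of the article's snippets.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EndOfLineDemo
{
    // Hypothetical helper: the full text of every Multiline match.
    public static string[] MatchValues(string input, string pattern) =>
        Regex.Matches(input, pattern, RegexOptions.Multiline)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    // Hypothetical helper: capture group 1 of every Multiline match.
    public static string[] FirstGroups(string input, string pattern) =>
        Regex.Matches(input, pattern, RegexOptions.Multiline)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        string input = "foo\r\nbar";

        // Plain $ is not satisfied before \r\n, so only the last line matches.
        Console.WriteLine(MatchValues(input, @"\w+$").Length);        // 1

        // \r?$ matches both lines, but the first match is "foo\r" (4 chars).
        Console.WriteLine(MatchValues(input, @"\w+\r?$")[0].Length);  // 4

        // A capturing group keeps \r out of the extracted text.
        Console.WriteLine(FirstGroups(input, @"(\w+)\r?$")[0]);       // foo
    }
}
```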
6668

6769
The following example adds the `$` anchor to the regular expression pattern used in the example in the [Start of String or Line](#start-of-string-or-line-) section. When used with the original input string, which includes five lines of text, the <xref:System.Text.RegularExpressions.Regex.Matches%28System.String%2CSystem.String%29?displayProperty=nameWithType> method is unable to find a match, because the end of the first line does not match the `$` pattern. When the original input string is split into a string array, the <xref:System.Text.RegularExpressions.Regex.Matches%28System.String%2CSystem.String%29?displayProperty=nameWithType> method succeeds in matching each of the five lines. When the <xref:System.Text.RegularExpressions.Regex.Matches%28System.String%2CSystem.String%2CSystem.Text.RegularExpressions.RegexOptions%29?displayProperty=nameWithType> method is called with the `options` parameter set to <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType>, no matches are found because the regular expression pattern does not account for the carriage return character `\r`. However, when the regular expression pattern is modified by replacing `$` with `\r?$`, calling the <xref:System.Text.RegularExpressions.Regex.Matches%28System.String%2CSystem.String%2CSystem.Text.RegularExpressions.RegexOptions%29?displayProperty=nameWithType> method with the `options` parameter set to <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType> again finds five matches.
6870

@@ -83,6 +85,8 @@ Anchors, or atomic zero-width assertions, specify a position in the string where
8385
The `\Z` anchor specifies that a match must occur at the end of the input string, or before `\n` at the end of the input string. It is identical to the `$` anchor, except that `\Z` ignores the <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType> option. Therefore, in a multiline string, it can only be satisfied by the end of the last line, or the last line before `\n`.
8486

8587
Note that `\Z` is satisfied at `\n` but is not satisfied at `\r\n` (the CR/LF character combination). To treat CR/LF as if it were `\n`, include `\r?\Z` in the regular expression pattern. Note that this will make the `\r` part of the match.
88+
89+
Starting with .NET 11, you can use <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType> to make `\Z` recognize all common newline sequences instead of only `\n`. Unlike the `\r?\Z` workaround, `AnyNewLine` treats `\r\n` as an atomic sequence, so `\r` is not included in the match. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).
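The default `\Z` behavior and the `\r?\Z` workaround can be checked with a small sketch like the one below. It uses only the long-standing default behavior (no `AnyNewLine`); the `EndOfStringDemo` class and `TryMatch` helper are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EndOfStringDemo
{
    // Hypothetical helper: reports whether the pattern matches and returns the matched text.
    public static bool TryMatch(string input, string pattern, out string value)
    {
        Match m = Regex.Match(input, pattern);
        value = m.Value; // Empty string when the match fails.
        return m.Success;
    }

    public static void Main()
    {
        // \Z is satisfied before a final \n.
        Console.WriteLine(TryMatch("123\n", @"\d+\Z", out _));        // True

        // \Z is not satisfied before a final \r\n.
        Console.WriteLine(TryMatch("123\r\n", @"\d+\Z", out _));      // False

        // \r?\Z succeeds, but the match now ends with \r (4 chars).
        TryMatch("123\r\n", @"\d+\r?\Z", out string withCr);
        Console.WriteLine(withCr.Length);                             // 4
    }
}
```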
8690

8791
The following example uses the `\Z` anchor in a regular expression that is similar to the example in the [Start of String or Line](#start-of-string-or-line-) section, which extracts information about the years during which some professional baseball teams existed. The subexpression `\r?\Z` in the regular expression `^((\w+(\s?)){2,}),\s(\w+\s\w+),(\s\d{4}(-(\d{4}|present))?,?)+\r?\Z` is satisfied at the end of a string, and also at the end of a string that ends with `\n` or `\r\n`. As a result, each element in the array matches the regular expression pattern.
8892

docs/standard/base-types/character-classes-in-regular-expressions.md

Lines changed: 2 additions & 0 deletions
@@ -162,6 +162,8 @@ The period character (.) matches any character except `\n` (the newline characte
162162

163163
- If a regular expression pattern is modified by the <xref:System.Text.RegularExpressions.RegexOptions.Singleline?displayProperty=nameWithType> option, or if the portion of the pattern that contains the `.` character class is modified by the `s` option, `.` matches any character. For more information, see [Regular Expression Options](regular-expression-options.md).
164164

165+
- Starting with .NET 11, if the <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType> option is specified, `.` excludes all common newline characters instead of only `\n`. If both `Singleline` and `AnyNewLine` are specified, `Singleline` takes precedence and `.` matches every character. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).
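On versions where `AnyNewLine` isn't available, one common alternative is to replace `.` with a negated character class that excludes both `\r` and `\n`. The following is a minimal sketch of that idea, not a snippet from this article; the `DotDemo` class name is hypothetical, and the negated class covers only `\r` and `\n`, not the other newline characters that `AnyNewLine` is described as recognizing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotDemo
{
    // Hypothetical helper: the text of the first match with the given options.
    public static string FirstMatch(string input, string pattern, RegexOptions options = RegexOptions.None) =>
        Regex.Match(input, pattern, options).Value;

    public static void Main()
    {
        string input = "foo\r\nbar";

        // By default, . stops at \n but still matches \r: "foo\r" (4 chars).
        Console.WriteLine(FirstMatch(input, ".+").Length);                           // 4

        // A negated class stops at either character: "foo".
        Console.WriteLine(FirstMatch(input, @"[^\r\n]+"));                           // foo

        // With Singleline, . matches every character (all 8 chars).
        Console.WriteLine(FirstMatch(input, ".+", RegexOptions.Singleline).Length);  // 8
    }
}
```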
166+
165167
The following example illustrates the different behavior of the `.` character class by default and with the <xref:System.Text.RegularExpressions.RegexOptions.Singleline?displayProperty=nameWithType> option. The regular expression `^.+` starts at the beginning of the string and matches every character. By default, the match ends at the end of the first line; the regular expression pattern matches the carriage return character, `\r`, but it doesn't match `\n`. Because the <xref:System.Text.RegularExpressions.RegexOptions.Singleline?displayProperty=nameWithType> option interprets the entire input string as a single line, it matches every character in the input string, including `\n`.
166168

167169
:::code language="csharp" source="snippets/character-classes-in-regular-expressions/csharp/Program.cs" id="AnyCharacterMultiline":::

docs/standard/base-types/regular-expression-options.md

Lines changed: 47 additions & 1 deletion
@@ -30,6 +30,7 @@ By default, the comparison of an input string with any literal characters in a r
3030
| <xref:System.Text.RegularExpressions.RegexOptions.ECMAScript> | Not available | Enable ECMAScript-compliant behavior for the expression. | [ECMAScript matching behavior](#ecmascript-matching-behavior) |
3131
| <xref:System.Text.RegularExpressions.RegexOptions.CultureInvariant> | Not available | Ignore cultural differences in language. | [Comparison using the invariant culture](#compare-using-the-invariant-culture) |
3232
| <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking> | Not available | Match using an approach that avoids backtracking and guarantees linear-time processing in the length of the input. (Available in .NET 7 and later versions.)| [Nonbacktracking mode](#nonbacktracking-mode) |
33+
| <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine> | Not available | Make `^`, `$`, `\Z`, and `.` recognize all common newline sequences instead of only `\n`. (Available in .NET 11 and later versions.) | [AnyNewLine mode](#anynewline-mode) |
3334

3435
## Specify options
3536

@@ -75,7 +76,7 @@ The following five regular expression options can be set both with the options p
7576

7677
- <xref:System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace?displayProperty=nameWithType>
7778

78-
The following five regular expression options can be set using the `options` parameter but cannot be set inline:
79+
The following seven regular expression options can be set using the `options` parameter but cannot be set inline:
7980

8081
- <xref:System.Text.RegularExpressions.RegexOptions.None?displayProperty=nameWithType>
8182

@@ -87,6 +88,10 @@ The following five regular expression options can be set using the `options` par
8788

8889
- <xref:System.Text.RegularExpressions.RegexOptions.ECMAScript?displayProperty=nameWithType>
8990

91+
- <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayProperty=nameWithType>
92+
93+
- <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType>
94+
9095
## Determine options
9196

9297
You can determine which options were provided to a <xref:System.Text.RegularExpressions.Regex> object when it was instantiated by retrieving the value of the read-only <xref:System.Text.RegularExpressions.Regex.Options%2A?displayProperty=nameWithType> property.
@@ -150,6 +155,9 @@ By default, `$` will be satisfied only at the end of the input string. If you sp
150155

151156
In neither case does `$` recognize the carriage return/line feed character combination (`\r\n`). `$` always ignores any carriage return (`\r`). To end your match with either `\r\n` or `\n`, use the subexpression `\r?$` instead of just `$`. Note that this will make the `\r` part of the match.
152157

158+
> [!TIP]
159+
> Starting with .NET 11, you can use <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType> to make `^`, `$`, `\Z`, and `.` recognize all common newline sequences instead of only `\n`, removing the need for `\r?` workarounds. `AnyNewLine` also treats `\r\n` as an atomic newline sequence, so `\r` is never included in the match. For more information, see the [AnyNewLine mode](#anynewline-mode) section.
160+
153161
The following example extracts bowlers' names and scores and adds them to a <xref:System.Collections.Generic.SortedList%602> collection that sorts them in descending order. The <xref:System.Text.RegularExpressions.Regex.Matches%2A> method is called twice. In the first method call, the regular expression is `^(\w+)\s(\d+)$` and no options are set. As the output shows, because the regular expression engine cannot match the input pattern along with the beginning and end of the input string, no matches are found. In the second method call, the regular expression is changed to `^(\w+)\s(\d+)\r?$` and the options are set to <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType>. As the output shows, the names and scores are successfully matched, and the scores are displayed in descending order.
154162

155163
[!code-csharp[Conceptual.Regex.Language.Options#3](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.regex.language.options/cs/multiline1.cs#3)]
@@ -405,6 +413,44 @@ The <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayPro
405413

406414
For more information about backtracking, see [Backtracking in regular expressions](backtracking-in-regular-expressions.md).
407415

416+
## AnyNewLine mode
417+
418+
By default, .NET's regular expression engine treats only `\n` as a newline character. The anchors `^` and `$` (in <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType> mode), `\Z`, and the wildcard `.` all use `\n` as the sole line boundary. This means that `$` doesn't match before `\r\n` (Windows-style line endings), and `.` matches `\r`, which leads to common bugs when processing text with mixed or non-Unix line endings.
419+
420+
The <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType> option, which was introduced in .NET 11, makes these constructs recognize all common newline sequences: `\r\n` (CR+LF), `\r` (CR), `\n` (LF), `\u0085` (NEL), `\u2028` (LS), and `\u2029` (PS). This is consistent with [Unicode TR18 RL1.6](https://unicode.org/reports/tr18/#RL1.6).
421+
422+
For example, without `AnyNewLine`, matching lines in a string with Windows line endings requires manual workarounds like `\r?$`:
423+
424+
```csharp
// BUG: .+$ captures trailing \r on Windows line endings
var match = Regex.Match("foo\r\nbar", @".+$", RegexOptions.Multiline);
Console.WriteLine(match.Value); // "foo\r" -- not "foo"!
```
429+
430+
With `AnyNewLine`, the anchors handle all newline types automatically:
431+
432+
```csharp
var match = Regex.Match("foo\r\nbar", @".+$",
    RegexOptions.Multiline | RegexOptions.AnyNewLine);
Console.WriteLine(match.Value); // "foo"
```
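Code that has to run on both older and newer runtimes might detect the option at run time and fall back to the `\r?$` workaround otherwise. The following is a sketch under stated assumptions, not a recommended pattern from this article: it assumes the enum member is named exactly `AnyNewLine`, looks it up by name so the code also compiles against older reference assemblies, and uses hypothetical names (`LineSplitter`, `GetLines`). The fallback handles only `\n` and `\r\n`, so the two paths can differ for input that contains a lone `\r` or the Unicode newline characters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineSplitter
{
    // Hypothetical helper: returns the content of each non-empty line without its terminator.
    public static string[] GetLines(string input)
    {
        // Assumption: the option is exposed as RegexOptions.AnyNewLine.
        // Parsing by name returns false on runtimes that don't define it.
        if (Enum.TryParse<RegexOptions>("AnyNewLine", out RegexOptions anyNewLine))
        {
            return Regex.Matches(input, @"^.+$", RegexOptions.Multiline | anyNewLine)
                        .Cast<Match>()
                        .Select(m => m.Value)
                        .ToArray();
        }

        // Fallback: keep \r out of the result with a negated class and a capturing group.
        return Regex.Matches(input, @"^([^\r\n]+)\r?$", RegexOptions.Multiline)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    public static void Main()
    {
        foreach (string line in GetLines("foo\r\nbar"))
            Console.WriteLine(line);
    }
}
```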
437+
438+
The following table summarizes how `AnyNewLine` affects each construct:
439+
440+
|Construct|Default behavior|With `AnyNewLine`|
|---------|----------------|-----------------|
|`.`|Matches any character except `\n`|Matches any character except `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`|
|`$` (Multiline)|Matches before `\n`|Matches before `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`|
|`^` (Multiline)|Matches after `\n`|Matches after `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`|
|`$` (without Multiline) / `\Z`|Matches before `\n` at end of string|Matches before any newline sequence at end of string|
446+
447+
Key design points:
448+
449+
- **`\r\n` is treated atomically**: `$` matches before the full `\r\n` sequence, never between `\r` and `\n`.
- **`Singleline` takes precedence**: `.` with both `Singleline` and `AnyNewLine` matches every character (including newlines), consistent with `Singleline`'s existing behavior.
- **`\A` and `\z` are unaffected**: Absolute start-of-string and end-of-string anchors don't change.
- **Incompatible options**: `AnyNewLine` cannot be combined with <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayProperty=nameWithType> or <xref:System.Text.RegularExpressions.RegexOptions.ECMAScript?displayProperty=nameWithType>. Attempting to do so throws an <xref:System.ArgumentOutOfRangeException>.
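Because the incompatible combinations throw, code that builds its options dynamically might want to add `AnyNewLine` only when it's safe to do so. The following guard is a sketch, not an API from the library: `RegexOptionsHelper` and `WithAnyNewLineIfCompatible` are hypothetical names, the option is looked up by name on the assumption that it's called exactly `AnyNewLine`, and the code assumes .NET 7 or later so that `RegexOptions.NonBacktracking` is defined.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOptionsHelper
{
    // Options that the text above describes as incompatible with AnyNewLine.
    private const RegexOptions Incompatible =
        RegexOptions.ECMAScript | RegexOptions.NonBacktracking;

    // Hypothetical helper: adds AnyNewLine only when the runtime defines it
    // and no incompatible option is present; otherwise returns options unchanged.
    public static RegexOptions WithAnyNewLineIfCompatible(RegexOptions options)
    {
        if ((options & Incompatible) != 0)
            return options;

        // Assumption: the member name is "AnyNewLine"; older runtimes return false here.
        return Enum.TryParse<RegexOptions>("AnyNewLine", out RegexOptions anyNewLine)
            ? options | anyNewLine
            : options;
    }

    public static void Main()
    {
        // ECMAScript is left untouched, so constructing the Regex doesn't throw.
        RegexOptions options = WithAnyNewLineIfCompatible(RegexOptions.ECMAScript);
        var regex = new Regex(@"\d+$", options);
        Console.WriteLine(regex.Options == RegexOptions.ECMAScript); // True
    }
}
```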
453+
408454
## See also
409455

410456
- [Regular Expression Language - Quick Reference](regular-expression-language-quick-reference.md)
