
Replace PowerShell operators with .NET methods for parsing and serialization performance #30

@MariusStorhaug

Description

The ConvertFrom-Yaml and ConvertTo-Yaml functions process every line and scalar through several private helpers. These helpers currently use PowerShell operators (-split, -replace, -match, -contains) where equivalent .NET methods (.Split(), .Replace(), .Contains(), .IndexOfAny()) would execute significantly faster. On large YAML documents the cumulative overhead is measurable. -replace, -split, and -match run their pattern through the regex engine even when it is a literal, the operators compare case-insensitively by default, and -contains performs a type-converting comparison against each element in turn. The underlying .NET methods avoid these costs.

Request

Desired capability

Parsing and serializing YAML should be as fast as pure-PowerShell code allows. All private helper functions should prefer .NET [string] instance methods and [System.Collections.Generic.HashSet[T]] lookups over PowerShell operators when the operator's extra capabilities (regex, wildcard, case-insensitive matching) are not needed.

Acceptance criteria

  • Every literal (non-regex) -replace is converted to the [string] method .Replace()
  • Every -split on a literal delimiter is converted to the [string] method .Split()
  • Every -match / -notmatch used for simple substring or character detection is converted to .Contains() or .IndexOfAny()
  • The @() array + -contains lookup in Test-YamlPlainSafe is converted to a [HashSet[char]] with .Contains()
  • The .ToCharArray() loop for control-character scanning is converted to an index-based for loop that reads characters through [string] indexing (avoids allocating a char[] copy)
  • All existing tests continue to pass with no behavioral changes

Technical decisions

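No behavioral changes. Every replacement is a strict performance-only refactor: the same inputs produce the same outputs. No new parameters, no changed return types. One difference must be checked at each call site: the operators treat a $null left operand as an empty string (for example, $null -replace 'a', 'b' returns an empty string), whereas calling a .NET method on $null throws. A call site is safe to convert when its value comes from a [string]-typed parameter (PowerShell binds $null to an empty string there) or is otherwise guaranteed to be a non-null string; otherwise it needs an explicit guard.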
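Operators and method calls differ on $null input, which matters for the no-behavioral-change guarantee. This minimal sketch demonstrates the difference; Invoke-QuoteUnescape is a hypothetical helper used only for illustration, not a function from the module.

```powershell
# Operators treat a $null left operand as an empty string.
$untyped = $null
$operatorResult = $untyped -replace "''", "'"    # returns '' rather than throwing

# Calling a .NET method on $null throws a RuntimeException.
$methodThrew = $false
try {
    $null = $untyped.Replace("''", "'")
}
catch {
    $methodThrew = $true
}

# Hypothetical helper: a [string]-typed parameter binds $null as '',
# so the .Replace() call inside it never sees $null.
function Invoke-QuoteUnescape {
    param([string] $Text)
    $Text.Replace("''", "'")
}

$paramResult = Invoke-QuoteUnescape -Text $null
```

Helpers that already declare [string] $Text get this coercion for free; the sites to inspect are those operating on untyped variables or on properties that may be $null.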

Operator → method mapping. Each anti-pattern below is listed with the file and line where it occurs and its replacement. Line numbers refer to the head of the feature/2-convert-yaml-functions branch.

-replace (literal) → .Replace()

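PowerShell's -replace always interprets its pattern as a regular expression, so even a literal pattern is matched by the regex engine. .Replace() performs a direct ordinal substring scan. The two are interchangeable here because these patterns contain no regex metacharacters and no letters (so -replace's default case-insensitivity has no effect), and the replacement strings contain no $ substitution tokens.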

  • ConvertFrom-YamlLineStream.ps1, line 24: $Text -replace "`r`n", "`n" → $Text.Replace("`r`n", "`n")
  • ConvertFrom-YamlMapping.ps1, line 35: ($rawKey.Substring(…)) -replace "''", "'" → ($rawKey.Substring(…)).Replace("''", "'")
  • ConvertFrom-YamlScalar.ps1, line 21: ($inner -replace "''", "'") → $inner.Replace("''", "'")
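A quick equivalence check for the -replace conversions, written as a sketch with made-up sample strings. The final pair shows why the conversion is limited to literal patterns: with a regex metacharacter such as '.', the two forms diverge.

```powershell
# Sample inputs: a quoted scalar with an escaped quote, CRLF text, and ''.
$samples = @("key: 'it''s'", "line1`r`nline2`r`n", '')

$results = foreach ($s in $samples) {
    # Operator form: both patterns go through the regex engine.
    $viaOperator = ($s -replace "`r`n", "`n") -replace "''", "'"
    # Method form: direct ordinal substring replacement.
    $viaMethod = $s.Replace("`r`n", "`n").Replace("''", "'")
    $viaOperator -ceq $viaMethod
}

# Counter-example: '.' is a regex wildcard for -replace but literal for .Replace().
$regexDot   = 'a.b' -replace '.', '-'    # every character matches: '---'
$literalDot = 'a.b'.Replace('.', '-')    # only the dot is replaced: 'a-b'
```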

-split (literal) → .Split()

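-split also treats its delimiter as a regular expression, so the same regex overhead is avoided. On Windows PowerShell 5.1 (.NET Framework), .Split("`n") binds to the Split(params char[]) overload, while PowerShell 7+ (.NET Core) can bind it to a Split(string, …) overload. For a one-character delimiter both give the same result; passing [char]"`n" makes the intent explicit on both.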

  • ConvertFrom-YamlLineStream.ps1, line 25: $normalized -split "`n" → $normalized.Split([char]"`n")
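The -split conversion can be checked the same way. This sketch uses an invented two-line document with a trailing newline; both forms keep the trailing empty element, so the line count does not change.

```powershell
$normalized = "a: 1`nb: 2`n"

# Regex split on a literal newline.
$viaOperator = $normalized -split "`n"

# Ordinal split on a single [char]; the [char] argument sidesteps the
# string vs char[] overload difference between Windows PowerShell 5.1 and 7+.
$viaMethod = $normalized.Split([char]"`n")

# Both yield 'a: 1', 'b: 2', '' (the trailing empty element is preserved).
$sameItems = ($viaOperator -join '|') -ceq ($viaMethod -join '|')
```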

-match / -notmatch (simple character check) → .Contains() / .IndexOfAny()

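When the regex is just a single literal character or a character class, .Contains() (one character) or .IndexOfAny() (a class) performs the same check ordinally without the regex engine, typically 3–10× faster. Unlike -match and -notmatch, neither method sets $Matches, so confirm that no caller reads $Matches after these checks.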

  • Format-YamlKey.ps1, line 18: $text -notmatch "'" → -not $text.Contains("'")
  • Format-YamlString.ps1, line 35: $Text -notmatch "'" → -not $Text.Contains("'")
  • Test-YamlPlainSafe.ps1, line 47: $Text -match '[:#]' → $Text.IndexOfAny(@([char]':', [char]'#')) -ge 0 (hoisting the char[] to module scope, as for $dangerousChars, avoids allocating it on every call)
  • Test-YamlPlainSafe.ps1, line 53: $Text -match '[\[\]\{\},&*!|>''"%@`]' → $Text.IndexOfAny($dangerousChars) -ge 0, where $dangerousChars is a module-level [char[]] constant holding exactly the characters of that class: [ ] { } , & * ! | > ' " % @ and the backtick
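A sketch of the module-level constants for the two Test-YamlPlainSafe checks, with an equivalence test against the original regexes. The names $script:IndicatorChars and $script:DangerousChars are assumptions (the issue names only $dangerousChars), and the sample strings are invented.

```powershell
# Built once at module load. In a single-quoted string, '' is one literal
# quote and the backtick is an ordinary character, so this holds exactly
# the 15 characters of the class [\[\]\{\},&*!|>''"%@`].
$script:IndicatorChars = [char[]]':#'
$script:DangerousChars = [char[]]'[]{},&*!|>''"%@`'

$samples = @('plain', 'a: b', 'x # y', '[list]', '{map}', 'a, b', '&anchor',
    '*alias', '!tag', 'a|b', '>', 'it''s', 'say "hi"', '50%', '@home', 'tick`', '')

# Emit any sample where the regex and IndexOfAny answers disagree.
$mismatches = foreach ($s in $samples) {
    $oldIndicator = $s -match '[:#]'
    $newIndicator = $s.IndexOfAny($script:IndicatorChars) -ge 0
    $oldDangerous = $s -match '[\[\]\{\},&*!|>''"%@`]'
    $newDangerous = $s.IndexOfAny($script:DangerousChars) -ge 0
    if ($oldIndicator -ne $newIndicator -or $oldDangerous -ne $newDangerous) { $s }
}
```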

-imatch (regex) → [regex]::IsMatch()

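The -imatch operator resolves its pattern to a Regex object on every call and populates $Matches when it matches. The static [regex]::IsMatch() with an explicit RegexOptions value draws on .NET's built-in cache of recently used patterns (Regex.CacheSize, 15 by default) and leaves $Matches untouched. Alternatively, construct a [regex] once at module scope and call its .IsMatch() method, which skips even the cache lookup.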

  • ConvertFrom-YamlScalar.ps1, line 49: $value -imatch '^[+-]?(infinity|nan)$' → [regex]::IsMatch($value, '^[+-]?(infinity|nan)$', [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
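A sketch of the module-scope alternative, checked against -imatch and the static call on a few invented values. $script:NonFiniteRegex is a hypothetical variable name, not something the issue defines.

```powershell
# Hypothetical module-scope regex, constructed once when the module loads.
$pattern = '^[+-]?(infinity|nan)$'
$ignoreCase = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
$script:NonFiniteRegex = [regex]::new($pattern, $ignoreCase)

$samples = @('Infinity', '-infinity', '+NaN', 'nan', 'inf', 'nanx', '1.5', '')

# Emit any sample where the three forms disagree.
$mismatches = foreach ($s in $samples) {
    $old = $s -imatch $pattern
    $static = [regex]::IsMatch($s, $pattern, $ignoreCase)
    $instance = $script:NonFiniteRegex.IsMatch($s)
    if ($old -ne $static -or $old -ne $instance) { $s }
}
```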

Array -contains → HashSet[char].Contains()

The @(…) -contains $x pattern scans the array element by element, with a type-converting, case-insensitive comparison for each one, on every call. Test-YamlPlainSafe runs for every scalar and every mapping key, so this is the hottest path.

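  • Test-YamlPlainSafe.ps1, lines 38–39: $disallowedFirst = @('-', '?', …); if ($disallowedFirst -contains [string] $first) → declare a module-scoped [HashSet[char]] once, then call .Contains($first), an O(1) lookup. HashSet[char] compares ordinally while -contains is case-insensitive, so the two are equivalent only as long as the set contains no letters.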
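A minimal sketch of the module-scoped set. Only '-' and '?' are taken from the issue (the rest of the list is elided there and must be copied from the existing @(…) literal); $script:DisallowedFirst and Test-DisallowedFirstChar are hypothetical names.

```powershell
# Hypothetical module-scope set, built once. Only '-' and '?' come from the
# issue; the remaining members of the original list must be added as well.
$script:DisallowedFirst = [System.Collections.Generic.HashSet[char]]::new()
foreach ($c in [char[]]'-?') {
    [void] $script:DisallowedFirst.Add($c)
}

function Test-DisallowedFirstChar {   # hypothetical name, for illustration
    param([string] $Text)
    if ($Text.Length -eq 0) { return $false }
    # The default comparer for [char] is ordinal (case-sensitive); this matches
    # the case-insensitive -contains only because no member is a letter.
    return $script:DisallowedFirst.Contains($Text[0])
}
```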

.ToCharArray() loop → index-based for loop

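$Text.ToCharArray() allocates a new char[] copy of the string on every call. Reading $Text[$i] directly avoids that allocation. Because each $Text[$i] is a dynamically bound index operation in PowerShell, the net effect on loop speed should be confirmed by measurement.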

  • Format-YamlDoubleQuoted.ps1, line 16: foreach ($ch in $Text.ToCharArray()) { … } → for ($i = 0; $i -lt $Text.Length; $i++) { $ch = $Text[$i]; … }
  • Format-YamlString.ps1, line 18: foreach ($ch in $Text.ToCharArray()) { … } → for ($i = 0; $i -lt $Text.Length; $i++) { $ch = $Text[$i]; … }
  • Test-YamlPlainSafe.ps1, line 41: foreach ($ch in $Text.ToCharArray()) { … } → for ($i = 0; $i -lt $Text.Length; $i++) { $ch = $Text[$i]; … }
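The two loop shapes side by side, as a sketch. The issue does not show the loop bodies, so both helper names and the use of [char]::IsControl() as the per-character test are assumptions for illustration.

```powershell
# Current shape: foreach over a char[] copy of the string.
function Test-HasControlCharOld {   # hypothetical helper
    param([string] $Text)
    foreach ($ch in $Text.ToCharArray()) {
        if ([char]::IsControl($ch)) { return $true }
    }
    return $false
}

# Proposed shape: index into the string directly; no char[] is allocated.
function Test-HasControlCharNew {   # hypothetical helper
    param([string] $Text)
    for ($i = 0; $i -lt $Text.Length; $i++) {
        $ch = $Text[$i]   # a [char]
        if ([char]::IsControl($ch)) { return $true }
    }
    return $false
}
```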

Test approach: Run the full Pester suite (tests/ConvertFrom-Yaml.Tests.ps1, tests/ConvertTo-Yaml.Tests.ps1) before and after. No test changes expected — this is a pure internal refactor.


Implementation plan

Core changes

  • Convert -replace to .Replace() in ConvertFrom-YamlLineStream.ps1 (line 24)
  • Convert -split to .Split() in ConvertFrom-YamlLineStream.ps1 (line 25)
  • Convert -replace to .Replace() in ConvertFrom-YamlMapping.ps1 (line 35)
  • Convert -replace to .Replace() in ConvertFrom-YamlScalar.ps1 (line 21)
  • Convert -imatch to [regex]::IsMatch() in ConvertFrom-YamlScalar.ps1 (line 49)
  • Convert -notmatch to .Contains() in Format-YamlKey.ps1 (line 18)
  • Convert -notmatch to .Contains() in Format-YamlString.ps1 (line 35)
  • Replace @() + -contains with [HashSet[char]] in Test-YamlPlainSafe.ps1 (lines 38–39)
  • Convert -match '[:#]' to .IndexOfAny() in Test-YamlPlainSafe.ps1 (line 47)
  • Convert -match '[\[\]…]' to .IndexOfAny() with [char[]] constant in Test-YamlPlainSafe.ps1 (line 53)
  • Convert .ToCharArray() loops to index-based for loops in Format-YamlDoubleQuoted.ps1, Format-YamlString.ps1, Test-YamlPlainSafe.ps1

Validation

  • Run full Pester suite — all existing tests must pass unchanged
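The issue gives no timings. A micro-benchmark along these lines could be run before and after the change to confirm the gain on the hot path; the input string and iteration count are arbitrary, and results will vary by machine and PowerShell version.

```powershell
# Arbitrary short scalar, repeated many times to mimic per-scalar helper calls.
$text = "key: 'it''s'"
$iterations = 50000

$operatorTime = Measure-Command {
    for ($n = 0; $n -lt $iterations; $n++) { $null = $text -replace "''", "'" }
}

$methodTime = Measure-Command {
    for ($n = 0; $n -lt $iterations; $n++) { $null = $text.Replace("''", "'") }
}

# Print both timings in milliseconds.
'{0,-10} {1,10:N1} ms' -f '-replace', $operatorTime.TotalMilliseconds
'{0,-10} {1,10:N1} ms' -f '.Replace', $methodTime.TotalMilliseconds
```

The same harness can be pointed at the -split, -notmatch, -contains, and loop conversions by swapping the loop bodies.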

Metadata


Assignees: No one assigned
Labels: none
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
