Optimizing string handling in PowerShell

Introduction: this article describes how to speed up the processing of a large number of strings by a factor of 5-10 (or more) by using a StringBuilder object instead of String.

Invoking the System.Text.StringBuilder constructor:

$SomeString = New-Object System.Text.StringBuilder

Converting back to a String:

$Result = $Str.ToString()
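
Between those two calls, text is accumulated with Append (used later in this article). A minimal end-to-end sketch; the appended strings are just placeholders:

# build a string from pieces, then convert it back to String once
$SomeString = New-Object System.Text.StringBuilder
[void]$SomeString.Append( "first line`r`n" )
[void]$SomeString.Append( 'second line' )
$Result = $SomeString.ToString()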

While writing a script that processes many text files, I ran into a peculiarity of string handling in PowerShell: parsing slows down dramatically if you process the strings using the standard String object.

The source data file is full of lines like:

key;888;0xA9498353,888_FilialName
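
Throughout the article, two Split calls pull the hex token out of the third semicolon-separated field. A minimal sketch of that extraction on the sample line (variable names here are illustrative):

# extract '0xA9498353' from a sample record with two Split calls
$line = 'key;888;0xA9498353,888_FilialName'
$third = $line.Split(';')[2]    # '0xA9498353,888_FilialName'
$token = $third.Split(',')[0]   # '0xA9498353'
$token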


The first version of the script used an intermediate text file for the control processing, and that is where the time went: a 1000-line file took 24 seconds, and the delay grows quickly as the file gets bigger. Example:

function test
{
    $Path = 'C:\Powershell\test\test.txt'

    $PSGF = Get-Content $Path

    # create the intermediate file
    $PSGFFileName = $Path + '-compare.txt'
    Remove-Item -Path $PSGFFileName -ErrorAction SilentlyContinue | Out-Null
    New-Item $PSGFFileName -Type File -ErrorAction SilentlyContinue | Out-Null

    # ToDo
    # time is lost in this block, it needs optimizing.
    # avoid the intermediate file and Add-Content; that is where the loss is
    foreach ($Key in $PSGF)
    {
        $Val = $Key.ToString().Split(';')
        $test = $Val[2]
        $Val = $test.ToString().Split(',')
        $test = $Val[0]
        Add-Content $PSGFFileName -Value $test
    }

    $Result = Get-Content $PSGFFileName
    Remove-Item -Path $PSGFFileName -ErrorAction SilentlyContinue | Out-Null
    ### block to optimize # end ################################
    return $Result
}

The result of the run:

99 lines — 1.8 seconds
1000 lines — 24.4 seconds
2000 lines — 66.17 seconds

Optimization No. 1


Clearly that is no good. Let's model it in memory instead of using file operations:

function test
{
    $Path = 'C:\Powershell\test\test.txt'

    $PSGF = Get-Content $Path
    $Result = ''

    # accumulate the output in a plain String
    foreach ($Key in $PSGF)
    {
        $Val = $Key.ToString().Split(';')
        $test = $Val[2]
        $Val = $test.ToString().Split(',')
        $test = $Val[0]
        $Result = $Result + "$test`r`n"
    }

    return $Result
}

Measure-Command { test }

The result of the run:

99 lines — 0.0037 seconds
1000 lines — 0.055 seconds
2000 lines — 0.190 seconds

Everything seems fine and we do get a speed-up, but let's see what happens as the number of lines grows:

10000 lines — 1.92 seconds
20000 lines — 8.07 seconds
40000 lines — 26.01 seconds

This approach is suitable for lists of no more than 5-8 thousand lines; beyond that the cost of object construction takes over: String is immutable, so on every appended line the memory manager allocates new memory and copies the whole object.
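
To see the effect in isolation, here is a small hypothetical micro-benchmark (not part of the original measurements): it times pure String concatenation at two sizes, and doubling the count far more than doubles the time:

# hypothetical micro-benchmark: pure String concatenation at two sizes
foreach ($n in 10000, 20000)
{
    $t = Measure-Command {
        $s = ''
        for ($i = 0; $i -lt $n; $i++) { $s = $s + "line $i`r`n" }
    }
    '{0} appends: {1:N2} seconds' -f $n, $t.TotalSeconds
}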

Optimization No. 2


Let's try to do better and take a more "programmatic" approach:

function test
{
    $Path = 'C:\Powershell\test\test.txt'

    $PSGF = Get-Content $Path

    # take the StringBuilder object from .NET
    $Str = New-Object System.Text.StringBuilder

    foreach ($Key in $PSGF)
    {
        $Val = $Key.ToString().Split(';')
        $temp = $Val[2].ToString().Split(',')
        $Val = $temp[0]
        # assigning Append's return value suppresses its pipeline output
        $temp = $Str.Append( "$Val`r`n" )
    }

    $Result = $Str.ToString()
    return $Result
}

Measure-Command { test }

The result of the run: 40000 lines — 1.8 seconds.

Further tweaks, such as replacing foreach with for or dropping the internal variable $test, did not give any significant additional speed-up (see the sketch below).
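
For reference, a self-contained sketch of that for variant; the file path and record format are assumed to match the test file above:

# hypothetical sketch: for loop instead of foreach, $test variable dropped
$PSGF = Get-Content 'C:\Powershell\test\test.txt'
$Str = New-Object System.Text.StringBuilder
for ($i = 0; $i -lt $PSGF.Count; $i++)
{
    $Val = $PSGF[$i].Split(';')[2].Split(',')[0]
    [void]$Str.Append( "$Val`r`n" )
}
$Result = $Str.ToString()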

Briefly:

To work efficiently with a large number of strings, use System.Text.StringBuilder. Constructor call:

$SomeString = New-Object System.Text.StringBuilder

Conversion to string:

$Result = $Str.ToString()

The explanation lies in StringBuilder itself: the secret is its more efficient use of the memory manager, since it appends into an internal buffer instead of reallocating and copying the whole string on every concatenation.
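
If you roughly know the final size in advance, you can also pass an initial capacity to the constructor (standard .NET behaviour, not something measured in this article), so that Append rarely needs to grow the internal buffer:

# preallocate ~1 MB for the buffer (the capacity value is an assumption, tune to your data)
$Str = New-Object System.Text.StringBuilder( 1048576 )
[void]$Str.Append( 'some text' )
$Result = $Str.ToString()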
Article based on information from habrahabr.ru
