Optimizing string handling in PowerShell
Introduction: this article describes how to get a 5-10x (or greater) speedup when processing a large number of strings by using the StringBuilder object instead of String.
Invoking the System.Text.StringBuilder constructor:
$SomeString = New-Object System.Text.StringBuilder
Converting back to String:
$Result = $Str.ToString()
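Putting the two together, a minimal end-to-end sketch (the variable names and appended strings are illustrative, not from the original script):
$Str = New-Object System.Text.StringBuilder
[void]$Str.Append("first line`r`n")    # [void] discards the builder object that Append returns
[void]$Str.Append("second line`r`n")
$Result = $Str.ToString()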
While writing a script that processes a large number of text files, I ran into a peculiarity of string handling in PowerShell: parsing slows down significantly if you process the strings with the standard String object.
The source data file is full of lines like:
key;888;0xA9498353,888_FilialName
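From each such line the script needs the third semicolon-separated field up to the comma, i.e. 0xA9498353 here. A one-line sketch of that extraction ($line and $value are illustrative names):
$line = 'key;888;0xA9498353,888_FilialName'
$value = $line.Split(';')[2].Split(',')[0]   # -> 0xA9498353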
The first, rough version of the script used an intermediate text file for the processing. Handling a 1000-line file cost 24 seconds, and the delay grows quickly as the file gets larger. Example:
function test
{
    $Path = 'C:\Powershell\test\test.txt'
    $PSGF = Get-Content $Path
    # create the intermediate file
    $PSGFFileName = $Path + '-compare.txt'
    Remove-Item -Path $PSGFFileName -ErrorAction SilentlyContinue | Out-Null
    New-Item $PSGFFileName -Type File -ErrorAction SilentlyContinue | Out-Null
    # TODO:
    # time is lost in this block, it needs optimizing;
    # do not use the intermediate file and Add-Content, that is where the loss is
    foreach ($Key in $PSGF)
    {
        $Val = $Key.ToString().Split(';')
        $test = $Val[2]
        $Val = $test.ToString().Split(',')
        $test = $Val[0]
        Add-Content $PSGFFileName -Value $test
    }
    $Result = Get-Content $PSGFFileName
    Remove-Item -Path $PSGFFileName -ErrorAction SilentlyContinue | Out-Null
    ### block to optimize # end ################################
    return $Result
}
Measure-Command { test }
The result of the run:
99 lines — 1.8 seconds
1000 lines — 24.4 seconds
2000 lines — 66.17 seconds
Optimization No. 1
Clearly this is no good. Let's replace the file operations with processing in memory:
function test
{
    $Path = 'C:\Powershell\test\test.txt'
    $PSGF = Get-Content $Path
    $Result = ''
    # accumulate the result in memory instead of in a file
    foreach ($Key in $PSGF)
    {
        $Val = $Key.ToString().Split(';')
        $test = $Val[2]
        $Val = $test.ToString().Split(',')
        $test = $Val[0]
        $Result = $Result + "$test`r`n"
    }
    return $Result
}
Measure-Command { test }
The result of the run:
99 lines — 0.0037 seconds
1000 lines — 0.055 seconds
2000 lines — 0.190 seconds
Everything seems fine, the speedup is there, but let's see what happens as the number of lines grows:
10000 lines — 1.92 seconds
20000 lines — 8.07 seconds
40000 lines — 26.01 seconds
This approach is suitable for lists of no more than 5-8 thousand lines; beyond that, the losses on object construction take over: String is immutable, so on every append the memory manager allocates new memory and copies the whole object.
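To see this effect in isolation from the file parsing, here is a minimal micro-benchmark sketch (my own illustration; the counts and timings are machine-dependent):
$n = 20000
Measure-Command {
    $s = ''
    for ($i = 0; $i -lt $n; $i++) { $s += 'xxxxxxxxxx' }             # each += copies the whole string
}
Measure-Command {
    $sb = New-Object System.Text.StringBuilder
    for ($i = 0; $i -lt $n; $i++) { [void]$sb.Append('xxxxxxxxxx') } # appends into an internal buffer
}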
Optimization No. 2
Let's try to do better and take a more "programmer-like" approach:
function test
{
    $Path = 'C:\Powershell\test\test.txt'
    $PSGF = Get-Content $Path
    # take the StringBuilder object from .NET
    $Str = New-Object System.Text.StringBuilder
    foreach ($Key in $PSGF)
    {
        $Val = $Key.ToString().Split(';')
        $temp = $Val[2].ToString().Split(',')
        $Val = $temp[0]
        # assigning Append's return value to $temp keeps it out of the pipeline
        $temp = $Str.Append("$Val`r`n")
    }
    $Result = $Str.ToString()
    return $Result
}
Measure-Command { test }
The result of the run: 40000 lines — 1.8 seconds.
Further tweaks, such as replacing foreach with for or eliminating the intermediate variable, did not give a significant speed increase; a sketch of the for variant follows.
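For reference, how the for variant of the loop might look (assuming $PSGF and $Str as in the function above):
for ($i = 0; $i -lt $PSGF.Count; $i++)
{
    $Val = $PSGF[$i].ToString().Split(';')[2].Split(',')[0]
    [void]$Str.Append("$Val`r`n")
}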
Briefly:
To work efficiently with a large number of strings, use System.Text.StringBuilder. Constructor call:
$SomeString = New-Object System.Text.StringBuilder
Conversion to string:
$Result = $Str.ToString()
The explanation is StringBuilder itself: the secret is the more efficient work of the memory manager. A StringBuilder appends into an internal buffer and grows it only occasionally, whereas String is immutable, so every concatenation allocates a new string and copies the old one.
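As an aside (my own note, not from the measurements above): StringBuilder also accepts an initial capacity, so if the final size is roughly known, preallocating avoids even the builder's own internal reallocations:
$Str = New-Object System.Text.StringBuilder -ArgumentList 1048576   # ~1 MB initial buffer
Article based on information from habrahabr.ru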