Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Syntax error location unprecise when using ForEach-Object instead of foreach

Most of the time the error reporting in powershell is very useful. I see errors, I see the origin, but I noticed that any errors in ForEach-Object pipelines lose their origin line and the error stack simply points to the code line with the ForEach-Object.

Example of a bad error location

Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"

[cultureinfo]::CurrentUICulture = 'en-US'

function Test-Something($val)
{
    # var is not defined here
    Write-Host($undefined_variable)
}

@("a","b") | ForEach-Object{
    $entry = $_

    Test-Something $entry
}

Results in

ForEach-Object : The variable '$undefined_variable' cannot be retrieved because it has not been set.
In D:\dummy.ps1:12 Line:14
+ @("a","b") | ForEach-Object{
+              ~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (undefined_variable:String) [ForEach-Object], RuntimeException
    + FullyQualifiedErrorId : VariableIsUndefined,Microsoft.PowerShell.Commands.ForEachObjectCommand

The line 12 points to @("a","b") | ForEach-Object{, which is obviously not the error location.

Much more usable example with foreach

Now I am using a foreach (only the last 4 code lines have changed)

Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"

[cultureinfo]::CurrentUICulture = 'en-US'

function Test-Something($val)
{
    # var is not defined here
    Write-Host($undefined_variable)
}

$data = @("a","b")

foreach($entry in $data)
{
    Test-Something $entry
}

The error now is much more usable, it points actually to the error line 9 to Write-Host($undefined_variable)

The variable '$undefined_variable' cannot be retrieved because it has not been set.
In D:\dummy.ps1:9 Line:16
+     Write-Host($undefined_variable)
+                ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (undefined_variable:String) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : VariableIsUndefined

Is this related to the piping nature of the pipeline operators or am understanding something wrong. The first scenario makes error tracing very hard when there are more functions in between. In most cases I can switch to foreach but I would love to understand what actually is happening.

like image 649
Samuel Avatar asked Sep 03 '25 17:09

Samuel


2 Answers

This might be an oversimplification, and someone with deeper formal knowledge may have a more precise technical answer, but it's worth noting that for and foreach are built-in PowerShell statements (called "language commands" in the documentation links) that contain a nested statement list:

for (<Init>; <Condition>; <Repeat>)
{
    <Statement list>
}

foreach ($<item> in $<collection>)
{
    <statement list>
}

They're part of the PowerShell syntax and the engine knows which statement from the list is being executed, whereas ForEach-Object is "just" a cmdlet like e.g. Invoke-Command or Get-ChldItem.

ForEach-Object
    ... snip ...
    [-Begin    <ScriptBlock>]
    [-Process] <ScriptBlock[]>
    [-End      <ScriptBlock>]
    ... snip ...

As a result, your foreach-object example is equivalent to this:

1 | $myScriptBlock = {
2 |     $entry = $_
3 | 
4 |     Test-Something $entry
5 | }
6 |
7 | @("a", "b") | foreach-object -process $myScriptBlock

ForEach-Object's positional parameters allow you to invoke the cmdlet with syntax that is similar to for / foreach, but the underlying mechanics are different, and as far as the PowerShell engine is concerned the top-level line of code reporting the error is line 7, not 4.

It doesn't trace back to the line of source code that defines the scriptblock - in your case it probably could, but I'm guessing it would be difficult in a more contrived case where the scriptblock is being generated from dynamic code e.g.

1 | $myScriptBlock = [scriptblock]::Create(
2 |     "`$entry = `$_" +
3 |     "`r`n" + "`r`n" + "`r`n" +
4 |     "Test-Something `$entry"
5 | )
6 |
7 | @("a", "b") | foreach-object -process $myScriptBlock

In which case, which line number should the error report?

Update

Per @zett42's comment re stack traces, PowerShell does contain information about which line throws the exception in the $_.ScriptStackTrace property of the exception - see for example:

Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"

[cultureinfo]::CurrentUICulture = 'en-US'

function Test-Something($val)
{
    # var is not defined here
    Write-Host($undefined_variable)
}

try
{
    $myScriptBlock = [scriptblock]::Create(
        "`$entry = `$_" + "`r`n" + "`r`n" +
        "Test-" +
        "Something `$entry"
    )
    @("a", "b") | foreach-object -process $myScriptBlock
}
catch
{
    $_.ScriptStackTrace
    throw
}

Running this gives:

C:\> pwsh C:\src\so\stack.ps1
at Test-Something, C:\src\so\stack.ps1: line 9
at <ScriptBlock>, <No file>: line 3
at <ScriptBlock>, C:\src\so\stack.ps1: line 19
ForEach-Object: C:\src\so\stack.ps1:19
Line |
  19 |      @("a", "b") | foreach-object -process $myScriptBlock
     |                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | The variable '$undefined_variable' cannot be retrieved because it has not been set.

So the simple answer is it looks like PowerShell just reports the top-level line that the exception bubbled up to. In the case of for and foreach that's the line that contains the nested statements, and for foreach-object it's the line the foreach-object appears on.

And to answer my own rhetorical question:

which line number should the error report?

For a dynamic scriptblock, it’s the internal line number relative to the scriptblock's source (and not the main script's line number) - see:

at <ScriptBlock>, <No file>: line 3
like image 55
mclayton Avatar answered Sep 05 '25 06:09

mclayton


Use .{process {<code>}} for a faster and more informative version of ForEach-Object {<code>}

This version is more optimized and executes in less time, but is arguably less readable and lacks some of ForEach-Object's functionality. Yet, since the extra functionality is seldom used, it is a one for one replacement for most cases. In cases where you desire to use -Begin or -End, uncomment as needed the begin {} or end {} lines and add your desired code.

This code:

Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"

[cultureinfo]::CurrentUICulture = 'en-US'

function Test-Something($val) {
   # var is not defined here
   Write-Host($undefined_variable)
}

@("a", "b") | .{
   # begin {}
   process {
      $entry = $_

      Test-Something $entry
   }
   # end {}
}

Produces this output:

The variable '$undefined_variable' cannot be retrieved because it has not been set.
At D:\Sync\Work\StackOverflow\PowerShell\77778774\SeluwuFE-Oiof.ps1:8 char:15
+    Write-Host($undefined_variable)
+               ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (undefined_variable:String) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : VariableIsUndefined

Explanation:

(Skip to number 7 for summery of 2 through to 6)

  1. First of all, I learned this from an answer given here on Stack Overflow, unfortunately I don't have a link to that answer, but almost 100% sure it was this answer given by Santiago.
  2. This solution uses dot-sourcing, a good brief statement on this can be found here.
  3. Dot-sourcing executes a script block, and "Like functions, script blocks can include the DynamicParam, Begin, Process, and End keywords."
  4. I believe the section on Using delay-bind script blocks with parameters applies to this usage of a script block, but to be honest, I need to do more research to make sure I fully understand what is going on.
  5. From what I've seen, the begin {}, process {}, and end {} sections in a function or script block are designed for handling pipeline input. The begin {} section executes before any pipeline input is processed, process {} executes once per each item on the pipeline, and afterwards, the end {} section is executed.
    • The thing I find sometimes very useful about begin {} is that you can define functions in it and also variables that can be used to accumulate info related to items on the pipeline (such as a count or sum).
    • The thing I find sometimes very useful about end {} is that you can build the pipeline output you want and send it just before the script block exit (this could include info such as a count or sum).
  6. From what I can tell, the performance increase comes from the fact that Foreach-Object isn't actually part of the PowerShell language, but instead is a cmdLet provided by module Microsoft.PowerShell.Core. Where dot-sourcing and script blocks are truly part of the PowerShell language. I believe the Foreach-Object cmdlet is a C# class that inherits from PSCmdlet. As such, an instance of the class has to be created and initialized, and multiple methods of the class are called. This, and probably other factors, produces overhead that decreases performance. Cmdlets are great, and probably 99% of the time the performance hit is of no consequence and can be ignored.
  7. To put it all together, pipeline input is fed into a dot-sourced script block having a process {} section that executes once per each pipeline item, with each item assigned to the automatic variable $_, and having the added performance boost of not using a cmdlet to get the job done.

Performance:

Santiago, in the comments, pointed out that Call operator & is internally optimized, where the Dot Sourcing operator . isn't. I believe he figured this out by looking at PowerShell's source code. With this in mind, using the Call operator should have better performance over the Dot Sourcing operator, so ran a performance test.

The problem with running test on a computer is that the external processes may cause performance issues that interfere with the execution time of PowerShell code. To distribute these issues evenly over each method, the test results for the Call operator, Dot Sourcing, Foreach loop, and Foreach-Object cmdlet involve each method executing in turns to sum one million random numbers (0-99), with a total of 100 executions per method. The total time was calculated as the sum of all executions for each method.

As you can see in the following tables, the Call operator & preformed the task in the least amount of time.

PowerShell 5.1.19041.3803:

Call operator   : 41.71 seconds (2.20 times faster than Dot Sourcing)
Dot Sourcing    : 1 minute and 31.653 seconds
Foreach loop    : 10.463 seconds (8.76 times faster than Dot Sourcing)
Foreach-Object  : 7 minutes and 55.407 seconds (5.19 times slower than Dot Sourcing)

PowerShell 7.3.9:

Call operator   : 30.395 seconds (1.80 times faster than Dot Sourcing)
Dot Sourcing    : 54.766 seconds
Foreach loop    : 5.65 seconds (9.69 times faster than Dot Sourcing)
Foreach-Object  : 3 minutes and 29.799 seconds (3.83 times slower than Dot Sourcing)

Call Operator & test code:

$CallOperator = {
   $RandomIntegers | &{
      begin {
         $Sum = 0
      }
      process {
         $Sum += $_
      }
      end {
         'Sum: {0:N0}' -f $Sum
      }
   }
}

Dot sourcing operator . test code:

$DotSourcing = {
   $RandomIntegers | .{
      begin {
         $Sum = 0
      }
      process {
         $Sum += $_
      }
      end {
         'Sum: {0:N0}' -f $Sum
      }
   }
}

Foreach loop test code:

$ForeachLoop = {
   $Sum = 0
   foreach ($RandomInteger in $RandomIntegers) {
      $Sum += $RandomInteger
   }
   'Sum: {0:N0}' -f $Sum
}

Foreach-Object cmdlet test code:

$ForeachObject = {
   $RandomIntegers | Foreach-Object -Begin {
      $Sum = 0
   } -Process {
      $Sum += $_
   } -End {
      'Sum: {0:N0}' -f $Sum
   }
}

Code for running the test:
Special thanks to Santiago for providing Measure-Command trick (using Call operator &) for obtaining correct execution time!

Write-Host "Creating Random Numbers:"
[int[]]$RandomIntegers = 1..1000000 | .{ process { Get-Random -Maximum 100 } }
Write-Host "Defining Variables:"
$CallOperatorMilliseconds = 0
$DotSourcingMilliseconds = 0
$ForeachLoopMilliseconds = 0
$ForeachObjectMilliseconds = 0
Write-Host "Staring loop"
[int]$LoopCount = 100

for ($i = 0; $i -lt $LoopCount; $i++) {
   Write-Host "[$i]"
   $CallOperatorMilliseconds += (Measure-Command -Expression { & $CallOperator }).TotalMilliseconds
   $DotSourcingMilliseconds += (Measure-Command -Expression { & $DotSourcing }).TotalMilliseconds
   $ForeachLoopMilliseconds += (Measure-Command -Expression { & $ForeachLoop }).TotalMilliseconds
   $ForeachObjectMilliseconds += (Measure-Command -Expression { & $ForeachObject }).TotalMilliseconds
}

Functions used in printing test results:

function Get-ReadableTime {
   [CmdletBinding()]
   param (
      [Parameter(Mandatory = $true, Position = 0)]
      [Double]$Milliseconds
   )
   $TimeSpan = [TimeSpan]::FromMilliseconds($Milliseconds)
   [string[]]$TimeList =
   @( $TimeSpan.Days, 'day' ),
   @( $TimeSpan.Hours, 'hour' ),
   @( $TimeSpan.Minutes, 'minute' ),
   @( [float]($TimeSpan.Seconds + $TimeSpan.Milliseconds / 1000), 'second' ) |
   .{ process {
         if ($_[0] -ne 1 -and $_[0] -ne 1.0) { $_[1] += 's' }
         if (($_[0])) { '{0} {1}' -f $_[0], $_[1] }
      } }
   if ($TimeList.Count -eq 0) { $TimeList += '0 seconds' }
   $LastIndex = $TimeList.Count - 1
   if ($LastIndex -ge 2) { $TimeList[$LastIndex] = 'and ' + $TimeList[$LastIndex]; $TimeList -join ', ' }
   else { $TimeList -join ' and ' }
}
function Get-RelativeMultiple {
   [CmdletBinding()]
   param (
      [Parameter(Mandatory = $true, Position = 0)]
      [Double]$ThisValue,
      [Parameter(Mandatory = $true, Position = 1)]
      [Double]$OtherValue
   )
   $e = [char]27
   if ($ThisValue -lt $OtherValue) {
      "{0:n2} times $e[38;2;0;255;0mfaster$e[0m" -f [math]::Abs($OtherValue / $ThisValue)
   }
   else {
      "{0:n2} times $e[38;2;255;0;0mslower$e[0m" -f [math]::Abs($ThisValue / $OtherValue)
   }
}

Printing results of the test:

$Width = 16
"{0,-$Width}: {1:hh\:mm\:ss\,fff}" -f 'Call operator', [TimeSpan]::FromMilliseconds($CallOperatorMilliseconds)
"{0,-$Width}: {1:hh\:mm\:ss\,fff}" -f 'Dot Sourcing', [TimeSpan]::FromMilliseconds($DotSourcingMilliseconds)
"{0,-$Width}: {1:hh\:mm\:ss\,fff}" -f 'Foreach loop', [TimeSpan]::FromMilliseconds($ForeachLoopMilliseconds)
"{0,-$Width}: {1:hh\:mm\:ss\,fff}" -f 'Foreach-Object', [TimeSpan]::FromMilliseconds($ForeachObjectMilliseconds)
like image 31
Darin Avatar answered Sep 05 '25 07:09

Darin