Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Powershell - Faster way of getting rows from a csv file and outputting to a different csv file

Tags:

powershell

I have a csv file that contains approx 250,000 rows. The first of the headers on this csv file is "Campaign Ref Code". I also have a list of campaign reference codes that I need to look up in this first column (ie COLMABQ140, COLMABQ141). If I find this campaign ref code in the first column I want to output the information for this campaign code (ie address, date contacted etc) into a separate csv file. I have a number of these codes and need to produce a number of separate files specific to each separate campaign code. I have actually managed to get a working script for this BUT it takes over an hour to run and I'd like to find a quicker way of processing this file. Basically, I create an array of the campaign codes then loop through them using Import-csv & exporting to a separate file using Export-csv if I find a match (see a snippet of the code below - there are 20 different $refs arrays & corresponding for loops). Like I say, it does work perfectly well in as much as it gets the information I need in a format I specified but there has to be a quicker way of doing things...even if the original file is a quarter of a million rows!!

$file = 'Orig.csv'
$newfile = "File1.$today.csv"
        
$refs = @('COLMABQ140','COLMABQ141','COLMABQ142','COLMABQ143','COLMABQ144','COLMABQ176','COLMABQ177','COLMABQ178','COLMABQ179','COLMABQ180')

    foreach ($ref in $refs) {

        Import-csv $file | Where-Object {$_.'Campaign Ref Code' -like "$ref"} | Export-CSV -notype -Path $workdir\$newfile -Append 
    
    }
        
$newfile = "File2.$today.csv"
        
$refs = @('COLMABP140','COLMABP141','COLMABP142','COLMABP143','COLMABP144','COLMABP176','COLMABP177','COLMABP178','COLMABP179','COLMABP180')

    foreach ($ref in $refs) {

        Import-csv $file | Where-Object {$_.'Campaign Ref Code' -like "$ref"} | Export-CSV -notype -Path $workdir\$newfile -Append 
    
    }
        
$newfile = "File3.$today.csv"
        
$refs = @('COLMABS140','COLMABS141','COLMABS142','COLMABS143','COLMABS144','COLMABS176','COLMABS177','COLMABS178','COLMABS179','COLMABS180')

    foreach ($ref in $refs) {

        Import-csv $file | Where-Object {$_.'Campaign Ref Code' -like "$ref"} | Export-CSV -notype -Path $workdir\$newfile -Append 
    
    }
            
like image 700
DarrenB Avatar asked Oct 23 '25 14:10

DarrenB


1 Answers

The performance issues with your script are:

  • You're iterating multiple times through the Orig.csv input file.
    It is faster to create a hashtable with your files and the specific $refs values to test for.
  • You're reopening multiple times the output file to append to it.
    It is faster to use (3) stepablepipelines for this, see: What good does a SteppablePipeline.
    Note that a correctly setup PowerShell pipeline had a low memory usage.
  • The -eq operator is probably slightly faster than the -like operator
    (apparently you don't need the -like operator as you don't have any wildcards in the values.)
    You might simply depend on the common comparison operator feature:

When the input of an operator is a scalar value, the operator returns a Boolean value. When the input is a collection, the operator returns the elements of the collection that match the right-hand value of the expression. If there are no matches in the collection, comparison operators return an empty array.

$workdir = ...
$refs =@{
    "File1.csv" = 'COLMABQ140','COLMABQ141','COLMABQ142','COLMABQ143','COLMABQ144','COLMABQ176','COLMABQ177','COLMABQ178','COLMABQ179','COLMABQ180'
    "File2.csv" = 'COLMABP140','COLMABP141','COLMABP142','COLMABP143','COLMABP144','COLMABP176','COLMABP177','COLMABP178','COLMABP179','COLMABP180'
    "File3.csv" = 'COLMABS140','COLMABS141','COLMABS142','COLMABS143','COLMABS144','COLMABS176','COLMABS177','COLMABS178','COLMABS179','COLMABS180'
}

$Pipelines = @{}
Import-csv .\Orig.csv |ForEach-Object -Begin {
    foreach ($file in $refs.keys) {
        $Pipelines[$file] = { Export-CSV -notype -Path "$workdir\$file" }.GetSteppablePipeline()
        $Pipelines[$file].Begin($True)
    }
} -Process {
    foreach ($file in $refs.keys) {
        if ($refs[$file] -eq $_.'Campaign Ref Code') { $Pipelines[$file].Process($_) }
    }
} -End {
    foreach ($file in $refs.keys) {
        $Pipelines[$file].End()
    }
}

Mastering the (steppable) pipeline
The power of the PowerShell Pipeline is often misunderstood and underestimated (especially by experienced programmers of languages that don't have a similar feature). Therefore, I have written a PowerShell community blogpost: Mastering the (steppable) pipeline.

like image 167
iRon Avatar answered Oct 26 '25 07:10

iRon



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!