PowerShell : Finding Duplicate Files, Part 3 : A Resumeable Script Using Workflow

In the previous part we looked at making the original script survive a restart without losing progress. There is actually a built-in PowerShell feature which provides this functionality: the workflow. If you run a workflow as a job you can pause, resume and restart it, and progress is saved.

The syntax is pretty straightforward, but there are some strange rules about using workflows which make it a little more tricky.

Here is the full script (as usual, notes will follow);

workflow Find-Duplicates
{
    param ([string]$Path, [string]$LogFile = 'c:\temp\results.csv')

    if (!(Test-Path -PathType Container $Path)) {
        Write-Error "Invalid path specified."
        return
    }
    Write-Verbose "Scanning Path : $Path"
    $Files = Get-ChildItem -File -Recurse -Path $Path | Select-Object -Property FullName,Length
    $MatchedSourceFiles = @()

    ForEach ($SourceFile in $Files) {
        Write-Verbose ("Source : " + $SourceFile.FullName)
        $CurrentMatches = $MatchedSourceFiles

        # The InlineScript runs as one normal PowerShell 'unit' and returns the matches it finds
        $MatchedFiles = InlineScript {
            $FoundMatches = @()
            $MatchingFiles = $Using:Files | Where-Object {$_.Length -eq $Using:SourceFile.Length}
            ForEach ($TargetFile in $MatchingFiles) {
                Write-Verbose "Checking File $($TargetFile.FullName)"
                if (($Using:SourceFile.FullName -ne $TargetFile.FullName) -and !(($Using:CurrentMatches |
                     Select-Object -ExpandProperty File) -contains $TargetFile.FullName)) {
                    Write-Verbose "Matched $($Using:SourceFile.FullName) and $($TargetFile.FullName)"
                    if ((c:\windows\system32\fc.exe /A $Using:SourceFile.FullName $TargetFile.FullName) -contains "FC: no differences encountered") {
                        Write-Verbose "Match found."
                        $FoundMatches += $TargetFile.FullName
                    }
                }
            }
            $FoundMatches
        }

        if ($MatchedFiles.Count -gt 0) {
            Write-Verbose "Found Matching Files.  Adding Object."
            $MatchedSourceFiles += New-Object -TypeName PSObject -Property @{File=$SourceFile.FullName;MatchingFiles=$MatchedFiles}
        }
        Checkpoint-Workflow   # save the workflow's state after each source file
    }
    $MatchedSourceFiles | Export-CSV $LogFile -NoTypeInformation
}

As you can see, workflows look very much like functions; they’re structured and called in much the same way. Their ability to automatically save their state is the extra functionality we’re interested in, but workflows also allow you to run multiple tasks in parallel and target multiple machines as you go (amongst other things). This means they have quite a few restrictions on how they’re written. The ones that affect our script are listed below;

  • You can’t use subexpressions ("$test=$($value.name)").
  • You can’t call methods on objects ("$test.update($true)").
  • You can’t update properties of objects ("$test.name='First'").
  • You can use all of the above inside an InlineScript; a block of code that runs as one ‘unit’.
  • Unfortunately, by default an InlineScript cannot refer to variables defined outside its scope; to refer to a variable in the parent (workflow) scope you need to use the Using modifier on the variable ("File=$Using:SourceFile.FullName").
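As a minimal illustration of those rules (the names here are made up for the example), a workflow can push any work that needs methods or subexpressions into an InlineScript, and read outer variables with the Using modifier:

workflow Test-InlineScript
{
    $Message = 'saved in the workflow scope'
    # A method call like .ToUpper() isn't allowed out here in the workflow body,
    # but it's fine inside the InlineScript, where the outer variable
    # is read via the $Using: modifier
    $Upper = InlineScript { ($Using:Message).ToUpper() }
    Write-Output $Upper
}

Calling Test-InlineScript then outputs the upper-cased string, showing the InlineScript's result flowing back into the workflow scope.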

So while there are lots of new restrictions, we can avoid most of them by running our existing script within an InlineScript. This returns our working variable ($MatchedSourceFiles, which holds each file that has duplicates, along with a path to each of its clones), updated for each file we’re examining. The working variable is written to a CSV at the end ("$MatchedSourceFiles | Export-CSV $LogFile -NoTypeInformation").

The only other major change was to add the “Using” keyword to variables within the InlineScript that need to refer to variables defined outside of it.

Lastly, I’ve added a Checkpoint-Workflow command at the end of the main loop (which examines each file in the folder). This manually saves the state of the workflow so that if it’s restarted we don’t have to start again!
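To actually benefit from those checkpoints, the workflow needs to be run as a job; a sketch of a typical session might look like this (the paths are just examples):

# Workflows automatically get an -AsJob parameter
$Job = Find-Duplicates -Path 'D:\temp\New folder' -AsJob

# Pause the job; the workflow winds back to its most recent checkpoint
Suspend-Job $Job

# Resume it later and it carries on from that checkpoint instead of starting over
Resume-Job $Job
Wait-Job $Job | Receive-Job

Because the checkpoint is taken once per source file, at worst you re-scan one file's matches after a resume rather than the whole folder.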

4 Replies to “PowerShell : Finding Duplicate Files, Part 3 : A Resumeable Script Using Workflow”

  1. I’ve tried your script. The first one works great where it writes the info to the screen but this one shows the files found in the first column but in the Matching Files column the output only shows “System.Object[]”.

    1. Hi! I’ve just tried re-running the script in the 3rd part and it seems to run ok. Here’s what I did;

      1) Loaded the script into PowerShell.
      2) Ran the workflow; Find-Duplicates ‘D:\temp\New folder’
      3) And this is the output in c:\temp\results.csv (there were some extra columns too);

      D:\temp\New folder\IMG_0161.JPG D:\temp\New folder\IMG_0162.JPG

      Things I can suggest;

      Start with no results file; maybe it’s appending incorrectly?
      I’m using PS 5 though I wrote this on PS 3.

      What you’re seeing normally happens when it tries to write a complex object to a plain text field, rather than writing each field of the object to its own text field.


  2. Thank you for posting this. It is very helpful. However, I’m having the same problem as the Anonymous comment above. Everything displays correctly on the screen, and all of my variables contain exactly what I expect them to contain, but in the CSV file, instead of the Matched Files I only get “SystemObject[]”. No matter what I try, I can’t figure out how to export the list of matched files. So right now I’m able to find all my duplicates, but I’m not able to get the data in a usable format. Can you help me?

    I am using PS 5.

    1. Hi. I’ve just re-run it, and it works fine when I do, so I’m not sure what’s going wrong. Looking at the error you’re getting though, it might be trying to write an array of matching files instead of a string. So in the script I added a bit of code to force the MatchingFiles attribute to hold $MatchedFiles.ToString().

      It might work 🙂
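      For anyone hitting the same “System.Object[]” output: Export-CSV flattens each property to a single string, so an array-valued property exports as its type name. One way to sketch a fix (a variation on the ToString() idea above, since calling ToString() directly on an array also just yields the type name) is to join the matched paths into one string when building the object; the property names here match the script in the post:

      # Join the matched paths so Export-CSV writes readable text
      # rather than the array's type name
      $MatchedSourceFiles += New-Object -TypeName PSObject -Property @{
          File          = $SourceFile.FullName
          MatchingFiles = ($MatchedFiles -join '; ')
      }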
