PowerShell : Finding Duplicate Files, Part 2 : A Resumable Script

The main issue I had with my original script was that, with the sheer number of pictures we had, it didn’t finish in a reasonable time. What I needed was a way for the script to pick up where it left off after an interruption (like a reboot). So I took the original script and whacked it with the ScriptHammer(TM) again.

Updated script after the jump with notes following!

[CmdletBinding()]
param
(
    [parameter(Mandatory=$True)]
    [string]$Path
)
if (!(Test-Path -PathType Container $Path))
{
    Write-Error "Invalid path specified."
    Exit
}
$ProcessedFile="d:\temp\Processed.txt"
$LogFile="d:\temp\comparisonresults.csv"
$Processed=@()
if (Test-Path $ProcessedFile)
{
    $Processed=Get-Content $ProcessedFile
}

Write-Verbose "Scanning Path : $Path"
$Files=gci -File -Recurse -path $Path | Select-Object -property FullName,Length
$Count=1
$MatchedSourceFiles=@()
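if (Test-Path $LogFile)
{
    # @() keeps this an array even when the CSV has zero or one rows, so += still works later
    $MatchedSourceFiles=@(Import-csv $LogFile)
}

# @() keeps $Directories an array even when only one folder is left to scan
$Directories=@(gci -Recurse -Directory -path $Path | ? {$Processed -notcontains $_.FullName})
# Add the root folder too, unless a previous run already finished it
$Root=get-item $Path
if ($Processed -notcontains $Root.FullName)
{
    $Directories+=$Root
}
$TotalDirectories=$Directories.Count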
Foreach ($Directory in $Directories)
{
    Write-Verbose "Directory : $($Directory)"
    $LocalFiles=gci -File -path $Directory.FullName | Select-Object -property FullName,Length
    Write-Progress -Activity "Processing Directories" -status "Processing Directory $Count / $TotalDirectories" -PercentComplete ($Count / $TotalDirectories * 100)
    ForEach ($SourceFile in $LocalFiles)
    {
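        # Skip files already recorded as a match of an earlier file; Escape() stops [ and ] in paths acting as wildcards
        if (!(($MatchedSourceFiles.MatchingFiles | % {$_ -like "*" + [System.Management.Automation.WildcardPattern]::Escape($SourceFile.FullName) + "*"}) -eq $True))
        {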
            $MatchingFiles=@()
            Foreach ($TargetFile in $Files)
            {
                if (($SourceFile.FullName -ne $TargetFile.FullName))
                {
                    #Write-Verbose "Matching $($SourceFile.FullName) and $($TargetFile.FullName)"

                    if ($SourceFile.Length -eq $TargetFile.Length)
                    {
                        if ((c:\Windows\System32\fc.exe /A $SourceFile.FullName $TargetFile.FullName)  -contains "FC: no differences encountered")
                        {
                            Write-Verbose "Match found."
                            $MatchingFiles+=$TargetFile.FullName
                        }
                    }
                }
            }
            if ($MatchingFiles.Count -gt 0)
            {
                $NewObject=[pscustomobject][ordered]@{
                    File=$SourceFile.FullName
                    MatchingFiles=[string]$MatchingFiles
                }
                $MatchedSourceFiles+=$NewObject
            }
        }

    }
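    $Count+=1
    # Save the results before marking the folder as done, so an interruption between the two steps cannot lose matches
    $MatchedSourceFiles |Export-CSV $LogFile -NoTypeInformation
    Add-Content -Path $ProcessedFile $Directory.FullName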
}
$MatchedSourceFiles

Most of the new work happens at the start:

$ProcessedFile="d:\temp\Processed.txt"
$LogFile="d:\temp\comparisonresults.csv"
$Processed=@()
if (Test-Path $ProcessedFile)
{
    $Processed=Get-Content $ProcessedFile 
}
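if (Test-Path $LogFile)
{
    # @() keeps this an array even when the CSV has zero or one rows, so += still works later
    $MatchedSourceFiles=@(Import-csv $LogFile)
}

# @() keeps $Directories an array even when only one folder is left to scan
$Directories=@(gci -Recurse -Directory -path $Path | ? {$Processed -notcontains $_.FullName})
# Add the root folder too, unless a previous run already finished it
$Root=get-item $Path
if ($Processed -notcontains $Root.FullName)
{
    $Directories+=$Root
}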

We need both a file with all the matches in it ($LogFile, which we update as we go) and a file that contains a list of all the folders already processed ($ProcessedFile).

If $ProcessedFile exists, we load it into $Processed. Then, when the list of folders to scan is built, we include only the folders that aren’t listed in $Processed.
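
To see the filter on its own, here is a minimal sketch with made-up folder names standing in for the real scan. The sample paths and the $Seen / $RemainingFast variables are assumptions for illustration, not part of the script. The HashSet variant is only a possible alternative for very long lists, and its ::new syntax assumes PowerShell 5 or later.

```powershell
# Hypothetical data: folder paths that a previous run recorded as finished.
$Processed = @('D:\Pictures\2019', 'D:\Pictures\2020')

# Hypothetical data: every folder found by the current scan.
$AllFolders = @('D:\Pictures\2019', 'd:\pictures\2020', 'D:\Pictures\2021')

# Same filter as the script: keep only folders that are not in $Processed.
# -notcontains compares strings case-insensitively, so 'd:\pictures\2020' is still skipped.
$Remaining = @($AllFolders | Where-Object { $Processed -notcontains $_ })

# Possible alternative for very long lists: a case-insensitive HashSet lookup
# avoids rescanning $Processed once per folder.
$Seen = [System.Collections.Generic.HashSet[string]]::new(
    [string[]]$Processed,
    [System.StringComparer]::OrdinalIgnoreCase)
$RemainingFast = @($AllFolders | Where-Object { -not $Seen.Contains($_) })

$Remaining        # D:\Pictures\2021
```

Both approaches leave the same single folder to scan; for a few thousand folders the plain -notcontains version is probably fine.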

We do a similar thing with $MatchedSourceFiles: if the $LogFile CSV exists, we load it into $MatchedSourceFiles.
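
One thing to watch when reloading the CSV: a file with a single row comes back from Import-Csv as one object rather than an array, and a later += on it fails. The sketch below round-trips one row through a throwaway file in the temp folder and wraps the import in @() to keep it an array. The $DemoLog path and the sample rows are assumptions for illustration.

```powershell
# Hypothetical log file in the temp folder, so the sketch does not touch the real $LogFile.
$DemoLog = Join-Path ([System.IO.Path]::GetTempPath()) 'comparisonresults-demo.csv'

# One result row, shaped like the objects the script builds.
$Row = [pscustomobject]@{
    File          = 'D:\Pictures\a.jpg'
    MatchingFiles = 'D:\Pictures\copy\a.jpg'
}
$Row | Export-Csv $DemoLog -NoTypeInformation

# @() guarantees an array even though the CSV holds only one row.
$Loaded = @(Import-Csv $DemoLog)

# Appending a second row now works the way the script expects.
$Loaded += [pscustomobject]@{
    File          = 'D:\Pictures\b.jpg'
    MatchingFiles = 'D:\Pictures\copy\b.jpg'
}

$Loaded.Count     # 2
Remove-Item $DemoLog
```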

The last change is to make sure we add any folders we’ve finished processing to the $ProcessedFile:

Add-Content -Path $ProcessedFile $Directory.FullName
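
Stripped of the file comparison, the resume pattern looks roughly like the sketch below. Everything in it is a stand-in: the $Checkpoint file lives in the temp folder, the items are made-up names, and Invoke-Pass is a hypothetical helper whose -StopAfter parameter only simulates an interruption.

```powershell
# Hypothetical checkpoint file in the temp folder (stands in for $ProcessedFile).
$Checkpoint = Join-Path ([System.IO.Path]::GetTempPath()) 'processed-demo.txt'
if (Test-Path $Checkpoint) { Remove-Item $Checkpoint }   # start the demo clean

# Hypothetical work items; in the script these are folder paths.
$Items = 'FolderA', 'FolderB', 'FolderC'

function Invoke-Pass {
    param([int]$StopAfter)   # simulate an interruption after this many items

    # Load whatever earlier passes finished, if anything.
    $Done = @()
    if (Test-Path $Checkpoint) { $Done = @(Get-Content $Checkpoint) }

    $Todo = @($Items | Where-Object { $Done -notcontains $_ })
    $Handled = 0
    foreach ($Item in $Todo) {
        if ($Handled -ge $StopAfter) { return }
        # ... the real work for $Item would happen here ...
        Add-Content -Path $Checkpoint -Value $Item   # record the item only after its work is done
        $Handled++
    }
}

Invoke-Pass -StopAfter 2      # "interrupted" after FolderA and FolderB
Invoke-Pass -StopAfter 10     # resumes and handles only FolderC

$Final = @(Get-Content $Checkpoint)
$Final                        # FolderA, FolderB, FolderC, each listed once
Remove-Item $Checkpoint
```

Because an item is only written to the checkpoint once it is complete, an interruption part-way through a folder means that folder is simply redone on the next run.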
