The main issue I had with my original script here was that, with the sheer number of pictures we had, it didn’t finish in a reasonable time. What I needed was a way for the script to pick up where it left off after an interruption (like a reboot) instead of starting over. So I took the original script and whacked it with the ScriptHammer(TM) again.
Updated script after the jump with notes following!
[CmdletBinding()]
param
(
    [parameter(Mandatory=$True)]
    [string]$Path
)
if (!(Test-Path -PathType Container $Path))
{
    Write-Error "Invalid path specified."
    Exit
}
# State files that let the script resume after an interruption.
$ProcessedFile="d:\temp\Processed.txt"
$LogFile="d:\temp\comparisonresults.csv"
$Processed=@()
if (Test-Path $ProcessedFile)
{
    # @() keeps the result an array even if the file holds a single line.
    $Processed=@(Get-Content $ProcessedFile)
}
Write-Verbose "Scanning Path : $Path"
$Files=gci -File -Recurse -path $Path | Select-Object -property FullName,Length
$Count=1
$MatchedSourceFiles=@()
if (Test-Path $LogFile)
{
    # @() keeps += working later even if the CSV holds a single row.
    $MatchedSourceFiles=@(Import-csv $LogFile)
}
# Skip directories finished in an earlier run; @() keeps += working when only one is left.
$Directories=@(gci -Recurse -Directory -path $Path | ? {$Processed -notcontains $_.FullName})
$Directories+=get-item $Path
$TotalDirectories=$Directories.Count
Foreach ($Directory in $Directories)
{
    Write-Verbose "Directory : $($Directory)"
    $LocalFiles=gci -File -path $Directory.FullName | Select-Object -property FullName,Length
    Write-Progress -Activity "Processing Directories" -status "Processing Directory $Count / $TotalDirectories" -PercentComplete ($Count / $TotalDirectories * 100)
    ForEach ($SourceFile in $LocalFiles)
    {
        # Skip files already recorded as a duplicate of another file.
        if (!(($MatchedSourceFiles.MatchingFiles | % {$_ -like "*" + $SourceFile.FullName+"*"}) -eq $True))
        {
            $MatchingFiles=@()
            Foreach ($TargetFile in $Files)
            {
                if (($SourceFile.FullName -ne $TargetFile.FullName))
                {
                    #Write-Verbose "Matching $($SourceFile.FullName) and $($TargetFile.FullName)"
                    # Only run the expensive content comparison when the sizes match.
                    if ($SourceFile.Length -eq $TargetFile.Length)
                    {
                        if ((c:\Windows\System32\fc.exe /A $SourceFile.FullName $TargetFile.FullName) -contains "FC: no differences encountered")
                        {
                            Write-Verbose "Match found."
                            $MatchingFiles+=$TargetFile.FullName
                        }
                    }
                }
            }
            if ($MatchingFiles.Count -gt 0)
            {
                $NewObject=[pscustomobject][ordered]@{
                    File=$SourceFile.FullName
                    MatchingFiles=[string]$MatchingFiles
                }
                $MatchedSourceFiles+=$NewObject
            }
        }
    }
    $Count+=1
    # Checkpoint: record the finished directory and save the matches found so far.
    Add-Content -Path $ProcessedFile $Directory.FullName
    $MatchedSourceFiles |Export-CSV $LogFile -NoTypeInformation
}
$MatchedSourceFiles
Most of the new work happens at the start:
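$ProcessedFile="d:\temp\Processed.txt"
$LogFile="d:\temp\comparisonresults.csv"
$Processed=@()
if (Test-Path $ProcessedFile)
{
    # @() keeps the result an array even if the file holds a single line.
    $Processed=@(Get-Content $ProcessedFile)
}
if (Test-Path $LogFile)
{
    # @() keeps += working later even if the CSV holds a single row.
    $MatchedSourceFiles=@(Import-csv $LogFile)
}
# @() keeps += working when only one directory is left to process.
$Directories=@(gci -Recurse -Directory -path $Path | ? {$Processed -notcontains $_.FullName})
$Directories+=get-item $Path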
We need both a file with all the matches in it ($LogFile, which we update as we go) and a file that contains a list of all the folders already processed ($ProcessedFile).
If $ProcessedFile exists, we load it into $Processed. Then, when the list of folders to scan is calculated, we only include the subfolders that aren’t listed in $Processed. (The root folder itself is always added back to the list.)
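The folder filter can be tried on its own. This is a minimal sketch, assuming a scratch folder under the temp directory and three made-up subfolders (A, B and C); none of these names come from the script itself.

```powershell
# Hypothetical scratch area; the real script uses d:\temp\Processed.txt.
$Scratch = Join-Path ([System.IO.Path]::GetTempPath()) "resume-demo"
Remove-Item $Scratch -Recurse -Force -ErrorAction SilentlyContinue
New-Item -ItemType Directory -Path $Scratch | Out-Null
"A","B","C" | ForEach-Object { New-Item -ItemType Directory -Path (Join-Path $Scratch $_) | Out-Null }
$ProcessedFile = Join-Path $Scratch "Processed.txt"

# Pretend an earlier run finished folder A before it was interrupted.
Add-Content -Path $ProcessedFile (Get-Item (Join-Path $Scratch "A")).FullName

# Load the checkpoint; @() keeps a single line as a one-element array.
$Processed = @()
if (Test-Path $ProcessedFile)
{
    $Processed = @(Get-Content $ProcessedFile)
}

# Only folders that are not in the checkpoint are left to scan.
$Remaining = @(Get-ChildItem -Directory -Path $Scratch | Where-Object { $Processed -notcontains $_.FullName })
$Remaining.Name
```

On a clean run this should leave only B and C in $Remaining, since A is already listed in the checkpoint file.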
We do a similar thing with $MatchedSourceFiles; if the $LogFile csv exists we load it into $MatchedSourceFiles.
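Here is a small sketch of that round trip, assuming a throwaway CSV in the temp directory and invented picture paths (both are for illustration only). It writes one match row, reloads it as a restarted run would, and then applies the same -like test the script uses to decide whether a file has already been recorded as a duplicate.

```powershell
# Hypothetical log location and file names, for illustration only.
$LogFile = Join-Path ([System.IO.Path]::GetTempPath()) "comparisonresults-demo.csv"
$Found = @("C:\pics\copy1.jpg", "C:\pics\copy2.jpg")
$Rows = @([pscustomobject]@{
    File = "C:\pics\original.jpg"
    MatchingFiles = [string]$Found   # casting an array to string joins it with spaces
})
$Rows | Export-Csv $LogFile -NoTypeInformation

# Simulate the restart: reload the log; @() keeps a single row as an array.
$MatchedSourceFiles = @()
if (Test-Path $LogFile)
{
    $MatchedSourceFiles = @(Import-Csv $LogFile)
}

# Same wildcard test the script uses to skip a file already listed as a match.
$Candidate = "C:\pics\copy2.jpg"
$AlreadyMatched = @($MatchedSourceFiles.MatchingFiles | Where-Object { $_ -like "*" + $Candidate + "*" }).Count -gt 0
$AlreadyMatched
```

$AlreadyMatched should come back True here, because copy2.jpg appears in the reloaded MatchingFiles column. Note that -like treats square brackets as wildcard characters, so paths containing [ or ] may need extra care.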
The last change is to make sure we add each folder to $ProcessedFile as soon as we’ve finished processing it:
Add-Content -Path $ProcessedFile $Directory.FullName