Monday, February 4, 2019

windows - Deleting all but latest revision of a file



So I have these massive lists of drawings that I've done for work and I'd like to be able to dump them all into one folder and run a batch that would delete all of the older revs and leave the highest rev there. I'm not even sure this is possible without some seriously deep programming, so I thought I would ask here.




Example file names:




  • 01-XY-001-Rev-0_1-6-2014.pdf

  • 01-XY-001-Rev-2_1-13-2014.pdf

  • 01-XY-001-Rev-9_2-1-2014.pdf

  • 01-XY-001-Rev-11_2-4-2014.pdf

  • 01-XY-002-Rev-0_1-7-2014.pdf

  • 01-XY-002-Rev-4_1-13-2014.pdf


  • 01-XY-002-Rev-7_1-26-2014.pdf

  • 01-XY-002-Rev-11_2-4-2014.pdf

  • 01-XXX-001-Rev-0_1-13-2014.pdf

  • 01-XXX-001-Rev-4_1-21-2014.pdf

  • 01-XXX-001-Rev-6_2-1-2014.pdf

  • 01-XXX-001-Rev-10_2-4-2014.pdf



In the end, I want it to look like:





  • 01-XY-001-Rev-11_2-4-2014.pdf

  • 01-XY-002-Rev-11_2-4-2014.pdf

  • 01-XXX-001-Rev-10_2-4-2014.pdf



And so on and so forth. Is this possible, keeping in mind that there are hundreds of these files with different names? The only thing that is consistent is the Rev-1, Rev-2, Rev-3, etc.; the rest changes as seen above, based on the drawing. I don't really see this as possible, but I'm willing to ask anyway.


Answer



We're not a script writing service, but I had some time and interest so here ya go, in a PowerShell script:




#Set directory to search (. = current directory).
$dir = "."

#Get a list of all the files (only), sorted with newest on top.
$dirFiles = Get-ChildItem -Path $dir | where { ! $_.PSIsContainer } | Sort-Object LastWriteTime -Descending

#Create an array to hold unique file name parts.
$uniqueFileNameParts = @()

#Create an array to hold final file list of files to keep.
$filesToKeep = @()

#Add the file name of the script itself to the files to keep, to prevent it from being deleted if it's in the same folder you're trying to clean.
$filesToKeep += $MyInvocation.MyCommand.Name

#Loop through all the files in the directory list.
foreach ($file in $dirFiles) {
    #If it contains "-Rev-" pull the first part of the file name (up to and including "-Rev-").
    $filenameTokenLocation = $file.Name.IndexOf("-Rev-")
    if ($filenameTokenLocation -ge 0) {
        $endOfString = $filenameTokenLocation + 5
        $subString = $file.Name.Substring(0, $endOfString)

        #If the file name part doesn't already exist in the array, add it to it.
        if ($uniqueFileNameParts -notcontains $subString) {
            $uniqueFileNameParts += $subString
        }
    }
}

#Loop through all the file name parts.
foreach ($fileName in $uniqueFileNameParts) {
    #Create a list of all files starting with that file name part, select the one file with the newest "LastWriteTime" attribute, and assign it to $latest.
    $latest = Get-ChildItem -Path $dir | where { ! $_.PSIsContainer } | where { $_.Name.StartsWith($fileName) } | Sort-Object LastWriteTime -Descending | Select-Object -First 1
    #Add that file to the list of files to keep.
    $filesToKeep += $latest.Name
}

#Get all files in $dir that are not in the list of files to keep, and remove them.
#Note: this also removes files that have no "-Rev-" in their name.
Get-ChildItem -Path $dir | where { ! $_.PSIsContainer } | where { $filesToKeep -notcontains $_.Name } | Remove-Item



Notes:




  • It uses the Last Write Time of the file to determine which is the "latest"; neither the Rev numbers nor the time/date stamps in the file names themselves are considered.

  • It's case sensitive, so a file named XYZ.txt is not necessarily the same as one named xYz.TxT

  • It isn't recursive; it only checks the folder/directory you aim it at, ignoring sub-folders.

  • It's dangerous as all get-out, so make a backup of the folder before trying it. :)
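
If you'd rather go by the Rev number in the file name than by the file's timestamp, something along these lines might work. This is only a rough sketch, not part of the script above: the function name Get-OldRevisionName is made up for illustration, and it assumes every drawing file follows the "<drawing>-Rev-<number>_<date>" pattern from the question. It only works out which names are older revisions; it doesn't delete anything by itself.

```powershell
# Hypothetical helper (not from the original answer): given a list of file
# names, return the names that are NOT the highest revision of their drawing.
function Get-OldRevisionName {
    param([string[]]$Name)

    # Assumed naming pattern: <drawing>-Rev-<number>_<anything>
    $pattern = '^(?<drawing>.+)-Rev-(?<rev>\d+)_'

    $Name | ForEach-Object {
        # Names that don't match the pattern are skipped, so they are never returned.
        if ($_ -match $pattern) {
            [pscustomobject]@{
                Name    = $_
                Drawing = $Matches['drawing']
                Rev     = [int]$Matches['rev']   # numeric, so Rev-11 sorts above Rev-9
            }
        }
    } | Group-Object -Property Drawing | ForEach-Object {
        # Within each drawing, skip the highest revision and emit the rest.
        $_.Group | Sort-Object -Property Rev -Descending | Select-Object -Skip 1
    } | ForEach-Object { $_.Name }
}

# Example dry run (uncomment to try); -WhatIf only reports what would be removed:
# $names = Get-ChildItem -Path . | where { ! $_.PSIsContainer } | ForEach-Object { $_.Name }
# Get-OldRevisionName -Name $names | ForEach-Object { Remove-Item -LiteralPath (Join-Path . $_) -WhatIf }
```

Because files without "-Rev-<number>_" in the name are never returned, this variant leaves unrelated files alone. Drop the -WhatIf only once the dry-run output looks right.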




Hope that helps!

