Tuesday, January 3, 2017

batch - How to compress multiple files with similar names?


So I've got some 20,000 files that I want to compress, grouped by the following logic:



  • compress together all files whose names are identical up to the first (

  • also include the files whose names have no (


So the files are like


file_123.foo
file_123(abc).foo
file_123(b9)(ca)[a1].foo
foobar(a).foo
foobar.foo
foobar(123).foo

which should be compressed to


file_123.7z
foobar.7z

I'm open to windows batch files, unix scripts or any compression program (I can work from there), though the most convenient combo would be .7z and windows.


UPDATE


cYrus gave me a perfect answer; the problem was that my question wasn't precise enough :) Now that I'm smarter, here's the next problem I haven't figured out how to get around yet:


So everything works perfectly unless this happens:


file_123(abc).foo
file_123456789(b9).foo

Those two shouldn't be grouped, i.e., they should end up in two separate files:


file_123.7z
file_123456789.7z

This one:


for pfx in $(for i in *.foo; do echo "${i%%[.(]*}"; done | sort -u); do 7z a "$pfx.7z" $pfx*; done

creates those two archives separately, but the shorter prefix acts as a catch-all: with pfx=file_123, the pattern $pfx* also matches file_123456789(b9).foo, so file_123.7z includes both files, which it shouldn't.


Answer



This should work:


for pfx in $(for i in *.foo; do echo "${i%%[.(]*}"; done | sort -u); do 7z a "$pfx.7z" $pfx[.\(]*; done

Explanation


First we iterate over all the input files (*.foo) and strip everything from the first . or ( onward (${i%%[.(]*}), obtaining:


file_123
file_123
file_123
foobar
foobar
foobar

Then we can remove duplicates with sort -u:


file_123
foobar
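To try these two steps on their own, without touching any archive, here is a minimal sketch. The function names prefix_of and unique_prefixes are made up for illustration, and the file names are passed as plain strings, so no files need to exist:

```shell
#!/bin/sh
# Hypothetical helper: print the part of a name before the first "." or "(".
prefix_of() {
    printf '%s\n' "${1%%[.(]*}"
}

# Hypothetical helper: print each distinct prefix of the given names once.
unique_prefixes() {
    for name in "$@"; do
        prefix_of "$name"
    done | sort -u
}

# The sample names from the question; prints file_123 and foobar, one per line.
unique_prefixes 'file_123.foo' 'file_123(abc).foo' 'file_123(b9)(ca)[a1].foo' \
                'foobar(a).foo' 'foobar.foo' 'foobar(123).foo'
```

Using %% rather than % removes the longest matching suffix, which is why a name like file_123(b9)(ca)[a1].foo is cut at its first ( and not at a later one.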

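Finally, for each prefix ($pfx) we build the archive, using the prefix both as the name of the archive ("$pfx.7z") and as the start of the pattern that selects the files ($pfx[.\(]*). Because the pattern requires a . or ( right after the prefix, file_123 no longer matches file_123456789(b9).foo. This gives the equivalent of: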


7z a file_123.7z 'file_123(abc).foo' 'file_123(b9)(ca)[a1].foo' 'file_123.foo'
7z a foobar.7z 'foobar(123).foo' 'foobar(a).foo' 'foobar.foo'
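If some file names contain spaces, the one-liner will split the prefixes on them. A more defensive variant could look like the sketch below. The function name group_and_archive and the ARCHIVER variable are my own assumptions, not part of 7z; by default the commands are only printed, so you can check the grouping before anything is written. It still assumes no file name contains a newline.

```shell
#!/bin/sh
# Sketch: group the *.foo files of a directory by prefix, one command per group.
# ARCHIVER is a hypothetical variable: unset, the commands are echoed (dry run);
# set ARCHIVER=7z to actually create the archives.
group_and_archive() {
    (
        cd "$1" || exit 1
        for f in *.foo; do
            [ -e "$f" ] || continue          # skip the literal pattern if nothing matches
            printf '%s\n' "${f%%[.(]*}"      # prefix up to the first "." or "("
        done | sort -u | while IFS= read -r pfx; do
            # The prefix is quoted so spaces survive; [.\(]* stays unquoted so it
            # still expands, and only matches names with "." or "(" after the prefix.
            ${ARCHIVER:-echo 7z} a "$pfx.7z" "$pfx"[.\(]*
        done
    )
}

# Usage (dry run in the current directory):
#   group_and_archive .
# Usage (really create the archives):
#   ARCHIVER=7z group_and_archive .
```

Reading the prefixes line by line with read -r, rather than looping over $(...), is what keeps a prefix such as "my file" in one piece.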
