So I've got some 20 000 files that I want to compress and group by the following logic:
- compress together all files that have identical characters up to the first (
- also include files that have no (
So the files are like
file_123.foo
file_123(abc).foo
file_123(b9)(ca)[a1].foo
foobar(a).foo
foobar.foo
foobar(123).foo
which should be compressed to
file_123.7z
foobar.7z
I'm open to Windows batch files, Unix scripts or any compression program (I can work from there), though the most convenient combo would be .7z and Windows.
UPDATE
cYrus gave me a perfect answer, the problem was my question wasn't precise enough :) Now that I'm smarter, here's the next set of problems I haven't figured out how to get around yet:
So everything works perfectly unless this happens:
file_123(abc).foo
file_123456789(b9).foo
Those two shouldn't be grouped, i.e., they should end up in two separate files:
file_123.7z
file_123456789.7z
This one:
for pfx in $(for i in *.foo; do echo "${i%%[.(]*}"; done | sort -u); do 7z a "$pfx.7z" $pfx*; done
creates those two archives separately, but the shorter prefix works as a catch-all, i.e., file_123.7z includes both files, which it shouldn't.
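The catch-all behaviour can be reproduced without running 7z at all. The following is only a minimal sketch, assuming a POSIX shell with mktemp available; it creates the two problem files in a scratch directory and counts what the bare glob $pfx* matches:

```shell
# Scratch directory so no real files are touched (assumes mktemp is available).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 'file_123(abc).foo' 'file_123456789(b9).foo'

pfx=file_123
# The bare prefix glob matches every name that merely starts with file_123.
set -- "$pfx"*
loose_count=$#
printf '%s\n' "$@"
echo "matched: $loose_count"
```

Both names are printed, which is why file_123.7z ends up containing the file_123456789 file as well.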
Answer
This should work:
for pfx in $(for i in *.foo; do echo "${i%%[.(]*}"; done | sort -u); do 7z a "$pfx.7z" $pfx[.\(]*; done
Explanation
First we iterate over all the input files (*.foo) and strip everything from the first . or ( onward (${i%%[.(]*}), obtaining:
file_123
file_123
file_123
foobar
foobar
foobar
Then we can remove duplicates with sort -u:
file_123
foobar
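The two steps above can be tried on literal names, without any files on disk. This is a sketch under the assumption of a POSIX shell; list_prefixes is a hypothetical helper name used only for this illustration, and the name list adds the file_123456789 case from the update:

```shell
# list_prefixes is a hypothetical helper: it prints the unique prefixes
# for a fixed list of sample names taken from the question.
list_prefixes() {
    for i in 'file_123.foo' 'file_123(abc).foo' 'file_123456789(b9).foo' \
             'foobar(a).foo' 'foobar.foo'; do
        # %% removes the longest suffix matching [.(]*,
        # i.e. everything from the first . or ( onward.
        printf '%s\n' "${i%%[.(]*}"
    done | sort -u
}

prefixes=$(list_prefixes)
printf '%s\n' "$prefixes"
```

This prints file_123, file_123456789 and foobar, one per line, so the two similar names already get distinct prefixes at this stage.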
Finally, for each prefix ($pfx) we can build the archive using the prefix itself as both the name of the archive ("$pfx.7z") and the pattern to identify the files ($pfx[.\(]*). Requiring a . or ( right after the prefix keeps file_123 from also matching file_123456789(b9).foo. The result is the equivalent of:
7z a file_123.7z 'file_123(abc).foo' 'file_123(b9)(ca)[a1].foo' 'file_123.foo'
7z a foobar.7z 'foobar(123).foo' 'foobar(a).foo' 'foobar.foo'
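For checking the grouping before creating any archive, a dry run can help. The sketch below is not the answer's one-liner but a variant under stated assumptions: a POSIX shell with mktemp, file names without newlines, and a hypothetical helper named plan_archives. It reads the prefixes line by line and quotes "$pfx", so prefixes containing spaces are not split, and it only prints the commands it would run:

```shell
# Scratch directory with sample names from the question (assumes mktemp is available).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 'file_123.foo' 'file_123(abc).foo' 'file_123456789(b9).foo' \
      'foobar(a).foo' 'foobar.foo'

# plan_archives is a hypothetical helper: it prints one 7z command per prefix.
plan_archives() {
    for i in *.foo; do
        printf '%s\n' "${i%%[.(]*}"
    done | sort -u | while IFS= read -r pfx; do
        # Quoted prefix, unquoted bracket and star: only names where the
        # prefix is immediately followed by . or ( are matched.
        set -- "$pfx"[.\(]*
        # Display only; the printed file names are not shell-quoted.
        echo "7z a $pfx.7z $*"
    done
}

plan=$(plan_archives)
printf '%s\n' "$plan"
```

Three commands are printed, and the one for file_123.7z no longer lists file_123456789(b9).foo. To create the archives for real, the echo line could be replaced with 7z a "$pfx.7z" "$@".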