yes. shell beats applescript, no doubt about that.
allow me to share the whole project and specs with you… i probably should have done this earlier, but i thought that with some pointers i could do it myself, and it turned out to be more complex than i expected.
i have a web gallery that runs on our intranet at the office. it has a lot of folders and subfolders (which are the categories that phpwebgallery reads). only the deepest category in each branch holds files, plus two folders (pwg_high and thumbnails).
category 1
|---- category 1.1
      |---- file.jpg
      |---- pwg_high
      |     |---- file.jpg
      |---- thumbnails
            |---- tn-file.jpg
so, if i have 50,000 images, i really have 150,000 files in total (in the category folder there is a low resolution preview file, in pwg_high there is the high resolution file, and in thumbnails there is the thumbnail).
i want to make 2 scripts.
the first one is the complex one. it has to check whether there are any duplicates in that structure. it will have to generate some kind of database of md5s, and then either check them one by one or see whether any md5 is repeated in the database, and identify which files the repeats belong to.
i planned to do this by restricting the file listing to the files inside every folder named ‘pwg_high’, so that only 50 thousand of the total get checked. i wouldn’t use the other two types of files, because they were generated before i came to work here, and a repeated image may have one thumbnail in .gif and the other in .jpg, which makes the md5s of the two thumbnails different even though the image is the same.
this might be slow, yes, but it would only be run once, to check the current files. so i can spare the time. even hours…
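to make the first script more concrete, here is a rough sketch of how i imagine it in shell. it is only a sketch: build_db, list_duplicates and md5_of are names i made up, the db format (one "md5, tab, path" line per file) is an assumption, and it will trip over file names that contain newlines.

```shell
#!/bin/sh
# sketch only: function names and the db format are assumptions.

# print just the md5 of one file (md5sum where available, md5 -q on mac os x)
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | cut -d ' ' -f 1
    else
        md5 -q "$1"
    fi
}

# usage: build_db <gallery root> <db file>
# writes one "<md5><TAB><path>" line for every file below a pwg_high folder,
# skipping hidden files such as .DS_Store
build_db() {
    find "$1" -type f -path '*/pwg_high/*' ! -name '.*' -print |
    while IFS= read -r f; do
        printf '%s\t%s\n' "$(md5_of "$f")" "$f"
    done > "$2"
}

# usage: list_duplicates <db file>
# reads the db twice: first pass counts each md5, second pass prints every
# line whose md5 was seen more than once; sort groups the copies together
list_duplicates() {
    awk -F '\t' 'NR == FNR { count[$1]++; next } count[$1] > 1' "$1" "$1" | sort
}
```

it would be used like `build_db /path/to/gallery md5.db` and then `list_duplicates md5.db`, where both paths are placeholders. keeping the path next to each md5 is what lets the second step say which files a repeated md5 belongs to.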
the second one is part of a larger script.
my currently working script is a folder action.
i have a copy of the whole folder tree that phpwebgallery reads. in every category of that copy i have attached the folder action. when i add a file (a high resolution one), it triggers imagemagick, generates a preview file and a thumbnail, and sends all three files to the proper folder in the other folder tree. it also checks if the name is already taken, and if it is, it adds a counter to the end, so that nothing gets replaced.
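if the folder action moves to shell, the "add a counter when the name is taken" part could look roughly like this. unique_name is a made-up helper, and the "-1", "-2" suffix style is an assumption (my current script may number things differently).

```shell
#!/bin/sh
# sketch only: unique_name and the "-N" suffix style are assumptions.

# usage: unique_name <destination dir> <file name>
# prints the file name unchanged if it is free in the destination dir,
# otherwise the first free name of the form base-1.ext, base-2.ext, ...
unique_name() {
    dir=$1
    name=$2
    case $name in
        *.*) base=${name%.*}; ext=.${name##*.} ;;   # split off the extension
        *)   base=$name; ext= ;;                    # no extension at all
    esac
    candidate=$name
    n=1
    while [ -e "$dir/$candidate" ]; do
        candidate="$base-$n$ext"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}
```

for example `cp "$src" "$dest/$(unique_name "$dest" "$(basename "$src")")"`, with `$src` and `$dest` as placeholders, copies without replacing anything that is already there.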
to this folder action i want to add a check that runs before the added file is processed: it looks up the file’s md5 in the previously made database. if it isn’t there, then continue the process; if it is, then it’s a duplicate, so move it to another folder (any destination, i don’t care).
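the check itself could be sketched like this in shell. again the names (is_duplicate, check_new_file) are made up, the db is assumed to hold one "md5, tab, path" line per file, and appending the new file’s md5 to the db so it stays current is my own assumption, not something i have working.

```shell
#!/bin/sh
# sketch only: function names, db format and the append step are assumptions.

# print just the md5 of one file (md5sum where available, md5 -q on mac os x)
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | cut -d ' ' -f 1
    else
        md5 -q "$1"
    fi
}

# usage: is_duplicate <md5> <db file>
# succeeds if the md5 is the first field of some line in the db
is_duplicate() {
    [ -f "$2" ] &&
    awk -F '\t' -v h="$1" '$1 == h { found = 1; exit } END { exit !found }' "$2"
}

# usage: check_new_file <file> <db file> <duplicates dir>
# returns 0 if the file is new (its md5 is then appended to the db),
# returns 1 if it is a duplicate (the file is then moved out of the way)
check_new_file() {
    sum=$(md5_of "$1")
    if is_duplicate "$sum" "$2"; then
        mkdir -p "$3"
        mv "$1" "$3/"
        return 1
    fi
    # the path recorded here is the one the file had when it was checked
    printf '%s\t%s\n' "$sum" "$1" >> "$2"
    return 0
}
```

the folder action would then only go on to the imagemagick step when `check_new_file "$file" md5.db duplicates` succeeds, with all three arguments as placeholders. note that mv will overwrite a file of the same name that is already in the duplicates folder.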
so… those are the complete specs of my project.
i’m sorry for not sharing all this info earlier, maybe it would have saved us all time. i thought i could do it myself and not take up your time, but this got beyond my capabilities a while ago… thanks for all your trouble.
Marto.