Using a .txt file as a database: why doesn't it work?

Yes, shell beats AppleScript, no doubt about that.
Allow me to share the whole project and its specs with you. I probably should have done this earlier, but I thought that with some pointers I could do it myself, and it turned out to be more complex than I expected.
I have a web gallery that runs on our office intranet. It has a lot of folders and subfolders (which are the categories that PhpWebGallery reads). Only the final category in each branch contains files, plus two folders (pwg_high and thumbnails).

category 1
|-- category 1.1
    |-- file.jpg
    |-- pwg_high
    |   |-- file.jpg
    |-- thumbnails
        |-- tn-file.jpg

So if I have 50,000 images, I really have 150,000 files in total (the category folder holds a low-resolution preview file, pwg_high holds the high-resolution file, and thumbnails holds the thumbnail).

I want to make two scripts.
The first one is the complex one: it has to check whether there are any duplicates in that structure. It will have to generate some kind of database of the MD5 checksums, and then either check the files one by one or look for repeated lines in the database and identify which files they belong to.
I planned to do this by restricting the folder and file listing to the files inside every folder named 'pwg_high', so that only 50,000 of the total files are checked. I wouldn't use the other two types of files, because they were generated before I started working here, and a duplicated image may have one thumbnail in .gif and the other in .jpg, which makes the MD5s of the thumbnails of the same image different.
This might be slow, yes, but it would only be run once, to check the current files, so I can spare the time, even hours.
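For the one-off duplicate scan described above, here is a minimal shell sketch of the "database plus repeated lines" idea. It swaps the nested `find` used later in this thread for a single `find ... -path '*/pwg_high/*'`, writes one `<md5> <path>` line per high-resolution file, and then prints every hash that occurs more than once together with the files that share it. The function names `hash_file`, `build_md5_db` and `report_duplicates` are hypothetical, and it assumes no file name contains a newline.

```shell
# Hypothetical helpers for a one-off duplicate scan of pwg_high folders.

# Print the MD5 hex digest of one file: md5 -q on macOS, md5sum elsewhere.
hash_file() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum < "$1" | cut -d ' ' -f 1
    fi
}

# Write "<md5> <path>" for every file inside a pwg_high folder under $1
# into the database file $2, sorted so identical hashes end up adjacent.
build_md5_db() {
    find "$1" -type f -path '*/pwg_high/*' | while IFS= read -r f; do
        printf '%s %s\n' "$(hash_file "$f")" "$f"
    done | sort > "$2"
}

# Print each hash that appears more than once in database $1,
# followed by the paths of all files that share it.
report_duplicates() {
    awk '{
        h = $1
        p = substr($0, 34)   # 32 hex digits + 1 space, then the path
        n[h]++
        files[h] = files[h] "\n  " p
    }
    END {
        for (h in n) if (n[h] > 1) print h files[h]
    }' "$1"
}
```

Usage would be something like `build_md5_db /path/to/gallery ~/Desktop/signature.txt && report_duplicates ~/Desktop/signature.txt`. Because previews and thumbnails are never hashed, the .gif/.jpg thumbnail mismatch can't hide a duplicate.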

The second one is part of a larger script.
My currently working script is a folder action.
I have a copy of the whole folder tree that PhpWebGallery reads, and I have attached the folder action to every category in it. When I add a file (a high-resolution one), it triggers ImageMagick to generate a preview file and a thumbnail, and sends all three files to the proper folder in the other folder tree. It also checks whether the name is already taken, and if it is, it adds a counter to the end of the name so that nothing gets replaced.
To this folder action I want to add a check before the added file is processed: if its MD5 is not already in the previously made database, continue the process; if it is, it's a duplicate, so move it to another folder (any destination, I don't care).
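The check the folder action needs can be sketched as a small shell function that the action could call through `do shell script` before handing the file to ImageMagick. This is an assumption-laden sketch: `hash_file` and `check_and_route` are hypothetical names, and it only assumes the database is a text file in which each known MD5 hex digest appears somewhere on a line (true for the `md5` output used in this thread).

```shell
# Hypothetical helpers for the folder-action duplicate check.

# Print the MD5 hex digest of one file: md5 -q on macOS, md5sum elsewhere.
hash_file() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum < "$1" | cut -d ' ' -f 1
    fi
}

# If the MD5 of file $1 already appears in database $2, move the file into
# folder $3 and return 1 (duplicate). Otherwise leave it in place and
# return 0 so normal processing can continue. Returns 2 if no hash was made.
check_and_route() {
    h=$(hash_file "$1")
    [ -n "$h" ] || return 2            # unreadable file: do not guess
    if grep -qF "$h" "$2"; then        # fixed-string search for the digest
        mkdir -p "$3"
        mv "$1" "$3"/
        return 1
    fi
    return 0
}
```

A caller would use it as `if check_and_route "$new" signature.txt ~/Duplicates; then ...process the file...; fi`. After a new file has been processed, appending its hash to the database would keep later duplicates of it from slipping through; the original plan doesn't mention that step, so treat it as optional.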

So… those are the complete specs of my project.
I'm sorry for not sharing all this info earlier; maybe it would have saved us all some time. I thought I could do it myself and not take up your time, but this got beyond my capabilities a while ago. Thanks for all your trouble.

Marto.

Marto, tell me a bit more about your folder structure. Is something like this ever possible? (my questions in bold)

category 1
|-- category 1.1
    |-- file.jpg
    |-- pwg_high
    |   |-- file.jpg
    |   |-- Some folder
    |       |-- pwg_high
    |           |-- file1.jpg
    |-- thumbnails
        |-- tn-file.jpg
        |-- pwg_file
            |-- file.jpg

Or will the pwg_high folder always be two levels deep from your "category 1"?

These are the rules for the structure:

- Each category can contain as many sub-categories as you wish.
- A category never contains elements and sub-categories at the same time.
- 'pwg_high' and 'thumbnail' folders are not considered categories and are always together.

So, for example, visually it goes like this:

gallery
|-- category 1              <- the previews go here
|   |-- file.jpg
|   |-- pwg_high            <- the high-resolution files go here
|   |   |-- file.jpg
|   |-- thumbnails          <- the thumbnails go here
|       |-- tn-file.jpg
|-- category 2
    |-- category 2.1
    |-- category 2.2
        |-- file.jpg
        |-- pwg_high
        |   |-- file.jpg
        |-- thumbnails
            |-- tn-file.jpg

Again, the only comparable files are the high-resolution ones. For example, if aFile and bFile are the same file, just renamed, but aFile has its preview and thumbnail in .gif and bFile has them in .jpg, then comparing anything other than the high-resolution files will report that they are not duplicates. So the only filter I see possible is to compare only the files in every 'pwg_high' folder.

thanks.

Marto.

Not a problem, I figured a way around it anyway. brb.

can’t wait to see… :smiley:

Okay, here goes. To ease my testing I made it as two scripts, though it could easily be made into one.

Script 1: This script creates your signature file (database).
It works by searching for every folder named "pwg_high" located anywhere in the chosen folder. It then generates an MD5 hash for every file of the user-specified type (currently .jpg) inside those folders.

--Variables--
set fileName to "*.jpg"
--/Variables--

set tFiles to choose folder with prompt "Generate list of MD5 checksums for these files:" without invisibles
set whereFile to choose folder with prompt "Choose the container of the signature file:"
set nameFile to text returned of (display dialog "Please name the output file:" default answer "")
set sigFilePath to (whereFile as text) & nameFile

getMDs(tFiles, sigFilePath, fileName)

to getMDs(aFolder, sigFilePath, fileName)
	do shell script "find " & quoted form of POSIX path of aFolder & " -name pwg_high -type d -exec find {} -name " & quoted form of fileName & " \\; | xargs md5 >> " & quoted form of POSIX path of sigFilePath
end getMDs

Script 2: Checks for duplicates.
When you run this script and point it at a file and at the signature file, it checks the signature file for a matching hash. If it finds one or more duplicates, it writes the path of the checked file and the matching entries to a log on your desktop.


set ckFile to choose file with prompt "Check signature log for duplicate MD5 checksum of this file:" without invisibles
set sigFile to choose file with prompt "Select your MD5 signature file:" without invisibles
set deskPath to path to desktop as Unicode text
set dupLog to deskPath & "Duplicate_Log.txt"

try
	-- md5 -q prints only the hash; grep -f - reads it from stdin as the search pattern.
	-- When nothing matches, grep exits with status 1, which do shell script raises as an error.
	set matchLine to do shell script "/sbin/md5 -q " & quoted form of POSIX path of ckFile & " | /usr/bin/grep -f - " & quoted form of POSIX path of sigFile
	-- Append the checked path and the matching signature lines to the desktop log
	set fileref to (open for access file dupLog with write permission)
	write "Check file " & quoted form of POSIX path of ckFile & " matches these entries:" & return & return & matchLine & return to fileref starting at eof
	close access fileref
	display dialog "Please check the Duplicate Log for more information" with title "Duplicate(s) found"
on error
	display dialog "Duplicate not found"
end try

Let me know how these go. I tested them briefly and they seemed to work in a variety of situations :smiley:

I've tried the first one, and it returns nothing.
I've tested the shell command on its own in the Terminal, and it returns this:

-bash: syntax error near unexpected token `|'

So I've never tested the second one. :frowning:

Looks like word wrap is the problem.

do shell script "find " & quoted form of POSIX path of aFolder & " -name pwg_high -type d -exec find {} -name " & quoted form of fileName & " \\; |

and

xargs md5 >> " & quoted form of POSIX path of sigFilePath

are all one line!

Here you go, with the line broken safely using AppleScript's continuation character (¬):

--Variables--
set fileName to "*.jpg" -- pattern handed to find's -name test
--/Variables--

set tFiles to choose folder with prompt "Generate list of MD5 checksums for these files:" without invisibles
set whereFile to choose folder with prompt "Choose the container of the signature file:"
set nameFile to text returned of (display dialog "Please name the output file:" default answer "")
set sigFilePath to (whereFile as text) & nameFile

getMDs(tFiles, sigFilePath, fileName)

to getMDs(aFolder, sigFilePath, fileName)
	-- For every pwg_high folder, list the matching files NUL-separated (-print0 / xargs -0)
	-- so paths containing spaces survive, then append their md5 lines to the signature file.
	-- "\\;" in the AppleScript string reaches the shell as \; (the end of the -exec clause).
	do shell script "find " & quoted form of POSIX path of aFolder & " -name pwg_high -type d -exec find {} -type f -name " & ¬
		quoted form of fileName & " -print0 \\; | xargs -0 md5 >> " & quoted form of POSIX path of sigFilePath
end getMDs

Sorry, but no, it doesn't work for me.
It doesn't generate the database, and the Script Editor window displays "" as the result.
I've tested the whole line, and it returns the same error.

Isn't '&' the logical operator that should go there?

::ponders::

Um, I'm not sure what's going on; it's working for me. Are you using Script Editor or Script Debugger? Not that it should matter, though.

Hmm, well, I did it in Script Editor, and yes, my result is "" too, but it does in fact create the file.

i’m using:
Tiger 10.4.8
AppleScript 1.10.7
Script Editor 2.1.1 (81)

The thing is that I copy the shell script into the Terminal exactly the way the script would type it, and it returns that error. I've also tried ||, && and &, which I think are also valid operators in the Terminal, but it returns the same.

Honestly, I don't know where I'm going wrong.
Thanks!

Do me a favor: post the Terminal command you're trying here, with the full paths in, so I can see whether your syntax is somehow being messed up.

This is exactly what's in my Terminal:

Marto:~ Marto$ find /Users/Marto/Desktop/testfiles/ -name pwg_high -type d -exec find {} -name *.tiff \; & xargs md5 >> /Users/Marto/Desktop/signature.txt

/testfiles/ contains folders, which in turn contain folders named pwg_high with a lot of *.tiff files.
On the Desktop, I've tested both creating a file named signature.txt before running the command and not creating it.

Always the same result.

Well, we do have a problem with that. When moving to the Terminal we don't need the doubled escape character (the AppleScript string's \\; reaches the shell as \;), and you can't replace the pipe. Try this Terminal command:

find /Users/Marto/Desktop/testfiles/ -name pwg_high -type d -exec find {} -name "*.tiff" \; | xargs md5 >> /Users/Marto/Desktop/signature.txt
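To make the Terminal version easier to reason about, here is a small sketch of just the file-listing half of that command, wrapped in a hypothetical function `list_high_res`. It shows two points from this exchange: `';'` (quoted) and `\;` are equivalent ways to end `-exec`, and the pattern is quoted so the shell hands `*.tiff` to `find` instead of expanding it itself. It also explains why `&` can't stand in for `|`: `&` runs `find` in the background rather than feeding its output to `xargs`.

```shell
# Hypothetical helper: print every *.tiff file that sits inside a pwg_high
# folder anywhere under $1. ';' here is the same as typing \; in the Terminal.
list_high_res() {
    find "$1" -name pwg_high -type d -exec find {} -type f -name '*.tiff' ';'
}
```

Piping its output on, as in `list_high_res ~/Desktop/testfiles | xargs md5 >> ~/Desktop/signature.txt`, reproduces the command above.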

Well, it didn't return the error; it completed successfully in the Terminal, but my signature.txt is empty…

Sorry! My test files didn't have the extension in their names. Now it works just fine.

jeebus what's going on here LOL

Well, okay. Give me an example of the file structure you have in that test folder; maybe something I didn't consider is throwing it off. I'll recreate the same structure and try it on my end.

LOL OK, now we are getting somewhere. Try the AppleScript now and see how it goes! :slight_smile:

That brings me to my next question:

Why is the extension needed in the shell command? There could be files in my gallery that don't have the extension, and those won't be registered in the database.

(I also ran the script, and it works excellently! THANKS!!!)

Well, a JPEG named file or file.jpg is still a JPEG. But if we want to search, we need a search term, "*.tiff" in our example.

If you want, we could just generate a hash for every file inside a pwg_high folder, regardless of its type. Let me know.
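Here is a minimal sketch of that "hash everything in pwg_high" option. It drops the `-name` pattern entirely, so files without an extension are included, and it skips hidden files such as the Finder's .DS_Store on the assumption that they are never gallery images. `hash_file` and `hash_all_high_res` are hypothetical names, and the output lines are `<md5> <path>` rather than the `md5` tool's default format.

```shell
# Hypothetical helpers: hash every file in any pwg_high folder, whatever its name.

# Print the MD5 hex digest of one file: md5 -q on macOS, md5sum elsewhere.
hash_file() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum < "$1" | cut -d ' ' -f 1
    fi
}

# Print "<md5> <path>" for every non-hidden regular file inside a pwg_high
# folder under $1, with or without an extension.
hash_all_high_res() {
    find "$1" -type f -path '*/pwg_high/*' ! -name '.*' | while IFS= read -r f; do
        printf '%s %s\n' "$(hash_file "$f")" "$f"
    done
}
```

Running `hash_all_high_res ~/Desktop/testfiles >> ~/Desktop/signature.txt` would then register `photo` and `photo.jpg` alike, since the match is on the folder, not the name.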