I have a growing collection of movie files, and I wanted a text file for each one containing that movie's metadata. This script searches IMDB for each movie and writes the text file. The instructions are in the script.
(* This script writes the metadata for movie files to text files by searching the IMDB website. You get one text file for each movie file selected in the Finder *)
(* Please note that this script parses the HTML code of the IMDB website. This is fragile, because the site's layout, and therefore its HTML, changes from time to time, so this script may not always work. *)
--how to use the script--
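-- 1. set the variable "text_file_folderPath" to the path of the folder where you want the text files to be saved. The path must end with a separator (":" for an HFS path, "/" for a POSIX path) because the file name is appended directly to it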
-- 2. select one or more movie files in the front Finder window and run this script
-- 3. the name of the movie is derived from the name of the selected file and IMDB (the Internet Movie Database) is searched for it. The URL of the movie's web page is taken from the search results, and that page is then searched for the metadata. A dialog box shows you the metadata so you can verify it. If it is wrong, click "New Search", enter new search terms, and search again. When the metadata is correct, click the "Write Text File" button and the text file is written to the folder you set in step 1.
-- 4. When the IMDB search does not return a list of results (for example, when IMDB jumps straight to the movie's page), the script falls back to Safari to perform the search, so at times you may see Safari launch and windows open and close in it during this process.
--a tip--
-- sometimes the script cannot get accurate metadata from the IMDB search (maybe the file name isn't correct, etc.), as you'll notice when the metadata is presented to you in the dialog box. If you still cannot find the correct metadata after a couple of searches, the best solution I found is to search the IMDB website manually. Once you find the right page for the movie, copy its movie number (the part after "/title/") from the URL bar at the top of the Safari window. Enter that movie number as a new search in the metadata dialog box and the proper metadata should be found.
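-- the folder the text files are written to; it must end with ":" because the file name is appended to it
set text_file_folderPath to (path to desktop folder as text) -- change this to your own folder
-- get the selected files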
tell application "Finder" to set theFiles to the selection
repeat with aFile in theFiles
-- get the title of the movie
set movie_path to aFile as Unicode text
set nmExt to my getName_andExtension(movie_path)
set movie_file_name to item 1 of nmExt
set movie_title to my stripYear(movie_file_name)
-- search IMDB and get movie metadata
repeat
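with timeout of 3600 seconds -- i.e. do not time out for at least an hour
-- set up the movie title in IMDB search form, i.e. word1+word2+word3 etc.
set search_title to my titleIMDB(movie_title)
-- clear any page left over from a previous file or search so stale metadata is not reused
set movie_page to ""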
try
-- perform the search and find only the top result
set search1Header to "http://www.imdb.com/find?s=tt&q="
try
set top_result to do shell script "curl " & quoted form of (search1Header & search_title) & " | grep -i \"popular titles\""
on error
set top_result to do shell script "curl " & quoted form of (search1Header & search_title) & " | grep -i \"exact matches\""
end try
-- obtain the movie number from the top result
set movie_number to my movieNum(top_result)
-- get the movie web page from imdb using the movie number
set search2Header to "http://www.imdb.com/title/"
set movie_page to do shell script "curl " & quoted form of (search2Header & movie_number & "/") without altering line endings
on error
-- sometimes, instead of presenting a list of movies to pick from, the website jumps directly to the movie's web page. The curl/grep search above fails in that case, so the script does it the hard way and uses Safari to find out which page IMDB landed on.
try
tell application "Safari"
activate
open location (search1Header & search_title)
delay 1
my web_page_loading()
set thisurl to the URL of document 1
tell application "System Events" to tell process "safari"
keystroke "w" using command down
keystroke "h" using command down
end tell
end tell
set movie_page to do shell script "curl " & quoted form of thisurl without altering line endings
end try
end try
-- strip out pertinent info from web page
try
set movie_title to do shell script "echo " & quoted form of movie_page & " | grep -i \"<title>\""
set movie_title to parseTitle(movie_title)
set movie_title to my stripYear(movie_title) -- sometimes the release year is added to the name of the movie
on error
set movie_title to "missing value"
end try
try
set release_date to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"release date:\""
set release_date to my parseReleaseDate(release_date)
on error
try
set release_date to do shell script "echo " & quoted form of movie_page & " | grep -i \"Sections/Years\""
set release_date to my parseReleaseDate2(release_date)
on error
set release_date to "missing value"
end try
end try
try
set the_genre to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"genre:\""
set the_genre to my parseGenre(the_genre)
on error
set the_genre to "missing value"
end try
try
set user_rating to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"<b>user rating:</b>\""
set user_rating to my parseUserRating(user_rating)
on error
set user_rating to "missing value"
end try
try
try
set mpaa_rating to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"mpaa\""
set mpaa_rating to my parseMPAARating(mpaa_rating)
on error
set mpaa_rating to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"certification:\" | grep -i \"usa\""
set mpaa_rating to my parseCertificationRating(mpaa_rating)
end try
on error
set mpaa_rating to "missing value"
end try
try
try
set plot_outline to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"plot outline:\""
on error
try
set plot_outline to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"plot summary:\""
on error
set plot_outline to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"tagline:\""
end try
end try
set plot_outline to my parsePlotOutline(plot_outline)
on error
set plot_outline to "missing value"
end try
try
try
set the_cast to do shell script "echo " & quoted form of movie_page & " | grep -i \"Cast overview, first billed only\""
on error
set the_cast to do shell script "echo " & quoted form of movie_page & " | grep -i \"Credited cast\""
end try
set the_cast to my castMembers(the_cast, 4)
on error
set the_cast to "missing value"
end try
-- fix HTML character references in decimal Unicode form, i.e. special characters written as &#231;
set movie_title to my decHTML_to_string(movie_title)
set release_date to my decHTML_to_string(release_date)
set the_genre to my decHTML_to_string(the_genre)
set user_rating to my decHTML_to_string(user_rating)
set mpaa_rating to my decHTML_to_string(mpaa_rating)
set plot_outline to my decHTML_to_string(plot_outline)
set the_cast to my decHTML_to_string(the_cast)
-- compile the results into a list of records
set ann_records to {{ann_heading:"Full Name", ann_value:movie_title}, {ann_heading:"Copyright", ann_value:release_date}, {ann_heading:"Genre", ann_value:the_genre}, {ann_heading:"Warning", ann_value:user_rating}, {ann_heading:"Special Playback Requirements", ann_value:mpaa_rating}, {ann_heading:"Description", ann_value:plot_outline}, {ann_heading:"Performers", ann_value:the_cast}}
-- display the dialog box to choose the next action
set dialog_text to "Movie Title: " & movie_title & return & "Release Date: " & release_date & return & "Genre: " & the_genre & return & "User Rating: " & user_rating & return & "MPAA Rating: " & mpaa_rating & return & "Plot Outline: " & plot_outline & return & "The Cast: " & the_cast
display dialog dialog_text buttons {"Cancel", "New Search", "Write Text File"} default button 1
set buttonEntered to the button returned of result
if buttonEntered is "New Search" then
repeat
display dialog "Type in a new movie title to search." default answer (item 1 of nmExt) with icon note buttons {"Cancel", "OK"} default button "OK"
set {text_entered, button_pressed} to {text returned, button returned} of the result
if text_entered is not "" then
set movie_title to text_entered
exit repeat
end if
end repeat
else if buttonEntered is "Write Text File" then
set target_file to text_file_folderPath & movie_file_name & ".txt"
my writeTo(dialog_text, target_file, false, string)
exit repeat
end if
end timeout
end repeat
end repeat
(*====================== SUBROUTINES ==========================*)
on titleIMDB(movie_title)
set text item delimiters to "."
set search_title to text items of movie_title
set text item delimiters to space
set search_title to search_title as Unicode text
set text item delimiters to "_"
set search_title to text items of search_title
set text item delimiters to space
set search_title to search_title as Unicode text
set search_title to text items of search_title
set text item delimiters to "+"
set search_title to search_title as Unicode text
set text item delimiters to ""
return search_title
end titleIMDB
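-- Example (illustrative input): titleIMDB("The.Big_Lebowski") returns "The+Big+Lebowski"
-- because "." and "_" are first turned into spaces and each space is then turned into "+"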
on movieNum(the_string)
set text item delimiters to "<a href=\"/title/"
set first_cut to text items of the_string
set part_result to item 2 of first_cut
set text item delimiters to "/"
set second_cut to text items of part_result
set text item delimiters to ""
set movie_number to item 1 of second_cut
return movie_number
end movieNum
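-- Example (illustrative input; "tt0000000" is a placeholder, not a real movie number):
-- movieNum("<a href=\"/title/tt0000000/\">") returns "tt0000000"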
on parseTitle(movie_title)
set text item delimiters to "<title>"
set a to text items of movie_title
set text item delimiters to ""
set movie_title to a as Unicode text
set text item delimiters to "</title>"
set a to text items of movie_title
set text item delimiters to ""
set movie_title to a as Unicode text
return movie_title
end parseTitle
on parseReleaseDate(release_date)
set text item delimiters to return
set a to text items of release_date
set text item delimiters to ""
set release_date to item 2 of a
return release_date
end parseReleaseDate
on parseReleaseDate2(release_date)
set text item delimiters to "</a>"
set a to text items of release_date
set text item delimiters to ""
set release_date to characters -4 thru -1 of (item 1 of a) as Unicode text
return release_date
end parseReleaseDate2
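-- Example (illustrative input): parseReleaseDate2("<a href=\"/Sections/Years/1979/\">1979</a>") returns "1979"
-- i.e. the last four characters before the first "</a>"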
on parseGenre(the_genre)
set remove_strings to {return, " / ", space, "><"}
repeat with a_string in remove_strings
set text item delimiters to a_string
set a to text items of the_genre
set text item delimiters to ""
set the_genre to a as Unicode text
end repeat
set a to characters of the_genre
set the_count to count of a
set the_genre to {}
repeat with i from 1 to the_count
set i_char to item i of a
if i_char is ">" then
repeat with j from (i + 1) to the_count
set j_char to item j of a
if j_char is ":" or j_char is "=" then
copy j + 1 to i
exit repeat
end if
if j_char is "<" then
set end of the_genre to (items (i + 1) thru (j - 1) of a) as Unicode text
copy j + 1 to i
exit repeat
end if
end repeat
end if
end repeat
set text item delimiters to "," & space
set the_genre to the_genre as Unicode text
set text item delimiters to ""
return the_genre
end parseGenre
on parseUserRating(user_rating)
set text item delimiters to return --(ASCII character 10)
set a to text items of user_rating
set user_rating to item 2 of a
set remove_strings to {space, "<b>", "</b>"}
repeat with a_string in remove_strings
set text item delimiters to a_string
set a to text items of user_rating
set text item delimiters to ""
set user_rating to a as Unicode text
end repeat
return user_rating
end parseUserRating
on parseMPAARating(mpaa_rating)
set text item delimiters to return
set a to text items of mpaa_rating
set text item delimiters to ""
set mpaa_rating to item 2 of a
return mpaa_rating
end parseMPAARating
on parseCertificationRating(mpaa_rating)
set text item delimiters to "certificates=USA:"
set a to text items of mpaa_rating
set mpaa_rating to item 2 of a
set text item delimiters to "&&heading="
set a to text items of mpaa_rating
set text item delimiters to ""
set mpaa_rating to item 1 of a
set mpaa_rating to "USA-" & mpaa_rating
return mpaa_rating
end parseCertificationRating
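-- Example (illustrative input): a line containing "certificates=USA:PG-13&&heading=" returns "USA-PG-13"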
on parsePlotOutline(plot_outline)
set text item delimiters to return
set a to text items of plot_outline
set text item delimiters to ""
set plot_outline to item 2 of a
set text item delimiters to "<a class="
set a to text items of plot_outline
set text item delimiters to ""
set plot_outline to item 1 of a
return plot_outline
end parsePlotOutline
on castMembers(the_cast, how_many)
set text item delimiters to "<td class=\"nm\">"
set a to text items of the_cast
set text item delimiters to ""
if how_many > ((count of a) - 1) then set how_many to ((count of a) - 1)
set cast_members to {}
repeat with i from 2 to (how_many + 1)
set end of cast_members to my castMember(item i of a)
end repeat
set text item delimiters to ", "
set cast_string to cast_members as Unicode text
set text item delimiters to ""
if cast_string contains "<a href=" then
set text item delimiters to "</a>"
set b to text items of cast_string
set text item delimiters to ""
set c to b as string
set text item delimiters to ""
set i to 0
set a to ""
repeat until i is (count of c)
set i to i + 1
if item i of c is not "<" then
set a to a & item i of c
else
repeat with j from i to (count of c) -- search for the closing ">" from the current position onward
if item j of c is ">" then
set i to j
exit repeat
end if
end repeat
end if
end repeat
set cast_string to a
end if
return cast_string
end castMembers
on castMember(the_string)
set c to characters of the_string
repeat with i from 1 to (count of c)
set i_char to item i of c
if i_char is ">" then
repeat with j from (i + 1) to (count of c)
set j_char to item j of c
if j_char is "<" then
set real_name to (items (i + 1) thru (j - 1) of c) as Unicode text
exit repeat
end if
end repeat
exit repeat
end if
end repeat
set text item delimiters to "<td class=\"char\">"
set d to text items of the_string
set e to item 2 of d
set text item delimiters to "</td>"
set d to text items of e
set text item delimiters to ""
set char_name to item 1 of d
set cast_member to real_name & " as " & char_name as Unicode text
return cast_member
end castMember
on stripYear(movie_title)
if movie_title contains "(" then
set x to offset of "(" in movie_title
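if (length of movie_title) ≥ (x + 5) and character (x + 5) of movie_title is ")" then -- check the length first so a short "(...)" near the end does not cause an error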
if length of movie_title > (x + 5) then
if character (x - 1) of movie_title is space then
set movie_title to (characters 1 thru (x - 2) of movie_title & characters (x + 6) thru -1 of movie_title) as Unicode text
else
set movie_title to (characters 1 thru (x - 1) of movie_title & characters (x + 6) thru -1 of movie_title) as Unicode text
end if
else
if character (x - 1) of movie_title is space then
set movie_title to (characters 1 thru (x - 2) of movie_title) as Unicode text
else
set movie_title to (characters 1 thru (x - 1) of movie_title) as Unicode text
end if
end if
end if
end if
return movie_title
end stripYear
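-- Examples: stripYear("Alien (1979)") returns "Alien"
-- stripYear("Alien (1979) Director's Cut") returns "Alien Director's Cut"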
on getName_andExtension(F)
set F to F as Unicode text
set {name:Nm, name extension:Ex} to info for file F
if Ex is missing value then set Ex to ""
if Ex is not "" then
set Nm to text 1 thru ((count Nm) - (count Ex) - 1) of Nm
set Ex to "." & Ex
end if
return {Nm, Ex}
end getName_andExtension
on web_page_loading()
set theDelay to 10 -- the time in seconds the script will wait to let a web page load
set numTries to 3 -- the number of stop/reload cycles before giving up
set my_delay to 0.25
set myCounter to 0
set finished to false
repeat until finished is true
set startTime to current date
set myCounter to myCounter + 1
set web_page_is_loaded to false
delay my_delay
tell application "Safari"
activate
repeat until web_page_is_loaded is true
-- check the time and do this if theDelay seconds have not yet elapsed
delay 1
if (startTime + theDelay) > (current date) then
if name of window 1 contains "Loading" then
delay my_delay
else if name of window 1 contains "Untitled" then -- failed
delay 2
if name of window 1 contains "Untitled" then
set web_page_is_loaded to true
set finished to true
display dialog "The web page will not load!" -- shown by Safari, which is the frontmost application here
end if
else if name of window 1 contains "Failed to open page" then
tell application "System Events" to tell process "Safari"
keystroke "." using command down -- stop the page
delay my_delay
keystroke "r" using command down -- reload the page
end tell
delay my_delay
set web_page_is_loaded to true
else
delay my_delay * 6
return true
end if
else -- theDelay seconds have elapsed, so do this
tell application "System Events" to tell process "Safari"
-- if we tried 3 times then give up
if myCounter is numTries then
keystroke "." using command down -- stop the page
return false
else -- try again because we didn't try 3 times yet
keystroke "." using command down -- stop the page
delay my_delay
keystroke "r" using command down -- reload the page
delay my_delay
set web_page_is_loaded to true
end if
end tell
end if
end repeat
end tell
end repeat
end web_page_loading
on decHTML_to_string(the_string)
set {TIDs, text item delimiters} to {text item delimiters, "&#"}
set b to text items of the_string
set text item delimiters to TIDs
set uniList to {item 1 of b}
repeat with i from 2 to (count of b)
set this_string to item i of b
set string_count to count of this_string
set found_terminator to false
repeat with j from 1 to string_count
if item j of this_string is ";" or item j of this_string is "\\" then
set nDec to text 1 thru (j - 1) of this_string -- get the decimal value
set nHex to do shell script "perl -e 'printf(\"%04X\", " & nDec & ")'" -- convert decimal to hex
set uChar to run script "«data utxt" & nHex & "»" -- convert unicode hex to unicode character
if string_count > j then
set u_string to (uChar & (text (j + 1) thru string_count of this_string)) as Unicode text
else
set u_string to uChar
end if
set end of uniList to u_string
set found_terminator to true
exit repeat
end if
end repeat
-- no terminator found, so this was not a character reference: keep the text instead of dropping it
if not found_terminator then set end of uniList to "&#" & this_string
end repeat
return uniList as Unicode text
end decHTML_to_string
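-- Example: decHTML_to_string("Fran&#231;ais") returns "Français"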
on writeTo(this_data, target_file, append_data, mode) -- append_data is true or false, mode is string etc. (no quotes around either)
try
set target_file to target_file as Unicode text
if target_file does not contain ":" then set target_file to POSIX file target_file as Unicode text
set the open_target_file to open for access file target_file with write permission
if append_data is false then set eof of the open_target_file to 0
write this_data to the open_target_file starting at eof as mode
close access the open_target_file
return true
on error
try
close access file target_file
end try
return false
end try
end writeTo
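Step 4 and the comment in the search block explain that the script opens Safari only to discover which page IMDB redirected to. curl can report that by itself: "-L" makes it follow redirects and "-w '%{url_effective}'" prints the URL it finally landed on. Here is a minimal sketch of that swapped-in technique. It is not part of the script, and it is untested against the current IMDB site. The search title is a placeholder, and treating a final URL that contains "/title/" as a movie page is an assumption based on the search2Header URL the script already uses.

```applescript
-- Sketch only: ask curl where IMDB's search redirects to, instead of opening Safari.
-- The search title is a placeholder in the "+"-joined form that titleIMDB produces.
set search_url to "http://www.imdb.com/find?s=tt&q=" & "The+Big+Lebowski"
-- -s: no progress meter, -L: follow redirects, -o /dev/null: discard the page body,
-- -w '%{url_effective}': print the URL curl ended up at
set final_url to do shell script "curl -s -L -o /dev/null -w '%{url_effective}' " & quoted form of search_url
if final_url contains "/title/" then
	-- assumption: IMDB jumped straight to a movie page, so fetch that page directly
	set movie_page to do shell script "curl -s -L " & quoted form of final_url without altering line endings
else
	-- still on a results page, so the script's grep-based search applies instead
	set movie_page to missing value
end if
```

If this works for your searches, it could replace the Safari branch and the web_page_loading handler, which would also remove the need for the System Events keystrokes.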