Editing a string containing a word that can be converted to a number.

estockly · September 4, 2022, 8:40pm

Can’t do anything without seeing samples of your data, and your desired output.

DJUNQUERA · September 4, 2022, 9:04pm

Data samples, and desired output in Response #3:

Possible situations:

Data samples —> desired output

Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6) —> 2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)
(The first word is not the year of production.)
20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0) —> 1907 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)
(The first word is a number, but it does not fit the range of the film’s existence; we put the year of production at the beginning of the name).
1945 The Stranger [Orson Welles] 1945 (7,4) —> 1945 The Stranger [Orson Welles] 1945 (7,4)
(The first word is a number, the year of production, which does fit within the range of the film’s existence. The name is not changed to avoid duplication of the year at the beginning of the file name)
1941 [Steven Spielberg] 1979 (5,5) —> 1941 [Steven Spielberg] 1979 (5,5) (!!!)
(Singular case that would not have a solution, since the title is itself a number that could plausibly be a production year).

technomorph · September 5, 2022, 6:07am

You’ll want to use a RegEx capturing
(\d\d\d\d)

Then you need to add conditions to it
IE it must follow “] “ and be followed by a space.
]\s(\d\d\d\d)\s

I see you have some where it’s in twice like at the beginning.
This won’t capture that.
So as far as cleaning that out it’s gonna take some more work
^(\d\d\d\d)\s
Will capture if it’s at the start of the string.
You may need to add further conditions to establish
that it’s not a movie title made of 4 digits,
which seems to be the case when it is followed by a “[”.
^(\d\d\d\d)\s\w
Will make sure that it’s followed by a space and a word character (letter, digit or underscore)

I can give AppleScript examples later.
RegExs are powerful.
You’ll wanna look at what you’re analyzing and
Try to find key text that is constant to help you

technomorph · September 5, 2022, 6:14am

Also, will you have any text where the year is at the head
but not at the tail, so that it needs to be added there?

IE
1945 The Stranger [Orson Welles]

Mockman · September 5, 2022, 7:55am

Based upon the four examples I would take this approach. I saved the examples as paragraphs within a text file. If your actual data is inconsistent then the results may vary — especially the ‘]’.

The script reads the paragraphs into a list. Working through each list item, it then uses the closing ‘]’ to isolate the release year. It checks to see if the record begins with the release year (only true for The Stranger) and, if it is missing, prepends the release year to the record. When complete, it returns the resulting records as paragraphs.

Hope this is what you’re looking for.

set filmList to paragraphs of (read (choose file) as «class utf8»)
-- {"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", "20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", "1945 The Stranger  [Orson Welles] 1945 (7,4)  ", "1941 [Steven Spielberg] 1979 (5,5)"}

set nList to {}
set AppleScript's text item delimiters to "]"
repeat with filmString in filmList
	set split2 to last text item of filmString
	set rYear to word 1 of split2 --> release year
	
	-- does filmString begin with release year?
	set w1 to word 1 of first text item of filmString
	if w1 is not equal to rYear then
		set end of nList to rYear & space & filmString
	else
		set end of nList to contents of filmString
	end if
	
end repeat
set AppleScript's text item delimiters to linefeed
set newText to nList as text

(*
"2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)
1907 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)
1945 The Stranger  [Orson Welles] 1945 (7,4)  
1979 1941 [Steven Spielberg] 1979 (5,5)"
*)

You might wish to clear out extraneous spaces beforehand — I counted two.

DJUNQUERA · September 5, 2022, 1:22pm

Hello, technomorph.

First of all, I would like to express my gratitude for your willingness to help.

However, the alternative you propose does not seem suitable for my rudimentary knowledge of AppleScript.

Sincere thanks again.

DJUNQUERA · September 5, 2022, 1:25pm

Hello, Mockman

Thank you very much for your change of approach, especially in finding an alternative to place the year at the beginning of the file name valid in any of the above situations.

Although you approach the selection by “choose file” instead of a previously made file selection treated as a list of aliases, it does not change the approach much.

I have put in the examples a simple format in the film name for clarity, but the format of the film consists of some fixed fields (those shown in the examples) and other variables (original title; icons representing nationality and genres; audio: original version / original version subtitled / dual; and, finally, if it has won awards, especially in the Oscars awards)

Year Title (Original Title) [Director]Icons for nationality and genres, audio (OV/OVS/Dual), Year, (score), Awards
Example:

— > 2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations

For this reason it is not possible to use the closing bracket as a reference to locate the year. Using a similar strategy (the first pair of parentheses) I get the original title or also the nationality (the icon immediately after the closing bracket).
We will have to think of another strategy to get the year of production.

However, once the string corresponding to the year is obtained, the way to solve the different options that can be found in the front portion of the file name,


   -- does filmString begin with release year
   set w1 to word 1 of first text item of filmString
   if w1 is not equal to rYear then…

is simple and powerful, and avoids complicated filters referring to date ranges and other unnecessary details. I think it is simply brilliant.

Thank you very much for your valuable help.

DJUNQUERA · September 5, 2022, 1:45pm

Hello, estockly.

Thank you very much for your collection of handlers.

I am reading carefully the code of one of them
(findAndFixNumbers(titleToFix))
and I think some modifications could be made to get the numeric string representing the year.

The iteration on the characters of the name of the selected files, separating the numeric ones from the non-numeric ones seems to me a strategy to consider to obtain later the year and use it to make the year range filters or, even better, use it to use the alternative proposed by Mockman.


   set w1 to word 1 of first text item of filmString
   if w1 is not equal to rYear then
       set end of nList to rYear & space & filmString
   else
       set end of nList to contents of filmString
   end if

Thank you very much for your valuable help.

StefanK · September 5, 2022, 2:53pm

This is a Regular Expression solution with help of the Foundation Framework.

The regex pattern “\]\s(\d{4})” searches for a closing bracket followed by a whitespace character and 4 digits and captures the year information.

The result of the operation is in the variable mappedFilmList


use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

set filmList to {"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", "20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", "1945 The Stranger [Orson Welles] 1945 (7,4) ", "1941 [Steven Spielberg] 1979 (5,5)"}

set regexPattern to "\\]\\s(\\d{4})"
set regex to my (NSRegularExpression's regularExpressionWithPattern:regexPattern options:0 |error|:(missing value))
set mappedFilmList to {}
repeat with aFilm in filmList
	set firstMatch to (regex's firstMatchInString:aFilm options:0 range:{0, (count aFilm)})
	set extractedRange to (firstMatch's rangeAtIndex:1)
	set yearLocation to extractedRange's location() as integer
	
	set extractedText to text (yearLocation + 1) thru (yearLocation + 4) of aFilm
	if contents of aFilm begins with extractedText then
		set end of mappedFilmList to contents of aFilm
	else
		set end of mappedFilmList to extractedText & space & contents of aFilm
	end if
end repeat

The fourth example behaves like the first two.

Edit:

To get the result in your example you have to capture also the first year representation, if present.

use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

set filmList to {"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", "20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", "1945 The Stranger [Orson Welles] 1945 (7,4) ", "1941 [Steven Spielberg] 1979 (5,5)"}

set regexPattern to "(\\d{4})?[^]]+\\]\\s(\\d{4})"
set regex to my (NSRegularExpression's regularExpressionWithPattern:regexPattern options:0 |error|:(missing value))
set mappedFilmList to {}
repeat with aFilm in filmList
	set firstMatch to (regex's firstMatchInString:aFilm options:0 range:{0, (count aFilm)})
	set extractedPrefix to (firstMatch's rangeAtIndex:1)
	set hasYearPrefix to extractedPrefix's |length| = 4
	set extractedRange to (firstMatch's rangeAtIndex:2)
	set yearLocation to extractedRange's location() as integer
	
	set extractedText to text (yearLocation + 1) thru (yearLocation + 4) of aFilm
	if contents of aFilm begins with extractedText or hasYearPrefix then
		set end of mappedFilmList to contents of aFilm
	else
		set end of mappedFilmList to extractedText & space & contents of aFilm
	end if
end repeat

The first part “(\d{4})?[^]]+” of the pattern means: Search for 4 digits (optional), capture the value and ignore all subsequent characters which are not “]”.

technomorph · September 5, 2022, 3:11pm

The magic that you’re looking for is in the Pattern and the Replace

Pattern (note it will be different in AppleScript, as you have to double-escape things):
^(\d\d\d\d(?=\s\w))?\s?(.*?)\s+\[(.*)\]\s+(\d\d\d\d).*?$

The Replace:
$4 - $2 [$3] ($4)
with the replace I’ve added in a " - " between the YEAR and the Title to help further in the future
I’ve also surrounded the ending year with (19xx) again to help with matching in the future.

This code runs very fast.
It only fails when a TITLE starts out the line with 4 digits followed by a space and a letter…
For:
2001: A Space Odyssey [Stanley Kubrick] 1968
it works because of the colon after 2001.
But For:
2001 A Space Odyssey [Stanley Kubrick] 1968
it fails.

Also notice it catches and eliminates any extra space at the end or between things.

Screenshots from the program RegexKit (images not included): main pattern with breakdown, group match captures, and generated code.

see next post for AppleScript

technomorph · September 5, 2022, 3:21pm

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

property NSRegularExpression : a reference to current application's NSRegularExpression
property NSRegularExpressionCaseInsensitive : 1 -- 1 << 0
property NSRegularExpressionUseUnicodeWordBoundaries : 64 -- 1 << 6
property NSRegularExpressionAnchorsMatchLines : 16 -- 1 << 4
property NSRegularExpressionSearch : 1024 -- NSString compare option
property NSString : a reference to current application's NSString

property myTestName : ""

property mySourceA : ""
property mySourceB : ""
property myPattern1 : ""
property myPattern2 : ""
property myReplace : ""

property myTestA1 : ""
property myTestA2 : ""

property myTestB1 : ""
property myTestB2 : ""
property myTestExpect1 : ""
property myTestExpect2 : ""

property logRegEx : true
property logResults : true
property logDebug : false



-- RUN TEMPLATE

-- \\b(WAV|24 bit|96|19\\.2)\\b
-- NEED FLAC MISSING BAD LOW REPLACE NOT LIVE

set aWordsPattern1 to "^(\\d\\d\\d\\d(?=\\s\\w))?\\s?(.*?)\\s+\\[(.*)\\]\\s+(\\d\\d\\d\\d).*?$"
set aWordsPattern2 to ""
set aSource1 to "20,000 Leagues Under the Sea [Georges Méliès] 1907"
set aSource2 to "1941 [Steven Spielberg] 1979"
set aReplace to "$4 - $2 [$3] ($4)"

my testRegWithName:"MOVIE FILE NAME SCANNING" pattern1:aWordsPattern1 pattern2:aWordsPattern2 source1:aSource1 source2:aSource2 replaceWith:aReplace expecting1:"" expecting2:""


-- MAIN SCRIPT OBJECT FUNCTIONS
on testRegWithName:aName pattern1:patternNo1 pattern2:patternNo2 ¬
	source1:sourceA source2:sourceB replaceWith:aReplace ¬
	expecting1:expectNo1 expecting2:expectNo2
	my resetValues()
	set myTestName to aName
	if not patternNo1 is "" then set myPattern1 to patternNo1
	if not patternNo2 is "" then set myPattern2 to patternNo2
	if not sourceA is "" then set mySourceA to sourceA
	if not sourceB is "" then set mySourceB to sourceB
	if not aReplace is "" then set myReplace to aReplace
	if not expectNo1 is "" then set myTestExpect1 to expectNo1
	if not expectNo2 is "" then set myTestExpect2 to expectNo2
	
	my runTestA()
	my runTestB()
	if logResults then my logTestResults()
end testRegWithName:pattern1:pattern2:source1:source2:replaceWith:expecting1:expecting2:

on resetValues()
	set myTestName to ""
	set myPattern1 to "NONE"
	set myPattern2 to "NONE"
	set mySourceA to "NONE"
	set mySourceB to "NONE"
	set myReplace to ""
	
	set myTestA1 to "NONE"
	set myTestA2 to "NONE"
	
	set myTestB1 to "NONE"
	set myTestB2 to "NONE"
	
	set myTestExpect1 to "NONE"
	set myTestExpect2 to "NONE"
end resetValues

on runTestA()
	if mySourceA is "NONE" then
		return
	end if
	if not myPattern1 is "NONE" then
		set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
	end if
	if not myPattern2 is "NONE" then
		set myTestA2 to my findInString:mySourceA withPattern:myPattern2 replaceWith:myReplace
	end if
end runTestA

on runTestB()
	if mySourceB is "NONE" then
		return
	end if
	if not myPattern1 is "NONE" then
		set myTestB1 to my findInString:mySourceB withPattern:myPattern1 replaceWith:myReplace
	end if
	if not myPattern2 is "NONE" then
		set myTestB2 to my findInString:mySourceB withPattern:myPattern2 replaceWith:myReplace
	end if
end runTestB

on logTestResults()
	log ("------------------------------------------- TEST RESULTS LOG")
	log {"----------------myTestName is", myTestName}
	
	log {"myPattern1 is", myPattern1}
	log {"myPattern2 is", myPattern2}
	log {"myReplace is", myReplace}
	
	log {"--------------mySourceA is", mySourceA}
	
	log {"myTestA1 is", myTestA1}
	log {"myTestA2 is", myTestA2}
	if not myTestExpect1 is "NONE" then
		log {"myTestExpect1 is", myTestExpect1}
	end if
	
	log {"--------------mySourceB is", mySourceB}
	log {"myTestB1 is", myTestB1}
	log {"myTestB2 is", myTestB2}
	if not myTestExpect2 is "NONE" then
		log {"myTestExpect2 is", myTestExpect2}
	end if
end logTestResults

-- MAIN FUNCTIONS


on findInString:aString withPattern:aRegExString replaceWith:aReplace
	set aRegEx to my createRegularExpressionWithPattern:aRegExString
	if logDebug then
		log {"aRegEx is:", aRegEx}
	end if
	return (my findInString:aString withRegEx:aRegEx replaceWith:aReplace)
end findInString:withPattern:replaceWith:

on findInString:aString withRegEx:aRegEx replaceWith:aReplace
	if logDebug then log ("findInString:withRegEx:replaceWith: START")
	set aSource to NSString's stringWithString:aString
	set aRepString to NSString's stringWithString:aReplace
	set aLength to aSource's |length|()
	set aRange to (current application's NSMakeRange(0, aLength))
	set aCleanString to (aRegEx's stringByReplacingMatchesInString:aSource options:0 range:aRange withTemplate:aRepString)
	
	return aCleanString
end findInString:withRegEx:replaceWith:

on createRegularExpressionWithPattern:aRegExString
	if (class of aRegExString) is equal to (NSRegularExpression's class) then
		log ("it already was a RegEx")
		return aRegExString
	end if
	set aPattern to NSString's stringWithString:aRegExString
	set regOptions to NSRegularExpressionCaseInsensitive + NSRegularExpressionUseUnicodeWordBoundaries
	set {aRegEx, aError} to (NSRegularExpression's regularExpressionWithPattern:aPattern options:regOptions |error|:(reference))
	if (aError ≠ missing value) then
		log {"regEx failed to create aError is:", aError}
		log {"aError debugDescrip is:", aError's debugDescription()}
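		-- "break" is not an AppleScript command; stop with an explicit error instead
		error "Could not create a regular expression from the pattern"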
	end if
	return aRegEx
end createRegularExpressionWithPattern:



on createPatternForMatchAnyWords:aLine
	set aString to NSString's stringWithString:aLine
	set aArray to aString's componentsSeparatedByString:" "
	set aPattern to NSString's stringWithString:"\\b("
	if (logRegEx) then
		log {"createPatternForMatchAnyWords aArray is:", aArray}
	end if
	
	set aTotal to (aArray's |count|())
	repeat with i from 1 to aTotal
		set aWord to aArray's item i
		set aWord to (aWord's stringByReplacingOccurrencesOfString:"%" withString:" ")
		set aWordPattern to (NSRegularExpression's escapedPatternForString:aWord)
		if (i ≠ aTotal) then
			set aWordPattern to (aWordPattern's stringByAppendingString:"|")
		end if
		if (logRegEx) then
			log {"aWord is:", aWord}
			log {"aWordPattern is:", aWordPattern}
		end if
		set aPattern to (aPattern's stringByAppendingString:aWordPattern)
	end repeat
	set aPattern to aPattern's stringByAppendingString:")\\b"
	if (logRegEx) then
		log {"final pattern is:", aPattern}
	end if
	return aPattern
end createPatternForMatchAnyWords:


on createPatternForMatchAllWords:aLine
	set aString to NSString's stringWithString:aLine
	set aArray to aString's componentsSeparatedByString:" "
	set aPattern to NSString's stringWithString:"^"
	if (logRegEx) then
		log {"createPatternForMatchAllWords aArray is:", aArray}
	end if
	
	repeat with i from 1 to (aArray's |count|())
		set aWord to aArray's item i
		if ((aWord's |length|()) > 1) then
			set aWordPattern to (my createPatternForMatchWord:aWord)
		else
			set aWordPattern to (my createPatternForMatchLetter:aWord)
		end if
		if (logRegEx) then
			log {"aWordPattern is:", aWordPattern}
		end if
		set aPattern to (aPattern's stringByAppendingString:aWordPattern)
	end repeat
	set aPattern to aPattern's stringByAppendingString:".*$"
	if (logRegEx) then
		log {"final pattern is:", aPattern}
	end if
	return aPattern
end createPatternForMatchAllWords:

-- (?=.*\\bYou\\b)
on createPatternForMatchWord:aWord
	set aWordPattern to NSString's stringWithString:"(?=.*\\b"
	set aWordPattern to (aWordPattern's stringByAppendingString:aWord)
	set aWordPattern to (aWordPattern's stringByAppendingString:".?\\b)")
	return aWordPattern
end createPatternForMatchWord:

on createPatternForMatchLetter:aWord
	set aWordPattern to NSString's stringWithString:"(?=.*\\b"
	set aWordPattern to (aWordPattern's stringByAppendingString:aWord)
	set aWordPattern to (aWordPattern's stringByAppendingString:".{0,2}\\b)")
	return aWordPattern
end createPatternForMatchLetter:
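
For a one-off rename, the same pattern and replace template can also be applied without the test harness. What follows is only a minimal sketch under stated assumptions, not part of the script above: the handler name normaliseFilmName is made up for illustration, and it relies on NSString's stringByReplacingOccurrencesOfString:withString:options:range: with the NSRegularExpressionSearch option rather than on an NSRegularExpression object.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set cleanedName to my normaliseFilmName("20,000 Leagues Under the Sea [Georges Méliès] 1907")
--> "1907 - 20,000 Leagues Under the Sea [Georges Méliès] (1907)"

-- Hypothetical handler: applies the pattern and the replace template from the posts above.
on normaliseFilmName(theName)
	-- Every backslash of the regex is doubled inside an AppleScript string literal
	set thePattern to "^(\\d\\d\\d\\d(?=\\s\\w))?\\s?(.*?)\\s+\\[(.*)\\]\\s+(\\d\\d\\d\\d).*?$"
	set theTemplate to "$4 - $2 [$3] ($4)"
	set sourceString to current application's NSString's stringWithString:theName
	set fullRange to current application's NSMakeRange(0, sourceString's |length|())
	-- NSRegularExpressionSearch makes NSString treat the search string as a regex
	set theResult to sourceString's stringByReplacingOccurrencesOfString:thePattern withString:theTemplate options:(current application's NSRegularExpressionSearch) range:fullRange
	return theResult as text
end normaliseFilmName
```

A name that the pattern does not match comes back unchanged. That includes the longer format where the closing bracket is followed directly by icons instead of a space and the year.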

DJUNQUERA · September 5, 2022, 4:57pm

Hello, StefanK and technomorph

You both propose a pattern-based resource that is totally unknown to me, but considering your comments, it seems worth knowing about.

I will read and study carefully the examples you send me and I would be grateful if you could tell me where I can find information about it.

Although the examples I have chosen to clearly state my question lead one to think that there is a pattern in relation to the closing bracket and the 4 characters indicating the year, in reality there is not since, as I state in comment #12, there are fixed fields and others that are variable.

“I have put in the examples a simple format in the film name for clarity, but the format of the film consists of some fixed fields (those shown in the examples) and other variables (original title; icons representing nationality and genres; audio: original version / original version subtitled / dual; and, finally, if it has won awards, especially in the Oscars awards)”

It is also possible to find an underscore at the beginning of the filename indicating that this movie has already been seen by me.

_Year Title (Original Title) [Director]Icons for nationality and genres, audio (OV/OVS/Dual), Year, (n,n), Awards

Example:

— > _2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations

Thank you very much for all your help.

estockly · September 5, 2022, 6:17pm

Typo…

Have a look at this. Seems to do what you want, but I’m still not sure what the purpose is. You can handle titles like 1941 differently.

Later you said some titles may have an _ and I don’t know how that would work. Would you want the script to ignore it? Would it go before or after the first number?


set titleInfo to {¬
	"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
	"20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", ¬
	"1945 The Stranger  [Orson Welles] 1945 (7,4)", ¬
	"1941 [Steven Spielberg] 1979 (5,5)"}

set AppleScript's text item delimiters to {" [", "] ", " ("}
set fixedTitles to {}
repeat with thisTitle in titleInfo
	set thisTitleInfo to text items of thisTitle
	set {titleText, creator, productionYear, otherInfo} to thisTitleInfo
	try
		set titleNumber to word 1 of titleText as number
		set prodYear to productionYear as number
		if not titleNumber = prodYear then
			if (titleNumber > 1894) and titleNumber < (prodYear + 2) then
				set titleText to "?" & productionYear & "? " & titleText
			else
				set titleText to productionYear & " " & titleText
				
			end if
		end if
	on error
		set titleText to productionYear & " " & titleText
	end try
	set the end of fixedTitles to titleText & " [" & creator & "] " & productionYear & " (" & otherInfo
end repeat
return fixedTitles

--		{"2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
--		"1907 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", ¬
--		"1945 The Stranger  [Orson Welles] 1945 (7,4)", ¬
--		"?1979? 1941 [Steven Spielberg] 1979 (5,5)"}

StefanK · September 5, 2022, 7:14pm

Please try this, it captures any 4 digit combination after the first captured group

As already mentioned by others, regular expressions are a very powerful way to parse strings.
There are many tutorials.

use AppleScript version "2.5"
use framework "Foundation"
use scripting additions


set filmList to {"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
	"20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", ¬
	"1945 The Stranger [Orson Welles] 1945 (7,4) ", ¬
	"1941 [Steven Spielberg] 1979 (5,5)", ¬
	"_2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations"}

set regexPattern to "(_?\\d{4})?.+(\\d{4})"
set regex to my (NSRegularExpression's regularExpressionWithPattern:regexPattern options:0 |error|:(missing value))
set mappedFilmList to {}
repeat with aFilm in filmList
	set firstMatch to (regex's firstMatchInString:aFilm options:0 range:{0, (count aFilm)})
	set extractedPrefix to (firstMatch's rangeAtIndex:1)
	set hasYearPrefix to extractedPrefix's |length| = 4 or extractedPrefix's |length| = 5
	set extractedRange to (firstMatch's rangeAtIndex:2)
	set yearLocation to extractedRange's location() as integer
	set cocoAFilm to my (NSString's stringWithString:(contents of aFilm))
	
	set extractedText to text (yearLocation + 1) thru (yearLocation + 4) of aFilm
	if hasYearPrefix and extractedPrefix's |length|() = 5 then
		set end of mappedFilmList to text 2 thru -1 of contents of aFilm
	else if contents of aFilm begins with extractedText or hasYearPrefix then
		set end of mappedFilmList to contents of aFilm
	else
		set end of mappedFilmList to extractedText & space & contents of aFilm
	end if
end repeat

estockly · September 5, 2022, 7:16pm

Here’s a guess at how to handle underscores:


use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions


set titleInfo to {¬
	"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
	"20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", ¬
	"1945 The Stranger  [Orson Welles] 1945 (7,4)", ¬
	"1941 [Steven Spielberg] 1979 (5,5)", ¬
	"_2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations"}

set AppleScript's text item delimiters to {" [", "] ", " (", "_"}
set fixedTitles to {}
repeat with thisTitle in titleInfo
	set thisTitleInfo to text items of thisTitle
	if item 1 of thisTitleInfo is not "" then
		set {titleText, creator, productionYear, otherInfo} to thisTitleInfo
		try
			set titleNumber to word 1 of titleText as number
			set prodYear to productionYear as number
			if not titleNumber = prodYear then
				if (titleNumber > 1894) and titleNumber < (prodYear + 2) then
					set titleText to "?" & productionYear & "? " & titleText
				else
					set titleText to productionYear & " " & titleText
					
				end if
			end if
		on error
			set titleText to productionYear & " " & titleText
		end try
	else
		set {titleText, creator, productionYear, otherInfo} to the rest of thisTitleInfo
		
		set titleText to "_" & titleText
	end if
	set the end of fixedTitles to titleText & " [" & creator & "] " & productionYear & " (" & otherInfo
end repeat
return fixedTitles

--{"2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
--"1907 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", ¬
--"1945 The Stranger  [Orson Welles] 1945 (7,4)", ¬
--"?1979? 1941 [Steven Spielberg] 1979 (5,5)", ¬
--"_2021 Drive My Car [Doraibu mai kâ)] Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations"}

You say your data is inconsistent. If it contains additional [ or ] or ) characters then this won’t work.

KniazidisR · September 5, 2022, 7:56pm

The tutorial on using regular expressions by @StefanK and @technomorph is certainly useful for users and impressive.

But in this particular case, I would just get the word -2 of each file’s basename and stick it to the beginning of the same name if it’s not already there. It would be 5 lines of code.
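
A minimal sketch of a word-based approach along those lines, under stated assumptions. Because the real names can carry awards text after the score, it scans the words from the end for the last four-digit word instead of taking word -2. The handler names are hypothetical, and keeping a leading underscore in front of the year is an assumption about the desired result.

```applescript
set sampleName to "Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)"
set fixedName to prependYear(sampleName)
--> "2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)"

-- Hypothetical handler: put the release year in front unless it is already there.
on prependYear(theName)
	set theYear to lastYearWord(theName)
	if theYear is "" then return theName -- no four-digit word found, leave the name alone
	-- Assumption: a leading "_" (film already watched) stays in front of the year
	set prefixMark to ""
	set bareName to theName
	if theName begins with "_" then
		set prefixMark to "_"
		set bareName to text 2 thru -1 of theName
	end if
	if bareName begins with theYear then return theName
	return prefixMark & theYear & space & bareName
end prependYear

-- Returns the last word made of exactly four digits, or "" if there is none.
on lastYearWord(theName)
	set theWords to words of theName
	repeat with i from (count theWords) to 1 by -1
		set aWord to item i of theWords
		if (count aWord) is 4 and isAllDigits(aWord) then return aWord
	end repeat
	return ""
end lastYearWord

-- True when every character of someText is an ASCII digit.
on isAllDigits(someText)
	repeat with i from 1 to (count someText)
		if character i of someText is not in "0123456789" then return false
	end repeat
	return true
end isAllDigits
```

The sketch would pick the wrong number if a four-digit word follows the year, for example a year mentioned in the awards text, so real file names should be checked before renaming anything.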

estockly · September 5, 2022, 8:03pm

It’s not quite as simple as that but it is pretty simple. There is a strange request about going back to 1894 (the beginning of movies, I guess) and using different info in that case.

Was it on this forum where I heard a quote like this:

I had a problem I was trying to fix with coding, so I used RegEx. Now I have two problems.

Mockman · September 5, 2022, 10:20pm

I used the ‘choose file’ so I could easily paste a functioning script here but use whatever serves your purposes.

As to the additional fields, it still should be possible to separate the fields so that you can retrieve the release year.

Since it will require multiple passes, any ‘set delimiter’ must be inside the repeat loop. We can add a second pass which will get the year. The beginning of the repeat loop should now look like this:

repeat with filmString in filmList
	
	set AppleScript's text item delimiters to "]"
	set split1 to last text item of filmString
	--> Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations
	set AppleScript's text item delimiters to "("
	set rYear to last word of first text item of split1
	--> Various icons Dual 2021 
	--> 2021

Final output:
“2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)
1907 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)
1945 The Stranger [Orson Welles] 1945 (7,4)
1979 1941 [Steven Spielberg] 1979 (5,5)
2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations”

If there are records that are still more complex, you can still likely dig deeper using more splits and possibly some if/then statements. Hope this helps.
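
Put together with the earlier script, the two-pass version might look like the sketch below. It is an assembly of the snippets above, not a tested solution for the real file names. The list is inlined instead of being read from a file, and the delimiters are saved and restored at the end; both of those changes are additions. It also assumes that every record has a ‘]’ followed later by a ‘(’.

```applescript
set filmList to {"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
	"1945 The Stranger [Orson Welles] 1945 (7,4)", ¬
	"2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations"}

set savedDelimiters to AppleScript's text item delimiters
set nList to {}
repeat with filmString in filmList
	set filmText to contents of filmString
	
	-- pass 1: keep everything after the last "]"
	set AppleScript's text item delimiters to "]"
	set split1 to last text item of filmText
	-- pass 2: the release year is the last word before the first "("
	set AppleScript's text item delimiters to "("
	set rYear to last word of first text item of split1
	
	-- prepend the year only when the record does not already begin with it
	if word 1 of filmText is not equal to rYear then
		set end of nList to rYear & space & filmText
	else
		set end of nList to filmText
	end if
end repeat

set AppleScript's text item delimiters to linefeed
set newText to nList as text
set AppleScript's text item delimiters to savedDelimiters
return newText
```

With these three records only the first one changes: it gains the "2003 " prefix, while the other two already begin with their release year.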

Mockman · September 5, 2022, 11:19pm

That’s an old one.

I don’t remember where I first read it but I recall reading this 2008 blog post by Jeff Atwood, referring to a 1997 post (which I think I read but after it was written): https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/

It apparently goes back much further than that and was originally directed towards awk. Someone who looked into it wrote about their findings here: http://regex.info/blog/2006-09-15/247

DJUNQUERA · September 7, 2022, 3:46pm

Many thanks to all of you who have helped me.

You have not only helped me to catch a fish, but to receive valuable fishing lessons.

Regards.