Hi, I have searched the forum for code snippets to perform this function but haven’t found the right bits to do the job.
I am attempting to extract data strings from voluminous text files. Each string is bounded by two unique characters, let’s say # on the front end and * on the back end. The string between them is of variable length, and I am unable to use the traditional find and replace functions in typical text editors to achieve my objective.
Could someone point me in the right direction? I am a code newbie, but am looking forward to tackling this project.
As already noted, the Code Exchange forum is for posting full snippets/solutions. In the future, please use the other forums for help requests. This post will be moved.
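-- Save the current delimiters so they can be restored afterwards
set ASTID to AppleScript's text item delimiters
-- Each match ends with "*" plus a linefeed, so splitting on that pair drops the trailing "*"
set AppleScript's text item delimiters to {("*" & (ASCII character 10))}
choose file with prompt "Find text in this file:" without invisibles
get quoted form of POSIX path of result
-- grep -o prints only the matched text; colrm 1 1 removes the leading "#"
do shell script "grep -o '#.\\+\\*' " & result & " | colrm 1 1" without altering line endings
-- The last text item is the empty string after the final delimiter, so skip it
set textItems to text items 1 thru -2 of result
set AppleScript's text item delimiters to ASTID
return textItems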
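The pipeline inside the do shell script call can also be tried on its own in Terminal. Below is a minimal sketch, not the poster's exact command: the sample text and the function name extract_fields are made up for illustration, \{1,\} stands in for \+, and sed replaces colrm. It also shows one thing to watch for: because "." matches "*" too, the pattern from the script treats two fields on the same line as one long match, whereas [^*] stops at the first "*".

```shell
#!/bin/sh
# Hypothetical sample input; the first line holds two delimited fields.
sample='id #alpha* and #beta* end
no markers here
tail #gamma*'

# Greedy form, as in the script above: "." also matches "*",
# so the two fields on the first line come back as a single match.
printf '%s\n' "$sample" | grep -o '#.\{1,\}\*'

# Bounded form: [^*] cannot cross a "*", so every field is matched separately.
# sed then strips the leading "#" and the trailing "*" from each match.
extract_fields() {
    grep -o '#[^*]\{1,\}\*' | sed -e 's/^#//' -e 's/\*$//'
}

printf '%s\n' "$sample" | extract_fields
```

If the data never has more than one field per line, the two forms should behave the same.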
Edit: Small change to the shell script. Given this sample data:
Many thanks for the gracious bump and the code. This is a great teaching example, and I worked with your first code example last night but couldn’t get it to work just right. I thought at first that I chose poor delimiters, as they are common in HTML source files, but I will work with your second example and see if I can make a go of it!
If your text is in a variable already, you can use this code with variable delimiters (there are probably other characters that need to be escaped and this assumes your delimiters are single characters only):
--set the_text to (read (choose file))
set the_text to "1234567890asdfghjkl
.qwerty#
hELLO<_wORLD/
#FindMe!*<blah/>
aeiou#Find this too.*[-]
``~+-*/=fin"
set {start_delim, end_delim} to {"#", "*"}
set found_text to my find_delimited_text(the_text, start_delim, end_delim)
-->{"FindMe!", "Find this too."}
on find_delimited_text(the_text, start_delim, end_delim)
set {escaped_start_delim, escaped_end_delim} to {my escaped_delim(start_delim), my escaped_delim(end_delim)}
set ASCII_10 to (ASCII character 10)
tell (a reference to my text item delimiters)
set {old_tid, contents} to {contents, {ASCII_10}}
set {the_text, contents} to {(the_text's paragraphs) as Unicode text, {end_delim & ASCII_10}}
end tell
set found_text to (do shell script "echo " & quoted form of the_text & " | grep -o '" & escaped_start_delim & ".\\+" & escaped_end_delim & "' | colrm 1 1" without altering line endings)'s text items 1 thru -2
tell (a reference to my text item delimiters) to set contents to old_tid
return found_text
end find_delimited_text
on escaped_delim(the_delim)
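-- Escape only what is special in grep's basic regular expressions.
-- ? ( and ) are literal there when unescaped, so they are left alone; $ is added.
if the_delim is in "*.[]^$\\" then return "\\" & the_delim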
return the_delim
end escaped_delim
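For anyone who wants to test the escaping idea outside AppleScript, here is a rough shell sketch of the same approach. The helper names bre_escape and extract_delimited are hypothetical, and it assumes single-character delimiters, as the handler above does. It escapes only the characters that are special in a basic regular expression and uses a bracket expression so that a match stops at the first end delimiter.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not from the thread.

# Backslash-escape the characters that are special in a basic regular expression.
bre_escape() {
    printf '%s' "$1" | sed 's/[[\.*^$]/\\&/g'
}

# Read text on stdin and print each field found between the two delimiters.
# Assumes both delimiters are single characters.
extract_delimited() {
    start=$(bre_escape "$1")
    end=$(bre_escape "$2")
    # [^X] keeps a match from running past the first end delimiter.
    grep -o "${start}[^$2]\{1,\}${end}" |
    while IFS= read -r match; do
        match=${match#?}   # drop the leading delimiter
        match=${match%?}   # drop the trailing delimiter
        printf '%s\n' "$match"
    done
}

# Usage with three lines taken from the sample text above.
printf '%s\n' '.qwerty#' '#FindMe!*<blah/>' 'aeiou#Find this too.*[-]' |
    extract_delimited '#' '*'
```

With those lines the sketch prints FindMe! and Find this too., the same two items the AppleScript handler returns.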
Thank you so much for your help. Between the examples you provided, I have been tearing through my data files extracting away, and I’ve discovered the power behind AppleScript. I really appreciate the help. This is a fantastic forum and you are a great resource!