In need of help with tokenizing

Ray_Barber · July 2, 2003, 5:38pm

If anybody would help me out with this, that would be totally awesome…

I am trying to analyze a small block of text in a plain text file (about 2500 bytes), and remove any unnecessary returns. I then need the modified text to be written into a new text file.

I figured that the best way to do this was with ACME’s tokenize command. I was thinking that if I could get the text to tokenize by returns, I could then remove any blank returns. I only need one return between the lines, not multiple.

I can make the text into return-delimited tokens, but I cannot remove/delete the excess returns. I’m sure it can be done, I just don’t know how.

The source text kinda looks like this:
"blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah

blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah

blah blah blah blah blah blah blah blah

blah blah blah blah blah blah blah blah"

Seriously, if any one could help, I would be very grateful!

Ray_Barber · July 2, 2003, 5:46pm

You could use two returns as the delimiter to break the string into a list, change the delimiter to one return, and finally convert the list back to a string.

hope this helps.
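A minimal vanilla sketch of that two-return idea, written as a handler (the name `collapseBlankLines` is hypothetical, not from the thread). It assumes Mac line endings (`return`, ASCII 13). One catch: a single split on two returns still leaves a double return wherever three or more returns appear in a row, so the sketch repeats the split-and-rejoin until none are left.

```applescript
-- Hypothetical handler: collapse any run of returns down to a single return
-- using only text item delimiters. Assumes Mac (CR) line endings.
on collapseBlankLines(theText)
	set oldDelims to AppleScript's text item delimiters
	-- A run of 3+ returns still contains a double return after one pass,
	-- so keep splitting on two returns and rejoining with one until none remain.
	repeat while theText contains (return & return)
		set AppleScript's text item delimiters to (return & return)
		set theItems to text items of theText
		set AppleScript's text item delimiters to return
		set theText to theItems as string
	end repeat
	-- Restore whatever delimiters the caller had set
	set AppleScript's text item delimiters to oldDelims
	return theText
end collapseBlankLines
```

Because it restores the previous text item delimiters before returning, it can be dropped into a larger script without side effects on later splits.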

jonn8 · July 2, 2003, 6:37pm

Using vanilla AppleScript:

property CR : (ASCII character 13)
property NL : (ASCII character 10)

set the_file to "path:to:file"
set the_string to read file the_file
-- split on linefeeds (Unix line endings); a classic Mac text file uses CR instead
set AppleScript's text item delimiters to NL
set the_list to (every text item of the_string) as list
set AppleScript's text item delimiters to ""
set return_string to ""
repeat with i from 1 to count of the_list
	set sub_string to item i of the_list
	-- keep only non-empty lines, each followed by a single line ending
	if sub_string is not "" then set return_string to return_string & sub_string & NL
end repeat
-- write the result to a new file next to the original, with ".m" appended to its name
try
	set modified_file to open for access file (the_file & ".m") with write permission
	set eof of modified_file to 0
	write return_string to modified_file starting at eof
	close access modified_file
on error
	-- the file may never have been opened, so ignore any error while closing it
	try
		close access file (the_file & ".m")
	end try
end try

--this script was automatically tagged for
--color coded syntax by Script to Markup Code
--written by Jonathan Nathan

Jon

Mytzlscript · July 2, 2003, 8:27pm

Cool, look at the pretty colors! I downloaded it, but it doesn’t work with Script Debugger 3. Do you know if there is another version out there for third-party editors?

Here is a short, sweet, and completely non-vanilla solution to the original post using Akua List Suite (I have v106):

set someFile to read "Mac HD:Desktop Folder:somefile" as list using delimiter return--get the file contents as a return delimited list
set newList to collect items of someFile that match "" with negation and just contents--now just get the items of that list that do not match the blank lines

That will return every item of the original return-delimited list that does not match "". If the blank lines contain an invisible character (a space or tab, for example), you will have to match that character instead of "".
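For the vanilla scripts, one way to handle invisible characters is to treat any line made only of spaces and tabs as blank. A minimal sketch, assuming those are the only invisible characters involved; the handler name `isBlankLine` is hypothetical. It could stand in for the `is not ""` test in the loop, e.g. `if not isBlankLine(sub_string) then ...`.

```applescript
-- Hypothetical helper: true when a line is empty or holds only spaces and tabs,
-- so whitespace-only "blank" lines are dropped as well as truly empty ones.
on isBlankLine(theLine)
	repeat with aChar in (characters of theLine)
		-- aChar is a reference to a list item, so compare its contents
		if (contents of aChar) is not in {space, tab} then return false
	end repeat
	-- no visible character found (this includes the empty string)
	return true
end isBlankLine
```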

I couldn’t get the vanilla solution to work but must confess I didn’t play with it much.

Best,

jonn8 · July 2, 2003, 9:58pm

Here is a slightly modified version of the script to test for different line endings (Mac, Unix, etc.):

property CR : (ASCII character 13)
property NL : (ASCII character 10)
-- "return" and CR are the same character (ASCII 13); NL is tried first
property line_endings : {NL, return, CR}

set the_file to "path:to:file"
set the_string to read file the_file
-- use the first line ending that actually splits the text into more than one line
repeat with i from 1 to count of line_endings
	set line_ending to (item i of line_endings)
	set AppleScript's text item delimiters to line_ending
	set the_list to (every text item of the_string) as list
	set AppleScript's text item delimiters to ""
	if (count of the_list) > 1 then exit repeat
end repeat
set return_string to ""
repeat with i from 1 to count of the_list
	set sub_string to item i of the_list
	-- keep only non-empty lines, ending each with the detected line ending
	if sub_string is not "" then set return_string to return_string & sub_string & line_ending
end repeat
try
	set modified_file to open for access file (the_file & ".m") with write permission
	set eof of modified_file to 0
	write return_string to modified_file starting at eof
	close access modified_file
on error
	-- the file may never have been opened, so ignore any error while closing it
	try
		close access file (the_file & ".m")
	end try
end try

I don’t have Script Debugger, so I can’t rewrite the Markup Code script for it, sorry.

Jon
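One case the line-ending loop above may still miss is a Windows-style file that ends each line with CR followed by LF: splitting on NL alone can leave a CR at the end of every line, so blank lines come through as a lone CR rather than "". A hedged alternative is to normalize every line ending to `return` before splitting. A minimal sketch; the handler name `normalizeLineEndings` is hypothetical.

```applescript
-- Hypothetical handler: convert CR+LF (Windows) and lone LF (Unix) line endings
-- to return (CR, classic Mac) so the text can then be split on return alone.
on normalizeLineEndings(theText)
	set unixBreak to (ASCII character 10)
	set windowsBreak to (return & unixBreak)
	set oldDelims to AppleScript's text item delimiters
	-- Replace CR+LF first, so one Windows line break does not become two
	set AppleScript's text item delimiters to windowsBreak
	set theItems to text items of theText
	set AppleScript's text item delimiters to return
	set theText to theItems as string
	-- Then replace any remaining lone LF
	set AppleScript's text item delimiters to unixBreak
	set theItems to text items of theText
	set AppleScript's text item delimiters to return
	set theText to theItems as string
	set AppleScript's text item delimiters to oldDelims
	return theText
end normalizeLineEndings
```

After this, splitting on `return` is enough, and empty lines really do come out as "".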

Ray_Barber · July 3, 2003, 6:32am

Hey, Mytzlscript, thanks for the help! I found this to be the simpler of the two solutions posted here… I seem to be having another slight issue, though (I just don’t quite understand how delimiting works yet, sorry).

I would like to replace the empty return fields with a tab character, if possible. I thought I could figure it out after getting some help with tokenizing/delimiting, but like I said, I still need to learn a little more about delimiting first.

Any ideas?

Thanks a bunch again, and thanks in advance for any help with this issue!

jonn8 · July 3, 2003, 6:49am

Just use my script above and change the line

to

Jon

Mytzlscript · July 3, 2003, 12:42pm

If you want to take a list like:

and turn it into a list like:

Then this should work for you. I omitted the collect items of command, since it isn’t needed in this case.

set someFile to read "Macintosh HD:Desktop Folder:spam copy" as list using delimiter return --get the file contents as a return delimited list 
set compiledList to "" --set an empty variable we can fill with our new tab delimited list 
repeat with thisLine in someFile --repeat with every item in the list of items read
	set thisLine to thisLine as string --coerce to string: "repeat with ... in" yields references to list items, and a reference never equals "", so without this the blank lines are not skipped
	if thisLine does not equal "" then --if it is not an empty line
		set compiledList to compiledList & thisLine & tab --then add it and a tab to the end of our compiled list
	end if
end repeat
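The loop above leaves a tab after the last line. A hedged vanilla variation collects the non-empty lines into a list and lets the text item delimiters place the separator only between them. A minimal sketch; the handler name `joinNonEmptyLines` is hypothetical, and it assumes the text uses `return` line endings.

```applescript
-- Hypothetical handler: split text on returns, drop empty lines,
-- and join what is left with theSeparator (no trailing separator).
on joinNonEmptyLines(theText, theSeparator)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to return
	set theLines to text items of theText
	set keptLines to {}
	repeat with aLine in theLines
		-- dereference the loop variable before comparing it with ""
		set lineText to contents of aLine
		if lineText is not "" then set end of keptLines to lineText
	end repeat
	-- joining a list puts the delimiter between items only
	set AppleScript's text item delimiters to theSeparator
	set joinedText to keptLines as string
	set AppleScript's text item delimiters to oldDelims
	return joinedText
end joinNonEmptyLines
```

Passing `tab` as the separator gives the tab-delimited result; passing `return` gives the original one-return-per-line cleanup.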

You could also try reading the text file as text instead of as a list and using Acme to replace carriage returns with tabs.
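Without Acme, plain text item delimiters can do the same find-and-replace. A minimal sketch; the handler name `replaceText` is hypothetical.

```applescript
-- Hypothetical handler: replace every occurrence of searchString with
-- replacementString by splitting on one delimiter and rejoining with another.
on replaceText(theText, searchString, replacementString)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchString
	set theItems to text items of theText
	set AppleScript's text item delimiters to replacementString
	set theText to theItems as string
	set AppleScript's text item delimiters to oldDelims
	return theText
end replaceText
```

Something like `replaceText(theText, return, tab)` swaps every return for a tab. Blank lines would turn into consecutive tabs this way, so they still need removing first if that matters.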

Hope this helps