Parsing text from a file to build a list

Okay, first off, I’m a new user. I have done extensive searching online and have come up with close information, just nothing that quite cuts it. What I am trying to do is read through the contents of an .ics file (a text file), parse the information between BEGIN:VEVENT and END:VEVENT, search through the resulting list for user-defined criteria, extract the matching list items, then write the data to a new text file. So far, I can open the file, set the text to a variable, then create a new file from the text variable. I am lost on creating a list of “lists”, the searching, and the export. Please help, if you can.

Model: iMac 27
AppleScript: 2.1.2
Browser: Safari 533.21.1
Operating System: Mac OS X (10.6)

Nigel Garvey has written a script for doing just that to iCal calendars. I’m pretty sure it’s in Code Exchange. Have a look.

I have seen this and I believe this to be more work than I am trying to do here.

Here is an example of what I am looking for:

is an excerpt of the US Holidays Calendar when opened by TextEdit (.ics is just a text file rebadged, as you may already know)

I am trying to separate the tags into lists so I get something like the following:

and so on…
I am tempted to create an XML file to store all the data while working with it, but even then, I don’t know how to go about this.
The purpose of this whole project is to separate events into multiple .ics files for import back into iCal using user-defined criteria, e.g. events before a certain date, events after a certain date, all events in a certain year, events that contain x in the title…

I viewed the script that Nigel Garvey has posted and, again, I believe this is more complicated than I need.

Satimage.osax has two useful functions for your job: ‘splittext’ and ‘find text’. I used them in this (imperfect) attempt.

HTH,

Michael





set theText to "BEGIN:VCALENDAR
METHOD:PUBLISH
VERSION:2.0
X-WR-CALNAME:US Holidays
PRODID:-//Apple Inc.//iCal 4.0.4//EN
X-APPLE-CALENDAR-COLOR:#492BA1
X-WR-TIMEZONE:America/Los_Angeles
CALSCALE:GREGORIAN
BEGIN:VEVENT
CREATED:20090721T010305Z
UID:FC1DAF58-B6B3-444F-B8B8-58D9F9C1FD3C
DTEND;VALUE=DATE:20090615
RRULE:FREQ=YEARLY;INTERVAL=1;BYMONTH=6
TRANSP:OPAQUE
SUMMARY:Flag Day
DTSTART;VALUE=DATE:20090614
DTSTAMP:20090721T165347Z
SEQUENCE:4
END:VEVENT
BEGIN:VEVENT
CREATED:20090803T234608Z
UID:4162B846-C557-48FD-B635-E70EED88243B
DTEND;VALUE=DATE:20100706
TRANSP:TRANSPARENT
SUMMARY:Independence Day (observed)
DTSTART;VALUE=DATE:20100705
DTSTAMP:20090803T234625Z
SEQUENCE:3
END:VEVENT
..."
property theOccurrences : {} -- stores the properties (theItems) of each occurrence
set theResult to splittext of theText using "BEGIN:" with regexp
repeat with x from 2 to (count items in theResult) -- 2 - after splittext, the first item is empty
	set theItems to {}
	set theText to item x of theResult
	set theItems to theItems & (find text "^VEVENT" in theText with regexp, all occurrences and string result) -- maybe also for VCALENDAR?
	set theItems to theItems & (find text "CREATED:(.*)\\n" in theText with regexp, all occurrences and string result)
	set theItems to theItems & (find text "TRANSP:(.*)\\n" in theText with regexp, all occurrences and string result)
	-- needs to be completed as required and finetuned. Also, the return at the end of theItems should be removed 
	set end of theOccurrences to theItems
end repeat
theOccurrences




The easiest way to do what you describe with a calendar containing non-repeating events is to import it into iCal and use iCal’s AppleScript implementation to find/move/delete the events.

Otherwise, it’s potentially complex. You can get your VEVENT list like this:

set icsFile to (choose file of type "ics")

set icsText to (read icsFile as «class utf8») -- .ics data are UTF-8 Unicode text.

set astid to AppleScript's text item delimiters

set AppleScript's text item delimiters to "BEGIN:VEVENT"
set vEventList to text items 2 thru -1 of icsText
set AppleScript's text item delimiters to "END:VEVENT"
repeat with vEvent in vEventList
	set vEvent's contents to "BEGIN:VEVENT" & text 1 thru text item -2 of vEvent & "END:VEVENT"
end repeat

set AppleScript's text item delimiters to astid

vEventList

Some things to note:

  1. The data in .ics files are UTF-8 Unicode text.
  2. .ics files written by iCal 4.0 use CRLF line endings. I don’t recall offhand if this is mandatory or not.
  3. Line endings in text values are written as \n. (“\\n” in AppleScript.)
  4. Some characters within text values (such as commas, which can also have functional meanings) are escaped with backslashes.
  5. A line whose first character is white space is a continuation of the text value in the line above. The white space character and the line ending preceding it must be disregarded in the interpretation of the text value.
  6. You’ll have to use repeat loops to find what you want.
  7. You probably won’t need to convert dates to Applescript, since ISOT dates are easily parsed as text.
  8. Dates and times are for the time zone in which the events were created …
  9. … except for the dates of all-day events, which are zoneless.
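
Note 5 (continuation lines) can be dealt with before any other parsing. Here is a minimal sketch, assuming the folded lines start with a space or a tab; unfoldLines is a made-up handler name for illustration, not part of AppleScript or of any script in this thread. It returns the text with linefeed line endings, so it isn't meant for writing back to an .ics file unchanged.

```applescript
-- Hypothetical helper: join folded (continuation) lines in iCalendar text.
-- 'paragraphs' copes with CRLF, LF and CR line endings; the result uses linefeeds.
on unfoldLines(icsText)
	set unfolded to {}
	repeat with thisLine in paragraphs of icsText
		set lineText to contents of thisLine
		if (unfolded is not {}) and ((lineText begins with space) or (lineText begins with tab)) then
			-- Continuation: drop the leading white space and append the rest to the previous line.
			if (count lineText) > 1 then
				set item -1 of unfolded to (item -1 of unfolded) & text 2 thru -1 of lineText
			end if
		else
			set end of unfolded to lineText
		end if
	end repeat
	
	-- Join the unfolded lines back into a single text.
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set unfoldedText to unfolded as text
	set AppleScript's text item delimiters to astid
	
	return unfoldedText
end unfoldLines

-- Example call: the folded SUMMARY value is intended to come back on one line.
unfoldLines("SUMMARY:A long sum" & return & linefeed & " mary" & return & linefeed & "SEQUENCE:3")
```

Unfolding first means later handlers can treat each paragraph as one complete "KEY:value" entry.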

thank you Nigel! Your expertise is appreciated.

Before reading your reply, here is what I have come up with by myself:


-- prompt for original file
set originalFile to (choose file with prompt "Select a file to read:" of type {"ICS"})

-- set destination of new file to original folder
try
	tell application "Finder" to set the thisfolder to (folder of the originalFile) as alias
on error
	set the thisfolder to path to desktop folder as alias
end try

-- prompt for new file name
set newFile to text returned of ¬
	(display dialog "Create calendar file named:" default answer "test")
set thefullpath to POSIX path of thisfolder & newFile & ".txt"
do shell script "touch \"" & thefullpath & "\""

global thetext
set thetext to ""

-- open the chosen file and read its contents
try
	open for access originalFile
	set thetext to (read originalFile as «class utf8»)
	close access originalFile
on error
	try
		close access originalFile
	end try
	return false
end try

set beginEvent to "BEGIN:VEVENT"
set textCount to the paragraphs of thetext

-- create the main array to hold the event items
set thearray to {}

repeat 1 times
	set end of thearray to {{}}
end repeat

set y to 1
set z to 1
set a to 1

repeat with x from 1 to the count of textCount
	set theitem to arrayitem(textCount, y)
	
	if theitem is "END:VCALENDAR" then
		set theitem to ""
	end if
	
	if theitem is beginEvent then
		set end of thearray to {{}}
		set z to z + 1
		set a to 1
	end if
	
	set item a of item z of thearray to theitem
	set end of item z of thearray to {}
	set y to y + 1
	set a to a + 1
	
end repeat


on arrayitem(textCount, paragraphitem)
	set AppleScript's text item delimiters to return
	set itemreturn to the paragraph paragraphitem of thetext as string
	set AppleScript's text item delimiters to {""}
	return itemreturn
end arrayitem

-- create final array to hold all the event lists
set modarray to items 2 thru -1 of thearray

This all is probably hugely inefficient but, as stated earlier, I am quite new to this. Comments are appreciated.

Just another way of doing this with the data represented here. If the ics file gets bigger, you could try using object references to improve the performance of the script. Otherwise, I would recommend writing a small command-line utility in C, which can handle large files very easily. The structure of an ics file is also very simple. The script below doesn’t really care how the file is organised: it only gets the events from the file, nothing more, nothing less, and it can handle dirty ics files without problems as well.

set icnsFileContents to "BEGIN:VCALENDAR
METHOD:PUBLISH
VERSION:2.0
X-WR-CALNAME:US Holidays
PRODID:-//Apple Inc.//iCal 4.0.4//EN
X-APPLE-CALENDAR-COLOR:#492BA1
X-WR-TIMEZONE:America/Los_Angeles
CALSCALE:GREGORIAN
BEGIN:VEVENT
CREATED:20090721T010305Z
UID:FC1DAF58-B6B3-444F-B8B8-58D9F9C1FD3C
DTEND;VALUE=DATE:20090615
RRULE:FREQ=YEARLY;INTERVAL=1;BYMONTH=6
TRANSP:OPAQUE
SUMMARY:Flag Day
DTSTART;VALUE=DATE:20090614
DTSTAMP:20090721T165347Z
SEQUENCE:4
END:VEVENT
BEGIN:VEVENT
CREATED:20090803T234608Z
UID:4162B846-C557-48FD-B635-E70EED88243B
DTEND;VALUE=DATE:20100706
TRANSP:TRANSPARENT
SUMMARY:Independence Day (observed)
DTSTART;VALUE=DATE:20100705
DTSTAMP:20090803T234625Z
SEQUENCE:3
END:VEVENT
END:VCALENDAR"

set theLines to every paragraph of icnsFileContents
set readingEvents to false
set theEvents to {}
repeat with thisLine in theLines
	if contents of thisLine is equal to "BEGIN:VEVENT" then
		set readingEvents to true
		set end of theEvents to {} --add new empty event
	end if
	
	if readingEvents then
		set end of last item of theEvents to contents of thisLine
	end if
	
	if contents of thisLine is equal to "END:VEVENT" then
		set readingEvents to false
		set AppleScript's text item delimiters to return
		set last item of theEvents to last item of theEvents as string
		set AppleScript's text item delimiters to ""
	end if
end repeat

return theEvents
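
For bigger files, the object-reference idea mentioned in this post might look something like the sketch below. It is only an illustration: collectEvents is an invented handler name, and it joins each event's lines with linefeeds rather than returns. The lists are reached through the properties of a script object, which is a commonly used way of speeding up access to long lists.

```applescript
-- Sketch: collect vEvent blocks from a list of lines, reaching the lists
-- through script object properties (usually faster with long lists).
on collectEvents(theLines)
	script o
		property linesIn : theLines
		property eventsOut : {}
	end script
	
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set startIndex to 0 -- 0 means "not currently inside a vEvent"
	repeat with i from 1 to (count theLines)
		set thisLine to item i of o's linesIn
		if thisLine is "BEGIN:VEVENT" then
			set startIndex to i
		else if (thisLine is "END:VEVENT") and (startIndex > 0) then
			-- Join the lines of this event into one text block.
			set end of o's eventsOut to (items startIndex thru i of o's linesIn) as text
			set startIndex to 0
		end if
	end repeat
	set AppleScript's text item delimiters to astid
	
	return o's eventsOut
end collectEvents

-- Example call with a few hard-coded lines.
collectEvents({"BEGIN:VCALENDAR", "BEGIN:VEVENT", "SUMMARY:Flag Day", "END:VEVENT", "END:VCALENDAR"})
```

With the test data from this post, the call would be collectEvents(paragraphs of icnsFileContents).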

I thank you all for your replies.

@Nigel Garvey
I understand that there are more convenient ways to do this using iCal; however, I will be dealing with exported calendars that contain tens of thousands of events and will usually break (read: crash) iCal. Hence the desire to edit the ics file directly. The point is to extract the data to an array, search the data, extract the events that match the search criteria, then save off a new file(s) with the desired events, ready for import into iCal. I also realize that AppleScript may not be powerful enough to handle the amount of data that I am trying to process. Again, I am new to programming, and AppleScript seems plain-English enough for me to wrap my head around what I am trying to do. If you have any other suggestions, please, I am open to anything.

This might give you a basis on which to get started. It only contains the code required to identify and save vEvents where the value of a certain property is less than a given reference value. (Here the input is hard-coded for start dates before 1st January 2010.) You’ll need to extend it yourself for the full functionality you want.

-- The main handler controlling the program flow.
on main()
	set originalFile to (choose file with prompt "Select a calendar file to read:" of type {"ics"})
	set newFile to (choose file name with prompt "Save new calendar file as:" default name "Test.ics")
	set {preamble, vEventList, postamble} to getCalendarParts(originalFile)
	
	(* The code in the 'considering case' statement below is hardwired to return a list of vEvents with start dates before 1st January 2010 (time zone not considered, complications with repeating events not considered). You'll need to adapt and extend it to get the criteria from the user, set up the appropriate parameters, and call the appropriate handler. You'll also need to write the handlers for the different kinds of comparison and perhaps some code to combine them. *)
	
	considering case -- To make the text comparisons faster
		set iCalendarKey to "DTSTART" -- iCalendar key for 'start date'.
		set referenceValue to "20100101" -- 1st January 2010 in ISO format.
		set vEventsToKeep to lessThan(vEventList, iCalendarKey, referenceValue)
	end considering
	
	saveNewCalendar({preamble, vEventsToKeep, postamble}, newFile)
end main

-- Read a given .ics file and return the opening calendar data, a list of individual vEvent texts, and a final "END:VCALENDAR" line.
on getCalendarParts(icsFile)
	set icsText to (read icsFile as «class utf8») -- .ics data are UTF-8 Unicode text.
	
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "BEGIN:VEVENT"
	set preamble to text 1 thru paragraph -2 of text item 1 of icsText
	set vEventList to text items 2 thru -1 of icsText
	set AppleScript's text item delimiters to "END:VEVENT"
	repeat with vEvent in vEventList
		set vEvent's contents to "BEGIN:VEVENT" & text 1 thru text item -2 of vEvent & "END:VEVENT"
	end repeat
	set AppleScript's text item delimiters to astid
	
	return {preamble, vEventList, "END:VCALENDAR"}
end getCalendarParts

-- Coerce a list of iCalendar parts to text with CRLF line endings and write the result to a given file.
on saveNewCalendar(calendarParts, calendarFile)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to (return & linefeed)
	set calendarText to calendarParts as text
	set AppleScript's text item delimiters to astid
	
	set fRef to (open for access calendarFile with write permission)
	try
		set eof fRef to 0
		write calendarText as «class utf8» to fRef -- .ics data are UTF-8 Unicode text.
	on error msg
		display dialog msg
	end try
	close access fRef
end saveNewCalendar

-- Return a list of the vEvents in vEventList where the value with the given iCalendar key (where it exists) is less than the given reference value.
on lessThan(vEventList, iCalendarKey, referenceValue)
	-- Script object containing list properties for referencing. (Speed hack.)
	script o
		property vlIn : vEventList
		property vlOut : {}
	end script
	
	repeat with i from 1 to (count vEventList)
		set vEvent to item i of o's vlIn
		set val to getValue(vEvent, iCalendarKey)
		if (val is not missing value) and (val < referenceValue) then set end of o's vlOut to vEvent
	end repeat
	
	return o's vlOut
end lessThan

-- Return the value associated with the given iCalendar key from the given vEvent text block. If there's no such key, return 'missing value'.
on getValue(vEvent, iCalendarKey)
	if (vEvent contains iCalendarKey) then
		-- Get the value from the line containing the key.
		set astid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to iCalendarKey
		set ti2 to text item 2 of vEvent
		set AppleScript's text item delimiters to ":"
		set val to paragraph 1 of text item 2 of ti2
		-- If the key indicates a text value, append any continuation lines and unescape any special characters.
		if (iCalendarKey is in "SUMMARY LOCATION DESCRIPTION") then
			repeat with i from 2 to (count ti2's paragraphs)
				set thisLine to paragraph i of ti2
				if (thisLine begins with space) and ((count thisLine) > 1) then
					set val to val & text 2 thru -1 of thisLine
				else
					exit repeat
				end if
			end repeat
			set AppleScript's text item delimiters to "\\,"
			set val to val's text items
			set AppleScript's text item delimiters to ","
			set val to val as text
			set AppleScript's text item delimiters to "\\n"
			set val to val's text items
			set AppleScript's text item delimiters to linefeed
			set val to val as text
		end if
		set AppleScript's text item delimiters to astid
	else -- This vEvent block doesn't contain the given key.
		set val to missing value
	end if
	
	return val
end getValue

main()
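
As an idea of what one of the other comparison handlers could look like, here is a rough sketch for the "events that contain x in the title" criterion. It is not part of the script above: summaryContaining is an invented name, it looks the SUMMARY line up itself so that it can be tried on its own, and it ignores continuation lines and escaped characters. Outside a 'considering case' statement, the match is case-insensitive.

```applescript
-- Hypothetical handler: return the vEvents whose SUMMARY value contains searchText.
on summaryContaining(vEventList, searchText)
	-- Script object containing list properties for referencing. (Speed hack.)
	script o
		property vlIn : vEventList
		property vlOut : {}
	end script
	
	repeat with i from 1 to (count vEventList)
		set vEvent to item i of o's vlIn
		repeat with thisLine in paragraphs of vEvent
			set lineText to contents of thisLine
			if (lineText begins with "SUMMARY") then
				-- The value is everything after the first colon in the line.
				set colonPos to (offset of ":" in lineText)
				if (colonPos > 0) and (colonPos < (count lineText)) then
					if (text (colonPos + 1) thru -1 of lineText) contains searchText then
						set end of o's vlOut to vEvent
					end if
				end if
				exit repeat -- Only the first SUMMARY line is examined.
			end if
		end repeat
	end repeat
	
	return o's vlOut
end summaryContaining

-- Example call with two minimal, hard-coded vEvents.
set testEvents to {"BEGIN:VEVENT" & linefeed & "SUMMARY:Flag Day" & linefeed & "END:VEVENT", ¬
	"BEGIN:VEVENT" & linefeed & "SUMMARY:Independence Day (observed)" & linefeed & "END:VEVENT"}
summaryContaining(testEvents, "flag")
```

In the main handler, a call such as summaryContaining(vEventList, "Flag") could stand in for the lessThan call when the user picks this kind of criterion.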

First and foremost, Thank You Nigel and everyone for all of your posts throughout the forum. I’m a total newbie when it comes to Applescript, and I’ve learned an incredible amount from all of your posts.

I have a few questions about this code, the first being very basic. Does the example above still work on OS X (10.8)? I’m going batty trying to decode the DTSTART field in the ics file and when I try to incorporate the code above, I get error after error. I’m wondering if there have been any changes to Applescript over the last 3+ years which would make this code not function as expected.

I’m trying to do something fairly basic… I’m able to parse out the ics file and I end up with DTSTART:20141010T010000Z.

all I want to do is be able to adjust it for my local timezone and get it in a October 10, 2014 format. In this case, the actual event is (was) on 10/10/2014 at 9pm.

I’m using the ‘run applescript’ action of Automator and sending the results to an email. We have an internal scheduling system which absorbs emails and generates a consolidated department calendar. I’m trying to write a script to convert the ics file to a simple email body… for example:
This…
BEGIN:VCALENDAR
VERSION:2.0
METHOD:PUBLISH
BEGIN:VEVENT
UID:25bd4aa1-c3b7-4b7d-b28b-a22b389da33d
SUMMARY:Welcome Luncheon
LOCATION:Blue Point Grill
DTSTART:20141105T163000Z
DTEND:20141105T180000Z
DTSTAMP:20141014T174641Z
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:Welcome Luncheon
BEGIN:VALARM
TRIGGER:-P24H
REPEAT:4
DURATION:PT6H
ACTION:DISPLAY
DESCRIPTION:Welcome Luncheon
END:VALARM
END:VEVENT
END:VCALENDAR

becomes…

Activity display name :Welcome Luncheon
Activity address :Blue Point Grill
Start date :November 5, 2014
Start time :11:30am
End Time :1:00pm

Thanks in advance for any help at all.
John

Thankfully, despite the misleading name, Apple isn’t setting the standard for iCal. iCal was the name of Apple’s application, the same as the name of the format, and the application’s name was changed to Calendar (Apple had to?) for that reason. iCal, the format, is the successor of vCalendar and hasn’t been changed since 2009. So the script written in 2011 still works with the latest format, or at least should work.

Thanks for the quick reply… I’m good with the file format, as I figured it hadn’t changed much. My question was more around the script itself. I cut/paste it and just get a lot of errors (Operation couldn’t be completed (-212) ), etc. It just seems to me that a lot of the variable typing may have changed and become stricter or reserved.

Hi OldZepHead. Welcome to MacScripter.

Not wonderfully fast, but I think this works. I’ve assumed you’ll be reading individual event files rather than whole-calendar ones:

-- Hard coded data for testing.
set icsData to "BEGIN:VCALENDAR
VERSION:2.0
METHOD:PUBLISH
BEGIN:VEVENT
UID:25bd4aa1-c3b7-4b7d-b28b-a22b389da33d
SUMMARY:Welcome Luncheon
LOCATION:Blue Point Grill
DTSTART:20141105T163000Z
DTEND:20141105T180000Z
DTSTAMP:20141014T174641Z
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:Welcome Luncheon
BEGIN:VALARM
TRIGGER:-P24H
REPEAT:4
DURATION:PT6H
ACTION:DISPLAY
DESCRIPTION:Welcome Luncheon
END:VALARM
END:VEVENT
END:VCALENDAR"

set simpleEmailBody to makeEmailText(icsData)

on makeEmailText(icsData)
	set summary to "Activity display name :" & parseData(icsData, "SUMMARY:")
	set location to "Activity address :" & parseData(icsData, "LOCATION:")
	set startTime to icsToUSLocalTime(parseData(icsData, {"DTSTART:", "DTSTART;"}))
	set startDate to "Start date :" & paragraph 1 of startTime
	set startTime to "Start time :" & paragraph 2 of startTime
	set endTime to "End time :" & paragraph 2 of icsToUSLocalTime(parseData(icsData, {"DTEND:", "DTEND;"}))
	
	return summary & linefeed & location & linefeed & startDate & linefeed & startTime & linefeed & endTime
end makeEmailText

-- Parse for data with a particular label.
on parseData(icsData, label)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to label
	if (count icsData each text item) > 1 then
		set ti2 to text item 2 of icsData
	else -- There's no entry with this label in the data.
		set ti2 to linefeed
	end if
	set AppleScript's text item delimiters to astid
	
	set theseData to paragraph 1 of ti2
	repeat with i from 2 to (count ti2's paragraphs)
		if (paragraph i of ti2 begins with space) then
			set theseData to theseData & text 2 thru -1 of paragraph i of ti2
		else
			exit repeat
		end if
	end repeat
	
	return theseData
end parseData

-- Convert an ISO date to a local date/time string in US format.
on icsToUSLocalTime(DTin)
	if (DTin begins with "VALUE=DATE:") then -- All-day event.
		set shellScript to "eraT=$(date -jf '%Y%m%d' '" & text -8 thru -1 of DTin & "' '+%s') ;
date -r \"$eraT\" '+%B %e, %Y%n' | sed -E 's/ {2}/ /'"
	else
		if (DTin ends with "Z") then -- GMT.
			set dateIn to text 1 thru -2 of DTin
			set dateTZ to "GMT"
		else -- Other time zone.
			set dateIn to text -15 thru -1 of DTin
			set dateTZ to text 6 thru -17 of DTin
		end if
		set shellScript to "computerTZ=$(readlink '/etc/localtime' | sed 's|/usr/share/zoneinfo/||') ;
eraT=$(TZ=" & dateTZ & " date -jf '%Y%m%dT%H%M%S' '" & dateIn & "' '+%s') ;
TZ=\"$computerTZ\" date -r \"$eraT\" '+%B %e, %Y%n%l:%M%p' | sed -E '1 s/ {2}/ / ; 2 { s/^ // ; y/APM/apm/ ; }'"
	end if
	
	return (do shell script shellScript)
end icsToUSLocalTime

Edit: Script improved to cope when any of the required data aren’t specified in the file and to recognise all-day event dates correctly!
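
If shelling out isn't wanted, a plain AppleScript conversion is possible too. The sketch below is only a rough alternative, not a replacement for the handler above: dateFromICSStamp is an invented name, it only understands stamps in the form "20141105T163000" with an optional "Z", and it assumes the Mac's current GMT offset also applied on the event's date, which is wrong when daylight saving time differs between now and then.

```applescript
-- Hypothetical helper: turn an iCalendar stamp such as "20141105T163000Z"
-- into an AppleScript date. Stamps ending with "Z" are shifted to local time.
on dateFromICSStamp(stamp)
	set y to (text 1 thru 4 of stamp) as integer
	set m to (text 5 thru 6 of stamp) as integer
	set d to (text 7 thru 8 of stamp) as integer
	set h to (text 10 thru 11 of stamp) as integer
	set mins to (text 12 thru 13 of stamp) as integer
	set s to (text 14 thru 15 of stamp) as integer
	
	set theDate to (current date)
	set day of theDate to 1 -- Avoids month overflow while the parts are being set.
	set year of theDate to y
	set month of theDate to m
	set day of theDate to d
	set time of theDate to h * hours + mins * minutes + s
	
	-- ASSUMPTION: today's GMT offset is used, not the offset on the event's date.
	if (stamp ends with "Z") then set theDate to theDate + (time to GMT)
	
	return theDate
end dateFromICSStamp

-- Example call: returns a date object, displayed according to the Mac's own settings.
dateFromICSStamp("20141105T163000Z")
```

The result is a date object, so the "November 5, 2014" and "11:30am" texts would still have to be built from its year, month, day and time properties.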

wow… amazingly elegant compared to the spaghetti code I threw together. a million thanks!

John

Make that 999,999. The script needed a couple of adjustments, which I’ve now made.