iWork flat files

Yvan_Koenig · October 25, 2012, 8:42am

Hello

I guess most of you know that it’s easy to change an iWork flat file into a package:

rename the flat file from wxyz.pages to wxyz.pages.zip, then double-click it or
open it in a script.

What I’m trying to achieve, with no success so far, is the reverse: converting an iWork package into a flat file.
At this time, I must:
set the Pages preferences (it’s the same for Numbers, Keynote & iBooks Author) to create new documents as flat ones,
rename the package document from wxyz.pages to wxyz.template so that it becomes a template,
double-click (open) the package to get a new flat document.

I guess that a Shell incantation would do the trick but I’m not a Grand Priest.

Yvan KOENIG (VALLAURIS, France) Thursday, October 25, 2012 10:42:47

Yvan_Koenig · October 26, 2012, 1:32pm

From the first version up to iWork '08, the iWork applications created documents organized as packages.
Packages are disguised folders.
We can look at their internals by right-clicking their icon,
which brings up the contextual menu item Show Package Contents.
A basic listing would be:

bol5.jpg ← picture file
bol5.png ← picture file
bol5.tiff ← picture file
Contents ← folder
PkgInfo ← file stored in the folder Contents
index.xml.gz ← gzipped version of the file index.xml describing the document contents
QuickLook ← folder
Thumbnail.jpg ← picture file stored in the QuickLook folder, displayed in open dialogs.

Curious users expanded the index.xml.gz archive. Depending on how they did it, they ended up either with both index.xml and index.xml.gz, or with only the expanded file index.xml.
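
As an illustration of the two outcomes, here is a minimal Python sketch that expands index.xml.gz inside a package folder. The function name expand_index and the keep_gz switch are my own, chosen for the example; the only thing taken from the description above is the file name index.xml.gz at the top level of the package.

```python
import gzip
import shutil
from pathlib import Path


def expand_index(package: Path, keep_gz: bool = True) -> Path:
    """Expand index.xml.gz inside a package folder and return the index.xml path."""
    gz_path = package / "index.xml.gz"
    xml_path = package / "index.xml"
    # Stream the decompressed bytes so a large document is never held in memory.
    with gzip.open(gz_path, "rb") as src, open(xml_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if not keep_gz:
        # Leaves only the expanded file in the package.
        gz_path.unlink()
    return xml_path
```

With keep_gz left at True the package ends up holding the pair index.xml and index.xml.gz; with keep_gz=False only index.xml remains.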

Moreover, there is an iWork preference for which Apple provided no graphical interface.
It’s stored in the application’s preferences file as the property FormatXML.
The standard setting is FormatXML = no.
In this case, the index.xml file is made of two blocks of text separated by a linefeed.
The first one just declares which kind of XML is used.
At this time, it’s:

<?xml version="1.0"?>

The second block is a huge stream of descriptors (which may be nested), each enclosed between less-than and greater-than characters, with no separators between them.

As a result, reading this block is difficult for a normal human being.
Happily, if we set FormatXML to yes, the application generates a human-readable file in which the descriptors are separated by linefeed characters and indented (alas, with space characters).
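
When only a compact index.xml is at hand, a similar layout can be approximated outside the application. The sketch below uses Python’s standard xml.dom.minidom; it is not the formatter iWork uses, the two-space indent is an arbitrary choice, and the sample document is invented for the example.

```python
from xml.dom import minidom


def make_readable(compact_xml: bytes) -> str:
    """Return an indented, one-descriptor-per-line rendering of compact XML."""
    document = minidom.parseString(compact_xml)
    # Two spaces per nesting level; the real application output may differ.
    return document.toprettyxml(indent="  ")


# Invented sample: a declaration block, a linefeed, then one unbroken block.
sample = b'<?xml version="1.0"?>\n<doc><page><text>Hello</text></page></doc>'
print(make_readable(sample))
```

The result is meant for reading only: the added linefeeds and spaces become part of the text, so I would not feed the reformatted file back to the application without testing on a copy.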

iWork '09 introduced a new document format: the flat file.
This format is a special kind of packed file. As long as I don’t know how such files are packed, I will not call them “zipped files”.
They are built from the good old package format, with some differences.

(a) the index.xml is not stored gzipped in the package.
(b) a new file appears: buildVersionHistory.plist
(c) another new file may be inserted if the user asks the application to do so.
It’s Preview.pdf, stored in the QuickLook folder.
This file is used by Quick Look and, if I remember correctly, by Spotlight.
I have often extracted it from corrupted documents so that users could rebuild their documents fairly easily.

An easy way to reach the internals of such a flat file is to rename it from wxyz.pages to wxyz.pages.zip, then double-click it. Doing that, we get the document wxyz.pages in the package format (with index.xml in its expanded form).
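
The same expansion can be scripted without renaming anything. This is a minimal Python sketch which assumes that the flat file can be read as an ordinary ZIP archive (the rename trick suggests so, but I have not checked the packing scheme itself). The function name flat_to_package and the destination folder are mine.

```python
import zipfile
from pathlib import Path


def flat_to_package(flat_file: Path, destination: Path) -> Path:
    """Unpack a flat iWork document into a folder named like the document."""
    if not zipfile.is_zipfile(flat_file):
        raise ValueError(f"{flat_file} does not look like a ZIP archive")
    package = destination / flat_file.name  # e.g. destination/wxyz.pages
    # Fails if the folder already exists, so nothing is overwritten silently.
    package.mkdir(parents=True)
    with zipfile.ZipFile(flat_file) as archive:
        archive.extractall(package)
    return package
```

Calling flat_to_package(Path("wxyz.pages"), Path("unpacked")) would create unpacked/wxyz.pages, a plain folder which the Finder should display as a package when the matching application is installed.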

Flat files may contain the compact index.xml files (FormatXML = no) or human readable ones (FormatXML = yes).

If we look at the internals of a flat file with a hexadecimal editor, we see several blocks of data.
Here I describe the flat version of the Pages '09 document whose contents were listed above.

The bol5.png file is described by a header, its zipped contents and a descriptor stored at the end of the document.
The bol5.tiff file is described in the same way: a header, its zipped contents and a descriptor stored at the end of the document.
The bol5.jpg file is described by a header, the exact contents of the JPEG file which was pasted into the document (no need to pack it, as a JPEG file is already packed) and a descriptor stored at the end of the document.
The Thumbnail.jpg file is described the same way: a header, the JPEG file and a descriptor at the end of the document.
When it’s available, Preview.pdf is described the same way, but this time the PDF is embedded without any kind of compression so that it can be used by Quick Look (and by Spotlight, if I remember correctly).
Exactly the same goes for buildVersionHistory.plist, which is so short that packing it would require more space than the native version.
For index.xml, we have a header, the gzipped version and a descriptor at the end of the file.

So the end of the document behaves as a kind of table of contents, used when the file is expanded (or by Quick Look when it wants to display Preview.pdf or Thumbnail.jpg).
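
This layout (a header in front of each block, a table of contents at the end) resembles the ZIP format, where the closing table is called the central directory. Assuming that the flat file really is a ZIP archive, the Python sketch below reads that table and reports, for each entry, whether it is stored as-is or compressed. The function name list_entries is mine.

```python
import zipfile


def list_entries(flat_file):
    """Return (name, method, packed size, original size) for each entry."""
    methods = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    with zipfile.ZipFile(flat_file) as archive:
        # infolist() is built from the table at the end of the file.
        return [
            (
                info.filename,
                methods.get(info.compress_type, "other"),
                info.compress_size,
                info.file_size,
            )
            for info in archive.infolist()
        ]
```

On a document like the one described here, I would expect the JPEG, PDF and plist entries to be reported as stored and the others as deflated, but that is for the reader to check on real files.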

You must be aware that iWork documents may be corrupted in different ways.

(1) the index.xml (or index.xml.gz) may be missing
(2) the index.xml file may be available but Pages may be unable to decipher it.
(2a) in the pre-'09 era, it was quite common to find an index.xml file perfectly readable with a text editor like TextWrangler (or even TextEdit) but with a badly organized structure. In this case, with a bit of patience, we can rebuild the document from the available description.
(2b) since the release of '09, there is a new kind of corruption: the packed document is badly organized. I have received files in which the headers mentioned above weren’t at their normal location (in front of the blocks of data). In this case, index.xml may be embedded but the application is unable to extract it.
Entering the file with a hexadecimal editor allows us to move the different blocks back to their normal locations, and when that’s done we sometimes retrieve a good document. Alas, this doesn’t always work, because the index.xml data may be completely missing, or present but corrupted so that we can’t expand it, or badly formed XML as described in (2a).
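
A first triage between these cases can be automated. The Python sketch below assumes, once more, that a flat file is readable as a ZIP archive; the function name diagnose and the wording of its messages are mine. It only checks that index.xml is well-formed XML, so a file which parses correctly but describes a badly organized document will still be reported as readable.

```python
import zipfile
import zlib
from xml.parsers import expat


def diagnose(flat_file) -> str:
    """Classify a flat document according to the corruption cases listed above."""
    if not zipfile.is_zipfile(flat_file):
        return "no usable table of contents"
    try:
        with zipfile.ZipFile(flat_file) as archive:
            if "index.xml" not in archive.namelist():
                return "index.xml missing"  # case (1)
            data = archive.read("index.xml")
    except (zipfile.BadZipFile, zlib.error):
        # Headers or packed data are damaged, as in case (2b).
        return "index.xml present but cannot be expanded"
    try:
        expat.ParserCreate().Parse(data, True)
    except expat.ExpatError:
        return "index.xml expanded but badly formed"
    return "index.xml readable"
```

Such a check does not repair anything; it only tells whether it is worth opening the hexadecimal editor.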

What is really absurd is that sometimes the flat file can’t be opened by the application, while the package document obtained through the rename scheme behaves perfectly.
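
Going the other way, from a package back to a flat file, is the conversion I asked about at the top of this thread. The Python sketch below is only a direction to explore: it assumes that the applications accept an ordinary ZIP archive of the package contents, which I have not verified. The function name package_to_flat and the list of suffixes stored without compression are my own choices, modelled on the layout described earlier. It does not create buildVersionHistory.plist.

```python
import gzip
import zipfile
from pathlib import Path

# Assumption: entries that are already compact are stored as-is.
STORED_SUFFIXES = {".jpg", ".jpeg", ".pdf", ".plist"}


def package_to_flat(package: Path, flat_file: Path) -> None:
    """Repack a package folder as one ZIP-based file (untested with iWork)."""
    has_plain_index = (package / "index.xml").exists()
    with zipfile.ZipFile(flat_file, "w") as archive:
        for path in sorted(package.rglob("*")):
            if path.is_dir():
                continue
            name = path.relative_to(package).as_posix()
            data = path.read_bytes()
            if name == "index.xml.gz":
                if has_plain_index:
                    continue  # the expanded copy is packed instead
                # The flat format holds index.xml, not the gzipped file.
                name, data = "index.xml", gzip.decompress(data)
            if path.suffix.lower() in STORED_SUFFIXES:
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            archive.writestr(name, data, compress_type=method)
```

The flat file must be written outside the package folder, and any experiment of this kind should be done on a copy of the document.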

You will perhaps ask: what does this have to do with AppleScript?

(1) I think that knowing the documents’ internals is important when we want to use scripts to apply changes such as paper size or orientation without opening the documents in Pages.

(2) I think that it’s always useful to point out that computers are far from perfect tools and that the basic rule is to keep chronological (incremental) backups of our important documents.

(3) It gives me the opportunity to pass you a link to a set of scripts which I wrote to allow Lion and Mountain Lion users to extract ‘not too old’ versions of their documents when these appear to be corrupted. In this case, there is no Apple-provided way to extract these ‘not too old’ versions, which may be life savers judging by the number of messages I receive about corrupted documents.

This set of scripts is available in my public SimpleShare Box:
http://www.box.com/s/00qnssoyeq2xvc22ra4k
Navigating in this box, you will find a lot of scripts dedicated to iWork. AppleWorks dedicated ones are also available but I’m not sure that they would be useful for you.

The direct link to the set of scripts is:
https://www.box.com/s/nrpt1bejserlzda3iq49

You may also navigate this way:
All Files > public_YK > about_OSX > Versions as Recovery Tool.zip

Yvan KOENIG (VALLAURIS, France) Friday, October 26, 2012 15:32:08