Hi
I am trying to read a web page that is full of text and has a .txt extension in the URL.
The page contains just text, but the Safari (5) inspector shows it wrapped in <pre … then the text… The source menu item is then not available.
Reading an HTML page works well, and I have done that many times, but I don't know how to read that text page into a variable.
tell application "Safari"
	-- baseURL, eachYear and eachMonth are set earlier in the script
	set theURL to baseURL & eachYear & "-" & eachMonth & ".txt"
	log {"1 url", theURL}
	try
		make new document with properties {URL:theURL}
		delay 2 -- give the page time to load
		set theWholeSource to source of front document as string
		log {"2 theWholeSource", theWholeSource}
		set lengthSource to number of characters in theWholeSource
		log {"3 lengthSource", lengthSource}
	on error
		display dialog "There was an error loading the source from the URL" with icon 0
	end try
end tell
Thank you very much, this is good stuff. There is no load time, as there is in Safari.
But if there is still a way to do this in Safari, I would still like to know, because later in my code I still need Safari to create other things with JavaScript.
I have to add some XML in some parts, and in other parts I have to wrap HTML around the text.
Both parts should be visually checked before they are saved.
Before that can happen, I have to split the txt file into parts.
Where I should split depends on the following pattern.
So I have to search for the first words ‘q55’, ‘q67:’, ‘ww3:’ and ‘qw4’ on four consecutive lines and then split off that part up to the next repetition of those four lines. Each part could be 10 to 500 lines long before the next pattern of four lines.
Because those parts could be repeated almost 150 times, with 50 to 80 text files daily, I am looking for the quickest way to do the splitting.
Any idea how to do this split with the quickest pattern search?
After that I will do one more split because of the XML, but I guess that will not be that difficult.
Unless I have misunderstood you completely, you are looking for a run of four lines starting, in order, with ‘q55’, ‘q67:’, ‘ww3:’ and ‘qw4’. Then comes the text you want to extract. This runs until the next pattern of four lines starting with the aforementioned “codes”.
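To make the pattern concrete, here is a minimal Python sketch of the split as described above. It is not the sed or C solution discussed in this thread, only an illustration; the function name `split_parts` and the returned structure are assumptions made for the example, and text before the first four-line header is simply ignored.

```python
from typing import Dict, List

# Line prefixes that open each part, in order (taken from the post above).
MARKERS = ("q55", "q67:", "ww3:", "qw4")


def split_parts(text: str) -> List[Dict[str, List[str]]]:
    """Split text into parts: four header lines followed by body lines."""
    lines = text.splitlines()
    n = len(MARKERS)
    parts: List[Dict[str, List[str]]] = []
    current = None
    i = 0
    while i < len(lines):
        is_header = False
        # Cheap check on the first code before looking at the next three lines.
        if lines[i].startswith(MARKERS[0]):
            window = lines[i:i + n]
            is_header = len(window) == n and all(
                line.startswith(marker) for line, marker in zip(window, MARKERS)
            )
        if is_header:
            current = {"header": lines[i:i + n], "body": []}
            parts.append(current)
            i += n
        else:
            # Assumption: lines before the first header are dropped.
            if current is not None:
                current["body"].append(lines[i])
            i += 1
    return parts
```

Each line is looked at once, so the cost grows linearly with the file size. A line that starts with ‘q55’ but is not followed by the other three codes is treated as ordinary body text.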
Sed is definitely the fastest pattern matcher around. But this is a very complex pattern. It might be faster to write a snippet in C than to get the regexp right :lol:, which I can do for you for this pattern, due to the sheer volume of the input.
The codes will then be considered “atomic” in that they won’t change. (Hardcoded in the C-snippet.)
I foresee that the program will leave you with some files with, for example, #1, #2 and so on added to the filenames, if given an initial filename. The utility could also get its input from a stream.
I’ll give sed a try first, so don’t expect anything, except that you will have a working solution within 48 hours.
Please tell me more specifically what the resulting files should look like. We also need a work folder.
It would be nice if you elaborated a little bit more on the workflow.
Should, for instance, the resulting text files be deleted before the next job, or should a new working folder be created, and so on?
I will also try to read the sed pages and figure things out.
The files should be saved with a filename taken from what comes after the ww3: up to the end of the line (max 36 characters).
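As a small illustration of that naming rule, here is a hedged Python sketch. The function name `filename_from_header` is hypothetical, and replacing "/" and ":" with "-" is my own assumption to keep the name safe on the Mac file system; no file extension is added because none was specified.

```python
def filename_from_header(header_line: str, max_len: int = 36) -> str:
    """Build a filename from the text after 'ww3:' up to the end of the line."""
    prefix = "ww3:"
    if not header_line.startswith(prefix):
        raise ValueError("expected a line starting with 'ww3:'")
    title = header_line[len(prefix):].strip()
    # Assumption: path separators are replaced so the name is safe to save.
    title = title.replace("/", "-").replace(":", "-")
    # Cut to the stated maximum and drop any trailing space left by the cut.
    return title[:max_len].rstrip()
```

For example, `filename_from_header("ww3: Weekly report")` gives `Weekly report`.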
The converted HTML and XML results will become just one (1) file for each split element/part, to keep the data and metadata together. That file will be read into an app with a text web view and some text fields for the metadata.
That single-file part has also just been finished.
It's very kind of you to offer to create a working solution. However, for now I am very happy just to have the basics, which will get me started and teach me a lot.
Also, with this number of files, I will test the speed once I have something basic; otherwise I will have to write something in C or Objective-C.
As for checking the layout, I have to create some kind of template layer, because checking it all visually would drive a person crazy.
Apart from figuring out the regexp with sed: Google “Sed Towers of Hanoi”. That’s an example (a very heavy one), but it uses the registers, should you need that.
I still don’t see exactly what you are up to, but if you are going to process the file and not split it, it can be done with sed. If you are splitting the files into parts, that is a much more difficult story.
I’d rather go directly for a solution in C (raw), since it is really an easy task. Maybe, just for the fun of it, I’d use awk as an intermediate solution and check whether that works (and is speedy enough) before writing the C code.
It would be interesting to see your final solution.