Cleaning data out of a text file?

Wow, that new file format is very different from the one posted in post #28

Do you have a larger version of the new file to test on?

Yep, as I’m now collecting more data from the Netgear Orbi logs and also from the Armor system on the web. Going through the entire collected log file for today, there are various data types showing:

Today’s log is 126 KB long and is too large to post.

I edited my script in post #79 to handle the new input file format.

If I gave you my email, could you send it to me?

Thanks, I just tried it and get error: “Can’t get item 3 of {}. (-1728)”

The start of the log file reads:

If it helps at all? Feel free to PM me your email address.

Odd, it worked for me.

Is there some invisible character that is not getting posted in your sample?

Weird… I’m saving it out as an application and dragging and dropping the input file on to it?

You have email :)

Does your input file always have lines like below?

45 MINUTES AGO
Synology-DS918+
Network Attack Blocked

I fixed the script again to skip blank lines and lines like the examples above.

It can do, yes, as it’s a mixture of the actual ASCII log plus me copying and pasting in data from the Armor web page.

One last change please (if it is not too much trouble): can the output file have the same name as the dragged-and-dropped input file, with “_IP_CLEANED” appended before the extension?

So:

Orbi & Armor Logs.txt

becomes:

Orbi & Armor Logs_IP_CLEANED.txt

Thanks again.

One interesting thing I’ve just noticed: your script gives an output file with 856 IP addresses, while the one I was using previously only contained 323 IPs…

Done!

Sorry had to fix bug.

Now it’s done

Nearly LOL, that gives me an output file of:

Orbi & Armor Logs_IP_CLEANED.txtCleaned_IPs.txt

Plus I now have 428 IP addresses in the output file?? This looks to be correct actually. The prior output file somehow had doubled entries.

Indeed, fixed and looking good! Many thanks!

I also edited a few lines to speed it up.

So try again…

Looking great now, smooth and fast. Muchly appreciated.

One more edit has been done.

Added code in the run handler to ask for a file if the app is double-clicked instead of having a file dragged and dropped onto it.
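For anyone following along, a minimal sketch of that kind of double-click fallback might look like the following. The `processFile` handler is a hypothetical placeholder standing in for the actual cleaning code, which is not shown here:

```applescript
-- Runs when the applet is double-clicked: ask for a file, then process it.
on run
	set chosenFile to choose file with prompt "Select the log file to clean:"
	processFile(chosenFile)
end run

-- Runs when one or more files are dropped on the applet.
on open droppedItems
	repeat with anItem in droppedItems
		processFile(contents of anItem)
	end repeat
end open

-- Hypothetical placeholder: the real cleaning code would go here.
on processFile(aFile)
	display dialog "Would process: " & POSIX path of aFile
end processFile
```

Routing both handlers through one shared handler keeps the double-click and drag-and-drop paths from drifting apart.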

I think I’m late to this party, or maybe at the wrong party, because I don’t see the email I saw this morning, but here’s how I would do it*


on open (fileList)
	repeat with thisFile in fileList
		set myText to read thisFile as Unicode text
		set AppleScript's text item delimiters to {return}
		set myText to paragraphs of myText as text
		set AppleScript's text item delimiters to {"from source: ", ", port "}
		set myText to text items of myText
		set ipAddresses to {}
		repeat with x from 2 to count of myText by 2
			set the end of ipAddresses to item x of myText
		end repeat
		set AppleScript's text item delimiters to {return}
		set myText to ipAddresses as text
	end repeat
	
	tell application "Finder"
		set fileName to the name of thisFile
		set fileExtension to name extension of thisFile
		set fileLocation to container of thisFile
		set AppleScript's text item delimiters to {"." & fileExtension}
		set newFileName to text items of fileName
		set AppleScript's text item delimiters to {"_IP." & fileExtension}
		set newFilePath to (fileLocation as text) & newFileName
		set openFile to open for access newFilePath with write permission
		set eof of openFile to 1
		write myText to openFile
		close access openFile
	end tell
	tell application "TextEdit" to open file newFilePath
end open

*(This is how I’d do it with pure AppleScript. I’d make a few changes if I were using my regular arsenal of script libraries and bulletproof handlers.)

Hello stocky.

Your script has a few issues.

  1. No need to put fileList in ().

  2. “set myText to read thisFile as Unicode text” should be “set myText to read thisFile as text”,
    or the text comes in looking Japanese.
    2a. The “set myText to paragraphs of myText as text” is not needed since it is already text. You convert it to a list and then back to text. Not sure why?

  3. No need for the first “set AppleScript’s text item delimiters to {return}”, as “paragraphs of” doesn’t need it. Also, it is better to use linefeed, as the file will be imported on some networking gear which is Unix-based.

  4. “set openFile to open for access newFilePath with write permission” should not be in a tell application “Finder” block, since it is a Scripting Additions command and the Finder won’t know what to do with it.

  5. Your delimiter list doesn’t work on lines like this:
    [DoS Attack: SYN/ACK Scan] from source: 195.208.6.1, port 53, Monday, July 18, 2022 15:37:01

Which was provided to me in a test file by dbrewood
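One way to check a single line against that delimiter list is a small test like the one below. This is only a sketch: the variable names are made up for illustration, and it tests one line in isolation rather than the whole file.

```applescript
set sampleLine to "[DoS Attack: SYN/ACK Scan] from source: 195.208.6.1, port 53, Monday, July 18, 2022 15:37:01"

-- Save and restore the delimiters so the rest of a script is unaffected.
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"from source: ", ", port "}
set parts to text items of sampleLine
set AppleScript's text item delimiters to savedDelimiters

-- The line is split at both delimiters, so the address sits in item 2,
-- provided the line contains each delimiter exactly once.
if (count of parts) >= 2 then
	set ipAddress to item 2 of parts
else
	set ipAddress to missing value -- line did not have the expected shape
end if
return ipAddress
```

The "every second item" approach relies on each log line containing both delimiters exactly once. A line containing only one of them would shift the alternation for every line after it.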

That’s a habit. Doesn’t hurt anything.

That’s probably a difference between your setup and mine. On mine, if I don’t add the Unicode I get a strange character between each actual character.

That makes everything uniform. Sometimes you’ll find text where the paragraphs are separated by line feeds, sometimes by returns, and in some cases by page or column breaks. This ensures that every paragraph ends with a return.

It’s there for a reason. You could break that part of the script into three lines to see what’s happening:

set AppleScript's text item delimiters to {return}
set myText to paragraphs of myText 
set myText to myText as text

paragraphs of myText results in a list.

myText as text uses the text item delimiters to separate the list items in the text.

The result is uniform line endings, and if linefeeds are preferred it’s easy enough to change.

Yeah, but it works. Normally, I use a “bulletproof” handler to write.
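That handler is not shown in this thread. A minimal sketch of that style of write handler, with hypothetical names and kept outside any tell block, could look like this:

```applescript
-- Hypothetical handler; the names are illustrative, not from the posted script.
-- filePath is an HFS path as text, e.g. "Macintosh HD:Users:me:out.txt".
on writeTextToFile(theText, filePath)
	try
		set fileRef to open for access file filePath with write permission
		set eof of fileRef to 0 -- empty the file before writing
		write theText to fileRef as «class utf8»
		close access fileRef
		return true
	on error
		-- Make sure the file is not left open if anything failed.
		try
			close access file filePath
		end try
		return false
	end try
end writeTextToFile
```

The nested try around the second close access is there because that call itself raises an error when the file was never opened.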

Really? Seems to work perfectly (if it didn’t I wouldn’t have posted it.)

I tested with what was in the original post (which I’m not seeing anymore). My script pulled every single IP, and nothing else, and it worked when I pasted your example in.

The original text looked like this:

Guys, I’ve just realised that part of the original script has been lost in the revamped versions. The script I was using was:

use framework "Foundation"
use scripting additions

on open theDroppedItems
	set theFile to POSIX path of item 1 of theDroppedItems
	# set theExistingFile to POSIX path of (choose file)
	set theExistingFile to "/Users/dbrewood/Library/Mobile Documents/com~apple~CloudDocs/_Daron Files/NAS/Deny List/deny-ip-list.txt"
	set theIPFile to getFileName(theFile)
	
	set theText to (current application's NSString's stringWithContentsOfFile:theFile encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
	set theExistingText to (current application's NSString's stringWithContentsOfFile:theExistingFile encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
	
	set theIPData to getIPData(theText, theExistingText)
	(current application's NSString's stringWithString:theIPData)'s writeToFile:theIPFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
end open

on getIPData(theText, theExistingText)
	set regExPattern to "[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+"
	set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:regExPattern options:0 |error|:(missing value)
	set regExMatches to theRegEx's matchesInString:theText options:0 range:{location:0, |length|:theText's |length|()}
	set ipList to {}
	repeat with anItem in regExMatches
		set end of ipList to (theText's substringWithRange:(anItem's range())) as text
	end repeat
	
	set ipArray to current application's NSMutableArray's arrayWithArray:ipList
	set ipExistingText to current application's NSString's stringWithString:theExistingText
	set newlineSet to current application's class "NSCharacterSet"'s newlineCharacterSet()
	set ipExistingArray to (ipExistingText's componentsSeparatedByCharactersInSet:(newlineSet))
	ipArray's removeObjectsInArray:ipExistingArray
	
	set ipSet to current application's NSOrderedSet's orderedSetWithArray:ipArray
	set ipSortedArray to ipSet's array()'s sortedArrayUsingSelector:"localizedStandardCompare:"
	return ((ipSortedArray's componentsJoinedByString:linefeed) as text)
end getIPData

on getFileName(theFile)
	set theFile to current application's NSString's stringWithString:theFile
	set fileBase to theFile's stringByDeletingPathExtension()
	set fileExtension to theFile's pathExtension()
	return ((fileBase's stringByAppendingString:"_IP_CLEANED")'s stringByAppendingPathExtension:fileExtension)
end getFileName

Part of this script checked the extracted addresses against the existing list of already-reported IP addresses in the deny list and excluded those that were already in there. Unless I’m going nuts, the newer scripts don’t do this?

This is one of the most important parts of the script, as it prevents any duplicate reporting.
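In plain AppleScript terms, that exclusion step could be sketched roughly as below. The handler name and the sample addresses are made up for illustration. The script posted above does the equivalent job with NSMutableArray’s removeObjectsInArray: followed by NSOrderedSet.

```applescript
-- Hypothetical handler: returns the addresses in newIPs that are neither
-- in existingIPs nor already collected (so repeated entries are dropped too).
on removeKnownIPs(newIPs, existingIPs)
	set freshIPs to {}
	repeat with anIP in newIPs
		set thisIP to contents of anIP
		if existingIPs does not contain thisIP and freshIPs does not contain thisIP then
			set end of freshIPs to thisIP
		end if
	end repeat
	return freshIPs
end removeKnownIPs

-- Made-up sample data (documentation-range addresses)
set denyList to {"192.0.2.10", "198.51.100.7"}
set foundIPs to {"192.0.2.10", "203.0.113.5", "203.0.113.5"}
removeKnownIPs(foundIPs, denyList)
```

With the sample data above, only 203.0.113.5 is left, and it appears once. For very large logs the Foundation approach is likely to be faster, since list lookups in plain AppleScript slow down as the lists grow.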

This is the first I’m hearing about a pre-existing file to compare to.
Was this in a previous post? It could be, I’m just being lazy.

Is this file always in the same name/location?