Cleaning data out of a text file?

I’ve done a bit of work on the script myself and have tried to:

  • Remove lines that the guy I’m submitting to didn’t want (done)
  • Filter the output to ignore items with an abuse confidence score of less than 20
  • Change the output to include the date stamp
  • Rename the column headings to be more human-friendly

I’ve managed all of the above. It’s a learning curve for me so I’m pleased with that :slight_smile:

However, I can’t get the script to save to a defined path. In the script I’ll post below, I’ve tried to save the file to the Downloads folder (the final version will use a different path), and I get an error when it tries to save the file.

Any assistance is appreciated. If we can sort this out, and also add the City to the data (if the API provides it), then I reckon I’m nearly there.

I’d also like to do the following, if I can work out how:

  • Sort the output in descending order on the abuse confidence column
  • Alternatively, output the data as .TSV (I don’t think this is possible)

Those are optional niceties though and would be gilding the lily. I’m slowly learning a few things about scripting now :slight_smile:

The set report to report & “;” line is required so as not to destroy the structure of the report data. Uncomment it.

Also, please don’t credit me as the author of scripts that you have edited and that contain errors.

Thanks, I’ve uncommented that line and the error still occurs… or do I need to uncomment the other lines?

	tell (|data| of aRecord)
		if its abuseConfidenceScore ≥ 20 then -- Filters out low confidence abuse confidence
			set report to report & linefeed & its ipAddress & ";"
			-- set report to report & its isPublic & ";"
			-- set report to report & its ipVersion & ";"
			-- if (its isWhitelisted) is missing value then set its isWhitelisted to ""
			-- set report to report & its isWhitelisted & ";"
			set report to report & its abuseConfidenceScore & ";"
			set report to report & its countryCode & ";"
			-- set report to report & its usageType & ";"
			set report to report & its isp & ";"
			set report to report & its domain & ";"
			-- try
			--	set report to report & (item 1 of its hostnames)
			-- end try
			set report to report & ";"
			-- set report to report & its totalReports & ";"
			-- set report to report & its numDistinctUsers & ";"
			-- EDITED the following
			-- set lastReportedAt to its lastReportedAt
			-- if not (lastReportedAt is missing value) then set report to report & ¬
			-- text 1 thru 10 of lastReportedAt & space & text 12 thru 19 of lastReportedAt
		end if
	end tell

Sorry to offend, I’ll remove the quoted script.

Also, you commented out the try block for hostnames, which is required as well. :smiley: Please post the whole script, because you may have made other mistakes. The report text must have a strict structure for Numbers.app to be able to recognize it as a table.

Here is an example of structured text. When you open this properly structured text (saved as a plain .txt file) with Numbers.app, it will recognize it as a table. As you can see, every cell is separated by a “;” and every row by a linefeed. If the rows contain different numbers of cells, Numbers.app can’t create the table. That was your problem when you commented out the required code lines.

6 cells in the 1st row, 6 cells in the 2nd row:


"IP Address;Abuse %;Country;ISP;Domain;Hostnames:
36.156.66.62;69;CN;China Mobile Communications Corporation;chinamobileltd.com;"
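A quick way to catch this kind of mistake before the file ever reaches Numbers is to count the cells in every row. The handler below is a minimal sketch; its name, hasConsistentColumns, is hypothetical and not part of the scripts in this thread. It assumes, as in the example, that rows are separated by linefeeds and cells by “;”, and that the first row is the header.

```applescript
-- Hypothetical helper: returns true when every non-empty row of a ";"-delimited
-- report has the same number of cells as the header row (row 1).
on hasConsistentColumns(reportText)
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ";"
	set theRows to paragraphs of reportText
	set expectedCells to count text items of (item 1 of theRows)
	set isConsistent to true
	repeat with aRow in theRows
		set rowText to contents of aRow
		-- blank rows are ignored; any other row must match the header's cell count
		if rowText is not "" and (count text items of rowText) is not expectedCells then
			set isConsistent to false
			exit repeat
		end if
	end repeat
	-- always restore the delimiters so the rest of the script is unaffected
	set AppleScript's text item delimiters to savedTIDs
	return isConsistent
end hasConsistentColumns
```

For example, calling if not hasConsistentColumns(report) then display alert "Report rows have different cell counts" just before the report is written would flag the problem caused by commenting out a required line.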

To sort the table:


tell application "Numbers"
	activate
	set theDoc to open reportTextFile
	tell theDoc to tell sheet 1 to tell table 1 to if (count rows) > 2 then sort by column 2 direction descending
	export theDoc to file csvReportFile as CSV
	close documents saving no
end tell
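If you would rather not depend on Numbers for the sort (for example, if the report is eventually written out directly), the rows can also be sorted in plain AppleScript before the text file is written. This is a sketch under stated assumptions: row 1 is the header, column 2 always holds an integer score, and the handler name sortReportByScore is hypothetical. It uses a simple insertion sort, which is adequate for modest report sizes.

```applescript
-- Hypothetical helper: sort the data rows of a ";"-delimited report by the
-- integer in column 2 (Abuse %), highest first. Row 1 is treated as the header
-- and stays on top; empty rows are dropped.
on sortReportByScore(reportText)
	set savedTIDs to AppleScript's text item delimiters
	set theRows to paragraphs of reportText
	set headerRow to item 1 of theRows
	set dataRows to rest of theRows
	set AppleScript's text item delimiters to ";"
	set sortedRows to {}
	-- insertion sort: each row goes in front of the first row with a lower score,
	-- so rows with equal scores keep their original order
	repeat with aRow in dataRows
		set rowText to contents of aRow
		if rowText is not "" then
			set rowScore to (text item 2 of rowText) as integer
			set insertAt to (count sortedRows) + 1
			repeat with i from 1 to count sortedRows
				if rowScore > ((text item 2 of (item i of sortedRows)) as integer) then
					set insertAt to i
					exit repeat
				end if
			end repeat
			if insertAt = 1 then
				set beginning of sortedRows to rowText
			else if insertAt > (count sortedRows) then
				set end of sortedRows to rowText
			else
				set sortedRows to (items 1 thru (insertAt - 1) of sortedRows) & {rowText} & (items insertAt thru -1 of sortedRows)
			end if
		end if
	end repeat
	-- join header and sorted rows back together, one row per line
	set AppleScript's text item delimiters to linefeed
	set sortedText to ({headerRow} & sortedRows) as text
	set AppleScript's text item delimiters to savedTIDs
	return sortedText
end sortReportByScore
```

Used as set report to sortReportByScore(report) just before the report is written to the temporary file, this would make the Numbers sort step unnecessary.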

I’m not doing very well here, am I? I was trying to save you from working through all the necessary changes by doing some of the work myself.

Okay I understand about the structure now, and yes I have likely messed it up :slight_smile:

The data needed in the report is the IP Address, Abuse %, Country, City (if possible), ISP, and Domain; the other data is not needed by the guy I submit the attack reports to.

This is why I was looking to remove some of the fields from the output data.

I was trying to change the output path to a defined folder and sort the data by abuse confidence (thanks for the code for that). To be honest, I’m not even sure where in the script to put it (sorry).

AppleScript is so different from the coding (database work) I did in my youth that I find some (most) parts of it difficult to follow.

The entire script I have at the moment is:


-- script:  Check IP addresses list for DD_DOS and other net attacks
-- written by: KniazidisR (today)
-- Messed up by Daron Brewood :)
-- note: visit https://api.abuseipdb.com to learn more details

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

property NSString : a reference to current application's NSString
property NSJSONSerialization : a reference to current application's NSJSONSerialization
property NSUTF8StringEncoding : a reference to current application's NSUTF8StringEncoding

property serviceAddress : "https://api.abuseipdb.com/api/v2/check"
-- KniazidisR Key
-- property myKey : "ce8b31add6647986bca78a2460b342a168375bdd35efc801ce8ffbade5ea385d06094b665454efe8"
-- DMB Key
property myKey : "247cdf97b62552676b6f56a4e56611a7fcb2ceaced16cd1e27f7f987be861c1f42b96a7f8be45fb9"
property maxAgeInDays : 30 -- for last 30 days 

-- set ipList to {"13.32.145.30", "34.247.206.80", "36.156.66.62", "45.233.113.9", "46.51.207.89"}
-- or:
set ipListFile to (choose file of type "txt") -- provide ip addresses inside this file
set ipList to paragraphs of (read ipListFile)


-- set report to "ipAddress;isPublic;ipVersion;isWhitelisted;abuseConfidenceScore;countryCode;usageType;isp;domain;hostnames:;totalReports;numDistinctUsers;lastReportedAt"
set report to "IP Address;Abuse %;Country;ISP;Domain:"
repeat with nextIP in ipList
	-- get JSON Report (as record) using site's API
	set jsonData to do shell script "curl -G " & serviceAddress & ¬
		" --data-urlencode \"ipAddress=" & nextIP & ¬
		"\" -d maxAgeInDays=" & maxAgeInDays & ¬
		" -H \"Key: " & myKey & ¬
		"\" -H \"Accept: application/json\""
	set jsonString to (NSString's stringWithString:jsonData)
	set jsonData to (jsonString's dataUsingEncoding:NSUTF8StringEncoding)
	set aRecord to (NSJSONSerialization's JSONObjectWithData:jsonData options:0 |error|:(missing value)) as record
	-- BUILD THE REPORT as TEXT 
	
	tell (|data| of aRecord)
		if its abuseConfidenceScore ≥ 20 then -- Filters out low confidence abuse confidence
			set report to report & linefeed & its ipAddress & ";"
			-- set report to report & its isPublic & ";"
			-- set report to report & its ipVersion & ";"
			-- if (its isWhitelisted) is missing value then set its isWhitelisted to ""
			-- set report to report & its isWhitelisted & ";"
			set report to report & its abuseConfidenceScore & ";"
			set report to report & its countryCode & ";"
			-- set report to report & its usageType & ";"
			set report to report & its isp & ";"
			set report to report & its domain & ";"
			-- try
			--	set report to report & (item 1 of its hostnames)
			-- end try
			set report to report & ";"
			-- set report to report & its totalReports & ";"
			-- set report to report & its numDistinctUsers & ";"
			-- EDITED the following
			-- set lastReportedAt to its lastReportedAt
			-- if not (lastReportedAt is missing value) then set report to report & ¬
			-- text 1 thru 10 of lastReportedAt & space & text 12 thru 19 of lastReportedAt
		end if
	end tell
	
end repeat

-- make temporary text file
set tempFolder to (path to temporary items folder from user domain)
tell application "Finder"
	try
		set reportTextFile to (make new file at tempFolder with properties {name:"Report.txt"}) as alias
	on error
		set reportTextFile to (file "Report.txt" of folder (tempFolder as text)) as alias
	end try
end tell

-- Calculate date stamp

set dateObj to (current date)
set theMonth to text -1 thru -2 of ("0" & (month of dateObj as number))
set theDay to text -1 thru -2 of ("0" & day of dateObj)
set theYear to year of dateObj
set dateStamp to "" & theYear & "-" & theMonth & "-" & theDay


-- write JSON report to temporary text file
set file_ID to open for access reportTextFile with write permission
set eof file_ID to 0
write report to file_ID as «class utf8»
close access file_ID

-- convert temporary text file to CSV file
-- set csvReportFile to "" & (path to desktop folder) & dateStamp & "Report.csv"
-- set csvReportFile to "" & (path to desktop folder) & "Report " & dateStamp & ".csv"
-- set csvReportFile to "" & "/Users/dbrewood/Downloads/" & "Report " & dateStamp & ".csv"

set csvReportFile to "/Users/dbrewood/Downloads/" & "Report " & dateStamp & ".csv"

-- display dialog csvReportFile


tell application "Numbers"
	activate
	set theDoc to open reportTextFile
	export theDoc to file csvReportFile as CSV
	close documents saving no
	quit it
end tell

I fixed three mistakes. One of them was using a POSIX path for the output file: Numbers.app expects an HFS path given as a file specification.


-- script:  Check IP addresses list for DD_DOS and other net attacks
--            Leave only 5 fields from the report, then sort the table by the 2-nd field
-- written by: Daron Brewood (today)
-- updated and fixed by: KniazidisR
-- note: visit https://api.abuseipdb.com to learn more details

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

property NSString : a reference to current application's NSString
property NSJSONSerialization : a reference to current application's NSJSONSerialization
property NSUTF8StringEncoding : a reference to current application's NSUTF8StringEncoding

property serviceAddress : "https://api.abuseipdb.com/api/v2/check"
property privateKey : "247cdf97b62552676b6f56a4e56611a7fcb2ceaced16cd1e27f7f987be861c1f42b96a7f8be45fb9"
property maxAgeInDays : 30 -- for last 30 days 

-- set ipList to {"13.32.145.30", "34.247.206.80", "36.156.66.62", "45.233.113.9", "46.51.207.89"}
-- or:
set ipListFile to (choose file of type "txt") -- provide ip addresses inside this file
set ipList to paragraphs of (read ipListFile)

-- set report to "ipAddress;isPublic;ipVersion;isWhitelisted;abuseConfidenceScore;countryCode;usageType;isp;domain;hostnames:;totalReports;numDistinctUsers;lastReportedAt"
set report to "IP Address;Abuse %;Country;ISP;Domain"
repeat with nextIP in ipList
	-- get JSON Report (as record) using site's API
	set jsonData to do shell script "curl -G " & serviceAddress & ¬
		" --data-urlencode \"ipAddress=" & nextIP & ¬
		"\" -d maxAgeInDays=" & maxAgeInDays & ¬
		" -H \"Key: " & privateKey & ¬
		"\" -H \"Accept: application/json\""
	set jsonString to (NSString's stringWithString:jsonData)
	set jsonData to (jsonString's dataUsingEncoding:NSUTF8StringEncoding)
	set aRecord to (NSJSONSerialization's JSONObjectWithData:jsonData options:0 |error|:(missing value)) as record
	
	-- BUILD THE REPORT as TEXT 
	tell (|data| of aRecord)
		if its abuseConfidenceScore ≥ 20 then -- Skips entries whose abuse confidence score is below 20
			set report to report & linefeed & its ipAddress & ";"
			set report to report & its abuseConfidenceScore & ";"
			set report to report & its countryCode & ";"
			set report to report & its isp & ";"
			set report to report & its domain
		end if
	end tell
	
end repeat

-- make temporary text file
set tempFolder to (path to temporary items folder from user domain)
tell application "Finder"
	try
		set reportTextFile to (make new file at tempFolder with properties {name:"Report.txt"}) as alias
	on error
		set reportTextFile to (file "Report.txt" of folder (tempFolder as text)) as alias
	end try
end tell

-- Calculate date stamp
set dateObj to (current date)
set theMonth to text -1 thru -2 of ("0" & (month of dateObj as number))
set theDay to text -1 thru -2 of ("0" & day of dateObj)
set theYear to year of dateObj
set dateStamp to "" & theYear & "-" & theMonth & "-" & theDay

-- write the report text to the temporary text file
set file_ID to open for access reportTextFile with write permission
set eof file_ID to 0
write report to file_ID as «class utf8»
close access file_ID

-- convert temporary text file to CSV file
set csvReportFile to "" & (path to downloads folder) & "Report " & dateStamp & ".csv"
tell application "Numbers"
	activate -- optional
	-- create the table
	set theDoc to open reportTextFile
	-- sort it by column 2, order descending
	tell theDoc to tell sheet 1 to tell table 1 to if (count rows) > 2 then sort by column 2 direction descending
	-- export as csv and close all
	export theDoc to file csvReportFile as CSV
	close documents saving no
end tell

Many thanks for that and for fixing the errors, a few more questions:

  • When I run the script, it creates the report in the Downloads folder, but when I open the report, Numbers flashes and opens two spreadsheets, ‘Report 2’ and ‘Report 2002-04-29’. Any idea why ‘Report 2’ opens? It looks as if Numbers is holding on to the first report, which was closed before the CSV file was saved?

  • I actually want the reports to be filed in ‘/Volumes/NAS Store/IP Abuse reports’, which is on my NAS. How should I change the script to use that path?

  • Am I correct that the ‘City’ data is not available from the API? It is shown on the web page when an IP address is checked via that method…

Many many thanks for all the assistance here.


set csvReportFile to "" & (path to startup disk) & "Volumes:NAS Store:IP Abuse reports:" & "Report " & dateStamp & ".csv"
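If you prefer to keep writing paths in the POSIX form you already know (for example ‘/Volumes/NAS Store/IP Abuse reports’), AppleScript’s POSIX file class can convert them to the HFS form that Numbers’ export command wants. A minimal sketch, assuming the hypothetical handler name hfsPathFromPOSIX; the target file does not have to exist yet.

```applescript
-- Hypothetical helper: turn a POSIX path string into an HFS path string
-- (colon-separated) suitable for Numbers' "export ... to file ..." command.
on hfsPathFromPOSIX(posixPath)
	return (POSIX file posixPath) as text
end hfsPathFromPOSIX
```

With it, the line above could be written as set csvReportFile to hfsPathFromPOSIX("/Volumes/NAS Store/IP Abuse reports/Report " & dateStamp & ".csv"), leaving the Numbers block unchanged.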

The API reports 13 fields. The ‘City’ field is not among them; I don’t know why.

Thanks for that, all good, except that I think I missed what you intended to say about the:

Issue??

A pity over the City data but if they are not making it available there’s not a lot we can do :slight_smile:

No, it isn’t a problem. It is a very good question. I tested it here and can confirm the “strange” behavior you noticed.

It looks like you’ve found a major bug in all of my scripts. It turns out that the quit it command merely duplicates the close documents saving no command that precedes it, and somehow it is the quit it command that creates the “weird” behavior you noticed.

In short, removing the quit it command is necessary in all my scripts. I will update them now.

Thanks for that, at least this scripting work has been useful in sorting out that issue.

I think that the script is finally finished then… Your help is appreciated and I’ve learnt a lot!

I do have one question from the guy I submit to (which I’m not pursuing yet): can Numbers be set to export the report as TSV instead of CSV?
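As far as I know, Numbers’ export command offers no TSV format, but Numbers may not be needed for this at all: the report is already built as “;”-delimited text, so swapping the delimiters for tabs and writing the text straight to a .tsv file produces tab-separated output. A sketch, assuming no cell value itself contains “;” or a tab; the handler names semicolonToTSV and writeUTF8Text are hypothetical.

```applescript
use scripting additions

-- Hypothetical helper: convert ";"-delimited report text into tab-separated text.
-- Linefeeds between rows are left untouched.
on semicolonToTSV(reportText)
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ";"
	set theCells to text items of reportText
	set AppleScript's text item delimiters to tab
	set tsvText to theCells as text
	set AppleScript's text item delimiters to savedTIDs
	return tsvText
end semicolonToTSV

-- Hypothetical helper: write text as UTF-8 to an HFS path, replacing old contents,
-- and make sure the file is closed even if the write fails.
on writeUTF8Text(theText, hfsPath)
	set fileRef to open for access file hfsPath with write permission
	try
		set eof fileRef to 0
		write theText to fileRef as «class utf8»
	on error errMsg number errNum
		close access fileRef
		error errMsg number errNum
	end try
	close access fileRef
end writeUTF8Text
```

For example: writeUTF8Text(semicolonToTSV(report), (path to downloads folder as text) & "Report " & dateStamp & ".tsv").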

[UPDATE] - I’ve asked the AbuseIPDB developers whether they have any plans to include the City data in the API, and whether they might be interested in referencing the AppleScript check script (with your permission, of course).

Of course, you can publish all of this. All my scripts are free to use. In addition, the implementation belongs to me, and the idea is exclusively yours.

Thanks :slight_smile:

I’ll report back on what (if anything) they come back with…

One question: would it be easy to modify the script so that it accepts a file via drag and drop when compiled into an application?

In addition to the above query, I’m still seeing Numbers retain ‘Report 4’ or ‘Report 5’ and open those spreadsheets whenever I open any other spreadsheet.

Any ideas?

I guess there was no answer to the above?

Regarding the first script, I’m still using the following:

use framework "Foundation"
use scripting additions

on open theDroppedItems
	set theFile to POSIX path of item 1 of theDroppedItems
	# set theExistingFile to POSIX path of (choose file)
	set theExistingFile to "/Users/path to my deny list/deny-ip-list.txt"
	set theIPFile to getFileName(theFile)
	
	set theText to (current application's NSString's stringWithContentsOfFile:theFile encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
	set theExistingText to (current application's NSString's stringWithContentsOfFile:theExistingFile encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
	
	set theIPData to getIPData(theText, theExistingText)
	(current application's NSString's stringWithString:theIPData)'s writeToFile:theIPFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
end open

on getIPData(theText, theExistingText)
	set regExPattern to "[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+"
	set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:regExPattern options:0 |error|:(missing value)
	set regExMatches to theRegEx's matchesInString:theText options:0 range:{location:0, |length|:theText's |length|()}
	set ipList to {}
	repeat with anItem in regExMatches
		set end of ipList to (theText's substringWithRange:(anItem's range())) as text
	end repeat
	
	set ipArray to current application's NSMutableArray's arrayWithArray:ipList
	set ipExistingText to current application's NSString's stringWithString:theExistingText
	set newlineSet to current application's class "NSCharacterSet"'s newlineCharacterSet()
	set ipExistingArray to (ipExistingText's componentsSeparatedByCharactersInSet:(newlineSet))
	ipArray's removeObjectsInArray:ipExistingArray
	
	set ipSet to current application's NSOrderedSet's orderedSetWithArray:ipArray
	set ipSortedArray to ipSet's array()'s sortedArrayUsingSelector:"localizedStandardCompare:"
	return ((ipSortedArray's componentsJoinedByString:linefeed) as text)
end getIPData

on getFileName(theFile)
	set theFile to current application's NSString's stringWithString:theFile
	set fileBase to theFile's stringByDeletingPathExtension()
	set fileExtension to theFile's pathExtension()
	return ((fileBase's stringByAppendingString:"_IP_CLEANED")'s stringByAppendingPathExtension:fileExtension)
end getFileName

Is there a way to modify it so that it omits any IP addresses from the 192.168.1.xxx internal network?
As always, any help is appreciated.
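One way to do this without touching the regular expression is to filter the collected list before it is turned into an NSArray. The sketch below is plain AppleScript, and the handler names isLocalLANAddress and withoutLocalLAN are hypothetical. Note the trailing dot in the prefix: testing only for "192.168.1" would also drop public-looking addresses such as 192.168.10.x or 192.168.100.x.

```applescript
-- Hypothetical helper: true for addresses on the 192.168.1.x LAN only.
on isLocalLANAddress(ipAddress)
	return (ipAddress as text) starts with "192.168.1."
end isLocalLANAddress

-- Hypothetical helper: return a copy of ipList without 192.168.1.x addresses.
on withoutLocalLAN(ipList)
	set kept to {}
	repeat with anIP in ipList
		set ipText to contents of anIP
		if not isLocalLANAddress(ipText) then set end of kept to ipText
	end repeat
	return kept
end withoutLocalLAN
```

In getIPData, adding set ipList to withoutLocalLAN(ipList) after the first repeat loop (before ipArray is created) should be enough; the rest of the handler stays as it is.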

I was just wondering whether anyone had done an AppleScript-only version without AppleScriptObjC.

Here is my version; it also skips “192.168.1.x” addresses.


use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

on run
	try
		set theFiles to choose file of type {"txt"} with multiple selections allowed
	on error
		return
	end try
	open theFiles
end run

on open theFiles
	local atid, dText, aFile, cFile, ipList, startTime, elapsedTime
	--set startTime to current application's CACurrentMediaTime()
	repeat with aFile in theFiles
		try
			set cFile to open for access aFile --with write permission
		on error
			display alert "Uh-oh! Error opening file…" giving up after 10
			return false
		end try
		try
			set dText to read cFile from 1 as text
		on error
			set dText to false
			display alert "File Empty!" giving up after 10
		end try
		close access cFile
		if class of dText is not boolean then
			set ipList to parseDosIPs(dText)
			set atid to text item delimiters
			set text item delimiters to {".txt"}
			set dText to ((text items 1 thru -2 of (aFile as text)) as text)
			set text item delimiters to atid
			combSort(ipList)
			saveIPs(ipList, dText)
		end if
	end repeat
	--set elapsedTime to (current application's CACurrentMediaTime()) - startTime
end open

on parseDosIPs(dosText)
	local atid, IPv4, tmp
	script D
		property IPs : {}
		property dosList : paragraphs of dosText
	end script
	set atid to text item delimiters
	set text item delimiters to {"remote] from ", "] from source: ", "LAN access", ", ", ":"} -- {"from source: ", ", "}
	repeat with i from 1 to count D's dosList
		set tmp to item i of D's dosList
		if tmp ≠ "" then
			set tmp to text items of tmp
			if (count tmp) > 2 then
				set IPv4 to item 3 of tmp --word 1 of 
				if IPv4 is not in D's IPs then
					if IPv4 does not start with "192.168.1." then -- trailing dot, so 192.168.10.x etc. are not skipped
						set end of D's IPs to IPv4
					end if
				end if
			end if
		end if
	end repeat
	set text item delimiters to atid
	return D's IPs
end parseDosIPs

on combSort(aList)
	local sf, i, j, cc, ns, js, gap, pgap, sw -- ns means No Swap
	if class of aList is not list then return false
	script mL
		property nlist : aList
	end script
	set sf to 1.7
	set cc to count mL's nlist
	set gap to cc div sf
	repeat until gap = 0
		repeat with i from 1 to gap
			set js to cc - gap
			repeat until js < 1 -- do each gap till no more swaps
				set ns to gap
				repeat with j from i to js by gap
					if (item j of mL's nlist) > (item (j + gap) of mL's nlist) then
						set sw to (item j of mL's nlist)
						set (item j of mL's nlist) to (item (j + gap) of mL's nlist)
						set (item (j + gap) of mL's nlist) to sw
						set ns to j
					end if
				end repeat
				set js to ns - gap
			end repeat
		end repeat
		set pgap to gap
		set gap to gap div sf
		if gap = 0 then -- no while using as integer
			if pgap ≠ 1 then set gap to 1
		end if
	end repeat
end combSort

on saveIPs(pList as list, pPath as string)
	local cFile, cEOF
	set cFile to pPath & "_IP_CLEANED.txt"
	try
		set cFile to open for access cFile with write permission
	on error
		display alert "Uh-oh! Error opening file…" giving up after 10
		return false
	end try
	set atid to text item delimiters
	set text item delimiters to linefeed
	try
		set cEOF to (get eof cFile) + 1
		write (pList as text) & linefeed to cFile as text starting at cEOF
	on error
		display alert "Error! Can't write to output file…" giving up after 10
	end try
	set text item delimiters to atid
	close access cFile
	return true
end saveIPs

P.S. On my Mac mini I’m getting around 23 milliseconds on the sample file from a previous post.

EDIT fixed delimiter to use new input file format

EDIT fixed again to skip blank lines and such

EDIT added code in run handler to ask for file if app double clicked instead of Drag & Dropped

I just dragged and dropped my IP list onto it, and I get this error:

Can’t get item 2 of {}. (-1728)

The raw input list can now look like this:

Wow, that new file format is very different from the one posted in post #28

Do you have a larger version of the new file to test on?

Yep, as I’m now collecting more data from the Netgear Orbi logs and also from the Armor system on the web. Going through today’s entire collected log file, there are various data types showing:

Today’s log is 126 KB long and is too large to post.