Hi everybody
I have a bit of a task here, and I just can’t figure out how to extract certain values from a huge HTML string that I fetch from a website through curl.
So here is my basic setup:
I use some curl code to sign in to the website with a PHP session cookie, then use that cookie to open the secured site and perform a search on it. I can’t show you the site due to business regulations, but you can think of it as a database system where you search for a number in the format 1.123.123 and it shows the assets matching that number. If you click on an asset in the results (there may be several), you can then edit that asset. The problem is that the number I am searching for is not the number the tool uses to change asset options; it uses a unique “id” for the asset instead. Quick example:
Search URL: http://www.domain.com/something/search/index/query/1.123.123
This shows the search results, and once I click on the asset it loads the URL for changing its options:
Asset URL: http://www.domain.com/something/search/update/id/25756
So 25756 is the unique ID number of the page I want to open directly, without having to search for the asset first. I need a script that signs in to the database through curl, opens the search URL (1.123.123), fetches the ID (25756) from the search results, and then opens a link with that fetched ID (25756).
(25756 is just an example number, of course.)
I already have the part that performs the search on the website, and I have the result as HTML, so I need a script that looks through all that code and extracts only that “ID” value!
This is an example of code fetched from the search result site:
"<table cellspacing="0" cellpadding="0"><thead><tr><td style="width: 90px;"><a href="/something/search/index/query/1.123.123/sort-search/%7B%22col%22%3A%22sap-number%22%2C%22dir%22%3A%22asc%22%7D" class="sort-desc" title="Sort ascending">SAP number</a></td><td style="width: 60px;"><a href="/something/search/index/query/1.123.123/sort-search/%7B%22col%22%3A%22week%22%2C%22dir%22%3A%22asc%22%7D" class="sort-desc" title="Sort ascending">Week</a></td><td style="width: 100px;"><a href="/something/search/index/query/1.123.123/sort-search/%7B%22col%22%3A%22category%22%2C%22dir%22%3A%22asc%22%7D" class="sort-desc" title="Sort ascending">Category</a></td><td style="width: 70px;"><a href="/something/search/index/query/1.123.123/sort-search/%7B%22col%22%3A%22hc%22%2C%22dir%22%3A%22asc%22%7D" class="sort-desc" title="Sort ascending">HC</a></td><td style="width: 269px;"><a href="/something/search/index/query/1.123.123/sort-search/%7B%22col%22%3A%22name%22%2C%22dir%22%3A%22asc%22%7D" class="sort-desc" title="Sort ascending">Name</a></td><td style="width: 90px;"><a href="/something/search/index/query/1.123.123/sort-search/%7B%22col%22%3A%22added-on%22%2C%22dir%22%3A%22desc%22%7D" class="sort-active sort-asc" title="Sort descending">Added on</a></td><td style="width: 150px;"><a href="/something/search/index/query/1.123.123/sort-search/%7B%22col%22%3A%22state%22%2C%22dir%22%3A%22asc%22%7D" class="sort-desc" title="Sort ascending">State</a></td><td style="width: 240px;"><div class="header-actions">View: <a href="/something/search/index/query/1.123.123/view-search/list" title="Show list view"><img style="margin-right: 3px;" src="/something/public/images/icons/list.png" alt=""/></a><a href="/something/search/index/query/1.123.123/view-search/detailed" title="Show detailed view"><img style="margin-right: 3px;" src="/something/public/images/icons/detailed-inactive.png" alt=""/></a> Export: <a class="newWindow" href="/something/table/print/tableId/search/query/1.123.123" title="Print 
complete table"><img style="margin-right: 3px;" src="/something/public/images/icons/print.png" alt=""/></a><a href="/something/table/mail/tableId/search/query/1.123.123" title="Mail me complete table"><img style="margin-right: 3px;" src="/something/public/images/icons/mail.png" alt=""/></a></div>Actions</td></tr></thead><tbody><tr class="tr-odd"><td style="width: 90px;">1.123.123</td><td style="width: 60px;">28/2014</td><td style="width: 100px;">Some Category</td><td style="width: 70px;">59</td><td style="width: 269px;"><a href="/something/search/update/id/25756">Description</a><ul class="images images-icon"><li><a href="#"><img style="margin-right: 3px;" src="/something/public/images/icons/image.png" alt=""/><span class="preview"><img src="/something/image/image/id/25756/name/53a0083ac7103/mode/preview" alt="" /></span></a></li></ul><br style="clear: both" /><p>Description</p><div id="remarks-25756" class="remarks remarks-empty"><a href="javascript:;" title="Click to add">Add remarks</a><div class="loading"><div class="loading-background"></div><div class="loading-activity"></div></div></div></td><td style="width: 90px;">09.05.2014, 11:29</td><td style="width: 150px;"><span class="state state-6"><img style="margin-right: 3px;" src="/something/public/images/icons/state-6-dark.png" alt=""/>Status</span></td><td style="width: 240px;"><ul class="actions"><li><a href="javascript:;" onclick="App.util.confirmForm('deleteJob25756','Really delete job \'Description\'?');"><img style="margin-right: 3px;" src="/something/public/images/icons/delete.png" alt=""/>delete</a><form method="post" id="deleteJob25756" action="/something/search/delete"><div><input type="hidden" name="id" value="25756" /></div></form></li></ul></td></tr></tbody></table>"
You can see that the “ID” shows up a few times in the code, so it doesn’t really matter where I extract it from. The most interesting part is this one:
<a href="/something/search/update/id/25756">Description</a>
because that is pretty much exactly the link I then want to open in Safari. I don’t need the full link, though; if I could just extract the ID (25756 in this case), I can do all sorts of fancy stuff with that asset through AppleScript.
So the question is: how can I get AppleScript, or a “do shell script” call, to extract and isolate that ID from the code above? If there are multiple search results, there will be multiple IDs, and I would need the last one (the bottom one in the code), as the results come back with the newest one at the bottom.
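To make the goal concrete, here is a minimal sketch of the kind of extraction I have in mind, assuming the ID always appears in a link of the form /search/update/id/<digits> as in the sample above. The function name extract_last_id and the shortened two-row sample HTML (including the ID 25001) are made up for illustration; they are not from the real site.

```shell
#!/bin/sh
# Read HTML on stdin and print the ID from the LAST
# "/search/update/id/<digits>" link found (empty output if there is none).
extract_last_id() {
    grep -oE '/search/update/id/[0-9]+' |  # one matching link path per line
        tail -n 1 |                        # keep only the last match
        sed -E 's#.*/##'                   # strip everything up to the final slash
}

# Hypothetical, shortened search result with two rows; the newest is at the bottom.
html='<tr><td><a href="/something/search/update/id/25001">Old</a></td></tr><tr><td><a href="/something/search/update/id/25756">Description</a></td></tr>'

asset_id=$(printf '%s' "$html" | extract_last_id)
echo "$asset_id"
```

From AppleScript, the same pipeline could presumably be run through “do shell script”, passing the fetched HTML in with “quoted form of”, but I have not verified that with the full-size page.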
Thank you for your answers and help!