File Name Extract Csv File Create

Hi,

I have a piece of script that takes the file name from each file in a given folder and then creates a CSV file with the list of file names. I need to modify this script to only extract the numbers from the file name and not the text.

Example

current file name convention in folder : sz_123456_a
(the text is not always the same)

extract only : 123456 and create a csv file list.

Here is the script I am using at the moment. Is there a way I can modify it to extract just the numbers?

do shell script "find ~/Desktop/images/ \\( -iname '*.pdf' -or -iname '*.jpg' \\) -exec basename {} \\; | sed -e 's/.pdf$//' -e 's/.jpg$//' -e 's/^.*$/\"&\"/' > $HOME/Desktop/file_names_list.csv"

any help would be greatly appreciated.

Hello.

This should do what you intended to do.

do shell script "find ~/Desktop/images/ \\( -iname '*.pdf' -or -iname '*.jpg' \\) -exec basename {} \\; | sed  -n  's/\\([^0-9]*\\)\\([0-9][0-9]*\\)\\([^0-9]*\\)/\"\\2\"/p' > $HOME/Desktop/file_names_list.csv"

That is, if what you wanted to do was like this. :slight_smile:

Hi.

This gets rid of all the ‘basename’ calls (which should speed things up) and simplifies the ‘sed’. The same assumptions are made regarding the file-naming convention and the quoting of the output.

do shell script "find ~/Desktop/images/ \\( -iname '*.pdf' -or -iname '*.jpg' \\) | sed -E 's|.+/([^/]+)$|\\1| ; s|[^0-9]+|\"|g'> $HOME/Desktop/file_names_list.csv"
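To illustrate the shared assumption about the naming convention, here is a minimal shell sketch that feeds a few made-up paths through the same two sed substitutions. The function name extract_numbers and the sample paths are only for illustration; they are not part of the scripts above.

```shell
# Hypothetical wrapper around the two substitutions:
#   1. strip everything up to the last "/" (leaves the bare file name)
#   2. replace every run of non-digits with a double quote
extract_numbers() {
    sed -E 's|.+/([^/]+)$|\1| ; s|[^0-9]+|"|g'
}

# Names that follow the sz_123456_a convention come out as quoted numbers.
printf '%s\n' '/tmp/images/sz_123456_a.jpg' '/tmp/images/ab_987_x.pdf' | extract_numbers

# A name with a second group of digits breaks the assumption:
# each extra group adds another fragment to the line.
printf '%s\n' '/tmp/images/sz_123456_a2.jpg' | extract_numbers
```

The first command prints "123456" and "987" on separate lines, while the last one prints "123456"2", so the approach is only safe when each file name holds exactly one run of digits.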

Thank you very much for the responses. Both work and do exactly what I needed. Huge help!

Hello.

You may wish to peruse the man page re_format, for an overview of regular expressions, as they are implemented with sed, awk, grep, etc. :slight_smile:

Hello.

I thought Nigel’s clever sed regexp deserved something faster than the find command, so this version uses mdfind. It won’t work properly if your Spotlight database is broken, though you’ll quickly see that by spotting (pun intended) the gear icon on the Spotlight menu when searching for, say, an app.

do shell script "mdfind -onlyin ~//Desktop/images 'kMDItemFSName = \"*pdf\" ||kMDItemFSName = \"*jpg\" ' | sed -E 's|.+/([^/]+)$|\\1| ; s|[^0-9]+|\"|g'> $HOME/Desktop/file_names_list.csv"

Thanks McUsrII appreciate the help,

I’m moving the goalposts slightly here, but would it be possible to also add to the CSV file the date and time that the file enters the images folder?

example

123456 / 16-01-2015 / 17:12

therefore on the CSV file

column 1 = file name number extracted
column 2 = date that file entered the images folder
column 3 = time that file entered the images folder

I hope this makes sense and thanks again for your help

Well, there are some troubles with doing just that. The problem is that if the file is downloaded, or created elsewhere on your disk, then attributes like the creation date, modification date, and access date are already set.

When we peruse through the folders, then we stat the file, so if I’m not totally wrong, then the access attribute of the file will be updated.

The short story is, that in order for you to know when the file entered the folder, then you’d either have to create the file in the folder system, or you’d have a special way of copying the file, which for instance updated the creation time/date to the point in time where you copied the file into the filesystem for your images.

I suggest you use the modification time/date to record when the file was added to the folder. From then on, you’d take the stance that each time you modify a file, it is added to the folder, which will be after the time you physically added it anyway.

I wonder if you have any thoughts about this, so far.

Edit

The access time (atime) of a file only changes when the contents of the file are read.

The change time (ctime) of a file only changes when either the contents or the attributes of the file are changed.

The modification time (mtime) of a file only changes when the contents of the file are modified.

None of this really changes my proposal: set the modification time of the file when it is added to the folder hierarchy, and later use that modification time to represent when it was added to the folder, or when the contents of the file were last modified. :slight_smile:
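A minimal sketch of that proposal, assuming files arrive by being moved into the folder: mv keeps the old modification time, so a touch afterwards stamps the file with the moment of arrival. The helper name add_to_folder and the paths in the usage comment are hypothetical.

```shell
# Hypothetical helper: move a file into a folder, then set its
# modification time (mtime) to "now" so it records the arrival time.
add_to_folder() {
    src=$1
    dest_dir=$2
    mv "$src" "$dest_dir/" || return 1
    # touch with no time argument sets both atime and mtime to the current time
    touch "$dest_dir/$(basename "$src")"
}

# Usage (paths are examples only):
#   add_to_folder ~/Downloads/sz_123456_a.jpg ~/Desktop/images
```

Any later edit to the file's contents moves the modification time forward again, which is the trade-off described above.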

Hi McUsrII,

I appreciate the insight; I didn’t consider all the variables associated with the time stamp.

I’ve had a look at how the files are created, and it looks as though the creation date will suffice: the files are actually created in that folder system, so that is the data I require.

Would it be possible to add the creation date and time to the previous script? Again, thanks for all your help.

do shell script "mdfind -onlyin ~//Desktop/images 'kMDItemFSName = \"*pdf\" ||kMDItemFSName = \"*jpg\" ' | sed -E 's|.+/([^/]+)$|\\1| ; s|[^0-9]+|\"|g'> $HOME/Desktop/file_names_list.csv"

Hello.

I’ll see what I can do about it.

Hello.

Hopefully this works for you.

I tested it on those files:

with this command in a terminal window.

It produced this output.

Here is the “raw” output (without the folders):

Here is the script:

do shell script "mdfind -onlyin ~//Desktop/images 'kMDItemFSName = \"*pdf\" ||kMDItemFSName = \"*jpg\" ' |xargs -0 -I {} stat --format=\"%n %w\" {} |sed -En 's/(.+[/][^0-9]+)([0-9]+)([^ ]+ )([^ ]+)( )([^.]+)([.].+)/\"\\2\",\"\\4\",\"\\6\"/p'> $HOME/Desktop/file_names_list.csv"

Hi,

Thanks for the update. Unfortunately I’ve not had any luck with that script: it doesn’t create the CSV file. The output you described sounds ideal if it could be placed into a CSV file.

Not sure if it’s to do with my file structure, which is

desktop/images/sz_123456_a.jpg

I also tried the command in terminal with the correct path but no luck either

:frowning:

Hello.

I’m so sorry, I forgot to redirect the output of the filter into the CSV file.

I edited my post above, and it should work now. :frowning:

Hi,

I’ve tried running the script and it’s creating a CSV file, but the file is blank :frowning:

Hello.

I believe I found the culprit. I’ll get back to you with a new version once I have corrected the format for the stat command that you have.

Edit

And eat dinner. :slight_smile:

Hello.

This should hopefully work for you:

do shell script "mdfind -0 -onlyin ~//Desktop/images 'kMDItemFSName = \"*pdf\" ||kMDItemFSName = \"*jpg\" ' |xargs -0 -I {} /usr/bin/stat -t \"%Y-%m-%d %T\" -f \"%N %Sc\" {} |sed -En 's/(.+[/][^0-9]+)([0-9]+)([^ ]+ )([^ ]+)( )(.+)/\"\\2\",\"\\4\",\"\\6\"/p'> $HOME/Desktop/file_names_list.csv"
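As a rough illustration of the final sed step, the sketch below runs one made-up "path date time" line, in the shape stat is asked to print, through an equivalent expression. It is an adaptation and not the exact filter above: it uses | as the delimiter and fewer capture groups, and the function name to_csv_row and the sample line are hypothetical. It also assumes that file names contain no spaces.

```shell
# Hypothetical helper: turn lines of the form
#   /some/path/prefix_NUMBER_suffix.ext YYYY-MM-DD HH:MM:SS
# into CSV rows of the form "NUMBER","date","time".
# -n together with the p flag drops any line that does not match.
to_csv_row() {
    sed -En 's|.+/[^0-9]+([0-9]+)[^ ]+ ([^ ]+) (.+)|"\1","\2","\3"|p'
}

# Sample input only; real input would come from the mdfind/stat pipeline.
printf '%s\n' '/tmp/images/sz_123456_a.jpg 2015-01-19 12:51:00' | to_csv_row
```

For the sample line this prints "123456","2015-01-19","12:51:00", which matches the three-column layout requested earlier in the thread.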

Here is the command line I used (with adjusted paths), should you wish to try it for yourself:

This is the same, but you should get the output directly to the Terminal screen.

Edit

I realized I had omitted a command-line switch for mdfind.

Hello McUsrII,

Great bit of script thank you very much hugely appreciated :smiley:

Hi McUsrII,

I’ve just noticed that the date and time information it is pulling back is incorrect :frowning: It doesn’t seem to be bringing back the specific date and time of the individual files within the folder.

For example I’ve got a jpg file in the folder that has a creation date of 19-01-2015 and time 12:51 but the information present in the CSV file is 20-01-2015 with the time 9:01 :confused:

are you able to help again?

Hello.

I didn’t notice that you had written back on this thread before now. Hopefully, you have learned to use the modification date instead of the creation date.

Because the creation date is rather hard to use, I figured. Actually, the creation date isn’t part of the Unix file system, but is something special to OS X. The trouble is that I have to dig it out of the Spotlight metadata, and when I get it from there it may be ‘encoded’: that is, if the file is from an external source, then there is no creation date, at least not until you modify it. This means that I not only have to filter the output, but I also have to read its contents, and I am not sure that I have the correct ‘not created here’ timestamp for an imported file either. It seems to be ‘4001-01-01 00:00:00 +0000’ on my system. ( I used the command

.)

It is not only cumbersome, but it will also be a bit slower.

Anyways, I’d like to hear from you before I start writing it. :slight_smile:

Hi,

No problem at all. I figured out a new way to structure the folders and modified the script slightly to view these folders, and the data it creates works a charm for my requirements.

my coding/scripting isn’t the strongest so I appreciate all the help and advice you have given,

Thanks.
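On the creation-date question raised earlier in the thread: as far as I know, the BSD stat that ships with OS X can report a file's birth (creation) time through its B field, whereas the c field used in %Sc is the inode change time, which may explain the unexpected dates. A minimal sketch, assuming either BSD or GNU stat is installed; the helper name birth_time is hypothetical, and the GNU branch is only there so the sketch also runs on Linux.

```shell
# Hypothetical helper: print the birth (creation) time of a file.
birth_time() {
    if stat --version >/dev/null 2>&1; then
        # GNU coreutils stat: %w is the birth time, or "-" if unknown
        stat -c '%w' "$1"
    else
        # BSD/OS X stat: %SB is the birth time as a string, formatted by -t
        stat -f '%SB' -t '%Y-%m-%d %H:%M:%S' "$1"
    fi
}

# Usage (path is an example only):
#   birth_time ~/Desktop/images/sz_123456_a.jpg
```

Whether the birth time reflects when a file entered the folder still depends on how the file got there, as discussed above.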