I am having a bit of a challenge with a file I receive and have to reformat. The file is a .csv, which I save as .xls. It uses international date settings that do not work well with my version of Excel (16.75.2). The dates in the column look like dd/mm/yy, but Excel recognizes only some of them as dates. Those are aligned to the right of the column. Anything with a day later than the 12th of the month is aligned to the left of the column and is not recognized by Excel as a date.
I have tried a number of ways to tell the worksheet that the numbers in the column are dates to no avail. What works manually is using text-to-columns to set the data format as “date: DMY”. When I do that, it actually changes all of the dates in the column to mm/dd/yy. Is there a way to script this action for setting the data format for every cell in the column through text-to-columns?
I have tried:
tell application "Microsoft Excel"
    tell active sheet
        text to columns range "p1:p200" destination range range "q1:q200"
        set data type to date
    end tell
end tell
I tried a few similar variations, but I don’t really know how to do this and only got error messages of various types. If I can tell Excel that all the numbers in the column are in fact dates, then to get the numbers in the correct order I can use:
tell application "Microsoft Excel"
    tell active sheet
        set range2 to range "P2:P200"
        tell range2
            set number format to "mm/dd/yy"
        end tell
    end tell
end tell
but this script currently only works on dates up to the 12th of the month (the ones Excel already recognizes as dates). I’m sorry this is so long, but I am really stumped and it is a complex issue to try to explain. I just need a way to tell the program that every cell in this column contains a date. Once I get that, I can get the numbers in the order that is most useful.
Thank you.
The dates that Excel recognizes as dates have a two-digit year (yy); the others have a four-digit year (yyyy).
Here is the concept I am working with:
tell application "Microsoft Excel"
    tell active sheet
        text to columns range "p1:p200" destination range range "q1:q200"
        set data type to date
    end tell
end tell
I know the second one in my first post works fine once the program already recognizes a date, but it does nothing to the numbers not recognized as dates.
First, here is a sample of the text to columns command for an MDY date format:
text to columns range "P2:P200" destination range "Q2" field info {1, 3}
Note that the word 'range' appears only once after 'destination'. Some Excel commands require the word twice, but this is not one of them.
By default, use only the top cell as your destination. Excel works out the rest of the destination range itself, probably from the number of cells being processed.
I added the field info parameter, which lets you choose a format, including from several date formats. The first item (1) simply specifies which column to apply this format to. In this case you’re only using one column, so it’s an easy pick. The second item (3) is the format, with ‘3’ being MDY date. A ‘4’ here (i.e. {1, 4}) will represent a DMY date format. The codes can be gleaned from the dictionary entry.
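For a column holding day-first dates, the same command with the DMY code might look like the sketch below. It reuses the range and destination from the MDY sample above; the surrounding tell blocks are only there to make it a complete script.

```applescript
tell application "Microsoft Excel"
	tell active sheet
		-- Source P2:P200 and destination Q2 are copied from the MDY sample.
		-- {1, 4}: column 1 of the parsed data, format code 4 (DMY date).
		text to columns range "P2:P200" destination range "Q2" field info {1, 4}
	end tell
end tell
```

The format code describes how the incoming text is laid out, not how the result is displayed; the display is still governed by the cell's number format.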
As far as your data is concerned, I would consider pre-processing it by checking to see if it has a 4-digit year and then stripping the first two digits (i.e. the '20') from it. I only suggest this because you mention that Excel seems to handle 2-digit years better. FWIW, my skin crawls when I see any date field that isn’t of the ‘2023-08-14’ variety. I still recall getting a receipt with 10/11/12 on it and later wondering what the heck it was for.
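If you wanted to try that pre-processing, a rough sketch in plain AppleScript could look like this. The handler name shortenYear is made up for illustration, and it assumes each value arrives as text in a slash-separated dd/mm/yyyy layout; anything else is returned unchanged.

```applescript
-- Hypothetical helper: turns "25/08/2023" into "25/08/23".
on shortenYear(dateText)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "/"
	set parts to text items of dateText
	-- Only touch values with three parts and a 4-character year.
	if (count of parts) is 3 and (length of (item 3 of parts)) is 4 then
		set item 3 of parts to text 3 thru 4 of (item 3 of parts)
	end if
	set shortened to parts as text -- rejoined with "/"
	set AppleScript's text item delimiters to oldDelims
	return shortened
end shortenYear

shortenYear("25/08/2023")
--> "25/08/23"
```

This only rewrites the text; Excel still has to be told the values are DMY dates.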
Just to confirm… you open the csv in excel and it has columns out to ‘P’ and you then use text to columns to stretch it out to column ‘Q’.
Nifty trick, using text to columns actually does this work for me! Trying to use the regular formatting options doesn’t do it, but specifying the data type does. Your script worked perfectly and I am so happy! I had to use #4 option. Excel recognizes each cell in a column now as a date! The DMY option actually put the dates into mm/dd/yy. There are no more 4-digit years after running the script.
[Edit: For anyone who struggles with dates in Excel, the text-to-column feature specifies what type of data is going into a column. Because the column was already formatted to handle dates in a particular way, as soon as I told Excel that this was DMY data, it appeared as was intended by the formatting which in this case was mm/dd/yy. This is why the feature is under the data heading and not the formatting heading]
I do open a .csv in Excel. My columns actually go out to AH, but the data in Q was unnecessary for my purposes so I was using it as an empty column. I actually changed the script to put the data back into the same column, using P2 as the destination, which is what I do when I use the function by hand. I have a couple of other columns to do this for.
I am actually scripting about a dozen tasks to make this file useful. They are things that I do each time I get a sheet, but I want for other volunteers who don’t know how to use Excel to be able to run the script on this sheet without me and have their data in a useful format. The file always comes in the same format with the same columns.
I really appreciate your help and patience. I am kind of just making this up, going by what I remember from programming in BASIC, what I can learn from research online, and what I know about Excel.
I do have one more question, if you don’t mind. I had selected row 200 as a maximum (P2:P200), but I picked that arbitrarily. Although I always know the exact columns, I never know how many rows there will be. I know 200 is not enough for some files. Should I extend it out to 1000 or is there a simple way to use a variable as the last row of a particular column? This would have to be done for several different columns in my script.
Regarding the last row thing… only use the top cell of the range. Excel should figure it out after that if you add more rows to the table.
If you feel compelled to do this manually, you could try this:
You can get the last cell of the used range and, optionally, its row index. You could easily use the latter to complete a cell range. NB: I always include a 'with timeout' block when working with the used range, as I’ve found that Excel sometimes wanders off and doesn’t come back.
with timeout of 5 seconds
cell (get address of (special cells used range type cell type last cell))
--> cell "$U$40" of active sheet of application "Microsoft Excel"
first row index of cell (get address of (special cells used range type cell type last cell))
--> 40
end timeout
Hmmm. Using just the top cell has worked for other things, but does not seem to work for text-to-columns. I need to put in an end row.
How would I do that with the scripting you shared?
I am thinking something like this:
tell application "Microsoft Excel"
tell active sheet
with timeout of 5 seconds
cell (get address of (special cells used range type cell type last cell))
end timeout
set lc to last cell
text to columns range "P2:P" & lc destination range "P2" field info {1, 4}
end tell
end tell
I know it is not quite right, but I need to assign a variable designation to the value for the last row.
The sheets I receive may have fewer than 50 rows or they may have over 2000. I could pick something I know would be well beyond the used range, but I’m not sure if that would slow down the program if I am telling it to process thousands of rows that are not actively used. I am using text-to-columns in several places in a couple of different ways for this script.
I was not designating the top cell correctly. Or rather, I needed to include information that this was a range. Instead of "P2", which only acts on that cell, or "P2:P", which generates an error, I should have used "P:P" to designate the whole column.
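For anyone following along, the whole-column form might look like the sketch below. The destination "P1" is my assumption, chosen so the converted values land back in the same rows; the {1, 4} DMY code is the one that worked earlier in the thread.

```applescript
tell application "Microsoft Excel"
	tell active sheet
		-- "P:P" addresses the entire column, so no last-row number is needed.
		-- Destination "P1" is an assumption; the header cell in P1 is processed too.
		text to columns range "P:P" destination range "P1" field info {1, 4}
	end tell
end tell
```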
This method also acts on the header row, which does not matter in this case, but if needed, I also managed to define the variable for the last cell in a column by using “used range” features.
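A minimal sketch of the used-range approach, built from the expressions shared earlier in the thread, might look like this. It is not necessarily the exact script I ended up with; the variable name lastRow is just for illustration, and it assumes the last cell of the used range marks the last row of data in column P.

```applescript
tell application "Microsoft Excel"
	tell active sheet
		with timeout of 5 seconds
			-- Row number of the last cell in the sheet's used range.
			set lastRow to first row index of cell (get address of (special cells used range type cell type last cell))
		end timeout
		-- The parentheses build the address (e.g. "P2:P40") before it is passed to range.
		text to columns range ("P2:P" & lastRow) destination range "P2" field info {1, 4}
	end tell
end tell
```

Starting at P2 skips the header row, and the same lastRow value can be reused for the other columns that need converting.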