Kitz Forum

Chat => Tech Chat => Topic started by: renluop on October 09, 2018, 11:06:29 AM

Title: Dates change themselves to weird number when exported to CSV
Post by: renluop on October 09, 2018, 11:06:29 AM
I was connected to a site (not Companies House) that supplies financial data of companies. The accounts' dates are such as 30 Jul 2017 and the data can be exported in CSV.
When, however, the CSV file is opened the date appears as a large number, in this case 1501372800. 30/07/2017's value in Excel is 42946, and doesn't seem to have any obvious relationship to 1501372800.

Knowing the quality of brains here, I'm wondering if any one has any idea why 30 Jul 2017 becomes such a weird number in CSV, and secondly how to convert that to a recognisable date.
Title: Re: Dates change themselves to weird number when exported to CSV
Post by: broadstairs on October 09, 2018, 11:14:33 AM
My only thought - what is the property value of the cell? I get funny values showing if not defined as a date value.

Stuart
Title: Re: Dates change themselves to weird number when exported to CSV
Post by: watcher on October 09, 2018, 11:22:33 AM
The 10 digit number would suggest that this is a Unix date format based on the number of seconds that have elapsed since January 1 1970 (yes, really!).

The solution is to import the CSV file into Excel, add an additional column and apply the following formula to each of these numbers (this is with 1501372800 in A1) :

=FLOOR(A1/60/60/24,1) + DATE(1970,1,1)

and the result in Excel is 30/07/2017
Title: Re: Dates change themselves to weird number when exported to CSV
Post by: renluop on October 09, 2018, 03:57:55 PM
Wow! Thanks, watcher! Why on earth did the site use something that would produce such an oddity in Excel has me confused.
Title: Re: Dates change themselves to weird number when exported to CSV
Post by: Westie on October 09, 2018, 05:21:26 PM
Someone else asked that at https://stackoverflow.com/questions/2533563/why-are-dates-calculated-from-january-1st-1970 (https://stackoverflow.com/questions/2533563/why-are-dates-calculated-from-january-1st-1970)

This was the answer:

Quote
Unix time, or POSIX time, is a system for describing points in time, defined as the number of seconds elapsed since midnight proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting leap seconds.
Title: Re: Dates change themselves to weird number when exported to CSV
Post by: watcher on October 09, 2018, 09:15:45 PM
The advantage of storing the date in the 10 digit Unix format is that it provides a consistent base from which a date formatted to suit a particular locale can be calculated. Storing dates in a UK format could lead to issues. For example today's date, 9th October, is 09/10/2018 in the UK; in the USA this would read as September 10 2018. The Unix number also allows dates to be sorted, and calculations made on dates without the complication of dealing with date formats.
Title: Re: Dates change themselves to weird number when exported to CSV
Post by: jelv on October 09, 2018, 09:36:39 PM
How many people downloading this data will be using Excel and how many will be using Unix?

Excel has a perfectly good way of storing dates as numbers of days since January 1, 1900 (with option of time as fraction of a day). This meets the requirement of allowing dates to be formatted to a particular locale.

Unix starting at January 1, 1970 is bonkers. How are dates of birth before that date stored?
Title: Re: Dates change themselves to weird number when exported to CSV
Post by: DaveC on October 09, 2018, 09:47:13 PM
This is clearly just a bug in the data extract routine on the website - the Unix timestamp is just an internal format, and shouldn't be output that way in general-purpose CSV files.  Similarly, I wouldn't expect CSV files to contain the internal Excel representation.

Most compatible would be the YYYY-MM-DD format.

So if I was the original poster, I would contact the website and let them know about it.  I very much doubt it was a deliberate choice.
Title: Re: Dates change themselves to weird number when exported to CSV
Post by: DaveC on October 09, 2018, 09:54:29 PM
Unix starting at January 1, 1970 is bonkers. How are dates of birth before that date stored?

Negative numbers  ;D  IIUC, in modern Unixes, time is a 64-bit signed integer, so can store dates/times waaaay into the past and future.
Title: Re: Dates change themselves to weird number when exported to CSV
Post by: jelv on October 10, 2018, 12:15:44 AM
This is clearly just a bug in the data extract routine on the website - the Unix timestamp is just an internal format, and shouldn't be output that way in general-purpose CSV files.  Similarly, I wouldn't expect CSV files to contain the internal Excel representation.

Most compatible would be the YYYY-MM-DD format.

Totally agree. Given that the extract relates to financial data of companies, a standard used mainly by computer geeks is probably the absolute worst option followed by a proprietary format (Excel); ISO 8601 format would be totally unambiguous.

I really do hope it was an oversight and not a deliberate choice!
Title: Re: Dates change themselves to weird number when exported to CSV
Post by: displaced on November 16, 2018, 07:49:45 AM
There’s been a surge in visibility of epoch-time outside the Beardy Unix Admin world recently.

That’s down to the ubiquity of the JSON data format commonly used by APIs.

JSON didn’t specify a date format, and since so much web/API dev happens on unix-y systems, the epoch time got used, and is now a de facto standard for JSON.