Time and Date

ISO8601

Matches timestamp in the form of yyyy-MM-ddTHH:mm:ssZ

output type:TIMESTAMP
quantifier:none
configuration:none

Example: parsing date time string raw_text with pattern expression ISO8601:parsed_timestamp:

raw_text parsed_timestamp
2019-01-01T13:23:45Z 2019-01-01 13:23:45.000 +0000

HTTPDATE

Matches timestamp in the form of dd/MMM/yyyy:HH:mm:ss Z

output type:TIMESTAMP
quantifier:none
configuration:none

Example: parsing date time string raw_text with pattern expression HTTPDATE:parsed_timestamp:

raw_text parsed_timestamp
26/Dec/2018:02:59:40 +0100 2018-12-26 01:59:40.000 +0000

JSONTIMESTAMP

Matches timestamp in the form of yyyy-MM-ddTHH:mm:ss.SSSZ

output type:TIMESTAMP
quantifier:none
configuration:none

Example: parsing date time string raw_text with pattern expression JSONTIMESTAMP:parsed_timestamp:

raw_text parsed_timestamp
2019-01-01T01:01:01.123PST 2019-01-01 09:01:01.123 +0000

TIMESTAMP, TIME

Allows parsing time and date fields in any format with millisecond precision.

output type:

TIMESTAMP

quantifier:

none

configuration:

format =”” - string specifying the conversion pattern enclosed in single or double-quotes. Default value: yyyy-MM-dd HH:mm:ss

timezone =”” - string specifying timezone name (as defined in IANA Time Zone Database) enclosed in single quotes. Default value: as specified in the user properties

locale =”” - string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here ). Allows parsing locale-specific month and day names. The default locale is English.

charset =”” - characterset name enclosed in single or double quotes (for example charset="ISO-8859-1"). The default charset is UTF8.

Example: Parsing following date-time with day abbreviations in German:

Do, 24 Mai 2018 14:30:34 CET
Fr, 25 Mai 2018 09:01:00 CET

we can use pattern specifying German locale: TIMESTAMP('EEE, d MMM yyyy HH:mm:ss Z', locale='de'):parsed_timestamp EOL:

parsed_timestamp
2018-05-24 13:30:34.000 +0000
2018-05-25 08:01:00.000 +0000

TIMESTAMP_NANO

Allows parsing time and date fields in any format with nanosecond precision.

output type:

TIMESTAMP_NANO

quantifier:

none

configuration:

format =”” - string specifying the conversion pattern enclosed in single or double-quotes. Default value: yyyy-MM-dd HH:mm:ss

timezone =”” - string specifying timezone name (as defined in IANA Time Zone Database) enclosed in single quotes. Default value: as specified in the user properties

locale =”” - string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here ). Allows parsing locale-specific month and day names. The default locale is English.

charset =”” - characterset name enclosed in single or double quotes (for example charset="ISO-8859-1"). The default charset is UTF8.

Example: parsing following timestamps with nanoseconds:

06/17/2019 4:25:17.2349573 CET
06/17/2019 22:8:2.84037271 CET

using pattern TIMESTAMP_NANO('MM/dd/yyyy H:m:s.f Z'):parsed_timestamp results:

parsed_timestamp
2019-06-17 03:25:17.234957300 +0000
2019-06-17 21:08:02.840372710 +0000

Unix Timestamp (Epoch)

epoch pattern result
1576590440 TIMESTAMP('s'):result 2019-12-17 13:47:20.000 +0000
1576590440679 TIMESTAMP('S'):result 2019-12-17 13:47:20.679 +0000
1576590440.679 TIMESTAMP('s.S'):result 2019-12-17 13:47:20.679 +0000
1576590440.678599 TIMESTAMP('s.SSSSSS'):result 2019-12-17 13:47:20.678 +0000
1576590440.678599 TIMESTAMP_NANO('s.SSSSSS'):result 2019-12-17 13:47:20.678000000 +0000
1576590440.678599208 TIMESTAMP_NANO('s.f'):result 2019-12-17 13:47:20.678599208 +0000

Conversion Patterns

Parsing date and time mean correctly assigning value to a timestamp - information describing a point in time. SpectX keeps timestamps similarly to Unix time (or epoch time) values - defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970.

Note

NB! Time value is always associated with geographical location, expressed usually as timezone. Hence at parsing the conversion from original time zone to UTC must happen (or otherwise the resulting time will have incorrect value when converted to UTC).

When timezone is present in the time field then TIMESTAMP, TIMESTAMP_NANO can use it in conversion. In case it is not present you can specify timezone manually.

Letter Date or Time component Presentation Example
G Era marker   case insensitive AD or BC
y Year Year 2012; 96; 0015
Y Week year Year  
M Month in year Month July; Jul; 07, 7
w Week in year Numeric 27
W Week in month Numeric 2
D Day in year Numeric 189
d Day in month Numeric 10
F Day of week in month Numeric 3
E Day name in week Text Tue; Tuesday
u Unnecessary numeric metadata Unnecessary  
a am/pm marker   case insensitive am or pm
H hour in day of zero based 24-hour clock (0 - 23) Numeric 0
k hour in day of one based 24-hour clock (1 - 24) Numeric 24
K hour in day of zero based 12-hour clock (0 - 11) Numeric 3
h hour in day of one based 12-hour clock (1 - 12) Numeric 1
m Minute in hour Numeric 30
s Second in minute Numeric 51
S Milliseconds Milliseconds 2019-01-01 00:00:00.957
f Fractional second Fractional_second 2019-01-01 00:00:00.250338976
z,Z Time zone Timezone GMT+02:00; EET

Time parsing is backed by the Java Calendar class. Depending on user Locale settings the Calendar may be Gregorian or locale-specific. Time and Date pattern behavior may be specific to the Calendar instance.

Pattern letters are usually repeated, as their number determines the exact presentation:

Text:
if the number of pattern letters is 4 or more, the full name of a field is expected by the parser. Otherwise, the abbreviated name is expected. For instance pattern “EE” expects the abbreviated name of the day in a week, such as “Tue”.
Numeric:

digits 0 - 9, leading zeroes and spaces are allowed. Depending on the number of letters in pattern specification, the behavior of parser is as follows:

  • 1 letter pattern is treated as variable length parser accepting any number of digits.
  • 2 - 4 letter patterns are treated as fixed-length parsers accepting only the respective number of digits.
  • 5 or more letter patterns are treated as variable-length patterns accepting any number of digits.
Year:

numeric data is allowed only. If the calendar is Gregorian then:

  • y - matches variable-length years, relative to 20’th century. When the year value is less than 32 then the date is adjusted to 21’st century, otherwise to 20’th century.
  • yy - matches two-digit years, relative to 20’th century. When the year value is less than 32 then the date is adjusted to 21’st century, otherwise to 20’th century.
  • yyy - matches variable-length years. The year is interpreted literally regardless of the number of digits. Therefore using the pattern MM-dd-yyy, a date “01-11-12” parses to Jan 11’th, 12 AD.
  • yyyy - matches four-digit years. The year is interpreted literally.

If the calendar is not Gregorian and the number of pattern letters is 4 or more, a calendar specific long form is used. Otherwise, calendar specific short form is used.

Note

Patterns with 2 and 4 parsing letters (yy and yyyy respectively) are treated as fixed-length parsers. Hence pattern yy will parse successfully only 2 digit long years and fail for any other length.

Patterns with any other length are treated as variable length, which accepts any length of years. For instance pattern y parses successfully both “2” and “1256”. Hence variable-length time units placed consecutively without non-numeric separators in-between, are impossible to parse correctly.

Month:

If the number of pattern letters is 3 or more, the month is interpreted as text, otherwise as numeric:

  • 1 letter pattern is treated as variable length parser, which accepts both one and two-digit months
  • 2 letter pattern is treated as the fixed-length parser, which accepts only two-digit months
  • 3 letter pattern expects abbreviated month names. For instance pattern MMM-dd-yyyy parses “Jan-11-2012” to Jan 11’th, 2012.
  • 4 or more letter pattern expects full month names. For instance pattern MMMM-dd-yyyy parses “January-11-2012” to Jan 11’th, 2012.
Unnecessary:
intended for skipping numeric parts of time and date, which do not contribute to timestamp computation. For example the number of the day in a week. These parts of the timestamp will be parsed as follows, but are ignored in the computation of timestamp value.
Milliseconds:

The number of milliseconds. Accepts numeric values up to 9 digits. The values exceeding 999 are divided by 10, 100, 1000 or 1000000 respectively to the number of digits. The remainder of the division is used as a fractional part representing milliseconds, and the quotient is added to the main timestamp.

The single letter ‘S’ matches variable-length value up to 9 digits. The pattern with up to 9 letters of ‘S’ matches values up to the respective number of digits.

Example: parsing time and date using pattern TIMESTAMP('yyyy-MM-dd HH:mm:ss.S', tz='UTC'):

raw_text parsed_timestamp
2019-01-01 00:00:00.999 2019-01-01 00:00:00.999 +0000
2019-01-01 00:00:00.1000 2019-01-01 00:00:01.000 +0000
2019-01-01 00:00:00.60000 2019-01-01 00:01:00.000 +0000
2019-01-01 00:00:00.3600000 2019-01-01 01:00:00.000 +0000
2019-01-01 00:00:00.86400000 2019-01-02 00:00:00.000 +0000
Fractional_second:

The fraction of a second. Single ‘f’ letter matches numeric values up to 9 digits.

When used with TIMESTAMP, TIME then only up to 3 most significant digits from the value are used.

When used with TIMESTAMP_NANO then all digits are used (effectively representing the number of nanoseconds).

Example: parsing following date-time string raw_text with following patterns:

TIMESTAMP('yyyy-MM-dd HH:mm:ss.f', tz='UTC'):parsed_timestamp and TIMESTAMP_NANO('yyyy-MM-dd HH:mm:ss.f', tz='UTC'):parsed_timestamp_nano

raw_text parsed_timestamp parsed_timestamp_nano
2019-01-01 00:00:00.999 2019-01-01 00:00:00.999 +0000 2019-01-01 00:00:00.999000000 +0000
2019-01-01 00:00:00.1222 2019-01-01 00:00:00.122 +0000 2019-01-01 00:00:00.122200000 +0000
2019-01-01 00:00:00.3335 2019-01-01 00:00:00.333 +0000 2019-01-01 00:00:00.333500000 +0000
2019-01-01 00:00:00.44456789 2019-01-01 00:00:00.444 +0000 2019-01-01 00:00:00.444567890 +0000
Timezone:

parses time zone expressed as timezone full name or abbreviation in English (see https://www.timeanddate.com/time/zones/)

Example. Parsing following date string to UTC timezone:

2019-01-05 13:14:25

we need to use the pattern:

1
TIMESTAMP('yyyy-MM-dd HH:mm:ss', timezone='UTC'):datetime

Results in parsing line 1 into datetime field as follows:

datetime _unmatched
2019-01-05 13:14:25 +0000 NULL

Examples

The result of parsing is displayed here in the UTC timezone.

# time_str pattern result
1 2019 1 23 1:35:47 TIMESTAMP('yyyy M d H:m:s', tz='PST'):result 2019-01-23 09:35:47.000 +0000
2 2019-01-23 01:35:47 TIMESTAMP('yyyy-MM-dd HH:mm:ss'):result 2019-01-23 01:35:47.000 +0000
3 2019 1 23 1:35:47 +0200 TIMESTAMP('yyyy M d H:m:s Z'):result 2019-01-22 23:35:47.000 +0000
4 Wed, Jan 1 2019 1:35:47.236 CET TIMESTAMP('EEE, MMM d yyyy H:m:s.SSS Z'):result 2019-01-01 01:35:47.236 +0000
5 January 16th 2020, 23:56:10.933 TIMESTAMP("MMMM d'th' yyyy, HH:mm:ss.S"):result 2020-01-16 23:56:10.933 +0000
6 1/23/19 1:13:47 PM EST TIMESTAMP('M/d/yy H:m:s a Z'):result 2019-01-23 17:13:47.000 +0000
7 1/23/19 1:13:47 PM EST TIMESTAMP('M/d/yyy H:m:s a Z'):result 0019-01-21 17:13:47.000 +0000
8 1576590440 TIMESTAMP('s'):result 2019-12-17 13:47:20.000 +0000
9 1576590440679 TIMESTAMP('S'):result 2019-12-17 13:47:20.679 +0000
10 1576590440.679 TIMESTAMP('s.S'):result 2019-12-17 13:47:20.679 +0000
11 1576590440.678599 TIMESTAMP('s.SSSSSS'):result 2019-12-17 13:47:20.678 +0000
12 1576590440.678599 TIMESTAMP_NANO('s.SSSSSS'):result 2019-12-17 13:47:20.678000000 +0000
13 1576590440.678599208 TIMESTAMP_NANO('s.f'):result 2019-12-17 13:47:20.678599208 +0000
14 06/17/2019 4:25:17.2349573 CET TIMESTAMP_NANO('MM/dd/yyyy H:m:s.f Z'):result 2019-06-17 03:25:17.234957300 +0000