date_round and date_ceiling ignores timezone and gives incorrect result at daylight savings boundary #358

Open
jwilliman opened this issue Jul 31, 2023 · 2 comments

Comments

@jwilliman

Hello

Thank you for a great package, you've saved me some major headaches in dealing with daylight savings time.

I've come across a case where date_round and date_ceiling either throw an error or jump an hour when next to a daylight saving boundary, even when the time zone is correctly specified in the POSIXct object. Presumably this is because the date-times are coerced to naive time, which drops the time zone? Is there a way around this?

library(clock)
#> Warning: package 'clock' was built under R version 4.2.3

# Clocks roll back an hour when daylight saving ends (NZ is in the Southern Hemisphere)
x <- zoned_time_parse_abbrev(
  c("2022-04-03 01:59:59 NZDT", "2022-04-03 02:59:59 NZDT", "2022-04-03 02:59:59 NZST")
  , zone = "Pacific/Auckland") |>
  as.POSIXct() 

# This is the result I'd expect
xts::align.time(x, 60) - x
#> Time differences in secs
#> [1] 1 1 1

# But these are off by an hour
## Same results using date_ceiling
date_round(x, "minute")
#> Error in `as_zoned_time()`:
#> ! Ambiguous time due to daylight saving time at location 1.
#> ℹ Resolve ambiguous time issues by specifying the `ambiguous` argument.
#> Backtrace:
#>      ▆
#>   1. ├─clock::date_round(x, "minute")
#>   2. ├─clock:::date_round.POSIXt(x, "minute")
#>   3. │ └─clock:::date_time_rounder(...)
#>   4. │   ├─base::as.POSIXct(x, zone, nonexistent = nonexistent, ambiguous = ambiguous)
#>   5. │   └─clock:::as.POSIXct.clock_naive_time(...)
#>   6. │     ├─clock::as_zoned_time(...)
#>   7. │     └─clock:::as_zoned_time.clock_naive_time(x, zone = tz, nonexistent = nonexistent, ambiguous = ambiguous)
#>   8. │       └─clock:::as_zoned_sys_time_from_naive_time_with_reference_cpp(...)
#>   9. └─clock (local) `<fn>`(1L, `<env>`)
#>  10.   └─clock:::stop_clock(message, call = call, class = "clock_error_ambiguous_time")
#>  11.     └─rlang::abort(message, ..., call = call, class = c(class, "clock_error"))
date_round(x, "minute", ambiguous = "earliest") - x
#> Time differences in secs
#> [1]    1 3601    1
date_round(x, "minute", ambiguous = "latest") - x
#> Time differences in secs
#> [1] 3601 3601    1
date_round(x, "minute", ambiguous = "NA") - x
#> Time differences in secs
#> [1]   NA 3601    1



# Also struggles where daylight savings starts
y <- zoned_time_parse_abbrev(
  c("2022-09-25 01:59:59 NZST")
  , zone = "Pacific/Auckland") |>
  as.POSIXct() 

xts::align.time(y, 60) - y
#> Time difference of 1 secs
date_round(y, "minute")
#> Error in `as_zoned_time()`:
#> ! Nonexistent time due to daylight saving time at location 1.
#> ℹ Resolve nonexistent time issues by specifying the `nonexistent` argument.
#> Backtrace:
#>      ▆
#>   1. ├─clock::date_round(y, "minute")
#>   2. ├─clock:::date_round.POSIXt(y, "minute")
#>   3. │ └─clock:::date_time_rounder(...)
#>   4. │   ├─base::as.POSIXct(x, zone, nonexistent = nonexistent, ambiguous = ambiguous)
#>   5. │   └─clock:::as.POSIXct.clock_naive_time(...)
#>   6. │     ├─clock::as_zoned_time(...)
#>   7. │     └─clock:::as_zoned_time.clock_naive_time(x, zone = tz, nonexistent = nonexistent, ambiguous = ambiguous)
#>   8. │       └─clock:::as_zoned_sys_time_from_naive_time_with_reference_cpp(...)
#>   9. └─clock (local) `<fn>`(1L, `<env>`)
#>  10.   └─clock:::stop_clock(message, call = call, class = "clock_error_nonexistent_time")
#>  11.     └─rlang::abort(message, ..., call = call, class = c(class, "clock_error"))

# But can overcome using the nonexistent option
date_round(y, "minute", nonexistent = "roll-forward") - y 
#> Time difference of 1 secs

sessionInfo()
#> R version 4.2.2 (2022-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_New Zealand.utf8  LC_CTYPE=English_New Zealand.utf8   
#> [3] LC_MONETARY=English_New Zealand.utf8 LC_NUMERIC=C                        
#> [5] LC_TIME=English_New Zealand.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] clock_0.7.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] lattice_0.20-45 fansi_1.0.4     zoo_1.8-11      utf8_1.2.3     
#>  [5] tzdb_0.4.0      digest_0.6.31   withr_2.5.0     grid_4.2.2     
#>  [9] lifecycle_1.0.3 reprex_2.0.2    evaluate_0.20   pillar_1.8.1   
#> [13] rlang_1.1.1     cli_3.6.1       rstudioapi_0.14 fs_1.6.1       
#> [17] xts_0.13.0      vctrs_0.6.3     rmarkdown_2.20  tools_4.2.2    
#> [21] glue_1.6.2      xfun_0.37       yaml_2.3.7      fastmap_1.1.1  
#> [25] compiler_4.2.2  htmltools_0.5.4 knitr_1.42

Created on 2023-07-31 with reprex v2.0.2

@DavisVaughan
Member

I am somewhat confident this is doing the "right" thing, but it just doesn't match what you are trying to do, and I don't think there is a feature that exactly maps to what you want yet.

These are mostly notes for future me, but the reason we convert to "naive time" vs using "sys time" internally has to do with DST offsets that aren't full hours. You can get some pretty crazy results if you use sys time, making me think that definitely isn't the right way to implement this.

library(clock)

x <- as.POSIXct("2023-04-02 00:59:59", tz = "Australia/Lord_Howe")
x <- x + 3600
x <- c(x, x + 3600/2)
x
#> [1] "2023-04-02 01:59:59 +11"   "2023-04-02 01:59:59 +1030"

# Weird 30 minute offset
x |>
  as_sys_time()
#> <sys_time<second>[2]>
#> [1] "2023-04-01T14:59:59" "2023-04-01T15:29:59"

# Madness!
x |>
  as_sys_time() |>
  time_point_round("hour") |>
  as.POSIXct(tz = date_time_zone(x))
#> [1] "2023-04-02 01:30:00 +1030" "2023-04-02 01:30:00 +1030"

# This would basically happen any time you round when it's +1030
x <- as.POSIXct("2023-04-05 00:59:59", tz = "Australia/Lord_Howe")

x |>
  as_sys_time() |>
  time_point_round("hour") |>
  as.POSIXct(tz = date_time_zone(x))
#> [1] "2023-04-05 00:30:00 +1030"

# But not when it's +11
x <- as.POSIXct("2023-04-01 00:59:59", tz = "Australia/Lord_Howe")

x |>
  as_sys_time() |>
  time_point_round("hour") |>
  as.POSIXct(tz = date_time_zone(x))
#> [1] "2023-04-01 01:00:00 +11"

Created on 2023-07-31 with reprex v2.0.2
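
For readers following along, here is a rough sketch of the naive-time route described above, applied to the Auckland example from the original report. It is only an approximation of what `date_round()` does: judging by the backtrace, the real implementation also passes the original date-times along as a reference when resolving ambiguity, which this sketch does not do. The variable names (`zone`, `naive`, `naive_rounded`, `earliest`) are just for illustration.

```r
library(clock)

zone <- "Pacific/Auckland"

x <- zoned_time_parse_abbrev(
  c("2022-04-03 01:59:59 NZDT", "2022-04-03 02:59:59 NZDT", "2022-04-03 02:59:59 NZST"),
  zone = zone
) |>
  as.POSIXct()

# Step 1: drop the zone and keep only the printed clock time.
# The NZDT and NZST versions of 02:59:59 become indistinguishable here.
naive <- as_naive_time(x)

# Step 2: round the clock time.
naive_rounded <- time_point_round(naive, "minute")

# Step 3: reattach the zone. A rounded clock time that falls in the repeated
# hour has to be resolved with `ambiguous`; "earliest" picks the first of the
# two possible instants.
earliest <- naive_rounded |>
  as_zoned_time(zone = zone, ambiguous = "earliest") |>
  as.POSIXct()

# Elapsed time between the original and rounded instants
earliest - x
```

The second element rounds to a clock time of 03:00:00, which only exists once (in NZST), so the result lands an hour of elapsed time later than the NZDT input. That appears to be where the 3601 second differences in the original report come from.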

@jwilliman
Author

Thanks Davis. And I thought my data was complicated - half hour daylight savings agghh!

Yes, it just didn't make intuitive sense to me that rounding a time to the nearest minute would skip forward an hour over other valid times.

I guess that this may be another example of durations versus periods (to use lubridate language). I suppose the user needs to understand (and specify?) whether they want to round the values according to clock time or according to the amount of time elapsed (in multiples of seconds). An example of this is the way ages are rounded down to whole calendar years rather than to multiples of a fixed-length year (365.25 days of 86400 seconds each).

For my example, where I'm working with data that is collected at set time intervals of 300 seconds, it may be best to keep everything in UTC, and only move to NZ(S|D)T to determine day/night.
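
A minimal sketch of that idea, reusing the sys-time pipeline from the comment above but rounding to the minute instead of the hour. This assumes the zone's UTC offset is always a whole number of minutes, so rounding elapsed time to a minute (or a 300 second multiple) lines up with the local clock; as the Lord Howe example shows, the same trick is not safe for "hour". The name `rounded` is just for illustration.

```r
library(clock)

x <- zoned_time_parse_abbrev(
  c("2022-04-03 01:59:59 NZDT", "2022-04-03 02:59:59 NZDT", "2022-04-03 02:59:59 NZST"),
  zone = "Pacific/Auckland"
) |>
  as.POSIXct()

# Round on the UTC (sys time) timeline, where there are no DST gaps or
# repeated hours, then display the result in the original zone again.
rounded <- x |>
  as_sys_time() |>
  time_point_round("minute") |>
  as.POSIXct(tz = date_time_zone(x))

# Elapsed time between the original and rounded instants
rounded - x
```

Because no instant is ever ambiguous or nonexistent in UTC, this needs neither the `ambiguous` nor the `nonexistent` argument, and each value should move by only the one second that `xts::align.time()` reported.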
