How to handle weekends that span different months

Hi, everyone! I need some help :blush:

I'm working with a dataset that has daily data for 2021, but I need to analyze only the weekends. To avoid distorting the monthly analysis, the rule is that the Saturday and the Sunday of a weekend must be counted in the same month. E.g., 2021-07-31 (Saturday) and 2021-08-01 (Sunday) should both count as July.

My best idea for handling this was to group the dates by week and check whether the Saturday and the Sunday fall in the same month. For that I tried using lag() and lead() from dplyr, but it didn't work the way I built it…

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.0.5
#> Warning: package 'tibble' was built under R version 4.0.5
#> Warning: package 'tidyr' was built under R version 4.0.5
#> Warning: package 'dplyr' was built under R version 4.0.5
#> Warning: package 'forcats' was built under R version 4.0.4

weekends <- tibble::tribble(
  ~dt_trans, ~week,
  "2021-01-02",    1L,
  "2021-01-03",    1L,
  "2021-01-09",    2L,
  "2021-01-10",    2L,
  "2021-01-16",    3L,
  "2021-01-17",    3L,
  "2021-01-23",    4L,
  "2021-01-24",    4L,
  "2021-01-30",    5L,
  "2021-01-31",    5L,
  "2021-02-06",    6L,
  "2021-02-07",    6L,
  "2021-02-13",    7L,
  "2021-02-14",    7L,
  "2021-02-20",    8L,
  "2021-02-21",    8L,
  "2021-02-27",    9L,
  "2021-02-28",    9L,
  "2021-03-06",   10L,
  "2021-03-07",   10L,
  "2021-03-13",   11L,
  "2021-03-14",   11L,
  "2021-03-20",   12L,
  "2021-03-21",   12L,
  "2021-03-27",   13L,
  "2021-03-28",   13L,
  "2021-04-03",   14L,
  "2021-04-04",   14L,
  "2021-04-10",   15L,
  "2021-04-11",   15L,
  "2021-04-17",   16L,
  "2021-04-18",   16L,
  "2021-04-24",   17L,
  "2021-04-25",   17L,
  "2021-05-01",   18L,
  "2021-05-02",   18L,
  "2021-05-08",   19L,
  "2021-05-09",   19L,
  "2021-05-15",   20L,
  "2021-05-16",   20L,
  "2021-05-22",   21L,
  "2021-05-23",   21L,
  "2021-05-29",   22L,
  "2021-05-30",   22L,
  "2021-06-05",   23L,
  "2021-06-06",   23L,
  "2021-06-12",   24L,
  "2021-06-13",   24L,
  "2021-06-19",   25L,
  "2021-06-20",   25L,
  "2021-06-26",   26L,
  "2021-06-27",   26L,
  "2021-07-03",   27L,
  "2021-07-04",   27L,
  "2021-07-10",   28L,
  "2021-07-11",   28L,
  "2021-07-17",   29L,
  "2021-07-18",   29L,
  "2021-07-24",   30L,
  "2021-07-25",   30L,
  "2021-07-31",   31L,
  "2021-08-01",   31L,
  "2021-08-07",   32L,
  "2021-08-08",   32L,
  "2021-08-14",   33L,
  "2021-08-15",   33L,
  "2021-08-21",   34L,
  "2021-08-22",   34L,
  "2021-08-28",   35L,
  "2021-08-29",   35L,
  "2021-09-04",   36L,
  "2021-09-05",   36L,
  "2021-09-11",   37L,
  "2021-09-12",   37L,
  "2021-09-18",   38L,
  "2021-09-19",   38L,
  "2021-09-25",   39L,
  "2021-09-26",   39L,
  "2021-10-02",   40L,
  "2021-10-03",   40L,
  "2021-10-09",   41L,
  "2021-10-10",   41L,
  "2021-10-16",   42L,
  "2021-10-17",   42L,
  "2021-10-23",   43L,
  "2021-10-24",   43L
)

weekends %>%
  arrange(dt_trans) %>%
  group_by(week)
#> # A tibble: 86 x 2
#> # Groups:   week [43]
#>    dt_trans    week
#>    <chr>      <int>
#>  1 2021-01-02     1
#>  2 2021-01-03     1
#>  3 2021-01-09     2
#>  4 2021-01-10     2
#>  5 2021-01-16     3
#>  6 2021-01-17     3
#>  7 2021-01-23     4
#>  8 2021-01-24     4
#>  9 2021-01-30     5
#> 10 2021-01-31     5
#> # ... with 76 more rows

Created on 2021-10-28 by the reprex package (v2.0.0)

The idea is, after the group_by(), to do a mutate() more or less like this:
mutate(mes = if_else(month(dt_trans) != lead(month(dt_trans), month(dt_trans) - 1L, month(dt_trans)))), but I couldn't get the construction right. Any solution?

Brunna,

I think your idea is actually more complex than it needs to be. In my solution I simply forced the months of the Sundays to equal the months of the adjacent Saturdays (using group_by(), as you yourself suggested). See whether the following code meets your requirement:

library(magrittr)

# Week starts on Monday
options("lubridate.week.start" = 1L)

# Every day from 2021-01-01 through today
datas <- seq(lubridate::date("2021-01-01"), lubridate::today(), 1)

# Compute the month without distortions
df <- datas %>%
  tibble::tibble(dt_trans = .) %>%
  dplyr::mutate(
    wday = lubridate::wday(dt_trans),
    week = lubridate::week(dt_trans),
    mes1 = lubridate::month(dt_trans)
  ) %>%
  dplyr::filter(wday %in% c(6, 7)) %>%
  dplyr::group_by(week) %>%
  dplyr::mutate(mes2 = mes1[1]) %>%  # recycled, so a lone Saturday at the end does not error
  dplyr::ungroup()

# Check whether `mes2` meets the requirement
dplyr::filter(df, week == 31)
#> # A tibble: 2 × 5
#>   dt_trans    wday  week  mes1  mes2
#>   <date>     <dbl> <dbl> <dbl> <dbl>
#> 1 2021-07-31     6    31     7     7
#> 2 2021-08-01     7    31     8     7

Created on 2021-10-28 by the reprex package (v2.0.1)
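
One caveat worth keeping in mind: lubridate::week() counts seven-day blocks starting on January 1st, so whether a Saturday and the following Sunday share a week number depends on the weekday the year starts on. It works for 2021, but it may not hold for other years. As a hedged alternative, here is a minimal sketch in base R that skips the week number and shifts each Sunday back one day before taking the month. The helper name `weekend_month` is hypothetical (it is not from this thread or from any package), and it assumes the input contains only dates that parse with as.Date().

```r
# Hypothetical helper (not from the thread): the month of the weekend
# a date belongs to, using the Saturday as the reference day.
weekend_month <- function(dates) {
  dates <- as.Date(dates)
  # "%u" is the ISO weekday number: Monday = 1, ..., Sunday = 7
  is_sunday <- format(dates, "%u") == "7"
  # Move each Sunday back one day so it lands on the preceding Saturday;
  # every other date is left untouched
  saturday_ref <- dates - as.integer(is_sunday)
  # Month of the reference day, as an integer
  as.integer(format(saturday_ref, "%m"))
}

weekend_month(c("2021-07-31", "2021-08-01", "2021-08-07"))
#> [1] 7 7 8
```

In a pipeline this could be used as dplyr::mutate(mes2 = weekend_month(dt_trans)) after filtering to weekends. Note that a Sunday falling on January 1st would map to December of the previous year, so if the analysis spans several years the year would need the same one-day shift.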

Caio, it worked perfectly. Thank you so much!!
I'm known among my colleagues for overcomplicating things that have a simple solution. I need to work on that, hehe
