Hi, everyone! I need some help.
I'm working with a dataset that has daily data for 2021, but I only need to analyze the weekends. To avoid distorting the monthly analysis, the rule is that the Saturday and the Sunday of a weekend must be counted in the same month. For example, 2021-07-31 (Saturday) and 2021-08-01 (Sunday) should both be assigned to July.
My best idea for handling this was to group the dates by week and check whether the Saturday and the Sunday fall in the same month. To do that, I tried using lag() and lead() from dplyr, but it didn't work the way I wrote it...
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.0.5
#> Warning: package 'tibble' was built under R version 4.0.5
#> Warning: package 'tidyr' was built under R version 4.0.5
#> Warning: package 'dplyr' was built under R version 4.0.5
#> Warning: package 'forcats' was built under R version 4.0.4
weekends <- tibble::tribble(
~dt_trans, ~week,
"2021-01-02", 1L,
"2021-01-03", 1L,
"2021-01-09", 2L,
"2021-01-10", 2L,
"2021-01-16", 3L,
"2021-01-17", 3L,
"2021-01-23", 4L,
"2021-01-24", 4L,
"2021-01-30", 5L,
"2021-01-31", 5L,
"2021-02-06", 6L,
"2021-02-07", 6L,
"2021-02-13", 7L,
"2021-02-14", 7L,
"2021-02-20", 8L,
"2021-02-21", 8L,
"2021-02-27", 9L,
"2021-02-28", 9L,
"2021-03-06", 10L,
"2021-03-07", 10L,
"2021-03-13", 11L,
"2021-03-14", 11L,
"2021-03-20", 12L,
"2021-03-21", 12L,
"2021-03-27", 13L,
"2021-03-28", 13L,
"2021-04-03", 14L,
"2021-04-04", 14L,
"2021-04-10", 15L,
"2021-04-11", 15L,
"2021-04-17", 16L,
"2021-04-18", 16L,
"2021-04-24", 17L,
"2021-04-25", 17L,
"2021-05-01", 18L,
"2021-05-02", 18L,
"2021-05-08", 19L,
"2021-05-09", 19L,
"2021-05-15", 20L,
"2021-05-16", 20L,
"2021-05-22", 21L,
"2021-05-23", 21L,
"2021-05-29", 22L,
"2021-05-30", 22L,
"2021-06-05", 23L,
"2021-06-06", 23L,
"2021-06-12", 24L,
"2021-06-13", 24L,
"2021-06-19", 25L,
"2021-06-20", 25L,
"2021-06-26", 26L,
"2021-06-27", 26L,
"2021-07-03", 27L,
"2021-07-04", 27L,
"2021-07-10", 28L,
"2021-07-11", 28L,
"2021-07-17", 29L,
"2021-07-18", 29L,
"2021-07-24", 30L,
"2021-07-25", 30L,
"2021-07-31", 31L,
"2021-08-01", 31L,
"2021-08-07", 32L,
"2021-08-08", 32L,
"2021-08-14", 33L,
"2021-08-15", 33L,
"2021-08-21", 34L,
"2021-08-22", 34L,
"2021-08-28", 35L,
"2021-08-29", 35L,
"2021-09-04", 36L,
"2021-09-05", 36L,
"2021-09-11", 37L,
"2021-09-12", 37L,
"2021-09-18", 38L,
"2021-09-19", 38L,
"2021-09-25", 39L,
"2021-09-26", 39L,
"2021-10-02", 40L,
"2021-10-03", 40L,
"2021-10-09", 41L,
"2021-10-10", 41L,
"2021-10-16", 42L,
"2021-10-17", 42L,
"2021-10-23", 43L,
"2021-10-24", 43L
)
weekends %>%
  arrange(dt_trans) %>%
  group_by(week)
#> # A tibble: 86 x 2
#> # Groups: week [43]
#> dt_trans week
#> <chr> <int>
#> 1 2021-01-02 1
#> 2 2021-01-03 1
#> 3 2021-01-09 2
#> 4 2021-01-10 2
#> 5 2021-01-16 3
#> 6 2021-01-17 3
#> 7 2021-01-23 4
#> 8 2021-01-24 4
#> 9 2021-01-30 5
#> 10 2021-01-31 5
#> # ... with 76 more rows
Created on 2021-10-28 by the reprex package (v2.0.0)
The idea is to follow the group_by() with a mutate() more or less like this:
mutate(mes = if_else(month(dt_trans) != lead(month(dt_trans), month(dt_trans) - 1L, month(dt_trans))))
but I couldn't work out how to build it correctly. Is there a solution?
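For reference, here is a minimal sketch of one way the rule described above could be expressed. It is not necessarily the only or best approach. It assumes every `week` group contains both the Saturday and the Sunday, so the earliest date in each group is the Saturday, and it gives both days the month of that date instead of comparing rows with lag()/lead(). `weekends_demo` and `result` are hypothetical names, and the data is just a small subset of the table above. `month()` comes from lubridate, which is loaded explicitly here.

```r
library(dplyr)
library(lubridate)

# Small illustrative subset of the `weekends` table from the post
# (weekends_demo is a hypothetical name used only for this sketch)
weekends_demo <- tibble::tribble(
  ~dt_trans,    ~week,
  "2021-07-24", 30L,
  "2021-07-25", 30L,
  "2021-07-31", 31L,
  "2021-08-01", 31L,
  "2021-08-07", 32L,
  "2021-08-08", 32L
)

result <- weekends_demo %>%
  # dt_trans is stored as character, so convert it to Date first
  mutate(dt_trans = as.Date(dt_trans)) %>%
  arrange(dt_trans) %>%
  group_by(week) %>%
  # Assumption: the earliest date in each week group is the Saturday,
  # so both days of the weekend receive the Saturday's month
  mutate(mes = month(min(dt_trans))) %>%
  ungroup()

result
```

With this, 2021-07-31 and 2021-08-01 both get `mes` equal to 7. If some weekends in the real data are missing their Saturday, the assumption breaks and the Sunday's own month would be used for that week.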