-
Notifications
You must be signed in to change notification settings - Fork 37
Description
What aspect of the package is your request related to?
join is great but one thing I do very frequently is join on an inequality (range or one-sided). (I could not find this in existing issues, my apologies if it is a repeat.)
dplyr supports join_by(between(val, x1, x2)) for "perfect" conciseness; I believe it is implemented as a list of tilde-expressions (that's what running join_by(x >= a, x <= b) outside of left_join(.) suggests). data.table supports on = .(val >= x1, val <= x2) or on = c("val >= x1", "val <= x2") for more control (e.g., > instead of >=) but less-terse.
If you are not familiar, neither dplyr nor data.table consider any operation within their joining expressions, so join_by(between(x, a-1, a+2)) does not work (nor is it in my request). I suspect without verification that join_by(between(..)) may not be referencing dplyr::between directly.
Describe the solution you'd like
One of the following:
- string-expressions similar to
data.table, such ason = c("val >= x1", "val <= x2") - list of expressions, either captured with
substitute()or using rlang's tilde-expressions, such ason = list(val >= x1, val <= x2)oron = list(~ val >= x1, ~ val <= x2)
Having lists of expressions (either form) would support later expansion into fbetween() and similar. Regardless, if the first implementation of this were to be just strings of expressions, it would be simple enough on future enhancement to run something like if (is.character(on)) on <- convert_to_expressions(on) to support backward compatibility of the first implementation.
Additional context
dplyr has additional "helpers" within join_by() including between(), within() (I dislike this collision), overlaps(), and closest(). I say this not that I'm requesting all of this on the first attempt, but it would be useful to consider your goal-architecture so that your first stable release can be extended at a future time if need be.
Examples
x1 <- data.frame(x=1:10)
x2 <- data.frame(a=c(2,9), b=c(4, 20), z=T)
join(x1, x2, on=c(x="a"))
# left join: x1[x] 2/10 (20%) <1:1st> x2[a] 2/2 (100%)
# x b z
# 1 1 NA NA
# 2 2 4 TRUE
# 3 3 NA NA
# 4 4 NA NA
# 5 5 NA NA
# 6 6 NA NA
# 7 7 NA NA
# 8 8 NA NA
# 9 9 20 TRUE
# 10 10 NA NA
join(x1, x2, on=c("x >= a", "x <= b")) # SUGGESTION
# Error in ckmatch(xon, xnam, "Unknown x columns:") :
# Unknown x columns: x >= a, x <= b
left_join(x1, x2, join_by(x >= a, x <= b)) # dplyr
# x a b z
# 1 1 NA NA NA
# 2 2 2 4 TRUE
# 3 3 2 4 TRUE
# 4 4 2 4 TRUE
# 5 5 NA NA NA
# 6 6 NA NA NA
# 7 7 NA NA NA
# 8 8 NA NA NA
# 9 9 9 20 TRUE
# 10 10 9 20 TRUE