[WIP] Interval bounds field#217
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## v2-DEV #217 +/- ##
===========================================
- Coverage 84.31% 49.77% -34.55%
===========================================
Files 12 12
Lines 848 870 +22
===========================================
- Hits 715 433 -282
- Misses 133 437 +304 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
We originally used something very similar before Intervals 1.3 but decided in #102 to switch over to encoding the bounds as parametric type information to reduce memory (the main idea being that a vector of intervals with the same bounds do not need to encode this information for each instance). I think probably the right approach is to introduce the changes you added here as new interval types (we can possibly rename the original ones too) so end-users can switch between these different types based upon their particular use case to maximize performance. |
omus
left a comment
There was a problem hiding this comment.
Couple of initial thoughts
| bounds_int(l::Type{Open}, r::Type{Open}) = 0x00 | ||
| bounds_int(l::Type{Closed}, r::Type{Open}) = 0x01 | ||
| bounds_int(l::Type{Open}, r::Type{Closed}) = 0x02 | ||
| bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x03 | ||
|
|
||
| bounds_types(x::Integer) = bounds_types(Val(UInt8(x))) |
There was a problem hiding this comment.
Missed opportunities to use bit masks:
| bounds_int(l::Type{Open}, r::Type{Open}) = 0x00 | |
| bounds_int(l::Type{Closed}, r::Type{Open}) = 0x01 | |
| bounds_int(l::Type{Open}, r::Type{Closed}) = 0x02 | |
| bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x03 | |
| bounds_types(x::Integer) = bounds_types(Val(UInt8(x))) | |
| bounds_int(l::Type{Open}, r::Type{Open}) = 0x00 | 0x00 | |
| bounds_int(l::Type{Closed}, r::Type{Open}) = 0x10 | 0x00 | |
| bounds_int(l::Type{Open}, r::Type{Closed}) = 0x00 | 0x01 | |
| bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x10 | 0x01 | |
| function bounds_types(x::UInt8) | |
| L = if x & 0x20 == 0x20 | |
| Unbounded | |
| elseif x & 0x10 == 0x10 | |
| Closed | |
| else | |
| Open | |
| end | |
| R = if x & 0x02 == 0x02 | |
| Unbounded | |
| elseif x & 0x01 == 0x01 | |
| Closed | |
| else | |
| Open | |
| end | |
| return (L, R) | |
| end |
Can make this logic even cleaner with something like: https://github.com/JuliaTime/TimeZones.jl/blob/master/src/class.jl
There was a problem hiding this comment.
Actually my initial pass did this, but I concluded that the value dispatch was more declarative / readable and I didn't notice a performance difference either way... though maybe there's a specific case you're thinking of?
There was a problem hiding this comment.
Just from a reasoning standpoint it seems sensible to denote each of the 8-bits for a specific purpose. Your values from 0x00 - 0x03 follow that logic but the unbounded values do not. This makes it difficult to reason into what each bit does and use bitwise operators to determine information about bounds.
I'm fine with using Val dispatch if there isn't a performance issue.
| abstract type AbstractInterval{T, L <: Bound, R <: Bound} end | ||
| # Methods for convert between int and tuple representations for space efficiency | ||
| # (Open, Closed) is 16 bytes while the integer represenation is only 1 byte. | ||
| # TODO: Convert types to symbols when we switch to extending IntervalSets.jl |
There was a problem hiding this comment.
I've been catching up with the motivation behind these changes. Is there rational why using symbols is a better approach than using types? Should IntervalSets.jl be updated to use types?
There was a problem hiding this comment.
I don't feel like it matters either way. My preference is more focused on making things consistent between the two, so Intervals.jl can extend IntervalSets.jl. As it stands, there doesn't seem to be much advantage to having them be types. We don't leverage the type hierarchy much and we don't store any information in them, so it's just extra names/types to remember. The practical benefit of switching to symbols is that it means we may be able to have Intervals.jl extend IntervalSets.jl with only one breaking release in Intervals.jl. Going the other way around would require a breaking release to IntervalSets.jl and a lot more updates in the ecosystem.
There was a problem hiding this comment.
I think the counter argument to using symbols is that by using types you can use the standard type inspection functions to determine what the names are used by this package. Dispatch on these types isn't used commonly outside of this package but I do think it's extremely useful for the internal logic in endpoint.jl.
| bounds_types(::Val{0x03}) = (Closed, Closed) | ||
|
|
||
| # Extension for backwards support of Unbounded endpoints to avoid changing existing logic | ||
| # TODO: Drop these in favour of the approach in IntervalSets.jl |
There was a problem hiding this comment.
Seems like an approach has yet to be worked out: JuliaMath/IntervalSets.jl#123
There was a problem hiding this comment.
I guess the question is whether it should be the responsibility of either IntervalSets.jl or Intervals.jl to handle this with yet another custom type. AFAICT, bounded-ness is just a question of whether an endpoint value isfinite which we already have an API for. My preference would be for that API to be handle independently of intervals like in Infinity.jl or Infinities.jl.
There was a problem hiding this comment.
I do agree that using isfinite on endpoints does seem like a better interface. From what I recall from the Infinity.jl work the parts where this breaks down is that:
- Defining
Infinite{T}seems like a great idea for making any type into something that supports infinity but you lose the proper type hierarchy which makes this break down in multiple dispatch. Ideally we'd want to defineInfinite{T} <: supertype(T)to make this work. - With 1 being infeasible we're left with making specific types to properly define supertypes. This results in not all types having an alternative infinite type.
This is where it started to make more sense to have the intervals to support infinity themselves. For example Interval{Char,Closed,Unbounded}.
An alternative strategy could be something like:
struct Interval{T}
first::Union{T,NegInf}
last::Union{T,PosInf}
endbut I doubt that would be an improvement over the v1 setup.
| struct Interval{T} <: AbstractInterval{T} | ||
| bounds::UInt8 # bounds comes first to allow incomplete construction for unbounded intervals |
There was a problem hiding this comment.
Just curious if the intention is to have Intervals.jl use the IntervalSets.jl interface isn't this moving in the wrong direction? Or is the plan to update IntervalSets.jl in some way?
There was a problem hiding this comment.
Yes, the plan is to:
- Subtype IntervalSets.AbstractInterval
- Provide conversion and promotion rules for different interval types
- Provide documentation on when to use each interval type and why
- More clearly identify what should live in each package (ie: core API in IntervalSets.jl and collections of intervals in Intervals.jl)
In the end, most packages can just depend on a common API in IntervalSets.jl and just use duck-typing.
Yes, IntervalSets.jl may require some updates, but I'd like to narrow down what those should actually be. AFAIK, we didn't end up using a lot of the flexibility included with this package... maybe Beacon has used more of it? Can you clarify why you mean by "moving in the wrong direction"?
The argument for reverting back to storing bounds in a field rather than parameters is to differentiate the Intervals.Interval from the IntervalSets.Interval types. If you're just working with individual intervals or all intervals have the same bounds then use IntervalSets.Interval. If you're working with large collections of Intervals where you might not have consistent bounds then use the Interval.jl type. It's the same reasoning as using AnchoredInterval if you know that all intervals have the same span.
Again, the goal is to have the two main packages complement rather than compete with each other.
There was a problem hiding this comment.
Can you clarify why you mean by "moving in the wrong direction"?
What I meant by this is that IntervalSets.jl encodes the bound type information as type information which was closer to what was done before this PR than with the bit mask. Now that I understand your plan more with supporting both types this makes more sense.
Yes, IntervalSets.jl may require some updates, but I'd like to narrow down what those should actually be. AFAIK, we didn't end up using a lot of the flexibility included with this package.
I think one of those changes is to have IntervalSets.jl to move from Interval{L,R,T} to Interval{T,L,R}. If we're going to have multiple interval types the element type is going to come first for other implementations like you have here.
The argument for reverting back to storing bounds in a field rather than parameters is to differentiate the Intervals.Interval from the IntervalSets.Interval types. If you're just working with individual intervals or all intervals have the same bounds then use IntervalSets.Interval. If you're working with large collections of Intervals where you might not have consistent bounds then use the Interval.jl type. It's the same reasoning as using AnchoredInterval if you know that all intervals have the same span.
I can get behind having both types of concrete interval exist for different use cases. Where I'm less sure is how these packages should be combined (like you mentioned)
rofinn
left a comment
There was a problem hiding this comment.
Alright, I've tried to provide a bit more context behind my motivation for the changes. In general the goal is to reduce maintenance requirements and complement IntervalSets.jl rather than compete with it which means:
- I don't really care about types vs symbols.
- I'm open to renaming things if folks want.
- Unifying the APIs should ideally require the fewest breaking changes for the ecosystem
- There should be a reason to use one type vs another. It feels like
IntervalSets.IntervalandIntervals.Intervalare competing with each other.
| bounds_int(l::Type{Open}, r::Type{Open}) = 0x00 | ||
| bounds_int(l::Type{Closed}, r::Type{Open}) = 0x01 | ||
| bounds_int(l::Type{Open}, r::Type{Closed}) = 0x02 | ||
| bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x03 | ||
|
|
||
| bounds_types(x::Integer) = bounds_types(Val(UInt8(x))) |
There was a problem hiding this comment.
Actually my initial pass did this, but I concluded that the value dispatch was more declarative / readable and I didn't notice a performance difference either way... though maybe there's a specific case you're thinking of?
| abstract type AbstractInterval{T, L <: Bound, R <: Bound} end | ||
| # Methods for convert between int and tuple representations for space efficiency | ||
| # (Open, Closed) is 16 bytes while the integer represenation is only 1 byte. | ||
| # TODO: Convert types to symbols when we switch to extending IntervalSets.jl |
There was a problem hiding this comment.
I don't feel like it matters either way. My preference is more focused on making things consistent between the two, so Intervals.jl can extend IntervalSets.jl. As it stands, there doesn't seem to be much advantage to having them be types. We don't leverage the type hierarchy much and we don't store any information in them, so it's just extra names/types to remember. The practical benefit of switching to symbols is that it means we may be able to have Intervals.jl extend IntervalSets.jl with only one breaking release in Intervals.jl. Going the other way around would require a breaking release to IntervalSets.jl and a lot more updates in the ecosystem.
| bounds_types(::Val{0x03}) = (Closed, Closed) | ||
|
|
||
| # Extension for backwards support of Unbounded endpoints to avoid changing existing logic | ||
| # TODO: Drop these in favour of the approach in IntervalSets.jl |
There was a problem hiding this comment.
I guess the question is whether it should be the responsibility of either IntervalSets.jl or Intervals.jl to handle this with yet another custom type. AFAICT, bounded-ness is just a question of whether an endpoint value isfinite which we already have an API for. My preference would be for that API to be handle independently of intervals like in Infinity.jl or Infinities.jl.
| struct Interval{T} <: AbstractInterval{T} | ||
| bounds::UInt8 # bounds comes first to allow incomplete construction for unbounded intervals |
There was a problem hiding this comment.
Yes, the plan is to:
- Subtype IntervalSets.AbstractInterval
- Provide conversion and promotion rules for different interval types
- Provide documentation on when to use each interval type and why
- More clearly identify what should live in each package (ie: core API in IntervalSets.jl and collections of intervals in Intervals.jl)
In the end, most packages can just depend on a common API in IntervalSets.jl and just use duck-typing.
Yes, IntervalSets.jl may require some updates, but I'd like to narrow down what those should actually be. AFAIK, we didn't end up using a lot of the flexibility included with this package... maybe Beacon has used more of it? Can you clarify why you mean by "moving in the wrong direction"?
The argument for reverting back to storing bounds in a field rather than parameters is to differentiate the Intervals.Interval from the IntervalSets.Interval types. If you're just working with individual intervals or all intervals have the same bounds then use IntervalSets.Interval. If you're working with large collections of Intervals where you might not have consistent bounds then use the Interval.jl type. It's the same reasoning as using AnchoredInterval if you know that all intervals have the same span.
Again, the goal is to have the two main packages complement rather than compete with each other.
Closes #208 and depends on #214
Intervals.Intervalnow serves a unique purpose overIntervalSets.Intervalwhich we can now document:IntervalSets.IntervalIntervalSets.IntervalIntervals.IntervalTODO:
Intervalbounds as an integer field rather than type parameters.AnchoredIntervalsBenchmarks:
In the case where they're all the same, it's order of magnitudes faster to use the parameterized version in IntervalSets.jl
However, once the intervals have been allocated many operations have similar performance.
And once we need to deal with potentially mixed bounds the Intervals.Interval performs better... as you'd expect.
And because we don't need to box the elements operations over all elements are faster.