closes #1707
…the midpoint of x2 and x1, and might be rendered null if x1 is defined as -x2.
A data decimation transform can be used to simplify dense line charts by removing many of the points that don't add visual information to a line path.
The decimation strategy is inspired by M4 [1]: cluster the values by grouping them on the main axis (say, x = date for time series) for each given pixel, and in each cluster retain the points that give the minimum and maximum x and y values.
This implementation goes a bit further: it does not assume that the points are ordered along x, and it supports curves (such as catmull-rom) that might need more control points than these 4 inside a given cluster. So we retain not only argminX, argmaxX, argminY, and argmaxY (this is M4), but also the first and last points and, for some curves, the second and next-to-last points. The retained points are kept in the order in which they appear in the index.
This extension of M4 raises the maximum number of points per pixel from 4 to 6 for regular (monotone) curves, and to 8 for irregular (quadratic, etc.) curves. This seems like a modest price to pay for a generic transform that we can apply systematically.
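To make the selection rule concrete, here is a minimal sketch in plain JavaScript. It is not the code in this PR: `decimateIndex` and its `extra` flag are hypothetical names, X and Y are assumed to be already projected to pixel coordinates, and a cluster is assumed to be a run of consecutive index entries that fall in the same x bucket.

```javascript
// Illustrative sketch only; names and signature are hypothetical, not Plot's API.
// I: index (array of positions into X and Y), X/Y: pixel coordinates.
// extra: also keep the second and next-to-last points (for curves that need them).
function decimateIndex(I, X, Y, {pixelSize = 0.5, extra = false} = {}) {
  if (!(pixelSize > 0)) return I; // pixelSize = 0: return early, no decimation
  const out = [];
  let start = 0;
  while (start < I.length) {
    // Find the run [start, end) of consecutive points sharing one x bucket.
    const bucket = Math.floor(X[I[start]] / pixelSize);
    let end = start + 1;
    while (end < I.length && Math.floor(X[I[end]] / pixelSize) === bucket) ++end;

    // Track the positions (within I) of the extreme x and y values.
    let minX = start, maxX = start, minY = start, maxY = start;
    for (let k = start + 1; k < end; ++k) {
      const i = I[k];
      if (X[i] < X[I[minX]]) minX = k;
      if (X[i] > X[I[maxX]]) maxX = k;
      if (Y[i] < Y[I[minY]]) minY = k;
      if (Y[i] > Y[I[maxY]]) maxY = k;
    }

    // First, last, and the four extremes: at most 6 distinct positions.
    const keep = new Set([start, end - 1, minX, maxX, minY, maxY]);
    if (extra) {
      // Second and next-to-last: at most 8 distinct positions.
      keep.add(Math.min(start + 1, end - 1));
      keep.add(Math.max(end - 2, start));
    }

    // Emit the retained points in the order they appear in the index.
    for (const k of [...keep].sort((a, b) => a - b)) out.push(I[k]);
    start = end;
  }
  return out;
}

const X = [0.1, 0.2, 0.3, 0.4, 0.45, 1.2];
const Y = [5, 9, 1, 4, 6, 3];
const I = [0, 1, 2, 3, 4, 5];
console.log(decimateIndex(I, X, Y)); // [0, 1, 2, 4, 5]
```

In this example the first five points share the bucket [0, 0.5); point 3 is dropped because it is neither first, last, nor an extreme in x or y. With `extra: true` it is retained as the next-to-last point of the run.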
The areaY, lineY, and differenceY marks now transparently call decimateX. The areaX, lineX (and differenceX in the future, cf. #1920) marks now transparently call decimateY.
The only supported option is pixelSize, which gives the step of the quantization on x (in pixels), and defaults to 0.5. Setting this option to 0 makes the transform return early, effectively neutralizing it.
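As a rough illustration of what pixelSize controls, the sketch below quantizes a pixel coordinate into a bucket and computes an upper bound on the number of retained points. Both helpers (`bucketOf`, `maxRetained`) are hypothetical and not part of this PR, and the bound assumes the points are sorted along x, so that each bucket is visited only once.

```javascript
// Illustrative helpers only; not part of Plot.
// Bucket index of a pixel coordinate x for a given quantization step.
function bucketOf(x, pixelSize = 0.5) {
  return Math.floor(x / pixelSize);
}

// Upper bound on retained points for a chart `width` pixels wide,
// keeping at most `perBucket` points per bucket (6, or 8 for some curves).
function maxRetained(width, pixelSize = 0.5, perBucket = 6) {
  if (!(pixelSize > 0)) return Infinity; // pixelSize = 0 disables decimation
  return Math.ceil(width / pixelSize) * perBucket;
}

console.log(bucketOf(0.2), bucketOf(0.7), bucketOf(1.2)); // 0 1 2
console.log(maxRetained(640)); // 7680
```

With the default step of 0.5, a 640-pixel-wide chart has 1280 buckets, so a sorted series of any length is reduced to at most 7680 points under these assumptions.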
I would also recommend calling the decimate transform on the tip mark for very heavy datasets, to make it faster. However, it would not be a good idea to do this systematically, since the user might be interested in all the intermediate points that fall on the same x pixel.
todo:
closes #1707
[1] https://www.vldb.org/pvldb/vol7/p797-jugel.pdf ; see also @jheer’s notebook https://observablehq.com/@uwdata/m4-scalable-time-series-visualization for a nice walk-through and implementation of M4 with Plot.