
decimate transforms #1966

Draft
Fil wants to merge 3 commits into main from fil/decimate


@Fil commented on Jan 2, 2024

A data decimation transform can be used to simplify dense line charts by removing many of the points that don't add visual information to a line path.

The decimation strategy is inspired by M4 [1]: group the points into clusters, one per pixel along the main axis (say, x = date for a time series), and in each cluster retain the points that have the minimum and maximum x and y values.
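A minimal TypeScript sketch of the basic M4 rule, for illustration only. It assumes that X and Y already hold pixel coordinates, that each cluster is a column of width `pixelSize` on x (0.5 by default, matching the option this PR introduces), and that `m4Sketch` is a hypothetical helper, not the transform's actual code.

```typescript
// Minimal M4-style sketch (illustrative only; not Plot's implementation).
// Assumes X and Y already hold pixel coordinates and `index` lists the candidate points.
function m4Sketch(index: number[], X: number[], Y: number[], pixelSize = 0.5): number[] {
  // One record of extreme-point indices per cluster; a cluster is a pixelSize-wide column on x.
  const clusters = new Map<number, { minX: number; maxX: number; minY: number; maxY: number }>();
  for (const i of index) {
    const key = Math.floor(X[i] / pixelSize);
    const c = clusters.get(key);
    if (c === undefined) {
      clusters.set(key, { minX: i, maxX: i, minY: i, maxY: i });
      continue;
    }
    if (X[i] < X[c.minX]) c.minX = i;
    if (X[i] > X[c.maxX]) c.maxX = i;
    if (Y[i] < Y[c.minY]) c.minY = i;
    if (Y[i] > Y[c.maxY]) c.maxY = i;
  }
  // Retain the (at most four) extreme points of each cluster, deduplicated.
  const keep = new Set<number>();
  clusters.forEach((c) => {
    keep.add(c.minX).add(c.maxX).add(c.minY).add(c.maxY);
  });
  // Return the retained points in their original order.
  return index.filter((i) => keep.has(i));
}

// Usage: two clusters, x in [0, 0.5) and x in [1, 1.5).
const X = [0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2];
const Y = [5, 1, 9, 3, 4, 2, 7, 2];
console.log(m4Sketch([0, 1, 2, 3, 4, 5, 6, 7], X, Y)); // → [0, 1, 2, 4, 5, 6, 7]
```

Point 3 (x = 0.3, y = 3) is dropped because, inside its cluster, it is neither an x extreme nor a y extreme.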

This implementation goes a bit further: it does not assume that the points are ordered along x, and we want to support curves (such as catmull-rom) that might need more control points than these 4 inside a given cluster. So we retain not only argminX, argmaxX, argminY, and argmaxY (this is M4), but also the first and last points, and for some curves the second and next-to-last points. The retained points are kept in the order in which they appear in the index.
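The extended rule can be sketched as follows, under stated assumptions: X and Y are pixel coordinates, "first", "last", "second", and "next-to-last" are read in index order within each cluster, and the `irregular` flag and the `decimateExtendedSketch` name are hypothetical, not Plot's API. Clusters are built with a map rather than by scanning runs, so unsorted x values are handled.

```typescript
// Sketch of the extended M4 rule (illustrative; not Plot's source).
// Assumptions: X and Y are pixel coordinates, and `irregular` is a hypothetical flag
// for curves that need the second and next-to-last points of each cluster.
function decimateExtendedSketch(
  index: number[],
  X: number[],
  Y: number[],
  pixelSize = 0.5,
  irregular = false,
): number[] {
  // Group points by cluster without assuming they are sorted along x;
  // within a cluster, members keep the order in which they appear in `index`.
  const clusters = new Map<number, number[]>();
  for (const i of index) {
    const key = Math.floor(X[i] / pixelSize);
    const members = clusters.get(key);
    if (members === undefined) clusters.set(key, [i]);
    else members.push(i);
  }
  const keep = new Set<number>();
  clusters.forEach((members) => {
    const n = members.length;
    let minX = members[0], maxX = members[0], minY = members[0], maxY = members[0];
    for (const i of members) {
      if (X[i] < X[minX]) minX = i;
      if (X[i] > X[maxX]) maxX = i;
      if (Y[i] < Y[minY]) minY = i;
      if (Y[i] > Y[maxY]) maxY = i;
    }
    // The four M4 extremes plus the first and last points: at most 6 per cluster.
    [minX, maxX, minY, maxY, members[0], members[n - 1]].forEach((i) => keep.add(i));
    // Irregular curves also keep the second and next-to-last points: at most 8.
    if (irregular && n > 1) keep.add(members[1]).add(members[n - 2]);
  });
  // Emit the retained points in their original index order.
  return index.filter((i) => keep.has(i));
}

// Unsorted x, constant y: plain M4 would keep only points 0, 1, and 4.
const X = [0.2, 0.0, 0.1, 0.3, 0.4, 0.25];
const Y = [2, 2, 2, 2, 2, 2];
console.log(decimateExtendedSketch([0, 1, 2, 3, 4, 5], X, Y)); // → [0, 1, 4, 5]
```

Here point 5 survives only because it is the last point of its cluster, which keeps the path's exit from that pixel column intact.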

This extension of M4 raises the maximum number of points per pixel from 4 to 6 for regular (monotone) curves, and to 8 for irregular (quadratic, etc.) curves. This seems like a modest price to pay for a generic transform that we can apply systematically.
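The practical effect of these maxima is that the output size depends on the chart's width, not on the size of the data. A back-of-the-envelope sketch, assuming the maxima are counted per cluster, that clusters tile the chart's width in `pixelSize` steps, and that all points fall within that width; `maxRetainedPoints` is a hypothetical helper for this arithmetic only.

```typescript
// Rough upper bound on the number of points a decimated line can contain.
// Assumption (illustrative): clusters tile the chart's width in pixelSize steps,
// and each cluster keeps at most `perCluster` points (4 for M4, 6 or 8 with the extension).
function maxRetainedPoints(widthPx: number, pixelSize: number, perCluster: 4 | 6 | 8): number {
  const clusterCount = Math.ceil(widthPx / pixelSize);
  return clusterCount * perCluster;
}

// Plot's default width of 640px with pixelSize 0.5 gives 1,280 clusters:
console.log(maxRetainedPoints(640, 0.5, 4)); // 5120 (plain M4)
console.log(maxRetainedPoints(640, 0.5, 6)); // 7680 (monotone curves)
console.log(maxRetainedPoints(640, 0.5, 8)); // 10240 (irregular curves)
```

Under these assumptions, a series of a million points would be drawn with at most about ten thousand, whichever curve is used.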

The areaY, lineY, and differenceY marks now transparently call decimateX. Similarly, the areaX and lineX marks (and differenceX in the future, cf. #1920) now transparently call decimateY.
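One way to picture the two variants is a single M4-style core that quantizes a "main" channel and tracks extremes on both the main and the "cross" channel; decimateY is then decimateX with the roles of x and y swapped. This is a sketch under that assumption: `decimateAlong`, `sketchDecimateX`, and `sketchDecimateY` are hypothetical helpers with made-up signatures, not the transforms added by this PR.

```typescript
// Sketch: one M4-style core parameterized by the quantized ("main") channel.
// Helper names and signatures are hypothetical, not Plot's API.
function decimateAlong(index: number[], main: number[], cross: number[], pixelSize = 0.5): number[] {
  // cluster key -> [argminMain, argmaxMain, argminCross, argmaxCross]
  const clusters = new Map<number, [number, number, number, number]>();
  for (const i of index) {
    const key = Math.floor(main[i] / pixelSize);
    const e = clusters.get(key);
    if (e === undefined) {
      clusters.set(key, [i, i, i, i]);
      continue;
    }
    if (main[i] < main[e[0]]) e[0] = i;
    if (main[i] > main[e[1]]) e[1] = i;
    if (cross[i] < cross[e[2]]) e[2] = i;
    if (cross[i] > cross[e[3]]) e[3] = i;
  }
  const keep = new Set<number>();
  clusters.forEach((e) => e.forEach((i) => keep.add(i)));
  return index.filter((i) => keep.has(i)); // original index order
}

// lineY, areaY, differenceY: x is the main axis, so cluster on x.
const sketchDecimateX = (index: number[], X: number[], Y: number[]) => decimateAlong(index, X, Y);
// lineX, areaX: y is the main axis, so cluster on y.
const sketchDecimateY = (index: number[], X: number[], Y: number[]) => decimateAlong(index, Y, X);

// A vertical series: y advances slowly while x oscillates.
const X = [5, 1, 9, 3, 4];
const Y = [0, 0.1, 0.2, 0.3, 0.4];
console.log(sketchDecimateY([0, 1, 2, 3, 4], X, Y)); // → [0, 1, 2, 4]
console.log(sketchDecimateX([0, 1, 2, 3, 4], X, Y)); // → [0, 1, 2, 3, 4]
```

Quantizing on the wrong axis removes nothing here, because every x value lands in its own pixel column; that is why the choice of variant follows the mark's orientation.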

The only supported option is pixelSize, which gives the step of the quantization on x (in pixels), and defaults to 0.5. Setting this option to 0 makes the transform return early, effectively neutralizing it.
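A sketch of how such an option might be read, assuming an options object with a single `pixelSize` field. The `DecimateOptions` interface and `decimateWithOptions` function are hypothetical names for illustration. It shows the quantization step on x, the 0.5 default, and the early return when `pixelSize` is 0.

```typescript
// Sketch of the pixelSize option (names are illustrative, not Plot's source).
interface DecimateOptions {
  pixelSize?: number; // quantization step on x, in pixels (default 0.5)
}

function decimateWithOptions(
  index: number[],
  X: number[],
  Y: number[],
  { pixelSize = 0.5 }: DecimateOptions = {},
): number[] {
  // pixelSize 0 neutralizes the transform: return the index unchanged.
  if (pixelSize === 0) return index;
  // Quantize x into steps of pixelSize pixels; each step is one cluster.
  const clusters = new Map<number, [number, number, number, number]>();
  for (const i of index) {
    const key = Math.floor(X[i] / pixelSize);
    const e = clusters.get(key);
    if (e === undefined) {
      clusters.set(key, [i, i, i, i]); // [argminX, argmaxX, argminY, argmaxY]
      continue;
    }
    if (X[i] < X[e[0]]) e[0] = i;
    if (X[i] > X[e[1]]) e[1] = i;
    if (Y[i] < Y[e[2]]) e[2] = i;
    if (Y[i] > Y[e[3]]) e[3] = i;
  }
  const keep = new Set<number>();
  clusters.forEach((e) => e.forEach((i) => keep.add(i)));
  return index.filter((i) => keep.has(i));
}

const X = Array.from({ length: 10 }, (_, i) => i); // x = 0, 1, ..., 9 (pixels)
const Y = X.map((x) => x % 2); // alternating 0, 1
const indices = X.map((_, i) => i);
console.log(decimateWithOptions(indices, X, Y, { pixelSize: 0 }) === indices); // true
console.log(decimateWithOptions(indices, X, Y).length); // 10
console.log(decimateWithOptions(indices, X, Y, { pixelSize: 5 })); // → [0, 1, 4, 5, 6, 9]
```

With the default step of half a pixel, each of these integer x positions falls in its own cluster, so nothing is removed; a coarser step of 5 pixels merges them into two clusters of at most four points each.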

I would also recommend applying the decimate transform to the tip mark for very large datasets to make it faster, but it would not be a good idea to do so systematically, since the user might be interested in all the intermediate points that fall on the same x pixel.

todo:

  • documentation
  • maybe replace the automatic selection of the main channel x (vs. x2 or x1) with explicit function names such as decimateX2, etc.?

closes #1707

[1] https://www.vldb.org/pvldb/vol7/p797-jugel.pdf; see also @jheer’s notebook https://observablehq.com/@uwdata/m4-scalable-time-series-visualization for a nice walk-through and implementation of M4 with Plot.

@Fil requested a review from mbostock on January 2, 2024 15:56
Fil added 2 commits January 2, 2024 17:15
…the midpoint of x2 and x1, and might be rendered null if x1 is defined as -x2.

Successfully merging this pull request may close this issue: data decimation transform
