Gradient filter

You have gigantic unstructured x;y;z data, but you only care about the points where the gradient of z exceeds a threshold.

For certain data sets this code can reduce the number of significant data points by 99% or more.

How to build the code

$ cargo build --release

Example run

This example has 22 M data points:

$ ./target/release/gradient-filter --input SCOT1_raw.txt --output gradient-filtered.txt

time spent building KD-tree: 1.434549734s
time spent computing gradients: 1.850353732s
reduction: 99.25%

Other options

$ ./target/release/gradient-filter --help

Usage: gradient-filter [OPTIONS] --input <INPUT> --output <OUTPUT>

Options:
  -i, --input <INPUT>        Path to input file
  -o, --output <OUTPUT>      Path to output file
      --k <K>                Number of neighbors to compute gradient [default: 10]
      --eps <EPS>            Epsilon value to jitter points [default: 0.001]
  -g, --gradient <GRADIENT>  Gradient norm threshold [default: 0.09]
  -h, --help                 Print help
  -V, --version              Print version
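
For example, to use more neighbors and a stricter threshold (the file names here are placeholders):

$ ./target/release/gradient-filter --input raw.txt --output filtered.txt --k 20 --gradient 0.2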

Data formats

The code reads data in the form x;y;z and assumes no particular x-y structure or ordering.

Here is an example (although this particular excerpt happens to lie on a regular grid):

-83550; 6746650; 294.07
-83500; 6746650; 293.73
-83450; 6746650; 293.99
-83400; 6746650; 294.47
-83350; 6746650; 294.36
-83300; 6746650; 293.98
-83250; 6746650; 293.69
-83200; 6746650; 293.25
-83150; 6746650; 297.21
-83100; 6746650; 293.42
...

Output data is written in a similar format: x;y;z;gz.
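
For illustration, here is a minimal sketch of parsing one such input line in Rust (the function name and error handling are assumptions, not the actual parser):

// Sketch: parse one "x; y; z" line into a tuple of floats.
// Whitespace around the semicolons is trimmed; malformed lines yield None.
fn parse_line(line: &str) -> Option<(f64, f64, f64)> {
    let mut fields = line.split(';').map(|s| s.trim().parse::<f64>());
    match (fields.next(), fields.next(), fields.next()) {
        (Some(Ok(x)), Some(Ok(y)), Some(Ok(z))) => Some((x, y, z)),
        _ => None,
    }
}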

Algorithm

The code uses nearest-neighbor search (the number of neighbors is adjustable with --k) to compute an approximate gradient at each point using a weighted least-squares local gradient estimator. Points whose gradient norm falls below the --gradient threshold are discarded.
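
A minimal sketch of the idea, assuming the k nearest neighbors of a point have already been found via the KD-tree; the function name, the inverse-distance weighting, and the handling of degenerate neighborhoods are assumptions, not the actual implementation:

// Sketch: fit dz ~ gx*dx + gy*dy over the neighbors with weighted least squares
// and return the estimated gradient (gx, gy) at the center point.
fn estimate_gradient(center: (f64, f64, f64), neighbors: &[(f64, f64, f64)]) -> (f64, f64) {
    let (x0, y0, z0) = center;
    // accumulate the 2x2 normal equations of the weighted plane fit
    let (mut axx, mut axy, mut ayy, mut bx, mut by) = (0.0_f64, 0.0, 0.0, 0.0, 0.0);
    for &(x, y, z) in neighbors {
        let (dx, dy, dz) = (x - x0, y - y0, z - z0);
        // closer neighbors get larger weights; the small constant avoids division by zero
        let w = 1.0 / (dx * dx + dy * dy + 1e-9);
        axx += w * dx * dx;
        axy += w * dx * dy;
        ayy += w * dy * dy;
        bx += w * dx * dz;
        by += w * dy * dz;
    }
    // solve the 2x2 system by Cramer's rule
    let det = axx * ayy - axy * axy;
    if det.abs() < 1e-12 {
        return (0.0, 0.0); // degenerate neighborhood, e.g. all neighbors collinear
    }
    let gx = (ayy * bx - axy * by) / det;
    let gy = (axx * by - axy * bx) / det;
    (gx, gy)
}

The gradient norm sqrt(gx² + gy²) is then compared against the --gradient threshold to decide whether a point is written to the output.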
