Gradient filter
You have gigantic unstructured x;y;z data, but you only care about points where
the gradient of z exceeds a threshold.
For certain data sets this code can reduce the number of significant data points by 99% or more.
How to build the code
$ cargo build --release
Example run
This example input has 22 million data points:
$ ./target/release/gradient-filter --input SCOT1_raw.txt --output gradient-filtered.txt
time spent building KD-tree: 1.434549734s
time spent computing gradients: 1.850353732s
reduction: 99.25%
Other options
$ ./target/release/gradient-filter --help
Usage: gradient-filter [OPTIONS] --input <INPUT> --output <OUTPUT>
Options:
  -i, --input <INPUT>        Path to input file
  -o, --output <OUTPUT>      Path to output file
      --k <K>                Number of neighbors to compute gradient [default: 10]
      --eps <EPS>            Epsilon value to jitter points [default: 0.001]
  -g, --gradient <GRADIENT>  Gradient norm threshold [default: 0.09]
  -h, --help                 Print help
  -V, --version              Print version
Data formats
The code reads data in the form x;y;z and assumes no x-y structure or ordering.
Here is an example (although this particular sample happens to lie on an even grid):
-83550; 6746650; 294.07
-83500; 6746650; 293.73
-83450; 6746650; 293.99
-83400; 6746650; 294.47
-83350; 6746650; 294.36
-83300; 6746650; 293.98
-83250; 6746650; 293.69
-83200; 6746650; 293.25
-83150; 6746650; 297.21
-83100; 6746650; 293.42
...
Output data is written in a similar format:
x;y;z;gz.
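As a sketch of the format, one semicolon-separated line can be parsed with a few lines of Rust (the function name `parse_line` is illustrative; the actual reader in gradient-filter may differ):

```rust
// Hedged sketch: parse one "x; y; z" line into a tuple of floats.
// Returns None for malformed lines instead of panicking.
fn parse_line(line: &str) -> Option<(f64, f64, f64)> {
    let mut it = line.split(';').map(|s| s.trim().parse::<f64>());
    match (it.next(), it.next(), it.next()) {
        (Some(Ok(x)), Some(Ok(y)), Some(Ok(z))) => Some((x, y, z)),
        _ => None,
    }
}

fn main() {
    // Sample line from the example data above.
    let p = parse_line("-83550; 6746650; 294.07").unwrap();
    println!("{:?}", p);
}
```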
Algorithm
The code uses nearest-neighbor search (the number of neighbors is configurable via --k) to compute an approximate gradient at each point using a weighted least-squares local gradient estimator.
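The local estimator can be sketched as fitting a plane dz ≈ gx·dx + gy·dy through a point's neighbors and solving the 2x2 weighted normal equations. The inverse-distance weighting and the use of eps as a regularizer below are assumptions for illustration, not necessarily what gradient-filter does internally:

```rust
// Hedged sketch of a weighted least-squares gradient estimate at point p
// from its k nearest neighbors. Exact on data that lies on a plane.
fn local_gradient(p: (f64, f64, f64), neighbors: &[(f64, f64, f64)], eps: f64) -> (f64, f64) {
    // Accumulate weighted normal equations for dz ≈ gx*dx + gy*dy.
    let (mut sxx, mut sxy, mut syy, mut sxz, mut syz) = (0.0, 0.0, 0.0, 0.0, 0.0);
    for &(x, y, z) in neighbors {
        let (dx, dy, dz) = (x - p.0, y - p.1, z - p.2);
        let w = 1.0 / (dx * dx + dy * dy + eps); // inverse-distance weight (assumed)
        sxx += w * dx * dx;
        sxy += w * dx * dy;
        syy += w * dy * dy;
        sxz += w * dx * dz;
        syz += w * dy * dz;
    }
    // Solve [sxx sxy; sxy syy] * [gx; gy] = [sxz; syz] by Cramer's rule.
    let det = sxx * syy - sxy * sxy;
    ((syy * sxz - sxy * syz) / det, (sxx * syz - sxy * sxz) / det)
}

fn main() {
    // Neighbors sampled from the exact plane z = 2x + 3y around p = (0, 0, 0).
    let neighbors = [
        (1.0, 0.0, 2.0),
        (0.0, 1.0, 3.0),
        (1.0, 1.0, 5.0),
        (-1.0, 0.0, -2.0),
    ];
    let (gx, gy) = local_gradient((0.0, 0.0, 0.0), &neighbors, 0.001);
    println!("gradient = ({gx}, {gy})");
}
```

A point is then kept when its gradient norm, sqrt(gx² + gy²), exceeds the --gradient threshold.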