martes, 17 de enero de 2023

DuckDB's arg_max

In kdb+ there's a great way of showing aggregated information called `aj` for asof join. This way of joining data is a neat way to get the "latest record" in a join. 

These kinds of operations are already quite painful to do in plain sql when grouping by. Ever had to get the latest invoice of each customer? probably you had to nest a query on top of another one, or use a fancy "distinct on".

I realized though, that some analytical engines have something called `arg_max`, that can alleviate the pain when doing nested group by. And Duckdb already implements it! (discussion here)

First, let's create a table with a few values ready to group and get the max. Then, we'll want to get the id of the max of every group.   At that point, arg_max just does the right thing.

 

Btw, look at that nice Duckdb's group by all

No hay comentarios: