Wednesday, December 21, 2022

Trying things out with duckdb's dbgen

REPLs are great, but they are only a part of the "have fun" experience.

Sometimes you need data to work with in a REPL, so that you can simulate the "real world".

In Clojure, I've used specmonstah and I quite like it. It builds nested structures nicely, and once you have configured the db schema and the specs of the individual properties, you can use it as a building block from your code to generate trees of data.

For SQL, I got pretty wild with `generate_series` and `create table XXX as select`, and you can get quite far with those.
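As a rough sketch of that approach in DuckDB (the table and column names here are made up for illustration, they don't come from any real schema), something like this gets you a table of fake rows in one statement:

```sql
-- Hypothetical table and columns, invented for this example.
create table orders_sample as
select
    i                                           as order_id,
    (i % 100) + 1                               as customer_id,  -- 100 fake customers
    date '2022-01-01' + cast(i % 365 as integer) as order_date,   -- spread over one year
    cast(random() * 1000 as decimal(10, 2))     as amount        -- random amount below 1000
from generate_series(1, 10000) as t(i);
```

It works, but every new column means inventing another expression by hand, and related tables need matching keys that you have to keep consistent yourself.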

But recently I discovered the `dbgen` function in DuckDB and I'm pretty happy with it: this function (which comes bundled in any duckdb binary) can generate TPC-H sample databases, and it even takes the scale factor as a parameter!

I find myself using it more and more to test SQL I have in mind when I'm not sure whether it will work. It gets you from "I wonder what would happen if..." to "aha, I see" in less time than any of my previous approaches to scaffolding sample data for SQL tests.

The usage is pretty simple:

`call dbgen(sf=0.1)`, and off you go!
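A slightly longer session might look like the sketch below. The table and column names are the standard TPC-H ones (`lineitem`, `l_returnflag`, and so on); the queries themselves are just examples of the kind of poking around I mean, and the commented-out `install`/`load` lines are only an assumption for builds where the `tpch` extension isn't already loaded.

```sql
-- Only needed if dbgen is not found in your build (assumption):
-- install tpch;
-- load tpch;

-- Generate the TPC-H tables at scale factor 0.1
call dbgen(sf = 0.1);

-- See what we got: customer, lineitem, nation, orders,
-- part, partsupp, region and supplier
show tables;

-- A quick sanity check on the biggest table
select count(*) from lineitem;

-- And now the "I wonder what would happen if..." part
select
    l_returnflag,
    l_linestatus,
    count(*)             as n,
    sum(l_extendedprice) as total_price
from lineitem
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
```

A bigger `sf` gives you proportionally more rows, so you can start small and scale up once the query does what you expect.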
