Wednesday, March 29, 2023

State machines

I love state machines, and I use them in real-world scenarios at least once a year.

The more I use them, across different languages and situations, the more I arrive at the conclusion that "it's just an if".

In this HN post about XState, there's a minimalistic implementation in Swift which I find a very nice distillation of how I usually use state machines.

switch [current-state, event ]

  case ['foo', 'bar' ] -> ...

And that's 90% of it.
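The same idea as a minimal Python sketch, with a lookup table standing in for the switch. The states, events, and the `next_state` helper are hypothetical names made up for illustration, and leaving the state unchanged on an unhandled pair is just one possible policy:

```python
# Transition table keyed by (current_state, event).
# All states and events below are hypothetical examples.
TRANSITIONS = {
    ("active", "payment_failed"): "unpaid",
    ("unpaid", "payment_received"): "active",
    ("unpaid", "grace_period_expired"): "cancelled",
}


def next_state(current_state, event):
    """Return the next state; stay in place when the pair is not handled."""
    return TRANSITIONS.get((current_state, event), current_state)


state = "active"
for event in ["payment_failed", "payment_received",
              "payment_failed", "grace_period_expired"]:
    state = next_state(state, event)

print(state)  # cancelled
```

Ignoring unknown pairs keeps the sketch short; raising an error instead is probably the safer choice when an unexpected event signals a bug.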

I've used Ruby's statemachine, Clojure's tilakone and clj-statecharts (there's also this new maestro lib which I haven't used yet), and homemade Lua ones.

But the essence is having a clear distinction between where I am (the state), what I get (the event), and what I have to do (the action). For the "what I have to do" part, here's what I usually do:

- Updating the state of an object (usually the one bound to the state machine) is fine.

- I tend to run actions directly only for things that need to be immediate. Showing an alert in a UI has to be synchronous, I guess.

- But when I can, I prefer to have an idempotent process that runs the actions. That process notices that the state machine has left work to do, and it is the one that knows which actions should happen in situation XYZ. It runs them and logs them somewhere, so that the next polling iteration doesn't redo them. These actions are things like "cancel a subscription after it has been unpaid for x days". The state machine sets the status to "unpaid" and sets the cancel-at, and the polling process looks for "unpaid" subscriptions with cancel-at<now() that are still running, and kills them for good.
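A rough Python sketch of that polling pass, under some assumptions: subscriptions are plain dicts, the action log is an in-memory set, and the field names (`status`, `cancel_at`, `running`) and the `cancel_overdue` function are invented for the example. A real version would query a database and persist the log:

```python
from datetime import datetime, timedelta


def cancel_overdue(subscriptions, action_log, now):
    """One polling pass: cancel unpaid subscriptions whose cancel_at has passed."""
    cancelled = []
    for sub in subscriptions:
        overdue = (
            sub["status"] == "unpaid"
            and sub["cancel_at"] is not None
            and sub["cancel_at"] < now
        )
        already_done = ("cancel", sub["id"]) in action_log
        if overdue and sub["running"] and not already_done:
            sub["running"] = False                 # the real side effect would go here
            action_log.add(("cancel", sub["id"]))  # logged so the next pass skips it
            cancelled.append(sub["id"])
    return cancelled


now = datetime(2023, 3, 29, 12, 0)
subs = [
    {"id": 1, "status": "unpaid", "cancel_at": now - timedelta(days=1), "running": True},
    {"id": 2, "status": "unpaid", "cancel_at": now + timedelta(days=3), "running": True},
    {"id": 3, "status": "active", "cancel_at": None, "running": True},
]
log = set()

first = cancel_overdue(subs, log, now)
second = cancel_overdue(subs, log, now)
print(first, second)  # [1] []
```

Running the pass twice gives the same end state, which is the property I care about: the poller can crash, restart, or overlap with itself without cancelling anything twice.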

Further reading on XState, FSMs, and one/two/three-level FSMs:

- https://www.industriallogic.com/patterns/P22.pdf

- https://hillside.net/plop/plop2003/Papers/Adamczyk-State-Machine.pdf

Sunday, March 26, 2023

Finding Approximately Repeated Patterns in Time Series

I found this talk, which seems super cool, about the smart things people can do with time series.


And here are some more details on time series discords.

Sunday, March 12, 2023


From this HN thread I just discovered a file/info organization system called Johnny Decimal.

It looks very, very sane, and I think it can be easily combined with the person-action-object memory system I'm already using.

Going to try it and report back.

Thursday, February 16, 2023

SQL window functions

SQL's window functions are such a huge topic that some see them as a DSL inside SQL itself.

You can do amazing things in SQL without window functions, but when you need sophisticated rankings, or sorting within subgroups, or you feel you are struggling with `group by + having + order by`, window functions are usually the answer.

For me, Bruce Momjian's talk about window functions has been the default resource whenever I need a refresher.

But today I found a teaser for an upcoming book on window functions that looks very, very good. I'll keep an eye on this page. Also, the pic is from that site, so it deserves a link from my high-traffic blog (you're welcome).
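As a small illustration of the "ranking inside subgroups" case, here is a sketch that uses Python's built-in sqlite3 module as a stand-in engine (it needs an SQLite build with window function support, 3.25 or newer). The `invoices` table and its rows are made up for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table invoices (customer text, amount integer)")
con.executemany(
    "insert into invoices values (?, ?)",
    [("ann", 10), ("ann", 30), ("ann", 20), ("bob", 5), ("bob", 50)],
)

# Rank invoices inside each customer, then keep the two largest per customer.
top_two = con.execute("""
    select customer, amount, rnk
    from (
        select customer,
               amount,
               rank() over (partition by customer order by amount desc) as rnk
        from invoices
    ) as ranked
    where rnk <= 2
    order by customer, rnk
""").fetchall()

print(top_two)
# [('ann', 30, 1), ('ann', 20, 2), ('bob', 50, 1), ('bob', 5, 2)]
```

Getting "top N per group" out of plain `group by` takes noticeably more contortions, which is why this is usually the first window function pattern people reach for.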

Monday, February 13, 2023

DuckDB news and posts, early 2023

MotherDuck has been increasing their activity on the net lately, and they recently published a couple of interesting blog posts:

One, with the provocative cliché "Big Data is dead" as its title, got quite a bit of traction on HN. The other, a guest post, is much more practical and "fun": "solving aoc with duckdb".

If you follow what happens in DuckDB land, you'll know that DuckCon happened during FOSDEM 2023. Here are the videos.

One of the talks (not in DuckCon, but in FOSDEM) is about Python + DuckDB. It's very nice to be able to use pandas dataframes from DuckDB with the zero-copy approach.

Another talk is about DuckDB extensions. Very DWIM, and in general, the practical approach the team has is quite refreshing. Unfortunately, I don't do C++, so writing extensions is out of my league. But I started thinking about what it would take to be able to write extensions in Lua(JIT?). Probably a low-level wizard could make it happen quite fast. But performance-wise... no idea.

Also, the biggest news: DuckDB 0.7.0 is out!

Tuesday, January 17, 2023

DuckDB's arg_max

In kdb+ there's a great way of showing aggregated information called `aj`, for as-of join. It is a neat way to get the "latest record" in a join.

These kinds of operations are already quite painful to do in plain SQL when grouping. Ever had to get the latest invoice of each customer? You probably had to nest one query on top of another, or use a fancy "distinct on".

I realized, though, that some analytical engines have something called `arg_max` that can alleviate the pain of nested group bys. And DuckDB already implements it!

First, let's create a table with a few values ready to group and get the max. Then, we'll want to get the id of the max of every group. At that point, arg_max just does the right thing.
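To make the semantics concrete, here is a plain-Python sketch of what `arg_max(id, amount)` computes per group. It is an emulation, not DuckDB itself: the rows, the `invoices` name, and the `arg_max_by_group` helper are hypothetical, and the DuckDB query it mimics is only kept in a string for reference and is not executed:

```python
# Hypothetical sample rows: (id, customer, amount).
invoices = [
    (1, "ann", 10),
    (2, "ann", 30),
    (3, "ann", 20),
    (4, "bob", 50),
    (5, "bob", 5),
]

# The DuckDB query this emulates (shown for reference, not run here).
DUCKDB_QUERY = """
select customer,
       arg_max(id, amount) as top_invoice_id,
       max(amount)         as top_amount
from invoices
group by all
"""


def arg_max_by_group(rows):
    """For each customer, return (id of the row with the largest amount, that amount)."""
    best = {}
    for row_id, customer, amount in rows:
        if customer not in best or amount > best[customer][1]:
            best[customer] = (row_id, amount)
    return best


result = arg_max_by_group(invoices)
print(result)  # {'ann': (2, 30), 'bob': (4, 50)}
```

The point is that the "which row had the max?" question gets answered in the same pass as the aggregation, with no second query joined back on the max value.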


By the way, look at DuckDB's nice `group by all`.

Wednesday, December 21, 2022

Trying things out with DuckDB's dbgen

REPLs are great, but they are only a part of the "have fun" experience.

Sometimes, you need data to work with in a REPL, to be able to simulate the "real world".

In Clojure, I've used specmonstah and I quite like it. It can build nested structures nicely, and once you have configured the db schema and the specs of the individual properties, you can use it as a building block from your code to generate trees of data.

For SQL, I got pretty wild with `generate_series` and `create table XXX as select`, and you can get quite far with those.

But recently I discovered the `dbgen` function in DuckDB and I'm pretty happy with it: this function (which comes bundled in any DuckDB binary) can generate TPC-H sample databases, with the scale factor as a parameter!

I find myself using it more and more to test some SQL I have in mind when I'm not sure whether it would work. It's really good that it gets you from "I wonder what would happen if..." to "aha, I see" in less time than any of my previous approaches to scaffolding sample data for SQL tests.

The usage is pretty simple:

`call dbgen(sf=0.1)`, and off you go!