jueves, 31 de agosto de 2023

postgres jsonb_diff and array_unique

 I recently copypasted a couple of functions from stackoverflow for my prostgres projects that are worth mentioning, cause they show a pattern on how to write plpsql functions:



Clojure and aplisms

The more I do APL and K, the more I see forks and hooks everywhere else.

That's just a couple of little helpers I've added in some clojure projects. One is "apply-if", which more than from APL, I'm taking it from any procedural language, where you can do `x= if p(x) then f(x) else x`. We used to do this a lot in lua (using ands and ors). But in a combinator style.


Second one is an attempt at the fork pattern.

Look how the avg looks like. it's verbose, but it captures the idea very close to how APL does it.



martes, 22 de agosto de 2023

2 classes of system designers

Here's a devastating quote from Andy Kinslow (NATO SOFTWARE ENGINEERING CONFERENCE ,1968) 


There are two classes of system designers. The first, if given five problems will solve them one at a time.

The second will come back and announce that these aren’t the real problems, and will eventually propose a solution to the single problem which underlies the original five.

This is the ‘system type’ who is great during the initial stages of a design project. However, you had better get rid of him after the first six months if you want to get a working system.


I found it in the Software Engineering conference pdf , but I didn't know anything about that "Andy Kinslow". And looking for that person, led me to some interesting findings, like an APL paper, or finding out that Andy Kinslow pioneered work on Time-Sharing-System, following leads of Corbato and Strachey

miércoles, 16 de agosto de 2023

Copy table from another db in postgres

 This one is somehow a followup on "Duplicate row in plain sql". What I needed in this case was to copy 

Both dbs live in the same physical postgres, and the main user has rights over everything, so the process was painless, and took 5 minutes to set it up (accounting for reading the docs).


The trick is to use the postgres_fdw and create a schema "dev" inside db_test, so I can access the tables in db_dev.


Now you can easily do your  `insert into table_to_copy select * from dev.table_to_copy;`

jueves, 29 de junio de 2023

Duplicate row in plain sql

 Another no-magic-here post, but quite a useful one when working with data in dev, and needing to massage it.

Have you found the need to duplicate a row, only with slight modification? SQL doesn't have a standard way to do it, and while some rdbms accept some sort of "SELECT * EXCEPT(id) FROM xyz", you might not have it available in your db (postgres)

The clearest solution I found is here:

It's all about creating a temporary table, and use "like" and "insert into", to avoid having to write all the fields:

CREATE TEMP TABLE tmp (like web_book);
INSERT INTO tmp SELECT * FROM web_book WHERE id = 3;
UPDATE tmp SET id = nextval('web_book_id_seq');
INSERT INTO web_book SELECT * from tmp;


In fact, we could simplify it further (at least in postgres) with:

CREATE TEMP TABLE tmp as SELECT * FROM web_book WHERE id = 3;
UPDATE tmp SET id = nextval('web_book_id_seq');
INSERT INTO web_book SELECT * from tmp;

viernes, 23 de junio de 2023

"delete limit X" is not possible in postgres...

And how to fix it.

 

So I was pretty sure that I had done some sort of large-ish destructive (as in side-effect-y) operations in batches of small numbers.  And today I had to delete a bunch of rows from a bunch of tables.

It turns out that postgres doesn't support "LIMIT" in UPDATE nor DELETEs.

 First of all, the misunderstanding when I remembered doing destructive operations in batches comes from calling postgres functions via select.

Let's say you have to kill the db connections to a db:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='db_1234567890abcdef';

This, being a select, you can do things like:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname like 'db_%' limit 100;

So yes, you are batching operations, but it's because "selects" are not always side-effect-free.

So now, back to our real problem of doing updates or deletes in batches.

 It's not that it hasn't been asked or thought about. Here's one of the mail threads about it. Tom Lane's answer highlights some of the issues from the correctness perspective.


Ok, but if I really want to do it... what's the best way? Crunchydata has a blog post explaining multiple ways of doing it, but the main takeaway is to use CTEs to select the amount you want (using limit), and then delete by id "where exists" or "where id in".

WITH rows AS (
  SELECT
 something
  FROM
    big_table
  LIMIT 10
)
DELETE FROM big_table
WHERE something IN (SELECT something FROM rows);

Cool stuff. Nothing magical, but it's a good pattern to be familiar with.

miércoles, 17 de mayo de 2023

Naming, Notation, and Idioms conventions

Variable names is a big point of discussion in communities, projects, PRs...

I'm on the "terse" side of the spectrum, and I think many times variable names are hindering more than helping. I lay on the "more inline anonymous things" rather than "everything-has-to-have-a-very-specific-name-in-the-toplevel-of-a-file". Of course, names should convey their meaning, and if they fail to do so, it's a bad naming, but I want to talk about the more subtle process of choosing active/passive names, short/long,etc... that conveys how one thinks about the code while navigating it.

Here are some of the concepts that inspire me on naming variables and organizing code "on the small".

Misscomunicating via overconstraining and overcommunicating: Sometimes, we create a piece of code thought for a particular use case, and the more descriptive we name the variables, the less obvious it is that that functionality can be used in other situations. In strongly typed languages, this interchangeability of use cases and discoverability is already killed by the language "features", but in the languages I like to use, it's perfectly ok to call the same functions to objects of different semantic meanings, even of different type or structure.

Huffmanization:"huffman coding" principle, meaning that things that are mentioned more often should be shorter.

Perl: has so many good tips about naming, and how to understand the dimensions of concepts in programming.  To name something, "topicalization" is introducing a topic you wanna talk about, and then be able to refer it with another (shorter) name (usually a pronoun) for some time, until that topic is out of scope.

APL, SmallerCodeBetterCode, Suggestivity: Showing the guts of the algorithms inline has its benefits (see Notation as a Tool of Thought paper), and the apl family is a great showcase of that. Variable names and the symbols used for the operators have a great importance there too. Also, not binding names keeps the algorithms generic and reusable (at the human level, not binding an algorithm to a usecase but to the concrete thing it does to a data structure has way more suggestivity than a triple nested structure with its own accessors).

K:  The great k's style.pdf has so much unconventional knowledge I can only suggest to go and read it with fresh and curious eyes. So bold it's what got me into k.  Also, "Stages of denial encountering K"

fast.ai naming conventions. Jeremy Howard's insights on programming are again, a bit controversial, but his tips have been always enlightening. If he says X, doesn't mean X is the only truth, but it means you should look at X with fresh and curious eyes, and maybe you'll discover a treasure.  He comes from an APL and Perl background, and writes so nice terse code I can only agree with what he says over and over.  Look at fast.ai style conventions, and the fast.ai abbreviations list. "One line of code should implement one complete idea, where possible"

Picolisp. I don't know how Alexander Burger chose all the names for picolisp's core functions, but there are lots of cool ideas both in the namings, and how the language api makes the code flow in a very natural way. (For example, @, or the `do` flexibility).  Picolisp code tends to be very dense. Not apl/j/k-dense, but very dense still.

k/q: Also, lots of names to pick from, and the information density is pretty high. From k I picked the massive overloading of functions to do "the right thing" depending on the types. I've never seen such a compact core, and it's funny to think that the functions and their overloads seem quite intuitive after having learned them for some time.  I never thought I'd say that, but here we are.

Forth and Factor: A bunch of words to pick from. If you're looking for a word, chances are that there's a Forth or Factor dictionary with a related concept.  dup, dip, swap, bi2,.... If you get familiar with those, there's a bunch of names and name compositions you can start using in your code, and things will just make sense.

Clojure: Clojure is of course very nice namewise, and it has very carefully choosen words. Even the threading macros, which are not names, but allow writing code in a way that leans to a very flow-y way. I never understood why `assert` doesn't return the first value in case of not throwing, but apart from that, no complains.

Red: Red language is the one I know the least of this list (maybe along with forth/factor). The language has such impressive properties also, not found anywhere else that I know of.  In that link I pasted, there's a function reference. Also, lots of interesting names to pick from.

Unconditional Code: Michael Feathers did a very down-to-earth talk in 2018 about Unconditional Code. The concept is not the same as the "anti-if" movement from the smalltalk community. In this case, Michael proposes to solve corner cases of our code by either changing the specs a little bit (many times making the code more generic), assume sane defaults (see Sussman's NYCLisp talk when he talks about flight autopilot, or John Ousterhout's Philosophy of Software Design), and some times, look for generality in the algorithm by exploiting data properties (examples of this is to make functions always process lists of things, and allowing empty lists so the function does nothing)

Other notations