jueves, 29 de junio de 2023

Duplicate row in plain sql

 Another no-magic-here post, but quite a useful one when working with data in dev, and needing to massage it.

Have you found the need to duplicate a row, only with slight modification? SQL doesn't have a standard way to do it, and while some rdbms accept some sort of "SELECT * EXCEPT(id) FROM xyz", you might not have it available in your db (postgres)

The clearest solution I found is here:

It's all about creating a temporary table, and use "like" and "insert into", to avoid having to write all the fields:

CREATE TEMP TABLE tmp (like web_book);
INSERT INTO tmp SELECT * FROM web_book WHERE id = 3;
UPDATE tmp SET id = nextval('web_book_id_seq');
INSERT INTO web_book SELECT * from tmp;


In fact, we could simplify it further (at least in postgres) with:

CREATE TEMP TABLE tmp as SELECT * FROM web_book WHERE id = 3;
UPDATE tmp SET id = nextval('web_book_id_seq');
INSERT INTO web_book SELECT * from tmp;

viernes, 23 de junio de 2023

"delete limit X" is not possible in postgres...

And how to fix it.

 

So I was pretty sure that I had done some sort of large-ish destructive (as in side-effect-y) operations in batches of small numbers.  And today I had to delete a bunch of rows from a bunch of tables.

It turns out that postgres doesn't support "LIMIT" in UPDATE nor DELETEs.

 First of all, the misunderstanding when I remembered doing destructive operations in batches comes from calling postgres functions via select.

Let's say you have to kill the db connections to a db:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='db_1234567890abcdef';

This, being a select, you can do things like:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname like 'db_%' limit 100;

So yes, you are batching operations, but it's because "selects" are not always side-effect-free.

So now, back to our real problem of doing updates or deletes in batches.

 It's not that it hasn't been asked or thought about. Here's one of the mail threads about it. Tom Lane's answer highlights some of the issues from the correctness perspective.


Ok, but if I really want to do it... what's the best way? Crunchydata has a blog post explaining multiple ways of doing it, but the main takeaway is to use CTEs to select the amount you want (using limit), and then delete by id "where exists" or "where id in".

WITH rows AS (
  SELECT
 something
  FROM
    big_table
  LIMIT 10
)
DELETE FROM big_table
WHERE something IN (SELECT something FROM rows);

Cool stuff. Nothing magical, but it's a good pattern to be familiar with.