And how to fix it.
So I was pretty sure I had done some sort of large-ish destructive (as in side-effect-y) operations in small batches before. And today I had to delete a bunch of rows from a bunch of tables.
It turns out that Postgres doesn't support "LIMIT" in UPDATE or DELETE statements.
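Just to be explicit, something like this (big_table being any table) fails with a syntax error:

DELETE FROM big_table LIMIT 10;
-- ERROR:  syntax error at or near "LIMIT"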
First of all, the reason I remembered doing destructive operations in batches is that I had been calling Postgres functions via SELECT.
Let's say you have to kill the connections to a given db:
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='db_1234567890abcdef';
Since this is a SELECT, you can do things like:
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname LIKE 'db_%' LIMIT 100;
So yes, you are batching operations, but only because SELECTs are not always side-effect-free.
So now, back to our real problem of doing updates or deletes in batches.
It's not that it hasn't been asked or thought about before. Here's one of the mailing list threads about it; Tom Lane's answer highlights some of the issues from a correctness perspective.
Ok, but if I really want to do it... what's the best way? Crunchydata has a blog post explaining several ways of doing it, but the main takeaway is to use a CTE to select the rows you want (with LIMIT), and then delete them by id with "where exists" or "where id in".
WITH rows AS (
  SELECT something
  FROM big_table
  LIMIT 10
)
DELETE FROM big_table
WHERE something IN (SELECT something FROM rows);
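The same pattern works for updates. A sketch, assuming a hypothetical id primary key and a processed flag on big_table:

WITH rows AS (
  SELECT id
  FROM big_table
  WHERE processed = false
  LIMIT 10
)
UPDATE big_table
SET processed = true
WHERE id IN (SELECT id FROM rows);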
Cool stuff. Nothing magical, but it's a good pattern to be familiar with.
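And since one batch is rarely enough, you'd repeat it until nothing is affected. A rough sketch (assuming Postgres 11+, where procedures can COMMIT between batches; big_table and something are the same placeholder names as above):

CREATE PROCEDURE delete_big_table_in_batches()
LANGUAGE plpgsql
AS $$
DECLARE
  deleted integer;
BEGIN
  LOOP
    DELETE FROM big_table
    WHERE something IN (SELECT something FROM big_table LIMIT 1000);
    GET DIAGNOSTICS deleted = ROW_COUNT;
    EXIT WHEN deleted = 0;
    COMMIT;  -- end the batch's transaction before starting the next one
  END LOOP;
END;
$$;

CALL delete_big_table_in_batches();

Committing each batch separately keeps locks short-lived and gives autovacuum a chance to keep up, which is most of the point of batching in the first place.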