Friday, August 9, 2019

sql server - How can I do an UPDATE statement with JOIN in SQL?

You'll get the best performance if you forget the where clause and place all conditions in the ON expression.


I think this is because the query first has to join the tables then runs the where clause on that, so if you can reduce what is required to join then that's the fasted way to get the results/do the udpate.


Example


Scenario


You have a table of users. They can log in using their username or email or account_number. These accounts can be active (1) or inactive (0). This table has 50000 rows


You then have a table of users to disable at one go because you find out they've all done something bad. This table however, has one column with usernames, emails and account numbers mixed. It also has a "has_run" indicator which needs to be set to 1 (true) when it has been run


Query


UPDATE users User
INNER JOIN
blacklist_users BlacklistUser
ON
(
User.username = BlacklistUser.account_ref
OR
User.email = BlacklistedUser.account_ref
OR
User.phone_number = BlacklistUser.account_ref
AND
User.is_active = 1
AND
BlacklistUser.has_run = 0
)
SET
User.is_active = 0,
BlacklistUser.has_run = 1;

Reasoning


If we had to join on just the OR conditions it would essentially need to check each row 4 times to see if it should join, and potentially return a lot more rows. However, by giving it more conditions it can "skip" a lot of rows if they don't meet all the conditions when joining.


Bonus


It's more readable. All the conditions are in one place and the rows to update are in one place

No comments:

Post a Comment

hard drive - Leaving bad sectors in unformatted partition?

Laptop was acting really weird, and copy and seek times were really slow, so I decided to scan the hard drive surface. I have a couple hundr...