MySQL: Delete All Duplicate Rows Except the Earliest One in One SQL

You want to add a unique index to a table, and unfortunately, there are already many duplicate rows in it. Manually find and delete these rows is time-wasting and error-prone. So why won't we just write one SQL statement and quickly resolve it?

First try, I wrote the following statement, and it won't work:

DELETE FROM PromotionSkus A 
WHERE 
	A.SkuId IN (SELECT SkuId FROM PromotionSkus B GROUP BY B.SkuId HAVING COUNT(B.SkuId) > 1) 
	AND 
	A.Id NOT IN (SELECT MIN(Id) FROM PromotionSkus C GROUP BY C.SkuId HAVING COUNT(C.SkuId) > 1);

AND this one below works!

DELETE FROM PromotionSkus A
WHERE 
	A.Id NOT IN (SELECT Id FROM (SELECT MIN(Id) AS Id, COUNT(SkuId) AS Total FROM PromotionSkus GROUP BY SkuId HAVING Total > 1) AS B)
	AND 
	A.SkuId IN (SELECT SkuId FROM (SELECT SkuId FROM PromotionSkus GROUP BY SkuId HAVING COUNT(SkuId) > 1) AS C);

The reason is well explained in this brilliant article.

Another mysql tip: using mysqldump export a table with one line one row.

mysqldump --databases YourDataBaseName --tables YourTableName --skip-extended-insert

Why do we need that? It is much easier to compare !

Comments

  1. Markdown is allowed. HTML tags allowed: <strong>, <em>, <blockquote>, <code>, <pre>, <a>.