MySQL: Delete All Duplicate Rows Except the Earliest One in One SQL Statement
You want to add a unique index to a table, but unfortunately it already contains many duplicate rows. Finding and deleting these rows by hand is time-consuming and error-prone. So why not write a single SQL statement and resolve it quickly?
On my first try I wrote the following statement, which does not work:
DELETE FROM PromotionSkus A
WHERE
A.SkuId IN (SELECT SkuId FROM PromotionSkus B GROUP BY B.SkuId HAVING COUNT(B.SkuId) > 1)
AND
A.Id NOT IN (SELECT MIN(Id) FROM PromotionSkus C GROUP BY C.SkuId HAVING COUNT(C.SkuId) > 1);
And this one works:
DELETE FROM PromotionSkus A
WHERE
A.Id NOT IN (SELECT Id FROM (SELECT MIN(Id) AS Id, COUNT(SkuId) AS Total FROM PromotionSkus GROUP BY SkuId HAVING Total > 1) AS B)
AND
A.SkuId IN (SELECT SkuId FROM (SELECT SkuId FROM PromotionSkus GROUP BY SkuId HAVING COUNT(SkuId) > 1) AS C);
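MySQL rejects the first statement with error 1093 ("You can't specify target table ... for update in FROM clause") because a DELETE may not read its own target table in a subquery. Wrapping each subquery in a derived table makes MySQL materialize the result into a temporary table first, which sidesteps the restriction. The reason is well explained in this brilliant article.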
However, on 23-Mar-2025 I found a much simpler solution:
DELETE FROM PromotionSkus
WHERE Id NOT IN (
    SELECT Id FROM (
        SELECT MIN(Id) AS Id
        FROM PromotionSkus
        GROUP BY SkuId
    ) A
);
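Before running a DELETE like this on real data, it can help to preview what will be removed and to confirm the result afterwards. The statements below are a minimal sketch, not part of the original solution. They assume SkuId is declared NOT NULL (rows with a NULL SkuId would not show up in the preview join), and the index name UX_PromotionSkus_SkuId is a made-up placeholder.

```sql
-- Preview: every row that is NOT the lowest Id for its SkuId,
-- i.e. the rows the DELETE above would remove.
SELECT p.*
FROM PromotionSkus p
JOIN (
    SELECT SkuId, MIN(Id) AS KeepId
    FROM PromotionSkus
    GROUP BY SkuId
) k ON k.SkuId = p.SkuId
WHERE p.Id <> k.KeepId;

-- After the DELETE: list any SkuId that still occurs more than once.
-- An empty result means the duplicates are gone.
SELECT SkuId, COUNT(*) AS Total
FROM PromotionSkus
GROUP BY SkuId
HAVING COUNT(*) > 1;

-- Finally add the unique index (hypothetical index name).
ALTER TABLE PromotionSkus
    ADD UNIQUE INDEX UX_PromotionSkus_SkuId (SkuId);
```

If the check query still returns rows, the ALTER TABLE will fail with a duplicate-entry error, so running the check first costs little.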
Another MySQL tip: use mysqldump to export a table with one row per line.
mysqldump --databases YourDataBaseName --tables YourTableName --skip-extended-insert
Why do we need that? With one INSERT statement per row, two dumps are much easier to compare line by line.