商品网站建设实验报告,子域名查询工具,网站规划建设案例,广州黄埔网站制作起因是公司的crm录入不规范#xff0c;有重复数据。
之后考虑到需要手动处理#xff0c;首先需要自动找出重复的数据 查重要求#xff1a;
存在多个不允许重复的字段#xff0c;任一字段重复#xff0c;则判断为同一个客户。划分到同一重复组中。 查重sql如下
SELECT C…起因是公司的crm录入不规范有重复数据。
之后考虑到需要手动处理首先需要自动找出重复的数据 查重要求
存在多个不允许重复的字段任一字段重复则判断为同一个客户。划分到同一重复组中。 查重sql如下
SELECT CONCAT(组, dense_rank() OVER (ORDER BY group_key)) AS group_info,id, name, tel
FROM (SELECT id, name, tel,CONCAT_WS(-, IF(name IS NULL, , name), IF(tel IS NULL, , 1)) AS group_key,COUNT(*) OVER (PARTITION BY CONCAT_WS(-, IF(name IS NULL, , name), IF(tel IS NULL, , 1))) AS group_countFROM customer
) AS subquery
WHERE group_count 1
ORDER BY group_key
通过上述sql可以获得一个查重的sql。 那么接下来的问题是如果重复数据大我们是需要一个分页的。首先我们需要知道重复了多少组。
用以下sql获取
SELECT COUNT(*) AS total_count
FROM (SELECT DISTINCT group_infoFROM (SELECT CONCAT(组, dense_rank() OVER (ORDER BY group_key)) AS group_info,id, name, tel
FROM (SELECT id, name, tel,CONCAT_WS(-, IF(name IS NULL, , name), IF(tel IS NULL, , 1)) AS group_key,COUNT(*) OVER (PARTITION BY CONCAT_WS(-, IF(name IS NULL, , name), IF(tel IS NULL, , 1))) AS group_countFROM customer
) AS subquery
WHERE group_count 1
ORDER BY group_key) AS data_counts
) AS distinct_groups; 这里的分页应当按照重复组分页传统的分页就失效了。所以我们就需要手动计算分页。
把分组字段group_info中的组字去掉按group_info进行大于小于判断
SELECT dense_rank() OVER (ORDER BY group_key) AS group_info,id, name, tel
FROM (SELECT id, name, tel,CONCAT_WS(-, IF(name IS NULL, , name), IF(tel IS NULL, , 1)) AS group_key,COUNT(*) OVER (PARTITION BY CONCAT_WS(-, IF(name IS NULL, , name), IF(tel IS NULL, , 1))) AS group_countFROM customer
) AS subquery
WHERE group_count 1
AND group_info0 AND group_info11
ORDER BY group_key