天天看点

大量数据去除重复记录

前几天老大吩咐我一个任务:去除一张表里的重复(一个字段)记录.

table为表名, filed 为 待去重复的字段, id 为主键  数据库采用的是 sql server 2005 数量量一百多万.

思索未几,得出方案一如下:

SELECT filed ,min(ID) as id into #TBA  FROM table group by filed  having count(*) > 1 order by id

DECLARE mycur CURSOR FOR SELECT filed ,id FROM #TBA 
OPEN mycur

declare @filed  varchar(50),@id int
FETCH NEXT FROM mycur Into @filed ,@id

WHILE @@FETCH_STATUS = 0
BEGIN 
delete from table where filed = @filed  and id > @id
FETCH NEXT FROM mycur Into @filed ,@id
END
           

 okay,方案得出,运行,吃饭时间到,先填饱肚子去.

大概一个多小时后,居然还在运行..无奈只得中止查询分析器.

此时老大过来要结果了.于是做苦思冥想状.

忽然灵光一现,得出方案二如下:

SELECT min(ID) as id into #TBA  FROM table group by filed   order by id

delete from table where id not in (select id from #TBA)
           

激动中..运行..速度那可不是提高了一点点..

总结:换个方向思考,或许可以得出意外结果..