Welcome to dbFreaks.com!
FAQFAQ    SearchSearch      ProfileProfile    Private MessagesPrivate Messages   Log inLog in

delete dupes with a unique id using a temp table

 
Goto page 1, 2
   Database Help (Home) -> mySQL RSS
Next:  Big Bertha Thing Positive  
Author Message
jr

External


Since: Jan 03, 2011
Posts: 8



(Msg. 1) Posted: Mon Jan 03, 2011 4:37 pm
Post subject: delete dupes with a unique id using a temp table
Archived from groups: comp>databases>mysql (more info?)

I need to delete the dupes across bu's. The records are unique based
on bu+ndc but there are 400+ duplicates..
There is a sequential id that is unique. I am trying to remove these
400+ records.

1. SELECT *
FROM ci
GROUP BY bu,ndc
HAVING COUNT(concat(bu,ndc) )>1

This query yields 400+ dupes which I need to remove.


2.
I could do this query in Oracle but not in MySQL.
You cannot delete from the same table as in the subquery.


DELETE FROM ci b
WHERE EXISTS ((
SELECT bu,ndc, count(1)
FROM ci
GROUP BY bu,ndc
HAVING COUNT(1)>1
) a
where a.bu=b.bu
and a.ndc = b.ndc



3. I need to create a temp table with the unique ids in order to
compare the ci table to the duplicates.
I create the temp table with the SELECT from above that now has the
duplicate records in it with the id.
CREATE TABLE ci_tmp AS
(SELECT c.id,c.bu,c.zonenm,c.ndc
FROM ci c
GROUP BY c.bu,c.ndc
HAVING COUNT(concat(c.bu,c.ndc) )>1)

4. Next I have to run my delete query matching against the ids in the
temp table.
AT this point I am not sure exactly how to do this or what I need to
delete.
This is what I have but it isn't right because it doesn't delete any
records.


DELETE FROM cart_inventory
WHERE id IN (select id
FROM cart_inventory_tmp
GROUP BY bu,nationaldrugcode
HAVING COUNT(concat(bu,nationaldrugcode) )>1)


I also tried deleting using max(id) but I don't know if it requires
a self-join and I think the above process might be easier?

delete FROM cart_inventory as as a
max(id)
where concat(a.bu, a.ndc)=concat(b.bu,b.ndc)

If you could explain the process of Query#4 what I am trying to match
against that might help but I basically understand
that you need to get the id of one of the duplicates from the temp
table and match it against the ci table and delete that id.
It seems like it should work but it doesn't because I run query # 4
and do the select for the dupes and they are all still there..



thanks,

 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
onedbguru

External


Since: Jan 03, 2011
Posts: 11



(Msg. 2) Posted: Mon Jan 03, 2011 5:13 pm
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Jan 3, 7:37 pm, jr wrote:
> I need to delete the dupes across bu's.  The records are unique based
> on bu+ndc but there are 400+ duplicates..
> There is a sequential id that is unique.  I am trying to remove these
> 400+ records.
>
> 1.  SELECT *
> FROM ci
> GROUP BY bu,ndc
> HAVING COUNT(concat(bu,ndc) )>1
>
> This query yields 400+ dupes which I need to remove.
>
> 2.
> I could do this query in Oracle but not in MySQL.
> You cannot delete from the same table as in the subquery.
>
>  DELETE FROM ci  b
> WHERE EXISTS ((
> SELECT bu,ndc, count(1)
> FROM ci
> GROUP BY bu,ndc
> HAVING  COUNT(1)>1
> ) a
> where  a.bu=b.bu
> and a.ndc = b.ndc
>
> 3.   I need to create a temp table with the unique ids in order to
> compare the ci table to the duplicates.
> I create the temp table with the SELECT from above that now has the
> duplicate records in it with the id.
> CREATE TABLE  ci_tmp AS
> (SELECT c.id,c.bu,c.zonenm,c.ndc
> FROM ci c
> GROUP BY c.bu,c.ndc
> HAVING  COUNT(concat(c.bu,c.ndc) )>1)
>
> 4.  Next I have to run my delete query matching against the ids in the
> temp table.
> AT this point I am not sure exactly how to do this or what I need to
> delete.
> This is what I have but it isn't right because it doesn't delete any
> records.
>
> DELETE FROM cart_inventory
> WHERE id IN (select id
> FROM cart_inventory_tmp
> GROUP BY bu,nationaldrugcode
> HAVING COUNT(concat(bu,nationaldrugcode) )>1)
>
> I also tried   deleting using max(id) but I don't know if it requires
> a self-join and I think the above process might be easier?
>
> delete  FROM cart_inventory as as a
> max(id)
> where  concat(a.bu, a.ndc)=concat(b.bu,b.ndc)
>
> If you could explain the process of Query#4 what I am trying to match
> against that might help but I basically understand
> that you need to get the id of one of the duplicates from the temp
> table and match it against the ci table and delete that id.
> It seems like it should work but it doesn't because I run query # 4
> and do the select for the dupes and they are all still there..
>
> thanks,


1) are the rows truely DUPLICATE or do you just have a bu,ndc
duplicate - are there other columns and are they also duplicate or are
they unique? If "unique", which one is correct?

See: http://dev.mysql.com/doc/refman/5.5/en/alter-table.html and
search for IGNORE

"IGNORE is a MySQL extension to standard SQL. It controls how ALTER
TABLE works if there are duplicates on unique keys in the new table or
if warnings occur when strict mode is enabled. If IGNORE is not
specified, the copy is aborted and rolled back if duplicate-key errors
occur. If IGNORE is specified, only the first row is used of rows with
duplicates on a unique key, The other conflicting rows are deleted.
Incorrect values are truncated to the closest matching acceptable
value."

ALTER IGNORE TABLE ci ADD UNIQUE KEY (bu,ndc ) ;

How it knows which one is "first" I do not know...

I would test this before actually executingto ensure you are purging
the correct data.

Next step - figure out why you are getting unwanted dupes and fix the
problem.

 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
jr

External


Since: Jan 03, 2011
Posts: 8



(Msg. 3) Posted: Mon Jan 03, 2011 5:27 pm
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Jan 3, 5:13 pm, onedbguru wrote:
> On Jan 3, 7:37 pm, jr wrote:
>
>
>
>
>
>
>
>
>
> > I need to delete the dupes across bu's.  The records are unique based
> > on bu+ndc but there are 400+ duplicates..
> > There is a sequential id that is unique.  I am trying to remove these
> > 400+ records.
>
> > 1.  SELECT *
> > FROM ci
> > GROUP BY bu,ndc
> > HAVING COUNT(concat(bu,ndc) )>1
>
> > This query yields 400+ dupes which I need to remove.
>
> > 2.
> > I could do this query in Oracle but not in MySQL.
> > You cannot delete from the same table as in the subquery.
>
> >  DELETE FROM ci  b
> > WHERE EXISTS ((
> > SELECT bu,ndc, count(1)
> > FROM ci
> > GROUP BY bu,ndc
> > HAVING  COUNT(1)>1
> > ) a
> > where  a.bu=b.bu
> > and a.ndc = b.ndc
>
> > 3.   I need to create a temp table with the unique ids in order to
> > compare the ci table to the duplicates.
> > I create the temp table with the SELECT from above that now has the
> > duplicate records in it with the id.
> > CREATE TABLE  ci_tmp AS
> > (SELECT c.id,c.bu,c.zonenm,c.ndc
> > FROM ci c
> > GROUP BY c.bu,c.ndc
> > HAVING  COUNT(concat(c.bu,c.ndc) )>1)
>
> > 4.  Next I have to run my delete query matching against the ids in the
> > temp table.
> > AT this point I am not sure exactly how to do this or what I need to
> > delete.
> > This is what I have but it isn't right because it doesn't delete any
> > records.
>
> > DELETE FROM cart_inventory
> > WHERE id IN (select id
> > FROM cart_inventory_tmp
> > GROUP BY bu,nationaldrugcode
> > HAVING COUNT(concat(bu,nationaldrugcode) )>1)
>
> > I also tried   deleting using max(id) but I don't know if it requires
> > a self-join and I think the above process might be easier?
>
> > delete  FROM cart_inventory as as a
> > max(id)
> > where  concat(a.bu, a.ndc)=concat(b.bu,b.ndc)
>
> > If you could explain the process of Query#4 what I am trying to match
> > against that might help but I basically understand
> > that you need to get the id of one of the duplicates from the temp
> > table and match it against the ci table and delete that id.
> > It seems like it should work but it doesn't because I run query # 4
> > and do the select for the dupes and they are all still there..
>
> > thanks,
>
> 1) are the rows truely DUPLICATE or do you just have a bu,ndc
> duplicate - are there other columns and are they also duplicate or are
> they unique?  If "unique", which one is correct?
>
> See:http://dev.mysql.com/doc/refman/5.5/en/alter-table.htmland
> search for IGNORE
>
> "IGNORE is a MySQL extension to standard SQL. It controls how ALTER
> TABLE works if there are duplicates on unique keys in the new table or
> if warnings occur when strict mode is enabled. If IGNORE is not
> specified, the copy is aborted and rolled back if duplicate-key errors
> occur. If IGNORE is specified, only the first row is used of rows with
> duplicates on a unique key, The other conflicting rows are deleted.
> Incorrect values are truncated to the closest matching acceptable
> value."
>
> ALTER IGNORE TABLE ci ADD UNIQUE KEY (bu,ndc ) ;
>
> How it knows which one is "first" I do not know...
>
> I would test this before actually executingto ensure you are purging
> the correct data.
>
> Next step - figure out why you are getting unwanted dupes and fix the
> problem.

it is bu,ndc duplicates I am trying to get rid of, this is data
cleansing, the first or the 2nd duplicate doesn't matter, there is no
duplicate on the unique key
only duplicate on the bu,ndc. The concat(bu,ndc) is how I determine
the row is unique. The id was assigned as I uploaded the data.
Should I just use the max(id) if so how do I set up the self-join?
thanks,
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
jr

External


Since: Jan 03, 2011
Posts: 8



(Msg. 4) Posted: Mon Jan 03, 2011 5:59 pm
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Jan 3, 5:13 pm, onedbguru wrote:
> On Jan 3, 7:37 pm, jr wrote:
>
>
>
>
>
>
>
>
>
> > I need to delete the dupes across bu's.  The records are unique based
> > on bu+ndc but there are 400+ duplicates..
> > There is a sequential id that is unique.  I am trying to remove these
> > 400+ records.
>
> > 1.  SELECT *
> > FROM ci
> > GROUP BY bu,ndc
> > HAVING COUNT(concat(bu,ndc) )>1
>
> > This query yields 400+ dupes which I need to remove.
>
> > 2.
> > I could do this query in Oracle but not in MySQL.
> > You cannot delete from the same table as in the subquery.
>
> >  DELETE FROM ci  b
> > WHERE EXISTS ((
> > SELECT bu,ndc, count(1)
> > FROM ci
> > GROUP BY bu,ndc
> > HAVING  COUNT(1)>1
> > ) a
> > where  a.bu=b.bu
> > and a.ndc = b.ndc
>
> > 3.   I need to create a temp table with the unique ids in order to
> > compare the ci table to the duplicates.
> > I create the temp table with the SELECT from above that now has the
> > duplicate records in it with the id.
> > CREATE TABLE  ci_tmp AS
> > (SELECT c.id,c.bu,c.zonenm,c.ndc
> > FROM ci c
> > GROUP BY c.bu,c.ndc
> > HAVING  COUNT(concat(c.bu,c.ndc) )>1)
>
> > 4.  Next I have to run my delete query matching against the ids in the
> > temp table.
> > AT this point I am not sure exactly how to do this or what I need to
> > delete.
> > This is what I have but it isn't right because it doesn't delete any
> > records.
>
> > DELETE FROM cart_inventory
> > WHERE id IN (select id
> > FROM cart_inventory_tmp
> > GROUP BY bu,nationaldrugcode
> > HAVING COUNT(concat(bu,nationaldrugcode) )>1)
>
> > I also tried   deleting using max(id) but I don't know if it requires
> > a self-join and I think the above process might be easier?
>
> > delete  FROM cart_inventory as as a
> > max(id)
> > where  concat(a.bu, a.ndc)=concat(b.bu,b.ndc)
>
> > If you could explain the process of Query#4 what I am trying to match
> > against that might help but I basically understand
> > that you need to get the id of one of the duplicates from the temp
> > table and match it against the ci table and delete that id.
> > It seems like it should work but it doesn't because I run query # 4
> > and do the select for the dupes and they are all still there..
>
> > thanks,
>
> 1) are the rows truely DUPLICATE or do you just have a bu,ndc
> duplicate - are there other columns and are they also duplicate or are
> they unique?  If "unique", which one is correct?
>
> See:http://dev.mysql.com/doc/refman/5.5/en/alter-table.htmland
> search for IGNORE
>
> "IGNORE is a MySQL extension to standard SQL. It controls how ALTER
> TABLE works if there are duplicates on unique keys in the new table or
> if warnings occur when strict mode is enabled. If IGNORE is not
> specified, the copy is aborted and rolled back if duplicate-key errors
> occur. If IGNORE is specified, only the first row is used of rows with
> duplicates on a unique key, The other conflicting rows are deleted.
> Incorrect values are truncated to the closest matching acceptable
> value."
>
> ALTER IGNORE TABLE ci ADD UNIQUE KEY (bu,ndc ) ;
>
> How it knows which one is "first" I do not know...
>
> I would test this before actually executingto ensure you are purging
> the correct data.
>
> Next step - figure out why you are getting unwanted dupes and fix the
> problem.

I tried the ALTER IGNORE and changed the unique key to the (bu,ndc)
on a backup table.
It changed the number of records in ci by 169 so I am afraid to do it
like that.
If I could figure out how to delete using the max(id) I would do it
that way.
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
Lennart Jonsson

External


Since: Apr 26, 2010
Posts: 22



(Msg. 5) Posted: Tue Jan 04, 2011 4:29 am
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On 2011-01-04 01:37, jr wrote:
[...]
> 2.
> I could do this query in Oracle but not in MySQL.
> You cannot delete from the same table as in the subquery.
>
>
> DELETE FROM ci b
> WHERE EXISTS ((
> SELECT bu,ndc, count(1)
> FROM ci
> GROUP BY bu,ndc
> HAVING COUNT(1)>1
> ) a
> where a.bu=b.bu
> and a.ndc = b.ndc
>

You can hide the table at a deeper nesting level, but it will probably
be very slow. Another option is to delete from a join (non-standard)
like (untested):

delete b.*
from ci as b
join (
SELECT bu,ndc
FROM ci
GROUP BY bu,ndc
HAVING COUNT(1)>1
) as a
on a.bu=b.bu
and a.ndc = b.ndc

If bu, ndu is a candidate key, you might want to add a unique constraint:

alter table ci add constraint ... unique (bu, ndc);

to enforce this.

[...]

/Lennart
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
jr

External


Since: Jan 03, 2011
Posts: 8



(Msg. 6) Posted: Tue Jan 04, 2011 7:37 am
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Jan 4, 12:35 am, Lennart Jonsson
wrote:
> On 2011-01-04 01:37, jr wrote:
> [...]
>
> > 2.
> > I could do this query in Oracle but not in MySQL.
> > You cannot delete from the same table as in the subquery.
>
> >  DELETE FROM ci  b
> > WHERE EXISTS ((
> > SELECT bu,ndc, count(1)
> > FROM ci
> > GROUP BY bu,ndc
> > HAVING  COUNT(1)>1
> > ) a
> > where  a.bu=b.bu
> > and a.ndc = b.ndc
>
> You can hide the table at a deeper nesting level, but it will probably
> be very slow. Another option is to delete from a join (non-standard)
> like (untested):
>
> delete b.*
> from ci as b
> join (
>   SELECT bu,ndc
>   FROM ci
>   GROUP BY bu,ndc
>   HAVING  COUNT(1)>1
> ) as a
>     on a.bu=b.bu
>    and a.ndc = b.ndc
>
> If bu, ndu is a candidate key, you might want to add a unique constraint:
>
>     alter table ci add constraint ... unique (bu, ndc);
>
> to enforce this.
>
> [...]
>
> /Lennart

The query is beautiful although it deleted 1085 rows. When I did the
first query to locate duplicates it found only 487. Why is
that? ?????
Where did the extra
This is the query I used to find the dupes. Is that wrong or is your
query wrong? Did it delete the dupe and the original record as well?

select bu,ndc, count(*)
from ci
group by bu,ndc
having COUNT(concat(bu,nationaldrugcode)) > 1



I wouldn't hide the table at a deeper level because users are using
the web page and it is somewhat slow already because of a UNION
query. I could try it if all else fails. I will try the unique
constraint.

thanks,
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
Lennart Jonsson

External


Since: Apr 26, 2010
Posts: 22



(Msg. 7) Posted: Tue Jan 04, 2011 4:25 pm
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On 2011-01-04 16:37, jr wrote:
[...]
>
> The query is beautiful although it deleted 1085 rows. When I did the
> first query to locate duplicates it found only 487. Why is
> that? ?????
> Where did the extra
> This is the query I used to find the dupes. Is that wrong or is your
> query wrong? Did it delete the dupe and the original record as well?
>
> select bu,ndc, count(*)
> from ci
> group by bu,ndc
> having COUNT(concat(bu,nationaldrugcode)) > 1
>
>

I should have look more carefully, I just translated your attempt:

DELETE FROM ci b
WHERE EXISTS ((
SELECT bu,ndc, count(1)
FROM ci
GROUP BY bu,ndc
HAVING COUNT(1)>1
) a
where a.bu=b.bu
and a.ndc = b.ndc

into

delete b.*
from ci as b
join (
SELECT bu,ndc
FROM ci
GROUP BY bu,ndc
HAVING COUNT(1)>1
) as a
on a.bu=b.bu
and a.ndc = b.ndc

It is probably not what you want because it will delete *all* rows that
are duplicated (including the first occurrence). Compare the result from:

SELECT *
FROM ci b
WHERE EXISTS (
SELECT bu,ndc, count(1)
FROM ci
GROUP BY bu,ndc
HAVING COUNT(1)>1
) a
where a.bu=b.bu
and a.ndc = b.ndc

with:

select b.*
from ci as b
join (
SELECT bu,ndc
FROM ci
GROUP BY bu,ndc
HAVING COUNT(1)>1
) as a
on a.bu=b.bu
and a.ndc = b.ndc

>
> I wouldn't hide the table at a deeper level because users are using
> the web page and it is somewhat slow already because of a UNION
> query. I could try it if all else fails.

No, you misunderstood me. What I meant was that you can hide the table
inside the delete stmt as in:

SELECT *
FROM ci b
WHERE EXISTS (
select * FROM (
SELECT bu,ndc, count(1)
FROM ci
GROUP BY bu,ndc
HAVING COUNT(1)>1
) AS X
) a
where a.bu=b.bu
and a.ndc = b.ndc

but this will also delete more rows than you want (I guess). You will
have to enumerate the duplicate rows somehow (the sequential unique id
is probably your best option). Here's one attempt:

delete b.*
from ci as b
where exists (
select 1
from ci as c
where c.bu = b.bu
and c.ndc = b.ndc
and c.id < b.id
)

But mysql does not allow this, so we'll rewrite that as a join:

delete b.*
from ci as b
join ci as c
on c.bu = b.bu
and c.ndc = b.ndc
and c.id < b.id

Example:

create table ci (
id int not null primary key,
bu int not null,
ndc int not null
) engine = innodb;

insert into ci (id, bu, ndc)
values (1,1,1),(2,1,1),(3,2,1),(4,2,2),(5,2,2),(6,1,1);

The first query will delete *all* duplicates, i.e. all rows but the one
with id = 3:

mysql> select b.*
-> from ci as b
-> join (
-> SELECT bu,ndc
-> FROM ci
-> GROUP BY bu,ndc
-> HAVING COUNT(1)>1
-> ) as a
-> on a.bu=b.bu
-> and a.ndc = b.ndc;
+----+----+-----+
| id | bu | ndc |
+----+----+-----+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |
| 6 | 1 | 1 |
+----+----+-----+

The rewritten query leaves the duplicates with the lowest id:

mysql> select b.* from ci as b
-> join ci as c
-> on c.bu = b.bu
-> and c.ndc = b.ndc
-> and c.id < b.id;
+----+----+-----+
| id | bu | ndc |
+----+----+-----+
| 2 | 1 | 1 |
| 5 | 2 | 2 |
| 6 | 1 | 1 |
| 6 | 1 | 1 |
+----+----+-----+
4 rows in set (0.05 sec)

mysql> delete b.*
-> from ci as b
-> join ci as c
-> on c.bu = b.bu
-> and c.ndc = b.ndc
-> and c.id < b.id;
Query OK, 3 rows affected (0.05 sec)

mysql> select * from ci;
+----+----+-----+
| id | bu | ndc |
+----+----+-----+
| 1 | 1 | 1 |
| 3 | 2 | 1 |
| 4 | 2 | 2 |
+----+----+-----+
3 rows in set (0.00 sec)

Don't just take the code and execute it, try understand the similarities
and differences from the first attempt.

HTH
/Lennart
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
Lennart Jonsson

External


Since: Apr 26, 2010
Posts: 22



(Msg. 8) Posted: Tue Jan 04, 2011 5:25 pm
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On 2011-01-04 22:19, Lennart Jonsson wrote:
[...]
>
> SELECT *
> FROM ci b
> WHERE EXISTS (
> SELECT bu,ndc, count(1)
> FROM ci
> GROUP BY bu,ndc
> HAVING COUNT(1)>1
> ) a
> where a.bu=b.bu
> and a.ndc = b.ndc
>

I get an syntax error here, so I guess uncorrelated exists subqueries
are not allowed either. You can rewrite this to:

SELECT b.* FROM ci as b where exists ( select bu,ndc from ci as a where
a.bu = b.bu and a.ndc = b.ndc group by bu,ndc having count(1) > 1);
+----+----+-----+
| id | bu | ndc |
+----+----+-----+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |
| 6 | 1 | 1 |
+----+----+-----+

/Lennart
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
Lennart Jonsson

External


Since: Apr 26, 2010
Posts: 22



(Msg. 9) Posted: Wed Jan 05, 2011 7:25 pm
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On 2011-01-04 22:38, Lennart Jonsson wrote:
> On 2011-01-04 22:19, Lennart Jonsson wrote:
> [...]
>>
>> SELECT *
>> FROM ci b
>> WHERE EXISTS (
>> SELECT bu,ndc, count(1)
>> FROM ci
>> GROUP BY bu,ndc
>> HAVING COUNT(1)>1
>> ) a
>> where a.bu=b.bu
>> and a.ndc = b.ndc
>>
>
> I get an syntax error here

Never mind, there where several other reasons for the syntax error, I
just did not notice them.
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
jr

External


Since: Jan 03, 2011
Posts: 8



(Msg. 10) Posted: Thu Jan 06, 2011 5:55 am
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Jan 5, 3:53 pm, Lennart Jonsson
wrote:
> On 2011-01-04 22:38, Lennart Jonsson wrote:
>
>
>
>
>
>
>
>
>
> > On 2011-01-04 22:19, Lennart Jonsson wrote:
> > [...]
>
> >> SELECT *
> >> FROM ci  b
> >> WHERE EXISTS (
> >>     SELECT bu,ndc, count(1)
> >>     FROM ci
> >>     GROUP BY bu,ndc
> >>     HAVING COUNT(1)>1
> >> ) a
> >> where a.bu=b.bu
> >> and a.ndc = b.ndc
>
> > I get an syntax error here
>
> Never mind, there where several other reasons for the syntax error, I
> just did not notice them.

I do get it, thank, you finished the first query that I was trying to
do even though it was wrong.
If I could have done it then I could have seen it was wrong. I did
want to do the self-join.
I get it you can't delete from the same table as a subquery. Thanks
for explaining the rest of it.
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
Lennart Jonsson

External


Since: Apr 26, 2010
Posts: 22



(Msg. 11) Posted: Thu Jan 13, 2011 1:25 am
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On 2011-01-13 00:36, jr wrote:
[...]
> The 3rd query, I did get a syntax error on the select:
> select *
> from ci b
> where exists(
> select bu,ndc, count(1)
> from ci
> group by bu, ndc
> having count(1)>1
> )a
> where a.bu=b.bu
> and a.ndc=b.ndc
>
[...]
> I got syntax errors. You mentioned some other reason other than a
> syntax error?
>

You already mentioned the two where clauses, fixing that leaves us with:

.... exists (
select bu,ndc, count(1)
from ci
group by bu, ndc
having count(1)>1
) a

exists is a predicate ( sometimes referred to as propositional function,
you can think of it as an anonymous function returning true/false ) and
you can't give it a name ( there is no point in doing so either ).
Furthermore bu, ndc, etc inside the predicate is not visible outside the
predicate, so you can't reference them like you do in:

where a.bu=b.bu
and a.ndc=b.ndc

You can however reference variables outside of the predicate from within
the predicate, so

select *
from ci b
where exists(
select bu,ndc, count(1)
from ci as a

where a.bu=b.bu
and a.ndc=b.ndc

group by bu, ndc
having count(1)>1
)

is syntactically correct (will still give you the wrong result).


/Lennart
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
jr

External


Since: Jan 03, 2011
Posts: 8



(Msg. 12) Posted: Mon Jan 17, 2011 1:41 pm
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Jan 4, 1:19 pm, Lennart Jonsson
wrote:
> On 2011-01-04 16:37, jr wrote:
> [...]
>
>
>
> > The query is beautiful although it deleted 1085 rows.   When I did the
> > first query to locate duplicates it found only 487.  Why is
> > that? ?????
> > Where did the extra
> > This is the query I used to find the dupes.  Is that wrong or is your
> > query wrong?  Did it delete the dupe and the original record as well?
>
> > select bu,ndc, count(*)
> > from ci
> > group by bu,ndc
> > having COUNT(concat(bu,nationaldrugcode)) > 1
>
> I should have look more carefully, I just translated your attempt:
>
> DELETE FROM ci  b
> WHERE EXISTS ((
> SELECT bu,ndc, count(1)
> FROM ci
> GROUP BY bu,ndc
> HAVING  COUNT(1)>1
> ) a
> where  a.bu=b.bu
> and a.ndc = b.ndc
>
> into
>
> delete b.*
> from ci as b
> join (
>   SELECT bu,ndc
>   FROM ci
>   GROUP BY bu,ndc
>   HAVING  COUNT(1)>1
> ) as a
>     on a.bu=b.bu
>    and a.ndc = b.ndc
>
> It is probably not what you want because it will delete *all* rows that
> are duplicated (including the first occurrence). Compare the result from:
>
> SELECT *
> FROM ci  b
> WHERE EXISTS (
>     SELECT bu,ndc, count(1)
>     FROM ci
>     GROUP BY bu,ndc
>     HAVING COUNT(1)>1
> ) a
> where a.bu=b.bu
> and a.ndc = b.ndc
>
> with:
>
> select b.*
> from ci as b
> join (
>   SELECT bu,ndc
>   FROM ci
>   GROUP BY bu,ndc
>   HAVING  COUNT(1)>1
> ) as a
>     on a.bu=b.bu
>    and a.ndc = b.ndc
>
>
>
> > I wouldn't hide the table at a deeper level because users are using
> > the web page and it is somewhat slow already because of a UNION
> > query.  I could try it if all else fails.
>
> No, you misunderstood me. What I meant was that you can hide the table
> inside the delete stmt as in:
>
> SELECT *
> FROM ci  b
> WHERE EXISTS (
>     select * FROM (
>         SELECT bu,ndc, count(1)
>         FROM ci
>         GROUP BY bu,ndc
>         HAVING COUNT(1)>1
>     ) AS X
> ) a
> where a.bu=b.bu
> and a.ndc = b.ndc
>
> but this will also delete more rows than you want (I guess). You will
> have to enumerate the duplicate rows somehow (the sequential unique id
> is probably your best option). Here's one attempt:
>
> delete b.*
> from ci as b
> where exists (
>     select 1
>     from ci as c
>     where c.bu = b.bu
>       and c.ndc = b.ndc
>       and c.id < b.id
> )
>
> But mysql does not allow this, so we'll rewrite that as a join:
>
> delete b.*
> from ci as b
> join ci as c
>     on c.bu = b.bu
>     and c.ndc = b.ndc
>     and c.id < b.id
>
> Example:
>
> create table ci (
>     id int not null primary key,
>     bu int not null,
>     ndc int not null
> ) engine = innodb;
>
> insert into ci (id, bu, ndc)
> values (1,1,1),(2,1,1),(3,2,1),(4,2,2),(5,2,2),(6,1,1);
>
> The first query will delete *all* duplicates, i.e. all rows but the one
> with id = 3:
>
> mysql> select b.*
>     -> from ci as b
>     -> join (
>     ->   SELECT bu,ndc
>     ->   FROM ci
>     ->   GROUP BY bu,ndc
>     ->   HAVING  COUNT(1)>1
>     -> ) as a
>     ->     on a.bu=b.bu
>     ->    and a.ndc = b.ndc;
> +----+----+-----+
> | id | bu | ndc |
> +----+----+-----+
> |  1 |  1 |   1 |
> |  2 |  1 |   1 |
> |  4 |  2 |   2 |
> |  5 |  2 |   2 |
> |  6 |  1 |   1 |
> +----+----+-----+
>
> The rewritten query leaves the duplicates with the lowest id:
>
> mysql> select b.* from ci as b
>     -> join ci as c
>     ->     on c.bu = b.bu
>     ->     and c.ndc = b.ndc
>     ->     and c.id < b.id;
> +----+----+-----+
> | id | bu | ndc |
> +----+----+-----+
> |  2 |  1 |   1 |
> |  5 |  2 |   2 |
> |  6 |  1 |   1 |
> |  6 |  1 |   1 |
> +----+----+-----+
> 4 rows in set (0.05 sec)
>
> mysql> delete b.*
>     -> from ci as b
>     -> join ci as c
>     ->    on c.bu = b.bu
>     ->   and c.ndc = b.ndc
>     ->   and c.id < b.id;
> Query OK, 3 rows affected (0.05 sec)
>
> mysql> select * from ci;
> +----+----+-----+
> | id | bu | ndc |
> +----+----+-----+
> |  1 |  1 |   1 |
> |  3 |  2 |   1 |
> |  4 |  2 |   2 |
> +----+----+-----+
> 3 rows in set (0.00 sec)
>
> Don't just take the code and execute it, try understand the similarities
> and differences from the first attempt.
>
> HTH
> /Lennart

Lennart,
Regarding this last delete query, it worked on your small sample
database but I tried it on my db of 46825 records and it didn't work.
It left me with only 24,571 records. I did a query to get the num of
duplicates and there were only 9996. Most of those were only
duplicates, but there were some triplicates etc. So it deleted way
too many and I had to kill the server eventually. I came up with 3
other queries that does the deleting in passes by comparing a table
with the duplicate ids with a table containing the max(id)s. It
seemed to work except something is wrong there also. It deleted all
the duplicates, triplcates, quadruplicates, etc. It should have only
deleted one of the duplicates with the max(id) and left the original
duplicate.

1.create table ci_tmp1 as
(select id,bu,ndc, count(*)
from ci
group by bu,ndc
having COUNT(concat(bu,ndc)) > 1
)

2.create table ci_tmp2 as(select max(c.id) from ci c, ci_tmp1 t
WHERE t.bu=c.bu AND t.ndc=c.ndc
GROUP BY t.bu, t.ndc)

3.DELETE FROM ci WHERE id IN (select * from ci_tmp2)

This delete query got near to the correct amount of duplicates
34,576. If you subtract 9996 from 46825 you get 36,829.
The only problem with this result was I was expecting to have the
triplicates and quadrupleticates left in the ci table and I would have
to run through
the 3 queries again. When I checked the ci table there were no more
duplicates. What I was hoping was that your query results number
would match with
my way of doing it in 3 passes and I would know I could use your query
but now I'm not even sure if my queries are correct? What do you
think?
Regards, Janis
..
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
jr

External


Since: Jan 03, 2011
Posts: 8



(Msg. 13) Posted: Mon Jan 17, 2011 1:45 pm
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Jan 12, 10:10 pm, Lennart Jonsson
wrote:
> On 2011-01-13 00:36, jr wrote:
> [...]
>
>
>
>
>
>
>
>
>
> > The 3rd query,  I did get a syntax error on the select:
> > select *
> > from ci  b
> > where exists(
> > select bu,ndc, count(1)
> > from ci
> > group by bu, ndc
> > having count(1)>1
> > )a
> > where a.bu=b.bu
> > and a.ndc=b.ndc
>
> [...]
> > I got syntax errors.  You mentioned some other reason other than a
> > syntax error?
>
> You already mentioned the two where clauses, fixing that leaves us with:
>
> ... exists (
>     select bu,ndc, count(1)
>     from ci
>     group by bu, ndc
>     having count(1)>1
> ) a
>
> exists is a predicate ( sometimes referred to as propositional function,
> you can think of it as an anonymous function returning true/false ) and
> you can't give it a name ( there is no point in doing so either ).
> Furthermore bu, ndc, etc inside the predicate is not visible outside the
> predicate, so you can't reference them like you do in:
>
>   where a.bu=b.bu
>     and a.ndc=b.ndc
>
> You can however reference variables outside of the predicate from within
> the predicate, so
>
> select *
> from ci  b
> where exists(
>     select bu,ndc, count(1)
>     from ci as a
>
>     where a.bu=b.bu
>     and a.ndc=b.ndc
>
>     group by bu, ndc
>     having count(1)>1
> )
>
> is syntactically correct (will still give you the wrong result).
>
> /Lennart

thanks,
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
Lennart Jonsson

External


Since: Apr 26, 2010
Posts: 22



(Msg. 14) Posted: Tue Jan 18, 2011 1:25 am
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On 2011-01-17 22:41, jr wrote:
[...]
> Lennart,
> Regarding this last delete query, it worked on your small sample
> database but I tried it on my db of 46825 records and it didn't work.
> It left me with only 24,571 records. I did a query to get the num of
> duplicates and there were only 9996. Most of those were only
> duplicates, but there were some triplicates etc. So it deleted way
> too many and I had to kill the server eventually. I came up with 3
> other queries that does the deleting in passes by comparing a table
> with the duplicate ids with a table containing the max(id)s. It
> seemed to work except something is wrong there also. It deleted all
> the duplicates, triplcates, quadruplicates, etc. It should have only
> deleted one of the duplicates with the max(id) and left the original
> duplicate.
>
> 1.create table ci_tmp1 as
> (select id,bu,ndc, count(*)
> from ci
> group by bu,ndc
> having COUNT(concat(bu,ndc)) > 1
> )
>

This query is illegal and will give you random results. It will only
work if id is functionally dependent of bu, ndc. You can prevent this
misbehaviour by enabling ONLY_FULL_GROUP_BY in sql_mode, mysql will then
return an error instead of a wrong result.

/Lennart

[...]
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
jr

External


Since: Jan 03, 2011
Posts: 8



(Msg. 15) Posted: Tue Jan 18, 2011 9:09 am
Post subject: Re: delete dupes with a unique id using a temp table [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Jan 17, 9:54 pm, Lennart Jonsson
wrote:
> On 2011-01-17 22:41, jr wrote:
> [...]
>
>
>
>
>
>
>
>
>
> > Lennart,
> > Regarding this last delete query, it worked on your small sample
> > database but  I tried it on my db of 46825 records and it didn't work..
> > It left me with only 24,571 records.  I did a query to get the num of
> > duplicates and there were only 9996.  Most of those were only
> > duplicates, but there were some triplicates etc.  So it deleted way
> > too many and I had to kill the server eventually.   I came up with 3
> > other queries that does the deleting in passes by comparing a table
> > with the duplicate ids with a table containing the max(id)s.  It
> > seemed to work except something is wrong there also. It deleted all
> > the duplicates, triplcates, quadruplicates, etc.  It should have only
> > deleted one of the duplicates with the max(id) and left the original
> > duplicate.
>
> > 1.create table ci_tmp1 as
> > (select id,bu,ndc, count(*)
> > from ci
> > group by bu,ndc
> > having COUNT(concat(bu,ndc)) > 1
> > )
>
> This query is illegal and will give you random results. It will only
> work if id is functionally dependent of bu, ndc. You can prevent this
> misbehaviour by enabling ONLY_FULL_GROUP_BY in sql_mode, mysql will then
> return an error instead of a wrong result.
>
> /Lennart
>
> [...]

thanks for explaining it. It was very useful.
 >> Stay informed about: delete dupes with a unique id using a temp table 
Back to top
Login to vote
Display posts from previous:   
Related Topics:
delete where exists ??? - Hi, if i run the following query: select * from std_order_lines sol where exists(select * from std_orders so, customers c where so.customers_id = c.customers_id and c.test_customer =....

Create table syntax? - Warning: Error while executing this query:CREATE TABLE "purchaseorder" ( "PurchaseOrderID" int(10) unsigned NOT NULL auto_increment, "PurchaseCost" double unsigned zerofill NOT NULL default '0000000000000000000000', &...

syntax problem for CREATE TABLE ?? - Hi, I was playing with MySQL (4.1.9) during the weekend, but noticed a minor problem and would like someone to explain this to me. Basically I could successfully create a table with one unique column like this: mysql&gt; create table t1 (c1 int....

FLUSH TABLES hangs if table is locked - Using FLUSH TABLES via the C query API mysql_query() hangs if the table is locked already. That is to say, nothing prevents me from running a LOCK TABLES twice; it won't tell me "it's already locked, don't try to run a FLUSH". Anyone know ...

Java mysql show table status - Hi, I am using java to retrieve a resultset with "show table status" mysql command. I am having problem reading the [name] and [engine] column. In the metadata object it is showing as VARCHAR but when I do getString() it is returning me the o...
   Database Help (Home) -> mySQL All times are: Pacific Time (US & Canada)
Goto page 1, 2
Page 1 of 2

 
You can post new topics in this forum
You can reply to topics in this forum
You can edit your posts in this forum
You can delete your posts in this forum
You can vote in polls in this forum



[ Contact us | Terms of Service/Privacy Policy ]