
Java Forum / General / September 2007


To remove the duplicate record in the list using java

timothy ma and constance lee - 18 Sep 2007 03:02 GMT
Sir/Madam

We have a list of Record with the unqiue key like account no, and sequence
no, and the rest of fields are exactly the same.
Any way for java to remove those duplicated records?

Thanks
Roedy Green - 18 Sep 2007 03:07 GMT
On Tue, 18 Sep 2007 02:02:34 GMT, "timothy ma and constance lee"
<timcons1@shaw.ca> wrote, quoted or indirectly quoted someone who said

>We have a list of Record with the unqiue key like account no, and sequence
>no, and the rest of fields are exactly the same.
>Any way for java to remove those duplicated records?

For a canned solution, see http://mindprod.com/products2.html#SORTED
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Andrew Thompson - 18 Sep 2007 04:27 GMT
...
>We have a list of Record with the unqiue key like account no, and sequence
>no, ..

If every record has a unique key formed from account
& sequence number, how can any two records be
identical, or duplicate?*

>..and the rest of fields are exactly the same.
>Any way for java to remove those duplicated records?

I do not fully understand the question.

The way you describe the records, I guess it
might be something like this (fixed-width font
required for proper viewing)..

Acc. #  | Seq. # | Field1  | Field2 | Field3
121045     2       cat       dog      fish
415386     3       giraffe   dog      fish
848345     7       cat       dog      fish
900277     4       frog      cow      whale

..and you are saying you want to remove duplicates
in Fields1 through 3.  So the first and third record are
'duplicates' but the second (with Giraffe) and 4th are
not?

Am I on track so far?

* If that is the case, and records 1 and 3 are
considered 'duplicates' which one should be
dumped?

Signature

Andrew Thompson
http://www.athompson.info/andrew/

timothy ma and constance lee - 18 Sep 2007 04:35 GMT
Andrew

Something like that

Account No  Seq No  Name
123456      001     abc
123456      001     abc
123234      001     xyz
123234      002     abd1
123421      002     ijk

The messages may come from some mainframe, and we don't want to fix this at
the mainframe level. We simply want to use Java to remove the duplicated one:

123456    001    abc

Thanks

> ..
>>We have a list of Record with the unqiue key like account no, and sequence
[quoted text clipped - 29 lines]
> considered 'duplicates' which one should be
> dumped?
Daniel Pitts - 18 Sep 2007 05:40 GMT
On Sep 17, 8:35 pm, "timothy ma and constance lee" <timco...@shaw.ca>
wrote:
> > ..
> >>We have a list of Record with the unqiue key like account no, and sequence
[quoted text clipped - 53 lines]
>
> Thanks

Try using a Set (probably HashSet or LinkedHashSet depending).  You'll
have to make sure that your object properly implements hashCode() and
equals(), but that shouldn't be too hard...
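
A minimal sketch of the Set approach, assuming a hypothetical AccountRecord class whose three fields are guessed from the sample data in this thread. LinkedHashSet is chosen here so that the first occurrence of each record keeps its original position; a plain HashSet would also drop duplicates but gives no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical record type; the field names are assumptions based on the sample data.
final class AccountRecord {
    final String accountNo;
    final String seqNo;
    final String name;

    AccountRecord(String accountNo, String seqNo, String name) {
        this.accountNo = accountNo;
        this.seqNo = seqNo;
        this.name = name;
    }

    // Two records are equal only if every field matches.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AccountRecord)) return false;
        AccountRecord other = (AccountRecord) o;
        return accountNo.equals(other.accountNo)
                && seqNo.equals(other.seqNo)
                && name.equals(other.name);
    }

    // hashCode must use the same fields as equals.
    @Override
    public int hashCode() {
        return Objects.hash(accountNo, seqNo, name);
    }

    @Override
    public String toString() {
        return accountNo + " " + seqNo + " " + name;
    }
}

final class SetDedup {
    // Returns the records without duplicates, in order of first appearance.
    static List<AccountRecord> removeDuplicates(List<AccountRecord> input) {
        Set<AccountRecord> unique = new LinkedHashSet<>(input);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<AccountRecord> input = Arrays.asList(
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123234", "001", "xyz"));
        for (AccountRecord r : removeDuplicates(input)) {
            System.out.println(r);
        }
    }
}
```

If only account number and sequence number should decide what counts as a duplicate, equals() and hashCode() would compare just those two fields instead.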

Also, please don't top post.

4. It makes it hard to follow the conversation.
3. Why is top-posting bad?
2. Please don't top post.
1. I like to top post.

Good luck,
Daniel.
Andrew Thompson - 18 Sep 2007 05:55 GMT
...

Please refrain from top-posting.  I find it most confusing.
Notice how both Roedy and myself put our comments
directly after anything worth replying to?  In best situations,
we would then trim other parts of earlier messages that
we are not commenting on.  That technique is known as
'in-line with trim' posting - and is much easier to follow.

>Something like that
>
[quoted text clipped - 4 lines]
>123234        002           abd1
>123421        002           ijk

OK.  So I was wrong in guessing that the Acc./Seq. #
was unique in all cases - they can also be duplicate.

>The message may be from some Mainframe that we dont want fix it Mainframe
>level. SImply using java to remove the duplicated one:
>
>123456    001    abc

I suspect (without looking at the link Roedy posted)
that sorting the records is one technique that might
identify duplicates, but there are also other ways.

For example, you might iterate over the entire original list
and, on each iteration of the loop:
- Make an object that uses all the fields as a 'key'.
- Use that key to check whether a record with that key
  already exists in a HashMap.
- If not..
  - add the object to the HashMap,
..else..
  - discard it as a duplicate.

At the end of the loop, the HashMap should contain
only the unique records.
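
The loop described above could be sketched roughly as follows. The assumptions are mine: each record is a String[] of {account no, seq no, name}, the class and method names are made up for illustration, and a LinkedHashMap (a HashMap subclass that remembers insertion order) is swapped in for the plain HashMap so the output order matches the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MapDedup {
    // Each row is assumed to be {accountNo, seqNo, name}.
    static List<String[]> removeDuplicates(List<String[]> rows) {
        Map<List<String>, String[]> unique = new LinkedHashMap<>();
        for (String[] row : rows) {
            // A List built from the fields has value-based equals()/hashCode(),
            // so it can serve as the 'key' made of all the fields.
            // The array is cloned so later changes to the row cannot alter the key.
            List<String> key = Arrays.asList(row.clone());
            if (!unique.containsKey(key)) {
                unique.put(key, row);   // first time this combination is seen
            }
            // else: discard the row as a duplicate
        }
        return new ArrayList<>(unique.values());
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"123456", "001", "abc"},
                new String[] {"123456", "001", "abc"},
                new String[] {"123234", "001", "xyz"});
        for (String[] row : removeDuplicates(rows)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

Because the check happens before the put, the first occurrence of each record is the one that survives.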

Signature

Andrew Thompson
http://www.athompson.info/andrew/

Lew - 18 Sep 2007 22:45 GMT
timothy ma and constance lee wrote:
>> Account No  Seq No    Name
>> 123456        001          abc
>> 123456        001          abc
>> 123234        001           xyz
>> 123234        002           abd1
>> 123421        002           ijk

> OK.  So I was wrong in guessing that the Acc./Seq. #
> was unique in all cases - they can also be duplicate.

That wasn't a guess:

timothy ma and constance lee wrote:
> We have a list of Record with the unqiue [sic] key
> like account no, and sequence no,

They actually said so, then contradicted it with the data example.

Signature

Lew

Piotr Kobzda - 18 Sep 2007 16:05 GMT
> The message may be from some Mainframe that we dont want fix it Mainframe
> level.

How do you receive the message?

If your records come from an SQL database, you may simply achieve your
goal using "SELECT DISTINCT ... " instead of a regular "SELECT ..."
statement.

If some additional data is being read with an SQL query, it is usually
also possible to read the rows in an order that is consistent (at least
partially) with your uniqueness key.  Because database keys are usually
already indexed, it should cost next to nothing to name your database
keys in the ORDER BY clause to achieve the right order.  With even
partially sorted records on the Java side you may significantly speed up
your process (if it is really worth the effort, of course).
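
One way to exploit rows that already arrive ordered by key, as a sketch under my own assumptions: each row is a String[] of {account no, seq no, name}, the rows are ordered by account no and seq no only, and the class and method names are hypothetical. Rows with the same key are then adjacent, so the set of records seen so far only has to cover the current key group and can be cleared whenever the key changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OrderedDedup {
    // Assumes rows are ordered so that equal (accountNo, seqNo) pairs are adjacent.
    static List<String[]> removeDuplicates(List<String[]> rowsOrderedByKey) {
        List<String[]> result = new ArrayList<>();
        Set<List<String>> seenInGroup = new HashSet<>();
        String[] previous = null;
        for (String[] row : rowsOrderedByKey) {
            boolean sameKey = previous != null
                    && previous[0].equals(row[0])
                    && previous[1].equals(row[1]);
            if (!sameKey) {
                seenInGroup.clear();   // new key group: forget the old group
            }
            // add() returns false when an identical row was already seen in this group
            if (seenInGroup.add(Arrays.asList(row.clone()))) {
                result.add(row);
            }
            previous = row;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"123234", "001", "xyz"},
                new String[] {"123456", "001", "abc"},
                new String[] {"123456", "001", "abc"});
        for (String[] row : removeDuplicates(rows)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

The gain over a single large set is mainly memory: only one key group is held at a time, which matters if the record list is long.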

Otherwise, just follow some of the already suggested solutions.

piotr
Greg R. Broderick - 18 Sep 2007 23:50 GMT
"timothy ma and constance lee" <timcons1@shaw.ca> wrote in news:_HHHi.194784
$fJ5.28279@pd7urf1no:

> Andrew
>
[quoted text clipped - 11 lines]
>
> 123456    001    abc

Questions:

1.  do you already have a Java object that encapsulates a record of data?  If
not, can you implement one?

2.  does this Java object implement java.lang.Comparable?  If not, can it be
made to do so?

3.  if you have two duplicate records in the sequence of records, would you
rather end up (after you do your processing to remove duplicates) with the
first record, or would you rather end up with the last record of the
duplicates?

Suggestion:

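use a java.util.Set of objects to eliminate duplicates (for a TreeSet the
objects must implement Comparable or you must supply a Comparator; for a
HashSet they must override equals() and hashCode()).  When you've added all
of your collection of record objects to the Set, you will end up with a
collection with no duplicates.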
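
A rough sketch of the Comparable route, assuming a hypothetical ComparableRecord class with fields guessed from the sample data. A java.util.TreeSet treats two elements as duplicates when compareTo() returns 0, and as a side effect returns the survivors in sorted order rather than input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical record type; the field names are assumptions based on the sample data.
final class ComparableRecord implements Comparable<ComparableRecord> {
    final String accountNo;
    final String seqNo;
    final String name;

    ComparableRecord(String accountNo, String seqNo, String name) {
        this.accountNo = accountNo;
        this.seqNo = seqNo;
        this.name = name;
    }

    // Orders by account no, then seq no, then name; 0 means "duplicate" to a TreeSet.
    @Override
    public int compareTo(ComparableRecord o) {
        int c = accountNo.compareTo(o.accountNo);
        if (c != 0) return c;
        c = seqNo.compareTo(o.seqNo);
        if (c != 0) return c;
        return name.compareTo(o.name);
    }

    // TreeSet does not consult these, but they are kept consistent with compareTo.
    @Override
    public boolean equals(Object o) {
        return o instanceof ComparableRecord && compareTo((ComparableRecord) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(accountNo, seqNo, name);
    }
}

final class TreeSetDedup {
    // Returns the distinct records in sorted order.
    static List<ComparableRecord> removeDuplicates(List<ComparableRecord> input) {
        Set<ComparableRecord> unique = new TreeSet<>(input);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<ComparableRecord> input = Arrays.asList(
                new ComparableRecord("123456", "001", "abc"),
                new ComparableRecord("123456", "001", "abc"),
                new ComparableRecord("123234", "001", "xyz"));
        for (ComparableRecord r : removeDuplicates(input)) {
            System.out.println(r.accountNo + " " + r.seqNo + " " + r.name);
        }
    }
}
```

A Set keeps the element that was added first and ignores later equal ones, so this sketch ends up with the first record of each group of duplicates.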

Regards
GRB

Signature

---------------------------------------------------------------------
Greg R. Broderick                  usenet200709@blackholio.dyndns.org

A. Top posters.
Q. What is the most annoying thing on Usenet?
---------------------------------------------------------------------

Lew - 19 Sep 2007 00:10 GMT
> 3.  if you have two duplicate records in the sequence of records, would you
> rather end up (after you do your processing to remove duplicates) with the
> first record, or would you rather end up with the last record of the
> duplicates?

A meaningless distinction in many data systems, such as SQL-based ones.

For example, SQL queries make no promises about order of records absent an
ORDER BY clause, and even then, none about ordering of equal values.

Signature

Lew


