...
>..and the rest of fields are exactly the same.
>Any way for java to remove those duplicated records?

Signature
Andrew Thompson
http://www.athompson.info/andrew/
Andrew
Something like that
Account No  Seq No  Name
123456      001     abc
123456      001     abc
123234      001     xyz
123234      002     abd1
123421      002     ijk
The message may come from a mainframe, and we don't want to fix it at the
mainframe level. We simply want to use Java to remove the duplicated one:
123456 001 abc
Thanks
> ..
>>We have a list of Record with the unqiue key like account no, and sequence
[quoted text clipped - 29 lines]
> considered 'duplicates' which one should be
> dumped?
Daniel Pitts - 18 Sep 2007 05:40 GMT
On Sep 17, 8:35 pm, "timothy ma and constance lee" <timco...@shaw.ca>
wrote:
> > ..
> >>We have a list of Record with the unqiue key like account no, and sequence
[quoted text clipped - 53 lines]
>
> Thanks
Try using a Set (probably HashSet or LinkedHashSet depending). You'll
have to make sure that your object properly implements hashCode() and
equals(), but that shouldn't be too hard...
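A minimal sketch of the Set approach, assuming a hypothetical AccountRecord
class whose three fields are guessed from the sample data (the class, field,
and method names are illustrative, not from the original posts). It uses
LinkedHashSet so the first occurrence of each record keeps its position:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class DedupWithSet {

    /** Hypothetical record type; all three fields take part in equality. */
    static final class AccountRecord {
        final String accountNo;
        final String seqNo;
        final String name;

        AccountRecord(String accountNo, String seqNo, String name) {
            this.accountNo = accountNo;
            this.seqNo = seqNo;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof AccountRecord)) return false;
            AccountRecord other = (AccountRecord) o;
            return Objects.equals(accountNo, other.accountNo)
                    && Objects.equals(seqNo, other.seqNo)
                    && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            // Must be computed from the same fields that equals() compares.
            return Objects.hash(accountNo, seqNo, name);
        }

        @Override
        public String toString() {
            return accountNo + " " + seqNo + " " + name;
        }
    }

    /** Returns the records without duplicates, in first-seen order. */
    static List<AccountRecord> removeDuplicates(List<AccountRecord> input) {
        Set<AccountRecord> unique = new LinkedHashSet<>(input);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<AccountRecord> input = Arrays.asList(
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123234", "001", "xyz"),
                new AccountRecord("123234", "002", "abd1"),
                new AccountRecord("123421", "002", "ijk"));
        for (AccountRecord r : removeDuplicates(input)) {
            System.out.println(r);
        }
    }
}
```

A plain HashSet would also drop the duplicates, but it makes no promise about
iteration order.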
Also, please don't top post.
4. It makes it hard to follow the conversation.
3. Why is top-posting bad?
2. Please don't top post.
1. I like to top post.
Good luck,
Daniel.
Andrew Thompson - 18 Sep 2007 05:55 GMT
...
Please refrain from top-posting. I find it most confusing.
Notice how both Roedy and I put our comments
directly after anything worth replying to? In the best situations,
we would then trim other parts of earlier messages that
we are not commenting on. That technique is known as
'in-line with trim' posting - and is much easier to follow.
>Something like that
>
[quoted text clipped - 4 lines]
>123234 002 abd1
>123421 002 ijk
OK. So I was wrong in guessing that the Acc./Seq. #
was unique in all cases - they can also be duplicate.
>The message may be from some Mainframe that we dont want fix it Mainframe
>level. SImply using java to remove the duplicated one:
>
>123456 001 abc
I suspect (without looking at the link Roedy posted)
that sorting the records is one technique that might
identify duplicates, but there are also other ways.
For example, you might iterate the entire original list
and on each iteration of the loop:
- Make an object that uses all the fields as a 'key'
- Use that key to check if a record with that key
  already exists in a HashMap.
  - If not..
    - add the object to the HashMap,
  ..else..
    - discard it as a duplicate.
At the end of the loop, the HashMap should contain
only the unique records.
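The loop described above might be sketched like this. It assumes each record
is held as a String[] of its fields, and the class and method names are
hypothetical. A List<String> copy of the fields serves as the key, because
lists compare by content while arrays do not:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DedupWithMap {

    /** Builds a key from all fields; List has value-based equals() and hashCode(). */
    static List<String> keyOf(String[] fields) {
        return new ArrayList<>(Arrays.asList(fields));
    }

    /** Returns one record per distinct key; a HashMap does not preserve input order. */
    static Collection<String[]> removeDuplicates(List<String[]> records) {
        Map<List<String>, String[]> seen = new HashMap<>();
        for (String[] record : records) {
            List<String> key = keyOf(record);
            if (!seen.containsKey(key)) {
                seen.put(key, record);   // first time this key is seen: keep it
            }
            // otherwise the record is a duplicate and is simply skipped
        }
        return seen.values();
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"123456", "001", "abc"},
                new String[] {"123456", "001", "abc"},
                new String[] {"123234", "001", "xyz"},
                new String[] {"123234", "002", "abd1"},
                new String[] {"123421", "002", "ijk"});
        for (String[] r : removeDuplicates(records)) {
            System.out.println(String.join(" ", r));
        }
    }
}
```

Swapping HashMap for LinkedHashMap would keep the surviving records in their
original order, if that matters.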

Signature
Andrew Thompson
http://www.athompson.info/andrew/
Lew - 18 Sep 2007 22:45 GMT
timothy ma and constance lee wrote:
>> Account No Seq No Name
>> 123456 001 abc
>> 123456 001 abc
>> 123234 001 xyz
>> 123234 002 abd1
>> 123421 002 ijk
> OK. So I was wrong in guessing that the Acc./Seq. #
> was unique in all cases - they can also be duplicate.
That wasn't a guess:
timothy ma and constance lee wrote:
> We have a list of Record with the unqiue [sic] key
> like account no, and sequence no,
They actually said so, then contradicted it with the data example.

Signature
Lew
Piotr Kobzda - 18 Sep 2007 16:05 GMT
> The message may be from some Mainframe that we dont want fix it Mainframe
> level.
How do you receive the message?
If your records come from an SQL database, you may simply achieve your
goal using "SELECT DISTINCT ... " instead of a regular "SELECT ..."
statement.
If some additional data is being read with the SQL query, it is
usually also possible to read the rows in an order that is consistent
(partially at least) with your uniqueness key. Because database keys
are usually already indexed, it should cost little to name your
database keys in the ORDER BY clause to achieve the right order.
Having even partially sorted records on the Java side, you may significantly
speed up your process (if it is really worth it, of course).
Otherwise, just follow some of the already suggested solutions.
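As a rough illustration of why sorted input helps: if the rows arrive ordered
on every field that counts toward uniqueness (an assumption; the post only
says "partially at least"), equal rows sit next to each other and one pass
with no lookup table is enough. The String[] row representation and the names
below are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DedupSorted {

    /**
     * Drops rows equal to the row immediately before them.
     * Only removes all duplicates if equal rows are adjacent,
     * for example because the query used ORDER BY on all fields.
     */
    static List<String[]> removeAdjacentDuplicates(List<String[]> sortedRows) {
        List<String[]> unique = new ArrayList<>();
        String[] previous = null;
        for (String[] row : sortedRows) {
            if (previous == null || !Arrays.equals(previous, row)) {
                unique.add(row);
            }
            previous = row;
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String[]> sortedRows = Arrays.asList(
                new String[] {"123234", "001", "xyz"},
                new String[] {"123234", "002", "abd1"},
                new String[] {"123421", "002", "ijk"},
                new String[] {"123456", "001", "abc"},
                new String[] {"123456", "001", "abc"});
        for (String[] row : removeAdjacentDuplicates(sortedRows)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```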
piotr
Greg R. Broderick - 18 Sep 2007 23:50 GMT
"timothy ma and constance lee" <timcons1@shaw.ca> wrote in news:_HHHi.194784
$fJ5.28279@pd7urf1no:
> Andrew
>
[quoted text clipped - 11 lines]
>
> 123456 001 abc
Questions:
1. do you already have a Java object that encapsulates a record of data? If
not, can you implement one?
2. does this Java object implement java.lang.Comparable? If not, can it be
made to do so?
3. if you have two duplicate records in the sequence of records, would you
rather end up (after you do your processing to remove duplicates) with the
first record, or would you rather end up with the last record of the
duplicates?
Suggestion:
use a java.util.Set of record objects to eliminate duplicates. A TreeSet
requires the objects to implement Comparable (or a Comparator to be
supplied); a HashSet relies on equals() and hashCode() instead. When you've
added all of your record objects to the Set, you will end up with a
collection with no duplicates.
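A minimal sketch of the Comparable route, using a TreeSet (the Set
implementation that relies on compareTo). The AccountRecord class and its
fields are hypothetical, guessed from the sample data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DedupWithTreeSet {

    /** Hypothetical record type, ordered by account no, then seq no, then name. */
    static final class AccountRecord implements Comparable<AccountRecord> {
        final String accountNo;
        final String seqNo;
        final String name;

        AccountRecord(String accountNo, String seqNo, String name) {
            this.accountNo = accountNo;
            this.seqNo = seqNo;
            this.name = name;
        }

        @Override
        public int compareTo(AccountRecord other) {
            // TreeSet treats two records as duplicates when this returns 0.
            int c = accountNo.compareTo(other.accountNo);
            if (c != 0) return c;
            c = seqNo.compareTo(other.seqNo);
            if (c != 0) return c;
            return name.compareTo(other.name);
        }

        @Override
        public String toString() {
            return accountNo + " " + seqNo + " " + name;
        }
    }

    /** Returns the distinct records, sorted by their natural ordering. */
    static List<AccountRecord> removeDuplicates(List<AccountRecord> input) {
        Set<AccountRecord> unique = new TreeSet<>(input);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<AccountRecord> input = Arrays.asList(
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123234", "001", "xyz"),
                new AccountRecord("123234", "002", "abd1"),
                new AccountRecord("123421", "002", "ijk"));
        for (AccountRecord r : removeDuplicates(input)) {
            System.out.println(r);
        }
    }
}
```

On question 3: adding an element that a Set already contains leaves the Set
unchanged, so this sketch keeps the first of any duplicates. In production
code equals() and hashCode() should also be overridden to agree with
compareTo().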
Regards
GRB

Signature
---------------------------------------------------------------------
Greg R. Broderick usenet200709@blackholio.dyndns.org
A. Top posters.
Q. What is the most annoying thing on Usenet?
---------------------------------------------------------------------
Lew - 19 Sep 2007 00:10 GMT
> 3. if you have two duplicate records in the sequence of records, would you
> rather end up (after you do your processing to remove duplicates) wiht the
> first record, or would you rather end up with the last record of the
> duplicates?
A meaningless distinction in many data systems, such as SQL-based ones.
For example, SQL queries make no promises about order of records absent an
ORDER BY clause, and even then, none about ordering of equal values.

Signature
Lew