Java Forum / General / December 2006
Merging Linked Lists
Damo - 19 Dec 2006 19:38 GMT Hi,
I (will) have anythng up to 6 Linked Lists of strings. I want to merge them and remove duplicate entries at the same time. So that I end up with one Linked List with every node containing a distinct string. I dont really want(need) to sort the list. Does anyone know how I would go about doing this efficiently. Any advice would be much appreciated.
Daniel Dyer - 19 Dec 2006 19:59 GMT > Hi, > [quoted text clipped - 4 lines] > go about doing this efficiently. > Any advice would be much appreciated. Is LinkedList the best data structure for your needs? Perhaps a Set would be better if you need no duplicates and don't care about sorting?
 Signature Daniel Dyer http://www.uncommons.org
ballpointpenthief - 19 Dec 2006 20:54 GMT > Hi, > [quoted text clipped - 4 lines] > go about doing this efficiently. > Any advice would be much appreciated. Read Knuth Vol 3 (I think). I expect sorting before merging, then merging if no duplicates will be more efficient.
Damo - 19 Dec 2006 22:22 GMT > > Hi, > > [quoted text clipped - 8 lines] > I expect sorting before merging, then merging if no duplicates will be > more efficient. But its possible that two of the Linked lists will have the same item. so even if theyre sorted i only want to merge distinct items from the second to the first. The only thing I'm sure of is that each list will not have duplicates within them.
Daniel Pitts - 19 Dec 2006 23:11 GMT > > > Hi, > > > [quoted text clipped - 13 lines] > second to the first. The only thing I'm sure of is that each list will > not have duplicates within them. Is there any reason not to use a Set instead?
Damo - 19 Dec 2006 23:23 GMT > Is there any reason not to use a Set instead? Ye, I was just reading about Sets , there and the more I read the more inclined I am to use them. How are they as regards efficiency as opposed to linked lists? Efficiency is my main prioity for this project.
Daniel Pitts - 19 Dec 2006 23:49 GMT > > Is there any reason not to use a Set instead? > > Ye, I was just reading about Sets , there and the more I read the more > inclined I am to use them. How are they as regards efficiency as > opposed to linked lists? Efficiency is my main prioity for this project. Well, it certainly depends on what your using them for. However, efficiency should NEVER be the main priority of any project. Creating something that WORKS should be the main priority. <soap-box> You need to make something that works and is easy to read. After that, if it runs to slowly, you can run it through a profiler and figure out what parts of it are taking the most amount of time. You'd often be surprised.
If you start with something that works and is easy to maintain, then tweaking the speed will be a lot easier. If you worry about it too soon (its called premature optimization), then you might end up with a very slow program anyway, but it would be harder to optimize the RIGHT parts. </soap-box> In any case: HashSets are often more effecient when adding and removing arbitrary elements if you need to ensure uniqueness. They are of comparable speed when iterating through the contained values. ArrayLists are generally faster than LinkedLists if you only add/remove from the end of the list, but look up arbitrary elements in the middle of the list.
If you need to maintain order, then look into LinkedHashSet. While it has a little more overhead than HashSet, it can be faster when iterating over the entire set.
So. My professional suggestion is to make all of your types "Set" (or "Collection" is better, depending on what you need to do), and instantiate with "new HashSet()". If you find, once you finish the working project, that the HashSet is too slow for one reason or another, it will be easy to modify your code to use a different type of Collection, better suited to your domain.
Lee Weiner - 20 Dec 2006 00:00 GMT >> Is there any reason not to use a Set instead? > >Ye, I was just reading about Sets , there and the more I read the more >inclined I am to use them. How are they as regards efficiency as >opposed to linked lists? Efficiency is my main prioity for this project. Efficiency measured how? Do Sets use more resources than LinkedLists? Yes. The whole reason for a Set to exist is to eliminate duplicates. That's what they do, that's all they do. AFAIC, a Set is the most efficient tool for your requirement.
Lee Weiner lee AT leeweiner DOT org
Daniel Pitts - 20 Dec 2006 00:05 GMT > >> Is there any reason not to use a Set instead? > > [quoted text clipped - 9 lines] > Lee Weiner > lee AT leeweiner DOT org Actually, a set is likely more effecient in some ways. Also, they are for much more than just eliminating duplicates.
For example, they are also for effecient "contains" checks.
Damo - 20 Dec 2006 01:14 GMT Thanks for the words of advice. I think i'll put some more thought into which one I use as I need some of the advantages of several of them. With Linked lists it takes O(1) to insert into a list at an arbirtrary position, right? My plan was to iterate through the lists , if the first list contains the current item in the second list,skip it or else insert it into the corrsponding position in the first list. this should I think return one List with no duplicates
Patricia Shanahan - 20 Dec 2006 13:35 GMT > Thanks for the words of advice. I think i'll put some more thought into > which one I use as I need some of the advantages of several of them. > With Linked lists it takes O(1) to insert into a list at an arbirtrary > position, right? If you do anything with a specified index, LinkedList scans from the nearer end of the list to find the place. Finding the next element from an iterator should be faster.
> My plan was to iterate through the lists , if the first list contains > the current item in the second list,skip it or else insert it into the > corrsponding position in the first list. > this should I think return one List with no duplicates That sounds like an O(n*n) method for merging two lists, each containing n elements, even if you use iterators.
Sort-and-merge is O(n log n). You only have to look at the head elements of each list, move the smaller to the result, dropping any duplicates from either list.
Dumping all the data into a HashSet is O(n). It is also VERY simple to code, using addAll.
Patricia
Oliver Wong - 20 Dec 2006 20:44 GMT > Thanks for the words of advice. I think i'll put some more thought into > which one I use as I need some of the advantages of several of them. > With Linked lists it takes O(1) to insert into a list at an arbirtrary > position, right? Are we talking about linked lists, or are we talking about Sun's class, java.util.LinkedList? Inserting into java.util.LinkedList at an arbitrary position is O(n), the implementation has to "seek" to that arbitrary position. A different implementation of linked list might have different asymptotic runtime performance.
> My plan was to iterate through the lists , if the first list contains > the current item in the second list,skip it or else insert it into the > corrsponding position in the first list. Given that you said the list are unsorted, how would you determine what the "corresponding" position is?
> this should I think return one List with no duplicates Yes, but using a HashSet will probably be faster.
- Oliver
Jhair Tocancipa Triana - 20 Dec 2006 15:13 GMT > Hi, > I (will) have anythng up to 6 Linked Lists of strings. I want to merge > them and remove duplicate entries at the same time. If you don't want duplicates, what about using Sets instead of a Lists?
 Signature -- Jhair
Lew - 20 Dec 2006 15:58 GMT Damo writes:
>> I (will) have anythng up to 6 Linked Lists of strings. I want to merge >> them and remove duplicate entries at the same time. Daniel Dyer wrote:
> Is LinkedList the best data structure for your needs? Perhaps a Set > would be better if you need no duplicates and don't care about sorting? Daniel Pitts wrote:
> Is there any reason not to use a Set instead? Daniel Pitts wrote:
> So. My professional suggestion is to make all of your types "Set" (or > "Collection" is better, depending on what you need to do), and > instantiate with "new HashSet()". Lee Weiner wrote:
> The whole reason for a Set to exist is to eliminate duplicates. That's what > they do, that's all they do. AFAIC, a Set is the most efficient tool for > your requirement. Daniel Pitts wrote:
> Actually, a set is likely more efficient in some ways. Also, they are > for much more than just eliminating duplicates. > > For example, they are also for efficient "contains" checks. Patricia Shanahan wrote:
>> My plan was to iterate through the lists , if the first list contains >> the current item in the second list,skip it or else insert it into the [quoted text clipped - 10 lines] > Dumping all the data into a HashSet is O(n). It is also VERY simple to > code, using addAll.
> If you don't want duplicates, what about using Sets instead of a > Lists? Get the hint??
- Lew
Remon van Vliet - 21 Dec 2006 13:48 GMT > Get the hint?? > > - Lew Sarcasm is a wonderful thing ;)
Lew - 21 Dec 2006 15:09 GMT >> Get the hint?? >> >> - Lew > > Sarcasm is a wonderful thing ;) Is it sarcasm? I thought it was emphasis that many folks kept repeating the same advice over and over again, with varying degrees of supporting and compelling evidence, and that the advice was not being absorbed.
If one person calls you a horse, they're a kook. If two people call you a horse, it's a joke or a conspiracy. If three people call you a horse, it's time to buy a saddle.
People kept responding to the OP with good advice. More people kept responding with the same good advice. And still more people kept responding with still the same good advice. People put themselves out in good faith trying to help the OP; their efforts seemed to be in vain. There was no sarcasm in pointing that out.
- Lew
John Ersatznom - 21 Dec 2006 15:42 GMT > If one person calls you a horse, they're a kook. > If two people call you a horse, it's a joke or a conspiracy. > If three people call you a horse, it's time to buy a saddle. No, if three people call you a horse, it's time to feed your killfile, if need be first saying sayonara to google groups or anything of the kind. Who has time for anyone that can't even correctly identify you to species? :)
Daniel Pitts - 21 Dec 2006 16:31 GMT > >> Get the hint?? > >> [quoted text clipped - 17 lines] > > - Lew I'm not so sure it was in vain... Damo hasn't argued with the point ([s]he's no Twisted after all). There just isn't an evidence that Damo has tried with the Set and succeeded or otherwise. Perhaps a follow up with be nice Damo?
Twisted - 21 Dec 2006 16:59 GMT > Damo hasn't argued with the point [implied insult directed towards me deleted] Oh, no, not this sh.t again.
Daniel Pitts - 21 Dec 2006 18:31 GMT > > Damo hasn't argued with the point [implied insult directed towards me deleted] > > Oh, no, not this sh.t again. You're right, that was uncalled for by me. I apologize.
bugbear - 21 Dec 2006 17:02 GMT > Hi, > [quoted text clipped - 4 lines] > go about doing this efficiently. > Any advice would be much appreciated. 1) bear in mind what other have said about not using lists at all.
2) If you don't mind some temporary storage,
LinkedList merge(List multipleLists) { LinkedList newList = new LinkedList(); Set newListSet = new HashSet();
for(Iterator li = multipleLists.iterator(); li.hasNext();) { List l = (List)li.next(); for(Iterator i = l.iterator(); i.hasNext();) { Object o = i.next(); if(!newListSet.contains(o)) { newListSet.add(o); newList.add(o); } } } return newList; }
may serve (untested code)
I'm assuming your multiple lists are held in a list... I think that's O(N).
It also kind of retains the order of the input lists
BugBear
djthomp - 21 Dec 2006 19:54 GMT > > Hi, > [quoted text clipped - 32 lines] > > BugBear It can be simpler if you take advantage of set's addAll method:
LinkedList merge(List multipleLists) { Set newListSet = new HashSet();
for(Iterator li = multipleLists.iterator(); li.hasNext();) newListSet.addAll((List)li.next());
return new LinkedList(newListSet); }
Daniel Pitts - 21 Dec 2006 20:33 GMT > > > Hi, > > [quoted text clipped - 43 lines] > return new LinkedList(newListSet); > } Or, using a more type safe, Java 1.5 approach:
public static <E> List<E> merge( Iterable<? extends Collection<? extends E>> collections) { final Set<E> set = new LinkedHashSet<E>(); for (Collection<? extends E> collection: collections) { set.addAll(collection); } return new LinkedList<E>(set); }
Damo - 21 Dec 2006 21:35 GMT hello all, I was'nt ignoring you're advice. I was just exploring all the options, seeing as I need some of the advantages of Linked Lists and Sets. In particular the no duplicate feature of sets and i needed to keep the order items were added(not necessarlly sorted)(Linked Lists), and inserting into a particulaar point in the colection(Linked Lists). I have been trying to work with sets but they dont seem to suit my needs. Its too convoluted to retain the ordering of the collection Anyway you're words of advice are appreciated.
djthomp - 21 Dec 2006 21:50 GMT > hello all, > I was'nt ignoring you're advice. I was just exploring all the options, [quoted text clipped - 5 lines] > needs. Its too convoluted to retain the ordering of the collection > Anyway you're words of advice are appreciated. Before choosing not to use a Set because of order retention problems, you might take at LinkedHashSet. It keeps track of item order by using saving the insertion order with an internal linked list. Iterating over the LinkedHashSet gives you its contents in their original order.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/LinkedHashSet.html
djthomp - 21 Dec 2006 21:52 GMT > > hello all, > > I was'nt ignoring you're advice. I was just exploring all the options, [quoted text clipped - 10 lines] > > http://java.sun.com/j2se/1.5.0/docs/api/java/util/LinkedHashSet.html ...you might take -a look- at LinkedHashSet. Sigh.
Patricia Shanahan - 21 Dec 2006 23:19 GMT >>> hello all, >>> I was'nt ignoring you're advice. I was just exploring all the options, [quoted text clipped - 12 lines] > > ...you might take -a look- at LinkedHashSet. Sigh. More generally, there are java.util.Set implementations that provide each of the common forms of ordering, so I tend to think Set as soon as I know I want to avoid duplicates, even if it needs to be ordered.
Don't care about order: HashSet
Insertion order: LinkedHashSet
Natural order: TreeSet
Comparator-specified order: TreeSet
Patricia
John Ersatznom - 22 Dec 2006 10:31 GMT > Don't care about order: HashSet > [quoted text clipped - 5 lines] > > Patricia Can any of these support inserting a new element into the middle of the ordering? Well, the first can't, and the third and fourth can if you do the "BASIC line numbering trick" and give every item a position number, where you leave gaps so you can fill in missing positions later. That's a bit of a hack (but it can be done, e.g. using doubles and suggesting the client code insert baz between foo and bar by setting (foo.getPosition() + bar.getPosition())/2 as the position for baz.
Of course, once you are comparing stuff using a position number field you set to suit, you're looking at a comparator that isn't consistent with equals, which is going to bomb your TreeSet anyway. Now you need the objects to use their position number to decide equality (and hash code) too. You probably actually want to wrap them with a
public class PositionalHolder<T> implements Comparable<PositionalHolder<T>> { public double position; public T object; public PositionalHolder (T object, double position) { this.object = object; this.position = position; } @SuppressWarnings("unchecked") public void equals (Object x) { return (x == this) ||(x instanceof PositionalHolder && ((PositionalHolder)x).position == position); } public int hashCode () { return Double.valueOf(position).hashCode(); } public int compareTo (PositionalHolder<T> other) { return (position < other.position) ?-1:((position > other.position)?1:0); } public void putBetween (PositionalHolder<T> x, PositionalHolder<T> y) { // Convenience method. position = (x.position + y.position)/2; } }
Of course, Bad Things will likely happen if you a) use infs or nans as positions, b) use the same position for multiple objects in the same ordered collection, or c) change something's position while it's in one rather than remove it, change its position, and then reinsert it. You might want to prevent the last by making the PositionalHolder immutable with a (x, object, y) "between" constructor, a just-an-object constructor (position gets zero), a "before" constructor (object, y) that uses y.position - 1, and an "after" constructor (x, object) that uses x.position + 1. (This overload is clearly ambiguous if you try to put a PositionalHolder inside another PositionalHolder but why the hell would you?)
John Ersatznom - 22 Dec 2006 12:48 GMT > public class PositionalHolder<T> > implements Comparable<PositionalHolder<T>> { [quoted text clipped - 24 lines] > } > } Naturally, it was right after posting this that I came up with the ArbitraryHolder, which holds one object it "holds" and a second whose equals(), hashCode(), and compareTo() get used. And can be constructed with an explicit comparator to use. On either the "held" object or the object used to define the "equals" and "hashCode" behavior...
Oliver Wong - 21 Dec 2006 21:51 GMT > i needed to keep the > order items were added(not necessarlly sorted)(Linked Lists) Maybe you can elaborate on this requirement to help us provide you with better advice.
- Oliver
bugbear - 22 Dec 2006 10:23 GMT > Or, using a more type safe, Java 1.5 approach: > [quoted text clipped - 6 lines] > return new LinkedList<E>(set); > } Yeah - sorry, we're still using Java 1.4 here, for reasons of install base compatibility.
BugBear
Daniel Pitts - 23 Dec 2006 19:30 GMT > > Or, using a more type safe, Java 1.5 approach: > > [quoted text clipped - 11 lines] > > BugBear I feel really sorry for you. Java 1.5 is a vast improvement over 1.4. In my opinion, the distance between 1.1 and 1.4 is the same as 1.4 to 1.5.
I find it a lot easier to upgrade an install base, than to maintain 1.4 code.
Daniel.
John Ersatznom - 24 Dec 2006 14:10 GMT > I find it a lot easier to upgrade an install base, than to maintain 1.4 > code. Agreed. All those casts! They make pre-1.5 Java the first runner up for the Too Damn Many Parentheses Award. (We all know who the actual winner is, I trust.)
Daniel Pitts - 24 Dec 2006 19:37 GMT > > I find it a lot easier to upgrade an install base, than to maintain 1.4 > > code. > > Agreed. All those casts! They make pre-1.5 Java the first runner up for > the Too Damn Many Parentheses Award. (We all know who the actual winner > is, I trust.) Whitespace's cousin Parentheses?
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|