Java Forum / General / January 2008
Best way to loop through ArrayList and remove elements on the way?
Kevin - 28 Jan 2008 22:46 GMT Hi guys,
Just want to confirm:
for a ArrayList, in single thread mode (only one thread will access this ArrayList), what is the best way to:
1) loop through all the element of this arraylist (for example, each item of it is a String). 2) do some check on each item (for example, check if it equals to "abc"). 3) remove this item from the arraylist if the above check is true.
Is using iterator() and then use iterator.remove() the best way? Like:
for (Iterator it = myarraylist.iterator(); it.hasNext(); ) { String s = (String) it.next(); if (s.equals("abc")) { it.remove(); } };
Thanks a log.
Daniele Futtorovic - 28 Jan 2008 23:18 GMT > Hi guys, > [quoted text clipped - 15 lines] > > Thanks a log. "Best" with respects to what?
The code above will be more efficient on a LinkedList than on an ArrayList. As a general rule, when doing filtering operations (conditional removing/keeping) on array-backed structures (ArrayList, StringBuffer, etc.), it is more efficient, especially if the structure is big, to copy those elements which are to be kept into a new structure, and then to discard the old one, instead of perform multiple deletions. Only if there would be a significant number (more than a couple) of deletions, of course. Your mileage may vary, but it takes a lot off the CPU and adds only little memory overhead.
DF.
Lasse Reichstein Nielsen - 28 Jan 2008 23:30 GMT > for a ArrayList, in single thread mode (only one thread will access > this ArrayList), what is the best way to: [quoted text clipped - 4 lines] > "abc"). > 3) remove this item from the arraylist if the above check is true. First of all, an ArrayList takes linear time to remove a single element, so repeated removal is a bad idea.
Either use a LinkedList, which can remove in constant time during iteration, or first collect the elements in a HashSet and then remove them at the end using Collection.removeAll(Collection). I suggest the latter.
For ArrayList, the removeAll method takes time proportional to the size of the list times the lookup time for the argument collection. Using a HashSet as argument should minimize the time it takes.
If you only remove a few values, any collection will probably suffice.
I.e., either:
LinkedList tmpLinkedList = new LinkedList(myarraylist); for(Iterator iter = tmpLinkedList.iterator(); iter.hasNext();) { if (test(iter.next)) { iter.remove(); } } myarraylist.clear(); myarraylist.addAll(tmpLinkedList);
or
HashSet toRemove = new HashSet(); for(Iterator iter = myarraylist.iterator(); iter.hasNext();) { Object elem = iter.next(); if (test(elem)) { toRemove.add(elem); } } myarraylist.removeAll(toRemove);
This is assuming the test is complicated. If it's just equality check, then you might be able to use removeAll directly.
> Is using iterator() and then use iterator.remove() the best way? For the above reasons, I'd say no.
> Like: > [quoted text clipped - 6 lines] > } > }; In this simple case, where you only compare for equality (and even to only a single value), it would suffice to do: myarraylist.removeAll(Collections.singletonSet("abc")); If you have more values, but still only do equality checks, just create a collection and remove them all. /L
 Signature Lasse Reichstein Nielsen - lrn@hotpop.com DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html> 'Faith without judgement merely degrades the spirit divine.'
Kevin - 29 Jan 2008 00:25 GMT Thanks guys!
So remove(object) is linear time with respect to the ArrayList's lenght, right?
Is using iterator.remove() still O(n) time? I think the iterator already keep the "reference" to the current item (for example, in my previous example code, it "points" to the current item already), and if we want to remove it, there is no need to look for it in the arraylist, so it just removes it directly, which should be constant time, right?
Thanks.
Mike Schilling - 29 Jan 2008 04:34 GMT > Thanks guys! > > So remove(object) is linear time with respect to the ArrayList's > lenght, right? Yes. So is remove(int), but remove(object) has a larger constant. remove(m) for an ArrayList of size n needs to:
1. copy the references numbered from m+1 to n-1 to the positions m to n-2 respectively 2. Set the reference at m-1 to null.
remove(o) has to
a. Find the index m of the first o in the ArrayList b. perform steps 1 and 2 above.
> Is using iterator.remove() still O(n) time? Yes, because it amounts to remove(int).
Daniele Futtorovic - 29 Jan 2008 00:52 GMT >> for a ArrayList, in single thread mode (only one thread will access >> this ArrayList), what is the best way to: [quoted text clipped - 15 lines] > size of the list times the lookup time for the argument collection. > Using a HashSet as argument should minimize the time it takes. I don't quite understand that advice. The docs for removeAll(Collection) in AbstractCollection (neither AbstractList nor ArrayList override that method) state: "This implementation iterates over this collection, checking each element returned by the iterator in turn to see if it's contained in the specified collection. If it's so contained, it's removed from this collection with the iterator's remove method".
Which is what the OP was doing -- except for the part of the lookup. Sure, using removeAll is cleaner, and, as opposed to removing one "type" (as defined by equality relations in this context) at once, it is superior. But only in that it avoids multiple iterations. The most itensive operation here, however, isn't the iteration, but the removal. Which your suggestion does not affect.
As far as I can surmise, there are only two ways of substantially improving the algorithm: either switch to a sequential list (and then using removeAll would be a good idea too), or give up on having the filtering modify the list at all, but rather yield a new one.
DF.
Lasse Reichstein Nielsen - 29 Jan 2008 07:47 GMT > I don't quite understand that advice. The docs for removeAll(Collection) > in AbstractCollection (neither AbstractList nor ArrayList override that [quoted text clipped - 3 lines] > specified collection. If it's so contained, it's removed from this > collection with the iterator's remove method". True, my mistake.
I was looking at the GNU Classpath implementation of ArrayList, which does implement an optimized removeAll, without noticing that it wasn't the Sun version. <URL:http://www.docjar.com/html/api/java/util/ArrayList.java.html>
/L
 Signature Lasse Reichstein Nielsen - lrn@hotpop.com DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html> 'Faith without judgement merely degrades the spirit divine.'
Daniele Futtorovic - 29 Jan 2008 08:31 GMT >> I don't quite understand that advice. The docs for >> removeAll(Collection) in AbstractCollection (neither AbstractList [quoted text clipped - 12 lines] > > /L I see. They've forgotten to modify the Javadoc accordingly, though. In their AbstractCollection class they still write it's done using an Iterator. Which, at that level, is correct, of course. They would have to override it ArrayList solely for the doc, I suppose.
I started wondering why this hadn't been added to the core classes, fired up the bugparade, and here we are: <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6529800> ... is flagged as "Closed, fixed" (took them til 1.6!!).
The ArrayList class in my SDK doesn't contain any such implementation, but it's 1.6.0_02 (yeah, I should update). However, I can't seem to find the corresponding bugID in the release notes either: <http://java.sun.com/javase/6/webnotes/ReleaseNotes.html>
:scratches head: DF.
Patricia Shanahan - 29 Jan 2008 00:56 GMT > Hi guys, > [quoted text clipped - 22 lines] > > Thanks a log. From the point of view of coding simplicity, I think it is the best way.
If it becomes a performance bottleneck, investigate whether you should switch to LinkedList or take other steps to reduce the cost.
Patricia
Roedy Green - 29 Jan 2008 02:00 GMT On Mon, 28 Jan 2008 14:46:35 -0800 (PST), Kevin <happy_singapore@yahoo.com.sg> wrote, quoted or indirectly quoted someone who said :
>Is using iterator() and then use iterator.remove() the best way? >Like: yes.
 Signature Roedy Green, Canadian Mind Products The Java Glossary, http://mindprod.com
Kevin McMurtrie - 29 Jan 2008 07:29 GMT In article <0fc7f374-c200-4032-a765-7b32bc3fed96@j20g2000hsi.googlegroups.com>,
> Hi guys, > [quoted text clipped - 22 lines] > > Thanks a log. Removing from ArrayList requires shifting elements so it can be slow. If an ArrayList is still the right tool, it may be faster in some cases to build a new list based on what wouldn't be removed. If you can't rebuild, you can still get an improvement by going backwards through the list.
 Signature I don't read Google's spam. Reply with another service.
Boris Stumm - 29 Jan 2008 14:14 GMT > for a ArrayList, in single thread mode (only one thread will access > this ArrayList), what is the best way to: [quoted text clipped - 6 lines] > > Is using iterator() and then use iterator.remove() the best way? I do not think so, as stated in the other replies. An Iterator moves from the beginning of the ArrayList to the end, so that the complete operation will terminate in O(n^2). However, as long as speed is not an issue (or the lists are small), this approach is the cleanest one.
The fastest way to check all elements in an ArrayList, and remove some of them is probably the following (assuming that the list is NOT SORTED!)
ArrayList list = ...; for (int i = list.size() - 1; i >= 0; i--) { if (elementNeedsToBeRemoved(list.get(i))) { list.set(i, list.get(list.size() -1 )); list.remove(list.size() - 1); } }
This will work in O(n).
Patricia Shanahan - 29 Jan 2008 15:08 GMT >> for a ArrayList, in single thread mode (only one thread will access >> this ArrayList), what is the best way to: [quoted text clipped - 11 lines] > operation will terminate in O(n^2). However, as long as speed is not > an issue (or the lists are small), this approach is the cleanest one. Isn't the iterator-based version O(n*(m+1)) where n is the length of the list and m is the number of items being removed?
Whether this is equivalent to O(n), O(n^2), or something in between depends on the relationship between n and m.
Patricia
Lasse Reichstein Nielsen - 30 Jan 2008 00:18 GMT > Isn't the iterator-based version O(n*(m+1)) where n is the length of the > list and m is the number of items being removed? It's O(n*m*k) where n is the size of the list, m is the number of elements removed (it's an average, removing the first element is more expensive than the last element), and k is the time it takes to check whether an element is in the argument collection.
> Whether this is equivalent to O(n), O(n^2), or something in between > depends on the relationship between n and m. Worst case is O(n^2), so it's fair to use that as an upper bound on the complexity of the algorithm.
/L
 Signature Lasse Reichstein Nielsen - lrn@hotpop.com DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html> 'Faith without judgement merely degrades the spirit divine.'
Patricia Shanahan - 30 Jan 2008 00:43 GMT >> Isn't the iterator-based version O(n*(m+1)) where n is the length of the >> list and m is the number of items being removed? [quoted text clipped - 3 lines] > expensive than the last element), and k is the time it takes to check > whether an element is in the argument collection. I don't understand why you multiply together m and k.
The cost of examining each element is O(n*k), and the cost of removing the elements that need removing is O(n*m*j), where j is the cost of shifting one element down the array. We can treat at least one of k and j as being one, because constant factors don't affect complexity, so that can reduce to O(n*k+n*m)=O(n*(k+m)). I was treating them both as being one unit, so I reduced it to O(n*(m+1)).
>> Whether this is equivalent to O(n), O(n^2), or something in between >> depends on the relationship between n and m. [quoted text clipped - 3 lines] > > /L However, in many real applications of this sort of operation the number of removed elements is very small, so I think it is important to remember that the complexity depends on the removal rate.
Patricia
Lasse Reichstein Nielsen - 30 Jan 2008 16:58 GMT >> It's O(n*m*k) ...
> I don't understand why you multiply together m and k. Neither do I, now that you mention it. I should have said O(n*m+n*k), but I was too busy multiplying :)
> The cost of examining each element is O(n*k), and the cost of removing > the elements that need removing is O(n*m*j), where j is the cost of > shifting one element down the array. We can treat at least one of k and > j as being one, because constant factors don't affect complexity, so > that can reduce to O(n*k+n*m)=O(n*(k+m)). I was treating them both as > being one unit, so I reduced it to O(n*(m+1)). Indeed.
>> Worst case is O(n^2), so it's fair to use that as an upper bound on >> the complexity of the algorithm.
> However, in many real applications of this sort of operation the number > of removed elements is very small, so I think it is important to > remember that the complexity depends on the removal rate. Nah, being practical shouldn't get in the way of a good theory :) But ofcourse, you are right. Knowing the actual problem being solved is more important than general theory. Like Bubble sort being the best sorting algorithm ... for almost sorted lists.
/L
 Signature Lasse Reichstein Nielsen - lrn@hotpop.com DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html> 'Faith without judgement merely degrades the spirit divine.'
Roedy Green - 30 Jan 2008 20:54 GMT On Mon, 28 Jan 2008 14:46:35 -0800 (PST), Kevin <happy_singapore@yahoo.com.sg> wrote, quoted or indirectly quoted someone who said :
>Is using iterator() and then use iterator.remove() the best way? >Like: Another faster method if the lists are long is to copy leaving behind the undesired elements.
This is the technique used in the SortedArrayList class.
See http://mindprod.com/products2.html#SORTED
 Signature Roedy Green, Canadian Mind Products The Java Glossary, http://mindprod.com
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|