How to efficiently (performance) remove many items from List in Java? -
i have quite large list named items (>= 1,000,000 items) , condition denoted <cond> selects items deleted , <cond> true many (maybe half) of items on list.
my goal efficiently remove items selected <cond> , retain other items, source list may modified, new list may created - best way should chosen considering performance.
here testing code:
system.out.println("preparing items"); list<integer> items = new arraylist<integer>(); // integer demo (int = 0; < 1000000; i++) { items.add(i * 3); // demo } system.out.println("deleting items"); long startmillis = system.currenttimemillis(); items = removemany(items); long endmillis = system.currenttimemillis(); system.out.println("after remove: items.size=" + items.size() + " , took " + (endmillis - startmillis) + " milli(s)");
and naive implementation:
public static <t> list<t> removemany(list<t> items) { int = 0; iterator<t> iter = items.iterator(); while (iter.hasnext()) { t item = iter.next(); // <cond> goes here if (/*<cond>: */i % 2 == 0) { iter.remove(); } i++; } return items; }
as can see used item index modulo 2 == 0 remove condition (<cond>) - demonstation purposes.
what better version of removemany
may provided , why better version better?
ok, it's time test results of proposed approaches. here approaches have tested (name of each approach class name in sources):
naiveremovemanyperformer
-arraylist
iterator , remove - first , naive implementation given in question.betternaiveremovemanyperformer
-arraylist
backward iteration , removal end front.linkedremovemanyperformer
- naive iterator , remove working onlinkedlist
. disadventage: workslinkedlist
.createnewremovemanyperformer
-arraylist
made copy (only retained elements added), iterator used traverse inputarraylist
.smartcreatenewremovemanyperformer
- bettercreatenewremovemanyperformer
- initial size (capacity) of resultarraylist
set final list size. disadvantage: must know final size of list when starting.fastersmartcreatenewremovemanyperformer
- better (?)smartcreatenewremovemanyperformer
- use item index (items.get(idx)
) instead of iterator.magicremovemanyperformer
- works in place (no list copy)arraylist
, compacts holes (removed items) beginning items end of list. disadventage: approach changes order of items in list.forwardinplaceremovemanyperformer
- works in placearraylist
- move retaining items fill holes, sublist returned (no final removal or clear).guavaarraylistremovemanyperformer
- google guavaiterables.removeif
usedarraylist
- sameforwardinplaceremovemanyperformer
final removal of items @ end of list.
full source code given @ end of answer.
tests performed different list sizes (from 10,000 items 10,000,000 items) , different remove factors (specifying how many items must removed list).
as posted here in comments other answers - have thought copying items arraylist
second arraylist
faster iterating linkedlist
, removing items. sun's java documentation says constant factor of arraylist
low compared linkedlist
implementation, surprisingly not case in problem.
in practice linkedlist
simple iteration , removal has best performance in cases (this approach implemented in linkedremovemanyperformer
). magicremovemanyperformer
performance comparable linkedremovemanyperformer
, other approaches slower. google guava guavaarraylistremovemanyperformer
slower hand made similar code (because code not remove unnecessary items @ end of list).
example results removing 500,000 items 1,000,000 source items:
naiveremovemanyperformer
: test not performed - i'm not patient, performs worsebetternaiveremovemanyperformer
.betternaiveremovemanyperformer
: 226080 milli(s)linkedremovemanyperformer
: 69 milli(s)createnewremovemanyperformer
: 246 milli(s)smartcreatenewremovemanyperformer
: 112 milli(s)fastersmartcreatenewremovemanyperformer
: 202 milli(s)magicremovemanyperformer
: 74 milli(s)forwardinplaceremovemanyperformer
: 69 milli(s)guavaarraylistremovemanyperformer
: 118 milli(s)
example results removing 1 item 1,000,000 source items (first item removed):
- betternaiveremovemanyperformer: 34 milli(s)
- linkedremovemanyperformer: 41 milli(s)
- createnewremovemanyperformer: 253 milli(s)
- smartcreatenewremovemanyperformer: 108 milli(s)
- fastersmartcreatenewremovemanyperformer: 71 milli(s)
- magicremovemanyperformer: 43 milli(s)
- forwardinplaceremovemanyperformer: 73 milli(s)
- guavaarraylistremovemanyperformer: 78 milli(s)
example results removing 333,334 items 1,000,000 source items:
- betternaiveremovemanyperformer: 253206 milli(s)
- linkedremovemanyperformer: 69 milli(s)
- createnewremovemanyperformer: 245 milli(s)
- smartcreatenewremovemanyperformer: 111 milli(s)
- fastersmartcreatenewremovemanyperformer: 203 milli(s)
- magicremovemanyperformer: 69 milli(s)
- forwardinplaceremovemanyperformer: 72 milli(s)
- guavaarraylistremovemanyperformer: 102 milli(s)
example results removing 1,000,000 (all) items 1,000,000 source items (all items removed one-by-one processing, if know priori items removed, list should cleared):
- betternaiveremovemanyperformer: 58 milli(s)
- linkedremovemanyperformer: 88 milli(s)
- createnewremovemanyperformer: 95 milli(s)
- smartcreatenewremovemanyperformer: 91 milli(s)
- fastersmartcreatenewremovemanyperformer: 48 milli(s)
- magicremovemanyperformer: 61 milli(s)
- forwardinplaceremovemanyperformer: 49 milli(s)
- guavaarraylistremovemanyperformer: 133 milli(s)
my final conclusions: use hybrid approach - if dealing linkedlist - simple iteration , removal best, if dealing arraylist - depends if item order important - use forwardinplaceremovemanyperformer then, if item order may changed - best choice magicremovemanyperformer. if remove factor known priori (you know how many items removed vs retained) more conditionals may put select approach performing better in particular situation. known remove factor not usual case... google guava iterables.removeif
such hybrid solution different assumption (original list must changed, new 1 cannot created , item order matters) - these common assumptions removeif
best choice in real-life cases.
notice approaches (naive not good!) enough - 1 of them shold fine in real application, naive approach must avoided.
at last - source code testing.
package wildwezyrlistremovaltesting; import com.google.common.base.predicate; import com.google.common.collect.iterables; import java.util.arraylist; import java.util.iterator; import java.util.linkedlist; import java.util.list; public class removemanyfromlist { public static abstract class baseremovemanyperformer { protected string performername() { return getclass().getsimplename(); } protected void info(string msg) { system.out.println(performername() + ": " + msg); } protected void populatelist(list<integer> items, int itemcnt) { (int = 0; < itemcnt; i++) { items.add(i); } } protected boolean mustremoveitem(integer itemval, int itemidx, int removefactor) { if (removefactor == 0) { return false; } return itemidx % removefactor == 0; } protected abstract list<integer> removeitems(list<integer> items, int removefactor); protected abstract list<integer> createinitiallist(); public void testme(int itemcnt, int removefactor) { list<integer> items = createinitiallist(); populatelist(items, itemcnt); long startmillis = system.currenttimemillis(); items = removeitems(items, removefactor); long endmillis = system.currenttimemillis(); int chksum = 0; (integer item : items) { chksum += item; } info("removing took " + (endmillis - startmillis) + " milli(s), itemcnt=" + itemcnt + ", removed items: " + (itemcnt - items.size()) + ", remaining items: " + items.size() + ", checksum: " + chksum); } } private list<baseremovemanyperformer> rmps = new arraylist<baseremovemanyperformer>(); public void addperformer(baseremovemanyperformer rmp) { rmps.add(rmp); } private runtime runtime = runtime.getruntime(); private void rungc() { (int = 0; < 5; i++) { runtime.gc(); } } public void testall(int itemcnt, int removefactor) { rungc(); (baseremovemanyperformer rmp : rmps) { rmp.testme(itemcnt, removefactor); } rungc(); system.out.println("\n--------------------------\n"); } public static class naiveremovemanyperformer extends baseremovemanyperformer { @override public list<integer> removeitems(list<integer> items, int removefactor) { if (items.size() > 300000 && items instanceof arraylist) { info("this removeitems slow, returning without processing"); return items; } int = 0; iterator<integer> iter = items.iterator(); while (iter.hasnext()) { integer item = iter.next(); if (mustremoveitem(item, i, removefactor)) { iter.remove(); } i++; } return items; } @override public list<integer> createinitiallist() { return new arraylist<integer>(); } } public static class betternaiveremovemanyperformer extends naiveremovemanyperformer { @override public list<integer> removeitems(list<integer> items, int removefactor) { // if (items.size() > 300000 && items instanceof arraylist) { // info("this removeitems slow, returning without processing"); // return items; // } (int = items.size(); --i >= 0;) { integer item = items.get(i); if (mustremoveitem(item, i, removefactor)) { items.remove(i); } } return items; } } public static class linkedremovemanyperformer extends naiveremovemanyperformer { @override public list<integer> createinitiallist() { return new linkedlist<integer>(); } } public static class createnewremovemanyperformer extends naiveremovemanyperformer { @override public list<integer> removeitems(list<integer> items, int removefactor) { list<integer> res = createresultlist(items, removefactor); int = 0; (integer item : items) { if (mustremoveitem(item, i, removefactor)) { // no-op } else { res.add(item); } i++; } return res; } protected list<integer> createresultlist(list<integer> items, int removefactor) { return new arraylist<integer>(); } } public static class smartcreatenewremovemanyperformer extends createnewremovemanyperformer { @override protected list<integer> createresultlist(list<integer> items, int removefactor) { int newcapacity = removefactor == 0 ? items.size() : (int) (items.size() * (removefactor - 1l) / removefactor + 1); //system.out.println("newcapacity=" + newcapacity); return new arraylist<integer>(newcapacity); } } public static class fastersmartcreatenewremovemanyperformer extends smartcreatenewremovemanyperformer { @override public list<integer> removeitems(list<integer> items, int removefactor) { list<integer> res = createresultlist(items, removefactor); (int = 0; < items.size(); i++) { integer item = items.get(i); if (mustremoveitem(item, i, removefactor)) { // no-op } else { res.add(item); } } return res; } } public static class forwardinplaceremovemanyperformer extends naiveremovemanyperformer { @override public list<integer> removeitems(list<integer> items, int removefactor) { int j = 0; // destination idx (int = 0; < items.size(); i++) { integer item = items.get(i); if (mustremoveitem(item, i, removefactor)) { // no-op } else { if (j < i) { items.set(j, item); } j++; } } return items.sublist(0, j); } } public static class magicremovemanyperformer extends naiveremovemanyperformer { @override public list<integer> removeitems(list<integer> items, int removefactor) { (int = 0; < items.size(); i++) { if (mustremoveitem(items.get(i), i, removefactor)) { integer retaineditem = removesomefromend(items, removefactor, i); if (retaineditem == null) { items.remove(i); break; } items.set(i, retaineditem); } } return items; } private integer removesomefromend(list<integer> items, int removefactor, int lowerbound) { (int = items.size(); --i > lowerbound;) { integer item = items.get(i); items.remove(i); if (!mustremoveitem(item, i, removefactor)) { return item; } } return null; } } public static class guavaarraylistremovemanyperformer extends baseremovemanyperformer { @override protected list<integer> removeitems(list<integer> items, final int removefactor) { iterables.removeif(items, new predicate<integer>() { public boolean apply(integer input) { return mustremoveitem(input, input, removefactor); } }); return items; } @override protected list<integer> createinitiallist() { return new arraylist<integer>(); } } public void testforoneitemcnt(int itemcnt) { testall(itemcnt, 0); testall(itemcnt, itemcnt); testall(itemcnt, itemcnt - 1); testall(itemcnt, 3); testall(itemcnt, 2); testall(itemcnt, 1); } public static void main(string[] args) { removemanyfromlist t = new removemanyfromlist(); t.addperformer(new naiveremovemanyperformer()); t.addperformer(new betternaiveremovemanyperformer()); t.addperformer(new linkedremovemanyperformer()); t.addperformer(new createnewremovemanyperformer()); t.addperformer(new smartcreatenewremovemanyperformer()); t.addperformer(new fastersmartcreatenewremovemanyperformer()); t.addperformer(new magicremovemanyperformer()); t.addperformer(new forwardinplaceremovemanyperformer()); t.addperformer(new guavaarraylistremovemanyperformer()); t.testforoneitemcnt(1000); t.testforoneitemcnt(10000); t.testforoneitemcnt(100000); t.testforoneitemcnt(200000); t.testforoneitemcnt(300000); t.testforoneitemcnt(500000); t.testforoneitemcnt(1000000); t.testforoneitemcnt(10000000); } }
Comments
Post a Comment