How to efficiently (performance) remove many items from List in Java? -


i have quite large list named items (>= 1,000,000 items) , condition denoted <cond> selects items deleted , <cond> true many (maybe half) of items on list.

my goal efficiently remove items selected <cond> , retain other items, source list may modified, new list may created - best way should chosen considering performance.

here testing code:

    system.out.println("preparing items");     list<integer> items = new arraylist<integer>(); // integer demo     (int = 0; < 1000000; i++) {         items.add(i * 3); // demo     }      system.out.println("deleting items");     long startmillis = system.currenttimemillis();     items = removemany(items);     long endmillis = system.currenttimemillis();      system.out.println("after remove: items.size=" + items.size() +              " , took " + (endmillis - startmillis) + " milli(s)"); 

and naive implementation:

public static <t> list<t> removemany(list<t> items) {     int = 0;     iterator<t> iter = items.iterator();     while (iter.hasnext()) {         t item = iter.next();         // <cond> goes here         if (/*<cond>: */i % 2 == 0) {             iter.remove();         }         i++;     }     return items; } 

as can see used item index modulo 2 == 0 remove condition (<cond>) - demonstation purposes.

what better version of removemany may provided , why better version better?

ok, it's time test results of proposed approaches. here approaches have tested (name of each approach class name in sources):

  1. naiveremovemanyperformer - arraylist iterator , remove - first , naive implementation given in question.
  2. betternaiveremovemanyperformer - arraylist backward iteration , removal end front.
  3. linkedremovemanyperformer - naive iterator , remove working on linkedlist. disadventage: works linkedlist.
  4. createnewremovemanyperformer - arraylist made copy (only retained elements added), iterator used traverse input arraylist.
  5. smartcreatenewremovemanyperformer - better createnewremovemanyperformer - initial size (capacity) of result arraylist set final list size. disadvantage: must know final size of list when starting.
  6. fastersmartcreatenewremovemanyperformer - better (?) smartcreatenewremovemanyperformer - use item index (items.get(idx)) instead of iterator.
  7. magicremovemanyperformer - works in place (no list copy) arraylist , compacts holes (removed items) beginning items end of list. disadventage: approach changes order of items in list.
  8. forwardinplaceremovemanyperformer - works in place arraylist - move retaining items fill holes, sublist returned (no final removal or clear).
  9. guavaarraylistremovemanyperformer - google guava iterables.removeif used arraylist - same forwardinplaceremovemanyperformer final removal of items @ end of list.

full source code given @ end of answer.

tests performed different list sizes (from 10,000 items 10,000,000 items) , different remove factors (specifying how many items must removed list).

as posted here in comments other answers - have thought copying items arraylist second arraylist faster iterating linkedlist , removing items. sun's java documentation says constant factor of arraylist low compared linkedlist implementation, surprisingly not case in problem.

in practice linkedlist simple iteration , removal has best performance in cases (this approach implemented in linkedremovemanyperformer). magicremovemanyperformer performance comparable linkedremovemanyperformer, other approaches slower. google guava guavaarraylistremovemanyperformer slower hand made similar code (because code not remove unnecessary items @ end of list).

example results removing 500,000 items 1,000,000 source items:

  1. naiveremovemanyperformer: test not performed - i'm not patient, performs worse betternaiveremovemanyperformer.
  2. betternaiveremovemanyperformer: 226080 milli(s)
  3. linkedremovemanyperformer: 69 milli(s)
  4. createnewremovemanyperformer: 246 milli(s)
  5. smartcreatenewremovemanyperformer: 112 milli(s)
  6. fastersmartcreatenewremovemanyperformer: 202 milli(s)
  7. magicremovemanyperformer: 74 milli(s)
  8. forwardinplaceremovemanyperformer: 69 milli(s)
  9. guavaarraylistremovemanyperformer: 118 milli(s)

example results removing 1 item 1,000,000 source items (first item removed):

  1. betternaiveremovemanyperformer: 34 milli(s)
  2. linkedremovemanyperformer: 41 milli(s)
  3. createnewremovemanyperformer: 253 milli(s)
  4. smartcreatenewremovemanyperformer: 108 milli(s)
  5. fastersmartcreatenewremovemanyperformer: 71 milli(s)
  6. magicremovemanyperformer: 43 milli(s)
  7. forwardinplaceremovemanyperformer: 73 milli(s)
  8. guavaarraylistremovemanyperformer: 78 milli(s)

example results removing 333,334 items 1,000,000 source items:

  1. betternaiveremovemanyperformer: 253206 milli(s)
  2. linkedremovemanyperformer: 69 milli(s)
  3. createnewremovemanyperformer: 245 milli(s)
  4. smartcreatenewremovemanyperformer: 111 milli(s)
  5. fastersmartcreatenewremovemanyperformer: 203 milli(s)
  6. magicremovemanyperformer: 69 milli(s)
  7. forwardinplaceremovemanyperformer: 72 milli(s)
  8. guavaarraylistremovemanyperformer: 102 milli(s)

example results removing 1,000,000 (all) items 1,000,000 source items (all items removed one-by-one processing, if know priori items removed, list should cleared):

  1. betternaiveremovemanyperformer: 58 milli(s)
  2. linkedremovemanyperformer: 88 milli(s)
  3. createnewremovemanyperformer: 95 milli(s)
  4. smartcreatenewremovemanyperformer: 91 milli(s)
  5. fastersmartcreatenewremovemanyperformer: 48 milli(s)
  6. magicremovemanyperformer: 61 milli(s)
  7. forwardinplaceremovemanyperformer: 49 milli(s)
  8. guavaarraylistremovemanyperformer: 133 milli(s)

my final conclusions: use hybrid approach - if dealing linkedlist - simple iteration , removal best, if dealing arraylist - depends if item order important - use forwardinplaceremovemanyperformer then, if item order may changed - best choice magicremovemanyperformer. if remove factor known priori (you know how many items removed vs retained) more conditionals may put select approach performing better in particular situation. known remove factor not usual case... google guava iterables.removeif such hybrid solution different assumption (original list must changed, new 1 cannot created , item order matters) - these common assumptions removeif best choice in real-life cases.

notice approaches (naive not good!) enough - 1 of them shold fine in real application, naive approach must avoided.

at last - source code testing.

package wildwezyrlistremovaltesting;  import com.google.common.base.predicate; import com.google.common.collect.iterables; import java.util.arraylist; import java.util.iterator; import java.util.linkedlist; import java.util.list;  public class removemanyfromlist {      public static abstract class baseremovemanyperformer {          protected string performername() {             return getclass().getsimplename();         }          protected void info(string msg) {             system.out.println(performername() + ": " + msg);         }          protected void populatelist(list<integer> items, int itemcnt) {             (int = 0; < itemcnt; i++) {                 items.add(i);             }         }          protected boolean mustremoveitem(integer itemval, int itemidx, int removefactor) {             if (removefactor == 0) {                 return false;             }             return itemidx % removefactor == 0;         }          protected abstract list<integer> removeitems(list<integer> items, int removefactor);          protected abstract list<integer> createinitiallist();          public void testme(int itemcnt, int removefactor) {             list<integer> items = createinitiallist();             populatelist(items, itemcnt);             long startmillis = system.currenttimemillis();             items = removeitems(items, removefactor);             long endmillis = system.currenttimemillis();             int chksum = 0;             (integer item : items) {                 chksum += item;             }             info("removing took " + (endmillis - startmillis)                     + " milli(s), itemcnt=" + itemcnt                     + ", removed items: " + (itemcnt - items.size())                     + ", remaining items: " + items.size()                     + ", checksum: " + chksum);         }     }     private list<baseremovemanyperformer> rmps =             new arraylist<baseremovemanyperformer>();      public void addperformer(baseremovemanyperformer rmp) {         rmps.add(rmp);     }     private runtime runtime = runtime.getruntime();      private void rungc() {         (int = 0; < 5; i++) {             runtime.gc();         }     }      public void testall(int itemcnt, int removefactor) {         rungc();         (baseremovemanyperformer rmp : rmps) {             rmp.testme(itemcnt, removefactor);         }         rungc();         system.out.println("\n--------------------------\n");     }      public static class naiveremovemanyperformer             extends baseremovemanyperformer {          @override         public list<integer> removeitems(list<integer> items, int removefactor) {             if (items.size() > 300000 && items instanceof arraylist) {                 info("this removeitems slow, returning without processing");                 return items;             }             int = 0;             iterator<integer> iter = items.iterator();             while (iter.hasnext()) {                 integer item = iter.next();                 if (mustremoveitem(item, i, removefactor)) {                     iter.remove();                 }                 i++;             }             return items;         }          @override         public list<integer> createinitiallist() {             return new arraylist<integer>();         }     }      public static class betternaiveremovemanyperformer             extends naiveremovemanyperformer {          @override         public list<integer> removeitems(list<integer> items, int removefactor) { //            if (items.size() > 300000 && items instanceof arraylist) { //                info("this removeitems slow, returning without processing"); //                return items; //            }              (int = items.size(); --i >= 0;) {                 integer item = items.get(i);                 if (mustremoveitem(item, i, removefactor)) {                     items.remove(i);                 }             }             return items;         }     }      public static class linkedremovemanyperformer             extends naiveremovemanyperformer {          @override         public list<integer> createinitiallist() {             return new linkedlist<integer>();         }     }      public static class createnewremovemanyperformer             extends naiveremovemanyperformer {          @override         public list<integer> removeitems(list<integer> items, int removefactor) {             list<integer> res = createresultlist(items, removefactor);             int = 0;              (integer item : items) {                 if (mustremoveitem(item, i, removefactor)) {                     // no-op                 } else {                     res.add(item);                 }                 i++;             }              return res;         }          protected list<integer> createresultlist(list<integer> items, int removefactor) {             return new arraylist<integer>();         }     }      public static class smartcreatenewremovemanyperformer             extends createnewremovemanyperformer {          @override         protected list<integer> createresultlist(list<integer> items, int removefactor) {             int newcapacity = removefactor == 0 ? items.size()                     : (int) (items.size() * (removefactor - 1l) / removefactor + 1);             //system.out.println("newcapacity=" + newcapacity);             return new arraylist<integer>(newcapacity);         }     }      public static class fastersmartcreatenewremovemanyperformer             extends smartcreatenewremovemanyperformer {          @override         public list<integer> removeitems(list<integer> items, int removefactor) {             list<integer> res = createresultlist(items, removefactor);              (int = 0; < items.size(); i++) {                 integer item = items.get(i);                 if (mustremoveitem(item, i, removefactor)) {                     // no-op                 } else {                     res.add(item);                 }             }              return res;         }     }      public static class forwardinplaceremovemanyperformer             extends naiveremovemanyperformer {          @override         public list<integer> removeitems(list<integer> items, int removefactor) {             int j = 0; // destination idx             (int = 0; < items.size(); i++) {                 integer item = items.get(i);                 if (mustremoveitem(item, i, removefactor)) {                     // no-op                 } else {                     if (j < i) {                         items.set(j, item);                     }                     j++;                 }             }              return items.sublist(0, j);         }     }      public static class magicremovemanyperformer             extends naiveremovemanyperformer {          @override         public list<integer> removeitems(list<integer> items, int removefactor) {             (int = 0; < items.size(); i++) {                 if (mustremoveitem(items.get(i), i, removefactor)) {                     integer retaineditem = removesomefromend(items, removefactor, i);                     if (retaineditem == null) {                         items.remove(i);                         break;                     }                     items.set(i, retaineditem);                 }             }              return items;         }          private integer removesomefromend(list<integer> items, int removefactor, int lowerbound) {             (int = items.size(); --i > lowerbound;) {                 integer item = items.get(i);                 items.remove(i);                 if (!mustremoveitem(item, i, removefactor)) {                     return item;                 }             }             return null;         }     }      public static class guavaarraylistremovemanyperformer             extends baseremovemanyperformer {          @override         protected list<integer> removeitems(list<integer> items, final int removefactor) {             iterables.removeif(items, new predicate<integer>() {                  public boolean apply(integer input) {                     return mustremoveitem(input, input, removefactor);                 }             });              return items;         }          @override         protected list<integer> createinitiallist() {             return new arraylist<integer>();         }     }      public void testforoneitemcnt(int itemcnt) {         testall(itemcnt, 0);         testall(itemcnt, itemcnt);         testall(itemcnt, itemcnt - 1);         testall(itemcnt, 3);         testall(itemcnt, 2);         testall(itemcnt, 1);     }      public static void main(string[] args) {         removemanyfromlist t = new removemanyfromlist();         t.addperformer(new naiveremovemanyperformer());         t.addperformer(new betternaiveremovemanyperformer());         t.addperformer(new linkedremovemanyperformer());         t.addperformer(new createnewremovemanyperformer());         t.addperformer(new smartcreatenewremovemanyperformer());         t.addperformer(new fastersmartcreatenewremovemanyperformer());         t.addperformer(new magicremovemanyperformer());         t.addperformer(new forwardinplaceremovemanyperformer());         t.addperformer(new guavaarraylistremovemanyperformer());          t.testforoneitemcnt(1000);         t.testforoneitemcnt(10000);         t.testforoneitemcnt(100000);         t.testforoneitemcnt(200000);         t.testforoneitemcnt(300000);         t.testforoneitemcnt(500000);         t.testforoneitemcnt(1000000);         t.testforoneitemcnt(10000000);     } } 

Comments

Popular posts from this blog

c++ - Convert big endian to little endian when reading from a binary file -

C#: Application without a window or taskbar item (background app) that can still use Console.WriteLine() -

unicode - Are email addresses allowed to contain non-alphanumeric characters? -