.NET - LINQ method for adding items to a dictionary


I'm trying to learn a bit more about LINQ by implementing Peter Norvig's spelling corrector in C#.

The first part involves taking a large file of words (about 1 million) and putting them into a dictionary where the key is the word and the value is the number of occurrences.

I'd normally do this like so:

    foreach (var word in allWords)
    {
        if (wordCount.ContainsKey(word))
            wordCount[word]++;
        else
            wordCount.Add(word, 1);
    }

where allWords is an IEnumerable<string>.

In LINQ I'm currently doing this:

    var wordCountLinq = (from word in allWordsLinq
                         group word by word
                         into groups
                         select groups).ToDictionary(g => g.Key, g => g.Count());

I compared the two dictionaries by looking at every <key, value> pair and they're identical, so both approaches produce the same results.

The foreach loop takes 3.82 secs and the LINQ query takes 4.49 secs.

I'm timing it using the Stopwatch class and I'm running in Release mode. I don't think the performance is bad; I was just wondering if there was a reason for the difference.

Am I doing the LINQ query in an inefficient way, or am I missing something?

Update: here's the full benchmark code sample:

    // Requires: System, System.Collections.Generic, System.Diagnostics,
    // System.IO, System.Linq, System.Text.RegularExpressions
    public static void TestCode()
    {
        // File can be downloaded from http://norvig.com/big.txt and consists of about a million words.
        const string fileName = @"path_to_file";
        var allWords = from Match m in Regex.Matches(File.ReadAllText(fileName).ToLower(), "[a-z]+", RegexOptions.Compiled)
                       select m.Value;

        var wordCount = new Dictionary<string, int>();
        var timer = new Stopwatch();
        timer.Start();
        foreach (var word in allWords)
        {
            if (wordCount.ContainsKey(word))
                wordCount[word]++;
            else
                wordCount.Add(word, 1);
        }
        timer.Stop();

        Console.WriteLine("foreach loop took {0:0.00} ms ({1:0.00} secs)\n",
                timer.ElapsedMilliseconds, timer.ElapsedMilliseconds / 1000.0);

        // Make LINQ use a different enumerable (with exactly the same values);
        // if you don't, it suddenly becomes way faster, which I assume is a caching thing??
        var allWordsLinq = from Match m in Regex.Matches(File.ReadAllText(fileName).ToLower(), "[a-z]+", RegexOptions.Compiled)
                           select m.Value;

        timer.Reset();
        timer.Start();
        var wordCountLinq = (from word in allWordsLinq
                             group word by word
                             into groups
                             select groups).ToDictionary(g => g.Key, g => g.Count());
        timer.Stop();

        Console.WriteLine("LINQ took {0:0.00} ms ({1:0.00} secs)\n",
                timer.ElapsedMilliseconds, timer.ElapsedMilliseconds / 1000.0);
    }

One of the reasons the LINQ version is slower is that instead of one dictionary, two dictionaries are created:

  1. (Internally) by the group by operator; the group by also stores each individual word. You can verify this by looking at a ToArray() rather than a Count(). This is a lot of overhead you don't really need in your case.

  2. The ToDictionary method is basically a foreach over the actual LINQ query, where the results from the query are added to a new dictionary. Depending on the number of unique words, this can also take some time.
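To make the two points above concrete, here is a rough hand-written model of the work involved. It is only a sketch: the class and method names (GroupingSketch, BuildGroups, CountFromGroups) are made up for illustration, and this is a simplified picture, not the actual implementation of Enumerable.GroupBy or ToDictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper names; a simplified model, not the real LINQ internals.
public static class GroupingSketch
{
    // Step 1 (models "group by"): bucket every element under its key.
    // Note that every occurrence of a word is stored, not just counted.
    public static Dictionary<string, List<string>> BuildGroups(IEnumerable<string> words)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (var word in words)
        {
            List<string> bucket;
            if (!groups.TryGetValue(word, out bucket))
            {
                bucket = new List<string>();
                groups.Add(word, bucket);
            }
            bucket.Add(word);
        }
        return groups;
    }

    // Step 2 (models "ToDictionary"): a second pass that copies
    // one entry per unique word into a second dictionary.
    public static Dictionary<string, int> CountFromGroups(Dictionary<string, List<string>> groups)
    {
        var counts = new Dictionary<string, int>();
        foreach (var pair in groups)
            counts.Add(pair.Key, pair.Value.Count);
        return counts;
    }

    // The real LINQ query, keeping the grouped elements instead of counting
    // them, which shows that each group holds every occurrence.
    public static Dictionary<string, string[]> GroupedElements(IEnumerable<string> words)
    {
        return words.GroupBy(w => w).ToDictionary(g => g.Key, g => g.ToArray());
    }
}
```

Calling GroupingSketch.GroupedElements on a small sample such as { "the", "cat", "the" } gives a two-element array for "the", which is the ToArray() check mentioned in point 1.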

Another reason the LINQ query is a little slower is that LINQ relies on lambda expressions (the delegate in Dathan's answer), and calling a delegate adds a tiny amount of overhead compared to inline code.

Edit: note that for some LINQ scenarios (such as LINQ to SQL, but not in-memory LINQ such as here), rewriting the query produces a more optimized plan:
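As an illustration of where those delegates come from, the sketch below writes the query in method-call form with each lambda pulled out into an explicit Func variable, next to the inline loop. The class and method names (DelegateSketch, CountWithLinq, CountInline) are hypothetical, and the method-call form is meant as a roughly equivalent rewrite of the query, not the exact compiler output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names, for illustration only.
public static class DelegateSketch
{
    public static Dictionary<string, int> CountWithLinq(IEnumerable<string> words)
    {
        // Each lambda is a delegate that LINQ invokes once per word or per group.
        Func<string, string> keySelector = word => word;
        Func<IGrouping<string, string>, string> dictKey = g => g.Key;
        Func<IGrouping<string, string>, int> dictValue = g => g.Count();

        return words.GroupBy(keySelector).ToDictionary(dictKey, dictValue);
    }

    public static Dictionary<string, int> CountInline(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (var word in words)
        {
            // No delegate call here: the word itself is used directly as the key.
            if (counts.ContainsKey(word))
                counts[word]++;
            else
                counts.Add(word, 1);
        }
        return counts;
    }
}
```

Both methods should return the same counts for the same input; the difference is only in how much indirection happens per element.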

    from word in allWordsLinq
    group word by word into groups
    select new { Word = groups.Key, Count = groups.Count() }

Note, however, that this doesn't give you a Dictionary, but rather a sequence of words and their counts. You can transform this into a Dictionary with

    (from word in allWordsLinq
     group word by word into groups
     select new { Word = groups.Key, Count = groups.Count() })
    .ToDictionary(g => g.Word, g => g.Count);
