Extract data from a large structured file using Java/Python
I have a large text file (~100 MB) that I need to parse to extract information, and I am trying to find an efficient way of doing it. The file is structured in blocks:
    mon, 01 jan 2010 01:01:01
    token1 = valuexyz
    token2 = valueabc
    token3 = valuepqr
    ...
    tokenx = value123

    mon, 01 jan 2010 01:02:01
    token1 = valuexyz
    token2 = valueabc
    token3 = valuepqr
    ...
    tokeny = value456
Is there a library for parsing this kind of file (in Java, Python, or as a command-line tool)?
Edit: I know the question is vague. The key element is not how to read the file or parse it with regexes; I am looking more for library or tool suggestions in terms of performance. For example, ANTLR would have been a possibility, but that tool loads the whole file into memory, which is not good.
Thanks!
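Since the concern is memory rather than parsing technique: in Python you do not need a special library to avoid loading the file whole, because iterating over an open file object reads it lazily, line by line. Below is a minimal sketch, assuming each block starts with a header line shaped like "mon, 01 jan 2010 01:01:01" and every other non-blank line is "key = value". The function name iter_blocks and both regexes are hypothetical illustrations, not part of any library.

```python
import re
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Assumed header shape, e.g. "mon, 01 jan 2010 01:01:01" (case-insensitive).
HEADER_RE = re.compile(
    r"^[a-z]{3}, \d{2} [a-z]{3} \d{4} \d{2}:\d{2}:\d{2}$", re.IGNORECASE
)
# Assumed token line shape: "key = value" (spaces around "=" optional).
TOKEN_RE = re.compile(r"^(\S+?)\s*=\s*(.*)$")


def iter_blocks(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (timestamp, {token: value}) for each block, one block at a time.

    Only the current block is kept in memory, so an open file object can be
    passed in directly and is consumed lazily, line by line.
    """
    timestamp: Optional[str] = None
    tokens: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank separator lines carry no data
        if HEADER_RE.match(line):
            if timestamp is not None:
                yield timestamp, tokens  # previous block is complete
            timestamp, tokens = line, {}
            continue
        match = TOKEN_RE.match(line)
        if match and timestamp is not None:
            tokens[match.group(1)] = match.group(2)
        # other lines (e.g. a literal "...") are skipped
    if timestamp is not None:
        yield timestamp, tokens  # flush the final block
```

Usage would be along the lines of: with open("bigfile.txt") as f, loop over iter_blocks(f) and read toks.get("token1") per block (the file name is a placeholder). Memory use is bounded by the largest single block, not by the file size.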
For efficient parsing of files, especially big ones, you can use awk. For example:
    $ awk -v RS= '{print "====>" $0}' file
    ====>mon, 01 jan 2010 01:01:01
    token1 = valuexyz
    token2 = valueabc
    token3 = valuepqr
    ...
    tokenx = value123
    ====>mon, 01 jan 2010 01:02:01
    token1 = valuexyz
    token2 = valueabc
    token3 = valuepqr
    ...
    tokeny = value456
    ====>mon, 01 jan 2010 01:03:01
    token1 = valuexyz
    token2 = valueabc
    token3 = valuepqr
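As you can see from the arrows, each record is one block, running from one "====>" to the next. This comes from setting the record separator RS to the empty string, which makes awk treat blank lines as the separator between records. You can then also set the field separator, e.g. to a newline: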
    $ awk -v RS= -v FS="\n" '{print "====>" $1}' file
    ====>mon, 01 jan 2010 01:01:01
    ====>mon, 01 jan 2010 01:02:01
    ====>mon, 01 jan 2010 01:03:01
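If you later want to post-process these records in Python instead, the same paragraph-mode idea (records separated by empty lines, one field per line) takes only a few lines. This is a rough sketch, not a full awk emulation: it assumes records are separated by truly empty lines, and iter_paragraphs and first_fields are hypothetical names.

```python
from typing import Iterable, Iterator, List


def iter_paragraphs(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into records separated by one or more empty lines.

    Roughly what awk does with RS set to the empty string: each yielded list
    is one record, and each element is one field when FS is a newline.
    Runs of empty lines never produce empty records.
    """
    record: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line:
            record.append(line)  # still inside the current record
        elif record:
            yield record  # an empty line closes the record
            record = []
    if record:
        yield record  # the last record may not be followed by an empty line


def first_fields(lines: Iterable[str]) -> Iterator[str]:
    r"""Python counterpart of: awk -v RS= -v FS="\n" '{print "====>" $1}'"""
    for record in iter_paragraphs(lines):
        yield "====>" + record[0]
```

Like the awk version, this streams the input, so only one record is held at a time.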
So in the example above, the 1st field of every record is the date/time stamp. To get "token1", for example, do this:
    $ awk -v RS= -v FS="\n" '{for (i = 1; i <= NF; i++) if ($i ~ /token1/) print $i}' file
    token1 = valuexyz
    token1 = valuexyz
    token1 = valuexyz
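One caveat: the regex /token1/ also matches lines such as "token10 = ..." or "mytoken1 = ...". If that matters, compare the key exactly instead. A minimal Python sketch, assuming every token line has the form "key = value"; extract_values is a hypothetical helper, and unlike the awk command it yields only the value part rather than the whole line.

```python
from typing import Iterable, Iterator


def extract_values(lines: Iterable[str], key: str) -> Iterator[str]:
    """Yield the value of every "key = value" line whose key is exactly `key`.

    Lines are read one at a time, so memory use stays flat for large files.
    Lines without "=" (such as the date/time headers) are ignored.
    """
    for raw in lines:
        name, sep, value = raw.partition("=")
        if sep and name.strip() == key:
            yield value.strip()
```

With a file object opened on the data file, list(extract_values(f, "token1")) would collect every token1 value in file order.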