c - Fastest way to find out minimum of 3 numbers? -


in program wrote, 20% of time being spent on finding out minimum of 3 numbers in inner loop, in routine:

static inline unsigned int min(unsigned int a, unsigned int b, unsigned int c) {     unsigned int m = a;     if (m > b) m = b;     if (m > c) m = c;     return m; } 

is there way speed up? ok assembly code x86/x86_64.

edit: in reply of comments:
* compiler being used gcc 4.3.3
* far assembly concerned, beginner there. asked assembly here, learn how this. :)
* have quad-core intel 64 running, mmx/sse etc. supported.
* it's hard post loop here, can tell it's heavily optimized implementation of levenshtein algorithm.

this compiler giving me non-inlined version of min:

.globl min     .type   min, @function min:     pushl   %ebp     movl    %esp, %ebp     movl    8(%ebp), %edx     movl    12(%ebp), %eax     movl    16(%ebp), %ecx     cmpl    %edx, %eax     jbe .l2     movl    %edx, %eax .l2:     cmpl    %ecx, %eax     jbe .l3     movl    %ecx, %eax .l3:     popl    %ebp     ret     .size   min, .-min     .ident  "gcc: (ubuntu 4.3.3-5ubuntu4) 4.3.3"     .section    .note.gnu-stack,"",@progbits 

the inlined version within -o2 optimized code (even markers mrk = 0xfefefefe, before , after call min()) getting optimized away gcc, couldn't hold of it.

update: tested changes suggested nils, ephemient, there's no perceptible performance boost using assembly versions of min(). however, 12.5% boost compiling program -march=i686, guess because whole program getting benefits of new faster instructions gcc generating option. guys.

p.s. - used ruby profiler measure performance (my c program shared library loaded ruby program), time spent top-level c function called ruby program, ends calling min() down stack. please see question.

make sure using appropriate -march setting, first off. gcc defaults not using instructions not supported on original i386 - allowing use newer instruction sets can make big difference @ times! on -march=core2 -o2 get:

min:     pushl   %ebp     movl    %esp, %ebp     movl    8(%ebp), %edx     movl    12(%ebp), %ecx     movl    16(%ebp), %eax     cmpl    %edx, %ecx     leave     cmovbe  %ecx, %edx     cmpl    %eax, %edx     cmovbe  %edx, %eax     ret 

the use of cmov here may avoid branch delays - , without inline asm passing in -march. when inlined larger function more efficient, possibly 4 assembly operations. if need faster this, see if can sse vector operations work in context of overall algorithm.


Comments

Popular posts from this blog

unicode - Are email addresses allowed to contain non-alphanumeric characters? -

c++ - Convert big endian to little endian when reading from a binary file -

C#: Application without a window or taskbar item (background app) that can still use Console.WriteLine() -