
Me, myself & IT

Optimising Microsoft® Visual C compilers

Purpose

Demonstrate poor, unoptimised or wrong code generated by Microsoft’s (not quite so) optimising Visual C compilers, with (currently) 26 examples (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), plus a side note on the implementation of the arithmetic division and multiplication routines for 64-bit operands on the I386 alias x86 processor architecture.

The most hilarious examples are 0, 1 and 4; the bug is shown in example 11, while examples 6 to 9, 13 and 18 to 22 present the worst cases.

Side note

On the I386 alias x86 processor architecture, 64-bit arithmetic division and multiplication operations are implemented via calls of several (almost) undocumented helper routines: _alldiv(), _allrem() and _alldvrm() for division of two signed 64-bit operands, returning a signed 64-bit quotient, remainder or both; _aulldiv(), _aullrem() and _aulldvrm() for division of two unsigned 64-bit operands, returning an unsigned 64-bit quotient, remainder or both; plus _allmul() for multiplication of two signed or unsigned 64-bit operands, returning the (un)signed product modulo 2⁶⁴.
Additionally, 64-bit shift operations are implemented via calls of further undocumented helper routines: _allshl() for a signed or unsigned 64-bit operand, _allshr() for a signed 64-bit operand, and _aullshr() for an unsigned 64-bit operand.

Note: all helper routines use non-standard calling or naming conventions; none of them can be called from C or C++ code!

The implementation of the division routines in particular (albeit written in assembler) is very poor: on current Intel® processors (i.e. those introduced in the last 13 years) they are about 5× to 9× slower than properly optimised code, and about 7× to 11× slower than native 64-bit division operations.

Note: according to comments in their source code, the initial version was written on November 29, 1983, for 32-bit operands on 16-bit Intel processors; the routines were modified on November 19, 1993, for 64-bit operands on 32-bit Intel processors, but without taking advantage of the new capabilities of the 32-bit processor (introduced in October 1985, i.e. 8 years earlier): the loop with SHR and RCR instructions, which shifts the operands by just one bit per pass, was not replaced with a BSR followed by two pairs of SHLD and SHL or SHRD and SHR instructions, which shift the operands in one go.
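The faster approach the note refers to can be sketched in portable C. This is a minimal illustration, assuming the normalisation method described in Henry S. Warren’s Hacker’s Delight rather than the code of any library measured here: one count of the leading zero bits (a single BSR on the x86) replaces the bit-by-bit shift loop, and every `/` in the sketch has a quotient below 2³², so in assembler each would become one 64÷32 DIV instruction. The names udiv64() and nlz32() are hypothetical; nlz32() loops only to stay portable.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for BSR: number of leading zero bits of a nonzero value. */
static unsigned nlz32(uint32_t x)
{
    unsigned n = 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: unsigned 64-bit quotient u / v for v != 0.
   Every '/' below yields a quotient that fits in 32 bits. */
static uint64_t udiv64(uint64_t u, uint64_t v)
{
    assert(v != 0);
    if ((v >> 32) == 0) {
        /* Divisor fits in 32 bits: schoolbook division in two steps. */
        uint64_t hi  = u >> 32;
        uint64_t qhi = hi / v;
        uint64_t k   = hi - qhi * v;                      /* remainder, k < v */
        uint64_t qlo = ((k << 32) | (u & 0xFFFFFFFFu)) / v;
        return (qhi << 32) | qlo;
    }
    /* Divisor needs more than 32 bits: normalise it in one go. */
    unsigned n  = nlz32((uint32_t)(v >> 32));             /* 0 <= n <= 31 */
    uint64_t v1 = (v << n) >> 32;                         /* 2^31 <= v1 < 2^32 */
    uint64_t q1 = (u >> 1) / v1;                          /* fits in 32 bits */
    uint64_t q0 = (q1 << n) >> 31;                        /* q or q + 1 */
    if (q0 != 0)
        q0--;                                             /* now q - 1 or q */
    if (u - q0 * v >= v)
        q0++;                                             /* now exactly q */
    return q0;
}
```

Compiled with a 32-bit Visual C compiler, the 64-bit `/` operators in this sketch would of course again call _aulldiv(); it documents the steps, not a drop-in replacement.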

Measured on an Intel processor of the Core2 family running under Windows® PE, dividing 16 billion pairs of 64-bit pseudo-random numbers produced by 6 different independent (deterministic random bit) generators, _aulldiv() and _aullrem() consume from 114 to 125 processor clock cycles per call; the assembly routines provided with my own NOMSVCRT.LIB consume from 16 to 32 processor clock cycles per call, while the native 64-bit machine instructions consume from 8 to 19 processor clock cycles per operation.

Note: codenamed Penryn, Wolfdale and Yorkfield, Intel introduced these processors from late 2007 to early 2009.

For comparison: the (corresponding) __divdi3(), __moddi3(), __udivdi3() and __umoddi3() routines from the builtins library of LLVM’s compiler-rt runtime libraries, originally written in December 2008 by Apple’s Stephen Canon, adapted for the Visual C compiler and improved by me, consume from 27 to 37 processor clock cycles per call.

Caveat: these heavily optimised assembly routines are not shipped with current packages of LLVM for Windows; these contain instead (nearly 5× bigger and more than 2× slower) optimised implementations written in C, which are not properly optimised either: instead of taking advantage of the 64-bit shift operations supported by their own clang compiler, they use a bunch of conditionally executed complementary 32-bit left and right shift operations to handle shift counts below and above the word length individually, then combine their results with logical OR operations.

Even this not so optimised C implementation is about 2× to 3× faster than Microsoft’s assembler implementation!
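To make the shift criticism concrete, a sketch contrasting a 64-bit left shift written with the native operator and one assembled from 32-bit halves, with separate cases for shift counts below and at or above the word length, in the style the caveat describes. Both functions are hypothetical illustrations, not LLVM’s source; counts are assumed to lie in the range 0 to 63, since C leaves larger counts undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Native form: one C expression; on a 64-bit target typically one SHL,
   on 32-bit x86 a short inline SHLD/SHL sequence. */
static uint64_t shl64_native(uint64_t x, unsigned n)
{
    assert(n < 64);
    return x << n;
}

/* Split form: 32-bit halves, complementary left and right shifts,
   results combined with a logical OR. */
static uint64_t shl64_split(uint64_t x, unsigned n)
{
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);

    assert(n < 64);
    if (n == 0)
        return x;                          /* avoids an undefined shift by 32 */
    if (n < 32) {
        hi = (hi << n) | (lo >> (32 - n)); /* bits moving from low to high half */
        lo <<= n;
    } else {
        hi = lo << (n - 32);               /* low half moves into high half */
        lo = 0;
    }
    return ((uint64_t)hi << 32) | lo;
}
```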

Warning: the _lldiv(), _llrem(), _ulldiv() and _ullrem() assembly routines published by AMD® in their Software Optimization Guide for AMD Family 15h Processors (Publication No. 47414, Revision 3.06, January 2012), Software Optimization Guide for AMD Family 10h and 12h Processors (Publication No. 40546, Revision 3.13, February 2011, and Revision 3.10, February 2009), Software Optimization Guide for AMD64 Processors (Publication No. 25112, Revision 3.06, September 2005), Software Optimization Guide for AMD Athlon 64 and AMD Opteron Processors (Publication No. 25112, Revision 3.04, March 2004, and Revision 3.03, September 2003) and AMD Athlon Processor x86 Code Optimization Guide (Publication No. 22007, Revision K, February 2002) have bugs and return wrong results: for example, unsigned division of 18446744073709551615÷4294967299 yields the quotient 4294967294 instead of 4294967293 (in other notation (2⁶⁴−1)÷(2³²+3)=2³²−2 instead of 2³²−3, or 0xFFFFFFFFFFFFFFFF÷0x100000003=0xFFFFFFFE instead of 0xFFFFFFFD), and the remainder 18446744069414584325 instead of 8.
The bug also shows for other dividends, and with the divisors 7516192769=0x1C0000001, 15032385539=0x380000003, …!
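The correct result follows from the identity 2⁶⁴−1 = (2³²+3)·(2³²−3) + 8. The wrong pair still satisfies q·v + r ≡ u modulo 2⁶⁴ (4294967294·4294967299 + 18446744069414584325 wraps around to 2⁶⁴−1), so checking that identity alone does not catch the bug; the remainder must also be below the divisor, and comparing against native 64-bit division is simpler still. A minimal regression check, assuming the routine under test is wrapped in a C function of the hypothetical type udiv_fn:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signature of the unsigned division routine under test. */
typedef uint64_t (*udiv_fn)(uint64_t dividend, uint64_t divisor);

/* Reference: native C division, correct with every conforming compiler. */
static uint64_t native_udiv(uint64_t dividend, uint64_t divisor)
{
    return dividend / divisor;
}

/* Counts wrong quotients for the dividend 2^64 - 1 and the divisors named above. */
static int count_bad_quotients(udiv_fn f)
{
    static const uint64_t divisors[] = {
        0x100000003ull,   /*  4294967299 */
        0x1C0000001ull,   /*  7516192769 */
        0x380000003ull,   /* 15032385539 */
    };
    const uint64_t u = UINT64_MAX;
    int bad = 0;

    assert(f != 0);
    for (unsigned i = 0; i < sizeof divisors / sizeof divisors[0]; i++)
        if (f(u, divisors[i]) != native_udiv(u, divisors[i]))
            bad++;
    return bad;
}
```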

Execution times

Measurements are performed using 8 runs of 64-bit pseudo-random numbers, produced by 6 different independent (deterministic random bit) generators, with 1 billion divisions per run returning the quotient and 1 billion divisions per run returning the remainder, totalling 16 billion divisions.

The tables show the execution times of 64-bit division routines from different libraries on several processors, as minimum, average and maximum processor clock cycles per call, as well as their code sizes in instructions and bytes; the first table covers the routines written in assembler, the second one the native 64-bit hardware instructions and the routines written in C.

Processor clock cycles per call: minimum average maximum

Processor            | NOMSVCRT.LIB            | LLVM Compiler-RT         | Microsoft
                     | _aulldiv() / _aullrem() | _aulldiv() / _aullrem()  | _aulldiv() / _aullrem()
Instructions         | 42 / 42                 | 52 [66] / 53 [71]        | 42 / 43
Bytes                | 105 / 119               | 130 [157] / 137 [172]    | 102 / 115
AMD® Ryzen7 2700X    | 9 11 16 / 11 14 17      | 16 18 23 / 16 18 23      | 53 58 63 / 56 64 72
Intel Core i5-8400   | 13 16 19 / 16 20 23     | 16 24 27 / 16 24 27      | 140 146 154 / 139 148 154
Intel Core i5-7400   | 10 18 37 / 13 20 38     | 13 24 45 / 13 24 45      | 129 136 141 / 132 140 148
Intel Core i5-6600   | 13 16 19 / 16 21 23     | 16 24 27 / 16 24 27      | 133 144 155 / 135 146 154
Intel Core i5-4670   | 9 16 19 / 12 19 22      | 17 22 26 / 16 24 28      | 118 128 141 / 119 129 137
Intel Core i5-3550   | 12 16 20 / 16 20 23     | 17 24 29 / 18 24 29      | 118 130 149 / 122 135 143
Intel Core i3-2328M  | 21 22 / 24 26           | 23 35 / 22 35            | 129 158 / 142 164
Intel Core2 P8700    | 16 19 23 / 18 24 32     | 28 32 37 / 27 31 36      | 114 117 121 / 117 122 125
Intel Core2 E8500    | 17 19 24 / 20 24 30     | 29 32 37 / 28 31 36      | 114 119 125 / 119 124 128
Intel Core2 Q8400    | 17 19 24 / 20 24 30     | 29 33 38 / 27 32 37      | 116 120 127 / 119 125 132
AMD® AthlonII X4 635 | 68 72 75 / 69 71 73     | 64 72 78 / 64 72 78      | 129 134 139 / 131 136 138
Intel® Pentium®4     | 87 112 150 / 98 124 163 | 97 129 181 / 117 143 182 | 301 332 376 / 320 351 371

Note: the values in brackets denote the number of instructions and bytes of the original __udivdi3() and __umoddi3() routines.

Processor clock cycles per call: minimum average maximum

Processor            | Native              | LLVM Compiler-RT          | Microsoft
                     | DIV / REM           | __udivdi3() / __umoddi3() | __udivdi3() / __umoddi3()
Instructions         | 3 / 4               | 8 + 254 / 33 + 254        | 8 + 263 / 12 + 263
Bytes                | 7 / 10              | 27 + 732 / 79 + 732       | 27 + 638 / 40 + 638
AMD® Ryzen7 2700X    | 5 7 10 / 5 7 10     | 36 51 61 / 43 56 67       | 42 61 76 / 44 61 74
Intel Core i5-8400   | 17 20 23 / 17 20 23 | 34 49 56 / 44 58 65       | 43 67 81 / 47 68 82
Intel Core i5-7400   | 16 18 19 / 16 18 20 | 30 45 52 / 36 51 59       | 33 58 72 / 33 58 72
Intel Core i5-6600   | 17 20 23 / 17 20 23 | 30 49 56 / 43 57 65       | 36 65 81 / 39 66 82
Intel Core i5-4670   | 18 21 24 / 18 21 24 | 36 47 54 / 43 56 63       | 32 58 72 / 35 59 73
Intel Core i5-3550   | 21 25 27 / 21 25 27 | 28 44 53 / 36 52 58       | 33 60 75 / 35 61 77
Intel Core i3-2328M  | 21 26 / 23 25       | 45 65 / 56 74
Intel Core2 P8700    | 8 14 17 / 9 14 19   | 49 59 67 / 62 72 82       | 60 75 82 / 61 75 83
Intel Core2 E8500    |                     | 51 60 67 / 63 72 82       | 61 76 83 / 63 76 84
Intel Core2 Q8400    | 8 14 19 / 8 14 19   | 50 60 67 / 63 73 82       | 62 77 83 / 62 77 84
AMD® AthlonII X4 635 | 74 76 80 / 74 76 80 | 51 65 72 / 60 74 82       | 56 71 82 / 57 72 80
Intel® Pentium®4     | 99 125 162 / 100 143 190 | 76 112 145 / 78 114 146

Note: optimising the __udivmoddi4() routine for speed, Microsoft’s current Visual C 2017 compiler emits 9 more instructions than LLVM’s clang compiler, but 94 fewer bytes; the __udivdi3() and __umoddi3() routines call __udivmoddi4() to perform the division, just like the __divdi3() and __moddi3() routines shown in example 20.

Example 0

According to their documentation on MSDN, the macros Int32x32To64 and UInt32x32To64 defined in the header file WINNT.H of the Windows SDK (are supposed to) generate just a single multiply instruction:
Multiplies two signed 32-bit integers, returning a signed 64-bit integer result. The function performs optimally on 32-bit Windows.

This function is implemented on all platforms by optimal inline code: a single multiply instruction that returns a 64-bit result.
Multiplies two unsigned 32-bit integers, returning an unsigned 64-bit integer result. The function performs optimally on 32-bit Windows.

This function is implemented on all platforms by optimal inline code: a single multiply instruction that returns a 64-bit result.
Contrary to these advertisements, however, the 32-bit Visual C compilers generate a call to the external routine _allmul() instead of the single multiply instruction!

Note: _allmul() is an undocumented helper routine for the 32-bit compiler which multiplies two 64-bit integers, similar to the (sort of documented) _alldiv() and _aulldiv() helper routines.
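For reference, a sketch of what a 64×64→64-bit multiplication has to compute when only 32×32→64-bit multiply instructions are available; it describes the arithmetic, not Microsoft’s implementation of _allmul(), and the function name is hypothetical. Because only the low 64 bits of the product are kept, the same code serves signed and unsigned (two’s complement) operands. When both high halves are zero the cross terms vanish and one MUL suffices, whereas sign-extended negative operands have non-zero high halves.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of a * b computed from 32-bit halves:
   a * b = alo*blo + 2^32 * (ahi*blo + alo*bhi) + 2^64 * ahi*bhi,
   and the last term vanishes modulo 2^64. */
static uint64_t mul64_from_halves(uint64_t a, uint64_t b)
{
    uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
    uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);
    uint64_t lolo = (uint64_t)alo * blo;              /* one 32x32->64 MUL */
    uint32_t cross = (uint32_t)((uint64_t)ahi * blo   /* only the low 32 bits */
                              + (uint64_t)alo * bhi); /* of the cross terms count */
    return lolo + ((uint64_t)cross << 32);
}
```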

Demonstration

  1. Create the text file example0.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2004-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #define Int32x32To64(a, b)  ((long long)(((long long)((long)(a))) * ((long)(b))))
    #define UInt32x32To64(a, b) ((unsigned long long)(((unsigned long long)((unsigned int)(a))) * ((unsigned int)(b))))
    
    int main(int argc)
    {
        long long x = argc * -argc;
        long long y = Int32x32To64(argc, -argc);
        long long z = UInt32x32To64(argc, -argc);
    }
  2. Generate the assembly listing example0.asm from the source file example0.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Tcexample0.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example0.c
  3. Display the assembly listing example0.asm created in step 2.:

    Type example0.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example0.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_main
    EXTRN	__allmul:PROC
    
    ; Function compile flags: /Odtp
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _z$ = -24						; size = 8
    _y$ = -16						; size = 8
    _x$ = -8						; size = 8
    _argc$ = 8						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example0.c
    ; Line 7
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 24					; 00000018H
    	push	esi
    ; Line 8
    	mov	eax, DWORD PTR _argc$[ebp]
    	neg	eax
    	imul	DWORD PTR _argc$[ebp]
    	imul	eax, DWORD PTR _argc$[ebp]
    	cdq
    	mov	DWORD PTR _x$[ebp], eax
    	mov	DWORD PTR _x$[ebp+4], edx
    ; Line 9
    	mov	eax, DWORD PTR _argc$[ebp]
    	cdq
    	mov	ecx, eax
    	mov	esi, edx
    	mov	eax, DWORD PTR _argc$[ebp]
    	neg	eax
    	imul	DWORD PTR _argc$[ebp]
    	cdq
    	push	edx
    	push	eax
    	push	esi
    	push	ecx
    	call	__allmul
    	mov	DWORD PTR _y$[ebp], eax
    	mov	DWORD PTR _y$[ebp+4], edx
    ; Line 10
    	mov	edx, DWORD PTR _argc$[ebp]
    	neg	edx
    	mov	eax, DWORD PTR _argc$[ebp]
    	neg	eax
    	mul	DWORD PTR _argc$[ebp]
    	mul	edx
    	mov	DWORD PTR _z$[ebp], eax
    	mov	DWORD PTR _z$[ebp+4], edx
    ; Line 11
    	xor	eax, eax
    	pop	esi
    	mov	esp, ebp
    	pop	ebp
    	leave
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    Notice the difference between unsigned and signed multiplication: while a single multiply instruction is generated for the former, a call of the external routine _allmul() is generated for the latter!
Note: for a real life example where such unoptimised code is generated, see the MSDN article Converting a time_t Value to a File Time and the MSKB article 167296!
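The intent of both macros can be expressed with fixed-width types: widening one operand before the multiplication yields the exact product, whose magnitude never exceeds 2⁶², so one IMUL (or one MUL for the unsigned variant) leaving the result in EDX:EAX is sufficient. A sketch with hypothetical lower-case function names:

```c
#include <assert.h>
#include <stdint.h>

/* Widening one operand makes the multiplication 64-bit; the exact product
   of two signed 32-bit values always fits, since |a * b| <= 2^62. */
static int64_t int32x32to64(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* Unsigned counterpart; the largest product is (2^32 - 1)^2. */
static uint64_t uint32x32to64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```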

Fix

Both macros should have been replaced a long time ago by the intrinsic functions __emul() and __emulu() introduced with the Visual C 2005 compiler!
#if _MSC_VER < 1400
#define Int32x32To64(a, b)  ((long long)(((long long)((long)(a))) * ((long)(b))))
#define UInt32x32To64(a, b) ((unsigned long long)(((unsigned long long)((unsigned int)(a))) * ((unsigned int)(b))))
#else
         long long __emul(int, int);
unsigned long long __emulu(unsigned int, unsigned int);
#pragma intrinsic(__emul, __emulu)
#define Int32x32To64  __emul
#define UInt32x32To64 __emulu
#endif
Note: Visual C fails to provide the inverse intrinsics for division of 64-bit by 32-bit integers.

Of course this also applies to the macros (really: inline assembler functions) Int64ShllMod32(), Int64ShraMod32() and Int64ShrlMod32() defined in the header file WINNT.H of the Windows SDK; these too should have been replaced a long time ago by the intrinsic functions __ll_lshift(), __ll_rshift() and __ull_rshift() introduced with the Visual C 2005 compiler!

#if _MSC_VER < 1400
…
#else
unsigned long long __ll_lshift(unsigned long long, int);
         long long __ll_rshift(long long, int);
unsigned long long __ull_rshift(unsigned long long, int);
#pragma intrinsic(__ll_lshift, __ll_rshift, __ull_rshift)
#define Int64ShllMod32 __ll_lshift
#define Int64ShraMod32 __ll_rshift
#define Int64ShrlMod32 __ull_rshift
#endif
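As a portable reference, sketches of the semantics suggested by the “Mod32” suffix, assuming the shift count is reduced modulo 32 as the x86 shift instructions do with 32-bit operands and a count in CL; the function names are hypothetical and do not replace the intrinsics. The arithmetic right shift avoids applying >> to a negative value, which is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable equivalents; the count is reduced modulo 32,
   as with SHLD/SHL, SHRD/SHR and SHRD/SAR and a count in CL. */
static uint64_t shll_mod32(uint64_t v, int n)
{
    return v << (n & 31);
}

static uint64_t shrl_mod32(uint64_t v, int n)
{
    return v >> (n & 31);
}

/* Arithmetic (sign-filling) right shift: for negative v, shift the
   non-negative ~v instead and complement the result. */
static int64_t shra_mod32(int64_t v, int n)
{
    n &= 31;
    return v < 0 ? ~(~v >> n) : v >> n;
}
```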
The sample code for converting from seconds since January 1, 1970, to 100-nanosecond intervals since January 1, 1601, should be written without macros and intrinsic functions:
#include <windows.h>

VOID EpochToFileTime(ULONG seconds, LPFILETIME pft)
{
    // 116444736000000000 = 100-nanosecond intervals from 1601-01-01 to 1970-01-01
    ULONGLONG ull = seconds * 10000000ULL + 116444736000000000ULL;
    pft->dwLowDateTime = (DWORD) ull;
    pft->dwHighDateTime = (DWORD) (ull >> 32);
}
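The constant 116444736000000000 is the number of 100-nanosecond intervals between January 1, 1601, and January 1, 1970: 369 years containing 89 leap days (the 92 years divisible by 4, minus 1700, 1800 and 1900), i.e. 134774 days or 11644473600 seconds. A sketch that derives it and performs the same conversion without windows.h, using a hypothetical stand-in for the FILETIME structure:

```c
#include <assert.h>
#include <stdint.h>

/* Days from 1601-01-01 to 1970-01-01: 369 years with 89 leap days. */
#define EPOCH_DIFF_DAYS  (369ull * 365 + 89)
#define EPOCH_DIFF_100NS (EPOCH_DIFF_DAYS * 86400 * 10000000)

/* Hypothetical stand-in for FILETIME, so the sketch builds anywhere. */
typedef struct {
    uint32_t dwLowDateTime;
    uint32_t dwHighDateTime;
} filetime_t;

static void epoch_to_filetime(uint32_t seconds, filetime_t *pft)
{
    uint64_t t = seconds * 10000000ull + EPOCH_DIFF_100NS;

    assert(pft != 0);
    pft->dwLowDateTime = (uint32_t)t;
    pft->dwHighDateTime = (uint32_t)(t >> 32);
}
```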

Example 1

Demonstration

  1. Create the text file example1.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __inline
    long long __fastcall Int32x32To64(long v, long w)
    {
        return (long long) v * w;
    }
    
    long __fastcall Int32x32To64Div32(long x, long y, long z)
    {
        return Int32x32To64(x, y) / z;
    }
    
    long __fastcall Int32x32To64Rem32(long x, long y, long z)
    {
        return Int32x32To64(x, y) % z;
    }
    
    __inline
    unsigned long long __fastcall UInt32x32To64(unsigned long v, unsigned long w)
    {
        return (unsigned long long) v * w;
    }
    
    unsigned long __fastcall UInt32x32To64Div32(unsigned long x, unsigned long y, unsigned long z)
    {
        return UInt32x32To64(x, y) / z;
    }
    
    unsigned long __fastcall UInt32x32To64Rem32(unsigned long x, unsigned long y, unsigned long z)
    {
        return UInt32x32To64(x, y) % z;
    }
  2. Generate the assembly listing example1.asm from the source file example1.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample1.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example1.c
    example1.c(11): warning C4244: 'return' : conversion from '__int64' to 'long', possible loss of data
    example1.c(27): warning C4244: 'return' : conversion from 'unsigned __int64' to 'unsigned long', possible loss of data
  3. Display the assembly listing example1.asm created in step 2.:

    Type example1.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example1.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@Int32x32To64@8
    PUBLIC	@Int32x32To64Div32@12
    PUBLIC	@Int32x32To64Rem32@12
    PUBLIC	@UInt32x32To64@8
    PUBLIC	@UInt32x32To64Div32@12
    PUBLIC	@UInt32x32To64Rem32@12
    EXTRN	__alldiv:PROC
    EXTRN	__allrem:PROC
    EXTRN	__aulldiv:PROC
    EXTRN	__aullrem:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@Int32x32To64@8
    _TEXT	SEGMENT
    @Int32x32To64@8 PROC					; COMDAT
    ; _v$ = ecx
    ; _w$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 6
    	mov	eax, ecx
    	imul	edx
    ; Line 7
    	ret	0
    @Int32x32To64@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@Int32x32To64Div32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @Int32x32To64Div32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 6
    	mov	eax, ecx
    	imul	edx
    ; Line 10
    	push	esi
    ; Line 6
    	mov	esi, eax
    	mov	ecx, edx
    ; Line 11
    	idiv	DWORD PTR _z$[esp-4]
    	mov	eax, DWORD PTR _z$[esp]
    	cdq
    	push	edx
    	push	eax
    	push	ecx
    	push	esi
    	call	__alldiv
    	pop	esi
    ; Line 12
    	ret	4
    @Int32x32To64Div32@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@Int32x32To64Rem32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @Int32x32To64Rem32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 6
    	mov	eax, ecx
    	imul	edx
    ; Line 15
    	push	esi
    ; Line 6
    	mov	esi, eax
    	mov	ecx, edx
    ; Line 16
    	idiv	DWORD PTR _z$[esp-4]
    	mov	eax, edx
    	mov	eax, DWORD PTR _z$[esp]
    	cdq
    	push	edx
    	push	eax
    	push	ecx
    	push	esi
    	call	__allrem
    	pop	esi
    ; Line 17
    	ret	0
    @Int32x32To64Rem32@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@UInt32x32To64@8
    _TEXT	SEGMENT
    @UInt32x32To64@8 PROC					; COMDAT
    ; _v$ = ecx
    ; _w$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 22
    	mov	eax, ecx
    	mul	edx
    ; Line 23
    	ret	0
    @UInt32x32To64@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@UInt32x32To64Div32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @UInt32x32To64Div32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 22
    	mov	eax, ecx
    	mul	edx
    ; Line 27
    	div	DWORD PTR _z$[esp-4]
    	push	0
    	push	DWORD PTR _z$[esp]
    	push	edx
    	push	eax
    	call	__aulldiv
    ; Line 28
    	ret	0
    @UInt32x32To64Div32@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@UInt32x32To64Rem32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @UInt32x32To64Rem32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 22
    	mov	eax, ecx
    	mul	edx
    ; Line 32
    	div	DWORD PTR _z$[esp-4]
    	mov	eax, edx
    	push	0
    	push	DWORD PTR _z$[esp]
    	push	edx
    	push	eax
    	call	__aullrem
    ; Line 33
    	ret	0
    @UInt32x32To64Rem32@12 ENDP
    _TEXT	ENDS
    END
    While the compiler here (contrary to example 0) generates the proper code for the multiplications, it fails to generate the corresponding proper code for the immediately following divisions.

    Also notice the difference between the signed and unsigned variants of the combined multiplication and division routines: instead of pushing the (properly sign-extended) divisor first and computing the product afterwards, the signed variants compute the product first, move it into two (intermediate) registers and finally push these for the calls of the _alldiv() and _allrem() helper routines.
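Replacing a helper call with a single DIV relies on a property of the x86 instruction: dividing EDX:EAX by a 32-bit operand only succeeds when the quotient fits in 32 bits, otherwise the processor raises a divide error. For unsigned operands that is the case exactly when the high half of the dividend is below the divisor. A sketch of that check, with hypothetical function names and the example’s unsigned division written with fixed-width types; note that C defines an out-of-range quotient to be reduced modulo 2³² by the conversion, which a single DIV would not do.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if one DIV r/m32 can divide 'dividend' (in EDX:EAX) by 'divisor'
   without a divide error, i.e. if the quotient fits in 32 bits. */
static int single_div_ok(uint64_t dividend, uint32_t divisor)
{
    return divisor != 0 && (uint32_t)(dividend >> 32) < divisor;
}

/* UInt32x32To64Div32() from the example, written with fixed-width types. */
static uint32_t uint32x32to64div32(uint32_t x, uint32_t y, uint32_t z)
{
    assert(z != 0);
    return (uint32_t)((uint64_t)x * y / z);
}
```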

Example 2

Demonstration

  1. Create the text file example2.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    extern long long foo(void);
    extern long long bar(void);
    
    long long product(void)
    {
        return foo() * bar();
    }
  2. Generate the assembly listing example2.asm from the source file example2.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample2.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example2.c
  3. Display the assembly listing example2.asm created in step 2.:

    Type example2.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example2.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_product
    EXTRN	_foo:PROC
    EXTRN	_bar:PROC
    EXTRN	__allmul:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_product
    _TEXT	SEGMENT
    _product PROC						; COMDAT
    ; File c:\users\stefan\desktop\example2.c
    ; Line 7
    	push	esi
    	push	edi
    ; Line 8
    	call	_foo
    	mov	edi, eax
    	mov	esi, edx
    	push	edx
    	push	eax
    	call	_bar
    	push	edx
    	push	eax
    	push	esi
    	push	edi
    	call	__allmul
    	pop	edi
    	pop	esi
    ; Line 9
    	ret	0
    _product ENDP
    _TEXT	ENDS
    END
    Multiplication is commutative, so the arguments for the external routine _allmul() can be swapped, which saves 6 of the 13 instructions generated and avoids clobbering the registers EDI and ESI for intermediate storage.

Example 3

Demonstration

  1. Create the text file example3.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    long long __stdcall div(long long foo, long long bar)
    {
        return foo / bar;
    }
    
    long long __stdcall mod(long long foo, long long bar)
    {
        return foo % bar;
    }
    
    long long __stdcall mul(long long foo, long long bar)
    {
        return foo * bar;
    }
  2. Generate the assembly listing example3.asm from the source file example3.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample3.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example3.c
  3. Display the assembly listing example3.asm created in step 2.:

    Type example3.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example3.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC  _div@16
    PUBLIC  _mod@16
    PUBLIC  _mul@16
    EXTRN   __alldiv:PROC
    EXTRN   __allmul:PROC
    EXTRN   __allrem:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_div@16
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _bar$ = 16						; size = 8
    _div@16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example3.c
    ; Line 5
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _foo$[esp+8]
    	push    DWORD PTR _foo$[esp+8]
    	call	__alldiv
    	jmp	__alldiv
    ; Line 6
    	ret	16					; 00000010H
    _div@16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_mod@16
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _bar$ = 16						; size = 8
    _mod@16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example3.c
    ; Line 10
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _foo$[esp+8]
    	push    DWORD PTR _foo$[esp+8]
    	call	__allrem
    	jmp	__allrem
    ; Line 11
    	ret	16					; 00000010H
    _mod@16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_mul@16
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _bar$ = 16						; size = 8
    _mul@16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example3.c
    ; Line 15
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _foo$[esp+8]
    	push    DWORD PTR _foo$[esp+8]
    	call	__allmul
    	jmp	__allmul
    ; Line 16
    	ret	16					; 00000010H
    _mul@16	ENDP
    _TEXT	ENDS
    END

Example 4

Division by powers of 2 is only optimised for constant divisors.
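For unsigned operands, division and modulo by 2ᵉ reduce to a shift and a mask for any exponent below 32, constant or not; signed division truncates toward zero, so a negative dividend needs extra care. A sketch with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned, 0 <= e <= 31: n / 2^e == n >> e and n % 2^e == n & (2^e - 1),
   so even a variable exponent needs no DIV instruction. */
static uint32_t udiv_pow2(uint32_t n, unsigned e)
{
    assert(e < 32);
    return n >> e;
}

static uint32_t umod_pow2(uint32_t n, unsigned e)
{
    assert(e < 32);
    return n & (((uint32_t)1 << e) - 1);
}

/* Signed, 0 <= e <= 31: C division truncates toward zero, so shift the
   magnitude and restore the sign, without applying >> to a negative value. */
static int32_t sdiv_pow2(int32_t n, unsigned e)
{
    assert(e < 32);
    if (e == 0)
        return n;                                            /* n / 1 */
    uint32_t mag = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;   /* |n|, also for INT32_MIN */
    int32_t q = (int32_t)(mag >> e);                         /* below 2^31, so it fits */
    return n < 0 ? -q : q;
}
```

The listing’s `and eax, 511` for the constant exponent 9 is exactly the mask form of umod_pow2().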

Demonstration

  1. Create the text file example4.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #ifndef SIGNED
    unsigned dividebypowerof2(unsigned number, unsigned exponent)
    {
        return number / (1U << exponent);
    }
    
    unsigned modulopowerof2(unsigned number, unsigned exponent)
    {
        return number % (1U << exponent);
    }
    
    unsigned quotient(unsigned argument)
    {
        return dividebypowerof2(argument, 9);
    }
    
    unsigned remainder(unsigned argument)
    {
        return modulopowerof2(argument, 9);
    }
    #else
    signed dividebypowerof2(signed number, unsigned exponent)
    {
        return number / (1 << exponent);
    }
    
    signed modulopowerof2(signed number, unsigned exponent)
    {
        return number % (1 << exponent);
    }
    
    signed quotient(signed argument)
    {
        return dividebypowerof2(argument, 9);
    }
    
    signed remainder(signed argument)
    {
        return modulopowerof2(argument, 9);
    }
    #endif
  2. Generate the assembly listing example4.asm from the source file example4.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample4.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example4.c
  3. Display the assembly listing example4.asm created in step 2.:

    Type example4.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example4.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_dividebypowerof2
    PUBLIC	_modulopowerof2
    PUBLIC	_quotient
    PUBLIC	_remainder
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_dividebypowerof2
    _TEXT	SEGMENT
    _number$ = 8						; size = 4
    _exponent$ = 12						; size = 4
    _dividebypowerof2 PROC					; COMDAT
    ; File c:\users\stefan\desktop\example4.c
    ; Line 6
    	mov	ecx, DWORD PTR _exponent$[esp-4]
    	xor	edx, edx
    	mov	eax, DWORD PTR _number$[esp-4]
    	push	esi
    	mov	esi, 1
    	shl	esi, cl
    	div	esi
    	pop	esi
    	shr	eax, cl
    ; Line 7
    	ret	0
    _dividebypowerof2 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_quotient
    _TEXT	SEGMENT
    _argument$ = 8						; size = 4
    _quotient PROC						; COMDAT
    ; File c:\users\stefan\desktop\example4.c
    ; Line 6
    	mov	eax, DWORD PTR _argument$[esp-4]
    	shr	eax, 9
    ; Line 17
    	ret	0
    _quotient ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_modulopowerof2
    _TEXT	SEGMENT
    _number$ = 8						; size = 4
    _exponent$ = 12						; size = 4
    _modulopowerof2 PROC					; COMDAT
    ; File c:\users\stefan\desktop\example4.c
    ; Line 11
    	mov	ecx, DWORD PTR _exponent$[esp-4]
    	xor	edx, edx
    	or	eax, -1
    	shl	eax, cl
    	mov	eax, DWORD PTR _number$[esp-4]
    	and	eax, DWORD PTR _number$[esp-4]
    	push	esi
    	mov	esi, 1
    	shl	esi, cl
    	div	esi
    	pop	esi
    	mov	eax, edx
    ; Line 12
    	ret	0
    _modulopowerof2 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_remainder
    _TEXT	SEGMENT
    _argument$ = 8						; size = 4
    _remainder PROC						; COMDAT
    ; File c:\users\stefan\desktop\example4.c
    ; Line 11
    	mov	eax, DWORD PTR _argument$[esp-4]
    	and	eax, 511				; 000001ffH
    ; Line 22
    	ret	0
    _remainder ENDP
    _TEXT	ENDS
    END
  4. Generate another assembly listing example4.asm from the source file example4.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro SIGNED defined on the command line:

    CL.EXE /Bv /c /DSIGNED /Fa /FoNUL: /Gy /Ox /Tcexample4.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example4.c
  5. Display the assembly listing example4.asm created in step 4.:

    Type example4.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example4.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_dividebypowerof2
    PUBLIC	_modulopowerof2
    PUBLIC	_quotient
    PUBLIC	_remainder
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_dividebypowerof2
    _TEXT	SEGMENT
    _number$ = 8						; size = 4
    _exponent$ = 12						; size = 4
    _dividebypowerof2 PROC					; COMDAT
    ; File c:\users\stefan\desktop\example4.c
    ; Line 26
    	mov	eax, DWORD PTR _number$[esp-4]
    	mov	ecx, DWORD PTR _exponent$[esp-4]
    	push	esi
    	mov	esi, 1
    	cdq
    	shl	esi, cl
    	idiv	esi
    	pop	esi
    	not	ecx
    	shr	edx, 1
    	shr	edx, cl
    	not	ecx
    	add	eax, edx
    	sar	eax, cl
    ; Line 27
    	ret	0
    _dividebypowerof2 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_quotient
    _TEXT	SEGMENT
    _argument$ = 8						; size = 4
    _quotient PROC						; COMDAT
    ; File c:\users\stefan\desktop\example4.c
    ; Line 26
    	mov	eax, DWORD PTR _argument$[esp-4]
    	cdq
    	and	edx, 511				; 000001ffH
    	shr	edx, 23					; 00000017H
    	add	eax, edx
    	sar	eax, 9
    ; Line 37
    	ret	0
    _quotient ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_modulopowerof2
    _TEXT	SEGMENT
    _number$ = 8						; size = 4
    _exponent$ = 12						; size = 4
    _modulopowerof2 PROC					; COMDAT
    ; File c:\users\stefan\desktop\example4.c
    ; Line 31
    	mov	eax, DWORD PTR _number$[esp-4]
    	mov	ecx, DWORD PTR _exponent$[esp-4]
    	push	ebx
    	push	esi
    	mov	esi, 1
    	cdq
    	shl	esi, cl
    	idiv	esi
    	pop	esi
    	mov	eax, edx
    	xor	ebx, ebx
    	shld	ebx, edx, cl
    	or	edx, -1
    	add	ebx, eax
    	shl	edx, cl
    	and	edx, ebx
    	sub	eax, edx
    	pop	ebx
    ; Line 32
    	ret	0
    _modulopowerof2 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_remainder
    _TEXT	SEGMENT
    _argument$ = 8						; size = 4
    _remainder PROC						; COMDAT
    ; File c:\users\stefan\desktop\example4.c
    ; Line 31
    	mov	eax, DWORD PTR _argument$[esp-4]
    	and	eax, -2147483137			; 800001ffH
    	jns	SHORT $LN5@remainder
    	dec	eax
    	or	eax, -512				; fffffe00H
    	inc	eax
    $LN5@remainder:
    	cdq
    	shr	edx, 23					; 00000017H
    	add	edx, eax
    	and	edx, -512				; fffffe00H
    	sub	eax, edx
    ; Line 42
    	ret	0
    _remainder ENDP
    _TEXT	ENDS
    END
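The listings above show that a constant exponent already gets shift and mask code, while a variable exponent still gets a DIV or IDIV instruction. The following C sketch is an illustration, not the author's code, and its function names sdiv_pow2(), smod_pow2(), udiv_pow2() and umod_pow2() are hypothetical. It computes the same quotients and remainders for a variable exponent k using only shifts, masks and an addition.

For a negative dividend, C requires the quotient to round toward zero, so the bias 2^k − 1 is added before the arithmetic shift. This is the same trick as the CDQ/AND/ADD/SAR sequence in the listing of _quotient. The sketch assumes 0 ≤ k ≤ 30 for the signed functions. It also assumes that right-shifting a negative int32_t is arithmetic; Visual C, GCC and Clang do this, but ISO C leaves it implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Signed quotient x / 2^k, rounded toward zero as the / operator requires.
   Assumes 0 <= k <= 30 and an arithmetic right shift of negative values. */
int32_t sdiv_pow2(int32_t x, unsigned int k)
{
    assert(k <= 30);
    /* x >> 31 is -1 for negative x and 0 otherwise,
       so bias is 2^k - 1 for negative x and 0 otherwise */
    int32_t bias = (x >> 31) & (int32_t)((1u << k) - 1u);
    return (x + bias) >> k;
}

/* Signed remainder x % 2^k; like the % operator, it has the sign of x. */
int32_t smod_pow2(int32_t x, unsigned int k)
{
    return x - sdiv_pow2(x, k) * ((int32_t)1 << k);
}

/* Unsigned quotient: a single shift, no DIV needed. */
uint32_t udiv_pow2(uint32_t x, unsigned int k)
{
    assert(k <= 31);
    return x >> k;
}

/* Unsigned remainder: a single AND with the mask 2^k - 1. */
uint32_t umod_pow2(uint32_t x, unsigned int k)
{
    assert(k <= 31);
    return x & ((1u << k) - 1u);
}
```

For example, sdiv_pow2(-1000, 9) yields -1 and smod_pow2(-1000, 9) yields -488, matching -1000 / 512 and -1000 % 512.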

Example 5

Demonstration

  1. Create the text file example5.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    long long foo(long long foo)
    {
        foo <<= 1;
        foo += 1;
        foo |= 1;
    
        return foo;
    }
    
    long long bar(long long bar)
    {
        bar += bar;
        bar += 1;
        bar |= 1;
    
        return bar;
    }
  2. Generate the assembly listing example5.asm from the source file example5.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample5.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example5.c
  3. Display the assembly listing example5.asm created in step 2.:

    Type example5.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example5.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_foo
    PUBLIC	_bar
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_foo
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _foo	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example5.c
    ; Line 5
    	mov	eax, DWORD PTR _foo$[esp-4]
    	mov	edx, DWORD PTR _foo$[esp]
    	shld	edx, eax, 1
    	add	eax, eax
    	adc	edx, edx
    ; Line 6
    	add	eax, 1
    	adc	edx, 0
    	inc	eax
    ; Line 10
    	ret	0
    _foo	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_bar
    _TEXT	SEGMENT
    _bar$ = 8						; size = 8
    _bar	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example5.c
    ; Line 14
    	mov	eax, DWORD PTR _bar$[esp-4]
    	mov	edx, DWORD PTR _bar$[esp]
    	shld	edx, eax, 1
    	add	eax, eax
    	adc	edx, edx
    ; Line 15
    	add	eax, 1
    	adc	edx, 0
    	inc	eax
    ; Line 19
    	ret	0
    _bar	ENDP
    _TEXT	ENDS
    END
    While the optimiser recognises that the addition of 1 yields an odd number and therefore generates no code for the bitwise OR, it fails to recognise that both the shift of foo and the addition of bar to itself yield an even number, so the following addition of 1 can’t produce a carry, and the add-with-carry instruction ADC is useless!
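The carry argument can be checked with a small C model of the I386 register pair EDX:EAX. This is a sketch: the function name twice_plus_one() is hypothetical, and it models the arithmetic, not the code Visual C emits. After doubling, bit 0 of the low half is 0, so adding 1 merely sets that bit and never carries into the high half. A single INC EAX or OR EAX, 1 therefore suffices.

```c
#include <assert.h>
#include <stdint.h>

/* Computes 2 * x + 1 (modulo 2^64) on two 32-bit halves,
   the way a 64-bit value is held in EDX:EAX on I386. */
uint64_t twice_plus_one(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);

    /* x << 1 (or x + x): the only carry into the high half
       is bit 31 of the low half, shifted in here */
    hi = (hi << 1) | (lo >> 31);
    lo <<= 1;

    /* the low half is now even, so adding 1 cannot wrap around:
       no carry into the high half, no ADC needed */
    assert((lo & 1u) == 0u);
    lo += 1u;

    return ((uint64_t)hi << 32) | lo;
}
```

For example, twice_plus_one(0xFFFFFFFF) returns 0x1FFFFFFFF. The carry out of the low half comes from the doubling, never from the addition of 1.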

Example 6

Demonstration

  1. Create the text file example6.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long rotate32(unsigned long value, unsigned int count)
    {
        return (value << count) | (value >> (32 - count));
    }
    
    unsigned long rotate32x(unsigned long value, unsigned int count)
    {
        return (value << count) ^ (value >> (32 - count));
    }
    
    unsigned long long rotate64x(unsigned long long value, unsigned int count)
    {
        return (value << count) ^ (value >> (64 - count));
    }
    
    unsigned long long rotate64(unsigned long long value, unsigned int count)
    {
        return (value << count) | (value >> (64 - count));
    }
    
    unsigned long long intrinsic(unsigned long long value, unsigned int count)
    {
        return _rotl64(value, count);
    }
  2. Generate the assembly listing example6.asm from the source file example6.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample6.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example6.c
  3. Display the assembly listing example6.asm created in step 2.:

    Type example6.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example6.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_rotate32
    PUBLIC	_rotate32x
    PUBLIC	_rotate64x
    PUBLIC	_rotate64
    PUBLIC	_intrinsic
    EXTRN	__allshl:PROC
    EXTRN	__aullshr:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate32
    _TEXT	SEGMENT
    _value$ = 8						; size = 4
    _count$ = 12						; size = 4
    _rotate32 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 5
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, DWORD PTR _count$[esp-4]
    	rol	eax, cl
    ; Line 6
    	ret	0
    _rotate32 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate32x
    _TEXT	SEGMENT
    _value$ = 8						; size = 4
    _count$ = 12						; size = 4
    _rotate32x PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 9
    	push	esi
    ; Line 10
    	mov	esi, DWORD PTR _value$[esp]
    	mov	ecx, 32					; 00000020H
    	sub	ecx, DWORD PTR _count$[esp]
    	mov	eax, esi
    	shr	eax, cl
    	mov	ecx, DWORD PTR _count$[esp]
    	shl	esi, cl
    	xor	eax, esi
    	pop	esi
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, DWORD PTR _count$[esp-4]
    	rol	eax, cl
    ; Line 11
    	ret	0
    _rotate32x ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate64x
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _count$ = 16						; size = 4
    _rotate64x PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 15
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, 64					; 00000040H
    	sub	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	push	ebx
    	push	ebp
    	call	__aullshr
    	mov	ecx, DWORD PTR _count$[esp+4]
    	mov	ebx, eax
    	mov	eax, DWORD PTR _value$[esp+4]
    	mov	ebp, edx
    	mov	edx, DWORD PTR _value$[esp+8]
    	call	__allshl
    	xor	edx, ebp
    	xor	eax, ebx
    	pop	ebp
    	mov	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	mov	eax, DWORD PTR _value$[esp-4]
    	test	cl, 32					; 00000020H
    	jz	SHORT @F
    	xchg	eax, edx
    @@:
    	test	cl, 31					; 0000001fH
    	jz	SHORT @F
    	push	ebx
    	mov	ebx, edx
    	shld	edx, eax, cl
    	shld	eax, ebx, cl
    	pop	ebx
    @@:
    ; Line 16
    	ret	0
    _rotate64x ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate64
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _count$ = 16						; size = 4
    _rotate64 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 20
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, 64					; 00000040H
    	sub	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	push	ebx
    	push	ebp
    	call	__aullshr
    	mov	ecx, DWORD PTR _count$[esp+4]
    	mov	ebx, eax
    	mov	eax, DWORD PTR _value$[esp+4]
    	mov	ebp, edx
    	mov	edx, DWORD PTR _value$[esp+8]
    	call	__allshl
    	or	edx, ebp
    	or	eax, ebx
    	pop	ebp
    	mov	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	mov	eax, DWORD PTR _value$[esp-4]
    	test	cl, 32					; 00000020H
    	jz	SHORT @F
    	xchg	eax, edx
    @@:
    	test	cl, 31					; 0000001fH
    	jz	SHORT @F
    	push	ebx
    	mov	ebx, edx
    	shld	edx, eax, cl
    	shld	eax, ebx, cl
    	pop	ebx
    @@:
    ; Line 21
    	ret	0
    _rotate64 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_intrinsic
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _count$ = 16						; size = 4
    _intrinsic PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 25
    	mov	cl, BYTE PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	push	esi
    	mov	esi, DWORD PTR _value$[esp]
    	mov	eax, DWORD PTR _value$[esp]
    	mov	esi, edx
    	test	cl, 32					; 00000020H
    	cmovnz	edx, eax
    	cmovnz	eax, esi
    	cmovnz	esi, edx
    	je	SHORT $LN3@intrinsic
    	mov	eax, esi
    	mov	esi, edx
    	mov	edx, eax
    $LN3@intrinsic:
    	mov	eax, esi
    	and	cl, 31					; 0000001fH
    	je	SHORT $LN4@intrinsic
    	shld	eax, edx, cl
    	shld	edx, esi, cl
    $LN4@intrinsic:
    ; Line 26
    	pop	esi
    	ret	0
    _intrinsic ENDP
    _TEXT	ENDS
    END
    Except for the first function, the optimiser fails to recognise the commonly used expressions for rotate operations!
    Also notice the unoptimised code generated for the intrinsic function _rotl64(); the needless swap of the registers EDX and ESI is not its only flaw.
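A remark on the source itself: for count equal to 0, the expressions in example6.c shift by 32 or 64 bits, which ISO C leaves undefined. The following sketch masks both shift counts, so every count is well defined and reduced modulo the operand width. The function names rotl32() and rotl64() are hypothetical and not taken from example6.c. It writes (32u - count) & 31u instead of the equivalent -count & 31, which avoids the Visual C warning C4146 about unary minus on an unsigned operand. GCC and Clang are known to recognise masked rotate idioms of this kind; no claim is made here about any Visual C version.

```c
#include <stdint.h>

/* Rotate left; count is reduced modulo 32, so every count is well defined. */
uint32_t rotl32(uint32_t value, unsigned int count)
{
    count &= 31u;
    /* (32u - count) & 31u is 0 instead of 32 for count 0: no shift by 32 */
    return (value << count) | (value >> ((32u - count) & 31u));
}

/* Rotate left; count is reduced modulo 64, so every count is well defined. */
uint64_t rotl64(uint64_t value, unsigned int count)
{
    count &= 63u;
    /* (64u - count) & 63u is 0 instead of 64 for count 0: no shift by 64 */
    return (value << count) | (value >> ((64u - count) & 63u));
}
```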

Example 7

Horrible load of code generated for swapping the bytes of a 64-bit operand instead of a single BSWAP instruction or two MOVBE instructions.
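For reference, here is a sketch of a 64-bit byte swap that uses the intrinsic function _byteswap_uint64() with Visual C and the builtin __builtin_bswap64() with GCC or Clang. For other compilers it falls back to portable C that needs 3 mask-and-shift steps instead of 8. The function names bswap64() and bswap64_portable() are hypothetical. The compiler selection via _MSC_VER and __GNUC__ is an assumption of this sketch, not part of the demonstration below.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>  /* declares _byteswap_uint64() */
#endif

/* Portable fallback: swap adjacent bytes, then adjacent 16-bit units,
   then the two 32-bit halves. */
uint64_t bswap64_portable(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) | ((x >> 8) & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Prefers the compiler's intrinsic or builtin where one is known. */
uint64_t bswap64(uint64_t x)
{
#if defined(_MSC_VER)
    return _byteswap_uint64(x);
#elif defined(__GNUC__)
    return __builtin_bswap64(x);
#else
    return bswap64_portable(x);
#endif
}
```

Both functions turn 0x0102030405060708 into 0x0807060504030201.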

Demonstration

  1. Create the text file example7.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #ifdef ALTERNATE
    unsigned short swap16(unsigned short us)
    {
        return ((us & 0xFF00U) >> 8)
             | ((us & 0x00FFU) << 8);
    }
    
    unsigned long swap32(unsigned long ul)
    {
        return ((ul & 0xFF000000UL) >> 3 * 8)
             | ((ul & 0x00FF0000UL) >>     8)
             | ((ul & 0x0000FF00UL) <<     8)
             | ((ul & 0x000000FFUL) << 3 * 8);
    }
    
    unsigned long long swap64(unsigned long long ull)
    {
        return ((ull & 0xFF00000000000000ULL) >> 7 * 8)
             | ((ull & 0x00FF000000000000ULL) >> 5 * 8)
             | ((ull & 0x0000FF0000000000ULL) >> 3 * 8)
             | ((ull & 0x000000FF00000000ULL) >>     8)
             | ((ull & 0x00000000FF000000ULL) <<     8)
             | ((ull & 0x0000000000FF0000ULL) << 3 * 8)
             | ((ull & 0x000000000000FF00ULL) << 5 * 8)
             | ((ull & 0x00000000000000FFULL) << 7 * 8);
    }
    #else
    unsigned short swap16(unsigned short us)
    {
        return (us << 8) & 0xFF00U
             | (us >> 8) & 0x00FFU;
    }
    
    unsigned long swap32(unsigned long ul)
    {
        return (ul << 3 * 8) & 0xFF000000UL
             | (ul <<     8) & 0x00FF0000UL
             | (ul >>     8) & 0x0000FF00UL
             | (ul >> 3 * 8) & 0x000000FFUL;
    }
    
    unsigned long long swap64(unsigned long long ull)
    {
        return (ull << 7 * 8) & 0xFF00000000000000ULL
             | (ull << 5 * 8) & 0x00FF000000000000ULL
             | (ull << 3 * 8) & 0x0000FF0000000000ULL
             | (ull <<     8) & 0x000000FF00000000ULL
             | (ull >>     8) & 0x00000000FF000000ULL
             | (ull >> 3 * 8) & 0x0000000000FF0000ULL
             | (ull >> 5 * 8) & 0x000000000000FF00ULL
             | (ull >> 7 * 8) & 0x00000000000000FFULL;
    }
    #endif
    Note: better use the appropriate intrinsic function _byteswap_ushort(), _byteswap_ulong() or _byteswap_uint64() instead of such expressions!
  2. Generate the assembly listing example7.asm from the source file example7.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample7.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example7.c
  3. Display the assembly listing example7.asm created in step 2.:

    Type example7.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	swap16
    PUBLIC	swap32
    PUBLIC	swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap16
    _TEXT	SEGMENT
    us$ = 8
    swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 32
    	movzx	edx, cx
    	mov	eax, edx
    	shr	edx, 8
    	shl	eax, 8
    	or	ax, dx
    	movzx	eax, cx
    	xchg	ah, al
    ; Line 34
    	ret	0
    swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32
    _TEXT	SEGMENT
    ul$ = 8
    swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 38
    	bswap	ecx
    	mov	eax, ecx
    ; Line 42
    	ret	0
    swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64
    _TEXT	SEGMENT
    ull$ = 8
    swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 45
    	mov	r8, rcx
    ; Line 46
    	bswap	rcx
    	mov	rax, rcx
    	mov	r9, rcx
    	mov	rax, 71776119061217280			; 00ff000000000000H
    	mov	rdx, r8
    	and	r9, rax
    	and	edx, 65280				; 0000ff00H
    	mov	rax, rcx
    	shr	rax, 16
    	or	r9, rax
    	mov	rax, rcx
    	shr	r9, 16
    	mov	rcx, 280375465082880			; 0000ff0000000000H
    	and	rax, rcx
    	mov	rcx, 1095216660480			; 000000ff00000000H
    	or	r9, rax
    	mov	rax, r8
    	and	rax, rcx
    	shr	r9, 16
    	or	r9, rax
    	mov	rcx, r8
    	mov	rax, r8
    	shr	r9, 8
    	shl	rax, 16
    	and	ecx, 16711680				; 00ff0000H
    	or	rdx, rax
    	mov	eax, -16777216				; ff000000H
    	and	rax, r8
    	shl	rdx, 16
    	or	rdx, rcx
    	shl	rdx, 16
    	or	rax, rdx
    	shl	rax, 8
    	or	rax, r9
    ; Line 54
    	ret	0
    swap64	ENDP
    _TEXT	ENDS
    END
    Note: the assembly listing shows 32 (in words: thirty-two) instructions for the function swap64() instead of only a single (in words: one) BSWAP instruction!

    While the optimiser recognises the commonly used expressions for converting from little-endian to big-endian byte order (and vice versa) for a 32-bit operand and then generates a single BSWAP instruction, it fails to recognise these expressions for a 16-bit or a 64-bit operand.

  4. Generate another assembly listing example7.asm from the source file example7.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample7.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example7.c
  5. Display the assembly listing example7.asm created in step 4.:

    Type example7.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example7.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_swap16
    PUBLIC	_swap32
    PUBLIC	_swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap16
    _TEXT	SEGMENT
    _us$ = 8						; size = 2
    _swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 32
    	movbe	ax, WORD PTR _us$[esp-4]
    	movzx	ecx, WORD PTR _us$[esp-4]
    	mov	eax, ecx
    	shl	ecx, 8
    	shr	eax, 8
    	or	eax, ecx
    ; Line 34
    	ret	0
    _swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 38
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	eax, DWORD PTR _ul$[esp-4]
    	bswap	eax
    ; Line 42
    	ret	0
    _swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 46
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	movbe	eax, DWORD PTR _ull$[esp]
    	mov	edx, DWORD PTR _ull$[esp]
    	mov	ecx, edx
    	push	ebx
    	push	ebp
    	push	esi
    	push	edi
    	mov	edi, DWORD PTR _ull$[esp+12]
    	mov	ebx, edx
    	and	ebx, 16711680				; 00ff0000H
    	mov	eax, edi
    	shrd	eax, ecx, 16
    	xor	ebp, ebp
    	mov	esi, edi
    	or	ebp, eax
    	shr	ecx, 16					; 00000010H
    	or	ebx, ecx
    	mov	eax, edx
    	shrd	ebp, ebx, 16
    	and	eax, 65280				; 0000ff00H
    	and	esi, 65280				; 0000ff00H
    	shr	ebx, 16					; 00000010H
    	xor	ecx, ecx
    	or	ebx, eax
    	movzx	eax, dl
    	shrd	ebp, ebx, 16
    	shr	ebx, 16					; 00000010H
    	or	ebx, eax
    	mov	eax, edi
    	shld	edx, eax, 16
    	shrd	ebp, ebx, 8
    	shl	eax, 16					; 00000010H
    	or	edx, ecx
    	or	esi, eax
    	shr	ebx, 8
    	shld	edx, esi, 16
    	mov	eax, edi
    	and	edi, -16777216				; ff000000H
    	shl	esi, 16					; 00000010H
    	and	eax, 16711680				; 00ff0000H
    	or	esi, eax
    	shld	edx, esi, 16
    	shl	esi, 16					; 00000010H
    	or	esi, edi
    	shld	edx, esi, 8
    	pop	edi
    	shl	esi, 8
    	or	edx, ebx
    	or	ebp, esi
    	pop	esi
    	mov	eax, ebp
    	pop	ebp
    	pop	ebx
    ; Line 54
    	ret	0
    _swap64	ENDP
    _TEXT	ENDS
    END
    Note: the assembly listing shows 52 (in words: fifty-two) instructions for the function swap64() instead of only 2 (in words: two) MOVBE instructions!

    While the optimiser recognises the commonly used expressions for converting from little-endian to big-endian byte order (and vice versa) for a 32-bit operand and then generates a single BSWAP instruction, it fails to recognise these expressions for a 16-bit or a 64-bit operand.

  6. Repeat the previous steps with the alternate implementation; generate another assembly listing example7.asm from the source file example7.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture, with the macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tcexample7.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example7.c
  7. Display the assembly listing example7.asm created in step 6.:

    Type example7.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	swap16
    PUBLIC	swap32
    PUBLIC	swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap16
    _TEXT	SEGMENT
    us$ = 8
    swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 6
    	movzx	edx, cx
    	mov	eax, edx
    	shl	dx, 8
    	shr	eax, 8
    	or	ax, dx
    	movzx	eax, cx
    	xchg	ah, al
    ; Line 8
    	ret	0
    swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32
    _TEXT	SEGMENT
    ul$ = 8
    swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 12
    	bswap	ecx
    	mov	eax, ecx
    ; Line 16
    	ret	0
    swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64
    _TEXT	SEGMENT
    ull$ = 8
    swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 19
    	mov	r8, rcx
    ; Line 20
    	bswap	rcx
    	mov	rax, rcx
    	mov	r9, rcx
    	mov	rax, 71776119061217280			; 00ff000000000000H
    	mov	rdx, r8
    	and	r9, rax
    	and	edx, 65280				; 0000ff00H
    	mov	rax, rcx
    	shr	rax, 16
    	or	r9, rax
    	mov	rax, rcx
    	shr	r9, 16
    	mov	rcx, 280375465082880			; 0000ff0000000000H
    	and	rax, rcx
    	mov	rcx, 1095216660480			; 000000ff00000000H
    	or	r9, rax
    	mov	rax, r8
    	and	rax, rcx
    	shr	r9, 16
    	or	r9, rax
    	mov	rcx, r8
    	mov	rax, r8
    	shr	r9, 8
    	shl	rax, 16
    	and	ecx, 16711680				; 00ff0000H
    	or	rdx, rax
    	mov	eax, -16777216				; ff000000H
    	and	rax, r8
    	shl	rdx, 16
    	or	rdx, rcx
    	shl	rdx, 16
    	or	rax, rdx
    	shl	rax, 8
    	or	rax, r9
    ; Line 28
    	ret	0
    swap64	ENDP
    _TEXT	ENDS
    END
    Note: the assembly listing shows 32 (in words: thirty-two) instructions for the function swap64() instead of only a single (in words: one) BSWAP instruction!

    While the optimiser also recognises the alternate form of the commonly used expressions for converting from little-endian to big-endian byte order (and vice versa) for a 32-bit operand and then generates a single BSWAP instruction, it fails to recognise these expressions for a 16-bit or a 64-bit operand.

  8. Generate another assembly listing example7.asm from the source file example7.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tcexample7.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example7.c
  9. Display the assembly listing example7.asm created in step 8.:

    Type example7.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example7.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_swap16
    PUBLIC	_swap32
    PUBLIC	_swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap16
    _TEXT	SEGMENT
    _us$ = 8						; size = 2
    _swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 6
    	movbe	ax, WORD PTR _us$[esp-4]
    	movzx	ecx, WORD PTR _us$[esp-4]
    	mov	eax, ecx
    	shl	ecx, 8
    	shr	eax, 8
    	or	eax, ecx
    ; Line 8
    	ret	0
    _swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 12
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	eax, DWORD PTR _ul$[esp-4]
    	bswap	eax
    ; Line 16
    	ret	0
    _swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 20
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	movbe	eax, DWORD PTR _ull$[esp]
    	mov	edx, DWORD PTR _ull$[esp]
    	mov	ecx, edx
    	push	ebx
    	push	ebp
    	push	esi
    	push	edi
    	mov	edi, DWORD PTR _ull$[esp+12]
    	mov	ebx, edx
    	and	ebx, 16711680				; 00ff0000H
    	mov	eax, edi
    	shrd	eax, ecx, 16
    	xor	ebp, ebp
    	mov	esi, edi
    	or	ebp, eax
    	shr	ecx, 16					; 00000010H
    	or	ebx, ecx
    	mov	eax, edx
    	shrd	ebp, ebx, 16
    	and	eax, 65280				; 0000ff00H
    	and	esi, 65280				; 0000ff00H
    	shr	ebx, 16					; 00000010H
    	xor	ecx, ecx
    	or	ebx, eax
    	movzx	eax, dl
    	shrd	ebp, ebx, 16
    	shr	ebx, 16					; 00000010H
    	or	ebx, eax
    	mov	eax, edi
    	shld	edx, eax, 16
    	shrd	ebp, ebx, 8
    	shl	eax, 16					; 00000010H
    	or	edx, ecx
    	or	esi, eax
    	shr	ebx, 8
    	shld	edx, esi, 16
    	mov	eax, edi
    	and	edi, -16777216				; ff000000H
    	shl	esi, 16					; 00000010H
    	and	eax, 16711680				; 00ff0000H
    	or	esi, eax
    	shld	edx, esi, 16
    	shl	esi, 16					; 00000010H
    	or	esi, edi
    	shld	edx, esi, 8
    	pop	edi
    	shl	esi, 8
    	or	edx, ebx
    	or	ebp, esi
    	pop	esi
    	mov	eax, ebp
    	pop	ebp
    	pop	ebx
    ; Line 28
    	ret	0
    _swap64	ENDP
    _TEXT	ENDS
    END
    Note: the assembly listing shows 52 (in words: fifty-two) instructions for the function swap64() instead of only 2 (in words: two) MOVBE instructions!

    While the optimiser also recognises the commonly used expressions that convert a 32-bit operand from little-endian to big-endian byte order (and vice versa) in an alternate form, and then generates a single BSWAP instruction, it fails to recognise the same expressions for a 16-bit or a 64-bit operand.
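Why two MOVBE instructions suffice for swap64() on a 32-bit target: reversing the 8 bytes of a 64-bit operand is the same as reversing the 4 bytes of each 32-bit half and exchanging the halves. The following minimal C sketch illustrates that identity. The function names are hypothetical, and it is not claimed that any particular compiler turns this source into two MOVBE (or BSWAP) instructions.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the 4 bytes of a 32-bit value with shifts
   and masks; a single BSWAP (register) or MOVBE (memory) does the same. */
static uint32_t bswap32_shift(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Hypothetical helper: reverse the 8 bytes of a 64-bit value by swapping
   each 32-bit half and exchanging the halves. The byte-swapped low half
   becomes the high half of the result and vice versa, which is why one
   MOVBE load per half of the stack argument would be enough on x86. */
static uint64_t bswap64_by_halves(uint64_t x)
{
    uint32_t lo = (uint32_t) x;
    uint32_t hi = (uint32_t) (x >> 32);
    return ((uint64_t) bswap32_shift(lo) << 32) | bswap32_shift(hi);
}
```

For example, bswap64_by_halves(0x0123456789ABCDEF) yields 0xEFCDAB8967452301.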

Example 8

An awful lot of code is generated for swapping the bytes of a 16-bit, 32-bit or 64-bit operand instead of BSWAP or MOVBE instructions.
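As the note in step 1 below recommends, the intrinsics avoid relying on pattern recognition altogether. The following minimal sketch wraps them portably. It assumes MSVC's _byteswap_ushort(), _byteswap_ulong() and _byteswap_uint64(), which are declared in <stdlib.h>, and the __builtin_bswap16/32/64() builtins of GCC and Clang. The wrapper names are hypothetical.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <stdlib.h>   /* _byteswap_ushort(), _byteswap_ulong(), _byteswap_uint64() */
#endif

/* Hypothetical wrappers: use the compiler's byte-swap intrinsic where one is
   known, otherwise fall back to plain shifts and masks. */
static uint16_t byteswap16(uint16_t x)
{
#if defined(_MSC_VER)
    return _byteswap_ushort(x);
#elif defined(__GNUC__)
    return __builtin_bswap16(x);
#else
    return (uint16_t) ((x << 8) | (x >> 8));
#endif
}

static uint32_t byteswap32(uint32_t x)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(x);
#elif defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}

static uint64_t byteswap64(uint64_t x)
{
#if defined(_MSC_VER)
    return _byteswap_uint64(x);
#elif defined(__GNUC__)
    return __builtin_bswap64(x);
#else
    /* swap each 32-bit half and exchange the halves */
    return ((uint64_t) byteswap32((uint32_t) x) << 32)
         | byteswap32((uint32_t) (x >> 32));
#endif
}
```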

Demonstration

  1. Create the text file example8.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long swap32rot(unsigned long ul)
    {
        return _lrotr(ul, 8) & 0xFF00FF00UL
             | _lrotl(ul, 8) & 0x00FF00FFUL;
    }
    
    __inline
    unsigned short swap16(unsigned short us)
    {
        return (us << 8) | (us >> 8);
    }
    
    unsigned long swap32(unsigned long ul)
    {
        return (unsigned long) swap16((unsigned short) ul) << 16
             | (unsigned long) swap16((unsigned short) (ul >> 16));
    }
    
    unsigned long long swap64(unsigned long long ull)
    {
        return (unsigned long long) swap32((unsigned long) ull) << 32
             | (unsigned long long) swap32((unsigned long) (ull >> 32));
    }
    Note: better use the appropriate intrinsic function _byteswap_ushort(), _byteswap_ulong() or _byteswap_uint64() instead of such expressions!
  2. Generate the assembly listing example8.asm from the source file example8.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample8.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example8.c
  3. Display the assembly listing example8.asm created in step 2.:

    Type example8.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	swap32rot
    PUBLIC	swap16
    PUBLIC	swap32
    PUBLIC	swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32rot
    _TEXT	SEGMENT
    ul$ = 8
    swap32rot PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 5
    	mov	eax, ecx
    	rol	ecx, 8
    	ror	eax, 8
    	and	ecx, 16711935				; 00ff00ffH
    	and	eax, -16711936				; ff00ff00H
    	or	eax, ecx
    	bswap	eax
    ; Line 7
    	ret	0
    swap32rot ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap16
    _TEXT	SEGMENT
    us$ = 8
    swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 12
    	movzx	edx, cx
    	mov	eax, edx
    	shl	dx, 8
    	shr	eax, 8
    	or	ax, dx
    	movzx	eax, cx
    	xchg	ah, al
    ; Line 13
    	ret	0
    swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32
    _TEXT	SEGMENT
    ul$ = 8
    swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 17
    	mov	eax, ecx
    ; Line 12
    	rol	cx, 8
    ; Line 17
    	shr	eax, 16
    ; Line 12
    	rol	ax, 8
    ; Line 17
    	movzx	ecx, cx
    	movzx	eax, ax
    	shl	ecx, 16
    	or	eax, ecx
    	bswap	eax
    ; Line 19
    	ret	0
    swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64
    _TEXT	SEGMENT
    ull$ = 8
    swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 17
    	mov	eax, ecx
    ; Line 23
    	mov	r9, rcx
    ; Line 17
    	shr	eax, 16
    ; Line 12
    	rol	ax, 8
    	movzx	r8d, ax
    	rol	cx, 8
    	movzx	eax, cx
    	shl	rax, 16
    ; Line 23
    	or	rax, r8
    	shr	r9, 32					; 00000020H
    ; Line 12
    	movzx	ecx, r9w
    ; Line 23
    	shl	rax, 16
    ; Line 12
    	rol	cx, 8
    	movzx	edx, cx
    ; Line 23
    	or	rax, rdx
    ; Line 17
    	shr	r9d, 16
    ; Line 12
    	rol	r9w, 8
    ; Line 23
    	shl	rax, 16
    ; Line 12
    	movzx	ecx, r9w
    ; Line 23
    	or	rax, rcx
    	mov	rax, rcx
    	bswap	rax
    ; Line 25
    	ret	0
    swap64	ENDP
    _TEXT	ENDS
    END
    Note: instead of just 2 instructions for each of the 4 functions, the assembly listing shows 20 (in words: twenty) instructions for the function swap64(), 8 instructions for the function swap32(), 5 instructions for the function swap16(), and 6 instructions for the function swap32rot().

    The optimiser recognises none of these commonly used expressions for converting from little-endian to big-endian byte order (and vice versa), for any operand size!
    Additionally, the commonly used expression for a rotate operation is not recognised for a 16-bit operand in the stand-alone function swap16(), although ROL instructions are generated where swap16() is inlined.

  4. Generate another assembly listing example8.asm from the source file example8.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample8.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example8.c
  5. Display the assembly listing example8.asm created in step 4.:

    Type example8.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example8.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_swap32rot
    PUBLIC	_swap16
    PUBLIC	_swap32
    PUBLIC	_swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32rot
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32rot PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 5
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	eax, ecx
    	ror	eax, 8
    	rol	ecx, 8
    	and	eax, -16711936				; ff00ff00H
    	and	ecx, 16711935				; 00ff00ffH
    	or	eax, ecx
    ; Line 7
    	ret	0
    _swap32rot ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap16
    _TEXT	SEGMENT
    _us$ = 8						; size = 2
    _swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 12
    	movbe	ax, WORD PTR _us$[esp-4]
    	movzx	ecx, WORD PTR _us$[esp-4]
    	mov	eax, ecx
    	shl	ecx, 8
    	shr	eax, 8
    	or	eax, ecx
    ; Line 13
    	ret	0
    _swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 17
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	eax, ecx
    	shr	eax, 16					; 00000010H
    ; Line 12
    	rol	cx, 8
    	rol	ax, 8
    ; Line 17
    	movzx	ecx, cx
    	movzx	eax, ax
    	shl	ecx, 16					; 00000010H
    	or	eax, ecx
    ; Line 19
    	ret	0
    _swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 23
    	movbe	eax, DWORD PTR _ull$[esp]
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	mov	ecx, DWORD PTR _ull$[esp-4]
    ; Line 17
    	mov	eax, ecx
    	shr	eax, 16					; 00000010H
    ; Line 12
    	rol	ax, 8
    ; Line 23
    	movzx	eax, ax
    	cdq
    	push	ebx
    	mov	ebx, DWORD PTR _ull$[esp+4]
    	push	esi
    	mov	esi, eax
    ; Line 12
    	rol	cx, 8
    ; Line 22
    	push	edi
    ; Line 23
    	mov	edi, edx
    	movzx	eax, cx
    	cdq
    	shld	edx, eax, 16
    	shl	eax, 16					; 00000010H
    	or	edi, edx
    	or	esi, eax
    ; Line 12
    	mov	ax, bx
    ; Line 23
    	shld	edi, esi, 16
    ; Line 12
    	rol	ax, 8
    ; Line 23
    	movzx	eax, ax
    	cdq
    	shl	esi, 16					; 00000010H
    	or	edi, edx
    	or	esi, eax
    ; Line 17
    	shr	ebx, 16					; 00000010H
    ; Line 12
    	rol	bx, 8
    ; Line 23
    	shld	edi, esi, 16
    	movzx	eax, bx
    	cdq
    	shl	esi, 16					; 00000010H
    	or	edx, edi
    	pop	edi
    	or	eax, esi
    	pop	esi
    	pop	ebx
    ; Line 25
    	ret	0
    _swap64	ENDP
    _TEXT	ENDS
    END
    Note: instead of just 1 or 2 MOVBE instructions for each of the 4 functions, the assembly listing shows 38 (in words: thirty-eight) instructions for the function swap64(), 9 instructions for the function swap32(), 5 instructions for the function swap16(), and 7 instructions for the function swap32rot().

    The optimiser recognises none of these commonly used expressions for converting from little-endian to big-endian byte order (and vice versa), for any operand size!
    Additionally, the commonly used expression for a rotate operation is not recognised for a 16-bit operand in the stand-alone function swap16(), although ROL instructions are generated where swap16() is inlined.
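The 16-bit rotate that the optimiser misses in the stand-alone swap16() is usually written as in the following minimal sketch. The complementary shift count is masked so that it stays in range even for a count of 0. The function names are hypothetical. MSVC also documents a _rotl16() intrinsic in <intrin.h>, which this sketch deliberately does not depend on.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 16-bit value left by n bits. The operand is
   promoted to int before shifting, so the result is truncated back to
   16 bits; masking the complementary count avoids a shift by 16 for n == 0. */
static uint16_t rotl16(uint16_t x, unsigned n)
{
    n &= 15;
    return (uint16_t) ((x << n) | (x >> ((16 - n) & 15)));
}

/* swap16() from the demonstration, expressed as a rotate by 8 bits. */
static uint16_t swap16_rot(uint16_t us)
{
    return rotl16(us, 8);
}
```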

Example 9

Demonstration

  1. Create the text file example9.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long lreverse(unsigned long ul)
    {
    #ifndef ALTERNATE
        ul = ((ul & 0xAAAAAAAAUL) >> 1)
           | ((ul & 0x55555555UL) << 1);
        ul = ((ul & 0xCCCCCCCCUL) >> 2)
           | ((ul & 0x33333333UL) << 2);
        ul = ((ul & 0xF0F0F0F0UL) >> 4)
           | ((ul & 0x0F0F0F0FUL) << 4);
        ul = ((ul & 0xFF00FF00UL) >> 8)
           | ((ul & 0x00FF00FFUL) << 8);
        ul = ((ul & 0xFFFF0000UL) >> 16)
           | ((ul & 0x0000FFFFUL) << 16);
    #else
        ul = ((ul >> 1) & 0x55555555UL)
           | ((ul << 1) & 0xAAAAAAAAUL);
        ul = ((ul >> 2) & 0x33333333UL)
           | ((ul << 2) & 0xCCCCCCCCUL);
        ul = ((ul >> 4) & 0x0F0F0F0FUL)
           | ((ul << 4) & 0xF0F0F0F0UL);
        ul = ((ul >> 8) & 0x00FF00FFUL)
           | ((ul << 8) & 0xFF00FF00UL);
        ul = ((ul >> 16) & 0x0000FFFFUL)
           | ((ul << 16) & 0xFFFF0000UL);
    #endif
        return ul;
    }
    
    unsigned long long llreverse(unsigned long long ull)
    {
    #ifndef ALTERNATE
        ull = ((ull & 0xAAAAAAAAAAAAAAAAULL) >> 1)
            | ((ull & 0x5555555555555555ULL) << 1);
        ull = ((ull & 0xCCCCCCCCCCCCCCCCULL) >> 2)
            | ((ull & 0x3333333333333333ULL) << 2);
        ull = ((ull & 0xF0F0F0F0F0F0F0F0ULL) >> 4)
            | ((ull & 0x0F0F0F0F0F0F0F0FULL) << 4);
        ull = ((ull & 0xFF00FF00FF00FF00ULL) >> 8)
            | ((ull & 0x00FF00FF00FF00FFULL) << 8);
        ull = ((ull & 0xFFFF0000FFFF0000ULL) >> 16)
            | ((ull & 0x0000FFFF0000FFFFULL) << 16);
        ull = ((ull & 0xFFFFFFFF00000000ULL) >> 32)
            | ((ull & 0x00000000FFFFFFFFULL) << 32);
    #else
        ull = ((ull >> 1) & 0x5555555555555555ULL)
            | ((ull << 1) & 0xAAAAAAAAAAAAAAAAULL);
        ull = ((ull >> 2) & 0x3333333333333333ULL)
            | ((ull << 2) & 0xCCCCCCCCCCCCCCCCULL);
        ull = ((ull >> 4) & 0x0F0F0F0F0F0F0F0FULL)
            | ((ull << 4) & 0xF0F0F0F0F0F0F0F0ULL);
        ull = ((ull >> 8) & 0x00FF00FF00FF00FFULL)
            | ((ull << 8) & 0xFF00FF00FF00FF00ULL);
        ull = ((ull >> 16) & 0x0000FFFF0000FFFFULL)
            | ((ull << 16) & 0xFFFF0000FFFF0000ULL);
        ull = ((ull >> 32) & 0x00000000FFFFFFFFULL)
            | ((ull << 32) & 0xFFFFFFFF00000000ULL);
    #endif
        return ull;
    }
  2. Generate the assembly listing example9.asm from the source file example9.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tcexample9.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example9.c
  3. Display the assembly listing example9.asm created in step 2.:

    Type example9.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example9.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lreverse
    PUBLIC	_llreverse
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lreverse
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _lreverse PROC						; COMDAT
    ; File c:\users\stefan\desktop\example9.c
    ; Line 6
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	edx, ecx
    	shr	edx, 1
    	lea	eax, DWORD PTR [ecx+ecx]
    	xor	edx, eax
    	lea	eax, DWORD PTR [ecx+ecx]
    	and	edx, 1431655765				; 55555555H
    	xor	edx, eax
    	lea	eax, [ecx+ecx]
    	shr	ecx, 1
    	and	eax, -1431655766			; aaaaaaaaH
    	and	ecx, 1431655765				; 55555555H
    	or	ecx, eax
    ; Line 8
    	mov	ecx, edx
    	shr	ecx, 2
    	lea	eax, DWORD PTR [edx*4]
    	xor	ecx, eax
    	lea	eax, DWORD PTR [edx*4]
    	and	ecx, 858993459				; 33333333H
    	xor	ecx, eax
    	lea	eax, [ecx*4]
    	shr	ecx, 2
    	and	eax, -858993460				; ccccccccH
    	and	ecx, 858993459				; 33333333H
    	or	ecx, eax
    ; Line 10
    	mov	edx, ecx
    	mov	eax, ecx
    	shl	eax, 4
    	shr	edx, 4
    	xor	edx, eax
    	shl	ecx, 4
    	and	edx, 252645135				; 0f0f0f0fH
    	xor	edx, ecx
    	mov	eax, ecx
    	shr	ecx, 4
    	shl	eax, 4
    	and	ecx, 252645135				; 0f0f0f0fH
    	and	eax, -252645136				; f0f0f0f0H
    	or	eax, ecx
    ; Line 12
    	mov	eax, edx
    	mov	ecx, edx
    	shr	eax, 8
    	shl	ecx, 8
    	xor	eax, ecx
    	shl	edx, 8
    	and	eax, 16711935				; 00ff00ffH
    	xor	eax, edx
    ; Line 14
    	rol	eax, 16					; 00000010H
    	bswap	eax
    ; Line 29
    	ret	0
    _lreverse ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llreverse
    _TEXT	SEGMENT
    _ull$11$ = 8						; size = 4
    _ull$ = 8						; size = 8
    _llreverse PROC						; COMDAT
    ; File c:\users\stefan\desktop\example9.c
    ; Line 34
    	mov	edx, DWORD PTR _ull$[esp]
    	mov	ecx, edx
    	push	ebx
    	push	ebp
    	push	esi
    	mov	esi, DWORD PTR _ull$[esp+8]
    	mov	ebx, esi
    	mov	eax, esi
    	shld	ecx, eax, 1
    	push	edi
    	mov	edi, edx
    	add	eax, eax
    	shrd	ebx, edi, 1
    	shld	edx, esi, 1
    	xor	ebx, eax
    	shr	edi, 1
    	xor	edi, ecx
    	add	esi, esi
    	and	ebx, 1431655765				; 55555555H
    	and	edi, 1431655765				; 55555555H
    	xor	ebx, esi
    	xor	edi, edx
    	mov	eax, DWORD PTR _ull$[esp+8]
    	mov	edx, DWORD PTR _ull$[esp+4]
    	mov	ecx, -1431655766			; aaaaaaaaH
    	lea	esi, [eax+eax]
    	lea	edi, [edx+edx]
    	and	esi, ecx
    	and	edi, ecx
    	and	eax, ecx
    	and	edx, ecx
    	shr	eax, 1
    	shr	edx, 1
    	or	eax, esi
    	or	edx, edi
    ; Line 36
    	mov	edx, ebx
    	mov	esi, edi
    	shrd	edx, esi, 2
    	mov	eax, ebx
    	mov	ecx, edi
    	shld	ecx, eax, 2
    	shld	edi, ebx, 2
    	shr	esi, 2
    	xor	esi, ecx
    	shl	eax, 2
    	xor	edx, eax
    	shl	ebx, 2
    	and	esi, 858993459				; 33333333H
    	and	edx, 858993459				; 33333333H
    	xor	esi, edi
    	xor	edx, ebx
    	mov	ecx, -858993460				; ccccccccH
    	lea	esi, [4*eax]
    	lea	edi, [4*edx]
    	and	esi, ecx
    	and	edi, ecx
    	and	eax, ecx
    	and	edx, ecx
    	shr	eax, 2
    	shr	edx, 2
    	or	eax, esi
    	or	edx, edi
    ; Line 38
    	mov	ebx, esi
    	mov	ecx, esi
    	mov	edi, edx
    	mov	eax, edx
    	shrd	edi, ebx, 4
    	shld	ecx, eax, 4
    	shld	esi, edx, 4
    	shl	eax, 4
    	xor	edi, eax
    	shr	ebx, 4
    	xor	ebx, ecx
    	shl	edx, 4
    	and	edi, 252645135				; 0f0f0f0fH
    	and	ebx, 252645135				; 0f0f0f0fH
    	xor	ebx, esi
    	xor	edi, edx
    	mov	ecx, 252645135				; 0f0f0f0fH
    	mov	esi, ecx
    	mov	edi, ecx
    	and	esi, eax
    	and	edi, edx
    	shl	esi, 4
    	shl	edi, 4
    	shr	eax, 4
    	shr	edx, 4
    	and	eax, ecx
    	and	edx, ecx
    	or	eax, esi
    	or	edx, edi
    ; Line 40
    	mov	ebp, edi
    	mov	esi, ebx
    	shrd	ebp, esi, 8
    	mov	eax, edi
    	mov	ecx, ebx
    	shld	ecx, eax, 8
    	shr	esi, 8
    	xor	esi, ecx
    	shl	eax, 8
    	xor	ebp, eax
    	and	esi, 16711935				; 00ff00ffH
    	shld	ebx, edi, 8
    	and	ebp, 16711935				; 00ff00ffH
    	xor	esi, ebx
    	shl	edi, 8
    	mov	DWORD PTR _ull$11$[esp+12], esi
    	xor	ebp, edi
    ; Line 42
    	mov	edi, DWORD PTR _ull$11$[esp+12]
    	mov	eax, ebp
    	mov	ecx, edi
    	mov	edx, ebp
    	shrd	edx, esi, 16
    	shld	ecx, eax, 16
    	shr	esi, 16					; 00000010H
    	shl	eax, 16					; 00000010H
    	xor	edx, eax
    	xor	esi, ecx
    	shld	edi, ebp, 16
    	and	esi, 65535				; 0000ffffH
    	movzx	ecx, dx
    	xor	esi, edi
    	shl	ebp, 16					; 00000010H
    ; Line 60
    	pop	edi
    	mov	eax, esi
    	xor	ecx, ebp
    	pop	esi
    	xor	edx, edx
    	pop	ebp
    	or	edx, ecx
    	pop	ebx
    	bswap	eax
    	bswap	edx
    ; Line 61
    	ret	0
    _llreverse ENDP
    _TEXT	ENDS
    END
    In both functions, the optimiser fails to recognise that the final 2 or 3 shift & mask assignments, which operate on groups of 8 or more bits, together just reverse the byte order and can therefore be translated into 1 or 2 BSWAP instructions instead of 9 or even 36 (in words: thirty-six) instructions!

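    In the function llreverse() it also fails to recognise that, because of the masks, no bit crosses the boundary between the two 32-bit halves in any shift except the final one by 32, which merely exchanges the halves; the generated SHLD and SHRD instructions are therefore unnecessary!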

  4. Repeat the previous steps with the alternate implementation; generate another assembly listing example9.asm from the source file example9.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /GS- /Gy /Ox /Tcexample9.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example9.c
  5. Display the assembly listing example9.asm created in step 4.:

    Type example9.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example9.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lreverse
    PUBLIC	_llreverse
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lreverse
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _lreverse PROC						; COMDAT
    ; File c:\users\stefan\desktop\example9.c
    ; Line 17
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	edx, ecx
    	shr	edx, 1
    	lea	eax, DWORD PTR [ecx+ecx]
    	xor	edx, eax
    	lea	eax, DWORD PTR [ecx+ecx]
    	and	edx, 1431655765				; 55555555H
    	xor	edx, eax
    	lea	eax, [ecx+ecx]
    	shr	ecx, 1
    	and	eax, -1431655766			; aaaaaaaaH
    	and	ecx, 1431655765				; 55555555H
    	or	ecx, eax
    ; Line 19
    	mov	ecx, edx
    	shr	ecx, 2
    	lea	eax, DWORD PTR [edx*4]
    	xor	ecx, eax
    	lea	eax, DWORD PTR [edx*4]
    	and	ecx, 858993459				; 33333333H
    	xor	ecx, eax
    	lea	eax, [ecx*4]
    	shr	ecx, 2
    	and	eax, -858993460				; ccccccccH
    	and	ecx, 858993459				; 33333333H
    	or	ecx, eax
    ; Line 21
    	mov	edx, ecx
    	mov	eax, ecx
    	shl	eax, 4
    	shr	edx, 4
    	xor	edx, eax
    	shl	ecx, 4
    	and	edx, 252645135				; 0f0f0f0fH
    	xor	edx, ecx
    	mov	eax, ecx
    	shr	ecx, 4
    	shl	eax, 4
    	and	ecx, 252645135				; 0f0f0f0fH
    	and	eax, -252645136				; f0f0f0f0H
    	or	eax, ecx
    ; Line 23
    	mov	eax, edx
    	mov	ecx, edx
    	shr	eax, 8
    	shl	ecx, 8
    	xor	eax, ecx
    	shl	edx, 8
    	and	eax, 16711935				; 00ff00ffH
    	xor	eax, edx
    ; Line 25
    	rol	eax, 16					; 00000010H
    	bswap	eax
    ; Line 29
    	ret	0
    _lreverse ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llreverse
    _TEXT	SEGMENT
    _ull$11$ = 8						; size = 4
    _ull$ = 8						; size = 8
    _llreverse PROC						; COMDAT
    ; File c:\users\stefan\desktop\example9.c
    ; Line 47
    	mov	edx, DWORD PTR _ull$[esp]
    	mov	ecx, edx
    	push	ebx
    	push	ebp
    	push	esi
    	mov	esi, DWORD PTR _ull$[esp+8]
    	mov	ebx, esi
    	mov	eax, esi
    	shld	ecx, eax, 1
    	push	edi
    	mov	edi, edx
    	add	eax, eax
    	shrd	ebx, edi, 1
    	shld	edx, esi, 1
    	xor	ebx, eax
    	shr	edi, 1
    	xor	edi, ecx
    	add	esi, esi
    	and	ebx, 1431655765				; 55555555H
    	and	edi, 1431655765				; 55555555H
    	xor	ebx, esi
    	xor	edi, edx
    	mov	eax, DWORD PTR _ull$[esp]
    	mov	edx, DWORD PTR _ull$[esp-4]
    	mov	ecx, -1431655766			; aaaaaaaaH
    	lea	esi, [eax+eax]
    	lea	edi, [edx+edx]
    	and	esi, ecx
    	and	edi, ecx
    	and	eax, ecx
    	and	edx, ecx
    	shr	eax, 1
    	shr	edx, 1
    	or	eax, esi
    	or	edx, edi
    ; Line 49
    	mov	edx, ebx
    	mov	esi, edi
    	shrd	edx, esi, 2
    	mov	eax, ebx
    	mov	ecx, edi
    	shld	ecx, eax, 2
    	shld	edi, ebx, 2
    	shr	esi, 2
    	xor	esi, ecx
    	shl	eax, 2
    	xor	edx, eax
    	shl	ebx, 2
    	and	esi, 858993459				; 33333333H
    	and	edx, 858993459				; 33333333H
    	xor	esi, edi
    	xor	edx, ebx
    	mov	ecx, -858993460				; ccccccccH
    	lea	esi, [4*eax]
    	lea	edi, [4*edx]
    	and	esi, ecx
    	and	edi, ecx
    	and	eax, ecx
    	and	edx, ecx
    	shr	eax, 2
    	shr	edx, 2
    	or	eax, esi
    	or	edx, edi
    ; Line 51
    	mov	ebx, esi
    	mov	ecx, esi
    	mov	edi, edx
    	mov	eax, edx
    	shrd	edi, ebx, 4
    	shld	ecx, eax, 4
    	shld	esi, edx, 4
    	shl	eax, 4
    	xor	edi, eax
    	shr	ebx, 4
    	xor	ebx, ecx
    	shl	edx, 4
    	and	edi, 252645135				; 0f0f0f0fH
    	and	ebx, 252645135				; 0f0f0f0fH
    	xor	ebx, esi
    	xor	edi, edx
    	mov	ecx, 252645135				; 0f0f0f0fH
    	mov	esi, ecx
    	mov	edi, ecx
    	and	esi, eax
    	and	edi, edx
    	shl	esi, 4
    	shl	edi, 4
    	shr	eax, 4
    	shr	edx, 4
    	and	eax, ecx
    	and	edx, ecx
    	or	eax, esi
    	or	edx, edi
    ; Line 53
    	mov	ebp, edi
    	mov	esi, ebx
    	shrd	ebp, esi, 8
    	mov	eax, edi
    	mov	ecx, ebx
    	shld	ecx, eax, 8
    	shr	esi, 8
    	xor	esi, ecx
    	shl	eax, 8
    	xor	ebp, eax
    	and	esi, 16711935				; 00ff00ffH
    	shld	ebx, edi, 8
    	and	ebp, 16711935				; 00ff00ffH
    	xor	esi, ebx
    	shl	edi, 8
    	mov	DWORD PTR _ull$11$[esp+12], esi
    	xor	ebp, edi
    ; Line 55
    	mov	edi, DWORD PTR _ull$11$[esp+12]
    	mov	eax, ebp
    	mov	ecx, edi
    	mov	edx, ebp
    	shrd	edx, esi, 16
    	shld	ecx, eax, 16
    	shr	esi, 16					; 00000010H
    	shl	eax, 16					; 00000010H
    	xor	edx, eax
    	xor	esi, ecx
    	shld	edi, ebp, 16
    	and	esi, 65535				; 0000ffffH
    	movzx	ecx, dx
    	xor	esi, edi
    	shl	ebp, 16					; 00000010H
    ; Line 60
    	pop	edi
    	mov	eax, esi
    	xor	ecx, ebp
    	pop	esi
    	xor	edx, edx
    	pop	ebp
    	or	edx, ecx
    	pop	ebx
    	bswap	eax
    	bswap	edx
    ; Line 61
    	ret	0
    _llreverse ENDP
    _TEXT	ENDS
    END
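The two observations noted after step 3 can be illustrated with a sketch. First, swapping adjacent bits, bit pairs and nibbles within each byte leaves only a byte reversal, which a single byte-swap instruction performs. Second, on a 32-bit target a 64-bit bit reversal splits into two independent 32-bit reversals with the halves exchanged, so no double-precision shifts are needed. The function names are hypothetical. The byte swap assumes MSVC's _byteswap_ulong() or the GCC/Clang __builtin_bswap32() where available.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <stdlib.h>   /* _byteswap_ulong() */
#endif

/* Reverse the byte order of a 32-bit value (one BSWAP or MOVBE on x86). */
static uint32_t reverse_bytes32(uint32_t x)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(x);
#elif defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}

/* Hypothetical: reverse all 32 bits. Only the first three shift & mask
   steps of lreverse() are kept; its 8-bit and 16-bit steps together are
   exactly a byte reversal. */
static uint32_t reverse_bits32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* adjacent bits */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* bit pairs */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  /* nibbles */
    return reverse_bytes32(x);
}

/* Hypothetical: reverse all 64 bits by reversing each 32-bit half and
   exchanging the halves; no bit moves between the halves before that
   final exchange. */
static uint64_t reverse_bits64(uint64_t x)
{
    uint32_t lo = (uint32_t) x;
    uint32_t hi = (uint32_t) (x >> 32);
    return ((uint64_t) reverse_bits32(lo) << 32) | reverse_bits32(hi);
}
```

For example, reverse_bits32(0x12345678) yields 0x1E6A2C48, and reverse_bits64(0x0123456789ABCDEF) yields 0xF7B3D591E6A2C480.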

Example 10

Superfluous load and store instructions, using a superfluous temporary variable, generated by the Visual C 2017 and Visual C 2010 compilers.
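The loop in the demonstration below converts an array to network byte order through an inline-assembly htonl(). For comparison, the following sketch performs the same in-memory conversion in portable C without inline assembly. It stores each value's bytes most-significant first, so the result does not depend on the host's byte order. The function name is hypothetical, and no claim is made here about the code Visual C generates for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rewrite every element in place so that its bytes
   appear in memory in big-endian (network) order. Accessing the object
   through an unsigned char pointer is always permitted in C. */
static void to_network_order(uint32_t *array, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t v = array[i];
        unsigned char *p = (unsigned char *) &array[i];
        p[0] = (unsigned char) (v >> 24);
        p[1] = (unsigned char) (v >> 16);
        p[2] = (unsigned char) (v >> 8);
        p[3] = (unsigned char) v;
    }
}
```

Applied to the values 0x4D534654, 0x4D535643, 0x504F4F52 and 0x434F4445 from the listing, the array's bytes then read "MSFTMSVCPOORCODE".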

Demonstration

  1. Create the text file example10.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __inline
    unsigned long htonl(unsigned long ul)
    {
    #if _MSC_VER >= 1900
    	__asm	movbe	eax, ul
    #else
    	__asm	mov	eax, ul
    	__asm	bswap	eax
    #endif
    }
    
    int main(int argc)
    {
        unsigned long array[] = {'MSFT', 'MSVC', 'POOR', 'CODE'};
    
        argc = htonl(argc);
    
        for (argc = 0; argc < sizeof(array) / sizeof(*array); argc++)
            array[argc] = htonl(array[argc]);
    }
  2. Generate the assembly listing example10.asm from the source file example10.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tcexample10.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example10.c
  3. Display the assembly listing example10.asm created in step 2.:

    Type example10.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example10.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_htonl
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_htonl
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _htonl	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example10.c
    ; Line 7
    	movbe	eax, DWORD PTR _ul$[esp-4]
    ; Line 12
    	ret	0
    _htonl	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _array$ = -8						; size = 16
    _ul$ = 8						; size = 4
    _argc$ = 8						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example10.c
    ; Line 15
    	sub	esp, 16					; 00000010H
    ; Line 16
    	mov	DWORD PTR _array$[esp+8], 1297303124	; 4d534654H
    	mov	DWORD PTR _array$[esp+12], 1297307203	; 4d535643H
    	mov	DWORD PTR _array$[esp+16], 1347374930	; 504f4f52H
    	mov	DWORD PTR _array$[esp+20], 1129268293	; 434f4445H
    ; Line 18
    	movbe	eax, DWORD PTR _argc$[esp+12]
    ; Line 20
    	xor	ecx, ecx
    	npad	6
    $LL4@main:
    ; Line 21
    	mov	eax, DWORD PTR _array$[esp+ecx*4+16]
    	mov	DWORD PTR _ul$[esp+12], eax
    	movbe	eax, DWORD PTR _ul$[esp+12]
    	movbe	eax, DWORD PTR _array$[esp+ecx*4+16]
    	mov	DWORD PTR _array$[esp+ecx*4+16], eax
    	inc	ecx
    	cmp	ecx, 4
    	jb	SHORT $LL4@main
    ; Line 22
    	add	esp, 16					; 00000010H
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    Notice the superfluous in(s)ane transfer of the EAX register to and from the (intermediate) variable _ul$ generated for line 21!
    Also notice that the superfluous instruction generated for line 18 doesn’t use an intermediate variable!
  4. Generate another assembly listing example10.asm from the source file example10.c created in step 1., now using the Visual C 2010 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tcexample10.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1
    
    example10.c
  5. Display the assembly listing example10.asm created in step 4.:

    Type example10.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 
    
    	TITLE	C:\Users\Stefan\Desktop\example10.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_htonl
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_htonl
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _htonl	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example10.c
    ; Line 9
    	mov	eax, DWORD PTR _ul$[esp-4]
    ; Line 10
    	bswap	eax
    ; Line 11
    	ret	0
    _htonl	ENDP
    _TEXT	ENDS
    
    PUBLIC	_main
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _array$ = -16						; size = 16
    $T1040 = 8						; size = 4
    _argc$ = 8						; size = 4
    _main	PROC						; COMDAT
    ; Line 15
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 16					; 00000010H
    ; Line 16
    	mov	DWORD PTR _array$[ebp], 1297303124	; 4d534654H
    	mov	DWORD PTR _array$[ebp+4], 1297307203	; 4d535643H
    	mov	DWORD PTR _array$[ebp+8], 1347374930	; 504f4f52H
    	mov	DWORD PTR _array$[ebp+12], 1129268293	; 434f4445H
    ; Line 18
    	mov	eax, DWORD PTR _argc$[ebp]
    	bswap	eax
    ; Line 20
    	xor	edx, edx
    $LL3@main:
    	lea	ecx, DWORD PTR _array$[ebp+edx*4]
    ; Line 21
    	mov	eax, DWORD PTR [ecx]
    	mov	DWORD PTR $T1040[ebp], eax
    	mov	eax, DWORD PTR $T1040[ebp]
    	bswap	eax
    	inc	edx
    	mov	DWORD PTR [ecx], eax
    	cmp	edx, 4
    	jb	SHORT $LL3@main
    ; Line 22
    	leave
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    Notice the superfluous in(s)ane transfer of the EAX register to and from the intermediate (temporary) variable $T1040 generated for line 21!
    Again notice that the superfluous instructions generated for line 18 don’t use an intermediate (temporary) variable!
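
Without the temporary variable, each iteration of the loop in example10.c only has to load an element, reverse its byte order and store it back. The following C sketch spells out that byte reversal portably with shifts and masks. The names bswap32() and swap_array() are hypothetical; the sketch assumes a 32-bit unsigned long as on the x86 target (hence uint32_t), and whether a particular compiler recognises this idiom and emits a single BSWAP (or MOVBE) is an assumption to be checked against its assembly listing.

```c
#include <stdint.h>

/* Hypothetical portable equivalent of BSWAP: reverse the byte order of a
   32-bit value, which is what htonl() does on a little-endian machine. */
static uint32_t bswap32(uint32_t value)
{
    return (value >> 24)                    /* byte 3 -> byte 0 */
         | ((value >> 8) & 0x0000FF00u)     /* byte 2 -> byte 1 */
         | ((value << 8) & 0x00FF0000u)     /* byte 1 -> byte 2 */
         | (value << 24);                   /* byte 0 -> byte 3 */
}

/* Hypothetical equivalent of the loop from example10.c without any
   temporary variable: load, byte swap, store back. */
static void swap_array(uint32_t *array, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        array[i] = bswap32(array[i]);
}
```

For example, bswap32(0x4D534654), the value of 'MSFT' in the listings above, yields 0x5446534D.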

Example 11

Completely wrong code generated with __forceinline versus __inline by the Visual C 2017 compiler (and all previous versions too) when specified for a __fastcall function with a body written in inline assembler.

Note: the advice against this combination given in the MSDN article Using and Preserving Registers in Inline Assembly does not apply here: there is no code which might clobber any register!

Demonstration

  1. Create the text file example11.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __forceinline // here be dragons!
    unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial)
    {
    #ifdef MITIGATE
        return argument & 0x80000000 ? polynomial ^ (argument << 1) : argument << 1;
    #else
        __asm // 32-bit linear feedback shift register
        {
            add ecx, ecx ; ecx = argument << 1
            sbb eax, eax ; eax = CF ? -1 : 0
            and eax, edx ; eax = CF ? polynomial : 0
            xor eax, ecx ; eax = (argument << 1) ^ (CF ? polynomial : 0)
        }
    #endif
    }
    
    int main()
    {
        unsigned lfsr = 123456789;
        unsigned period = 0;
    
        do
        {
            period++;
            lfsr = lfsr32(lfsr, 0xC5);
        } while (lfsr != 123456789);
    
        return period;
    }
    Note: the constant 0xC5 represents the primitive polynomial $x^{32}+x^7+x^6+x^2+x^0$; it gives the 32-bit LFSR its maximum period length of $2^{32}-1$.
  2. Generate the assembly listing example11.asm from the source file example11.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample11.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example11.c
    example11.c(4): warning C4100: 'polynomial': unreferenced formal parameter
    example11.c(4): warning C4100: 'argument': unreferenced formal parameter
  3. Display the assembly listing example11.asm created in step 2.:

    Type example11.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example11.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@lfsr32@8
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@lfsr32@8
    _TEXT	SEGMENT
    @lfsr32@8 PROC						; COMDAT
    ; _argument$ = ecx
    ; _polynomial$ = edx
    ; File c:\users\stefan\desktop\example11.c
    ; Line 11
    	add	ecx, ecx
    ; Line 12
    	sbb	eax, eax
    ; Line 13
    	and	eax, edx
    ; Line 14
    	xor	eax, ecx
    ; Line 15
    	ret	0
    @lfsr32@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example11.c
    ; Line 21
    	push	ebx
    	mov	eax, 123456789				; 075bcd15H
    ; Line 22
    	mov	edx, 197				; 000000c5H
    	xor	ebx, ebx
    	xor	edx, edx
    $LL4@main:
    ; Line 26
    	inc	edx
    	inc	ebx
    ; Line 27
    	mov	ecx, eax
    	add	ecx, ecx
    	sbb	eax, eax
    	and	eax, edx
    	xor	eax, ecx
    ; Line 28
    	cmp	eax, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 30
    	mov	eax, edx
    	mov	eax, ebx
    	pop	ebx
    ; Line 31
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    The variable lfsr alias argument, held in register ECX (the first argument of functions with __fastcall calling convention), is not initialized with the constant 123456789, register EDX (the second argument of functions with __fastcall calling convention) is never loaded with the constant 0xC5, and the return value from the (inlined) function held in register EAX is not loaded back into register ECX!
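
What the inlined lfsr32() should compute can be modelled in C by tracking the carry flag explicitly: ADD ECX, ECX moves the sign bit into CF, SBB EAX, EAX turns CF into an all-zeros or all-ones mask, and the remaining AND and XOR apply the polynomial. This is a sketch only. The function names are hypothetical, uint32_t stands in for the 32-bit unsigned of the x86 target, and the second function merely repeats the C expression used when MITIGATE is defined, so that both can be compared.

```c
#include <stdint.h>

/* Hypothetical C model of the inline assembler in lfsr32():
   ADD ECX, ECX / SBB EAX, EAX / AND EAX, EDX / XOR EAX, ECX */
static uint32_t lfsr32_asm_model(uint32_t argument, uint32_t polynomial)
{
    uint32_t carry = argument >> 31;    /* CF: the bit shifted out by ADD ECX, ECX */
    uint32_t ecx = argument << 1;       /* ECX = argument << 1 */
    uint32_t eax = 0u - carry;          /* SBB EAX, EAX: CF ? 0xFFFFFFFF : 0 */
    eax &= polynomial;                  /* AND EAX, EDX: CF ? polynomial : 0 */
    return eax ^ ecx;                   /* XOR EAX, ECX */
}

/* The C expression used in lfsr32() when MITIGATE is defined. */
static uint32_t lfsr32_ternary(uint32_t argument, uint32_t polynomial)
{
    return argument & 0x80000000u ? polynomial ^ (argument << 1) : argument << 1;
}
```

Both functions return 246913578 for the start value 123456789, whose sign bit is clear, and 0xC5 for 0x80000000.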

Mitigation

After replacing the __forceinline keyword with __inline, the Visual C compiler generates correct code, but no longer inlines the function.

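Note: replacing the inline assembler in the function body with equivalent C code (as selected by the macro MITIGATE) avoids this compiler bug too, of course, but the compiler then fails to generate optimal code!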

  1. Generate another assembly listing example11.asm from the source file example11.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro MITIGATE defined on the command line:

    CL.EXE /Bv /c /DMITIGATE /Fa /FoNUL: /Gy /Ox /Tcexample11.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example11.c
  2. Display the assembly listing example11.asm created in step 1.:

    Type example11.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example11.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@lfsr32@8
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@lfsr32@8
    _TEXT	SEGMENT
    @lfsr32@8 PROC						; COMDAT
    ; _argument$ = ecx
    ; _polynomial$ = edx
    ; File c:\users\stefan\desktop\example11.c
    ; Line 5
    	push	esi
    ; Line 7
    	lea	esi, DWORD PTR [ecx+ecx]
    	mov	eax, esi
    	xor	eax, edx
    	test	ecx, ecx
    	cmovns	eax, esi
    	pop	esi
    	add	ecx, ecx
    	sbb	eax, eax
    	and	eax, edx
    	xor	eax, ecx
    ; Line 17
    	ret	0
    @lfsr32@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example11.c
    ; Line 21
    	mov	ecx, 123456789				; 075bcd15H
    ; Line 22
    	xor	eax, eax
    	push	esi
    $LL4@main:
    ; Line 7
    	lea	esi, DWORD PTR [ecx+ecx]
    ; Line 26
    	inc	eax
    ; Line 28
    	mov	edx, esi
    	xor	edx, 197				; 000000c5H
    	test	ecx, ecx
    	cmovns	edx, esi
    	mov	ecx, edx
    IFDEF ALTERNATE
    	lea	edx, DWORD PTR [ecx+ecx]
    	sar	ecx, 31
    	and	ecx, 197				; 000000c5H
    	xor	ecx, edx
    ELSE
    	add	ecx, ecx
    	sbb	edx, edx
    	and	edx, 197				; 000000c5H
    	xor	ecx, edx
    ENDIF
    	cmp	ecx, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 30
    	pop	esi
    ; Line 31
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, the compiler fails to perform an obvious optimisation: instead of explicitly setting the sign flag SF with a separate TEST instruction, it could use the carry flag CF, which a SHL (as well as an ADD) instruction sets from the most significant alias sign bit; this variant also doesn’t need the extraneous register ESI to preserve the value of the shifted (or doubled) variable!

    Note: the assembly listing also shows an alternative variant.

Example 12

This is the reversed case of the second variant from (the previous) example 11.

Demonstration

  1. Create the text file example12.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __forceinline // here be dragons!
    unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial)
    {
    #ifdef MITIGATE
        return argument & 1 ? polynomial ^ (argument >> 1) : argument >> 1;
    #else
        __asm // 32-bit linear feedback shift register
        {
            shr ecx, 1   ; ecx = argument >> 1
            sbb eax, eax ; eax = CF ? -1 : 0
            and eax, edx ; eax = CF ? polynomial : 0
            xor eax, ecx ; eax = (argument >> 1) ^ (CF ? polynomial : 0)
        }
    #endif
    }
    
    int main()
    {
        unsigned lfsr = 123456789;
        unsigned period = 0;
    
        do
        {
            period++;
            lfsr = lfsr32(lfsr, 0xA3000000);
        } while (lfsr != 123456789);
    
        return period;
    }
    Note: the constant 0xA3000000 represents the same primitive polynomial as 0xC5, $x^{32}+x^{30}+x^{26}+x^{25}+x^0$ alias $x^{32}+x^7+x^6+x^2+x^0$; it’s just the bit-reversed value and gives the 32-bit LFSR its maximum period length of $2^{32}-1$.
  2. Generate the assembly listing example12.asm from the source file example12.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro MITIGATE defined on the command line:

    CL.EXE /Bv /c /DMITIGATE /Fa /FoNUL: /Gy /Ox /Tcexample12.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example12.c
  3. Display the assembly listing example12.asm created in step 2.:

    Type example12.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example12.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@lfsr32@8
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@lfsr32@8
    _TEXT	SEGMENT
    @lfsr32@8 PROC						; COMDAT
    ; _argument$ = ecx
    ; _polynomial$ = edx
    ; File c:\users\stefan\desktop\example12.c
    ; Line 5
    	push	esi
    ; Line 7
    	mov	esi, ecx
    	shr	esi, 1
    	mov	eax, esi
    	xor	eax, edx
    	and	cl, 1
    	cmove	eax, esi
    	pop	esi
    	shr	ecx, 1
    	sbb	eax, eax
    	and	eax, edx
    	xor	eax, ecx
    ; Line 17
    	ret	0
    @lfsr32@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example12.c
    ; Line 20
    	push	esi
    ; Line 21
    	mov	eax, 123456789				; 075bcd15H
    ; Line 22
    	xor	esi, esi
    	xor	ecx, ecx
    $LL4@main:
    ; Line 7
    	mov	edx, eax
    	mov	ecx, eax
    	shr	edx, 1
    ; Line 26
    	inc	esi
    	inc	ecx
    ; Line 28
    	mov	eax, edx
    	xor	eax, -1560281088			; a3000000H
    ; Line 7
    	and	cl, 1
    ; Line 28
    	cmove	eax, edx
    	and	eax, 1
    	neg	eax
    	and	eax, -1560281088			; a3000000H
    	xor	eax, edx
    	cmp	eax, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 30
    	mov	eax, esi
    	pop	esi
    	mov	eax, ecx
    ; Line 31
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, the compiler fails to perform an obvious optimisation: instead of evaluating both terms of the ternary operator first and then conditionally overwriting one result with the other, depending on the least significant bit of the original value as determined by a separate AND instruction, it could use the carry flag CF, which the SHR instruction already sets from the least significant bit; this variant also doesn’t need the extraneous register ECX to preserve the original value of the shifted variable!

    Note: the assembly listing shows an alternative, equally optimised variant.
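
The claim that the bit-reversed constant drives an equivalent right-shifting LFSR follows from a simple observation: reversing the bit order of the register turns a left-shifting step with polynomial constant P into a right-shifting step with the bit-reversed constant, so both registers run through mirrored copies of the same cycle and have the same period. The C sketch below checks this. Since a full cycle of $2^{32}-1$ steps takes long, it additionally assumes scaled-down registers: an 8-bit one with the primitive polynomial $x^8+x^4+x^3+x^2+x^0$ (constant 0x1D, bit-reversed 0xB8) and a 4-bit one with $x^4+x^1+x^0$ (constant 0x3, bit-reversed 0xC). These small constants and all function names are illustrative assumptions, not taken from the examples.

```c
#include <stdint.h>

/* Reverse the low `width` bits of value (1 <= width <= 32). */
static uint32_t reverse_bits(uint32_t value, unsigned width)
{
    uint32_t result = 0;
    for (unsigned i = 0; i < width; i++) {
        result = (result << 1) | (value & 1u);  /* move lowest bit of value into result */
        value >>= 1;
    }
    return result;
}

/* Left-shifting Galois LFSR step on a `width`-bit register (cf. example 11). */
static uint32_t lfsr_left(uint32_t lfsr, uint32_t polynomial, unsigned width)
{
    uint32_t msb = (lfsr >> (width - 1)) & 1u;  /* bit shifted out */
    uint32_t mask = width == 32 ? 0xFFFFFFFFu : ((uint32_t) 1 << width) - 1u;
    return ((lfsr << 1) & mask) ^ ((0u - msb) & polynomial);
}

/* Right-shifting Galois LFSR step (cf. example 12). */
static uint32_t lfsr_right(uint32_t lfsr, uint32_t polynomial)
{
    return (lfsr >> 1) ^ ((0u - (lfsr & 1u)) & polynomial);
}

/* Count the steps until the register returns to its non-zero seed. */
static uint32_t period_left(uint32_t seed, uint32_t polynomial, unsigned width)
{
    uint32_t lfsr = seed, period = 0;
    do {
        period++;
        lfsr = lfsr_left(lfsr, polynomial, width);
    } while (lfsr != seed);
    return period;
}

static uint32_t period_right(uint32_t seed, uint32_t polynomial)
{
    uint32_t lfsr = seed, period = 0;
    do {
        period++;
        lfsr = lfsr_right(lfsr, polynomial);
    } while (lfsr != seed);
    return period;
}
```

For instance, period_left(1, 0x1D, 8) and period_right(1, 0xB8) both count $2^8-1 = 255$ steps, and reverse_bits(0xC5, 32) yields 0xA3000000.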

Example 13

These are the 64-bit variants of (the previous) example 12 and of the second variant from example 11.

Demonstration

  1. Create the text file example13.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long right()
    {
        unsigned long long lfsr = 0x0123456789ABCDEF;
        unsigned long long period = 0;
    
        do
        {
            period++;
    #ifdef ALTERNATE
            lfsr = (-((long long) lfsr & 1) & 0xD800000000000000) ^ (lfsr >> 1);
    #else
            lfsr = lfsr & 1 ? 0xD800000000000000 ^ (lfsr >> 1) : lfsr >> 1;
    #endif
        } while (lfsr != 0x0123456789ABCDEF);
    
        return period;
    }
    
    unsigned long long left()
    {
        unsigned long long lfsr = 0x0123456789ABCDEF;
        unsigned long long period = 0;
    
        do
        {
            period++;
    #ifdef ALTERNATE
            lfsr = (-((long long) lfsr < 0) & 0x1B) ^ (lfsr << 1);
    #else
            lfsr = (long long) lfsr < 0 ? 0x1B ^ (lfsr << 1) : lfsr << 1;
    #endif
        } while (lfsr != 0x0123456789ABCDEF);
    
        return period;
    }
    Note: both constants 0xD800000000000000 and 0x1B represent the primitive polynomial $x^{64}+x^{63}+x^{61}+x^{60}+x^0$ alias $x^{64}+x^4+x^3+x^1+x^0$; it gives the 64-bit LFSR its maximum period length of $2^{64}-1$.
  2. Generate the assembly listing example13.asm from the source file example13.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample13.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example13.c
  3. Display the assembly listing example13.asm created in step 2.:

    Type example13.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	right
    PUBLIC	left
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	right
    _TEXT	SEGMENT
    right	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 5
    	mov	r9, 81985529216486895			; 0123456789abcdefH
    ; Line 6
    	xor	r8d, r8d
    	mov	rax, r9
    	mov	r10, -2882303761517117440		; d800000000000000H
    	npad	6
    $LL4@right:
    ; Line 14
    	mov	rdx, rax
    	movzx	ecx, al
    	shr	rdx, 1
    	inc	r8
    ; Line 16
    	mov	rax, rdx
    	xor	rax, r10
    	and	cl, 1
    	cmove	rax, rdx
    	and	eax, 1
    	neg	rax
    	and	rax, r10
    	xor	rax, rdx
    	cmp	rax, r9
    	jne	SHORT $LL4@right
    ; Line 18
    	mov	rax, r8
    ; Line 19
    	ret	0
    right	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	left
    _TEXT	SEGMENT
    left	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 23
    	mov	r9, 81985529216486895			; 0123456789abcdefH
    ; Line 24
    	xor	eax, eax
    	mov	rcx, r9
    	npad	1
    $LL4@left:
    ; Line 34
    	mov	rdx, rcx
    	lea	r8, QWORD PTR [rcx+rcx]
    	inc	rax
    	mov	rcx, r8
    	xor	rcx, 27
    	test	rdx, rdx
    	cmovns	rcx, r8
    	add	rcx, rcx
    	sbb	rdx, rdx
    	and	rdx, 27
    	xor	rcx, rdx
    	cmp	rcx, r9
    	jne	SHORT $LL4@left
    ; Line 37
    	ret	0
    left	ENDP
    _TEXT	ENDS
    END
    While the code generated for the function right() is correct, the compiler fails to perform an obvious optimisation: instead of evaluating both terms of the ternary operator first and then conditionally overwriting one result with the other, depending on the least significant bit of the original value as determined by a separate AND instruction, it could use the carry flag CF, which the SHR instruction already sets from the least significant bit; this variant also doesn’t need the extraneous register RCX to preserve the original value of the shifted variable!

    Additionally the registers RAX and R8 can be swapped, making the MOV instruction generated for line 14 superfluous.

    While the code generated for the function left() is correct too, the compiler likewise fails to perform an even more obvious optimisation: instead of explicitly setting the sign flag SF with a separate TEST instruction, it could use the carry flag CF, which a SHL (as well as an ADD) instruction sets from the most significant alias sign bit; this variant also doesn’t need the extraneous register R8 to preserve the value of the shifted (or doubled) variable!

  4. Generate another assembly listing example13.asm from the source file example13.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample13.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example13.c
  5. Display the assembly listing example13.asm created in step 4.:

    Type example13.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example13.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_right
    PUBLIC	_left
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_right
    _TEXT	SEGMENT
    _period$ = -8						; size = 8
    _right	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 4
    	sub	esp, 8
    	push	esi
    	xorps	xmm0, xmm0
    ; Line 5
    	mov	ecx, -1985229329			; 89abcdefH
    ; Line 6
    	movlpd	QWORD PTR _period$[esp+12], xmm0
    	mov	eax, 19088743				; 01234567H
    	mov	esi, DWORD PTR _period$[esp+16]
    	xor	esi, esi
    	push	edi
    	mov	edi, DWORD PTR _period$[esp+16]
    	xor	edi, edi
    $LL4@right:
    ; Line 10
    	add	edi, 1
    ; Line 14
    	mov	edx, ecx
    	adc	esi, 0
    	and	ecx, 1
    	shrd	edx, eax, 1
    	shrd	ecx, eax, 1
    	shr	eax, 1
    	or	ecx, 0
    	mov	ecx, edx
    	je	SHORT $LN7@right
    	xor	ecx, 0
    	xor	eax, -671088640				; d8000000H
    $LN7@right:
    	and	edx, 1
    	neg	edx
    	and	edx, -671088640				; d8000000H
    	xor	eax, edx
    ; Line 16
    	cmp	ecx, -1985229329			; 89abcdefH
    	jne	SHORT $LL4@right
    	cmp	eax, 19088743				; 01234567H
    	jne	SHORT $LL4@right
    ; Line 18
    	mov	eax, edi
    	mov	edx, esi
    	pop	edi
    	pop	esi
    ; Line 19
    	add	esp, 8
    	ret	0
    _right	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_left
    _TEXT	SEGMENT
    _period$ = -8						; size = 8
    _left	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 22
    	sub	esp, 8
    	push	ebx
    	xorps	xmm0, xmm0
    ; Line 23
    	mov	eax, -1985229329			; 89abcdefH
    	push	esi
    ; Line 24
    	movlpd	QWORD PTR _period$[esp+16], xmm0
    	mov	ecx, 19088743				; 01234567H
    	mov	ebx, DWORD PTR _period$[esp+16]
    	xor	ebx, ebx
    	push	edi
    	mov	edi, DWORD PTR _period$[esp+24]
    	xor	edi, edi
    $LL4@left:
    ; Line 28
    	add	ebx, 1
    	mov	edx, eax
    	mov	esi, ecx
    	adc	edi, 0
    	shld	esi, edx, 1
    	add	edx, edx
    	add	eax, eax
    	adc	ecx, ecx
    ; Line 32
    	test	ecx, ecx
    	jg	SHORT $LN6@left
    	jl	SHORT $LN11@left
    	test	eax, eax
    	jae	SHORT $LN6@left
    $LN11@left:
    	mov	eax, edx
    	mov	ecx, esi
    	xor	eax, 27					; 0000001bH
    	xor	ecx, 0
    	jmp	SHORT $LN7@left
    $LN6@left:
    	mov	eax, edx
    	mov	ecx, esi
    $LN7@left:
    	sbb	edx, edx
    	and	edx, 27					; 0000001bH
    	xor	eax, edx
    ; Line 34
    	cmp	eax, -1985229329			; 89abcdefH
    	jne	SHORT $LL4@left
    	cmp	ecx, 19088743				; 01234567H
    	jne	SHORT $LL4@left
    ; Line 36
    	mov	edx, edi
    	mov	eax, ebx
    	pop	edi
    	pop	esi
    	pop	ebx
    ; Line 37
    	add	esp, 8
    	ret	0
    _left	ENDP
    _TEXT	ENDS
    END
    The code generated for the function right() is totally screwed up: the variable period is allocated on the stack, zeroed using the SSE register XMM0, then loaded into the registers ESI and EDI, but never used again; instead of holding the variable period in the register pair EDX:EAX used for the return value, the compiler holds it in the registers EDI and ESI, which have to be transferred into EDX:EAX upon exit; register ECX, which holds the lower half of the variable lfsr, is clobbered inside the loop without necessity and has to be reloaded; the flags set by the AND instruction are ignored, and the result is tested again with an extraneous OR instruction; the XOR instruction with immediate operand 0 has no effect and is superfluous too!

    The code generated for the function left() is even worse: again the variable period is allocated on the stack, zeroed using the SSE register XMM0, then loaded into the registers EBX and EDI, but never used again; instead of holding the variable period in the register pair EDX:EAX used for the return value, the compiler holds it in the registers EBX and EDI, which have to be transferred into EDX:EAX upon exit; instead of using the carry flag CF already set by the SHLD instruction, or the sign flag SF set by the first TEST instruction, it performs a full comparison against 0, involving three conditional branch instructions; the registers EAX and ECX, which hold the variable lfsr, are copied without necessity into the registers EDX and ESI, which are then used for the shift and exclusive-or operations; the XOR instruction with immediate operand 0 has no effect and is superfluous!

  6. Generate another assembly listing example13.asm from the source file example13.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tcexample13.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example13.c
  7. Display the assembly listing example13.asm created in step 6.:

    Type example13.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example13.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_right
    PUBLIC	_left
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_right
    _TEXT	SEGMENT
    _period$ = -8						; size = 8
    _right	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 4
    	sub	esp, 8
    	push	ebx
    	xorps	xmm0, xmm0
    ; Line 5
    	mov	edx, -1985229329			; 89abcdefH
    	push	esi
    ; Line 6
    	movlpd	QWORD PTR _period$[esp+16], xmm0
    	mov	eax, 19088743				; 01234567H
    	mov	ebx, DWORD PTR _period$[esp+16]
    	push	edi
    	mov	edi, DWORD PTR _period$[esp+24]
    $LL4@right:
    ; Line 10
    	add	ebx, 1
    ; Line 12
    	mov	ecx, edx
    	adc	edi, 0
    	xor	esi, esi
    	and	ecx, 1
    	neg	ecx
    	adc	esi, esi
    	xor	ecx, ecx
    	shrd	edx, eax, 1
    	neg	esi
    	and	esi, -671088640				; d8000000H
    	shr	eax, 1
    	xor	edx, ecx
    	xor	eax, esi
    ; Line 16
    	cmp	edx, -1985229329			; 89abcdefH
    	jne	SHORT $LL4@right
    	cmp	eax, 19088743				; 01234567H
    	jne	SHORT $LL4@right
    ; Line 18
    	mov	edx, edi
    	mov	eax, ebx
    	pop	edi
    	pop	esi
    	pop	ebx
    ; Line 19
    	add	esp, 8
    	ret	0
    _right	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_left
    _TEXT	SEGMENT
    $T1 = -8						; size = 8
    _period$ = -8						; size = 8
    _left	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 22
    	sub	esp, 8
    	push	ebx
    	xorps	xmm0, xmm0
    ; Line 23
    	mov	ecx, -1985229329			; 89abcdefH
    	push	esi
    ; Line 24
    	movlpd	QWORD PTR _period$[esp+16], xmm0
    	mov	eax, 19088743				; 01234567H
    	mov	edx, DWORD PTR _period$[esp+20]
    	mov	esi, DWORD PTR _period$[esp+16]
    	push	edi
    $LL4@left:
    ; Line 28
    	add	esi, 1
    	adc	edx, 0
    ; Line 30
    	test	eax, eax
    	jg	SHORT $LN6@left
    	jl	SHORT $LN11@left
    	test	ecx, ecx
    	jae	SHORT $LN6@left
    $LN11@left:
    	mov	edi, 27					; 0000001bH
    	xor	ebx, ebx
    	jmp	SHORT $LN7@left
    $LN6@left:
    	xorps	xmm0, xmm0
    	movlpd	QWORD PTR $T1[esp+20], xmm0
    	mov	ebx, DWORD PTR $T1[esp+24]
    	mov	edi, DWORD PTR $T1[esp+20]
    $LN7@left:
    	shld	eax, ecx, 1
    	add	ecx, ecx
    	xor	eax, ebx
    	xor	ecx, edi
    ; Line 34
    	cmp	ecx, -1985229329			; 89abcdefH
    	jne	SHORT $LL4@left
    	cmp	eax, 19088743				; 01234567H
    	jne	SHORT $LL4@left
    ; Line 36
    	pop	edi
    	mov	eax, esi
    	pop	esi
    	pop	ebx
    ; Line 37
    	add	esp, 8
    	ret	0
    _left	ENDP
    _TEXT	ENDS
    END
    The code generated is as bad as in step 4. before!

Example 14

This is a variation of the previous examples 11 and 13, supposed to prod and tickle the optimiser.

Demonstration

  1. Create the text file example14.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __forceinline
    long lfsr32(long argument, long polynomial)
    {
        return (((long long) argument >> 32) & polynomial) ^ (argument << 1);
    }
    
    int main()
    {
        unsigned lfsr = 123456789;
        unsigned period = 0;
    
        do
        {
            period++;
            lfsr = lfsr32(lfsr, 0xC5);
        } while (lfsr != 123456789);
    
        return period;
    }
    Note: the constant 0xC5 represents the primitive polynomial x³²+x⁷+x⁶+x²+x⁰; it gives the 32-bit LFSR its maximum period length of 2³²−1.
  2. Generate the assembly listing example14.asm from the source file example14.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample14.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example14.c
  3. Display the assembly listing example14.asm created in step 2.:

    Type example14.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example14.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lfsr32
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lfsr32
    _TEXT	SEGMENT
    _argument$ = 8						; size = 4
    _polynomial$ = 12					; size = 4
    _lfsr32 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example14.c
    ; Line 5
    	push	esi
    ; Line 6
    	mov	esi, DWORD PTR _argument$[esp]
    	mov	eax, esi
    	mov	eax, DWORD PTR _argument$[esp-4]
    	cdq
    	mov	ecx, edx
    	and	edx, DWORD PTR _polynomial$[esp-4]
    	sar	ecx, 31					; 0000001fH
    	lea	eax, DWORD PTR [esi+esi]
    	add	eax, eax
    	xor	eax, edx
    	pop	esi
    ; Line 7
    	ret	0
    _lfsr32 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example14.c
    ; Line 10
    	push	esi
    	push	edi
    ; Line 11
    	mov	ecx, 123456789				; 075bcd15H
    	mov	eax, 123456789				; 075bcd15H
    ; Line 12
    	xor	edi, edi
    	xor	ecx, ecx
    	npad	7
    $LL4@main:
    ; Line 6
    	mov	eax, ecx
    ; Line 16
    	inc	edi
    	inc	ecx
    ; Line 6
    	cdq
    	add	eax, eax
    	add	ecx, ecx
    	mov	esi, edx
    	and	edx, 197				; 000000c5H
    	xor	eax, edx
    	xor	ecx, edx
    	sar	esi, 31					; 0000001fH
    ; Line 18
    	cmp	ecx, 123456789				; 075bcd15H
    	cmp	eax, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 20
    	mov	eax, ecx
    	mov	eax, edi
    	pop	edi
    	pop	esi
    ; Line 21
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    The registers EDI and ESI are used and clobbered without necessity and reason.
    Especially notice the superfluous MOV and SAR instructions generated for line 6: their result is never used!
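For comparison, here is a minimal sketch of the same LFSR step in portable C, assuming the fixed-width types from <stdint.h>; the name lfsr32_step() is hypothetical and not part of example14.c. It works on unsigned values only, so it relies neither on the right shift of a negative signed value (implementation-defined) nor on the left shift of a negative one (undefined), both of which lfsr32() performs; the feedback mask is derived from the most significant bit instead of from (long long) argument >> 32.

```c
#include <assert.h>   /* assert(), used by the checks after this block */
#include <stdint.h>

/* One step of a 32-bit Galois LFSR which shifts left: when the most
   significant bit is set before the shift, the feedback polynomial
   (without its implicit x^32 term) is XORed into the shifted value.
   Hypothetical helper, not part of example14.c. */
static uint32_t lfsr32_step(uint32_t lfsr, uint32_t polynomial)
{
    uint32_t mask = 0u - (lfsr >> 31);     /* 0 or 0xFFFFFFFF */

    return (uint32_t) (lfsr << 1) ^ (mask & polynomial);
}
```

With the polynomial 0xC5 from example14.c, assert(lfsr32_step(0x80000001u, 0xC5u) == 0xC7u) holds: the shift drops the most significant bit, leaving 0x2, and the feedback XORs 0xC5 into it.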

Example 15

Demonstration

  1. Create the text file example15.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long lcg64() // linear congruential generator
    {
        static unsigned long long z = 1066149217761810ULL;
    
        z = z * 6906969069ULL + 1234567ULL;
    
        return z;
    }
    Note: both constants are from George Marsaglia’s KISS64 pseudo-random number generator; they give the 64-bit LCG its maximum period length of 2⁶⁴.
  2. Generate the assembly listing example15.asm from the source file example15.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample15.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example15.c
  3. Display the assembly listing example15.asm created in step 2.:

    Type example15.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example15.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lcg64
    EXTRN	__allmul:PROC
    _DATA	SEGMENT
    ?z@?1??lcg64@@9@9 DQ 0003c9a83566fa12H			; `lcg64'::`2'::z
    _DATA	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lcg64
    _TEXT	SEGMENT
    _lcg64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example15.c
    ; Line 7
    	push	1
    	push	-1682965523				; 9baffbedH
    	push	DWORD PTR ?z@?1??lcg64@@9@9+4
    	push	DWORD PTR ?z@?1??lcg64@@9@9
    	call	__allmul
    	mov	ecx, -1682965523			; 9baffbedH
    	mov	eax, DWORD PTR ?z@?1??lcg64@@9@9
    	mul	ecx
    	add	edx, DWORD PTR ?z@?1??lcg64@@9@9
    	imul	ecx, DWORD PTR ?z@?1??lcg64@@9@9+4
    	add	eax, 1234567				; 0012d687H
    	mov	DWORD PTR ?z@?1??lcg64@@9@9, eax
    	adc	edx, 0
    	adc	edx, ecx
    	mov	DWORD PTR ?z@?1??lcg64@@9@9+4, edx
    ; Line 10
    	ret	0
    _lcg64	ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, the compiler fails to perform an obvious optimisation: the constant 6906969069 is equal to 2³²+2612001773 (the hexadecimal notation 0x19BAFFBED shows this immediately); the multiplication by 2³² can be replaced by a simple addition, since modulo 2⁶⁴ it just adds the lower half of z to the upper half of the product.

    Note: an optimising compiler should clearly not emit 5 instructions for the call of an external routine to multiply 64-bit values, but emit the 6 instructions which perform this operation inline!
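The replacement can be checked in portable C. The following is a minimal sketch, assuming <stdint.h>; the names mul_direct(), mul_split() and mul_limbs() are hypothetical. mul_limbs() spells out in 32-bit halves what inline code for the I386 needs: one 32×32→64-bit multiplication (MUL) of the lower half of z, one 32-bit multiplication (IMUL) of its upper half, plus the addition of the lower half of z which replaces the multiplication by 2³².

```c
#include <assert.h>   /* assert(), used by the checks after this block */
#include <stdint.h>

/* z * 6906969069 modulo 2^64, written directly. */
static uint64_t mul_direct(uint64_t z)
{
    return z * 6906969069ULL;
}

/* Same product, split as z * 2^32 + z * 2612001773 (0x19BAFFBED). */
static uint64_t mul_split(uint64_t z)
{
    return (z << 32) + z * 2612001773ULL;
}

/* Same product from the 32-bit halves z = hi * 2^32 + lo:
   the low half of the result is the low half of lo * 0x9BAFFBED,
   the high half collects the high half of that product, lo (from the
   2^32 term) and hi * 0x9BAFFBED, all modulo 2^32. */
static uint64_t mul_limbs(uint64_t z)
{
    uint32_t lo = (uint32_t) z;
    uint32_t hi = (uint32_t) (z >> 32);
    uint64_t product = (uint64_t) lo * 0x9BAFFBEDu;             /* MUL  */
    uint32_t high = (uint32_t) (product >> 32)
                  + lo                                           /* ADD  */
                  + (uint32_t) (hi * 0x9BAFFBEDu);               /* IMUL */

    return ((uint64_t) high << 32) | (uint32_t) product;
}
```

For every z, mul_split(z) and mul_limbs(z) equal mul_direct(z); the LCG step of lcg64() then just adds 1234567 to the result.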

Example 16

Demonstration

  1. Create the text file example16.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long msws32(void) // enhanced middle-square generator
    {
        static unsigned long v = 0UL;
        static unsigned long w = 0UL;
    
        w += 0x9E3779B9UL;
        v = (unsigned long) __ull_rshift(__emulu(v, v), 16);
        v += w;
        v = _byteswap_ulong(v);
    
        return v;
    }
    
    unsigned long mswsbw(void) // enhanced middle-square generator
    {
        static unsigned long long v = 0ULL;
        static unsigned long long w = 0ULL;
    
        w += 0x9E3779B97F4A7C15ULL;
        v *= v;
        v += w;
        v = (v << 32) | (v >> 32);
    
        return (unsigned long) v;
    }
    
    unsigned long long msws64(void) // enhanced middle-square generator
    {
        static unsigned long long v = 0ULL;
        static unsigned long long w = 0ULL;
    #ifdef _WIN64
        const unsigned long long x;
        const unsigned long long y = _umul128(v, v, &x);
    
        v = __shiftright128(y, x, 32);
    #else
        v = (__emulu((unsigned long) v, (unsigned long) v) >> 32)
          + (__emulu((unsigned long) v, (unsigned long) (v >> 32)) << 1)
          + (__emulu((unsigned long) (v >> 32), (unsigned long) (v >> 32)) << 32);
    #endif
        w += 0x9E3779B97F4A7C15ULL;
        v += w;
        v = _byteswap_uint64(v);
    
        return v;
    }
    
    int main(void)
    {
    #ifdef _WIN64
        volatile unsigned long long ull = msws64();
    #else
        volatile unsigned long ul = msws32();
    #endif
    }
    Note: the constants 0x9E3779B9 and 0x9E3779B97F4A7C15 are the fractional part of the golden ratio Φ=(√5+1)⁄2, which is equal to the inverse or reciprocal value φ=1⁄Φ=Φ−1=(√5−1)⁄2=0.6180339887498948482…, multiplied by 2³² and 2⁶⁴ respectively – or just 2³²⁄Φ and 2⁶⁴⁄Φ.
  2. Generate the assembly listing example16.asm from the source file example16.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample16.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example16.c
    example16.c(55): warning C4189: 'ul': local variable is initialized but not referenced
  3. Display the assembly listing example16.asm created in step 2.:

    Type example16.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example16.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_msws32
    PUBLIC	_mswsbw
    PUBLIC	_msws64
    PUBLIC	_main
    EXTRN	__allmul:PROC
    _BSS	SEGMENT
    ?v@?1??msws32@@9@9 DQ 01H DUP (?)			; `msws32'::`2'::v
    ?w@?1??msws32@@9@9 DQ 01H DUP (?)			; `msws32'::`2'::w
    ?v@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::v
    ?w@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::w
    ?v@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::v
    ?w@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::w
    _BSS	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_msws32
    _TEXT	SEGMENT
    _msws32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example16.c
    ; Line 9
    	mov	eax, DWORD PTR ?v@?1??msws32@@9@9
    	mul	eax
    	push	esi
    	mov	esi, DWORD PTR ?w@?1??msws32@@9@9
    	mov	edx, DWORD PTR ?w@?1??msws32@@9@9
    	shrd	eax, edx, 16
    	sub	esi, 1640531527				; 61c88647H
    	add	edx, -1640531527			; 9e3779b9H
    	add	eax, esi
    	add	eax, edx
    	mov	DWORD PTR ?w@?1??msws32@@9@9, esi
    	mov	DWORD PTR ?w@?1??msws32@@9@9, edx
    ; Line 11
    	shr	edx, 16					; 00000010H
    	mov	DWORD PTR ?v@?1??msws32@@9@9, eax
    ; Line 13
    	pop	esi
    ; Line 14
    	ret	0
    _msws32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_mswsbw
    _TEXT	SEGMENT
    _mswsbw	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example16.c
    ; Line 21
    	mov	ecx, DWORD PTR ?w@?1??mswsbw@@9@9
    	add	ecx, 2135587861				; 7f4a7c15H
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9, ecx
    	mov	ecx, DWORD PTR ?w@?1??mswsbw@@9@9+4
    	adc	ecx, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9+4, ecx
    ; Line 22
    	mov	ecx, DWORD PTR ?v@?1??mswsbw@@9@9
    	mov	eax, ecx
    	mul	eax
    	imul	ecx, DWORD PTR ?v@?1??mswsbw@@9@9+4
    	add	ecx, ecx
    	add	edx, ecx
    ; Line 23
    	add	eax, DWORD PTR ?w@?1??mswsbw@@9@9
    	adc	edx, DWORD PTR ?w@?1??mswsbw@@9@9+4
    ; Line 22
    	mov	ecx, DWORD PTR ?v@?1??mswsbw@@9@9+4
    	mov	eax, DWORD PTR ?v@?1??mswsbw@@9@9
    	push	esi
    	mov	esi, DWORD PTR ?w@?1??mswsbw@@9@9+4
    	push	edi
    	mov	edi, DWORD PTR ?w@?1??mswsbw@@9@9
    	push	ecx
    	push	eax
    	add	edi, 2135587861				; 7f4a7c15H
    	push	ecx
    	adc	esi, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9, edi
    	push	eax
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9+4, esi
    	call	__allmul
    	add	eax, edi
    ; Line 26
    	pop	edi
    	adc	edx, esi
    	xor	ecx, ecx
    	or	ecx, eax
    	mov	DWORD PTR ?v@?1??mswsbw@@9@9, edx
    	mov	DWORD PTR ?v@?1??mswsbw@@9@9+4, ecx
    	mov	dword PTR ?v@?1??mswsbw@@9@9+4, eax
    	mov	eax, edx
    	pop	esi
    ; Line 27
    	ret	0
    _mswsbw	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_msws64
    _TEXT	SEGMENT
    _msws64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example16.c
    ; Line 43
    	add	DWORD PTR ?w@?1??msws64@@9@9, 2135587861 ; 7f4a7c15H
    	mov	ecx, DWORD PTR ?v@?1??msws64@@9@9+4
    	mov	eax, ecx
    	push	ebx
    	mov	ebx, DWORD PTR ?w@?1??msws64@@9@9+4
    	adc	ebx, -1640531527			; 9e3779b9H
    	push	ebp
    	mul	ecx
    	push	esi
    	mov	esi, DWORD PTR ?v@?1??msws64@@9@9
    	mov	ebp, eax
    	push	edi
    	mov	edi, edx
    	mov	DWORD PTR ?w@?1??msws64@@9@9+4, ebx
    	shld	edi, ebp, 31
    	mov	eax, esi
    	mul	ecx
    	shl	ebp, 31					; 0000001fH
    ; Line 44
    	add	ebp, eax
    	mov	eax, esi
    	adc	edi, ebx
    	shld	edi, ebp, 1
    	mul	esi
    	add	ebp, ebp
    	add	ebp, edx
    	adc	edi, 0
    	add	ebp, DWORD PTR ?w@?1??msws64@@9@9
    ; Line 45
    	bswap	ebp
    	adc	edi, ebx
    	mov	DWORD PTR ?v@?1??msws64@@9@9+4, ebp
    	bswap	edi
    	mov	DWORD PTR ?v@?1??msws64@@9@9, edi
    ; Line 47
    	mov	eax, edi
    	pop	edi
    	pop	esi
    	mov	edx, ebp
    	pop	ebp
    	pop	ebx
    ; Line 39
    	push	ebx
    	mov	eax, DWORD PTR ?v@?1??msws64@@9@9
    	mov	ebx, eax
    	mul	eax
    	mov	ecx, edx
    	mov	eax, ebx
    	mov	ebx, DWORD PTR ?v@?1??msws64@@9@9+4
    	mul	ebx
    	imul	ebx, ebx
    	add	eax, eax
    	adc	edx, edx
    	add	eax, ecx
    	adc	edx, ebx
    ; Line 42
    	mov	ecx, DWORD PTR ?w@?1??msws64@@9@9
    	mov	ebx, DWORD PTR ?w@?1??msws64@@9@9+4
    	add	ecx, 2135587861				; 7f4a7c15H
    	adc	ebx, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??msws64@@9@9, ecx
    	mov	DWORD PTR ?w@?1??msws64@@9@9+4, ebx
    ; Line 44
    	add	eax, ecx
    	adc	edx, ebx
    ; Line 45
    	bswap	eax
    	bswap	edx
    	xchg	eax, edx
    	mov	DWORD PTR ?v@?1??msws64@@9@9, eax
    	mov	DWORD PTR ?v@?1??msws64@@9@9+4, edx
    	pop	ebx
    ; Line 47
    	ret	0
    _msws64	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _ul$ = -4						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example16.c
    ; Line 51
    	push	ecx
    ; Line 9
    	mov	eax, DWORD PTR ?v@?1??msws32@@9@9
    	mul	eax
    ; Line 51
    	push	esi
    ; Line 8
    	mov	esi, DWORD PTR ?w@?1??msws32@@9@9
    	mov	ecx, DWORD PTR ?w@?1??msws32@@9@9
    ; Line 9
    	shrd	eax, edx, 16
    	sub	esi, 1640531527				; 61c88647H
    	sub	ecx, 1640531527				; 61c88647H
    	add	eax, esi
    	add	eax, ecx
    	mov	DWORD PTR ?w@?1??msws32@@9@9, esi
    	mov	DWORD PTR ?w@?1??msws32@@9@9, ecx
    ; Line 11
    	bswap	eax
    	mov	DWORD PTR ?v@?1??msws32@@9@9, eax
    	shr	edx, 16					; 00000010H
    ; Line 55
    	mov	DWORD PTR _ul$[esp+8], eax
    	mov	DWORD PTR _ul$[esp+4], eax
    ; Line 57
    	xor	eax, eax
    	pop	esi
    	pop	ecx
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    While the code generated for the function msws32() is correct, there is no reason to use and clobber register ESI instead of the volatile register ECX!
    Also notice the superfluous SHR instruction: its result is never used.

    While the code generated for the function mswsbw() is correct, an optimising compiler should not emit 7 instructions to call an external routine for squaring a 64-bit value, but emit the 6 instructions which perform this operation inline!
    Also notice the superfluous XOR and OR instructions generated for line 26.

    While the code generated for the function msws64() is correct too, it has 39 instructions and clobbers all registers, but still performs multiple avoidable transfers between them; the optimal code has only 28 instructions and clobbers just 1 register!
    Especially notice the weird way to move the contents of register EBP into register EDI in lines 43 and 44, using two SHLD plus a SHL instruction.

    While the code generated for the function main() is correct, there is no reason to use and clobber register ESI instead of the volatile register ECX!

  4. Generate another assembly listing example16.asm from the source file example16.c created in step 1., now using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample16.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example16.c
    example16.c(34): warning C4132: 'x': const object should be initialized
    example16.c(53): warning C4189: 'ull': local variable is initialized but not referenced
  5. Display the assembly listing example16.asm created in step 4.:

    Type example16.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	msws32
    PUBLIC	mswsbw
    PUBLIC	msws64
    PUBLIC	main
    _BSS	SEGMENT
    ?v@?1??msws32@@9@9 DD 01H DUP (?)			; `msws32'::`2'::v
    ?w@?1??msws32@@9@9 DD 01H DUP (?)			; `msws32'::`2'::w
    ?v@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::v
    ?w@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::w
    ?v@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::v
    ?w@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::w
    _BSS	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	msws32
    _TEXT	SEGMENT
    msws32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example16.c
    ; Line 9
    	mov	eax, DWORD PTR ?v@?1??msws32@@9@9
    	mov	r8d, DWORD PTR ?w@?1??msws32@@9@9
    	mul	rax
    	add	r8d, -1640531527			; 9e3779b9H
    	shr	rax, 16
    	add	eax, r8d
    	mov	DWORD PTR ?w@?1??msws32@@9@9, r8d
    ; Line 11
    	bswap	eax
    	mov	DWORD PTR ?v@?1??msws32@@9@9, eax
    ; Line 14
    	ret	0
    msws32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	mswsbw
    _TEXT	SEGMENT
    mswsbw	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example16.c
    ; Line 21
    	mov	rcx, QWORD PTR ?w@?1??mswsbw@@9@9
    	mov	rax, 7046029254386353131		; 61c8864680b583ebH
    	sub	rcx, rax
    	mov	rcx, -7046029254386353131		; 9e3779b97f4a7c15H
    	add	rcx, QWORD PTR ?w@?1??mswsbw@@9@9
    ; Line 22
    	mov	rax, QWORD PTR ?v@?1??mswsbw@@9@9
    	imul	rax, rax
    	mov	QWORD PTR ?w@?1??mswsbw@@9@9, rcx
    	add	rax, rcx
    ; Line 24
    	rol	rax, 32					; 00000020H
    	mov	QWORD PTR ?v@?1??mswsbw@@9@9, rax
    ; Line 27
    	ret	0
    mswsbw	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	main
    _TEXT	SEGMENT
    x$1 = 8
    ull$ = 8
    main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example16.c
    ; Line 35
    	mov	rax, QWORD PTR ?v@?1??msws64@@9@9
    ; Line 43
    	mov	r8, 7046029254386353131			; 61c8864680b583ebH
    	mov	rcx, QWORD PTR ?w@?1??msws64@@9@9
    	mov	rcx, -7046029254386353131		; 9e3779b97f4a7c15H
    	mul	rax
    	sub	rcx, r8
    	add	rcx, QWORD PTR ?w@?1??mswsbw@@9@9
    	shrd	rax, rdx, 32				; 00000020H
    	mov	QWORD PTR x$1[rsp], rdx
    ; Line 44
    	add	rax, rcx
    	mov	QWORD PTR ?w@?1??msws64@@9@9, rcx
    ; Line 45
    	bswap	rax
    	mov	QWORD PTR ?v@?1??msws64@@9@9, rax
    ; Line 53
    	mov	QWORD PTR ull$[rsp], rax
    ; Line 57
    	xor	eax, eax
    	ret	0
    main	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	msws64
    _TEXT	SEGMENT
    msws64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example16.c
    ; Line 35
    	mov	rax, QWORD PTR ?v@?1??msws64@@9@9
    ; Line 43
    	mov	r8, 7046029254386353131			; 61c8864680b583ebH
    	mov	rcx, QWORD PTR ?w@?1??msws64@@9@9
    	mov	rcx, -7046029254386353131		; 9e3779b97f4a7c15H
    	mul	rax
    	sub	rcx, r8
    	add	rcx, QWORD PTR ?w@?1??mswsbw@@9@9
    	shrd	rax, rdx, 32				; 00000020H
    	mov	QWORD PTR ?w@?1??msws64@@9@9, rcx
    ; Line 44
    	add	rax, rcx
    ; Line 45
    	bswap	rax
    	mov	QWORD PTR ?v@?1??msws64@@9@9, rax
    ; Line 48
    	ret	0
    msws64	ENDP
    _TEXT	ENDS
    END
    While the code generated for the function msws64() is correct, there is no reason to clobber register R8.
    Also notice the superfluous MOV instruction to the superfluous temporary variable x$1 in the function main().
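Two pieces of arithmetic used in example16.c can be checked in portable C. The following minimal sketch assumes <stdint.h>; the names middle_square64() and golden32() are hypothetical. middle_square64() mirrors the #else branch of msws64(): with v = hi·2³²+lo, the 128-bit square is hi²·2⁶⁴ + 2·hi·lo·2³² + lo², so its bits 32 to 95 are the sum of lo²⁄2³² (truncated), 2·hi·lo and hi²·2³², modulo 2⁶⁴. golden32() derives the 32-bit constant 0x9E3779B9 from φ; double precision is sufficient for this constant, but not for the 64-bit one.

```c
#include <assert.h>   /* assert(), used by the checks after this block */
#include <stdint.h>

/* Bits 32 to 95 of the 128-bit square of v, computed from
   32x32->64-bit products like the #else branch of msws64(). */
static uint64_t middle_square64(uint64_t v)
{
    uint64_t lo = (uint32_t) v;
    uint64_t hi = v >> 32;

    return ((lo * lo) >> 32)      /* lo^2 / 2^32, truncated   */
         + ((lo * hi) << 1)       /* 2 * hi * lo, modulo 2^64 */
         + ((hi * hi) << 32);     /* hi^2 * 2^32, modulo 2^64 */
}

/* floor(2^32 * phi) with phi = (sqrt(5) - 1) / 2 = 1 / PHI. */
static uint32_t golden32(void)
{
    const double phi = 0.6180339887498948482;

    return (uint32_t) (4294967296.0 * phi);
}
```

golden32() returns 0x9E3779B9, which is also the upper half of 0x9E3779B97F4A7C15; middle_square64(0xFFFFFFFFFFFFFFFF) returns 0xFFFFFFFE00000000.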

Example 17

Demonstration

  1. Create the text file example17.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long sequence(void)
    {
        static unsigned long long weyl = 0ULL;
    
        weyl += 0x9E3779B97F4A7C15ULL;
    
        return weyl ^ (weyl >> 31);
    }
  2. Generate the assembly listing example17.asm from the source file example17.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample17.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example17.c
  3. Display the assembly listing example17.asm created in step 2.:

    Type example17.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example17.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_sequence
    _BSS	SEGMENT
    ?weyl@?1??sequence@@9@9 DQ 01H DUP (?)			; `sequence'::`2'::weyl
    _BSS	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_sequence
    _TEXT	SEGMENT
    _sequence PROC
    ; File c:\users\stefan\desktop\example17.c
    ; Line 7
    	mov	ecx, DWORD PTR ?weyl@?1??sequence@@9@9+4
    	push	esi
    	mov	esi, DWORD PTR ?weyl@?1??sequence@@9@9
    	add	esi, 2135587861				; 7f4a7c15H
    	mov	eax, esi
    	mov	DWORD PTR ?weyl@?1??sequence@@9@9, esi
    	adc	ecx, -1640531527			; 9e3779b9H
    	mov	edx, ecx
    	mov	DWORD PTR ?weyl@?1??sequence@@9@9+4, ecx
    	mov	eax, DWORD PTR ?weyl@?1??sequence@@9@9
    	mov	edx, DWORD PTR ?weyl@?1??sequence@@9@9+4
    	add	eax, 2135587861				; 7f4a7c15H
    	adc	edx, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?weyl@?1??sequence@@9@9, eax
    	mov	DWORD PTR ?weyl@?1??sequence@@9@9+4, edx
    	mov	ecx, eax
    	shrd	eax, edx, 31
    	xor	eax, ecx
    	mov	ecx, edx
    	shr	edx, 31					; 0000001fH
    ; Line 9
    	xor	eax, esi
    	xor	edx, ecx
    	pop	esi
    ; Line 10
    	ret	0
    _sequence ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, it clobbers register ESI without necessity.
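The 64-bit expression returned by sequence() splits into 32-bit halves as sketched below in portable C, assuming <stdint.h>; the names xorshift31() and xorshift31_halves() are hypothetical. Only the lower half of weyl >> 31 combines bits of both halves (the SHRD instruction), while its upper half is just the upper half of weyl shifted right by 31 (the SHR instruction). Each original half is needed only until its own XOR, so a single scratch register, the volatile ECX, can hold both copies one after the other.

```c
#include <assert.h>   /* assert(), used by the checks after this block */
#include <stdint.h>

/* weyl ^ (weyl >> 31), as returned by sequence(), written directly. */
static uint64_t xorshift31(uint64_t x)
{
    return x ^ (x >> 31);
}

/* The same value from the 32-bit halves x = hi * 2^32 + lo. */
static uint64_t xorshift31_halves(uint64_t x)
{
    uint32_t lo = (uint32_t) x;
    uint32_t hi = (uint32_t) (x >> 32);
    uint32_t shifted_lo = (lo >> 31) | (uint32_t) (hi << 1); /* SHRD lo, hi, 31 */
    uint32_t shifted_hi = hi >> 31;                          /* SHR  hi, 31     */

    return ((uint64_t) (hi ^ shifted_hi) << 32) | (lo ^ shifted_lo);
}
```

For every x, xorshift31_halves(x) equals xorshift31(x); for example both return 0x80000001 for x = 0x80000000.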

Example 18

Demonstration

  1. Create the text file example18.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #ifdef _WIN64
    unsigned long long nearlydivisionless(unsigned long long range,
                                          unsigned long long (*random64)(void))
    {
        unsigned long long value = random64();
        unsigned long long limit;
        unsigned long long high;
        unsigned long long low = _umul128(value, range, &high);
    
        if (low < range)
            for (limit = (0 - range) % range;
                 low < limit;
                 low = _umul128(value, range, &high))
                value = random64();
    
        return high;
    }
    #else
    unsigned long nearlydivisionless(unsigned long range,
                                     unsigned long (*random32)(void))
    {
        unsigned long      value = random32();
        unsigned long      limit;
        unsigned long long multi = __emulu(value, range);
    
        if (range > (unsigned long) multi)
            for (limit = (0 - range) % range;
                 limit > (unsigned long) multi;
                 multi = __emulu(value, range))
                value = random32();
    
        return multi >> 32;
    }
    #endif
    Note: the function nearlydivisionless() returns a uniformly distributed (pseudo-random) value in the interval [0, range); for the discussion of the algorithm see Daniel Lemire’s blog post Fast Bounded Random Numbers on GPUs.
  2. Generate the assembly listing example18.asm from the source file example18.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample18.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example18.c
  3. Display the assembly listing example18.asm created in step 2.:

    Type example18.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	nearlydivisionless
    …
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	nearlydivisionless
    _TEXT	SEGMENT
    range$ = 48
    random64$ = 56
    nearlydivisionless PROC					; COMDAT
    ; File c:\users\stefan\desktop\example18.c
    ; Line 6
    $LN15:
    	mov	QWORD PTR [rsp+16], rbx
    	push	rsi
    	sub	rsp, 32					; 00000020H
    	sub	rsp, 40					; 00000028H
    	mov	rsi, rdx
    	mov	r11, rdx
    	mov	rbx, rcx
    	mov	r10, rcx
    ; Line 7
    	call	rsi
    	call	r11
    	mov	r8, rax
    ; Line 10
    	mov	rax, rbx
    	mov	rax, r10
    	mul	r8
    	mov	rcx, rdx
    	mov	r8, rax
    ; Line 12
    	cmp	rax, rbx
    	cmp	rax, r10
    	jae	SHORT $LN12@nearlydivi
    ; Line 13
    	xor	edx, edx
    	mov	QWORD PTR [rsp+48], rdi
    	mov	rax, rbx
    	mov	rax, r10
    	neg	rax
    	div	rbx
    	div	r10
    	mov	rdi, rdx
    	mov	r9, rdx
    ; Line 14
    	cmp	r8, rdx
    	jae	SHORT $LN11@nearlydivi
    	npad	2
    $LL4@nearlydivi:
    ; Line 16
    	call	rsi
    	call	r11
    	mov	rcx, rax
    	mov	rax, rbx
    	mov	rax, r10
    	mul	rcx
    	cmp	rax, rdi
    	cmp	rax, r9
    	jb	SHORT $LL4@nearlydivi
    ; Line 18
    	mov	rdi, QWORD PTR [rsp+48]
    	mov	rax, rdx
    ; Line 19
    	mov	rbx, QWORD PTR [rsp+56]
    	add	rsp, 40					; 00000028H
    	add	rsp, 32					; 00000020H
    	pop	rsi
    	ret	0
    $LN11@nearlydivi:
    	mov	rdi, QWORD PTR [rsp+48]
    ; Line 18
    	mov	rax, rcx
    ; Line 19
    	mov	rbx, QWORD PTR [rsp+56]
    	add	rsp, 32					; 00000020H
    	pop	rsi
    	ret	0
    $LN12@nearlydivi:
    	mov	rbx, QWORD PTR [rsp+56]
    	mov	rax, rcx
    	add	rsp, 40					; 00000028H
    	add	rsp, 32					; 00000020H
    	pop	rsi
    	ret	0
    nearlydivisionless ENDP
    _TEXT	ENDS
    END
    Instead of using the volatile registers R9, R10 and R11, the generated code unnecessarily clobbers the nonvolatile registers RBX, RDI and RSI, and spends 11 (in words: eleven) superfluous instructions to save and restore them.
    Additionally, notice the 5 instructions emitted for lines 18 and 19 before the label $LN12@nearlydivi:, and the same 5 instructions emitted again immediately after that label: 14 (in words: fourteen) of the 45 instructions in total are superfluous!
  4. Generate another assembly listing example18.asm from the source file example18.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample18.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example18.c
  5. Display the assembly listing example18.asm created in step 4.:

    Type example18.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example18.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_nearlydivisionless
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_nearlydivisionless
    _TEXT	SEGMENT
    _range$ = 8						; size = 4
    _random32$ = 12						; size = 4
    _nearlydivisionless PROC				; COMDAT
    ; File c:\users\stefan\desktop\example18.c
    ; Line 23
    	push	ebx
    ; Line 24
    	mov	ebx, DWORD PTR _random32$[esp]
    	push	ebp
    	push	esi
    	call	ebx
    ; Line 26
    	mov	esi, DWORD PTR _range$[esp+8]
    	mul	esi
    	mov	ebp, eax
    	mov	ecx, edx
    ; Line 28
    	cmp	esi, ebp
    	jbe	SHORT $LN12@nearlydivi
    ; Line 29
    	mov	eax, esi
    	xor	edx, edx
    	neg	eax
    	div	esi
    	push	edi
    	mov	edi, edx
    ; Line 30
    	cmp	edi, ebp
    	jbe	SHORT $LN11@nearlydivi
    $LL4@nearlydivi:
    ; Line 32
    	call	ebx
    	mul	esi
    	cmp	edi, eax
    	ja	SHORT $LL4@nearlydivi
    ; Line 35
    	pop	edi
    	pop	esi
    	pop	ebp
    	mov	eax, edx
    	pop	ebx
    	ret	0
    $LN11@nearlydivi:
    	pop	edi
    	pop	esi
    	pop	ebp
    	mov	eax, ecx
    	pop	ebx
    	ret	0
    $LN12@nearlydivi:
    	pop	esi
    	pop	ebp
    	mov	eax, ecx
    	pop	ebx
    	ret	0
    _nearlydivisionless ENDP
    _TEXT	ENDS
    END
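
The rejection step of nearlydivisionless() can be illustrated with the portable sketch below. It covers only the 32-bit case of Daniel Lemire's method and is not the author's code. It makes two assumptions: it uses standard <stdint.h> types instead of the MSVC intrinsic __emulu(), and it uses a hypothetical 32-bit xorshift generator, xorshift32(), as the random32 callback. The names bounded32() and xorshift32() are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generator for this illustration only: Marsaglia's 32-bit
   xorshift, standing in for the random32() callback of example 18. */
static uint32_t xorshift_state = 2463534242u;

static uint32_t xorshift32(void)
{
    uint32_t x = xorshift_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return xorshift_state = x;
}

/* Sketch of the nearly divisionless method: multiply a 32-bit random value
   by range; the high half of the 64-bit product is the result in [0, range),
   the low half decides whether the sample must be rejected to avoid bias. */
uint32_t bounded32(uint32_t range, uint32_t (*random32)(void))
{
    uint64_t product;
    uint32_t low;

    assert(range != 0);
    product = (uint64_t) random32() * range;
    low = (uint32_t) product;

    if (low < range) {                                    /* rare slow path */
        uint32_t limit = (uint32_t) (0u - range) % range; /* 2**32 mod range */
        while (low < limit) {
            product = (uint64_t) random32() * range;
            low = (uint32_t) product;
        }
    }
    return (uint32_t) (product >> 32);
}
```

For uniformly distributed 32-bit inputs, the slow path with its single division is taken with probability range / 2**32. This is why only one DIV instruction appears in the listings, outside the loop.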

Example 19

Demonstration

  1. Create the text file example19.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long ullmul(unsigned long long p, unsigned long long q)
    {
    #ifdef OPTIMIZE
        if (((unsigned long) p | (unsigned long) q) == 0)
            return 0;
    
        if (((unsigned long) (p >> 32) | (unsigned long) (q >> 32)) == 0)
            return __emulu((unsigned long) p, (unsigned long) q);
    #endif
        return __emulu((unsigned long) p, (unsigned long) q)
             + ((unsigned long long) ((unsigned long) p * (unsigned long) (q >> 32)) << 32)
             + ((unsigned long long) ((unsigned long) q * (unsigned long) (p >> 32)) << 32);
    }
    
    long long llmul(long long p, long long q)
    {
    #ifdef OPTIMIZE
        if (((unsigned long) (p >> 32) | (unsigned long) (q >> 32)) == 0)
            return __emulu((unsigned long) p, (unsigned long) q);
    
        if (((unsigned long) p | (unsigned long) q) == 0)
            return 0;
    #endif
        return __emulu((unsigned long) p, (unsigned long) q)
             + ((unsigned long long) ((unsigned long) p * (unsigned long) (q >> 32)) << 32)
             + ((unsigned long long) ((unsigned long) q * (unsigned long) (p >> 32)) << 32);
    }
  2. Generate the assembly listing example19.asm from the source file example19.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample19.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example19.c
  3. Display the assembly listing example19.asm created in step 2.:

    Type example19.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example19.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_ullmul
    PUBLIC	_llmul
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ullmul
    _TEXT	SEGMENT
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _ullmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example19.c
    ; Line 12
    	mov	eax, DWORD PTR _p$[esp-4]
    	mul	DWORD PTR _q$[esp-4]
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	push	esi
    	mov	esi, DWORD PTR _q$[esp+4]
    	imul	esi, DWORD PTR _p$[esp]
    	add	esi, ecx
    	add	eax, 0
    	adc	edx, esi
    	pop	esi
    	add	edx, ecx
    	mov	ecx, DWORD PTR _q$[esp]
    	imul	ecx, DWORD PTR _p$[esp-4]
    	add	edx, ecx
    ; Line 15
    	ret	0
    _ullmul	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llmul
    _TEXT	SEGMENT
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _llmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example19.c
    ; Line 26
    	mov	eax, DWORD PTR _p$[esp-4]
    	mul	DWORD PTR _q$[esp-4]
    	push	ebp
    	push	edi
    	mov	edi, DWORD PTR _q$[esp+8]
    	mov	ebp, eax
    	mov	ecx, edi
    	imul	edi, DWORD PTR _p$[esp+4]
    	sar	ecx, 31					; 0000001fH
    	mov	ecx, DWORD PTR _p$[esp+8]
    	mov	eax, ecx
    	imul	ecx, DWORD PTR _q$[esp+4]
    	sar	eax, 31					; 0000001fH
    	add	edi, ecx
    	add	ebp, 0
    	mov	eax, ebp
    	adc	edx, edi
    	pop	edi
    	pop	ebp
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	add	edx, ecx
    	mov	ecx, DWORD PTR _q$[esp]
    	imul	ecx, DWORD PTR _p$[esp-4]
    	add	edx, ecx
    ; Line 29
    	ret	0
    _llmul	ENDP
    _TEXT	ENDS
    END
    Especially notice the superfluous arithmetic right shifts by 31 generated for the llmul() routine, and the preceding loads of the registers ECX and EAX: their results are never used!
    The other highlight is the addition of 0, which can’t set the carry flag CF, followed by an add-with-carry ADC instruction, which adds this (always cleared) flag.
  4. Generate another assembly listing example19.asm from the source file example19.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro OPTIMIZE defined on the command line:

    CL.EXE /Bv /c /DOPTIMIZE /Fa /FoNUL: /Gy /Ox /Tcexample19.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example19.c
  5. Display the assembly listing example19.asm created in step 4.:

    Type example19.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example19.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_ullmul
    PUBLIC	_llmul
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ullmul
    _TEXT	SEGMENT
    tv261 = 8						; size = 8
    tv252 = 8						; size = 8
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _ullmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example19.c
    ; Line 4
    	push	esi
    	mov	esi, DWORD PTR _q$[esp]
    	push	edi
    	mov	edi, DWORD PTR _p$[esp+4]
    ; Line 6
    	mov	eax, edi
    	or	eax, esi
    	jne	SHORT $LN2@ullmul
    ; Line 7
    	pop	edi
    	xor	edx, edx
    ; Line 15
    	pop	esi
    	ret	0
    $LN2@ullmul:
    	push	ebx
    ; Line 9
    	mov	ebx, DWORD PTR _q$[esp+12]
    	mov	eax, edi
    	push	ebp
    	mov	ebp, DWORD PTR _p$[esp+16]
    	mov	ecx, ebp
    	or	ecx, ebx
    	mov	DWORD PTR tv252[esp+16], 0
    	mov	DWORD PTR tv261[esp+16], 0
    	jne	SHORT $LN3@ullmul
    ; Line 10
    	pop	ebp
    	pop	ebx
    	pop	edi
    	mul	esi
    ; Line 15
    	pop	esi
    	ret	0
    $LN3@ullmul:
    ; Line 12
    	imul	ebx, edi
    	imul	ebp, esi
    	mul	esi
    	add	ebx, ebp
    	add	eax, 0
    	pop	ebp
    	adc	edx, ebx
    	pop	ebx
    	pop	edi
    ; Line 15
    	pop	esi
    ; Line 6
    	mov	eax, DWORD PTR _p$[esp-4]
    	mov	edx, DWORD PTR _q$[esp-4]
    	or	edx, eax
    	je	SHORT $LN2@ullmul
    ; Line 9
    	mov	ecx, DWORD PTR _q$[esp]
    	mov	edx, DWORD PTR _p$[esp]
    	or	edx, ecx
    	jne	SHORT $LN3@ullmul
    ; Line 10
    	mul	DWORD PTR _q$[esp-4]
    $LN2@ullmul:
    ; Line 15
    	ret	0
    $LN3@ullmul:
    ; Line 12
    	imul	ecx, eax
    	mul	DWORD PTR _q$[esp-4]
    	add	edx, ecx
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	add	edx, ecx
    ; Line 15
    	ret	0
    _ullmul	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llmul
    _TEXT	SEGMENT
    tv249 = 8						; size = 8
    tv240 = 8						; size = 8
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _llmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example19.c
    ; Line 18
    	sub	esp, 8
    	push	ebx
    ; Line 20
    	mov	ebx, DWORD PTR _p$[esp+12]
    	mov	eax, ebx
    	sar	eax, 31					; 0000001fH
    	mov	ecx, ebx
    	push	ebp
    	mov	ebp, DWORD PTR _q$[esp+16]
    	mov	DWORD PTR tv240[esp+20], eax
    	mov	eax, ebp
    	sar	eax, 31					; 0000001fH
    	or	ecx, ebp
    	push	esi
    	mov	esi, DWORD PTR _p$[esp+16]
    	push	edi
    	mov	edi, DWORD PTR _q$[esp+20]
    	mov	DWORD PTR tv249[esp+28], eax
    	mov	eax, esi
    	jne	SHORT $LN2@llmul
    ; Line 21
    	mul	edi
    	pop	edi
    ; Line 29
    	pop	esi
    	pop	ebp
    	pop	ebx
    	add	esp, 8
    	ret	0
    $LN2@llmul:
    ; Line 23
    	or	eax, edi
    	jne	SHORT $LN3@llmul
    ; Line 29
    	pop	edi
    	pop	esi
    	pop	ebp
    	xor	edx, edx
    	pop	ebx
    	add	esp, 8
    	ret	0
    $LN3@llmul:
    ; Line 26
    	mov	eax, esi
    	imul	esi, ebp
    	mul	edi
    	imul	edi, ebx
    	add	esi, edi
    	add	eax, 0
    	pop	edi
    	adc	edx, esi
    ; Line 29
    	pop	esi
    	pop	ebp
    	pop	ebx
    	add	esp, 8
    ; Line 20
    	mov	ecx, DWORD PTR _q$[esp]
    	mov	edx, DWORD PTR _p$[esp]
    	or	edx, ecx
    	jne	SHORT $LN2@ullmul
    ; Line 21
    	mul	DWORD PTR _q$[esp-4]
    ; Line 29
    	ret	0
    $LN2@llmul:
    ; Line 23
    	mov	eax, DWORD PTR _p$[esp-4]
    	mov	edx, DWORD PTR _q$[esp-4]
    	or	edx, eax
    	je	SHORT $LN3@llmul
    ; Line 26
    	imul	ecx, eax
    	mul	DWORD PTR _q$[esp-4]
    	add	edx, ecx
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	add	edx, ecx
    $LN3@llmul:
    ; Line 29
    	ret	0
    _llmul	ENDP
    _TEXT	ENDS
    END
    Instead of loading the low parts of both arguments into the registers EAX and EDX (which return the result) and testing their logical OR for 0, the code clobbers the registers ESI and EDI, which must both be saved and restored.
    Both routines allocate superfluous temporary variables, tv252 and tv261 in ullmul() and tv240 and tv249 in llmul(), and assign values to them that are never read – an advanced technique known as WORN (write once, read never)!
    Again notice the superfluous arithmetic right shifts by 31 generated for the llmul() routine: their results are assigned to the (otherwise unused) temporary variables.
    The other highlight is still the addition of 0, which can’t set the carry flag CF, followed by an add-with-carry ADC instruction, which adds this (always cleared) flag.
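
Both ullmul() and llmul() compute the product modulo 2**64. In two's complement, that product has the same bit pattern for signed and unsigned operands. The portable sketch below illustrates why the sign extension (the SAR by 31) is irrelevant. It rests on two facts:
- only the low 32 bits of each cross product reach the result, because the shift by 32 discards the rest;
- the product of both high halves is shifted out completely.

The names mul64_halves() and smul64_halves() are hypothetical, and the sketch assumes <stdint.h> types instead of __emulu(). Converting back from uint64_t to int64_t is implementation-defined in C, but the usual compilers wrap modulo 2**64.

```c
#include <assert.h>
#include <stdint.h>

/* 64x64 -> 64 bit multiplication (modulo 2**64) from 32-bit halves,
   following the formula of ullmul() in example 19. */
uint64_t mul64_halves(uint64_t p, uint64_t q)
{
    uint32_t p_lo = (uint32_t) p, p_hi = (uint32_t) (p >> 32);
    uint32_t q_lo = (uint32_t) q, q_hi = (uint32_t) (q >> 32);

    /* Fast path, as in the OPTIMIZE variant: both high halves are 0. */
    if ((p_hi | q_hi) == 0)
        return (uint64_t) p_lo * q_lo;

    /* Full 32x32 -> 64 bit product of the low halves (one MUL on x86). */
    uint64_t product = (uint64_t) p_lo * q_lo;

    /* Cross products: the shift keeps only their low 32 bits, added to the
       high half; p_hi * q_hi would be shifted out entirely and is omitted. */
    product += ((uint64_t) p_lo * q_hi) << 32;
    product += ((uint64_t) q_lo * p_hi) << 32;
    return product;
}

/* Signed variant: the conversion to uint64_t yields the two's complement
   bit pattern, so the high halves are extracted without sign extension. */
int64_t smul64_halves(int64_t p, int64_t q)
{
    return (int64_t) mul64_halves((uint64_t) p, (uint64_t) q);
}
```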

Example 20

Demonstration

  1. Create the text file example20.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long __udivmoddi4(unsigned long long numerator,
                                    unsigned long long denominator,
                                    unsigned long long *remainder);
    
    #ifndef ALTERNATE
    long long __absdi3(long long argument)
    {
        long long s = argument >> 63;   // s = argument < 0 ? -1 : 0
        return (argument ^ s) - s;      // negate if argument < 0
    }
    
    long long __divdi3(long long dividend, long long divisor)
    {
        long long r = divisor >> 63;    // r = divisor < 0 ? -1 : 0
        long long s = dividend >> 63;   // s = dividend < 0 ? -1 : 0
        divisor = (divisor ^ r) - r;    // negate if divisor < 0
        dividend = (dividend ^ s) - s;  // negate if dividend < 0
        s ^= r;                         // sign of quotient
                                        // negate if quotient < 0
        return (__udivmoddi4(dividend, divisor, 0) ^ s) - s;
    }
    
    long long __moddi3(long long dividend, long long divisor)
    {
        long long r = divisor >> 63;    // r = divisor < 0 ? -1 : 0
        long long s = dividend >> 63;   // s = dividend < 0 ? -1 : 0
        divisor = (divisor ^ r) - r;    // negate if divisor < 0
        dividend = (dividend ^ s) - s;  // negate if dividend < 0
        __udivmoddi4(dividend, divisor, (unsigned long long *) &r);
        return (r ^ s) - s;             // negate if dividend < 0
    }
    #else
    typedef union _large
    {
        long long ll;
        unsigned long long ull;
        struct
        {
            unsigned long low;
            long high;
        };
    } LARGE;
    
    long long __absdi3(long long argument)
    {
        LARGE value = {argument};
        long long s = (long long) value.high >> 32;
        return (value.ll ^ s) - s;
    }
    
    long long __divdi3(long long numerator, long long denominator)
    {
        LARGE divisor = {denominator};
        LARGE dividend = {numerator};
        long long r = (long long) divisor.high >> 32;
        long long s = (long long) dividend.high >> 32;
        divisor.ll = (divisor.ll ^ r) - r;
        dividend.ll = (dividend.ll ^ s) - s;
        s ^= r;
        return (__udivmoddi4(dividend.ull, divisor.ull, (unsigned long long *) 0) ^ s) - s;
    }
    
    long long __moddi3(long long numerator, long long denominator)
    {
        LARGE divisor = {denominator};
        LARGE dividend = {numerator};
        LARGE remainder;
        long long r = (long long) divisor.high >> 32;
        long long s = (long long) dividend.high >> 32;
        divisor.ll = (divisor.ll ^ r) - r;
        dividend.ll = (dividend.ll ^ s) - s;
        __udivmoddi4(dividend.ull, divisor.ull, &remainder.ull);
        return (remainder.ll ^ s) - s;
    }
    
    long long __muldi3(long long multiplicand, long long multiplier)
    {
        LARGE p = {multiplicand};
        LARGE q = {multiplier};
        LARGE product = {__emulu(p.low, q.low)};
        product.high += p.low * q.high;
        product.high += q.low * p.high;
        return product.ll;
    }
    #endif
  2. Generate the assembly listing example20.asm from the source file example20.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample20.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example20.c
  3. Display the assembly listing example20.asm created in step 2.:

    Type example20.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example20.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	___absdi3
    PUBLIC	___divdi3
    PUBLIC	___moddi3
    EXTRN	___udivmoddi4:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___absdi3
    _TEXT	SEGMENT
    _argument$ = 8						; size = 8
    ___absdi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example20.c
    ; Line 9
    	push	esi
    	push	edi
    ; Line 10
    	mov	edi, DWORD PTR _argument$[esp+8]
    	mov	esi, edi
    	sar	esi, 31					; 0000001fH
    	mov	ecx, edi
    	mov	edx, DWORD PTR _argument$[esp]
    	mov	eax, DWORD PTR _argument$[esp-4]
    	mov	ecx, edx
    	sar	ecx, 31					; 0000001fH
    ; Line 11
    	xor	eax, ecx
    	xor	edx, ecx
    	sub	eax, ecx
    	sbb	edx, ecx
    	mov	eax, esi
    	xor	eax, DWORD PTR _argument$[esp+4]
    	mov	edx, ecx
    	xor	edx, edi
    	sub	eax, esi
    	pop	edi
    	sbb	edx, ecx
    	pop	esi
    ; Line 12
    	ret	0
    ___absdi3 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___divdi3
    _TEXT	SEGMENT
    _s$1$ = -4						; size = 4
    _s$2$ = 8						; size = 4
    _dividend$ = 8						; size = 8
    _divisor$ = 16						; size = 8
    ___divdi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example20.c
    ; Line 15
    	push	ecx
    ; Line 17
    	mov	eax, DWORD PTR _dividend$[esp+4]
    	mov	edx, eax
    	push	ebx
    	push	ebp
    	mov	ebp, DWORD PTR _divisor$[esp+12]
    	mov	ecx, eax
    	push	esi
    	sar	edx, 31					; 0000001fH
    	mov	ebx, ebp
    	sar	ecx, 31					; 0000001fH
    ; Line 19
    	mov	esi, edx
    	xor	esi, DWORD PTR _dividend$[esp+12]
    	push	edi
    	mov	DWORD PTR _s$1$[esp+20], edx
    	mov	edi, ebp
    	mov	edx, ecx
    	sar	edi, 31					; 0000001fH
    	xor	edx, eax
    	sar	ebx, 31					; 0000001fH
    	mov	eax, DWORD PTR _s$1$[esp+20]
    	sub	esi, eax
    	mov	eax, DWORD PTR _divisor$[esp+4]
    	mov	ecx, DWORD PTR _divisor$[esp]
    	cdq
    	xor	ecx, edx
    	xor	eax, edx
    	sub	ecx, edx
    	sbb	eax, edx
    	mov	ebx, edx
    ; Line 22
    	push	0
    	sbb	edx, ecx
    	xor	eax, ebx
    	xor	ecx, edi
    	mov	DWORD PTR _s$1$[esp+24], eax
    	mov	DWORD PTR _s$2$[esp+20], ecx
    	mov	eax, edi
    	mov	ecx, ebx
    	xor	eax, ebp
    	xor	ecx, DWORD PTR _divisor$[esp+20]
    	sub	ecx, ebx
    	sbb	eax, edi
    	push	eax
    	push	ecx
    	push	edx
    	push	esi
    	mov	eax, DWORD PTR _dividend$[esp+16]
    	mov	ecx, DWORD PTR _dividend$[esp+12]
    	cdq
    	xor	ecx, edx
    	xor	eax, edx
    	sub	ecx, edx
    	sbb	eax, edx
    	push	eax
    	push	ecx
    	xor	ebx, edx
    	call	___udivmoddi4
    	xor	eax, ebx
    	xor	edx, ebx
    	sub	eax, ebx
    	sbb	edx, ebx
    	xor	eax, DWORD PTR _s$1$[esp+40]
    	add	esp, 20					; 00000014H
    	xor	edx, DWORD PTR _s$2$[esp+16]
    	sub	eax, DWORD PTR _s$1$[esp+20]
    	sbb	edx, DWORD PTR _s$2$[esp+16]
    	pop	edi
    	pop	esi
    	pop	ebp
    	pop	ebx
    ; Line 23
    	pop	ecx
    	ret	0
    ___divdi3 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___moddi3
    _TEXT	SEGMENT
    _s$1$ = -12						; size = 4
    _r$ = -8						; size = 8
    _dividend$ = 8						; size = 8
    _divisor$ = 16						; size = 8
    ___moddi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example20.c
    ; Line 26
    	sub	esp, 12					; 0000000cH
    	sub	esp, 8
    ; Line 28
    	mov	edx, DWORD PTR _dividend$[esp+12]
    	mov	eax, edx
    	push	ebx
    	mov	ebx, DWORD PTR _divisor$[esp+16]
    	push	ebp
    	push	esi
    	sar	eax, 31					; 0000001fH
    	mov	esi, ebx
    	push	edi
    	mov	DWORD PTR _s$1$[esp+28], eax
    	mov	edi, ebx
    	sar	esi, 31					; 0000001fH
    ; Line 31
    	lea	eax, DWORD PTR _r$[esp+28]
    	lea	eax, DWORD PTR _r$[esp+12]
    	push	eax
    	sar	edi, 31					; 0000001fH
    	mov	ebp, edx
    	sar	ebp, 31					; 0000001fH
    	mov	ecx, edi
    	xor	ecx, DWORD PTR _divisor$[esp+28]
    	mov	eax, esi
    	xor	eax, ebx
    	mov	DWORD PTR _r$[esp+36], esi
    	sub	ecx, edi
    	mov	DWORD PTR _r$[esp+32], edi
    	sbb	eax, esi
    	mov	esi, DWORD PTR _s$1$[esp+32]
    	mov	eax, DWORD PTR _divisor$[esp+12]
    	mov	ecx, DWORD PTR _divisor$[esp+8]
    	cdq
    	xor	ecx, edx
    	xor	eax, edx
    	sub	ecx, edx
    	sbb	eax, edx
    	push	eax
    	push	ecx
    	mov	ecx, esi
    	mov	eax, ebp
    	xor	ecx, DWORD PTR _dividend$[esp+36]
    	xor	eax, edx
    	sub	ecx, esi
    	sbb	eax, ebp
    	mov	eax, DWORD PTR _dividend$[esp+12]
    	mov	ecx, DWORD PTR _dividend$[esp+8]
    	cdq
    	xor	ecx, edx
    	xor	eax, edx
    	sub	ecx, edx
    	sbb	eax, edx
    	mov	ebx, edx
    	push	eax
    	push	ecx
    	call	___udivmoddi4
    	add	esp, 20					; 00000014H
    ; Line 32
    	mov	eax, esi
    	mov	eax, ebx
    	xor	eax, DWORD PTR _r$[esp+28]
    	xor	eax, DWORD PTR _r$[esp+12]
    	mov	edx, ebp
    	mov	edx, ebx
    	xor	edx, DWORD PTR _r$[esp+32]
    	xor	edx, DWORD PTR _r$[esp+16]
    	sub	eax, esi
    	sub	eax, ebx
    	pop	edi
    	pop	esi
    	sbb	edx, ebp
    	sbb	edx, ebx
    	pop	ebp
    	pop	ebx
    ; Line 33
    	add	esp, 12					; 0000000cH
    	add	esp, 8
    	ret	0
    ___moddi3 ENDP
    _TEXT	ENDS
    END
    While the code generated for the function __absdi3() is correct, it has 16 instructions and uses the registers EDI and ESI unnecessarily; the properly optimised code has only 9 instructions and clobbers no registers.
    Especially notice the 2 arithmetic right shifts by 31: they are performed on the same value, but in different registers!

    While the code generated for the function __divdi3() is correct, it has 50 instructions and clobbers all registers, yet still performs multiple avoidable transfers between them. It also uses two superfluous temporary variables, _s$1$ and _s$2$, which even hold the same value; the optimal code has only 30 instructions and clobbers just 1 register!
    Especially notice the repeated arithmetic right shifts by 31: half of them are superfluous, and so are the instructions that load the registers they operate on.

    While the code generated for the function __moddi3() is correct too, it has 51 instructions, clobbers all registers, but still performs multiple avoidable transfers between them, and additionally uses a superfluous temporary variable _s$1$; the properly optimised code has only 34 instructions and clobbers just 1 register!
    Again notice the repeated arithmetic right shifts by 31: half of them are superfluous, and so are the instructions that load the registers they operate on.

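The sign-mask technique of __absdi3(), __divdi3() and __moddi3() can be sketched portably as below. This is an illustration under stated assumptions, not the author's code:
- The mask is computed with a comparison instead of `>> 63`, because a right shift of a negative signed value is implementation-defined in C.
- The conditional negation `(v ^ m) - m` is done in unsigned arithmetic, which avoids signed overflow for INT64_MIN.
- The external __udivmoddi4() is replaced by a hypothetical stand-in, udivmod64(), built on the C operators / and %.
- The names absdi3(), divdi3() and moddi3() are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the external __udivmoddi4() helper:
   unsigned 64-bit division, optionally storing the remainder. */
static uint64_t udivmod64(uint64_t numerator, uint64_t denominator,
                          uint64_t *remainder)
{
    assert(denominator != 0);
    if (remainder != 0)
        *remainder = numerator % denominator;
    return numerator / denominator;
}

/* Sign mask: all ones if x < 0, else 0 (same value as x >> 63 on
   compilers with arithmetic right shift). */
static uint64_t sign_mask(int64_t x)
{
    return 0 - (uint64_t) (x < 0);
}

/* (v ^ m) - m negates v if m is all ones and leaves v unchanged if m is 0. */
static uint64_t cond_negate(uint64_t v, uint64_t m)
{
    return (v ^ m) - m;
}

int64_t absdi3(int64_t argument)
{
    return (int64_t) cond_negate((uint64_t) argument, sign_mask(argument));
}

int64_t divdi3(int64_t dividend, int64_t divisor)
{
    uint64_t r = sign_mask(divisor);
    uint64_t s = sign_mask(dividend);
    uint64_t q = udivmod64(cond_negate((uint64_t) dividend, s),
                           cond_negate((uint64_t) divisor, r), 0);
    return (int64_t) cond_negate(q, s ^ r);   /* sign of quotient: s ^ r */
}

int64_t moddi3(int64_t dividend, int64_t divisor)
{
    uint64_t r = sign_mask(divisor);
    uint64_t s = sign_mask(dividend);
    uint64_t rem;
    udivmod64(cond_negate((uint64_t) dividend, s),
              cond_negate((uint64_t) divisor, r), &rem);
    return (int64_t) cond_negate(rem, s);     /* sign of dividend */
}
```

Each sign mask is computed once and reused. The quotient is negated when the signs of the operands differ, and the remainder takes the sign of the dividend, matching the truncating division of C99.
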
  4. Repeat the previous steps with the alternate implementation; generate another assembly listing example20.asm from the source file example20.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tcexample20.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example20.c
    example20.c(43): warning C4201: nonstandard extension used: nameless struct/union
    example20.c(48): warning C4204: nonstandard extension used: non-constant aggregate initializer
    example20.c(55): warning C4204: nonstandard extension used: non-constant aggregate initializer
    example20.c(56): warning C4204: nonstandard extension used: non-constant aggregate initializer
    example20.c(67): warning C4204: nonstandard extension used: non-constant aggregate initializer
    example20.c(68): warning C4204: nonstandard extension used: non-constant aggregate initializer
    example20.c(80): warning C4204: nonstandard extension used: non-constant aggregate initializer
    example20.c(81): warning C4204: nonstandard extension used: non-constant aggregate initializer
    example20.c(82): warning C4204: nonstandard extension used: non-constant aggregate initializer
  5. Display the assembly listing example20.asm created in step 4.:

    Type example20.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example20.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	___absdi3
    PUBLIC	___divdi3
    PUBLIC	___moddi3
    PUBLIC	___muldi3
    EXTRN	___udivmoddi4:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___absdi3
    _TEXT	SEGMENT
    _argument$ = 8						; size = 8
    ___absdi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example20.c
    ; Line 48
    	mov	eax, DWORD PTR _argument$[esp]
    	cdq
    ; Line 49
    	mov	ecx, edx
    	push	esi
    	mov	esi, ecx
    	mov	eax, ecx
    	xor	eax, DWORD PTR _argument$[esp]
    	sar	esi, 31					; 0000001fH
    	mov	edx, esi
    	xor	edx, DWORD PTR _argument$[esp+4]
    ; Line 50
    	sub	eax, ecx
    	sbb	edx, esi
    	pop	esi
    	mov	ecx, DWORD PTR _argument$[esp-4]
    	xor	ecx, edx
    	xor	eax, edx
    	sub	ecx, edx
    	sbb	eax, edx
    	mov	edx, eax
    	mov	eax, ecx
    ; Line 51
    	ret	0
    ___absdi3 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___divdi3
    _TEXT	SEGMENT
    _s$2$ = -4						; size = 4
    _s$1$ = 8						; size = 4
    _numerator$ = 8						; size = 8
    _denominator$ = 16					; size = 8
    ___divdi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example20.c
    ; Line 54
    	push	ecx
    ; Line 56
    	mov	eax, DWORD PTR _denominator$[esp+4]
    	cdq
    ; Line 57
    	mov	eax, DWORD PTR _numerator$[esp+4]
    	push	ebx
    	push	ebp
    	mov	ebx, edx
    	cdq
    	push	esi
    	push	edi
    ; Line 58
    	mov	eax, edx
    	mov	ebp, ebx
    	sar	edx, 31					; 0000001fH
    	mov	edi, eax
    	xor	edi, DWORD PTR _numerator$[esp+16]
    	mov	esi, edx
    	xor	esi, DWORD PTR _numerator$[esp+20]
    ; Line 62
    	mov	ecx, ebx
    	sar	ebp, 31					; 0000001fH
    	sub	edi, eax
    	push	0
    	sbb	esi, edx
    	xor	ecx, DWORD PTR _denominator$[esp+20]
    	xor	eax, ebx
    	xor	edx, ebp
    	mov	DWORD PTR _s$1$[esp+20], eax
    	mov	eax, ebp
    	xor	eax, DWORD PTR _denominator$[esp+24]
    	sub	ecx, ebx
    	mov	DWORD PTR _s$2$[esp+24], edx
    	sbb	eax, ebp
    	push	eax
    	push	ecx
    	push	esi
    	push	edi
    	call	___udivmoddi4
    	xor	eax, DWORD PTR _s$1$[esp+36]
    	add	esp, 20					; 00000014H
    	xor	edx, DWORD PTR _s$2$[esp+20]
    	sub	eax, DWORD PTR _s$1$[esp+16]
    	sbb	edx, DWORD PTR _s$2$[esp+20]
    	pop	edi
    	pop	esi
    	pop	ebp
    	pop	ebx
    ; Line 63
    	pop	ecx
    	ret	0
    ___divdi3 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___moddi3
    _TEXT	SEGMENT
    _s$2$ = -16						; size = 4
    _s$1$ = -12						; size = 4
    _remainder$ = -8					; size = 8
    _numerator$ = 8						; size = 8
    _denominator$ = 16					; size = 8
    ___moddi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example20.c
    ; Line 66
    	sub	esp, 16					; 00000010H
    ; Line 68
    	mov	eax, DWORD PTR _denominator$[esp+16]
    	cdq
    ; Line 70
    	mov	eax, DWORD PTR _numerator$[esp+16]
    	push	esi
    	mov	esi, edx
    	cdq
    	push	edi
    ; Line 71
    	mov	eax, edx
    	mov	DWORD PTR _s$1$[esp+24], edx
    	sar	eax, 31					; 0000001fH
    	mov	edi, esi
    	mov	DWORD PTR _s$2$[esp+24], eax
    ; Line 74
    	mov	ecx, esi
    	xor	ecx, DWORD PTR _denominator$[esp+20]
    	lea	eax, DWORD PTR _remainder$[esp+24]
    	push	eax
    	sar	edi, 31					; 0000001fH
    	mov	eax, edi
    	xor	eax, DWORD PTR _denominator$[esp+28]
    	sub	ecx, esi
    	mov	esi, DWORD PTR _s$2$[esp+28]
    	mov	esi, DWORD PTR _s$1$[esp+28]
    	sbb	eax, edi
    	push	eax
    	push	ecx
    	mov	ecx, edx
    	mov	eax, esi
    	xor	ecx, DWORD PTR _numerator$[esp+32]
    	xor	eax, DWORD PTR _numerator$[esp+36]
    	sub	ecx, edx
    	sbb	eax, esi
    	push	eax
    	push	ecx
    	call	___udivmoddi4
    ; Line 75
    	mov	eax, DWORD PTR _remainder$[esp+44]
    	add	esp, 20					; 00000014H
    	xor	eax, DWORD PTR _s$1$[esp+24]
    	mov	edx, DWORD PTR _remainder$[esp+28]
    	xor	edx, esi
    	sub	eax, DWORD PTR _s$1$[esp+24]
    	pop	edi
    	sbb	edx, esi
    	pop	esi
    ; Line 76
    	add	esp, 16					; 00000010H
    	ret	0
    ___moddi3 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___muldi3
    _TEXT	SEGMENT
    _multiplicand$ = 8					; size = 8
    _multiplier$ = 16					; size = 8
    ___muldi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example20.c
    ; Line 82
    	mov	eax, DWORD PTR _multiplicand$[esp-4]
    	mul	DWORD PTR _multiplier$[esp-4]
    	mov	ecx, DWORD PTR _multiplier$[esp]
    ; Line 84
    	imul	ecx, DWORD PTR _multiplicand$[esp-4]
    	push	esi
    	mov	esi, DWORD PTR _multiplicand$[esp+4]
    	imul	esi, DWORD PTR _multiplier$[esp]
    	add	edx, ecx
    	add	edx, esi
    ; Line 85
    	pop	esi
    ; Line 86
    	ret	0
    ___muldi3 ENDP
    _TEXT	ENDS
    END
    The code generated for the alternate implementation, which is supposed to prod and tickle the optimiser, is only marginally better: again, all registers are clobbered, superfluous temporary variables holding the same value are used, and superfluous arithmetic right shifts by 31 are emitted.
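The CDQ, XOR, SUB and SBB sequences in the listings above implement the classic sign-folding idiom on 32-bit register pairs. With a mask s that is all ones for a negative operand and zero otherwise, (x ^ s) - s yields the magnitude |x>, and the same expression applied to a result conditionally negates it. The following C sketch restates ___divdi3(), ___moddi3() and ___muldi3() in that spirit. The names divdi3_sketch(), moddi3_sketch(), muldi3_sketch() and sign_mask() are hypothetical. The native 64-bit / and % operators stand in for ___udivmoddi4(). The sign mask is derived from an unsigned shift, because right-shifting a negative signed value is implementation-defined in C. Converting the unsigned results back to int64_t assumes the two's complement wrap-around of current compilers, and INT64_MIN / -1 is not handled.

```c
#include <stdint.h>

/* Sign mask: all ones if the sign bit of v is set, else zero (like CDQ or
   SAR 31 applied to the high dword), computed from an unsigned shift. */
static uint64_t sign_mask(int64_t v)
{
    return UINT64_C(0) - ((uint64_t)v >> 63);
}

/* Hypothetical stand-in for ___divdi3: divide the magnitudes, then give the
   quotient the sign (numerator sign XOR denominator sign). */
int64_t divdi3_sketch(int64_t numerator, int64_t denominator)
{
    uint64_t sn = sign_mask(numerator);
    uint64_t sd = sign_mask(denominator);
    uint64_t un = ((uint64_t)numerator ^ sn) - sn;   /* |numerator| */
    uint64_t ud = ((uint64_t)denominator ^ sd) - sd; /* |denominator| */
    uint64_t q = un / ud;          /* native division replaces ___udivmoddi4 */
    uint64_t s = sn ^ sd;
    return (int64_t)((q ^ s) - s); /* conditional negation of the quotient */
}

/* Hypothetical stand-in for ___moddi3: the remainder takes the sign of the
   numerator only; the denominator's mask is needed just for its magnitude. */
int64_t moddi3_sketch(int64_t numerator, int64_t denominator)
{
    uint64_t sn = sign_mask(numerator);
    uint64_t sd = sign_mask(denominator);
    uint64_t un = ((uint64_t)numerator ^ sn) - sn;
    uint64_t ud = ((uint64_t)denominator ^ sd) - sd;
    uint64_t r = un % ud;
    return (int64_t)((r ^ sn) - sn);
}

/* Hypothetical stand-in for ___muldi3: the product modulo 2^64 needs one
   widening 32x32->64 multiplication of the low halves (MUL) plus two
   truncated cross products (IMUL); the product of the high halves drops out. */
uint64_t muldi3_sketch(uint64_t multiplicand, uint64_t multiplier)
{
    uint32_t a_lo = (uint32_t)multiplicand, a_hi = (uint32_t)(multiplicand >> 32);
    uint32_t b_lo = (uint32_t)multiplier,   b_hi = (uint32_t)(multiplier >> 32);
    uint64_t low = (uint64_t)a_lo * b_lo;                 /* EDX:EAX */
    uint32_t high = (uint32_t)(low >> 32)
                  + (uint32_t)((uint64_t)a_lo * b_hi)     /* cross term 1 */
                  + (uint32_t)((uint64_t)a_hi * b_lo);    /* cross term 2 */
    return ((uint64_t)high << 32) | (uint32_t)low;
}
```

The quotient follows C's truncation towards zero, and the remainder keeps the numerator's sign. For example, divdi3_sketch(-7, 2) yields -3 and moddi3_sketch(-7, 2) yields -1.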

Example 21

Demonstration

  1. Create the text file example21.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    long long llsgn0(long long value)
    {
        return value < 0 ? -1 : 0;
    }
    
    int llsgn1(long long value)
    {
        return value < 0;
    }
    
    int llsgn2(long long value)
    {
        return value >> 63;
    }
    
    int llsgn3(long long value)
    {
        return (value >> 63) != 0;
    }
    
    int llsgn4(long long value)
    {
        return (value & (1LL << 63)) != 0;
    }
  2. Generate the assembly listing example21.asm from the source file example21.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample21.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example21.c
  3. Display the assembly listing example21.asm created in step 2.:

    Type example21.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example21.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_llsgn0
    PUBLIC	_llsgn1
    PUBLIC	_llsgn2
    PUBLIC	_llsgn3
    PUBLIC	_llsgn4
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn0
    _TEXT	SEGMENT
    $T1 = 8							; size = 8
    _value$ = 8						; size = 8
    _llsgn0	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example21.c
    ; Line 5
    	cmp	DWORD PTR _value$[esp], 0
    	jg	SHORT $LN3@llsgn0
    	jl	SHORT $LN5@llsgn0
    	cmp	DWORD PTR _value$[esp-4], 0
    	jae	SHORT $LN3@llsgn0
    $LN5@llsgn0:
    	or	eax, -1
    	or	edx, eax
    ; Line 6
    	ret	0
    $LN3@llsgn0:
    	xorps	xmm0, xmm0
    ; Line 5
    	movlpd	QWORD PTR $T1[esp-4], xmm0
    	mov	eax, DWORD PTR $T1[esp-4]
    	mov	edx, DWORD PTR $T1[esp]
    	mov	eax, DWORD PTR _value$[esp]
    	cdq
    	mov	eax, edx
    ; Line 6
    	ret	0
    _llsgn0	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn1
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn1	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example21.c
    ; Line 10
    	cmp	DWORD PTR _value$[esp], 0
    	jg	SHORT $LN3@llsgn1
    	jl	SHORT $LN5@llsgn1
    	cmp	DWORD PTR _value$[esp-4], 0
    	jae	SHORT $LN3@llsgn1
    $LN5@llsgn1:
    	mov	eax, 1
    ; Line 11
    	ret	0
    $LN3@llsgn1:
    ; Line 10
    	xor	eax, eax
    	mov	eax, DWORD PTR _value$[esp]
    	shr	eax, 31					; 0000001fH
    ; Line 11
    	ret	0
    _llsgn1	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn2
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn2	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example21.c
    ; Line 15
    	mov	ecx, DWORD PTR _value$[esp]
    	mov	eax, ecx
    	sar	eax, 31					; 0000001fH
    	sar	ecx, 31					; 0000001fH
    	mov	eax, DWORD PTR _value$[esp]
    	sar	eax, 31					; 0000001fH
    ; Line 16
    	ret	0
    _llsgn2	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn3
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn3	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example21.c
    ; Line 20
    	mov	ecx, DWORD PTR _value$[esp]
    	xor	eax, eax
    	and	ecx, -2147483648			; 80000000H
    	or	eax, ecx
    	je	SHORT $LN3@llsgn3
    	mov	eax, 1
    ; Line 21
    	ret	0
    $LN3@llsgn3:
    ; Line 20
    	xor	eax, eax
    	mov	eax, DWORD PTR _value$[esp]
    	shr	eax, 31					; 0000001fH
    ; Line 21
    	ret	0
    _llsgn3	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn4
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn4	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example21.c
    ; Line 25
    	cmp	DWORD PTR _value$[esp], 0
    	jg	SHORT $LN3@llsgn4
    	jl	SHORT $LN5@llsgn4
    	cmp	DWORD PTR _value$[esp-4], 0
    	jae	SHORT $LN3@llsgn4
    $LN5@llsgn4:
    	mov	eax, 1
    ; Line 26
    	ret	0
    $LN3@llsgn4:
    ; Line 25
    	xor	eax, eax
    	mov	eax, DWORD PTR _value$[esp]
    	shr	eax, 31					; 0000001fH
    ; Line 26
    	ret	0
    _llsgn4	ENDP
    _TEXT	ENDS
    END
    The optimiser fails to recognise these commonly used expressions for determining the sign of an integer value!

    Especially notice the completely in(s)ane use of the SSE register XMM0 and the temporary variable $T1 instead of just two XOR instructions to zero the registers EAX and EDX in the function llsgn0(), the two SAR instructions in the function llsgn2(), and the completely insane code generated for the function llsgn3()!
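On the I386 alias x86 processor architecture, a long long is held in a register pair such as EDX:EAX, and its sign depends on the high dword alone. Each of the five functions therefore needs little more than one load of the high dword and one shift. The all-ones mask of llsgn0() and llsgn2() needs SAR, and llsgn0() also needs CDQ or MOV to replicate the mask into EDX. The 0/1 result of llsgn1(), llsgn3() and llsgn4() needs SHR. The following C sketch models this reasoning. The names high_dword(), sign_mask64() and sign_bit64() are hypothetical and are not part of example21.c. It uses only unsigned shifts, because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* The high dword: what a 32-bit compiler keeps in EDX, or loads from
   _value$[esp] in the listings above. */
static uint32_t high_dword(int64_t value)
{
    return (uint32_t)((uint64_t)value >> 32);
}

/* Like llsgn0() and llsgn2(): -1 if negative, else 0.
   Models a load of the high dword followed by SAR 31. */
int64_t sign_mask64(int64_t value)
{
    return -(int64_t)(high_dword(value) >> 31);
}

/* Like llsgn1(), llsgn3() and llsgn4(): 1 if negative, else 0.
   Models a load of the high dword followed by SHR 31. */
int sign_bit64(int64_t value)
{
    return (int)(high_dword(value) >> 31);
}
```

The low dword never contributes to the result. For instance, 4294967295 (low dword all ones, high dword zero) is positive.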

Example 22

Demonstration

  1. Create the text file example22.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int absolute(int x)
    {
    #ifdef ALTERNATE
        long long z = x;
        return x - ((x + x) & (z >> 32));
    #else
        return x - ((x + x) & (x >> 31));
    #endif
    }
    
    int maximum(int x, int y)
    {
    #ifdef ALTERNATE
        long long z = (y = x - y);
        x -= y & (z >> 32);
    #else
        y = -y;
        y += x;
        x -= y & (y >> 31);
    #endif
        return x;
    }
    
    int minimum(int x, int y)
    {
    #ifdef ALTERNATE
        long long z = (y -= x);
        x += y & (z >> 32);
    #else
        y -= x;
        x += y & (y >> 31);
    #endif
        return x;
    }
    
    int sign(int x)
    {
    #ifdef ALTERNATE
        long long z = x;
        return z >> 32;
    #else
        return x < 0 ? -1 : 0;
    #endif
    }
  2. Generate the assembly listing example22.asm from the source file example22.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample22.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example22.c
  3. Display the assembly listing example22.asm created in step 2.:

    Type example22.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example22.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_absolute
    PUBLIC	_maximum
    PUBLIC	_minimum
    PUBLIC	_sign
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_absolute
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _absolute PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 9
    	mov	eax, DWORD PTR _x$[esp-4]
    	mov	edx, eax
    	sar	edx, 31					; 0000001fH
    	lea	ecx, DWORD PTR [eax+eax]
    	and	edx, ecx
    	sub	eax, edx
    ; Line 11
    	ret	0
    _absolute ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_maximum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _maximum PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 20
    	mov	eax, DWORD PTR _x$[esp-4]
    	mov	edx, eax
    	sub	edx, DWORD PTR _y$[esp-4]
    ; Line 21
    	mov	ecx, edx
    	sar	ecx, 31					; 0000001fH
    	and	ecx, edx
    	sub	eax, ecx
    ; Line 24
    	ret	0
    _maximum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_minimum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _minimum PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 32
    	mov	ecx, DWORD PTR _y$[esp-4]
    	sub	ecx, DWORD PTR _x$[esp-4]
    ; Line 33
    	mov	eax, ecx
    	sar	eax, 31					; 0000001fH
    	and	eax, ecx
    	add	eax, DWORD PTR _x$[esp-4]
    ; Line 36
    	ret	0
    _minimum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_sign
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _sign	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 44
    	mov	eax, DWORD PTR _x$[esp-4]
    	sar	eax, 31					; 0000001fH
    ; Line 46
    	ret	0
    _sign	ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, the optimiser fails to recognise these well-known, but nowadays superfluous, old-school optimisations of the expressions. It also fails to emit code better suited for current processors (i.e. those introduced in the last 23 years) instead, as shown in step 6. and following below.
  4. Repeat the previous steps with the alternate implementation; generate another assembly listing example22.asm from the source file example22.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tcexample22.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example22.c
  5. Display the assembly listing example22.asm created in step 4.:

    Type example22.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example22.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_absolute
    PUBLIC	_maximum
    PUBLIC	_minimum
    PUBLIC	_sign
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_absolute
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _absolute PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 4
    	push	esi
    ; Line 6
    	mov	eax, DWORD PTR _x$[esp]
    	mov	esi, DWORD PTR _x$[esp]
    	mov	eax, esi
    	cdq
    ; Line 7
    	mov	ecx, edx
    	sar	ecx, 31					; 0000001fH
    	lea	ecx, DWORD PTR [esi+esi]
    	lea	ecx, DWORD PTR [eax+eax]
    	and	edx, ecx
    	sub	eax, edx
    	sub	esi, edx
    	mov	eax, esi
    	pop	esi
    ; Line 11
    	ret	0
    _absolute ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_maximum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _maximum PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 14
    	push	esi
    	push	edi
    ; Line 16
    	mov	edi, DWORD PTR _x$[esp+4]
    	mov	esi, edi
    	sub	esi, DWORD PTR _y$[esp+4]
    	mov	eax, esi
    	mov	ecx, DWORD PTR _x$[esp+4]
    	mov	eax, ecx
    	sub	eax, DWORD PTR _y$[esp+4]
    	cdq
    ; Line 17
    	mov	ecx, edx
    	and	edx, esi
    	and	edx, eax
    	sub	ecx, edx
    	sub	edi, edx
    	sar	ecx, 31					; 0000001fH
    ; Line 23
    	mov	eax, ecx
    	mov	eax, edi
    	pop	edi
    	pop	esi
    ; Line 24
    	ret	0
    _maximum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_minimum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _minimum PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 27
    	push	esi
    ; Line 29
    	mov	esi, DWORD PTR _y$[esp]
    	sub	esi, DWORD PTR _x$[esp]
    	mov	eax, esi
    	mov	eax, DWORD PTR _y$[esp]
    	sub	eax, DWORD PTR _x$[esp]
    	cdq
    ; Line 30
    	mov	ecx, edx
    	and	edx, esi
    	and	edx, eax
    	add	edx, DWORD PTR _x$[esp]
    	sar	ecx, 31					; 0000001fH
    ; Line 35
    	mov	eax, edx
    	pop	esi
    ; Line 36
    	ret	0
    _minimum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_sign
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _sign	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 41
    	mov	eax, DWORD PTR _x$[esp-4]
    	cdq
    ; Line 42
    	mov	eax, edx
    	sar	eax, 31					; 0000001fH
    	mov	eax, edx
    ; Line 46
    	ret	0
    _sign	ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, it uses the registers EDI and ESI unnecessarily, and most of its instructions are superfluous.
    Especially notice the arithmetic right shifts: their results are never used!
  6. Overwrite the text file example22.c created in step 1. with the following content:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int absolute(int x)
    {
        return x < 0 ? -x : x;
    }
    
    int maximum(int x, int y)
    {
        return x > y ? x : y;
    }
    
    int minimum(int x, int y)
    {
        return x < y ? x : y;
    }
    
    int sign(int x)
    {
        return x < 0 ? -1 : 0;
    }
  7. Generate the assembly listing example22.asm from the source file example22.c created in step 6., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample22.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example22.c
  8. Display the assembly listing example22.asm created in step 7.:

    Type example22.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example22.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_absolute
    PUBLIC	_maximum
    PUBLIC	_minimum
    PUBLIC	_sign
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_absolute
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _absolute PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 5
    	mov	eax, DWORD PTR _x$[esp-4]
    	cdq
    	xor	eax, edx
    	sub	eax, edx
    ; Line 6
    	ret	0
    _absolute ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_maximum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _maximum PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 10
    	mov	eax, DWORD PTR _y$[esp-4]
    	cmp	DWORD PTR _x$[esp-4], eax
    	cmovg	eax, DWORD PTR _x$[esp-4]
    ; Line 11
    	ret	0
    _maximum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_minimum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _minimum PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 15
    	mov	eax, DWORD PTR _y$[esp-4]
    	cmp	DWORD PTR _x$[esp-4], eax
    	cmovl	eax, DWORD PTR _x$[esp-4]
    ; Line 16
    	ret	0
    _minimum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_sign
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _sign	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example22.c
    ; Line 20
    	mov	eax, DWORD PTR _x$[esp-4]
    	sar	eax, 31					; 0000001fH
    ; Line 21
    	ret	0
    _sign	ENDP
    _TEXT	ENDS
    END
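The following C sketch checks that the old-school expressions of step 1. and the straightforward ones of step 6. compute the same results. The function names are hypothetical, and this is not the source of example22.c. To stay free of undefined behaviour, it deviates from step 1. in two points:

- abs_old() computes x + x in unsigned arithmetic.
- max_old() and min_old() compute the difference in 64 bits, where it cannot overflow.

In the int version of step 1., x - y and y - x overflow for operands far apart, such as INT_MAX and INT_MIN. That is formally undefined behaviour, and on x86 the wrapped difference has the wrong sign, so the trick then picks the wrong operand. Converting back to int32_t assumes the two's complement behaviour of current compilers.

```c
#include <stdint.h>

/* Step 1. style: branch-free, built on the sign mask (x >> 31). */
int32_t abs_old(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t mask = 0u - (ux >> 31);             /* 0 or 0xFFFFFFFF, like SAR 31 */
    return (int32_t)(ux - ((ux + ux) & mask));   /* x - 2x = -x if negative */
}

int32_t max_old(int32_t x, int32_t y)
{
    int64_t d = (int64_t)x - y;                   /* cannot overflow in 64 bits */
    int64_t mask = -(int64_t)((uint64_t)d >> 63); /* -1 if x < y, else 0 */
    return (int32_t)(x - (d & mask));             /* x - (x - y) = y if x < y */
}

int32_t min_old(int32_t x, int32_t y)
{
    int64_t d = (int64_t)y - x;                   /* cannot overflow in 64 bits */
    int64_t mask = -(int64_t)((uint64_t)d >> 63); /* -1 if y < x, else 0 */
    return (int32_t)(x + (d & mask));             /* x + (y - x) = y if y < x */
}

/* Step 6. style: plain comparisons, which Visual C 2017 turns into
   CDQ/XOR/SUB and CMOVcc, as the listing of step 8. shows.
   Note: abs_new(INT32_MIN) overflows, just like the original. */
int32_t abs_new(int32_t x) { return x < 0 ? -x : x; }
int32_t max_new(int32_t x, int32_t y) { return x > y ? x : y; }
int32_t min_new(int32_t x, int32_t y) { return x < y ? x : y; }
```

Besides being easier to read, the step 6. forms leave the choice of instructions to the compiler. The step 1. forms force a particular, dated sequence on it.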

Example 23

Superfluous instructions generated for the intrinsic function __getcallerseflags() by the Visual C 2017 compiler (and previous versions too):

Demonstration

  1. Create the text file example23.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int main()
    {
        return __getcallerseflags();
    }
  2. Generate the assembly listing example23.asm from the source file example23.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample23.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example23.c
  3. Display the assembly listing example23.asm created in step 2.:

    Type example23.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example23.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    __$Eflags$ = 4						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example23.c
    ; Line 4
    	pushfd
    	push	ebp
    	mov	ebp, esp
    ; Line 5
    	mov	eax, DWORD PTR __$Eflags$[ebp]
    ; Line 6
    	pop	ebp
    	pop	ecx
    	pop	eax
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
  4. Generate the assembly listing example23.asm from the source file example23.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample23.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example23.c
  5. Display the assembly listing example23.asm created in step 4.:

    Type example23.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	main
    …
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	main
    _TEXT	SEGMENT
    __$Eflags$ = 0
    main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example23.c
    ; Line 4
    $LN4:
    	pushfq
    ; Line 5
    	mov	eax, DWORD PTR __$Eflags$[rsp]
    ; Line 6
    	pop	rcx
    	pop	rax
    	ret	0
    main	ENDP
    _TEXT	ENDS
    END

Example 24

Demonstration

  1. Create the text file example24.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #define STRICT
    #define UNICODE
    #define WIN32_LEAN_AND_MEAN
    
    #include <windows.h>
    #include <unknwn.h>
    
    #define IF2CO(class, member, interface)	(&((class *) 0)->member == interface, \
    					 ((class *) (((char *) interface) - (size_t) &(((class *) 0)->member))))
    
    extern	const	GUID	CLSID_NULL;
    
    extern	DWORD	dwCount;
    
    typedef	struct	_CUnknown
    {
    	DWORD		dwCount;
    
    	IUnknown	Unknown;
    } CUnknown;
    
    HRESULT	WINAPI	Unknown_QueryInterface(IUnknown *this, REFIID rIID, VOID **ppv)
    {
    	CUnknown	*that = IF2CO(CUnknown, Unknown, this);
    
    	if (ppv == NULL)
    		return E_POINTER;
    
    	*ppv = NULL;
    
    	if (rIID == NULL)
    		return E_INVALIDARG;
    
    	if (!IsEqualIID(rIID, &IID_IUnknown))
    		return E_NOINTERFACE;
    
    	*ppv = &that->Unknown;
    
    	_InterlockedIncrement(&that->dwCount);
    
    	return S_OK;
    }
    
    DWORD	WINAPI	Unknown_AddRef(IUnknown *this)
    {
    	CUnknown	*that = IF2CO(CUnknown, Unknown, this);
    
    	return _InterlockedIncrement(&that->dwCount);
    }
    
    DWORD	WINAPI	Unknown_Release(IUnknown *this)
    {
    	CUnknown	*that = IF2CO(CUnknown, Unknown, this);
    	DWORD		dw = _InterlockedDecrement(&that->dwCount);
    
    	if (dw != 0L)
    		return dw;
    
    	_InterlockedDecrement(&dwCount);
    
    	CoTaskMemFree(that);
    
    	return 0L;
    }
    
    const	IUnknownVtbl	Unknown_Vtbl = {Unknown_QueryInterface, Unknown_AddRef, Unknown_Release};
    Note: this ANSI C source is a minimal implementation of the IUnknown interface.
  2. Generate the assembly listing example24.asm from the source file example24.c created in step 1., using the Visual C 2010 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1is /Tcexample24.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1
    
    example24.c
  3. Display the assembly listing example24.asm created in step 2.:

    Type example24.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 
    
    	TITLE	C:\Users\Stefan\Desktop\example24.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_Unknown_Release@4
    PUBLIC	_Unknown_AddRef@4
    PUBLIC	_Unknown_QueryInterface@12
    PUBLIC	_Unknown_Vtbl
    
    ;	COMDAT	CONST
    CONST	SEGMENT
    _Unknown_Vtbl DD FLAT:_Unknown_QueryInterface@12
    	DD	FLAT:_Unknown_AddRef@4
    	DD	FLAT:_Unknown_Release@4
    CONST	ENDS
    
    EXTRN	_IID_IUnknown:BYTE
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_Unknown_QueryInterface@12
    _TEXT	SEGMENT
    _this$ = 8						; size = 4
    _rIID$ = 12						; size = 4
    _ppv$ = 16						; size = 4
    _Unknown_QueryInterface@12 PROC				; COMDAT
    ; File c:\users\stefan\desktop\example24.c
    ; Line 26
    	mov	edx, DWORD PTR _this$[esp-4]
    ; Line 28
    	mov	eax, DWORD PTR _ppv$[esp-4]
    	add	edx, -4					; fffffffcH
    	test	eax, eax
    	jne	SHORT $LN3@Unknown_Qu
    ; Line 29
    	mov	eax, -2147467261			; 80004003H
    	jmp	SHORT $LN4@Unknown_Qu
    $LN3@Unknown_Qu:
    ; Line 31
    	and	DWORD PTR [eax], 0
    	push	esi
    ; Line 33
    	mov	esi, DWORD PTR _rIID$[esp]
    	test	esi, esi
    	jne	SHORT $LN2@Unknown_Qu
    ; Line 34
    	mov	eax, -2147024809			; 80070057H
    	jmp	SHORT $LN7@Unknown_Qu
    $LN2@Unknown_Qu:
    	push	ebx
    	push	edi
    ; Line 36
    	push	4
    	pop	ecx
    	xor	ebx, ebx
    	mov	edi, OFFSET _IID_IUnknown
    	repe	cmpsd
    	pop	edi
    	pop	ebx
    	je	SHORT $LN1@Unknown_Qu
    ; Line 37
    	mov	eax, -2147467262			; 80004002H
    	jmp	SHORT $LN7@Unknown_Qu
    $LN1@Unknown_Qu:
    ; Line 39
    	lea	ecx, DWORD PTR [edx+4]
    	mov	DWORD PTR [eax], ecx
    ; Line 41
    	xor	eax, eax
    	inc	eax
    	lock	xadd DWORD PTR [edx], eax
    ; Line 43
    	xor	eax, eax
    $LN7@Unknown_Qu:
    	pop	esi
    $LN4@Unknown_Qu:
    ; Line 44
    	ret	12					; 0000000cH
    _Unknown_QueryInterface@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_Unknown_AddRef@4
    _TEXT	SEGMENT
    _this$ = 8						; size = 4
    _Unknown_AddRef@4 PROC					; COMDAT
    ; Line 48
    	mov	ecx, DWORD PTR _this$[esp-4]
    ; Line 50
    	xor	eax, eax
    	add	ecx, -4					; fffffffcH
    	inc	eax
    	lock	xadd DWORD PTR [ecx], eax
    	inc	eax
    ; Line 51
    	ret	4
    _Unknown_AddRef@4 ENDP
    _TEXT	ENDS
    
    EXTRN	__imp__CoTaskMemFree@4:PROC
    EXTRN	_dwCount:DWORD
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_Unknown_Release@4
    _TEXT	SEGMENT
    _this$ = 8						; size = 4
    _Unknown_Release@4 PROC					; COMDAT
    ; Line 55
    	mov	ecx, DWORD PTR _this$[esp-4]
    	add	ecx, -4					; fffffffcH
    ; Line 56
    	mov	edx, ecx
    	or	eax, -1
    	lock	xadd DWORD PTR [edx], eax
    	dec	eax
    ; Line 59
    	jne	SHORT $LN2@Unknown_Re
    ; Line 61
    	mov	eax, OFFSET _dwCount
    	or	edx, -1
    	lock	xadd DWORD PTR [eax], edx
    ; Line 63
    	push	ecx
    	call	DWORD PTR __imp__CoTaskMemFree@4
    ; Line 65
    	xor	eax, eax
    $LN2@Unknown_Re:
    ; Line 66
    	ret	4
    _Unknown_Release@4 ENDP
    _TEXT	ENDS
    END
    Notice the in(s)ane use of the EBX register around the inlined memcmp() function: EBX is saved, cleared and restored, but never used.
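For reference, here is a minimal C sketch of what the inlined comparison computes. It assumes a hypothetical guid_t type standing in for the Windows GUID structure, so that it compiles without <windows.h>, and the function names are hypothetical too. Comparing two 16-byte interface identifiers byte-wise via memcmp() gives the same result as comparing them as four 32-bit words. That four-word comparison is what PUSH 4 / POP ECX / REPE CMPSD performs, using only ESI, EDI and ECX.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the Windows GUID structure (16 bytes),
   defined here so that the sketch compiles without <windows.h>. */
typedef struct
{
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} guid_t;

_Static_assert(sizeof(guid_t) == 16, "guid_t must occupy 16 bytes");

/* Byte-wise comparison of all 16 bytes, like a memcmp()-based IsEqualIID(). */
int guid_equal_bytes(const guid_t *a, const guid_t *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, sizeof(guid_t)) == 0;
}

/* The same comparison as four 32-bit words, i.e. what "push 4 / pop ecx /
   repe cmpsd" computes; memcpy() extracts the words without aliasing issues. */
int guid_equal_dwords(const guid_t *a, const guid_t *b)
{
    uint32_t wa[4], wb[4];

    assert(a != NULL && b != NULL);
    memcpy(wa, a, sizeof(wa));
    memcpy(wb, b, sizeof(wb));

    for (int i = 0; i < 4; i++)
        if (wa[i] != wb[i])
            return 0;

    return 1;
}
```

With the identifier of IUnknown, {00000000-0000-0000-C000-000000000046}, both functions return 1 for an identical copy and 0 as soon as any byte differs. The assert() calls only document the non-NULL preconditions, which Unknown_QueryInterface() above checks explicitly before the comparison.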

Example 25

Superfluous, unreachable call of the external routine __report_rangecheckfailure() generated by the Visual C 2017 compiler.

Demonstration

  1. Create the text file example25.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2019, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #define MAX_PATH 260
    
    typedef short wchar_t;
    
    unsigned __stdcall GetModuleFileNameA(void *, char *, unsigned);
    
    int main()
    {
        char sz[MAX_PATH];
        unsigned dw = GetModuleFileNameA(0, sz, MAX_PATH);
    
        if (dw < MAX_PATH)
            sz[dw] = '\0';
    }
    
    unsigned __stdcall GetModuleFileNameW(void *, wchar_t *, unsigned);
    
    int wmain()
    {
        wchar_t sz[MAX_PATH];
        unsigned dw = GetModuleFileNameW(0, sz, MAX_PATH);
    
        if (dw < MAX_PATH)
            sz[dw] = L'\0';
    }
  2. Generate the assembly listing example25.asm from the source file example25.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1s /Tcexample25.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example25.c
  3. Display the assembly listing example25.asm created in step 2.:

    Type example25.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example25.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_main
    PUBLIC	_wmain
    EXTRN	___report_rangecheckfailure:PROC
    EXTRN	_GetModuleFileNameA@12:PROC
    EXTRN	_GetModuleFileNameW@12:PROC
    EXTRN	@__security_check_cookie@4:PROC
    EXTRN	___security_cookie:DWORD
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _sz$ = -264						; size = 260
    __$ArrayPad$ = -4					; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example25.c
    ; Line 10
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 264				; 00000108H
    	mov	eax, DWORD PTR ___security_cookie
    	xor	eax, ebp
    	mov	DWORD PTR __$ArrayPad$[ebp], eax
    	push	esi
    ; Line 12
    	mov	esi, 260				; 00000104H
    	lea	eax, DWORD PTR _sz$[ebp]
    	push	esi
    	push	eax
    	push	0
    	call	_GetModuleFileNameA@12
    ; Line 14
    	cmp	eax, esi
    	pop	esi
    	jae	SHORT $LN2@main
    ; Line 15
    	mov	BYTE PTR _sz$[ebp+eax], 0
    $LN2@main:
    ; Line 16
    	mov	ecx, DWORD PTR __$ArrayPad$[ebp]
    	xor	eax, eax
    	xor	ecx, ebp
    	call	@__security_check_cookie@4
    	leave
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_wmain
    _TEXT	SEGMENT
    _sz$ = -524						; size = 520
    __$ArrayPad$ = -4					; size = 4
    _wmain	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example25.c
    ; Line 21
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 524				; 0000020cH
    	mov	eax, DWORD PTR ___security_cookie
    	xor	eax, ebp
    	mov	DWORD PTR __$ArrayPad$[ebp], eax
    	push	esi
    ; Line 23
    	mov	esi, 260				; 00000104H
    	lea	eax, DWORD PTR _sz$[ebp]
    	push	esi
    	push	eax
    	push	0
    	call	_GetModuleFileNameW@12
    ; Line 25
    	cmp	eax, esi
    	pop	esi
    	jae	SHORT $LN2@wmain
    ; Line 26
    	add	eax, eax
    	cmp	eax, 520				; 00000208H
    	jae	SHORT $LN9@wmain
    $LN2@wmain:
    ; Line 27
    	mov	ecx, DWORD PTR __$ArrayPad$[ebp]
    	xor	eax, eax
    	xor	ecx, ebp
    	call	@__security_check_cookie@4
    	leave
    	ret	0
    $LN9@wmain:
    ; Line 26
    	call	___report_rangecheckfailure
    $LN11@wmain:
    $LN8@wmain:
    	int	3
    _wmain	ENDP
    _TEXT	ENDS
    END
    Notice the difference between the single-byte character routine main() and the double-byte character routine wmain(): in the former, the dead conditional assignment of the terminating NUL character is not removed; in the latter, it is removed, but a superfluous range check is inserted instead, with a conditional branch that can never be taken (line 25 already ensures dw < 260, so the byte offset 2 × dw computed in line 26 is at most 518, below 520), plus an unreachable call of the external routine __report_rangecheckfailure()!
  4. Generate the assembly listing example25.asm from the source file example25.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1s /Tcexample25.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example25.c
  5. Display the assembly listing example25.asm created in step 4.:

    Type example25.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	main
    PUBLIC	wmain
    EXTRN	__report_rangecheckfailure:PROC
    EXTRN	GetModuleFileNameA:PROC
    EXTRN	GetModuleFileNameW:PROC
    EXTRN	__GSHandlerCheck:PROC
    EXTRN	__security_check_cookie:PROC
    EXTRN	__security_cookie:QWORD
    …
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	main
    _TEXT	SEGMENT
    sz$ = 32
    __$ArrayPad$ = 304
    main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example25.c
    ; Line 10
    $LN10:
    	sub	rsp, 328				; 00000148H
    	mov	rax, QWORD PTR __security_cookie
    	xor	rax, rsp
    	mov	QWORD PTR __$ArrayPad$[rsp], rax
    ; Line 12
    	mov	r8d, 260				; 00000104H
    	lea	rdx, QWORD PTR sz$[rsp]
    	xor	ecx, ecx
    	call	GetModuleFileNameA
    ; Line 14
    	cmp	eax, 260				; 00000104H
    	jae	SHORT $LN2@main
    ; Line 15
    	mov	eax, eax
    	cmp	rax, 260				; 00000104H
    	jae	SHORT $LN9@main
    	mov	BYTE PTR sz$[rsp+rax], 0
    $LN2@main:
    ; Line 16
    	xor	eax, eax
    	mov	rcx, QWORD PTR __$ArrayPad$[rsp]
    	xor	rcx, rsp
    	call	__security_check_cookie
    	add	rsp, 328				; 00000148H
    	ret	0
    $LN9@main:
    ; Line 15
    	call	__report_rangecheckfailure
    	int	3
    $LN8@main:
    main	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	wmain
    _TEXT	SEGMENT
    sz$ = 32
    __$ArrayPad$ = 560
    wmain	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example25.c
    ; Line 21
    $LN11:
    	sub	rsp, 584				; 00000248H
    	mov	rax, QWORD PTR __security_cookie
    	xor	rax, rsp
    	mov	QWORD PTR __$ArrayPad$[rsp], rax
    ; Line 23
    	mov	r8d, 260				; 00000104H
    	lea	rdx, QWORD PTR sz$[rsp]
    	xor	ecx, ecx
    	call	GetModuleFileNameW
    ; Line 25
    	cmp	eax, 260				; 00000104H
    	jae	SHORT $LN2@wmain
    ; Line 26
    	mov	eax, eax
    	add	rax, rax
    	cmp	rax, 520				; 00000208H
    	jae	SHORT $LN9@wmain
    $LN2@wmain:
    ; Line 27
    	xor	eax, eax
    	mov	rcx, QWORD PTR __$ArrayPad$[rsp]
    	xor	rcx, rsp
    	call	__security_check_cookie
    	add	rsp, 584				; 00000248H
    	ret	0
    $LN9@wmain:
    ; Line 26
    	call	__report_rangecheckfailure
    	int	3
    $LN8@wmain:
    wmain	ENDP
    _TEXT	ENDS
    END
    Notice the superfluous range checks with conditional branches that can never be taken, plus the unreachable calls of the external routine __report_rangecheckfailure()!
    Also notice that the conditional assignment of the terminating NUL character is not removed in the single-byte character routine main().
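The following C sketch mirrors the range checks the compiler inserted. It walks every value of dw that passes the guard dw < MAX_PATH and confirms that the check always passes, so the branch to __report_rangecheckfailure() is dead. The function names are hypothetical, and the sketch treats the element size as a parameter. That way it covers both the wide-character check in wmain() (element size 2: ADD EAX, EAX / CMP EAX, 520) and the check in main() from the x64 listing (element size 1: CMP RAX, 260).

```c
#include <assert.h>

#define MAX_PATH 260

/* Mirrors the compiler-inserted range check: does the byte offset of
   element dw lie inside an array of MAX_PATH elements of elem_size bytes? */
int offset_in_range(unsigned dw, unsigned elem_size)
{
    return dw * elem_size < elem_size * MAX_PATH;
}

/* Walks every value accepted by the guard "dw < MAX_PATH" and asserts
   that the inserted range check passes as well, i.e. that the branch to
   __report_rangecheckfailure() can never be taken. */
void verify_range_check_redundant(unsigned elem_size)
{
    for (unsigned dw = 0; dw < MAX_PATH; dw++)
        assert(offset_in_range(dw, elem_size));
}
```

Both verify_range_check_redundant(1) and verify_range_check_redundant(2) return normally. The first value that would fail the range check is dw = 260, and the guard in the source already rejects it.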

Contact

If you miss anything here, have additions, comments, corrections, criticism or questions, want to give feedback, hints or tips, report broken links, bugs, errors, inaccuracies, omissions, vulnerabilities or weaknesses, …:
don’t hesitate to contact me and feel free to ask, comment, criticise, flame, notify or report!

Notes: I dislike HTML (and even weirder formats) in email; I prefer to receive plain text.
I also expect to see your full (real) name as sender, not your nickname!
Emails in weird formats and without a proper sender name are likely to be discarded.
I abhor top-posting and expect inline quotes in replies.

Terms and Conditions

By using this site, you signify your agreement to these terms and conditions. If you do not agree to these terms and conditions, do not use this site!

Data Protection Declaration

This web page records no data and sets no cookies.

The service provider for *.homepage.t-online.de, Deutsche Telekom AG,


Copyright © 1995–2019 • Stefan Kanthak • <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>