
Me, myself & IT

Optimising Microsoft® Visual C compilers

Purpose

Demonstrate poor and unoptimised or wrong code generation of Microsoft’s (not quite so) optimising Visual C compilers, with (currently) 19 examples (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18), plus a side note on the implementation of the arithmetic division and multiplication routines for 64-bit operands on the I386 alias x86 processor architecture.

The most hilarious examples are 0 and 1, the bug is shown in example 9, while the examples 5 to 7, 11, 14 and 15 present the worst cases.

Side note

On the I386 alias x86 processor architecture, 64-bit arithmetic division and multiplication operations are implemented via calls of several (almost) undocumented helper routines: _alldiv(), _allrem() and _alldvrm() for division of two signed 64-bit operands, returning a signed 64-bit quotient, remainder or both, _aulldiv(), _aullrem() and _aulldvrm() for division of two unsigned 64-bit operands, returning an unsigned 64-bit quotient, remainder or both, plus _allmul() for multiplication of two signed as well as unsigned 64-bit operands, returning the (un)signed product modulo 2^64.
Additionally, 64-bit shift operations are implemented via calls of some more undocumented helper routines: _allshl() for both a signed or an unsigned 64-bit operand, _allshr() for a signed 64-bit operand, plus _aullshr() for an unsigned 64-bit operand.

Note: all helper routines use non-standard calling or naming convention, none of them can be called from C/C++ code!
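What a 64-bit left shift has to compute on a processor with 32-bit registers can be sketched in C. This is only an illustration, not the code of the helper routine; the name shl64() and the restriction to shift counts below 64 are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

// Shifts the 64-bit value hi:lo left by count bits using 32-bit operations only.
// Assumption for this sketch: count is in the range 0 to 63.
uint64_t shl64(uint32_t hi, uint32_t lo, unsigned count)
{
    assert(count < 64);

    if (count >= 32) {
        // the low half moves completely into the high half
        hi = lo << (count - 32);
        lo = 0;
    } else if (count > 0) {
        // corresponds to the instruction pair SHLD and SHL
        hi = (hi << count) | (lo >> (32 - count));
        lo <<= count;
    }

    return ((uint64_t) hi << 32) | lo;
}
```

The case distinction avoids 32-bit shifts by 32 or more bits, which are undefined in C and which the processor performs modulo 32.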

Especially the implementation of the division routines (albeit written in assembler) is very poor: they are about 10× to 15× slower than native 64-bit division operations on the same processor, and about 5× to 10× slower than properly optimised code.

Note: according to comments in their source code, the initial version was written for 32-bit operands on 16-bit Intel processors on November 29, 1983; they were modified for 64-bit operands on 32-bit Intel processors on November 19, 1993, without taking advantage of the 32-bit processor’s (introduced October 1985) new capabilities: the loop with SHR and RCR instructions, which shifts the operands by just one bit per pass, was not replaced with a BSR followed by two pairs of SHLD and SHL or SHRD and SHR instructions to shift the operands in one go.
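The idea of shifting the operands in one go instead of one bit per pass can be illustrated in C. The following is a sketch and not the code of any library measured here; it follows a scheme described in Henry S. Warren's book Hacker's Delight. The names udiv64(), leading_zeros32() and div64by32() are made up for the example, and div64by32() merely stands in for a single DIV instruction, which portable C can only spell with the / operator.

```c
#include <assert.h>
#include <stdint.h>

// Number of leading zero bits of a non-zero 32-bit value;
// on x86 this equals 31 minus the result of BSR.
static unsigned leading_zeros32(uint32_t x)
{
    unsigned n = 0;
    if (x <= 0x0000FFFFu) { n += 16; x <<= 16; }
    if (x <= 0x00FFFFFFu) { n +=  8; x <<=  8; }
    if (x <= 0x0FFFFFFFu) { n +=  4; x <<=  4; }
    if (x <= 0x3FFFFFFFu) { n +=  2; x <<=  2; }
    if (x <= 0x7FFFFFFFu) { n +=  1; }
    return n;
}

// Stand-in for one DIV instruction: 64-bit dividend, 32-bit divisor;
// the caller guarantees that the quotient fits in 32 bits.
static uint32_t div64by32(uint64_t dividend, uint32_t divisor)
{
    return (uint32_t) (dividend / divisor);
}

uint64_t udiv64(uint64_t u, uint64_t v)
{
    assert(v != 0);
    uint32_t v_hi = (uint32_t) (v >> 32);

    if (v_hi == 0) {
        // divisor fits in 32 bits: schoolbook division with two steps
        uint32_t d  = (uint32_t) v;
        uint32_t q1 = (uint32_t) (u >> 32) / d;
        uint32_t r1 = (uint32_t) (u >> 32) % d;
        uint32_t q0 = div64by32(((uint64_t) r1 << 32) | (uint32_t) u, d);
        return ((uint64_t) q1 << 32) | q0;
    }

    unsigned n  = leading_zeros32(v_hi);         // shift count found in one step
    uint32_t v1 = (uint32_t) ((v << n) >> 32);   // upper half of the normalised divisor
    uint32_t q1 = div64by32(u >> 1, v1);         // estimate from one division
    uint64_t q  = ((uint64_t) q1 << n) >> 31;    // undo normalisation and halving
    if (q != 0)
        q--;                                     // now correct or 1 too small
    if (u - q * v >= v)
        q++;                                     // final correction
    return q;
}
```

The divisor is normalised with a single shift by the number of leading zeros, so the cost no longer depends on how many bits the operands have to be moved.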

Measured on an Intel® processor of the Core2 family running under Windows® PE, dividing 10 billion pairs of 64-bit pseudo-random numbers produced by 5 different (deterministic random bit) generators, _aulldiv() and _aullrem() use from 130 to 144 processor clock cycles per call; the routines provided with my own NOMSVCRT.LIB use from 24 to 35 processor clock cycles per call, while the native 64-bit machine instructions use from 10 to 16 processor clock cycles per operation.

For comparison: the (corresponding) __divdi3(), __moddi3(), __udivdi3() and __umoddi3() routines from the builtins library of LLVM’s compiler-rt runtime libraries, originally written in December 2008 by Apple’s Stephen Canon, adapted for the Visual C compiler and improved by me, use from 34 to 50 processor clock cycles per call.

Caveat: the packages of LLVM for Windows do not ship these fast __divdi3(), __moddi3(), __udivdi3() and __umoddi3() routines written in assembler, but (slower and bigger) routines written in C instead.

Note: even this C implementation is faster than Microsoft’s assembler implementation!

Caveat: the _lldiv(), _llrem(), _ulldiv() and _ullrem() routines published by AMD® in their Software Optimization Guide for AMD Family 15h Processors, Publication No. 47414, Revision 3.06, January 2012, Software Optimization Guide for AMD Family 10h and 12h Processors, Publication No. 40546, Revision 3.13, February 2011, Software Optimization Guide for AMD Family 10h and 12h Processors, Publication No. 40546, Revision 3.10, February 2009, Software Optimization Guide for AMD64 Processors, Publication No. 25112, Revision 3.06, September 2005, Software Optimization Guide for AMD Athlon 64 and AMD Opteron Processors, Publication No. 25112, Revision 3.04, March 2004, Software Optimization Guide for AMD Athlon 64 and AMD Opteron Processors, Publication No. 25112, Revision 3.03, September 2003, AMD Athlon Processor x86 Code Optimization Guide, Publication No. 22007, Revision K, February 2002, have bugs and return wrong results; for example, unsigned division of 18446744073709551615÷4294967299 yields the quotient 4294967294 instead of 4294967293 (in other notation (2^64−1)÷(2^32+3)=2^32−2 instead of 2^32−3), and the remainder 18446744069414584325 instead of 8.
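The numbers of this example can be checked with a few lines of C. The sketch below uses the hypothetical helper names identity_holds() and is_correct(); it shows that the wrong pair still satisfies quotient × divisor + remainder = dividend modulo 2^64, which means that the quotient is 1 too large and the remainder has wrapped around.

```c
#include <assert.h>
#include <stdint.h>

// True if q * d + r equals n modulo 2^64: necessary, but not sufficient.
int identity_holds(uint64_t n, uint64_t d, uint64_t q, uint64_t r)
{
    return q * d + r == n;
}

// True only for the result of the native 64-bit division operators.
int is_correct(uint64_t n, uint64_t d, uint64_t q, uint64_t r)
{
    assert(d != 0);
    return q == n / d && r == n % d;
}
```

With n = 18446744073709551615 and d = 4294967299, the pair (4294967293, 8) passes both checks, while the pair (4294967294, 18446744069414584325) passes only the first one, since its remainder is not smaller than the divisor.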

Execution times

Measurements are performed using 5 runs of 64-bit pseudo-random numbers, each produced by a different (deterministic random bit) generator, with 1 billion divisions per run returning the quotient and 1 billion divisions per run returning the remainder, totalling 10 billion divisions.
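The structure of such a measurement run can be sketched as follows. This is not the program used for the measurements: the generator (a xorshift generator after Marsaglia), the names next_random() and run_divisions(), and the use of clock() instead of the processor's cycle counter are all assumptions made to keep the example portable.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

// Hypothetical deterministic generator (64-bit xorshift);
// it never returns 0 when started with a non-zero state.
static uint64_t next_random(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

// Divides count pairs of pseudo-random numbers and returns the sum of the
// quotients, so the divisions cannot be optimised away; the elapsed
// processor time in seconds is stored in *seconds.
uint64_t run_divisions(uint64_t seed, uint32_t count, double *seconds)
{
    assert(seed != 0);

    uint64_t state = seed;
    uint64_t checksum = 0;
    clock_t start = clock();

    for (uint32_t i = 0; i < count; i++) {
        uint64_t dividend = next_random(&state);
        uint64_t divisor  = next_random(&state);   // non-zero, see above
        checksum += dividend / divisor;
    }

    *seconds = (double) (clock() - start) / CLOCKS_PER_SEC;
    return checksum;
}
```

Since the generator is deterministic, two runs with the same seed divide the same pairs and return the same checksum, which makes results from different routines comparable.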

The table shows the execution times of 64-bit division routines from different libraries on several processors, in average ⁄ minimum – maximum processor clock cycles per run for a call, as well as their code sizes in bytes and number of instructions; the upper half for the routines written in assembler, the lower half for the native 64-bit hardware and the routines written in C.

                     NOMSVCRT.LIB                      LLVM Compiler-RT                  Microsoft
                     _aulldiv()       _aullrem()       _aulldiv()       _aullrem()       _aulldiv()       _aullrem()
Instructions         42               42               52 [66]          53 [71]          42               43
Bytes                105              119              130 [157]        137 [172]        102              115
AMD® Ryzen 7 2700X   10 ⁄ 8 – 13      12 ⁄ 9 – 14      16 ⁄ 10 – 18     16 ⁄ 10 – 18     58 ⁄ 53 – 61     58 ⁄ 54 – 60
Intel Core i5-7400   18 ⁄ 10 – 37     20 ⁄ 13 – 38     24 ⁄ 13 – 45     24 ⁄ 13 – 45     136 ⁄ 129 – 141  140 ⁄ 132 – 148
Intel Core i5-6600   20 ⁄ 11 – 41     23 ⁄ 14 – 42     27 ⁄ 15 – 50     27 ⁄ 14 – 50     140 ⁄ 122 – 149  145 ⁄ 126 – 151
Intel Core i5-4670   19 ⁄ 8 – 36      22 ⁄ 12 – 37     28 ⁄ 19 – 47     27 ⁄ 18 – 47     131 ⁄ 115 – 142  134 ⁄ 114 – 147
Intel Core i5-3550   ⁄ 15 – 20        ⁄ 20 – 26        ⁄ 19 – 26        ⁄ 19 – 25        ⁄ 114 – 136      ⁄ 118 – 140
Intel Core i3-2328M  ⁄ 21 – 22        ⁄ 24 – 26        ⁄ 23 – 35        ⁄ 22 – 35        ⁄ 129 – 158      ⁄ 142 – 164
Intel® Core2 P8700   21 ⁄ 17 – 25     26 ⁄ 23 – 31     33 ⁄ 28 – 40     34 ⁄ 28 – 41     117 ⁄ 114 – 121  122 ⁄ 117 – 125
Intel® Core2 E8500   ⁄ 24 – 35        ⁄ 33 – 35        ⁄ 34 – 50        ⁄ 34 – 48        ⁄ 130 – 132      ⁄ 134 – 144

Note: the values in brackets are for the original, not improved __udivdi3() and __umoddi3() routines written in assembler.

                     Native                            LLVM Compiler-RT                  Microsoft
                     DIV              REM              __udivdi3()      __umoddi3()      __udivdi3()      __umoddi3()
Instructions         1                1                8 + 254          33 + 254         14 + 264         17 + 264
Bytes                3                3                27 + 732         79 + 732         30 + 631         38 + 631
AMD® Ryzen 7 2700X   7 ⁄ 5 – 9        7 ⁄ 5 – 9        51 ⁄ 35 – 63     55 ⁄ 40 – 63     60 ⁄ 36 – 74     60 ⁄ 39 – 77
Intel Core i5-7400   18 ⁄ 16 – 19     18 ⁄ 16 – 20     45 ⁄ 30 – 52     51 ⁄ 36 – 59     58 ⁄ 33 – 72     58 ⁄ 33 – 72
Intel Core i5-6600   20 ⁄ 18 – 21     20 ⁄ 18 – 22     49 ⁄ 33 – 58     56 ⁄ 39 – 65     64 ⁄ 37 – 80     63 ⁄ 37 – 79
Intel Core i5-4670   21 ⁄ 20 – 24     22 ⁄ 20 – 24     47 ⁄ 36 – 54     56 ⁄ 43 – 63     60 ⁄ 37 – 73     60 ⁄ 39 – 70
Intel Core i5-3550   ⁄ 20 – 24        ⁄ 20 – 25        ⁄ 33 – 53        ⁄ 41 – 61
Intel Core i3-2328M  ⁄ 21 – 26        ⁄ 23 – 25        ⁄ 45 – 65        ⁄ 56 – 74
Intel® Core2 P8700   16 ⁄ 10 – 20     17 ⁄ 10 – 20     58 ⁄ 47 – 68     70 ⁄ 55 – 80     68 ⁄ 48 – 79     71 ⁄ 51 – 83
Intel® Core2 E8500

Note: when optimising the __udivmoddi4() routine, which is called from both __udivdi3() and __umoddi3(), for speed, Microsoft’s Visual C 2010 SP1 compiler emits 10 instructions more than LLVM’s clang compiler, yet its code is 101 bytes smaller!

Example 0

According to their documentation on MSDN, the macros Int32x32To64 and UInt32x32To64 defined in the header file WINNT.H of the Windows SDK (are supposed to) generate just a single multiply instruction:
Multiplies two signed 32-bit integers, returning a signed 64-bit integer result. The function performs optimally on 32-bit Windows.

This function is implemented on all platforms by optimal inline code: a single multiply instruction that returns a 64-bit result.
Multiplies two unsigned 32-bit integers, returning an unsigned 64-bit integer result. The function performs optimally on 32-bit Windows.

This function is implemented on all platforms by optimal inline code: a single multiply instruction that returns a 64-bit result.
Contrary to these advertisements, the 32-bit Visual C compilers generate a call to the external routine _allmul() instead of the single multiply instruction!

Note: _allmul() is an undocumented helper routine for the 32-bit compiler which multiplies two 64-bit integers, similar to the (sort of documented) _alldiv() and _aulldiv() helper routines.
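What a routine for the multiplication of two 64-bit integers has to compute with 32-bit operations can be sketched in C; this is an illustration only, not the code of _allmul(), and the names mul64_from_32() and check_mul() are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

// Computes the product of a and b modulo 2^64 from 32-bit halves.
uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
    uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

    // one widening 32 x 32 -> 64 bit multiplication for the low halves
    uint64_t low = (uint64_t) a_lo * b_lo;

    // the cross products only contribute their low 32 bits
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    return low + ((uint64_t) cross << 32);
}

// Usage: compares the sketch against the native 64-bit multiplication.
void check_mul(uint64_t a, uint64_t b)
{
    assert(mul64_from_32(a, b) == a * b);
}
```

When both operands are known to be zero-extended 32-bit values, the cross products vanish and only the single widening multiplication remains, which is what the quoted documentation promises.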

Demonstration

  1. Create the text file example0.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2004-2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #define Int32x32To64(a, b)  ((long long)(((long long)((long)(a))) * ((long)(b))))
    #define UInt32x32To64(a, b) ((unsigned long long)(((unsigned long long)((unsigned int)(a))) * ((unsigned int)(b))))
    
    int main(int argc)
    {
        long long x = argc * -argc;
        long long y = Int32x32To64(argc, -argc);
        long long z = UInt32x32To64(argc, -argc);
    }
    
  2. Generate the assembly listing example0.asm from the source file example0.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Tcexample0.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example0.c
    
  3. Display the assembly listing example0.asm created in step 2.:

    Type example0.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example0.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_main
    EXTRN	__allmul:PROC
    
    ; Function compile flags: /Odtp
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _z$ = -24						; size = 8
    _y$ = -16						; size = 8
    _x$ = -8						; size = 8
    _argc$ = 8						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example0.c
    ; Line 7
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 24					; 00000018H
    	push	esi
    ; Line 8
    	mov	eax, DWORD PTR _argc$[ebp]
    	neg	eax
    	imul	DWORD PTR _argc$[ebp]
    	imul	eax, DWORD PTR _argc$[ebp]
    	cdq
    	mov	DWORD PTR _x$[ebp], eax
    	mov	DWORD PTR _x$[ebp+4], edx
    ; Line 9
    	mov	eax, DWORD PTR _argc$[ebp]
    	cdq
    	mov	ecx, eax
    	mov	esi, edx
    	mov	eax, DWORD PTR _argc$[ebp]
    	neg	eax
    	imul	DWORD PTR _argc$[ebp]
    	cdq
    	push	edx
    	push	eax
    	push	esi
    	push	ecx
    	call	__allmul
    	mov	DWORD PTR _y$[ebp], eax
    	mov	DWORD PTR _y$[ebp+4], edx
    ; Line 10
    	mov	edx, DWORD PTR _argc$[ebp]
    	neg	edx
    	mov	eax, DWORD PTR _argc$[ebp]
    	neg	eax
    	mul	DWORD PTR _argc$[ebp]
    	mul	edx
    	mov	DWORD PTR _z$[ebp], eax
    	mov	DWORD PTR _z$[ebp+4], edx
    ; Line 11
    	xor	eax, eax
    	pop	esi
    	mov	esp, ebp
    	pop	ebp
    	leave
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    
    Notice the difference between unsigned and signed multiplication: while a single multiply instruction is generated for the former, a call of the external routine _allmul() is generated for the latter!
Note: for a real life example where such unoptimised code is generated, see the MSDN article Converting a time_t Value to a File Time and the MSKB article 167296!

Fix

Both macros should have been replaced a long time ago by the intrinsic functions __emul() and __emulu() introduced with the Visual C 2005 compiler!
#if _MSC_VER < 1400
#define Int32x32To64(a, b)  ((long long)(((long long)((long)(a))) * ((long)(b))))
#define UInt32x32To64(a, b) ((unsigned long long)(((unsigned long long)((unsigned int)(a))) * ((unsigned int)(b))))
#else
         long long __emul(int, int);
unsigned long long __emulu(unsigned int, unsigned int);
#pragma intrinsic(__emul, __emulu)
#define Int32x32To64  __emul
#define UInt32x32To64 __emulu
#endif
Note: of course this also applies to the macros (really: inline assembler functions) Int64ShllMod32(), Int64ShraMod32() and Int64ShrlMod32() defined in the header file WINNT.H of the Windows SDK; these too should have been replaced a long time ago by the intrinsic functions __ll_lshift(), __ll_rshift() and __ull_rshift() introduced with the Visual C 2005 compiler!
#if _MSC_VER < 1400
…
#else
unsigned long long __ll_lshift(unsigned long long, int);
         long long __ll_rshift(long long, int);
unsigned long long __ull_rshift(unsigned long long, int);
#pragma intrinsic(__ll_lshift, __ll_rshift, __ull_rshift)
#define Int64ShllMod32 __ll_lshift
#define Int64ShraMod32 __ll_rshift
#define Int64ShrlMod32 __ull_rshift
#endif
Note: the sample code for converting from seconds since January 1, 1970, to 100 nano-seconds since January 1, 1601, should be written without macros and intrinsic functions.
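#include <windows.h>

// Converts seconds since January 1, 1970, to 100 ns intervals since January 1, 1601
VOID EpochToFileTime(ULONG seconds, LPFILETIME pft)
{
    ULONGLONG ull = seconds * 10000000ULL + 116444736000000000ULL;
    pft->dwLowDateTime = (DWORD) ull;
    pft->dwHighDateTime = (DWORD) (ull >> 32);
}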

Example 1

Demonstration

  1. Create the text file example1.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __inline
    long long __fastcall Int32x32To64(long v, long w)
    {
        return (long long) v * w;
    }
    
    long __fastcall Int32x32To64Div32(long x, long y, long z)
    {
        return Int32x32To64(x, y) / z;
    }
    
    long __fastcall Int32x32To64Rem32(long x, long y, long z)
    {
        return Int32x32To64(x, y) % z;
    }
    
    __inline
    unsigned long long __fastcall UInt32x32To64(unsigned long v, unsigned long w)
    {
        return (unsigned long long) v * w;
    }
    
    unsigned long __fastcall UInt32x32To64Div32(unsigned long x, unsigned long y, unsigned long z)
    {
        return UInt32x32To64(x, y) / z;
    }
    
    unsigned long __fastcall UInt32x32To64Rem32(unsigned long x, unsigned long y, unsigned long z)
    {
        return UInt32x32To64(x, y) % z;
    }
    
  2. Generate the assembly listing example1.asm from the source file example1.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample1.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example1.c
    example1.c(11): warning C4244: 'return' : conversion from '__int64' to 'long', possible loss of data
    example1.c(27): warning C4244: 'return' : conversion from 'unsigned __int64' to 'unsigned long', possible loss of data
    
  3. Display the assembly listing example1.asm created in step 2.:

    Type example1.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example1.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@Int32x32To64@8
    PUBLIC	@Int32x32To64Div32@12
    PUBLIC	@Int32x32To64Rem32@12
    PUBLIC	@UInt32x32To64@8
    PUBLIC	@UInt32x32To64Div32@12
    PUBLIC	@UInt32x32To64Rem32@12
    EXTRN	__alldiv:PROC
    EXTRN	__allrem:PROC
    EXTRN	__aulldiv:PROC
    EXTRN	__aullrem:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@Int32x32To64@8
    _TEXT	SEGMENT
    @Int32x32To64@8 PROC					; COMDAT
    ; _v$ = ecx
    ; _w$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 6
    	mov	eax, ecx
    	imul	edx
    ; Line 7
    	ret	0
    @Int32x32To64@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@Int32x32To64Div32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @Int32x32To64Div32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 6
    	mov	eax, ecx
    	imul	edx
    ; Line 10
    	push	esi
    ; Line 6
    	mov	esi, eax
    	mov	ecx, edx
    ; Line 11
    	idiv	DWORD PTR _z$[esp-4]
    	mov	eax, DWORD PTR _z$[esp]
    	cdq
    	push	edx
    	push	eax
    	push	ecx
    	push	esi
    	call	__alldiv
    	pop	esi
    ; Line 12
    	ret	4
    @Int32x32To64Div32@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@Int32x32To64Rem32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @Int32x32To64Rem32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 6
    	mov	eax, ecx
    	imul	edx
    ; Line 15
    	push	esi
    ; Line 6
    	mov	esi, eax
    	mov	ecx, edx
    ; Line 16
    	idiv	DWORD PTR _z$[esp-4]
    	mov	eax, edx
    	mov	eax, DWORD PTR _z$[esp]
    	cdq
    	push	edx
    	push	eax
    	push	ecx
    	push	esi
    	call	__allrem
    	pop	esi
    ; Line 17
    	ret	0
    @Int32x32To64Rem32@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@UInt32x32To64@8
    _TEXT	SEGMENT
    @UInt32x32To64@8 PROC					; COMDAT
    ; _v$ = ecx
    ; _w$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 22
    	mov	eax, ecx
    	mul	edx
    ; Line 23
    	ret	0
    @UInt32x32To64@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@UInt32x32To64Div32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @UInt32x32To64Div32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 22
    	mov	eax, ecx
    	mul	edx
    ; Line 27
    	div	DWORD PTR _z$[esp-4]
    	push	0
    	push	DWORD PTR _z$[esp]
    	push	edx
    	push	eax
    	call	__aulldiv
    ; Line 28
    	ret	0
    @UInt32x32To64Div32@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@UInt32x32To64Rem32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @UInt32x32To64Rem32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\example1.c
    ; Line 22
    	mov	eax, ecx
    	mul	edx
    ; Line 32
    	div	DWORD PTR _z$[esp-4]
    	mov	eax, edx
    	push	0
    	push	DWORD PTR _z$[esp]
    	push	edx
    	push	eax
    	call	__aullrem
    ; Line 33
    	ret	0
    @UInt32x32To64Rem32@12 ENDP
    _TEXT	ENDS
    END
    
    While the compiler here (contrary to example 0) generates the optimal code for the multiplications, it fails to generate the corresponding optimal code for the immediately following divisions.

    Also notice the difference between the signed and unsigned variants of the combined multiplication and division routines: instead of pushing the (properly sign-extended) divisor first and the product afterwards, the signed variants compute the product first, then move it into two (intermediate) registers which are finally pushed for the calls of the _alldiv() and _allrem() helper routines.
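One detail is worth spelling out for the unsigned variants: a single DIV instruction raises a divide error when the quotient does not fit in 32 bits, whereas the C expression truncates the 64-bit quotient. The following sketch, with the made-up names quotient_fits_u32() and muldiv_u32(), shows the condition under which the single instruction and the C expression agree.

```c
#include <assert.h>
#include <stdint.h>

// Returns non-zero when (x * y) / z fits in 32 bits, which is the case
// exactly when the high half of the 64-bit product is smaller than z.
int quotient_fits_u32(uint32_t x, uint32_t y, uint32_t z)
{
    assert(z != 0);
    uint64_t product = (uint64_t) x * y;
    return (uint32_t) (product >> 32) < z;
}

// Reference result as the C source defines it:
// the 64-bit quotient truncated to 32 bits.
uint32_t muldiv_u32(uint32_t x, uint32_t y, uint32_t z)
{
    assert(z != 0);
    return (uint32_t) (((uint64_t) x * y) / z);
}
```

A compiler (or programmer) replacing the helper call with one division instruction therefore has to know, or check, that the high half of the product is smaller than the divisor.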

Example 2

Demonstration

  1. Create the text file example2.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    extern long long foo(void);
    extern long long bar(void);
    
    long long product(void)
    {
        return foo() * bar();
    }
    
  2. Generate the assembly listing example2.asm from the source file example2.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample2.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example2.c
    
  3. Display the assembly listing example2.asm created in step 2.:

    Type example2.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example2.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_product
    EXTRN	_foo:PROC
    EXTRN	_bar:PROC
    EXTRN	__allmul:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_product
    _TEXT	SEGMENT
    _product PROC						; COMDAT
    ; File c:\users\stefan\desktop\example2.c
    ; Line 7
    	push	esi
    	push	edi
    ; Line 8
    	call	_foo
    	mov	edi, eax
    	mov	esi, edx
    	push	edx
    	push	eax
    	call	_bar
    	push	edx
    	push	eax
    	push	esi
    	push	edi
    	call	__allmul
    	pop	edi
    	pop	esi
    ; Line 9
    	ret	0
    _product ENDP
    _TEXT	ENDS
    END
    
    Multiplication is commutative, so the arguments for the external routine _allmul() can be swapped, saving 6 of the 13 instructions generated, and without clobbering the registers EDI and ESI for intermediate storage.

Example 3

Demonstration

  1. Create the text file example3.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    long long __stdcall div(long long foo, long long bar)
    {
        return foo / bar;
    }
    
    long long __stdcall mod(long long foo, long long bar)
    {
        return foo % bar;
    }
    
    long long __stdcall mul(long long foo, long long bar)
    {
        return foo * bar;
    }
    
  2. Generate the assembly listing example3.asm from the source file example3.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample3.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example3.c
    
  3. Display the assembly listing example3.asm created in step 2.:

    Type example3.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example3.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC  _div@16
    PUBLIC  _mod@16
    PUBLIC  _mul@16
    EXTRN   __alldiv:PROC
    EXTRN   __allmul:PROC
    EXTRN   __allrem:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_div@16
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _bar$ = 16						; size = 8
    _div@16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example3.c
    ; Line 5
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _foo$[esp+8]
    	push    DWORD PTR _foo$[esp+8]
    	call	__alldiv
    	jmp	__alldiv
    ; Line 6
    	ret	16					; 00000010H
    _div@16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_mod@16
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _bar$ = 16						; size = 8
    _mod@16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example3.c
    ; Line 10
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _foo$[esp+8]
    	push    DWORD PTR _foo$[esp+8]
    	call	__allrem
    	jmp	__allrem
    ; Line 11
    	ret	16					; 00000010H
    _mod@16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_mul@16
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _bar$ = 16						; size = 8
    _mul@16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example3.c
    ; Line 15
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _bar$[esp]
    	push    DWORD PTR _foo$[esp+8]
    	push    DWORD PTR _foo$[esp+8]
    	call	__allmul
    	jmp	__allmul
    ; Line 16
    	ret	16					; 00000010H
    _mul@16	ENDP
    _TEXT	ENDS
    END
    

Example 4

Demonstration

  1. Create the text file example4.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    long long foo(long long foo)
    {
        foo <<= 1;
        foo += 1;
        foo |= 1;
    
        return foo;
    }
    
    long long bar(long long bar)
    {
        bar += bar;
        bar += 1;
        bar |= 1;
    
        return bar;
    }
    
  2. Generate the assembly listing example4.asm from the source file example4.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample4.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example4.c
    
  3. Display the assembly listing example4.asm created in step 2.:

    Type example4.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example4.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_foo
    PUBLIC	_bar
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_foo
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _foo	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example4.c
    ; Line 5
    	mov	eax, DWORD PTR _foo$[esp-4]
    	mov	edx, DWORD PTR _foo$[esp]
    	shld	edx, eax, 1
    	add	eax, eax
    	adc	edx, edx
    ; Line 6
    	add	eax, 1
    	adc	edx, 0
    	inc	eax
    ; Line 10
    	ret	0
    _foo	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_bar
    _TEXT	SEGMENT
    _bar$ = 8						; size = 8
    _bar	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example4.c
    ; Line 14
    	mov	eax, DWORD PTR _bar$[esp-4]
    	mov	edx, DWORD PTR _bar$[esp]
    	shld	edx, eax, 1
    	add	eax, eax
    	adc	edx, edx
    ; Line 15
    	add	eax, 1
    	adc	edx, 0
    	inc	eax
    ; Line 19
    	ret	0
    _bar	ENDP
    _TEXT	ENDS
    END
    
    While the optimiser recognises that the addition of 1 yields an odd number and therefore generates no code for the logical or, it fails to recognise that both the shift of foo and the addition of bar to itself yield an even number, so the following addition of 1 can’t produce a carry, and the add-with-carry instruction ADC is useless!
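The carry argument can be checked with a small C sketch; the function names are made up for the example and unsigned types are used to keep the arithmetic well defined.

```c
#include <assert.h>
#include <stdint.h>

// Doubling followed by an addition of 1 ...
uint64_t double_then_add(uint64_t x) { return (x << 1) + 1; }

// ... equals doubling followed by a logical or with 1.
uint64_t double_then_or(uint64_t x) { return (x << 1) | 1; }

// Carry out of the lower 32 bits when 1 is added to the doubled value.
unsigned carry_from_low_half(uint64_t x)
{
    uint32_t low = (uint32_t) (x << 1);    // always even
    return (uint32_t) (low + 1u) < low;    // non-zero only on wrap-around
}

// Usage: both properties hold for any argument.
void check_no_carry(uint64_t x)
{
    assert(double_then_add(x) == double_then_or(x));
    assert(carry_from_low_half(x) == 0);
}
```

Since the lower half of the doubled value is at most 0xFFFFFFFE, adding 1 never wraps around, so nothing has to be propagated into the upper half.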

Example 5

Demonstration

  1. Create the text file example5.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long rotate32(unsigned long value, unsigned int count)
    {
        return (value << count) | (value >> (32 - count));
    }
    
    unsigned long rotate32x(unsigned long value, unsigned int count)
    {
        return (value << count) ^ (value >> (32 - count));
    }
    
    unsigned long long rotate64x(unsigned long long value, unsigned int count)
    {
        return (value << count) ^ (value >> (64 - count));
    }
    
    unsigned long long rotate64(unsigned long long value, unsigned int count)
    {
        return (value << count) | (value >> (64 - count));
    }
    
    unsigned long long intrinsic(unsigned long long value, unsigned int count)
    {
        return _rotl64(value, count);
    }
    
  2. Generate the assembly listing example5.asm from the source file example5.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample5.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example5.c
    
  3. Display the assembly listing example5.asm created in step 2.:

    Type example5.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example5.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_rotate32
    PUBLIC	_rotate32x
    PUBLIC	_rotate64x
    PUBLIC	_rotate64
    PUBLIC	_intrinsic
    EXTRN	__allshl:PROC
    EXTRN	__aullshr:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate32
    _TEXT	SEGMENT
    _value$ = 8						; size = 4
    _count$ = 12						; size = 4
    _rotate32 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example5.c
    ; Line 5
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, DWORD PTR _count$[esp-4]
    	rol	eax, cl
    ; Line 6
    	ret	0
    _rotate32 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate32x
    _TEXT	SEGMENT
    _value$ = 8						; size = 4
    _count$ = 12						; size = 4
    _rotate32x PROC						; COMDAT
    ; File c:\users\stefan\desktop\example5.c
    ; Line 9
    	push	esi
    ; Line 10
    	mov	esi, DWORD PTR _value$[esp]
    	mov	ecx, 32					; 00000020H
    	sub	ecx, DWORD PTR _count$[esp]
    	mov	eax, esi
    	shr	eax, cl
    	mov	ecx, DWORD PTR _count$[esp]
    	shl	esi, cl
    	xor	eax, esi
    	pop	esi
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, DWORD PTR _count$[esp-4]
    	rol	eax, cl
    ; Line 11
    	ret	0
    _rotate32x ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate64x
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _count$ = 16						; size = 4
    _rotate64x PROC						; COMDAT
    ; File c:\users\stefan\desktop\example5.c
    ; Line 15
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, 64					; 00000040H
    	sub	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	push	ebx
    	push	ebp
    	call	__aullshr
    	mov	ecx, DWORD PTR _count$[esp+4]
    	mov	ebx, eax
    	mov	eax, DWORD PTR _value$[esp+4]
    	mov	ebp, edx
    	mov	edx, DWORD PTR _value$[esp+8]
    	call	__allshl
    	xor	edx, ebp
    	xor	eax, ebx
    	pop	ebp
    	mov	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	mov	eax, DWORD PTR _value$[esp-4]
    	test	cl, 32					; 00000020H
    	jz	SHORT @F
    	xchg	eax, edx
    @@:
    	test	cl, 31					; 0000001fH
    	jz	SHORT @F
    	push	ebx
    	mov	ebx, edx
    	shld	edx, eax, cl
    	shld	eax, ebx, cl
    	pop	ebx
    @@:
    ; Line 16
    	ret	0
    _rotate64x ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate64
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _count$ = 16						; size = 4
    _rotate64 PROC						; COMDAT
    ; File c:\users\stefan\desktop\example5.c
    ; Line 20
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, 64					; 00000040H
    	sub	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	push	ebx
    	push	ebp
    	call	__aullshr
    	mov	ecx, DWORD PTR _count$[esp+4]
    	mov	ebx, eax
    	mov	eax, DWORD PTR _value$[esp+4]
    	mov	ebp, edx
    	mov	edx, DWORD PTR _value$[esp+8]
    	call	__allshl
    	or	edx, ebp
    	or	eax, ebx
    	pop	ebp
    	mov	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	mov	eax, DWORD PTR _value$[esp-4]
    	test	cl, 32					; 00000020H
    	jz	SHORT @F
    	xchg	eax, edx
    @@:
    	test	cl, 31					; 0000001fH
    	jz	SHORT @F
    	push	ebx
    	mov	ebx, edx
    	shld	edx, eax, cl
    	shld	eax, ebx, cl
    	pop	ebx
    @@:
    ; Line 21
    	ret	0
    _rotate64 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_intrinsic
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _count$ = 16						; size = 4
    _intrinsic PROC						; COMDAT
    ; File c:\users\stefan\desktop\example5.c
    ; Line 25
    	mov	cl, BYTE PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	push	esi
    	mov	esi, DWORD PTR _value$[esp]
    	mov	eax, DWORD PTR _value$[esp]
    	mov	esi, edx
    	test	cl, 32					; 00000020H
    	cmovnz	edx, eax
    	cmovnz	eax, esi
    	cmovnz	esi, edx
    	je	SHORT $LN3@intrinsic
    	mov	eax, esi
    	mov	esi, edx
    	mov	edx, eax
    $LN3@intrinsic:
    	mov	eax, esi
    	and	cl, 31					; 0000001fH
    	je	SHORT $LN4@intrinsic
    	shld	eax, edx, cl
    	shld	edx, esi, cl
    $LN4@intrinsic:
    ; Line 26
    	pop	esi
    	ret	0
    _intrinsic ENDP
    _TEXT	ENDS
    END
    
    Except for the first function, rotate32(), the optimiser fails to recognise the commonly used expressions for rotate operations!
    Also notice the unoptimised code generated for the intrinsic function _rotl64(), and not only where it swaps the registers EDX and ESI.
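
As a side remark, the expressions in example5.c shift by the full operand width when count is 0, which the C standard leaves undefined. A minimal sketch of a rotate that avoids this by masking both shift counts follows; the names rotl32() and rotl64() are hypothetical, and no claim is made here about which instructions a particular compiler generates for them:

```c
#include <stdint.h>

/* Hypothetical helper: rotate left with the count reduced modulo 32,
   so neither shift count ever reaches the operand width. */
uint32_t rotl32(uint32_t value, unsigned int count)
{
    count &= 31;
    return (value << count) | (value >> (-count & 31));
}

/* Hypothetical helper: the same technique for a 64-bit operand. */
uint64_t rotl64(uint64_t value, unsigned int count)
{
    count &= 63;
    return (value << count) | (value >> (-count & 63));
}
```

The negation is performed on an unsigned int, so it wraps in a well-defined way; for a count of 0 both shifts use a count of 0 and the value is returned unchanged.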

Example 6

Horrible load of code generated for swapping the bytes of a 64-bit operand instead of a single BSWAP instruction or two MOVBE instructions.

Demonstration

  1. Create the text file example6.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #ifdef ALTERNATE
    unsigned short swap16(unsigned short us)
    {
        return ((us & 0xFF00U) >> 8)
             | ((us & 0x00FFU) << 8);
    }
    
    unsigned long swap32(unsigned long ul)
    {
        return ((ul & 0xFF000000UL) >> 3 * 8)
             | ((ul & 0x00FF0000UL) >>     8)
             | ((ul & 0x0000FF00UL) <<     8)
             | ((ul & 0x000000FFUL) << 3 * 8);
    }
    
    unsigned long long swap64(unsigned long long ull)
    {
        return ((ull & 0xFF00000000000000ULL) >> 7 * 8)
             | ((ull & 0x00FF000000000000ULL) >> 5 * 8)
             | ((ull & 0x0000FF0000000000ULL) >> 3 * 8)
             | ((ull & 0x000000FF00000000ULL) >>     8)
             | ((ull & 0x00000000FF000000ULL) <<     8)
             | ((ull & 0x0000000000FF0000ULL) << 3 * 8)
             | ((ull & 0x000000000000FF00ULL) << 5 * 8)
             | ((ull & 0x00000000000000FFULL) << 7 * 8);
    }
    #else
    unsigned short swap16(unsigned short us)
    {
        return (us << 8) & 0xFF00U
             | (us >> 8) & 0x00FFU;
    }
    
    unsigned long swap32(unsigned long ul)
    {
        return (ul << 3 * 8) & 0xFF000000UL
             | (ul <<     8) & 0x00FF0000UL
             | (ul >>     8) & 0x0000FF00UL
             | (ul >> 3 * 8) & 0x000000FFUL;
    }
    
    unsigned long long swap64(unsigned long long ull)
    {
        return (ull << 7 * 8) & 0xFF00000000000000ULL
             | (ull << 5 * 8) & 0x00FF000000000000ULL
             | (ull << 3 * 8) & 0x0000FF0000000000ULL
             | (ull <<     8) & 0x000000FF00000000ULL
             | (ull >>     8) & 0x00000000FF000000ULL
             | (ull >> 3 * 8) & 0x0000000000FF0000ULL
             | (ull >> 5 * 8) & 0x000000000000FF00ULL
             | (ull >> 7 * 8) & 0x00000000000000FFULL;
    }
    #endif
    
    Note: it is better to use the appropriate intrinsic function _byteswap_ushort(), _byteswap_ulong() or _byteswap_uint64() instead of such expressions!

  2. Generate the assembly listing example6.asm from the source file example6.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample6.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example6.c
    
  3. Display the assembly listing example6.asm created in step 2.:

    Type example6.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	swap16
    PUBLIC	swap32
    PUBLIC	swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap16
    _TEXT	SEGMENT
    us$ = 8
    swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 32
    	movzx	edx, cx
    	mov	eax, edx
    	shr	edx, 8
    	shl	eax, 8
    	or	ax, dx
    	movzx	eax, cx
    	xchg	ah, al
    ; Line 34
    	ret	0
    swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32
    _TEXT	SEGMENT
    ul$ = 8
    swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 38
    	bswap	ecx
    	mov	eax, ecx
    ; Line 42
    	ret	0
    swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64
    _TEXT	SEGMENT
    ull$ = 8
    swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 45
    	mov	r8, rcx
    ; Line 46
    	bswap	rcx
    	mov	rax, rcx
    	mov	r9, rcx
    	mov	rax, 71776119061217280			; 00ff000000000000H
    	mov	rdx, r8
    	and	r9, rax
    	and	edx, 65280				; 0000ff00H
    	mov	rax, rcx
    	shr	rax, 16
    	or	r9, rax
    	mov	rax, rcx
    	shr	r9, 16
    	mov	rcx, 280375465082880			; 0000ff0000000000H
    	and	rax, rcx
    	mov	rcx, 1095216660480			; 000000ff00000000H
    	or	r9, rax
    	mov	rax, r8
    	and	rax, rcx
    	shr	r9, 16
    	or	r9, rax
    	mov	rcx, r8
    	mov	rax, r8
    	shr	r9, 8
    	shl	rax, 16
    	and	ecx, 16711680				; 00ff0000H
    	or	rdx, rax
    	mov	eax, -16777216				; ff000000H
    	and	rax, r8
    	shl	rdx, 16
    	or	rdx, rcx
    	shl	rdx, 16
    	or	rax, rdx
    	shl	rax, 8
    	or	rax, r9
    ; Line 54
    	ret	0
    swap64	ENDP
    _TEXT	ENDS
    END
    
    Note: the assembly listing shows 32 (in words: thirty-two) instructions for the function swap64() instead of only a single (in words: one) BSWAP instruction!

    While the optimiser recognises the commonly used expressions to convert a 32-bit operand from little-endian to big-endian byte order (and vice versa) and then generates a single BSWAP instruction, it fails to recognise these expressions with a 16-bit or a 64-bit operand.

  4. Generate another assembly listing example6.asm from the source file example6.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample6.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example6.c
    
  5. Display the assembly listing example6.asm created in step 4.:

    Type example6.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example6.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_swap16
    PUBLIC	_swap32
    PUBLIC	_swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap16
    _TEXT	SEGMENT
    _us$ = 8						; size = 2
    _swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 32
    	movbe	ax, WORD PTR _us$[esp-4]
    	movzx	ecx, WORD PTR _us$[esp-4]
    	mov	eax, ecx
    	shl	ecx, 8
    	shr	eax, 8
    	or	eax, ecx
    ; Line 34
    	ret	0
    _swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 38
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	eax, DWORD PTR _ul$[esp-4]
    	bswap	eax
    ; Line 42
    	ret	0
    _swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 46
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	movbe	eax, DWORD PTR _ull$[esp]
    	mov	edx, DWORD PTR _ull$[esp]
    	mov	ecx, edx
    	push	ebx
    	push	ebp
    	push	esi
    	push	edi
    	mov	edi, DWORD PTR _ull$[esp+12]
    	mov	ebx, edx
    	and	ebx, 16711680				; 00ff0000H
    	mov	eax, edi
    	shrd	eax, ecx, 16
    	xor	ebp, ebp
    	mov	esi, edi
    	or	ebp, eax
    	shr	ecx, 16					; 00000010H
    	or	ebx, ecx
    	mov	eax, edx
    	shrd	ebp, ebx, 16
    	and	eax, 65280				; 0000ff00H
    	and	esi, 65280				; 0000ff00H
    	shr	ebx, 16					; 00000010H
    	xor	ecx, ecx
    	or	ebx, eax
    	movzx	eax, dl
    	shrd	ebp, ebx, 16
    	shr	ebx, 16					; 00000010H
    	or	ebx, eax
    	mov	eax, edi
    	shld	edx, eax, 16
    	shrd	ebp, ebx, 8
    	shl	eax, 16					; 00000010H
    	or	edx, ecx
    	or	esi, eax
    	shr	ebx, 8
    	shld	edx, esi, 16
    	mov	eax, edi
    	and	edi, -16777216				; ff000000H
    	shl	esi, 16					; 00000010H
    	and	eax, 16711680				; 00ff0000H
    	or	esi, eax
    	shld	edx, esi, 16
    	shl	esi, 16					; 00000010H
    	or	esi, edi
    	shld	edx, esi, 8
    	pop	edi
    	shl	esi, 8
    	or	edx, ebx
    	or	ebp, esi
    	pop	esi
    	mov	eax, ebp
    	pop	ebp
    	pop	ebx
    ; Line 54
    	ret	0
    _swap64	ENDP
    _TEXT	ENDS
    END
    
    Note: the assembly listing shows 52 (in words: fifty-two) instructions for the function swap64() instead of only 2 (in words: two) MOVBE instructions!

    While the optimiser recognises the commonly used expressions to convert a 32-bit operand from little-endian to big-endian byte order (and vice versa) and then generates a single BSWAP instruction, it fails to recognise these expressions with a 16-bit or a 64-bit operand.

  6. Repeat the previous steps with the alternate implementation; generate the assembly listing example6.asm from the source file example6.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture, with the macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tcexample6.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example6.c
    
  7. Display the assembly listing example6.asm created in step 6.:

    Type example6.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	swap16
    PUBLIC	swap32
    PUBLIC	swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap16
    _TEXT	SEGMENT
    us$ = 8
    swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 6
    	movzx	edx, cx
    	mov	eax, edx
    	shl	dx, 8
    	shr	eax, 8
    	or	ax, dx
    	movzx	eax, cx
    	xchg	ah, al
    ; Line 8
    	ret	0
    swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32
    _TEXT	SEGMENT
    ul$ = 8
    swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 12
    	bswap	ecx
    	mov	eax, ecx
    ; Line 16
    	ret	0
    swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64
    _TEXT	SEGMENT
    ull$ = 8
    swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 19
    	mov	r8, rcx
    ; Line 20
    	bswap	rcx
    	mov	rax, rcx
    	mov	r9, rcx
    	mov	rax, 71776119061217280			; 00ff000000000000H
    	mov	rdx, r8
    	and	r9, rax
    	and	edx, 65280				; 0000ff00H
    	mov	rax, rcx
    	shr	rax, 16
    	or	r9, rax
    	mov	rax, rcx
    	shr	r9, 16
    	mov	rcx, 280375465082880			; 0000ff0000000000H
    	and	rax, rcx
    	mov	rcx, 1095216660480			; 000000ff00000000H
    	or	r9, rax
    	mov	rax, r8
    	and	rax, rcx
    	shr	r9, 16
    	or	r9, rax
    	mov	rcx, r8
    	mov	rax, r8
    	shr	r9, 8
    	shl	rax, 16
    	and	ecx, 16711680				; 00ff0000H
    	or	rdx, rax
    	mov	eax, -16777216				; ff000000H
    	and	rax, r8
    	shl	rdx, 16
    	or	rdx, rcx
    	shl	rdx, 16
    	or	rax, rdx
    	shl	rax, 8
    	or	rax, r9
    ; Line 28
    	ret	0
    swap64	ENDP
    _TEXT	ENDS
    END
    
    Note: the assembly listing shows 32 (in words: thirty-two) instructions for the function swap64() instead of only a single (in words: one) BSWAP instruction!

    While the optimiser recognises the commonly used expressions to convert a 32-bit operand from little-endian to big-endian byte order (and vice versa) in the alternate form too and then generates a single BSWAP instruction, it fails to recognise these expressions with a 16-bit or a 64-bit operand.

  8. Generate another assembly listing example6.asm from the source file example6.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tcexample6.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example6.c
    
  9. Display the assembly listing example6.asm created in step 8.:

    Type example6.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example6.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_swap16
    PUBLIC	_swap32
    PUBLIC	_swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap16
    _TEXT	SEGMENT
    _us$ = 8						; size = 2
    _swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 6
    	movbe	ax, WORD PTR _us$[esp-4]
    	movzx	ecx, WORD PTR _us$[esp-4]
    	mov	eax, ecx
    	shl	ecx, 8
    	shr	eax, 8
    	or	eax, ecx
    ; Line 8
    	ret	0
    _swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 12
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	eax, DWORD PTR _ul$[esp-4]
    	bswap	eax
    ; Line 16
    	ret	0
    _swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example6.c
    ; Line 20
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	movbe	eax, DWORD PTR _ull$[esp]
    	mov	edx, DWORD PTR _ull$[esp]
    	mov	ecx, edx
    	push	ebx
    	push	ebp
    	push	esi
    	push	edi
    	mov	edi, DWORD PTR _ull$[esp+12]
    	mov	ebx, edx
    	and	ebx, 16711680				; 00ff0000H
    	mov	eax, edi
    	shrd	eax, ecx, 16
    	xor	ebp, ebp
    	mov	esi, edi
    	or	ebp, eax
    	shr	ecx, 16					; 00000010H
    	or	ebx, ecx
    	mov	eax, edx
    	shrd	ebp, ebx, 16
    	and	eax, 65280				; 0000ff00H
    	and	esi, 65280				; 0000ff00H
    	shr	ebx, 16					; 00000010H
    	xor	ecx, ecx
    	or	ebx, eax
    	movzx	eax, dl
    	shrd	ebp, ebx, 16
    	shr	ebx, 16					; 00000010H
    	or	ebx, eax
    	mov	eax, edi
    	shld	edx, eax, 16
    	shrd	ebp, ebx, 8
    	shl	eax, 16					; 00000010H
    	or	edx, ecx
    	or	esi, eax
    	shr	ebx, 8
    	shld	edx, esi, 16
    	mov	eax, edi
    	and	edi, -16777216				; ff000000H
    	shl	esi, 16					; 00000010H
    	and	eax, 16711680				; 00ff0000H
    	or	esi, eax
    	shld	edx, esi, 16
    	shl	esi, 16					; 00000010H
    	or	esi, edi
    	shld	edx, esi, 8
    	pop	edi
    	shl	esi, 8
    	or	edx, ebx
    	or	ebp, esi
    	pop	esi
    	mov	eax, ebp
    	pop	ebp
    	pop	ebx
    ; Line 28
    	ret	0
    _swap64	ENDP
    _TEXT	ENDS
    END
    
    Note: the assembly listing shows 52 (in words: fifty-two) instructions for the function swap64() instead of only 2 (in words: two) MOVBE instructions!

    While the optimiser recognises the commonly used expressions to convert a 32-bit operand from little-endian to big-endian byte order (and vice versa) in the alternate form too and then generates a single BSWAP instruction, it fails to recognise these expressions with a 16-bit or a 64-bit operand.
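
The note in step 1 recommends the intrinsic functions instead of shift-and-mask expressions. A minimal sketch of such wrappers could look as follows. The names bswap16(), bswap32() and bswap64() are hypothetical, and the branches for other compilers (the __builtin_bswap32() and __builtin_bswap64() built-ins of GCC and Clang, plus a generic fallback) are an addition for portability, not part of the demonstration above:

```c
#include <stdint.h>
#include <stdlib.h>     /* Visual C declares the _byteswap_*() intrinsics here */

/* bswap16(), bswap32() and bswap64() are hypothetical wrapper names. */
uint16_t bswap16(uint16_t x)
{
#if defined(_MSC_VER)
    return _byteswap_ushort(x);
#else
    /* generic fallback: exchange the two bytes */
    return (uint16_t) ((x << 8) | (x >> 8));
#endif
}

uint32_t bswap32(uint32_t x)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(x);
#elif defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    /* generic fallback: swap bytes within 16-bit halves, then swap the halves */
    x = ((x & 0x00FF00FFUL) << 8) | ((x >> 8) & 0x00FF00FFUL);
    return (x << 16) | (x >> 16);
#endif
}

uint64_t bswap64(uint64_t x)
{
#if defined(_MSC_VER)
    return _byteswap_uint64(x);
#elif defined(__GNUC__)
    return __builtin_bswap64(x);
#else
    /* generic fallback: swap bytes, then 16-bit words, then 32-bit halves */
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) | ((x >> 8) & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
#endif
}
```

With these wrappers the choice of instruction is left to the compiler's intrinsic, so the code no longer depends on the optimiser recognising a particular expression pattern.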

Example 7

Awful load of code generated for swapping the bytes of a 32-bit or 64-bit operand instead of BSWAP or MOVBE instructions.

Demonstration

  1. Create the text file example7.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long swap32rot(unsigned long ul)
    {
        return _lrotr(ul, 8) & 0xFF00FF00UL
             | _lrotl(ul, 8) & 0x00FF00FFUL;
    }
    
    __inline
    unsigned short swap16(unsigned short us)
    {
        return (us << 8) | (us >> 8);
    }
    
    unsigned long swap32(unsigned long ul)
    {
        return (unsigned long) swap16((unsigned short) ul) << 16
             | (unsigned long) swap16((unsigned short) (ul >> 16));
    }
    
    unsigned long long swap64(unsigned long long ull)
    {
        return (unsigned long long) swap32((unsigned long) ull) << 32
             | (unsigned long long) swap32((unsigned long) (ull >> 32));
    }
    
    Note: it is better to use the appropriate intrinsic function _byteswap_ushort(), _byteswap_ulong() or _byteswap_uint64() instead of such expressions!

  2. Generate the assembly listing example7.asm from the source file example7.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample7.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example7.c
    
  3. Display the assembly listing example7.asm created in step 2.:

    Type example7.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	swap32rot
    PUBLIC	swap16
    PUBLIC	swap32
    PUBLIC	swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32rot
    _TEXT	SEGMENT
    ul$ = 8
    swap32rot PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 5
    	mov	eax, ecx
    	rol	ecx, 8
    	ror	eax, 8
    	and	ecx, 16711935				; 00ff00ffH
    	and	eax, -16711936				; ff00ff00H
    	or	eax, ecx
    	bswap	eax
    ; Line 7
    	ret	0
    swap32rot ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap16
    _TEXT	SEGMENT
    us$ = 8
    swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 12
    	movzx	edx, cx
    	mov	eax, edx
    	shl	dx, 8
    	shr	eax, 8
    	or	ax, dx
    	movzx	eax, cx
    	xchg	ah, al
    ; Line 13
    	ret	0
    swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32
    _TEXT	SEGMENT
    ul$ = 8
    swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 17
    	mov	eax, ecx
    ; Line 12
    	rol	cx, 8
    ; Line 17
    	shr	eax, 16
    ; Line 12
    	rol	ax, 8
    ; Line 17
    	movzx	ecx, cx
    	movzx	eax, ax
    	shl	ecx, 16
    	or	eax, ecx
    	bswap	eax
    ; Line 19
    	ret	0
    swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64
    _TEXT	SEGMENT
    ull$ = 8
    swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 17
    	mov	eax, ecx
    ; Line 23
    	mov	r9, rcx
    ; Line 17
    	shr	eax, 16
    ; Line 12
    	rol	ax, 8
    	movzx	r8d, ax
    	rol	cx, 8
    	movzx	eax, cx
    	shl	rax, 16
    ; Line 23
    	or	rax, r8
    	shr	r9, 32					; 00000020H
    ; Line 12
    	movzx	ecx, r9w
    ; Line 23
    	shl	rax, 16
    ; Line 12
    	rol	cx, 8
    	movzx	edx, cx
    ; Line 23
    	or	rax, rdx
    ; Line 17
    	shr	r9d, 16
    ; Line 12
    	rol	r9w, 8
    ; Line 23
    	shl	rax, 16
    ; Line 12
    	movzx	ecx, r9w
    ; Line 23
    	or	rax, rcx
    	mov	rax, rcx
    	bswap	rax
    ; Line 25
    	ret	0
    swap64	ENDP
    _TEXT	ENDS
    END
    
    Note: instead of just 2 instructions for each of the 4 functions, the assembly listing shows 20 (in words: twenty) instructions for the function swap64(), 8 instructions for the function swap32(), 5 instructions for the function swap16(), and 6 instructions for the function swap32rot().

    The optimiser fails to recognise any of these commonly used expressions to convert from little-endian to big-endian byte order (and vice versa), whatever the operand size!
    Additionally, the commonly used expression for a rotate operation is not recognised for the 16-bit operand of the function swap16().

  4. Generate another assembly listing example7.asm from the source file example7.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample7.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example7.c
    
  5. Display the assembly listing example7.asm created in step 4.:

    Type example7.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example7.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_swap32rot
    PUBLIC	_swap16
    PUBLIC	_swap32
    PUBLIC	_swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32rot
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32rot PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 5
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	eax, ecx
    	ror	eax, 8
    	rol	ecx, 8
    	and	eax, -16711936				; ff00ff00H
    	and	ecx, 16711935				; 00ff00ffH
    	or	eax, ecx
    ; Line 5
    	ret	0
    _swap32rot ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap16
    _TEXT	SEGMENT
    _us$ = 8						; size = 2
    _swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 12
    	movbe	ax, WORD PTR _us$[esp-4]
    	movzx	ecx, WORD PTR _us$[esp-4]
    	mov	eax, ecx
    	shl	ecx, 8
    	shr	eax, 8
    	or	eax, ecx
    ; Line 13
    	ret	0
    _swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 17
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	eax, ecx
    	shr	eax, 16					; 00000010H
    ; Line 12
    	rol	cx, 8
    	rol	ax, 8
    ; Line 17
    	movzx	ecx, cx
    	movzx	eax, ax
    	shl	ecx, 16					; 00000010H
    	or	eax, ecx
    ; Line 19
    	ret	0
    _swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example7.c
    ; Line 23
    	movbe	eax, DWORD PTR _ull$[esp]
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	mov	ecx, DWORD PTR _ull$[esp-4]
    ; Line 17
    	mov	eax, ecx
    	shr	eax, 16					; 00000010H
    ; Line 12
    	rol	ax, 8
    ; Line 23
    	movzx	eax, ax
    	cdq
    	push	ebx
    	mov	ebx, DWORD PTR _ull$[esp+4]
    	push	esi
    	mov	esi, eax
    ; Line 12
    	rol	cx, 8
    ; Line 22
    	push	edi
    ; Line 23
    	mov	edi, edx
    	movzx	eax, cx
    	cdq
    	shld	edx, eax, 16
    	shl	eax, 16					; 00000010H
    	or	edi, edx
    	or	esi, eax
    ; Line 12
    	mov	ax, bx
    ; Line 23
    	shld	edi, esi, 16
    ; Line 12
    	rol	ax, 8
    ; Line 23
    	movzx	eax, ax
    	cdq
    	shl	esi, 16					; 00000010H
    	or	edi, edx
    	or	esi, eax
    ; Line 17
    	shr	ebx, 16					; 00000010H
    ; Line 12
    	rol	bx, 8
    ; Line 23
    	shld	edi, esi, 16
    	movzx	eax, bx
    	cdq
    	shl	esi, 16					; 00000010H
    	or	edx, edi
    	pop	edi
    	or	eax, esi
    	pop	esi
    	pop	ebx
    ; Line 25
    	ret	0
    _swap64	ENDP
    _TEXT	ENDS
    END
    
    Note: instead of just 1 or 2 MOVBE instructions for each of the 4 functions, the assembly listing shows 38 (in words: thirty-eight) instructions for the function swap64(), 9 instructions for the function swap32(), 5 instructions for the function swap16(), and 7 instructions for the function swap32rot().

    The optimiser fails to recognise all these commonly used expressions to convert from little endian byte-order to big endian byte-order (and vice versa) for all operand sizes!
    Additionally the commonly used expression for a rotate operation is not recognised for a 16-bit operand.
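
Note: for reference, the byte-swap idioms in question can be sketched in portable C as follows. This is a minimal sketch under stated assumptions, not the content of example7.c (which is not reproduced in this section): the function names mirror the labels in the listing, while the function bodies, the `<stdint.h>` types and the helper swap_selfcheck() are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the 2 bytes of a 16-bit value; equivalent to a rotate by 8 bits. */
uint16_t swap16(uint16_t us)
{
    return (uint16_t) ((us << 8) | (us >> 8));
}

/* Swap the 4 bytes of a 32-bit value by swapping its two 16-bit halves. */
uint32_t swap32(uint32_t ul)
{
    return ((uint32_t) swap16((uint16_t) ul) << 16)
         | swap16((uint16_t) (ul >> 16));
}

/* Swap the 8 bytes of a 64-bit value by swapping its two 32-bit halves. */
uint64_t swap64(uint64_t ull)
{
    return ((uint64_t) swap32((uint32_t) ull) << 32)
         | swap32((uint32_t) (ull >> 32));
}

/* Rotate-and-mask variant: two rotates by 8 bits, two masks, one OR. */
uint32_t swap32rot(uint32_t ul)
{
    uint32_t ror8 = (ul >> 8) | (ul << 24);   /* rotate right by 8 */
    uint32_t rol8 = (ul << 8) | (ul >> 24);   /* rotate left by 8  */

    return (ror8 & 0xFF00FF00u) | (rol8 & 0x00FF00FFu);
}

/* Hypothetical helper: checks that all variants reverse the byte order. */
void swap_selfcheck(void)
{
    assert(swap16(0x1234u) == 0x3412u);
    assert(swap32(0x11223344u) == 0x44332211u);
    assert(swap32rot(0x11223344u) == swap32(0x11223344u));
    assert(swap64(0x0123456789ABCDEFull) == 0xEFCDAB8967452301ull);
}
```

Each of these expressions denotes a plain byte-order reversal, which is why a single BSWAP or MOVBE instruction (or a 16-bit rotate) would suffice.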

Example 8

Superfluous load and store instructions using a superfluous temporary variable, generated by the Visual C 2017 and Visual C 2010 compilers.

Demonstration

  1. Create the text file example8.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __inline
    unsigned long htonl(unsigned long ul)
    {
    #if _MSC_VER >= 1900
    	__asm	movbe	eax, ul
    #else
    	__asm	mov	eax, ul
    	__asm	bswap	eax
    #endif
    }
    
    int main(int argc)
    {
        unsigned long array[] = {'MSFT', 'MSVC', 'POOR', 'CODE'};
    
        argc = htonl(argc);
    
        for (argc = 0; argc < sizeof(array) / sizeof(*array); argc++)
            array[argc] = htonl(array[argc]);
    }
    
  2. Generate the assembly listing example8.asm from the source file example8.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tcexample8.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example8.c
    
  3. Display the assembly listing example8.asm created in step 2.:

    Type example8.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example8.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_htonl
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_htonl
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _htonl	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 7
    	movbe	eax, DWORD PTR _ul$[esp-4]
    ; Line 12
    	ret	0
    _htonl	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _array$ = -8						; size = 16
    _ul$ = 8						; size = 4
    _argc$ = 8						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 15
    	sub	esp, 16					; 00000010H
    ; Line 16
    	mov	DWORD PTR _array$[esp+8], 1297303124	; 4d534654H
    	mov	DWORD PTR _array$[esp+12], 1297307203	; 4d535643H
    	mov	DWORD PTR _array$[esp+16], 1347374930	; 504f4f52H
    	mov	DWORD PTR _array$[esp+20], 1129268293	; 434f4445H
    ; Line 18
    	movbe	eax, DWORD PTR _argc$[esp+12]
    ; Line 20
    	xor	ecx, ecx
    	npad	6
    $LL4@main:
    ; Line 21
    	mov	eax, DWORD PTR _array$[esp+ecx*4+16]
    	mov	DWORD PTR _ul$[esp+12], eax
    	movbe	eax, DWORD PTR _ul$[esp+12]
    	movbe	eax, DWORD PTR _array$[esp+ecx*4+16]
    	mov	DWORD PTR _array$[esp+ecx*4+16], eax
    	inc	ecx
    	cmp	ecx, 4
    	jb	SHORT $LL4@main
    ; Line 22
    	add	esp, 16					; 00000010H
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    
    Notice the superfluous in(s)ane transfer of the EAX register to and from the (intermediate) variable _ul$ generated for line 21!
    Also notice that the superfluous instruction generated for line 18 uses no superfluous intermediate variable!
  4. Generate the assembly listing example8.asm from the source file example8.c created in step 1., using the Visual C 2010 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tcexample8.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1
    
    example8.c
    
  5. Display the assembly listing example8.asm created in step 4.:

    Type example8.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 
    
    	TITLE	C:\Users\Stefan\Desktop\example8.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_htonl
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_htonl
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _htonl	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example8.c
    ; Line 9
    	mov	eax, DWORD PTR _ul$[esp-4]
    ; Line 10
    	bswap	eax
    ; Line 11
    	ret	0
    _htonl	ENDP
    _TEXT	ENDS
    
    PUBLIC	_main
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _array$ = -16						; size = 16
    $T1040 = 8						; size = 4
    _argc$ = 8						; size = 4
    _main	PROC						; COMDAT
    ; Line 15
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 16					; 00000010H
    ; Line 16
    	mov	DWORD PTR _array$[ebp], 1297303124	; 4d534654H
    	mov	DWORD PTR _array$[ebp+4], 1297307203	; 4d535643H
    	mov	DWORD PTR _array$[ebp+8], 1347374930	; 504f4f52H
    	mov	DWORD PTR _array$[ebp+12], 1129268293	; 434f4445H
    ; Line 18
    	mov	eax, DWORD PTR _argc$[ebp]
    	bswap	eax
    ; Line 20
    	xor	edx, edx
    $LL3@main:
    	lea	ecx, DWORD PTR _array$[ebp+edx*4]
    ; Line 21
    	mov	eax, DWORD PTR [ecx]
    	mov	DWORD PTR $T1040[ebp], eax
    	mov	eax, DWORD PTR $T1040[ebp]
    	bswap	eax
    	inc	edx
    	mov	DWORD PTR [ecx], eax
    	cmp	edx, 4
    	jb	SHORT $LL3@main
    ; Line 22
    	leave
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    
    Notice the superfluous in(s)ane transfer of the EAX register to and from the intermediate variable $T1040 generated for line 21!
    Again notice that the superfluous instructions generated for line 18 use no superfluous intermediate variable!
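
Note: one way to sidestep the inline assembler altogether is a byte-swap helper written in C. The following is a minimal sketch, not part of the demonstration above: swap_be32(), swap_array() and swap_array_selfcheck() are hypothetical names; with Microsoft’s compilers the helper forwards to the `_byteswap_ulong()` intrinsic declared in `<stdlib.h>`, with other compilers it falls back to shifts and masks. Whether this avoids the intermediate variable shown above has not been verified here and is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Reverse the byte order of a 32-bit value without inline assembler. */
uint32_t swap_be32(uint32_t value)
{
#ifdef _MSC_VER
    return _byteswap_ulong(value);            /* intrinsic from <stdlib.h> */
#else
    return (value >> 24)
         | ((value >> 8) & 0x0000FF00u)
         | ((value << 8) & 0x00FF0000u)
         | (value << 24);
#endif
}

/* Byte-swap every element of an array in place, like the loop in main(). */
void swap_array(uint32_t *array, size_t count)
{
    size_t index;

    for (index = 0; index < count; index++)
        array[index] = swap_be32(array[index]);
}

/* Hypothetical helper: uses the four constants visible in the listing. */
void swap_array_selfcheck(void)
{
    uint32_t array[] = {0x4D534654u, 0x4D535643u, 0x504F4F52u, 0x434F4445u};

    swap_array(array, sizeof(array) / sizeof(*array));

    assert(array[0] == 0x5446534Du);
    assert(array[1] == 0x4356534Du);
    assert(array[2] == 0x524F4F50u);
    assert(array[3] == 0x45444F43u);
}
```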

Example 9

Completely wrong code generated with __forceinline versus __inline by the Visual C 2017 compiler (and all previous versions too) when specified for a __fastcall function using inline assembler.

Note: the advice against this combination given in the MSDN article Using and Preserving Registers in Inline Assembly does not apply here: there is no code which might clobber any register!

Demonstration

  1. Create the text file example9.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __forceinline // here be dragons!
    unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial)
    {
    #ifdef MITIGATE
        return argument & 0x80000000 ? polynomial ^ (argument << 1) : argument << 1;
    #else
        __asm // 32-bit linear feedback shift register
        {
            add ecx, ecx ; ecx = argument << 1
            sbb eax, eax ; eax = CF ? -1 : 0
            and eax, edx ; eax = CF ? polynomial : 0
            xor eax, ecx ; eax = (argument << 1) ^ (CF ? polynomial : 0)
        }
    #endif
    }
    
    int main()
    {
        unsigned lfsr = 123456789;
        unsigned period = 0;
    
        do
        {
            period++;
            lfsr = lfsr32(lfsr, 0xC5);
        } while (lfsr != 123456789);
    
        return period;
    }
    
    Note: the constant 0xC5 represents the primitive polynomial x^32+x^30+x^26+x^25+x^0, giving the 32-bit LFSR its maximum period length of 2^32−1.
  2. Generate the assembly listing example9.asm from the source file example9.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample9.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example9.c
    example9.c(4): warning C4100: 'polynomial': unreferenced formal parameter
    example9.c(4): warning C4100: 'argument': unreferenced formal parameter
    
  3. Display the assembly listing example9.asm created in step 2.:

    Type example9.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example9.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@lfsr32@8
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@lfsr32@8
    _TEXT	SEGMENT
    @lfsr32@8 PROC						; COMDAT
    ; _argument$ = ecx
    ; _polynomial$ = edx
    ; File c:\users\stefan\desktop\example9.c
    ; Line 11
    	add	ecx, ecx
    ; Line 12
    	sbb	eax, eax
    ; Line 13
    	and	eax, edx
    ; Line 14
    	xor	eax, ecx
    ; Line 15
    	ret	0
    @lfsr32@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example9.c
    ; Line 21
    	push	ebx
    	mov	eax, 123456789				; 075bcd15H
    ; Line 22
    	mov	edx, 197				; 000000c5H
    	xor	ebx, ebx
    	xor	edx, edx
    $LL4@main:
    ; Line 26
    	inc	edx
    	inc	ebx
    ; Line 27
    	mov	ecx, eax
    	add	ecx, ecx
    	sbb	eax, eax
    	and	eax, edx
    	xor	eax, ecx
    ; Line 28
    	cmp	eax, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 30
    	mov	eax, edx
    	mov	eax, ebx
    	pop	ebx
    ; Line 31
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    
    The variable lfsr alias argument, held in register ECX (the first argument of functions with __fastcall calling convention), is not initialized with the constant 123456789, register EDX (the second argument of functions with __fastcall calling convention) is never loaded with the constant 0xC5, and the return value from the (inlined) function held in register EAX is not loaded back into register ECX!

Mitigation

After replacing the __forceinline keyword with __inline the Visual C compiler generates correct code, but does not inline the function any more.

Note: replacing the inline assembler with the equivalent C expression (macro MITIGATE) of course avoids this compiler bug too, but does not yield optimised code!

  1. Generate another assembly listing example9.asm from the source file example9.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro MITIGATE defined on the command line:

    CL.EXE /Bv /c /DMITIGATE /Fa /FoNUL: /Gy /Ox /Tcexample9.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example9.c
    
  2. Display the assembly listing example9.asm created in step 1.:

    Type example9.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example9.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@lfsr32@8
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@lfsr32@8
    _TEXT	SEGMENT
    @lfsr32@8 PROC						; COMDAT
    ; _argument$ = ecx
    ; _polynomial$ = edx
    ; File c:\users\stefan\desktop\example9.c
    ; Line 5
    	push	esi
    ; Line 7
    	lea	esi, DWORD PTR [ecx+ecx]
    	mov	eax, esi
    	xor	eax, edx
    	test	ecx, ecx
    	cmovns	eax, esi
    	pop	esi
    	add	ecx, ecx
    	sbb	eax, eax
    	and	eax, edx
    	xor	eax, ecx
    ; Line 17
    	ret	0
    @lfsr32@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example9.c
    ; Line 21
    	mov	ecx, 123456789				; 075bcd15H
    ; Line 22
    	xor	eax, eax
    	push	esi
    $LL4@main:
    ; Line 7
    	lea	esi, DWORD PTR [ecx+ecx]
    ; Line 26
    	inc	eax
    ; Line 28
    	mov	edx, esi
    	xor	edx, 197				; 000000c5H
    	test	ecx, ecx
    	cmovns	edx, esi
    	mov	ecx, edx
    IFDEF ALTERNATE
    	lea	edx, DWORD PTR [ecx+ecx]
    	sar	ecx, 31
    	and	ecx, 197				; 000000c5H
    	xor	ecx, edx
    ELSE
    	add	ecx, ecx
    	sbb	edx, edx
    	and	edx, 197				; 000000c5H
    	xor	ecx, edx
    ENDIF
    	cmp	ecx, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 30
    	pop	esi
    ; Line 31
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    
    While the generated code is correct, the compiler fails to perform an obvious optimisation: instead of explicitly setting the sign flag SF with a separate TEST instruction, it could use the carry flag CF, which a SHL (or an ADD) instruction sets from the most significant alias sign bit; this variant also doesn’t need the extraneous register ESI to preserve the value of the shifted (or doubled) variable!

    Note: the assembly listing also shows an alternative variant.
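
Note: the flag-based sequence described above (ADD, SBB, AND, XOR) can be modelled in C without a branch. The following is a minimal sketch, assuming the 32-bit unsigned type from `<stdint.h>`; the names lfsr32_left_ternary(), lfsr32_left_mask() and lfsr32_left_selfcheck() are hypothetical, and no claim is made that any particular compiler translates the second form into the four-instruction sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: the ternary expression from the MITIGATE branch. */
uint32_t lfsr32_left_ternary(uint32_t argument, uint32_t polynomial)
{
    return argument & 0x80000000u ? polynomial ^ (argument << 1)
                                  : argument << 1;
}

/* Branch-free form modelled on ADD / SBB / AND / XOR. */
uint32_t lfsr32_left_mask(uint32_t argument, uint32_t polynomial)
{
    uint32_t carry = argument >> 31;   /* bit shifted out, i.e. CF after ADD */
    uint32_t mask  = 0u - carry;       /* SBB: all ones if CF set, else zero */

    return (argument << 1) ^ (mask & polynomial);
}

/* Hypothetical helper: both forms agree along the first 100000 steps. */
void lfsr32_left_selfcheck(void)
{
    uint32_t lfsr = 123456789u;
    int      step;

    for (step = 0; step < 100000; step++)
    {
        assert(lfsr32_left_mask(lfsr, 0xC5u) == lfsr32_left_ternary(lfsr, 0xC5u));
        lfsr = lfsr32_left_mask(lfsr, 0xC5u);
    }
}
```

The mask takes the place of the conditional move: it is derived from the bit that the shift discards, so neither a separate TEST nor a copy of the shifted value is required.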

Example 10

This is the reversed case of the second variant from (the previous) example 9.

Demonstration

  1. Create the text file example10.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __forceinline // here be dragons!
    unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial)
    {
    #ifdef MITIGATE
        return argument & 1 ? polynomial ^ (argument >> 1) : argument >> 1;
    #else
        __asm // 32-bit linear feedback shift register
        {
            shr ecx, 1   ; ecx = argument >> 1
            sbb eax, eax ; eax = CF ? -1 : 0
            and eax, edx ; eax = CF ? polynomial : 0
            xor eax, ecx ; eax = (argument >> 1) ^ (CF ? polynomial : 0)
        }
    #endif
    }
    
    int main()
    {
        unsigned lfsr = 123456789;
        unsigned period = 0;
    
        do
        {
            period++;
            lfsr = lfsr32(lfsr, 0xA3000000);
        } while (lfsr != 123456789);
    
        return period;
    }
    
    Note: the constant 0xA3000000 represents the same primitive polynomial x^32+x^30+x^26+x^25+x^0 as 0xC5, it’s just the bit-reversed value, giving the 32-bit LFSR its maximum period length of 2^32−1.
  2. Generate the assembly listing example10.asm from the source file example10.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro MITIGATE defined on the command line:

    CL.EXE /Bv /c /DMITIGATE /Fa /FoNUL: /Gy /Ox /Tcexample10.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example10.c
    
  3. Display the assembly listing example10.asm created in step 2.:

    Type example10.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example10.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@lfsr32@8
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@lfsr32@8
    _TEXT	SEGMENT
    @lfsr32@8 PROC						; COMDAT
    ; _argument$ = ecx
    ; _polynomial$ = edx
    ; File c:\users\stefan\desktop\example10.c
    ; Line 5
    	push	esi
    ; Line 7
    	mov	esi, ecx
    	shr	esi, 1
    	mov	eax, esi
    	xor	eax, edx
    	and	cl, 1
    	cmove	eax, esi
    	pop	esi
    	shr	ecx, 1
    	sbb	eax, eax
    	and	eax, edx
    	xor	eax, ecx
    ; Line 17
    	ret	0
    @lfsr32@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example10.c
    ; Line 20
    	push	esi
    ; Line 21
    	mov	eax, 123456789				; 075bcd15H
    ; Line 22
    	xor	esi, esi
    	xor	ecx, ecx
    $LL4@main:
    ; Line 7
    	mov	edx, eax
    	mov	ecx, eax
    	shr	edx, 1
    ; Line 26
    	inc	esi
    	inc	ecx
    ; Line 28
    	mov	eax, edx
    	xor	eax, -1560281088			; a3000000H
    ; Line 7
    	and	cl, 1
    ; Line 28
    	cmove	eax, edx
    	and	eax, 1
    	neg	eax
    	and	eax, -1560281088			; a3000000H
    	xor	eax, edx
    	cmp	eax, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 30
    	mov	eax, esi
    	pop	esi
    	mov	eax, ecx
    ; Line 31
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    
    While the generated code is correct, the compiler fails to perform an obvious optimisation: instead of evaluating both terms of the ternary operator first and then conditionally overwriting one result with the other, depending on the least significant bit of the original value as determined by a separate AND instruction, it could use the carry flag CF, which the SHR instruction already sets from the least significant bit; this variant also doesn’t need the extraneous register ECX to preserve the original value of the shifted variable!

    Note: the assembly listing shows an alternative, equally optimised variant.
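
Note: the right-shifting step and the relation between the two constants can be checked with a few lines of C. The following is a minimal sketch, assuming the 32-bit unsigned type from `<stdint.h>`; the names lfsr32_right_mask(), bitreverse32() and lfsr32_right_selfcheck() are hypothetical. The mask form corresponds to the AND, NEG, AND, XOR alternative visible in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free right-shifting LFSR step. */
uint32_t lfsr32_right_mask(uint32_t argument, uint32_t polynomial)
{
    uint32_t carry = argument & 1u;    /* bit shifted out, i.e. CF after SHR */
    uint32_t mask  = 0u - carry;       /* all ones if the bit was set        */

    return (argument >> 1) ^ (mask & polynomial);
}

/* Reverse the order of all 32 bits of a value. */
uint32_t bitreverse32(uint32_t value)
{
    uint32_t result = 0;
    int      bit;

    for (bit = 0; bit < 32; bit++)
    {
        result = (result << 1) | (value & 1u);
        value >>= 1;
    }
    return result;
}

/* Hypothetical helper: mask form versus the ternary expression. */
void lfsr32_right_selfcheck(void)
{
    uint32_t lfsr = 123456789u;
    int      step;

    for (step = 0; step < 100000; step++)
    {
        uint32_t expected = lfsr & 1u ? 0xA3000000u ^ (lfsr >> 1) : lfsr >> 1;

        assert(lfsr32_right_mask(lfsr, 0xA3000000u) == expected);
        lfsr = expected;
    }
}
```

bitreverse32(0xC5) yields 0xA3000000, which confirms that the constant used here is the bit-reversed form of the constant from example 9.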

Example 11

These are the 64-bit variants of (the previous) example 10 and of the second variant from example 9.

Demonstration

  1. Create the text file example11.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long right()
    {
        unsigned long long lfsr = 0x0123456789ABCDEF;
        unsigned long long period = 0;
    
        do
        {
            period++;
            lfsr = lfsr & 1 ? 0xD800000000000000 ^ (lfsr >> 1) : lfsr >> 1;
        } while (lfsr != 0x0123456789ABCDEF);
    
        return period;
    }
    
    unsigned long long left()
    {
        unsigned long long lfsr = 0x0123456789ABCDEF;
        unsigned long long period = 0;
    
        do
        {
            period++;
            lfsr = (long long) lfsr < 0 ? 0x1B ^ (lfsr << 1) : lfsr << 1;
        } while (lfsr != 0x0123456789ABCDEF);
    
        return period;
    }
    
    Note: both constants 0xD800000000000000 and 0x1B represent the primitive polynomial x^64+x^63+x^61+x^60+x^0, giving the 64-bit LFSR its maximum period length of 2^64−1.
  2. Generate the assembly listing example11.asm from the source file example11.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample11.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example11.c
    
  3. Display the assembly listing example11.asm created in step 2.:

    Type example11.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	right
    PUBLIC	left
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	right
    _TEXT	SEGMENT
    right	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example11.c
    ; Line 5
    	mov	r9, 81985529216486895			; 0123456789abcdefH
    ; Line 6
    	xor	r8d, r8d
    	mov	rax, r9
    	mov	r10, -2882303761517117440		; d800000000000000H
    	npad	6
    $LL4@right:
    ; Line 11
    	mov	rdx, rax
    	movzx	ecx, al
    	shr	rdx, 1
    	inc	r8
    ; Line 12
    	mov	rax, rdx
    	xor	rax, r10
    	and	cl, 1
    	cmove	rax, rdx
    	and	eax, 1
    	neg	rax
    	and	rax, r10
    	xor	rax, rdx
    	cmp	rax, r9
    	jne	SHORT $LL4@right
    ; Line 14
    	mov	rax, r8
    ; Line 15
    	ret	0
    right	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	left
    _TEXT	SEGMENT
    left	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example11.c
    ; Line 19
    	mov	r9, 81985529216486895			; 0123456789abcdefH
    ; Line 20
    	xor	eax, eax
    	mov	rcx, r9
    	npad	1
    $LL4@left:
    ; Line 26
    	mov	rdx, rcx
    	lea	r8, QWORD PTR [rcx+rcx]
    	inc	rax
    	mov	rcx, r8
    	xor	rcx, 27
    	test	rdx, rdx
    	cmovns	rcx, r8
    	add	rcx, rcx
    	sbb	rdx, rdx
    	and	rdx, 27
    	xor	rcx, rdx
    	cmp	rcx, r9
    	jne	SHORT $LL4@left
    ; Line 29
    	ret	0
    left	ENDP
    _TEXT	ENDS
    END
    
    While the code generated for the function right() is correct, the compiler fails to perform an obvious optimisation: instead of evaluating both terms of the ternary operator first and then conditionally overwriting one result with the other, depending on the least significant bit of the original value as determined by a separate AND instruction, it could use the carry flag CF, which the SHR instruction already sets from the least significant bit; this variant also doesn’t need the extraneous register RCX to preserve the original value of the shifted variable!

    Additionally the registers RAX and R8 can be swapped, making the MOV instruction generated for line 14 superfluous.

    While the code generated for the function left() is correct too, the compiler likewise fails to perform an even more obvious optimisation: instead of explicitly setting the sign flag SF with a separate TEST instruction, it could use the carry flag CF, which a SHL (or an ADD) instruction sets from the most significant alias sign bit; this variant also doesn’t need the extraneous register R8 to preserve the value of the shifted (or doubled) variable!

  4. Generate another assembly listing example11.asm from the source file example11.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample11.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example11.c
    
  5. Display the assembly listing example11.asm created in step 4.:

    Type example11.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example11.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_right
    PUBLIC	_left
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_right
    _TEXT	SEGMENT
    _period$ = -8						; size = 8
    _right	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example11.c
    ; Line 4
    	sub	esp, 8
    	push	esi
    	xorps	xmm0, xmm0
    ; Line 5
    	mov	ecx, -1985229329			; 89abcdefH
    ; Line 6
    	movlpd	QWORD PTR _period$[esp+12], xmm0
    	mov	eax, 19088743				; 01234567H
    	mov	esi, DWORD PTR _period$[esp+16]
    	xor	esi, esi
    	push	edi
    	mov	edi, DWORD PTR _period$[esp+16]
    	xor	edi, edi
    $LL4@right:
    ; Line 10
    	add	edi, 1
    ; Line 11
    	mov	edx, ecx
    	adc	esi, 0
    	and	ecx, 1
    	shrd	edx, eax, 1
    	shrd	ecx, eax, 1
    	shr	eax, 1
    	or	ecx, 0
    	mov	ecx, edx
    	je	SHORT $LN7@right
    	xor	ecx, 0
    	xor	eax, -671088640				; d8000000H
    $LN7@right:
    	and	edx, 1
    	neg	edx
    	and	edx, -671088640				; d8000000H
    	xor	eax, edx
    ; Line 12
    	cmp	ecx, -1985229329			; 89abcdefH
    	jne	SHORT $LL4@right
    	cmp	eax, 19088743				; 01234567H
    	jne	SHORT $LL4@right
    ; Line 14
    	mov	eax, edi
    	mov	edx, esi
    	pop	edi
    	pop	esi
    ; Line 15
    	add	esp, 8
    	ret	0
    _right	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_left
    _TEXT	SEGMENT
    _period$ = -8						; size = 8
    _left	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example11.c
    ; Line 18
    	sub	esp, 8
    	push	ebx
    	xorps	xmm0, xmm0
    ; Line 19
    	mov	eax, -1985229329			; 89abcdefH
    	push	esi
    ; Line 20
    	movlpd	QWORD PTR _period$[esp+16], xmm0
    	mov	ecx, 19088743				; 01234567H
    	mov	ebx, DWORD PTR _period$[esp+16]
    	xor	ebx, ebx
    	push	edi
    	mov	edi, DWORD PTR _period$[esp+24]
    	xor	edi, edi
    $LL4@left:
    ; Line 24
    	add	ebx, 1
    	mov	edx, eax
    	mov	esi, ecx
    	adc	edi, 0
    	shld	esi, edx, 1
    	add	edx, edx
    	add	eax, eax
    	adc	ecx, ecx
    ; Line 25
    	test	ecx, ecx
    	jg	SHORT $LN6@left
    	jl	SHORT $LN11@left
    	test	eax, eax
    	jae	SHORT $LN6@left
    $LN11@left:
    	mov	eax, edx
    	mov	ecx, esi
    	xor	eax, 27					; 0000001bH
    	xor	ecx, 0
    	jmp	SHORT $LN7@left
    $LN6@left:
    	mov	eax, edx
    	mov	ecx, esi
    $LN7@left:
    	sbb	edx, edx
    	and	edx, 27					; 0000001bH
    	xor	eax, edx
    ; Line 26
    	cmp	eax, -1985229329			; 89abcdefH
    	jne	SHORT $LL4@left
    	cmp	ecx, 19088743				; 01234567H
    	jne	SHORT $LL4@left
    ; Line 28
    	mov	edx, edi
    	mov	eax, ebx
    	pop	edi
    	pop	esi
    	pop	ebx
    ; Line 29
    	add	esp, 8
    	ret	0
    _left	ENDP
    _TEXT	ENDS
    END
    
    The code generated for the function right() is totally screwed up: the variable period is allocated on the stack, zeroed using the SSE register XMM0, then loaded into the registers ESI and EDI, but never used again; instead of holding the variable period in the register pair EDX:EAX used for the return value, it is held in the registers EDI and ESI, which have to be transferred into EDX:EAX upon exit; register ECX, which holds the lower half of the variable lfsr, is clobbered inside the loop without necessity and has to be reloaded; the flags which the AND instruction sets in the EFLAGS register are ignored, and the result is evaluated again with an extraneous OR instruction; the XOR instruction with immediate operand 0 has no effect and is superfluous too!

    The code generated for the function left() is even worse: again the variable period is allocated on the stack, zeroed using the SSE register XMM0, then loaded into the registers EBX and EDI, but never used again; instead of holding the variable period in the register pair EDX:EAX used for the return value, it is held in the registers EBX and EDI, which have to be transferred into EDX:EAX upon exit; instead of using the carry flag CF already set by the SHLD instruction, or the sign flag SF set by the first TEST instruction, a full comparison against 0 is performed, involving three conditional branch instructions; the registers EAX and ECX, which hold the variable lfsr, are copied without necessity into the registers EDX and ESI, which are then used for the shift and exclusive-or operation; the XOR instruction with immediate operand 0 has no effect and is superfluous!

Example 12

Demonstration

  1. Create the text file example12.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long lcg64() // linear congruential generator
    {
        static unsigned long long z = 1066149217761810ULL;
    
        z = z * 6906969069ULL + 1234567ULL;
    
        return z;
    }
    
    Note: both constants are from George Marsaglia’s KISS64 pseudo-random number generator, giving the 64-bit LCG its maximum period length of 2⁶⁴.
  2. Generate the assembly listing example12.asm from the source file example12.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample12.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example12.c
    
  3. Display the assembly listing example12.asm created in step 2.:

    Type example12.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example12.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lcg64
    EXTRN	__allmul:PROC
    _DATA	SEGMENT
    ?z@?1??lcg64@@9@9 DQ 0003c9a83566fa12H			; `lcg64'::`2'::z
    _DATA	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lcg64
    _TEXT	SEGMENT
    _lcg64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example12.c
    ; Line 7
    	push	1
    	push	-1682965523				; 9baffbedH
    	push	DWORD PTR ?z@?1??lcg64@@9@9+4
    	push	DWORD PTR ?z@?1??lcg64@@9@9
    	call	__allmul
    	mov	ecx, -1682965523			; 9baffbedH
    	mov	eax, DWORD PTR ?z@?1??lcg64@@9@9
    	mul	ecx
    	add	edx, DWORD PTR ?z@?1??lcg64@@9@9
    	imul	ecx, DWORD PTR ?z@?1??lcg64@@9@9+4
    	add	eax, 1234567				; 0012d687H
    	mov	DWORD PTR ?z@?1??lcg64@@9@9, eax
    	adc	edx, 0
    	adc	edx, ecx
    	mov	DWORD PTR ?z@?1??lcg64@@9@9+4, edx
    ; Line 10
    	ret	0
    _lcg64	ENDP
    _TEXT	ENDS
    END
    
    While the generated code is correct, the compiler fails to perform an obvious optimisation: the constant 6906969069 is 2³²+2612001773 (the hexadecimal notation 0x19BAFFBED shows this immediately); multiplication with 2³² modulo 2⁶⁴ just moves the lower half of z into the upper half of the product, so it can be replaced by a simple addition.

Example 13

Demonstration

  1. Create the text file example13.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long msws32(void) // Weyl sequence enhanced middle-square generator
    {
        static unsigned long v = 0UL;
        static unsigned long w = 0UL;
    
        w += 0x9E3779B9UL;
        v = (unsigned long) __ull_rshift(__emulu(v, v), 16);
        v += w;
    
        return v;
    }
    
    unsigned long mswsbw(void) // Weyl sequence enhanced middle-square generator
    {
        static unsigned long long v = 0ULL;
        static unsigned long long w = 0ULL;
    
        w += 0x9E3779B97F4A7C15ULL;
        v *= v;
        v += w;
        v = (v << 32) | (v >> 32);
    
        return (unsigned long) v;
    }
    
    unsigned long long msws64(void) // Weyl sequence enhanced middle-square generator
    {
        static unsigned long long v = 0ULL;
        static unsigned long long w = 0ULL;
    #ifdef _WIN64
        const unsigned long long q;
        const unsigned long long p = _umul128(v, v, &q);
    
        v = __shiftright128(p, q, 32);
        w += 0x9E3779B97F4A7C15ULL;
        v += w;
    #else
        w += 0x9E3779B97F4A7C15ULL;
        v = w
          + (__emulu((unsigned long) v, (unsigned long) v) >> 32)
          + (__emulu((unsigned long) v, (unsigned long) (v >> 32)) << 1);
    #endif
        return v;
    }
    
    Note: the constants 0x9E3779B9 and 0x9E3779B97F4A7C15 are the fractional part of the golden ratio Φ, which is also its inverse or reciprocal value φ=1⁄Φ=Φ−1=(√5−1)⁄2=0.6180339887…, multiplied by 2³² or 2⁶⁴ respectively.
  2. Generate the assembly listing example13.asm from the source file example13.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample13.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example13.c
    
  3. Display the assembly listing example13.asm created in step 2.:

    Type example13.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example13.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_msws32
    PUBLIC	_mswsbw
    PUBLIC	_msws64
    EXTRN	__allmul:PROC
    _BSS	SEGMENT
    ?v@?1??msws32@@9@9 DQ 01H DUP (?)			; `msws32'::`2'::v
    ?w@?1??msws32@@9@9 DQ 01H DUP (?)			; `msws32'::`2'::w
    ?v@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::v
    ?w@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::w
    ?v@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::v
    ?w@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::w
    _BSS	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_msws32
    _TEXT	SEGMENT
    _msws32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 9
    	mov	eax, DWORD PTR ?v@?1??msws32@@9@9
    	mul	eax
    	push	esi
    	mov	esi, DWORD PTR ?w@?1??msws32@@9@9
    	sub	esi, 1640531527				; 61c88647H
    	shrd	eax, edx, 16
    	mov	edx, DWORD PTR ?w@?1??msws32@@9@9
    	add	edx, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??msws32@@9@9, edx
    	add	eax, edx
    	mov	DWORD PTR ?w@?1??msws32@@9@9, esi
    	add	eax, esi
    	shr	edx, 16					; 00000010H
    ; Line 10
    	mov	DWORD PTR ?v@?1??msws32@@9@9, eax
    ; Line 12
    	pop	esi
    ; Line 13
    	ret	0
    _msws32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_mswsbw
    _TEXT	SEGMENT
    _mswsbw	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 20
    	mov	ecx, DWORD PTR ?w@?1??mswsbw@@9@9
    	add	ecx, 2135587861				; 7f4a7c15H
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9, ecx
    	mov	ecx, DWORD PTR ?w@?1??mswsbw@@9@9+4
    	adc	ecx, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9+4, ecx
    ; Line 21
    	mov	ecx, DWORD PTR ?v@?1??mswsbw@@9@9
    	mov	eax, ecx
    	mul	eax
    	imul	ecx, DWORD PTR ?v@?1??mswsbw@@9@9+4
    	add	ecx, ecx
    	add	edx, ecx
    ; Line 22
    	add	eax, DWORD PTR ?w@?1??mswsbw@@9@9
    	adc	edx, DWORD PTR ?w@?1??mswsbw@@9@9+4
    	mov	ecx, DWORD PTR ?v@?1??mswsbw@@9@9+4
    	mov	eax, DWORD PTR ?v@?1??mswsbw@@9@9
    	push	esi
    	mov	esi, DWORD PTR ?w@?1??mswsbw@@9@9+4
    	push	edi
    	mov	edi, DWORD PTR ?w@?1??mswsbw@@9@9
    	push	ecx
    	push	eax
    	add	edi, 2135587861				; 7f4a7c15H
    	push	ecx
    	adc	esi, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9, edi
    	push	eax
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9+4, esi
    	call	__allmul
    	add	eax, edi
    ; Line 25
    	pop	edi
    	adc	edx, esi
    	xor	ecx, ecx
    	or	ecx, eax
    	mov	DWORD PTR ?v@?1??mswsbw@@9@9, edx
    	mov	DWORD PTR ?v@?1??mswsbw@@9@9+4, ecx
    	mov	dword PTR ?v@?1??mswsbw@@9@9+4, eax
    	mov	eax, edx
    	pop	esi
    ; Line 26
    	ret	0
    _mswsbw	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_msws64
    _TEXT	SEGMENT
    _msws64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 40
    	mov	ecx, DWORD PTR ?w@?1??msws64@@9@9
    	add	ecx, 2135587861				; 7f4a7c15H
    	mov	DWORD PTR ?w@?1??msws64@@9@9, ecx
    	mov	ecx, DWORD PTR ?w@?1??msws64@@9@9+4
    	adc	ecx, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??msws64@@9@9+4, ecx
    ; Line 41
    	mov	eax, DWORD PTR ?v@?1??msws64@@9@9
    	mul	eax
    	mov	ecx, edx
    	mov	eax, DWORD PTR ?v@?1??msws64@@9@9
    	mul	DWORD PTR ?v@?1??msws64@@9@9+4
    	add	eax, eax
    	adc	edx, edx
    	add	eax, ecx
    	adc	edx, 0
    	add	eax, DWORD PTR ?w@?1??msws64@@9@9
    	adc	edx, DWORD PTR ?w@?1??msws64@@9@9+4
    	mov	DWORD PTR ?v@?1??msws64@@9@9, eax
    	mov	DWORD PTR ?v@?1??msws64@@9@9+4, edx
    	mov	ecx, DWORD PTR ?v@?1??msws64@@9@9
    	mov	eax, ecx
    	push	ebx
    	mov	ebx, DWORD PTR ?w@?1??msws64@@9@9
    	push	ebp
    	mov	ebp, DWORD PTR ?w@?1??msws64@@9@9+4
    	add	ebx, 2135587861				; 7f4a7c15H
    	push	esi
    	adc	ebp, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??msws64@@9@9, ebx
    	mul	DWORD PTR ?v@?1??msws64@@9@9+4
    	push	edi
    	mov	esi, eax
    	mov	DWORD PTR ?w@?1??msws64@@9@9+4, ebp
    	mov	edi, edx
    	mov	eax, ecx
    	shld	edi, esi, 1
    	mul	ecx
    	add	esi, esi
    	add	esi, edx
    	adc	edi, 0
    	add	esi, ebx
    	mov	DWORD PTR ?v@?1??msws64@@9@9, esi
    ; Line 45
    	mov	eax, esi
    	adc	edi, ebp
    	mov	DWORD PTR ?v@?1??msws64@@9@9+4, edi
    	mov	edx, edi
    	pop	edi
    	pop	esi
    	pop	ebp
    	pop	ebx
    ; Line 46
    	ret	0
    _msws64	ENDP
    _TEXT	ENDS
    END
    
    While the code generated for the function msws32() is correct, there is no reason to clobber the non-volatile register ESI, which must be saved and restored, instead of using the volatile register ECX!
    Additionally notice the superfluous SHR instruction: its result is never used.

    While the code generated for the function mswsbw() is correct, an optimising compiler should not emit 7 instructions to call an external routine for squaring a 64-bit value, but emit the 6 instructions which perform this operation inline!
    Additionally notice the superfluous XOR and OR instructions generated for line 25.

    While the code generated for the function msws64() is correct too, it has 31 instructions and clobbers all registers, but still performs 4 avoidable transfers between them; the optimal code has only 19 instructions and clobbers no register!

  4. Generate another assembly listing example13.asm from the source file example13.c created in step 1., now using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample13.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example13.c
    example13.c(33): warning C4132: 'q': const object should be initialized
    
  5. Display the assembly listing example13.asm created in step 4.:

    Type example13.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	msws32
    PUBLIC	mswsbw
    PUBLIC	msws64
    _BSS	SEGMENT
    ?v@?1??msws32@@9@9 DD 01H DUP (?)			; `msws32'::`2'::v
    ?w@?1??msws32@@9@9 DD 01H DUP (?)			; `msws32'::`2'::w
    ?v@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::v
    ?w@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::w
    ?v@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::v
    ?w@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::w
    _BSS	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	msws32
    _TEXT	SEGMENT
    msws32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 9
    	mov	eax, DWORD PTR ?v@?1??msws32@@9@9
    	mov	r8d, DWORD PTR ?w@?1??msws32@@9@9
    	mul	rax
    	add	r8d, -1640531527			; 9e3779b9H
    	shr	rax, 16
    	add	eax, r8d
    	mov	DWORD PTR ?w@?1??msws32@@9@9, r8d
    ; Line 10
    	mov	DWORD PTR ?v@?1??msws32@@9@9, eax
    ; Line 13
    	ret	0
    msws32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	mswsbw
    _TEXT	SEGMENT
    mswsbw	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 20
    	mov	rcx, QWORD PTR ?w@?1??mswsbw@@9@9
    	add	rcx, 11400714819323198485		; 9e3779b97f4a7c15H
    	mov	rax, 7046029254386353131		; 61c8864680b583ebH
    	sub	rcx, rax
    ; Line 21
    	mov	rax, QWORD PTR ?v@?1??mswsbw@@9@9
    	imul	rax, rax
    	mov	QWORD PTR ?w@?1??mswsbw@@9@9, rcx
    	add	rax, rcx
    ; Line 23
    	rol	rax, 32					; 00000020H
    	mov	QWORD PTR ?v@?1??mswsbw@@9@9, rax
    ; Line 26
    	ret	0
    mswsbw	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	msws64
    _TEXT	SEGMENT
    msws64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example13.c
    ; Line 34
    	mov	rax, QWORD PTR ?v@?1??msws64@@9@9
    ; Line 37
    	mov	r8, 7046029254386353131			; 61c8864680b583ebH
    	mov	rcx, QWORD PTR ?w@?1??msws64@@9@9
    	mul	rax
    	sub	rcx, r8
    	add	rcx, 11400714819323198485		; 9e3779b97f4a7c15H
    	shrd	rax, rdx, 32				; 00000020H
    	mov	QWORD PTR ?w@?1??msws64@@9@9, rcx
    ; Line 38
    	add	rax, rcx
    	mov	QWORD PTR ?v@?1??msws64@@9@9, rax
    ; Line 46
    	ret	0
    msws64	ENDP
    _TEXT	ENDS
    END
    

Example 14

Demonstration

  1. Create the text file example14.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long ullmul(unsigned long long p, unsigned long long q)
    {
    #ifdef OPTIMIZE
        if (((unsigned long) p | (unsigned long) q) == 0)
            return 0;
    
        if (((unsigned long) (p >> 32) | (unsigned long) (q >> 32)) == 0)
            return __emulu((unsigned long) p, (unsigned long) q);
    #endif
        return __emulu((unsigned long) p, (unsigned long) q)
             + ((unsigned long long) ((unsigned long) p * (unsigned long) (q >> 32)) << 32)
             + ((unsigned long long) ((unsigned long) q * (unsigned long) (p >> 32)) << 32);
    }
    
    long long llmul(long long p, long long q)
    {
    #ifdef OPTIMIZE
        if (((unsigned long) (p >> 32) | (unsigned long) (q >> 32)) == 0)
            return __emulu((unsigned long) p, (unsigned long) q);
    
        if (((unsigned long) p | (unsigned long) q) == 0)
            return 0;
    #endif
        return __emulu((unsigned long) p, (unsigned long) q)
             + ((unsigned long long) ((unsigned long) p * (unsigned long) (q >> 32)) << 32)
             + ((unsigned long long) ((unsigned long) q * (unsigned long) (p >> 32)) << 32);
    }
    
  2. Generate the assembly listing example14.asm from the source file example14.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample14.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example14.c
    
  3. Display the assembly listing example14.asm created in step 2.:

    Type example14.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example14.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_ullmul
    PUBLIC	_llmul
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ullmul
    _TEXT	SEGMENT
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _ullmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example14.c
    ; Line 12
    	mov	eax, DWORD PTR _p$[esp-4]
    	mul	DWORD PTR _q$[esp-4]
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	push	esi
    	mov	esi, DWORD PTR _q$[esp+4]
    	imul	esi, DWORD PTR _p$[esp]
    	add	esi, ecx
    	add	eax, 0
    	adc	edx, esi
    	pop	esi
    	add	edx, ecx
    	mov	ecx, DWORD PTR _q$[esp]
    	imul	ecx, DWORD PTR _p$[esp-4]
    	add	edx, ecx
    ; Line 15
    	ret	0
    _ullmul	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llmul
    _TEXT	SEGMENT
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _llmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example14.c
    ; Line 26
    	mov	eax, DWORD PTR _p$[esp-4]
    	mul	DWORD PTR _q$[esp-4]
    	push	ebp
    	push	edi
    	mov	edi, DWORD PTR _q$[esp+8]
    	mov	ebp, eax
    	mov	ecx, edi
    	imul	edi, DWORD PTR _p$[esp+4]
    	sar	ecx, 31					; 0000001fH
    	mov	ecx, DWORD PTR _p$[esp+8]
    	mov	eax, ecx
    	imul	ecx, DWORD PTR _q$[esp+4]
    	sar	eax, 31					; 0000001fH
    	add	edi, ecx
    	add	ebp, 0
    	mov	eax, ebp
    	adc	edx, edi
    	pop	edi
    	pop	ebp
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	add	edx, ecx
    	mov	ecx, DWORD PTR _q$[esp]
    	imul	ecx, DWORD PTR _p$[esp-4]
    	add	edx, ecx
    ; Line 29
    	ret	0
    _llmul	ENDP
    _TEXT	ENDS
    END
    
    Notice especially the superfluous arithmetic right shifts by 31 generated for the llmul() routine, and the preceding loads of the registers ECX and EAX: their results are never used!
    The other highlight is the addition of 0, which can’t set the carry flag CF, followed by an ADC (add with carry) instruction, which adds this flag.
  4. Generate another assembly listing example14.asm from the source file example14.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the macro OPTIMIZE defined on the command line:

    CL.EXE /Bv /c /DOPTIMIZE /Fa /FoNUL: /Gy /Ox /Tcexample14.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example14.c
    
  5. Display the assembly listing example14.asm created in step 4.:

    Type example14.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example14.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_ullmul
    PUBLIC	_llmul
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ullmul
    _TEXT	SEGMENT
    tv261 = 8						; size = 8
    tv252 = 8						; size = 8
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _ullmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example14.c
    ; Line 4
    	push	esi
    	mov	esi, DWORD PTR _q$[esp]
    	push	edi
    	mov	edi, DWORD PTR _p$[esp+4]
    ; Line 6
    	mov	eax, edi
    	or	eax, esi
    	jne	SHORT $LN2@ullmul
    ; Line 7
    	pop	edi
    	xor	edx, edx
    ; Line 15
    	pop	esi
    	ret	0
    $LN2@ullmul:
    	push	ebx
    ; Line 9
    	mov	ebx, DWORD PTR _q$[esp+12]
    	mov	eax, edi
    	push	ebp
    	mov	ebp, DWORD PTR _p$[esp+16]
    	mov	ecx, ebp
    	or	ecx, ebx
    	mov	DWORD PTR tv252[esp+16], 0
    	mov	DWORD PTR tv261[esp+16], 0
    	jne	SHORT $LN3@ullmul
    ; Line 10
    	pop	ebp
    	pop	ebx
    	pop	edi
    	mul	esi
    ; Line 15
    	pop	esi
    	ret	0
    $LN3@ullmul:
    ; Line 12
    	imul	ebx, edi
    	imul	ebp, esi
    	mul	esi
    	add	ebx, ebp
    	add	eax, 0
    	pop	ebp
    	adc	edx, ebx
    	pop	ebx
    	pop	edi
    ; Line 15
    	pop	esi
    ; Line 6
    	mov	eax, DWORD PTR _p$[esp-4]
    	mov	edx, DWORD PTR _q$[esp-4]
    	or	edx, eax
    	je	SHORT $LN2@ullmul
    ; Line 9
    	mov	ecx, DWORD PTR _q$[esp]
    	mov	edx, DWORD PTR _p$[esp]
    	or	edx, ecx
    	jne	SHORT $LN3@ullmul
    ; Line 10
    	mul	DWORD PTR _q$[esp-4]
    $LN2@ullmul:
    ; Line 15
    	ret	0
    $LN3@ullmul:
    ; Line 12
    	imul	ecx, eax
    	mul	DWORD PTR _q$[esp-4]
    	add	edx, ecx
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	add	edx, ecx
    ; Line 15
    	ret	0
    _ullmul	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llmul
    _TEXT	SEGMENT
    tv249 = 8						; size = 8
    tv240 = 8						; size = 8
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _llmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example14.c
    ; Line 18
    	sub	esp, 8
    	push	ebx
    ; Line 20
    	mov	ebx, DWORD PTR _p$[esp+12]
    	mov	eax, ebx
    	sar	eax, 31					; 0000001fH
    	mov	ecx, ebx
    	push	ebp
    	mov	ebp, DWORD PTR _q$[esp+16]
    	mov	DWORD PTR tv240[esp+20], eax
    	mov	eax, ebp
    	sar	eax, 31					; 0000001fH
    	or	ecx, ebp
    	push	esi
    	mov	esi, DWORD PTR _p$[esp+16]
    	push	edi
    	mov	edi, DWORD PTR _q$[esp+20]
    	mov	DWORD PTR tv249[esp+28], eax
    	mov	eax, esi
    	jne	SHORT $LN2@llmul
    ; Line 21
    	mul	edi
    	pop	edi
    ; Line 29
    	pop	esi
    	pop	ebp
    	pop	ebx
    	add	esp, 8
    	ret	0
    $LN2@llmul:
    ; Line 23
    	or	eax, edi
    	jne	SHORT $LN3@llmul
    ; Line 29
    	pop	edi
    	pop	esi
    	pop	ebp
    	xor	edx, edx
    	pop	ebx
    	add	esp, 8
    	ret	0
    $LN3@llmul:
    ; Line 26
    	mov	eax, esi
    	imul	esi, ebp
    	mul	edi
    	imul	edi, ebx
    	add	esi, edi
    	add	eax, 0
    	pop	edi
    	adc	edx, esi
    ; Line 29
    	pop	esi
    	pop	ebp
    	pop	ebx
    	add	esp, 8
    ; Line 20
    	mov	ecx, DWORD PTR _q$[esp]
    	mov	edx, DWORD PTR _p$[esp]
    	or	edx, ecx
    	jne	SHORT $LN2@llmul
    ; Line 21
    	mul	DWORD PTR _q$[esp-4]
    ; Line 29
    	ret	0
    $LN2@llmul:
    ; Line 23
    	mov	eax, DWORD PTR _p$[esp-4]
    	mov	edx, DWORD PTR _q$[esp-4]
    	or	edx, eax
    	je	SHORT $LN3@llmul
    ; Line 26
    	imul	ecx, eax
    	mul	DWORD PTR _q$[esp-4]
    	add	edx, ecx
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	add	edx, ecx
    $LN3@llmul:
    ; Line 29
    	ret	0
    _llmul	ENDP
    _TEXT	ENDS
    END
    
    Instead of loading the low parts of both arguments into the registers EAX and EDX (which return the result) and testing their logical OR for 0, the compiler clobbers the registers ESI and EDI, both of which must be saved and restored.
    In both routines, the superfluous temporary variables tv252 and tv261, respectively tv240 and tv249, are allocated and assigned values which are never read anywhere – an advanced technique known as WORN (write once, read never)!
    Again notice the superfluous arithmetic right shifts by 31 generated for the llmul() routine: their results are assigned to the (otherwise unused) temporary variables.
    The other highlight is still the addition of 0, which can’t set the carry flag CF, followed by an ADC (add with carry) instruction, which adds this flag.
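
As a rough illustration of what the code for line 26 has to compute, the following sketch builds the 64-bit product modulo 2^64 from 32-bit halves: one widening multiplication of the low halves plus two truncating multiplications for the cross products. The name mul64_sketch() is hypothetical and not part of the example source; the sketch uses unsigned operands, assuming (as the side note states for the helper routine) that the same bit pattern results for signed operands.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the original example: multiplies two
   64-bit values modulo 2^64 using only 32-bit halves, as 32-bit code must. */
uint64_t mul64_sketch(uint64_t p, uint64_t q)
{
    uint32_t p_lo = (uint32_t)p, p_hi = (uint32_t)(p >> 32);
    uint32_t q_lo = (uint32_t)q, q_hi = (uint32_t)(q >> 32);

    /* MUL: full 64-bit product of the two low halves (EDX:EAX) */
    uint64_t low = (uint64_t)p_lo * q_lo;

    /* two IMUL: only the low 32 bits of the cross products contribute */
    uint32_t cross = p_lo * q_hi + p_hi * q_lo;

    /* a plain ADD to the high half suffices: the low half is left
       unchanged, so there is no carry to propagate */
    uint32_t high = (uint32_t)(low >> 32) + cross;

    return ((uint64_t)high << 32) | (uint32_t)low;
}
```

Since the low half of the product is not modified after the MUL instruction, no carry can arise, which is why the pair ADD EAX, 0 and ADC EDX, ESI could be a single ADD EDX, ESI.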

Example 15

Demonstration

  1. Create the text file example15.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    long long llsgn0(long long value)
    {
        return value < 0 ? -1 : 0;
    }
    
    int llsgn1(long long value)
    {
        return value < 0;
    }
    
    int llsgn2(long long value)
    {
        return value >> 63;
    }
    
    int llsgn3(long long value)
    {
        return (value >> 63) != 0;
    }
    
    int llsgn4(long long value)
    {
        return (value & (1LL << 63)) != 0;
    }
    
  2. Generate the assembly listing example15.asm from the source file example15.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample15.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example15.c
    
  3. Display the assembly listing example15.asm created in step 2.:

    Type example15.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example15.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_llsgn0
    PUBLIC	_llsgn1
    PUBLIC	_llsgn2
    PUBLIC	_llsgn3
    PUBLIC	_llsgn4
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn0
    _TEXT	SEGMENT
    $T1 = 8							; size = 8
    _value$ = 8						; size = 8
    _llsgn0	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example15.c
    ; Line 5
    	cmp	DWORD PTR _value$[esp], 0
    	jg	SHORT $LN3@llsgn0
    	jl	SHORT $LN5@llsgn0
    	cmp	DWORD PTR _value$[esp-4], 0
    	jae	SHORT $LN3@llsgn0
    $LN5@llsgn0:
    	or	eax, -1
    	or	edx, eax
    ; Line 6
    	ret	0
    $LN3@llsgn0:
    	xorps	xmm0, xmm0
    ; Line 5
    	movlpd	QWORD PTR $T1[esp-4], xmm0
    	mov	eax, DWORD PTR $T1[esp-4]
    	mov	edx, DWORD PTR $T1[esp]
    	mov	eax, DWORD PTR _value$[esp]
    	cdq
    	mov	eax, edx
    ; Line 6
    	ret	0
    _llsgn0	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn1
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn1	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example15.c
    ; Line 10
    	cmp	DWORD PTR _value$[esp], 0
    	jg	SHORT $LN3@llsgn1
    	jl	SHORT $LN5@llsgn1
    	cmp	DWORD PTR _value$[esp-4], 0
    	jae	SHORT $LN3@llsgn1
    $LN5@llsgn1:
    	mov	eax, 1
    ; Line 11
    	ret	0
    $LN3@llsgn1:
    ; Line 10
    	xor	eax, eax
    	mov	eax, DWORD PTR _value$[esp]
    	shr	eax, 31					; 0000001fH
    ; Line 11
    	ret	0
    _llsgn1	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn2
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn2	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example15.c
    ; Line 15
    	mov	ecx, DWORD PTR _value$[esp]
    	mov	eax, ecx
    	sar	eax, 31					; 0000001fH
    	sar	ecx, 31					; 0000001fH
    	mov	eax, DWORD PTR _value$[esp]
    	sar	eax, 31					; 0000001fH
    ; Line 16
    	ret	0
    _llsgn2	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn3
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn3	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example15.c
    ; Line 20
    	mov	ecx, DWORD PTR _value$[esp]
    	xor	eax, eax
    	and	ecx, -2147483648			; 80000000H
    	or	eax, ecx
    	je	SHORT $LN3@llsgn3
    	mov	eax, 1
    ; Line 21
    	ret	0
    $LN3@llsgn3:
    ; Line 20
    	xor	eax, eax
    	mov	eax, DWORD PTR _value$[esp]
    	shr	eax, 31					; 0000001fH
    ; Line 21
    	ret	0
    _llsgn3	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn4
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn4	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example15.c
    ; Line 25
    	cmp	DWORD PTR _value$[esp], 0
    	jg	SHORT $LN3@llsgn4
    	jl	SHORT $LN5@llsgn4
    	cmp	DWORD PTR _value$[esp-4], 0
    	jae	SHORT $LN3@llsgn4
    $LN5@llsgn4:
    	mov	eax, 1
    ; Line 26
    	ret	0
    $LN3@llsgn4:
    ; Line 25
    	xor	eax, eax
    	mov	eax, DWORD PTR _value$[esp]
    	shr	eax, 31					; 0000001fH
    ; Line 26
    	ret	0
    _llsgn4	ENDP
    _TEXT	ENDS
    END
    
    The optimiser does not recognise any of these commonly used expressions for determining the sign of an integer value!

    Notice especially the completely in(s)ane use of the SSE register XMM0 and the temporary variable $T1 instead of just two XOR instructions to zero the registers EAX and EDX in the function llsgn0(), the two SAR instructions in the function llsgn2(), and the completely insane code generated for the function llsgn3()!
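
All five expressions depend only on bit 63 of the argument, which is bit 31 of its high 32-bit half. The following sketch (with hypothetical helper names that are not part of example15.c) spells this out: llsgn1(), llsgn3() and llsgn4() correspond to the sign bit, llsgn0() to the sign mask, and llsgn2() to the sign mask truncated to int.

```c
#include <stdint.h>

/* Hypothetical helpers: everything is derived from the high 32-bit half. */

/* 1 if value is negative, else 0: a single logical right shift by 31
   of the high half */
int sign_bit_sketch(long long value)
{
    uint32_t high = (uint32_t)((uint64_t)value >> 32);

    return (int)(high >> 31);
}

/* -1 if value is negative, else 0: the sign bit replicated into all bits */
long long sign_mask_sketch(long long value)
{
    return -(long long)sign_bit_sketch(value);
}
```

The conversions to unsigned types keep the sketch independent of how a compiler implements right shifts of negative signed values.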

Example 16

Superfluous instructions generated for the intrinsic function __getcallerseflags() by the Visual C 2017 compiler (and previous versions too):

Demonstration

  1. Create the text file example16.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int main()
    {
        return __getcallerseflags();
    }
    
  2. Generate the assembly listing example16.asm from the source file example16.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample16.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example16.c
    
  3. Display the assembly listing example16.asm created in step 2.:

    Type example16.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example16.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    __$Eflags$ = 4						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example16.c
    ; Line 4
    	pushfd
    	push	ebp
    	mov	ebp, esp
    ; Line 5
    	mov	eax, DWORD PTR __$Eflags$[ebp]
    ; Line 6
    	pop	ebp
    	pop	ecx
    	pop	eax
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    
  4. Generate the assembly listing example16.asm from the source file example16.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tcexample16.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example16.c
    
  5. Display the assembly listing example16.asm created in step 4.:

    Type example16.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	main
    ;	COMDAT	pdata
    pdata	SEGMENT
    $pdata$main DD	imagerel $LN4
    	DD	imagerel $LN4+7
    	DD	imagerel $unwind$main
    pdata	ENDS
    ;	COMDAT	xdata
    xdata	SEGMENT
    $unwind$main DD	010201H
    	DD	0202H
    xdata	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	main
    _TEXT	SEGMENT
    __$Eflags$ = 0
    main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example16.c
    ; Line 4
    $LN4:
    	pushfq
    ; Line 5
    	mov	eax, DWORD PTR __$Eflags$[rsp]
    ; Line 6
    	pop	rcx
    	pop	rax
    	ret	0
    main	ENDP
    _TEXT	ENDS
    END
    

Example 17

Demonstration

  1. Create the text file example17.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #define STRICT
    #define UNICODE
    #define WIN32_LEAN_AND_MEAN
    
    #include <windows.h>
    #include <unknwn.h>
    
    #define IF2CO(class, member, interface)	(&((class *) 0)->member == interface, \
    					 ((class *) (((char *) interface) - (size_t) &(((class *) 0)->member))))
    
    extern	const	GUID	CLSID_NULL;
    
    extern	DWORD	dwCount;
    
    typedef	struct	_CUnknown
    {
    	DWORD		dwCount;
    
    	IUnknown	Unknown;
    } CUnknown;
    
    HRESULT	WINAPI	Unknown_QueryInterface(IUnknown *this, REFIID rIID, VOID **ppv)
    {
    	CUnknown	*that = IF2CO(CUnknown, Unknown, this);
    
    	if (ppv == NULL)
    		return E_POINTER;
    
    	*ppv = NULL;
    
    	if (rIID == NULL)
    		return E_INVALIDARG;
    
    	if (!IsEqualIID(rIID, &IID_IUnknown))
    		return E_NOINTERFACE;
    
    	*ppv = &that->Unknown;
    
    	_InterlockedIncrement(&that->dwCount);
    
    	return S_OK;
    }
    
    DWORD	WINAPI	Unknown_AddRef(IUnknown *this)
    {
    	CUnknown	*that = IF2CO(CUnknown, Unknown, this);
    
    	return _InterlockedIncrement(&that->dwCount);
    }
    
    DWORD	WINAPI	Unknown_Release(IUnknown *this)
    {
    	CUnknown	*that = IF2CO(CUnknown, Unknown, this);
    	DWORD		dw = _InterlockedDecrement(&that->dwCount);
    
    	if (dw != 0L)
    		return dw;
    
    	_InterlockedDecrement(&dwCount);
    
    	CoTaskMemFree(that);
    
    	return 0L;
    }
    
    const	IUnknownVtbl	Unknown_Vtbl = {Unknown_QueryInterface, Unknown_AddRef, Unknown_Release};
    
    Note: this ANSI C source is a minimal implementation of the IUnknown interface.
  2. Generate the assembly listing example17.asm from the source file example17.c created in step 1., using the Visual C 2010 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1is /Tcexample17.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1
    
    example17.c
    
  3. Display the assembly listing example17.asm created in step 2.:

    Type example17.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 
    
    	TITLE	C:\Users\Stefan\Desktop\example17.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_Unknown_Release@4
    PUBLIC	_Unknown_AddRef@4
    PUBLIC	_Unknown_QueryInterface@12
    PUBLIC	_Unknown_Vtbl
    
    ;	COMDAT	CONST
    CONST	SEGMENT
    _Unknown_Vtbl DD FLAT:_Unknown_QueryInterface@12
    	DD	FLAT:_Unknown_AddRef@4
    	DD	FLAT:_Unknown_Release@4
    CONST	ENDS
    
    EXTRN	_IID_IUnknown:BYTE
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_Unknown_QueryInterface@12
    _TEXT	SEGMENT
    _this$ = 8						; size = 4
    _rIID$ = 12						; size = 4
    _ppv$ = 16						; size = 4
    _Unknown_QueryInterface@12 PROC				; COMDAT
    ; File c:\users\stefan\desktop\example17.c
    ; Line 26
    	mov	edx, DWORD PTR _this$[esp-4]
    ; Line 28
    	mov	eax, DWORD PTR _ppv$[esp-4]
    	add	edx, -4					; fffffffcH
    	test	eax, eax
    	jne	SHORT $LN3@Unknown_Qu
    ; Line 29
    	mov	eax, -2147467261			; 80004003H
    	jmp	SHORT $LN4@Unknown_Qu
    $LN3@Unknown_Qu:
    ; Line 31
    	and	DWORD PTR [eax], 0
    	push	esi
    ; Line 33
    	mov	esi, DWORD PTR _rIID$[esp]
    	test	esi, esi
    	jne	SHORT $LN2@Unknown_Qu
    ; Line 34
    	mov	eax, -2147024809			; 80070057H
    	jmp	SHORT $LN7@Unknown_Qu
    $LN2@Unknown_Qu:
    	push	ebx
    	push	edi
    ; Line 36
    	push	4
    	pop	ecx
    	xor	ebx, ebx
    	mov	edi, OFFSET _IID_IUnknown
    	repe	cmpsd
    	pop	edi
    	pop	ebx
    	je	SHORT $LN1@Unknown_Qu
    ; Line 37
    	mov	eax, -2147467262			; 80004002H
    	jmp	SHORT $LN7@Unknown_Qu
    $LN1@Unknown_Qu:
    ; Line 39
    	lea	ecx, DWORD PTR [edx+4]
    	mov	DWORD PTR [eax], ecx
    ; Line 41
    	xor	eax, eax
    	inc	eax
    	lock	xadd DWORD PTR [edx], eax
    ; Line 43
    	xor	eax, eax
    $LN7@Unknown_Qu:
    	pop	esi
    $LN4@Unknown_Qu:
    ; Line 44
    	ret	12					; 0000000cH
    _Unknown_QueryInterface@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_Unknown_AddRef@4
    _TEXT	SEGMENT
    _this$ = 8						; size = 4
    _Unknown_AddRef@4 PROC					; COMDAT
    ; Line 48
    	mov	ecx, DWORD PTR _this$[esp-4]
    ; Line 50
    	xor	eax, eax
    	add	ecx, -4					; fffffffcH
    	inc	eax
    	lock	xadd DWORD PTR [ecx], eax
    	inc	eax
    ; Line 51
    	ret	4
    _Unknown_AddRef@4 ENDP
    _TEXT	ENDS
    
    EXTRN	__imp__CoTaskMemFree@4:PROC
    EXTRN	_dwCount:DWORD
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_Unknown_Release@4
    _TEXT	SEGMENT
    _this$ = 8						; size = 4
    _Unknown_Release@4 PROC					; COMDAT
    ; Line 55
    	mov	ecx, DWORD PTR _this$[esp-4]
    	add	ecx, -4					; fffffffcH
    ; Line 56
    	mov	edx, ecx
    	or	eax, -1
    	lock	xadd DWORD PTR [edx], eax
    	dec	eax
    ; Line 59
    	jne	SHORT $LN2@Unknown_Re
    ; Line 61
    	mov	eax, OFFSET _dwCount
    	or	edx, -1
    	lock	xadd DWORD PTR [eax], edx
    ; Line 63
    	push	ecx
    	call	DWORD PTR __imp__CoTaskMemFree@4
    ; Line 65
    	xor	eax, eax
    $LN2@Unknown_Re:
    ; Line 66
    	ret	4
    _Unknown_Release@4 ENDP
    _TEXT	ENDS
    END
    
    Notice the in(s)ane use of the EBX register around the inlined memcmp() function: it is saved, zeroed and restored, but never used.
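
For readers unfamiliar with the two idioms the source relies on, here is a portable sketch using hypothetical stand-in types instead of the Windows headers. The first function recovers the containing object from a pointer to its embedded interface, as the IF2CO macro does, by subtracting the offset of the member. The second compares two 16-byte identifiers, which is what the comparison against IID_IUnknown amounts to.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the Windows types used in example17.c */
typedef struct { void *vtbl; } iface_sketch;

typedef struct {
    uint32_t     count;   /* plays the role of dwCount */
    iface_sketch iface;   /* plays the role of the embedded IUnknown */
} object_sketch;

/* Recover the containing object from a pointer to its embedded member:
   subtract the member's offset, a constant known at compile time. */
object_sketch *object_from_iface(iface_sketch *iface)
{
    return (object_sketch *)((char *)iface - offsetof(object_sketch, iface));
}

/* Compare two 16-byte identifiers; returns nonzero if they are equal. */
int id_equal_sketch(const void *a, const void *b)
{
    return memcmp(a, b, 16) == 0;
}
```

The constant offset is what shows up as the addition of -4 in the listing above. The comparison itself needs only two pointers and a count of four 32-bit words, so nothing in it calls for a fourth register.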

Example 18

Superfluous unreachable call of external routine __report_rangecheckfailure() generated by the Visual C 2017 compiler.

Demonstration

  1. Create the text file example18.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #define MAX_PATH 260
    
    typedef short wchar_t;
    
    unsigned __stdcall GetModuleFileNameA(void *, char *, unsigned);
    
    int main()
    {
        char sz[MAX_PATH];
        unsigned dw = GetModuleFileNameA(0, sz, MAX_PATH);
    
        if (dw < MAX_PATH)
            sz[dw] = '\0';
    }
    
    unsigned __stdcall GetModuleFileNameW(void *, wchar_t *, unsigned);
    
    int wmain()
    {
        wchar_t sz[MAX_PATH];
        unsigned dw = GetModuleFileNameW(0, sz, MAX_PATH);
    
        if (dw < MAX_PATH)
            sz[dw] = L'\0';
    }
    
  2. Generate the assembly listing example18.asm from the source file example18.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1s /Tcexample18.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    example18.c
    
  3. Display the assembly listing example18.asm created in step 2.:

    Type example18.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\example18.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_main
    PUBLIC	_wmain
    EXTRN	___report_rangecheckfailure:PROC
    EXTRN	_GetModuleFileNameA@12:PROC
    EXTRN	_GetModuleFileNameW@12:PROC
    EXTRN	@__security_check_cookie@4:PROC
    EXTRN	___security_cookie:DWORD
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _sz$ = -264						; size = 260
    __$ArrayPad$ = -4					; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example18.c
    ; Line 10
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 264				; 00000108H
    	mov	eax, DWORD PTR ___security_cookie
    	xor	eax, ebp
    	mov	DWORD PTR __$ArrayPad$[ebp], eax
    	push	esi
    ; Line 12
    	mov	esi, 260				; 00000104H
    	lea	eax, DWORD PTR _sz$[ebp]
    	push	esi
    	push	eax
    	push	0
    	call	_GetModuleFileNameA@12
    ; Line 14
    	cmp	eax, esi
    	pop	esi
    	jae	SHORT $LN2@main
    ; Line 15
    	mov	BYTE PTR _sz$[ebp+eax], 0
    $LN2@main:
    ; Line 16
    	mov	ecx, DWORD PTR __$ArrayPad$[ebp]
    	xor	eax, eax
    	xor	ecx, ebp
    	call	@__security_check_cookie@4
    	mov	esp, ebp
    	pop	ebp
    	leave
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_wmain
    _TEXT	SEGMENT
    _sz$ = -524						; size = 520
    __$ArrayPad$ = -4					; size = 4
    _wmain	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example18.c
    ; Line 21
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 524				; 0000020cH
    	mov	eax, DWORD PTR ___security_cookie
    	xor	eax, ebp
    	mov	DWORD PTR __$ArrayPad$[ebp], eax
    	push	esi
    ; Line 23
    	mov	esi, 260				; 00000104H
    	lea	eax, DWORD PTR _sz$[ebp]
    	push	esi
    	push	eax
    	push	0
    	call	_GetModuleFileNameW@12
    ; Line 25
    	cmp	eax, esi
    	pop	esi
    	jae	SHORT $LN2@wmain
    ; Line 26
    	add	eax, eax
    	cmp	eax, 520				; 00000208H
    	jae	SHORT $LN9@wmain
    $LN2@wmain:
    ; Line 27
    	mov	ecx, DWORD PTR __$ArrayPad$[ebp]
    	xor	eax, eax
    	xor	ecx, ebp
    	call	@__security_check_cookie@4
    	mov	esp, ebp
    	pop	ebp
    	leave
    	ret	0
    $LN9@wmain:
    ; Line 26
    	call	___report_rangecheckfailure
    $LN11@wmain:
    $LN8@wmain:
    	int	3
    _wmain	ENDP
    _TEXT	ENDS
    END
    
    Notice the difference between the single-byte character routine main() and the double-byte character routine wmain(): in the former, the conditional assignment of the terminating NUL character is not removed; in the latter, a superfluous range check with a conditional branch that can never be taken is inserted instead, plus an unreachable call of the external routine __report_rangecheckfailure()!
  4. Generate the assembly listing example18.asm from the source file example18.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1s /Tcexample18.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    example18.c
    
  5. Display the assembly listing example18.asm created in step 4.:

    Type example18.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	main
    PUBLIC	wmain
    EXTRN	__report_rangecheckfailure:PROC
    EXTRN	GetModuleFileNameA:PROC
    EXTRN	GetModuleFileNameW:PROC
    EXTRN	__GSHandlerCheck:PROC
    EXTRN	__security_check_cookie:PROC
    EXTRN	__security_cookie:QWORD
    ;	COMDAT	pdata
    pdata	SEGMENT
    $pdata$main DD	imagerel $LN10
    	DD	imagerel $LN10+97
    	DD	imagerel $unwind$main
    pdata	ENDS
    ;	COMDAT	pdata
    pdata	SEGMENT
    $pdata$wmain DD imagerel $LN11
    	DD	imagerel $LN11+95
    	DD	imagerel $unwind$wmain
    pdata	ENDS
    ;	COMDAT	xdata
    xdata	SEGMENT
    $unwind$wmain DD 021919H
    	DD	0490107H
    	DD	imagerel __GSHandlerCheck
    	DD	0230H
    xdata	ENDS
    ;	COMDAT	xdata
    xdata	SEGMENT
    $unwind$main DD 021919H
    	DD	0290107H
    	DD	imagerel __GSHandlerCheck
    	DD	0130H
    xdata	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	main
    _TEXT	SEGMENT
    sz$ = 32
    __$ArrayPad$ = 304
    main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example18.c
    ; Line 10
    $LN10:
    	sub	rsp, 328				; 00000148H
    	mov	rax, QWORD PTR __security_cookie
    	xor	rax, rsp
    	mov	QWORD PTR __$ArrayPad$[rsp], rax
    ; Line 12
    	mov	r8d, 260				; 00000104H
    	lea	rdx, QWORD PTR sz$[rsp]
    	xor	ecx, ecx
    	call	GetModuleFileNameA
    ; Line 14
    	cmp	eax, 260				; 00000104H
    	jae	SHORT $LN2@main
    ; Line 15
    	mov	eax, eax
    	cmp	rax, 260				; 00000104H
    	jae	SHORT $LN9@main
    	mov	BYTE PTR sz$[rsp+rax], 0
    $LN2@main:
    ; Line 16
    	xor	eax, eax
    	mov	rcx, QWORD PTR __$ArrayPad$[rsp]
    	xor	rcx, rsp
    	call	__security_check_cookie
    	add	rsp, 328				; 00000148H
    	ret	0
    $LN9@main:
    ; Line 15
    	call	__report_rangecheckfailure
    	int	3
    $LN8@main:
    main	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	wmain
    _TEXT	SEGMENT
    sz$ = 32
    __$ArrayPad$ = 560
    wmain	PROC						; COMDAT
    ; File c:\users\stefan\desktop\example18.c
    ; Line 21
    $LN11:
    	sub	rsp, 584				; 00000248H
    	mov	rax, QWORD PTR __security_cookie
    	xor	rax, rsp
    	mov	QWORD PTR __$ArrayPad$[rsp], rax
    ; Line 23
    	mov	r8d, 260				; 00000104H
    	lea	rdx, QWORD PTR sz$[rsp]
    	xor	ecx, ecx
    	call	GetModuleFileNameW
    ; Line 25
    	cmp	eax, 260				; 00000104H
    	jae	SHORT $LN2@wmain
    ; Line 26
    	mov	eax, eax
    	add	rax, rax
    	cmp	rax, 520				; 00000208H
    	jae	SHORT $LN9@wmain
    $LN2@wmain:
    ; Line 27
    	xor	eax, eax
    	mov	rcx, QWORD PTR __$ArrayPad$[rsp]
    	xor	rcx, rsp
    	call	__security_check_cookie
    	add	rsp, 584				; 00000248H
    	ret	0
    $LN9@wmain:
    ; Line 26
    	call	__report_rangecheckfailure
    	int	3
    $LN8@wmain:
    wmain	ENDP
    _TEXT	ENDS
    END
    
    Notice the superfluous range checks with conditional branches that can never be taken, plus the unreachable calls of the external routine __report_rangecheckfailure()!
    Also notice that the conditional assignment of the terminating NUL character is not removed in the single-byte character routine main().
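
Why the inserted range checks can never fail is easy to verify: the source already tests the index against the element count, so the byte offset is always smaller than the size of the array. The following sketch (hypothetical names, with the element size as a parameter assumed to be greater than 0) models the inserted check and counts the indices which pass the source-level test but would fail it.

```c
#include <stddef.h>

#define MAX_PATH_SKETCH 260   /* same value as MAX_PATH in example18.c */

/* Model of the inserted range check: returns 1 if the byte offset of
   element `index` lies inside an array of MAX_PATH_SKETCH elements of
   `size` bytes each, else 0. */
int offset_in_range(unsigned index, size_t size)
{
    return (size_t)index * size < (size_t)MAX_PATH_SKETCH * size;
}

/* Counts the indices that pass the source-level test index < MAX_PATH
   but would fail the range check. */
unsigned count_range_failures(size_t size)
{
    unsigned failures = 0;

    for (unsigned index = 0; index < MAX_PATH_SKETCH; index++)
        if (!offset_in_range(index, size))
            failures++;

    return failures;
}
```

For element sizes of 1 byte (char) and 2 bytes (wchar_t), count_range_failures() finds no such index, so the branch to the call of __report_rangecheckfailure() is dead code.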

Contact

If you miss anything here, have additions, comments, corrections, criticism or questions, want to give feedback, hints or tips, report broken links, bugs, errors, inaccuracies, omissions, vulnerabilities or weaknesses, …:
don’t hesitate to contact me and feel free to ask, comment, criticise, flame, notify or report!

Use the X.509 certificate to send S/MIME encrypted mail.

Notes: I dislike HTML (and even weirder formats too) in email, I prefer to receive plain text.
I also expect to see your full (real) name as sender, not your nickname!
Emails in weird formats and without a proper sender name are likely to be discarded.
I abhor top posts and expect inline quotes in replies.

Terms and Conditions

By using this site, you signify your agreement to these terms and conditions. If you do not agree to these terms and conditions, do not use this site!

Data Protection Declaration

This web page records no data and sets no cookies.

The service provider for *.homepage.t-online.de, Deutsche Telekom AG,


Copyright © 1995–2018 • Stefan Kanthak • <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>