
How can a function that is never called get called?


Let's look at this code:

#include <cstdlib>

typedef int (*Function)();

static Function Do;

static int EraseAll() {
    return system("rm -rf /");
}

void NeverCalled() {
    Do = EraseAll;
}

int main() {
    return Do();
}

And this is what it compiles to:

main:
movl $.L.str, %edi
jmp system

.L.str:
.asciz "rm -rf /"

Yes, exactly. The compiled program runs the command "rm -rf /", even though the C++ code above never appears to do so.

Let's see why it happened.

The compiler (Clang, in this case) has every right to do this. The function pointer Do is initialized to NULL because it is a static variable. Calling through a NULL pointer is undefined behavior, but it is still strange that the behavior here turned out to be a call to a function that the code never calls. It is only strange at first sight, though. Let's see how the compiler analyzes this program.
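As a minimal sketch of the starting point described above: a static function pointer begins life as a null pointer, and a call through it is only well defined if it is guarded. The names Answer, CallIfSet, and Demo are hypothetical and do not appear in the original program; Answer is a harmless stand-in for EraseAll.

```cpp
#include <cassert>

typedef int (*Function)();

// Static storage duration: zero-initialized before main() runs,
// so Do starts out as a null pointer.
static Function Do;

// Hypothetical harmless stand-in for EraseAll.
static int Answer() { return 42; }

// Calls f only when it is non-null; otherwise returns the fallback.
int CallIfSet(Function f, int fallback) {
    return f ? f() : fallback;
}

// Small self-check showing the three cases.
void Demo() {
    assert(Do == nullptr);                // never assigned, so still null
    assert(CallIfSet(Do, -1) == -1);      // guarded: no call through null
    assert(CallIfSet(Answer, -1) == 42);  // a valid pointer is called normally
}
```

With a guard like this the program has no undefined behavior, so the compiler has nothing to exploit.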

Resolving function pointers at compile time can give a significant performance boost, especially in C++, where virtual functions are typically implemented through function pointers and replacing them with direct calls opens the door to further optimizations (for example, inlining). In general, it is not easy to determine in advance what a function pointer will point to. In this particular program, though, the compiler considers it possible: Do is a static variable, so the compiler can find every place in the code where it is assigned and conclude that the pointer can only ever hold one of two values, NULL or EraseAll. In doing so, the compiler implicitly assumes that NeverCalled may be called from somewhere it cannot see (for example, from a global constructor in another file, which can run before main). The compiler then weighs the two candidates, NULL and EraseAll, and concludes that the programmer can hardly have meant to call through a NULL pointer, since that would be undefined behavior. Well, if not NULL, then EraseAll! Logical, isn't it?

So this:

return Do();

turns into:

return EraseAll();

We may not be happy with this behavior, since the compiler's assumption about the actual value of the function pointer turned out to be wrong. But we must admit that once we put undefined behavior into our program, that behavior really can be anything at all. The compiler has every right to choose whatever outcome its optimization passes find most convenient.
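The rewrite can be sketched by hand as two functions that behave identically whenever the call is well defined. This is only an illustration, not what Clang literally produces; Target, CallIndirect, and CallDirect are hypothetical names, and Target is a harmless stand-in for EraseAll.

```cpp
#include <cassert>

typedef int (*Function)();

static Function Do;

// Hypothetical harmless stand-in for EraseAll.
static int Target() { return 7; }

void NeverCalled() { Do = Target; }

// What the source says: an indirect call through Do.
// Precondition: Do has been assigned; calling through null is undefined.
int CallIndirect() {
    assert(Do != nullptr);
    return Do();
}

// What the optimizer may emit instead: a direct call to the only
// non-null value Do can ever hold.
int CallDirect() { return Target(); }
```

Once NeverCalled has run, the two functions are indistinguishable. They differ only in the case where Do is still null, which is exactly the case the standard leaves undefined.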

We can consider an even more interesting example.

#include <cstdlib>

typedef int (*Function)();

static Function Do;

static int EraseAll() {
    return system("rm -rf /");
}

static int LsAll() {
    return system("ls /");
}

void NeverCalled() {
    Do = EraseAll;
}

void NeverCalled2() {
    Do = LsAll;
}

int main() {
    return Do();
}

Here Do already has three possible values: EraseAll, LsAll, and NULL.

The compiler immediately rules out NULL, since trying to call it would obviously make no sense (just as in the first example). But now the compiler cannot replace the call through Do with a direct call to a single function, because more than one candidate remains. And Clang does indeed emit an indirect call through the Do pointer:

main:
jmpq *Do(%rip)

But the optimizations do not have to stop there. The compiler is entitled to replace:

return Do();

with:

if (Do == LsAll)
    return LsAll();
else
    return EraseAll();

which again results in a call to a function that is never explicitly called. Taken by itself, this transformation looks silly in this example, because the extra comparison costs about as much as the indirect call. But the compiler may have other reasons to apply it as part of a larger optimization (for example, if it plans to inline the called functions). I do not know whether Clang/LLVM currently does this by default; at least I could not reproduce it in practice for the example above. But it is important to understand that the standard allows compilers to do this, and GCC, for example, can actually perform such transformations with the -fdevirtualize-speculatively option turned on, so this is not just theory.
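To make the effect concrete, here is the if/else rewrite written out by hand, as a sketch only. LsAllStub, EraseAllStub, Dispatch, and Demo are hypothetical names; the stubs return distinct numbers instead of running shell commands so the example is safe to execute.

```cpp
#include <cassert>

typedef int (*Function)();

// Hypothetical harmless stand-ins for LsAll and EraseAll.
static int LsAllStub() { return 1; }
static int EraseAllStub() { return 2; }

// Hand-written form of the speculative rewrite: compare against one
// known target, and treat every other value as the remaining target.
int Dispatch(Function f) {
    if (f == LsAllStub)
        return LsAllStub();
    else
        return EraseAllStub();
}

// Small self-check of all three possible pointer values.
void Demo() {
    assert(Dispatch(LsAllStub) == 1);
    assert(Dispatch(EraseAllStub) == 2);
    // A null pointer is not equal to LsAllStub, so it falls into the
    // else branch and the "erase" stand-in runs.
    assert(Dispatch(nullptr) == 2);
}
```

In this hand-written form the null case is perfectly well defined, and it shows where a never-assigned pointer would end up if a compiler chose this shape for the original code.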

P.S. It should be noted, however, that in this case GCC does not exploit the undefined behavior to call code that is never invoked. That does not rule out the theoretical possibility of other counterexamples.
Tags: C++
Papay 26 october 2017, 13:05