As most of you know by now, I’m hard at work on the x64 edition of my assembly book, to be called X64 Assembly Language Step By Step. I’m working on the chapter where I discuss calling functions in libc from assembly language. The 2009 edition of the book was pure 32-bit x86. Parameters were passed to libc functions mostly by pushing them on the stack, which required cleaning up the stack after each call, etc.
Calling conventions under x64 Linux are radically different. The first six integer or pointer parameters to any function are passed in registers. (More than six and you have to start passing the rest on the stack.) The first parameter goes in RDI, the second in RSI, the third in RDX, and then RCX, R8, and R9. When a function returns a single value, that value is passed back in RAX. This allows a lot more to be done without fooling with the stack.
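To make the register assignment concrete, here is a small C function with eight integer parameters, annotated with where the System V AMD64 ABI (the Linux x86-64 convention) places each argument. The function name weigh8 is hypothetical, made up for illustration. Compiling it with gcc -O1 -S should show a through f being read from RDI, RSI, RDX, RCX, R8 and R9, with g and h loaded from the stack.

```c
/* Hypothetical example: eight integer (long) parameters.
   System V AMD64 ABI placement (Linux x86-64):
     a -> RDI   b -> RSI   c -> RDX
     d -> RCX   e -> R8    f -> R9
     g, h -> passed on the stack by the caller
   The long return value comes back in RAX. */
long weigh8(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    /* Distinct weights make each argument's contribution easy to spot. */
    return a + 2 * b + 3 * c + 4 * d + 5 * e + 6 * f + 7 * g + 8 * h;
}
```

For instance, weigh8(1, 1, 1, 1, 1, 1, 1, 1) returns 36.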
Below is a short example program that makes four calls to libc functions: two calls to puts(), a call to time(), and a call to ctime(). Here’s the makefile for the program:
```makefile
showtime: showtime.o
	gcc showtime.o -o showtime -no-pie
showtime.o: showtime.asm
	nasm -f elf64 -g -F dwarf showtime.asm -l showtime.lst
```
I’ve used this makefile for other example programs that call libc functions, and they all work. So take a look:
```nasm
section .data
    timemsg db  "The timestamp is: ",0
    timebuf db  28,0            ; not used yet
    time1   dq  0               ; time_t stored here.

section .bss

section .text

extern time
extern ctime
extern puts
global main

main:
    push rbp                    ; Prolog
    mov rbp,rsp

    mov rdi,timemsg             ; Put address of message in rdi
    call puts                   ; call libc function puts

    xor rax,rax                 ; Zero rax
    call time                   ; time returns time_t value in rax
    mov [time1],rax             ; Save time_t value to var time1

    mov rdi,time1               ; Copy pointer to time_t value to rdi
    call ctime                  ; Returns ptr to the date string in rax

    mov rdi,rax                 ; Copy pointer to string into rdi
    call puts                   ; Print ctime's output string

    mov rsp,rbp                 ; Epilog
    pop rbp
    ret                         ; Return from main()
```
Not much to it. There are four sections, not counting the prolog and epilog: the program prints an intro message using puts, then fetches the current time in time_t format, then uses ctime to convert the time_t value to the canonical human-readable format, and finally displays the date string. All done.
So what’s the problem? When the program hits the second puts call, it hangs, and I have to hit ctrl-z to break out of it. That’s peculiar enough, given how many times I’ve successfully used puts, time, and ctime in short examples.
The program assembles and links without problems, using the makefile shown above the program itself. I’ve traced it in a debugger, and all the parameters passed into the functions and their return values are as they should be. Even in a debugger, when the code calls the second instance of puts, it hangs.
Ok. Now here’s the really weird part: If you comment out one of the two puts calls (it doesn’t matter which one), the program doesn’t hang. One of the lines of text isn’t displayed, but the calls to time and ctime work normally.
I’ve googled the crap out of this problem and haven’t come up with anything useful. My guess is that there’s some stack shenanigans somewhere, but all the register values look fine in the debugger, and the pointer passed back in rax by ctime does indeed point to the canonical null-terminated text string. The prolog creates the stack frame, and the epilog destroys it. My code doesn’t push anything between the prolog and epilog. All it does is make four calls into libc. It can successfully make three calls into libc…but not four.
Do you have to clean up the stack somehow after a plain vanilla x64 call into libc? That seems unlikely. And if so, why doesn’t the problem happen when the other three calls take place?
Hello, wall. Anybody got any suggestions?
A cursory (so to speak – becomes a pun shortly) glance over the clib puts function says it has a file cursor/file pointer. In my dim memory from the dinosaur years, I vaguely recall these have to be reset between uses, and I note there’s a clib.rewind() which rewinds the file cursor to the beginning of the file.
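The file-position idea can be illustrated with a hypothetical helper, roundtrip_line(), which writes a line to a temporary file and reads it back. rewind() matters there only because the same stream is read after being written. puts() just writes to stdout and never reads it back, which fits the reply below that this applies to reading functions such as gets, not to puts. This is a sketch of the general stdio behavior, not a diagnosis of showtime.

```c
#include <stdio.h>

/* Hypothetical helper: write `text` to a temporary file, then read the
   first line back into buf. Returns 0 on success, -1 on failure. */
int roundtrip_line(const char *text, char *buf, int bufsize)
{
    FILE *f = tmpfile();            /* anonymous read/write file, deleted on fclose */
    char *got;

    if (f == NULL)
        return -1;
    fputs(text, f);                 /* file position now sits just past the text */
    rewind(f);                      /* back to offset 0; also satisfies the rule that a
                                       positioning call must separate writing and reading */
    got = fgets(buf, bufsize, f);   /* reads the line that was just written */
    fclose(f);
    return got != NULL ? 0 : -1;
}
```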
I confirmed this on stackoverflow.
https://stackoverflow.com/questions/32366665/resetting-pointer-to-the-start-of-file
-JRS
I believe the above is true for gets, but not for puts.
I downloaded the provided code and it worked perfectly on the first Linux box I tried.
I even hacked up a version with an additional puts call included, and that worked as well.
Something odd is happening here. I’ll post more details shortly.
bbuhler@gateway:~/assembly/jeffd/libc-examp$ ./showtime
The timestamp is:
Sat Nov 26 14:48:00 2022
bbuhler@gateway:~/assembly/jeffd/libc-examp$
Which distro and what version did you use?
Debian Buster upgraded from stretch.
bbuhler@gateway:~$ uname -a
Linux gateway.buhlerfamily.org 4.19.0-22-amd64 #1 SMP Debian 4.19.260-1 (2022-09-29) x86_64 GNU/Linux
bbuhler@gateway:~$
I’ll be able to check other machines / distros Monday.
UPDATE: (granted, not much of one…)
I pulled an old Win7 machine off the shelf, wiped the disk, and installed Linux Mint Cinnamon 21 from a thumbdrive, which I had loaded with the installer via Balena Etcher. So it was an absolutely fresh install, and I did not install anything on it but NASM and gcc-multilib. (ld requires that for builds.) Built showtime.
Same precise problem. No change. I had installed a lot of other things on my other two project machines, so a clean install with no more than I needed to build the executable was the goal.
The only thing all three machines have in common (apart from being Linux Mint or Kubuntu) was that I installed all three of them with Balena Etcher. I can’t imagine that that’s the problem. Still open to suggestions.
What if you yank the call to time? Just hard-code a fixed time value?
If it continues to behave the same way (one puts or the other is fine, but both crash), then you’ve simplified the test case by a step.
If it behaves differently, that will probably be another interesting clue… For example, if the problem is actually tripped at 4 calls to libc in total, but 3 is fine.
Depending on the behavior after skipping the time call, yanking ctime could produce even more clues.
Further, if the time call is… I don’t want to say “idempotent”, but if you can call it multiple times, overwriting the result each time and caring only about the last one (even though the clock advances all the while), then that’s another way to make sure that it’s not a sensitivity to the number of calls into libc.
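Hard-coding the time value, as suggested above, might look like this in C. The helper format_fixed_time() is hypothetical: it hands a fixed time_t straight to ctime(), so time() is never called. Note that ctime() takes a pointer to a time_t, not the value itself.

```c
#include <time.h>

/* Hypothetical helper: format a hard-coded time_t with ctime(),
   bypassing time() entirely. ctime() takes a POINTER to a time_t
   (in assembly, the variable's address goes in RDI) and returns a
   pointer to a static 26-byte buffer such as "Thu Jan  1 00:00:00 1970\n".
   The result is in local time, so the exact text depends on the time zone. */
const char *format_fixed_time(time_t fixed)
{
    return ctime(&fixed);
}
```

On a machine set to UTC, format_fixed_time(0) yields "Thu Jan  1 00:00:00 1970\n", trailing newline included.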
I’ve played around in that mode with it. I’ve commented out the call to time but left the others in. Without the call to time, the problem goes away. Commenting out ctime gets me a segfault, which isn’t surprising and about what you’d expect, given that ctime fills rax with the address of the time string. Without ctime, that address is probably garbage.
Separate idea, thus separate reply. 😀
When you single-stepped in the debugger, I assume you checked the return value on the puts calls?
Relatedly, you might throw strace at it to monitor the syscalls that reach the kernel, see if there are any clues in the underlying write() calls.
I did. In every case, after calling puts, rax contained the number of chars displayed.
I’m no Intel assembly expert, but I compared your code with the assembly generated for the equivalent C program. I noted that my test code clears edi, rather than rax.
I ran your assembly code and got the same symptoms you described. I then changed xor rax, rax to be xor edi,edi. The modified code ran as expected.
Perhaps an expert such as yourself might understand why this change makes a difference.
YIKES! This fixed the problem! Good grief! I never thought about something: The rdi register is used a lot in function calls; basically, in x64, any function call with at least one parameter will use rdi to pass in the parameter to the function. The time function is the only one without an inbound parameter. (The result comes back in rax.) That may imply that time is using rdi internally for purposes of its own, and if rdi carries some weird value into the time function, time freaks out. This doesn’t explain why other people have built showtime.asm with my own makefile and not had any trouble. I don’t remember reading anywhere that you should clear rdi before calling time, and over the past week or so I’ve read everything about the time function that I could find.
Ok. My guess here is that rdi comes out of puts carrying something that time doesn’t like.
Breaking news… I browsed the source online for libc-2.31-0 which is what my Linux Mint 20.0 box has installed… found this in time.h:
/* Return the current time and put it in *TIMER if TIMER is not NULL. */
extern time_t time (time_t *__timer) __THROW;
So it secretly has a single parameter… looks like you were writing to potentially arbitrary memory by not zeroing it out first.
And now I’m looking at this SO thread:
https://stackoverflow.com/questions/5141960/get-the-current-time-in-c
and it appears that all the examples either pass the time_t by reference in arg #1 or use the return value but pass a 0 or NULL.
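The two calling styles can be sketched as below. The wrapper names are hypothetical; the only real API used is time(time_t *), which returns the current time and, when the pointer is not NULL, also stores the same value through it. In assembly terms, style 1 means zeroing RDI before call time, and style 2 means putting the address of a time_t variable in RDI.

```c
#include <stddef.h>
#include <time.h>

/* Style 1: pass NULL and use only the return value.
   In assembly: zero RDI (e.g. xor edi,edi) before call time. */
time_t now_by_return(void)
{
    return time(NULL);
}

/* Style 2: pass the address of a time_t; time() stores the current
   time there and also returns it. In assembly: the variable's
   address goes in RDI before call time. */
time_t now_by_pointer(time_t *out)
{
    return time(out);
}
```

Leaving RDI holding whatever an earlier call left there is neither style: time() treats that leftover value as a pointer and writes eight bytes through it.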
Am I spoiled by too much Perl programming where localtime(time) didn’t require any special parameters to the time call?
I’m starting to feel like it’s a 1984 situation… “we have always been calling time() with some sort of first parameter!”
DOUBLE YIKES! I freely admit I didn’t read the libc source, though if I got desperate enough, I would have. And you’d think online or in one of the numerous books I have on C and libc (granted, not as many as on Pascal and Delphi, heh) that it would have been mentioned.
This makes sense, given that after the first puts call, rdi contains a full 64-bit memory address. It is definitely NOT null.
When I created the C equivalent I determined the time function signature from this GNU page (https://www.gnu.org/software/libc/manual/html_node/Getting-the-Time.html). That indicated that I’d need to pass NULL as a parameter. Unfortunately, I’m not familiar with the assembly calling convention on Intel chips, so I didn’t recognise that as the issue. However, you seem to have explained that.
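Such a C equivalent of showtime.asm might look like the sketch below. It is a reconstruction from the assembly, not the commenter’s actual test code, and the name showtime_c() is hypothetical. Compiling it with gcc -S typically shows the compiler zeroing EDI (mov edi,0 or xor edi,edi) before call time. Writing a 32-bit register also clears the upper 32 bits of the full 64-bit register, so zeroing EDI leaves RDI holding a NULL pointer.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical C reconstruction of showtime.asm: puts, time, ctime, puts.
   Returns 0 on success, -1 if time() or ctime() reports failure. */
int showtime_c(void)
{
    time_t time1;                   /* stands in for the asm variable time1 */
    const char *stamp;

    puts("The timestamp is: ");     /* first puts */

    time1 = time(NULL);             /* NULL argument: in asm, RDI must be zero here */
    if (time1 == (time_t)-1)
        return -1;

    stamp = ctime(&time1);          /* address of time1, like mov rdi,time1 in NASM */
    if (stamp == NULL)
        return -1;

    puts(stamp);                    /* second puts; ctime's text already ends in '\n',
                                       so puts adds a blank line, as in the asm */
    return 0;
}
```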
Based on what you’ve described, not setting the parameter to NULL means time will update whatever memory rdi is pointing at. Again, I’m not sufficiently knowledgeable on Intel assembly to know whether the value in rdi will be predictable based on the earlier calls.
I ran it through a debugger. Before filling rdi with the address of the string to be displayed by puts, rdi has the value 1. After the string has been displayed, rdi contains a 64-bit memory address. I looked at what the address points to, and it’s a region filled with nulls. Not sure why rdi’s contents are germane to time’s operation, but they are.
I believe what I would try is to compile the equivalent C program and look at the code the compiler generated, to see whether that gives any clues about something the C library expects to be done that your assembly code is not doing.
I don’t have a Linux system available to me, or I would try that myself and tell you whether I saw anything helpful.
Another test I might try is to modify your assembly code to be a function callable from C, compile and run a C program that calls that function, and see whether your assembly code functions properly in that situation. If it does, then the problem probably is something in the C environment that is not getting set up properly in your original program. I don’t know how you would track down what that was, if that turned out to be the case.
Hi Jeff
Under NASM, I would have coded:
lea rdi,[time1] ; loads rdi with the address of time1
call ctime
etc.
mov rdi,time1 … will load rdi with the 8 bytes starting at time1, not its address.
Maybe I am on the wrong track?
Regards
David