Using GDB on Windows
I know many people have written about using GDB, but who knows, I might write something useful to you!
Attaching to a Running Process
So, you have a process on a machine (perhaps your pre-deployment machine) and it is doing something silly. You need to break into it and see where it is blocked. Here you will see an actual problem I had and how I tried to solve it.
Step 1: get MinGW (I am using MinGW-64) on the machine where the server process is running and attach to the running process.
(gdb) attach 121832
Attaching to process 121832
[New Thread 121832.0x1dbec]
[New Thread 121832.0x1dbf0]
[New Thread 121832.0x20a8c]
Reading symbols from C:\Users\Administrator\Desktop\clusterclient\bin\clusterclient.exe...done.
0x0000000077510591 in ntdll!DbgBreakPoint () from C:\Windows\SYSTEM32\ntdll.dll
(gdb)
121832 is the PID; you can find it using ps or in the Windows Task Manager. I compiled symbols into my code, so everything is OK. If you did not, you can load symbols with symbol-file from a different executable, provided it was compiled from the exact same source, using the same compiler, with debugging information.
(gdb) info threads
  Id   Target Id          Frame
* 3    Thread 4132.0x1378 0x00000000771e0591 in ntdll!DbgBreakPoint () from C:\Windows\SYSTEM32\ntdll.dll
  2    Thread 4132.0x114c 0x00000000771e12fa in ntdll!ZwWaitForSingleObject () from C:\Windows\SYSTEM32\ntdll.dll
  1    Thread 4132.0x107c 0x00000000771e12fa in ntdll!ZwWaitForSingleObject () from C:\Windows\SYSTEM32\ntdll.dll
(gdb) thread 1
[Switching to thread 1 (Thread 4132.0x107c)]
#0 0x00000000771e12fa in ntdll!ZwWaitForSingleObject () from C:\Windows\SYSTEM32\ntdll.dll
(gdb)
The programme is a communications programme and spends a lot of time waiting on a socket or a semaphore. This is frustrating because the stack trace does not show anything useful when queried with backtrace or bt (these commands are the same).
(gdb) bt
#0 0x00000000771e12fa in ntdll!ZwWaitForSingleObject () from C:\Windows\SYSTEM32\ntdll.dll
#1 0x000007fefc790f75 in WSPStartup () from C:\Windows\system32\mswsock.dll
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
To get out of the waiting function we can use step.
(gdb) step
Single stepping until exit from function ntdll!ZwWaitForSingleObject,
which has no line number information.
[Thread 2208.0xa30 exited with code 0]
0x000007fefc790f75 in WSPStartup () from C:\Windows\system32\mswsock.dll
(gdb) step
Single stepping until exit from function WSPStartup,
which has no line number information.
0x000007fefeb64efc in select () from C:\Windows\system32\ws2_32.dll
(gdb) backtrace
#0 0x000007fefeb64efc in select () from C:\Windows\system32\ws2_32.dll
#1 0x000007fefeb64e7d in select () from C:\Windows\system32\ws2_32.dll
#2 0x00000000004078ec in wait_for_data (sock=100, timeout=30000) at src/socketsutility.c:21
#3 0x000000000040793f in recv_fill_buf (sock=100, buf=0x22fd60, len=16, bytesRead=0x22fd5c, timeout=30000) at src/socketsutility.c:41
#4 0x0000000000405136 in packet_recv (ctx=0x5d5ab0, timeout=30000) at src/application.c:68
#5 0x0000000000403bfc in main (argc=1, argv=0x5d5f60) at src/main.c:201
(gdb)
Now you can see that I have returned to select(), which is the call I made to wait for the socket, and the backtrace now works properly. Ordinarily you might expect to use finish to get back. Unfortunately, finish needs GDB to find the calling frame, which it cannot always do without debugging information, so it will sometimes return an error:
(gdb) thread 1
[Switching to thread 1 (Thread 2208.0x1244)]
#0 0x00000000771e12fa in ntdll!ZwWaitForSingleObject () from C:\Windows\SYSTEM32\ntdll.dll
(gdb) finish
Run till exit from #0 0x00000000771e12fa in ntdll!ZwWaitForSingleObject () from C:\Windows\SYSTEM32\ntdll.dll
[Thread 2208.0xd10 exited with code 0]
0x000007fefc790f75 in WSPStartup () from C:\Windows\system32\mswsock.dll
(gdb) finish
"finish" not meaningful in the outermost frame.
(gdb) step
Single stepping until exit from function WSPStartup,
which has no line number information.
0x000007fefeb64efc in select () from C:\Windows\system32\ws2_32.dll
(gdb)
From my backtrace I can see where execution currently is, and I can look at my code to see what is wrong. Firstly, looking at wait_for_data():
static int wait_for_data(SOCKET sock, uint32_t timeout) {
    struct timeval s_timeout;
    fd_set socks;
    int readsocks;
    if (timeout == 0) {
        timeout = 5*1000;
    }
    s_timeout.tv_sec = timeout / 1000;
    s_timeout.tv_usec = (timeout % 1000) * 1000;
    FD_ZERO(&socks);
    FD_SET(sock, &socks);
    readsocks = select(sock+1, &socks, (fd_set *)0, (fd_set *)0, (PTIMEVAL)&s_timeout);
    if (readsocks == 0) {
        return -1;
    }
    return 0;
}
Well, that is simple; I cannot see any problems there... Let's step until we come to the next function, recv_fill_buf(). You can do this in several ways. The best is probably to set a breakpoint, which is easy if you have your source file: just type break socketsutility.c:42, since we know that line 42 is the next line after the call to wait_for_data(). Naturally, it would be nice to know the value of res.
(gdb) break socketsutility.c:42
Breakpoint 1 at 0x407942: file src/socketsutility.c, line 42.
(gdb) cont
Continuing.
[Thread 5340.0x169c exited with code 0]
Breakpoint 1, recv_fill_buf (sock=100, buf=0x22fd60, len=16, bytesRead=0x22fd5c, timeout=30000) at src/socketsutility.c:42
42 if (res == -1) {
(gdb) print res
$2 = 1
(gdb)
Or you could do it the long way round. Please note that you should try to use finish rather than step if you are not in your own code: GDB locked up on my machine when I tried step instead of finish.
(gdb) step
Single stepping until exit from function ntdll!ZwWaitForSingleObject,
which has no line number information.
[Thread 5340.0x1518 exited with code 0]
0x000007fefc790f75 in WSPStartup () from C:\Windows\system32\mswsock.dll
(gdb) step
Single stepping until exit from function WSPStartup,
which has no line number information.
0x000007fefeb64efc in select () from C:\Windows\system32\ws2_32.dll
(gdb) finish <-- if you use step here gdb seems to crash
Run till exit from #0 0x000007fefeb64efc in select ()
from C:\Windows\system32\ws2_32.dll
0x000007fefeb64e7d in select () from C:\Windows\system32\ws2_32.dll
(gdb) finish
Run till exit from #0 0x000007fefeb64e7d in select ()
from C:\Windows\system32\ws2_32.dll
0x00000000004078ec in wait_for_data (sock=100, timeout=30000)
at src/socketsutility.c:21
21 readsocks = select(sock+1, &socks, (fd_set *)0, (fd_set *)0, (PTIMEVAL)&s_timeout);
(gdb) finish
Run till exit from #0 0x00000000004078ec in wait_for_data (sock=100,
timeout=30000) at src/socketsutility.c:21
0x000000000040793f in recv_fill_buf (sock=100, buf=0x22fd60, len=16,
bytesRead=0x22fd5c, timeout=30000) at src/socketsutility.c:41
41 res = wait_for_data(sock, timeout);
Value returned is $1 = 1
(gdb) step
42 if (res == -1) {
(gdb)
The source for the recv_fill_buf() function:
int recv_fill_buf(SOCKET sock, void * buf, int len, int * bytesRead, uint32_t timeout) {
    int res;
    int bytesReceived;
    int totalBytesReceived = 0;
    while (totalBytesReceived < len) {
        res = wait_for_data(sock, timeout);
        if (res == -1) { /* <-- line number 42 */
            res = 0;
            break;
        }
        bytesReceived = recv(sock, (char *)((PBYTE)buf) + totalBytesReceived, len - totalBytesReceived, 0);
        if (bytesReceived > 0) {
            totalBytesReceived += bytesReceived;
        }
        if (bytesReceived < 0) {
            res = -1;
            break;
        }
    }
    *bytesRead = totalBytesReceived;
    return res;
}
I am sure you noticed the error already (I am a plonker, and I noticed it immediately too). On a Linux machine I would have expected a SIGPIPE to be sent to the process instead of an infinite loop, although SIGPIPE is only raised on writes to a closed connection, so the same loop can happen there too. Either way, this is no excuse for shoddy code. Let's follow it through anyway...
(gdb) print res
$1 = 0
(gdb) step
46 bytesReceived = recv(sock, (char *)((PBYTE)buf) + totalBytesReceived, len - totalBytesReceived, 0);
(gdb) step
47 if (bytesReceived > 0) {
(gdb) step
(gdb) print bytesReceived
$2 = 0
(gdb) step
50 if (bytesReceived < 0) {
(gdb) step
40 while (totalBytesReceived < len) {
(gdb) step
41 res = wait_for_data(sock, timeout);
(gdb)
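The trace above shows the loop returning to line 40 with nothing changed. That control flow can be modelled without any sockets. The following is only a sketch: closed_peer_recv() is a hypothetical stub that always returns 0 (the value printed for bytesReceived above), and the iteration cap is an addition for the demo so that the function returns instead of spinning.

```c
#include <assert.h>

/* Hypothetical stand-in for recv(): always reports 0 bytes,
   the value seen for bytesReceived in the trace. */
static int closed_peer_recv(void)
{
    return 0;
}

/* Same branch structure as the receive loop, plus an iteration cap
   (demo only, not in the original code). Returns the number of
   iterations performed before the loop ended. */
int count_loop_iterations(int len, int max_iterations)
{
    int totalBytesReceived = 0;
    int iterations = 0;

    assert(max_iterations > 0);
    while (totalBytesReceived < len && iterations < max_iterations) {
        int bytesReceived = closed_peer_recv();
        if (bytesReceived > 0) {
            totalBytesReceived += bytesReceived;
        }
        if (bytesReceived < 0) {
            break;
        }
        /* bytesReceived == 0 matches neither branch, so no progress is made. */
        iterations++;
    }
    return iterations;
}
```

With len set to 16, the loop only ever stops because of the cap, which is exactly the behaviour GDB shows in the real process.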
Reading the Windows documentation for recv() (search for it), you will notice a line under "Return value" that says:
If the connection has been gracefully closed, the return value is zero.
Did I check for a 0? NO! Fortunately, it took many times longer to write this than it did to discover this bug, and now I know more about how to use GDB on Windows.
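For completeness, here is one way the receive loop could treat a zero return as its own case. This is a minimal sketch, not the fix that went into the real code: the names recv_fill_buf_checked() and wait_readable() and the RECV_* return codes are hypothetical, and the POSIX includes are an assumption added so the same sketch builds outside Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <winsock2.h>
#else
/* POSIX stand-ins (assumption) so the sketch also builds outside Windows. */
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
typedef int SOCKET;
#endif

/* Hypothetical return codes; the original code only used 0 and -1. */
enum { RECV_OK = 0, RECV_ERROR = -1, RECV_TIMEOUT = -2, RECV_CLOSED = -3 };

/* Wait until sock is readable. Returns a positive value if readable,
   0 on timeout and a negative value on error (select()'s own result). */
static int wait_readable(SOCKET sock, uint32_t timeout_ms)
{
    struct timeval tv;
    fd_set socks;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    FD_ZERO(&socks);
    FD_SET(sock, &socks);
    /* Winsock ignores the first argument; POSIX requires it. */
    return select((int)sock + 1, &socks, NULL, NULL, &tv);
}

/* Read until len bytes have arrived, the wait times out, an error occurs
   or the peer closes the connection. *bytesRead is always updated. */
int recv_fill_buf_checked(SOCKET sock, void *buf, int len,
                          int *bytesRead, uint32_t timeout_ms)
{
    int total = 0;
    int result = RECV_OK;

    assert(buf != NULL && bytesRead != NULL);
    while (total < len) {
        int ready = wait_readable(sock, timeout_ms);
        if (ready < 0) { result = RECV_ERROR; break; }
        if (ready == 0) { result = RECV_TIMEOUT; break; }

        int n = (int)recv(sock, (char *)buf + total, len - total, 0);
        if (n < 0) { result = RECV_ERROR; break; }
        if (n == 0) { result = RECV_CLOSED; break; } /* graceful close */
        total += n;
    }
    *bytesRead = total;
    return result;
}
```

Giving the closed connection its own return code lets the caller tell a short read caused by a disconnect apart from a timeout, which the original 0 and -1 results could not express.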