Saturday evening, 10:30 PM, Seattle time. Window 7 RC's audio stack is pumping music from the Internet Explorer process to the laptop speakers, the process manager boosts Word 2007 with foreground priority. Bart is writing something yet unannounced. Live Messenger (not to be renamed Bing Messenger) is running too. 10 + 9 = 7, at least in clock arithmetic, a quick calculation revealing the hour component of the time at "home", in Belgium. Little Facebook activity going on, not expecting too many log-on notifications on Messenger either. Affordable to swap in lots of context and concentrate on the writing job at hand. Boosting the IRQ level too. Life is good.
10:35 PM, Seattle time. A log-on notification message disturbs the graphical memory in the bottom right corner of the screen, after being fixed with some of Word's chrome for zooming and scrolling. The MSNP protocol is doing its work faithfully. The audio stack nicely overlaps the music with the log-on sound. An interrupt takes place. Old friend in Belgium, awake at 7:35 AM in a Sunday morning? Strange. A conversation is initiated, and a few watts in the data center are spent connecting two old friends together through the electronic alleyways of a gigantic TCP/IP world-scale network. An ethernet package with payload "too early - go to bed :)", with the last two-length character sequence nicely turned into a yellow happy face, leaves the NIC's port. An answer "emergency" followed by the brother's of the aforementioned yellow face travels back. Rang out of bed for some server going down it seems. None of my business. For a while at least...
Makes me wonder why I always have to inform for the software being used. Curious to hear about our stuff put to practice I guess. Admitted, I felt quite happy to hear Windows Server was involved. Not the latest release, but hey: better than nothing. Quite happy, but a bit unconfortable too: what piece of software was to blame, if any? Curiosity, that was I think it is. Or maybe my subconsience was in search for an excuse to deflect further talks about technology. Don't want to spoil a sunny morning in Ghent, Belgium - sunny, according to an orphaned sidebar gadget, floating freely on the Windows 7 desktop now. Let's not go deeper into an analysis of the subconsious me, I'm not a qualified psychologist, nor will I become one, and qualified ones haven't had any luck to reverse engineer my dark cells so far. Anyway, we suspend the conversation for a while. Problem being investigated on the other side; using the window switcher to get back to Word 2007. Smalltalk, in any form, I like it.
10:41 PM, Seattle time. A new IM message arrives. Error messages are being pasted. Should I regret having initiated a conversation to begin with? Maybe. Could turn out to be a fun exercise of psychic remote debugging. It takes 5 minutes to make Messenger turn to the Away state. Can't fake it too obviously. Should I respond? Let's glimpse over the message a second time:
An IO error occurred during FTP transport, No buffer space available (maximum connections reached?): JVM_Bind, No buffer space available (maximum connections reached?)
Should I make a silly joke about certain three-letter acronyms in the message? Nah, let's be fair. Wouldn't help the other side, and my technological bias is already known. Obviously I don't like the FTP protocol, or what did you think I was referring to? Actually I like the error message with built-in redundancy. They seem to be quite sure about the cause. But what does some error reporting component like to tell us? Clearly "No buffer space available (maximum connections reached?)".
Bing search did reveal a few useful hits, but some further diagnostics would be desirable. Time for me to reveal there's a bit of network admin in me. Just a tiny bit, but enough for the job at hand. Hints indicate it might reveal a WSAENOBUFS error code. Good old WinSock. Netstat might tell us something. Let's ask for a netstat -n -b -a output. Don't want to look like a dicky-doo-dah, let's try myself first. A hint to our DOS background "C:\_" catches my eye on the Windows 7 superbar. A few ALT-TAB presses will bring me there. Aarg, console out is fed with a string buffer containing the words "The requested operation requires elevation.". I'm in the post-Vista era. Oh well, I trust my main memory, press <ENTER> in the Live Messenger window.
A text file travels back, capturing output of the requested command:
Proto Local Address Foreign Address State PID
TCP 0.0.0.0:21 0.0.0.0:0 LISTENING 13668
TCP 0.0.0.0:22 0.0.0.0:0 LISTENING 13668
TCP 0.0.0.0:80 0.0.0.0:0 LISTENING 13068
TCP 172.26.28.30:1026 172.26.27.22:21 CLOSE_WAIT 15384
TCP 172.26.28.30:1027 172.26.28.30:21 CLOSE_WAIT 15384
TCP 172.26.28.30:1028 172.26.28.30:21 CLOSE_WAIT 15384
TCP 172.26.28.30:4998 172.26.28.30:21 CLOSE_WAIT 15384
TCP 172.26.28.30:4999 172.26.28.30:21 CLOSE_WAIT 15384
TCP 172.26.28.30:5000 172.26.27.22:21 CLOSE_WAIT 15384
TCP 127.0.0.1:383 127.0.0.1:4696 TIME_WAIT 0
TCP 127.0.0.1:4281 127.0.0.1:2485 TIME_WAIT 0
TCP 127.0.0.1:4696 127.0.0.1:2485 TIME_WAIT 0
TCP 172.26.28.30:4281 172.26.28.31:445 TIME_WAIT 0
UDP 0.0.0.0:135 *:* 1936
Nice to see some good old friends like IIS. But back to business. A Dictionary<string,int> in my main memory is consulted for a lookup on key "FTP", providing a prompt response of 21. What about looking for :21 in the file? Hmm, quite a bit of hits. Port 1026, CLOSE_WAIT. 1027 same deal, not any different for 1028. A scroll bar decorates the right border of my favorite editor, Notepad. WM_VSCROLL messages are pumped, SB_THUMBTRACK joins the party. 4998 involved in an intimate though partly terminated relationship with 21 too. Same for 4999 and 5000. 21 is no further engaged. Strange, a round number. Port 21 is not monogamic for sure. But 3975 desparate ports jumping on it, can it cope with that? Something's going on here.
80 centimeters behind me lies the answer in dead tree format. I reach out to my bookshelve. A blue book, purchased 08/03/03 in Ghent according to a sticker on the back, is removed from its spot where it's been sitting since its overseas transportation almost two years ago. Not too dusty though. The title is still clearly readable: Windows Server 2003 TCP/IP Protocols and Services. Yes, I admit. I even posess books on IIS 6.0, Exchange 2003, Active Directory Services and PKI in Windows Server 2003. Not just for decoration, I've read significant portions of it in a distant past. Oh, those student times with plenty of time. Dreaming back of times with my chat buddy, sitting next to one another in courses. Pretending to pay attention. Sometimes we actually did. A power state transition request for S1 is made in my head, but not granted. I can't fall asleep now. The Book Browser service is started. Damn this book looks boring. Payload diagrams, sequence diagrams, tables. Some yellow marker reveals I've actually gone through it, ever, but the book Browser Service keeps triggering cache misses.
What am I looking for anyway? Oh yes, huge number of connections in CLOSE_WAIT state. Luckily, the Book Browser service declared a dependency on the Indexer service. It's consulted to load pages in the upper range of the dead tree volume, triggering a search algorithm for LCID 1033, and responds to the "CLOSE_WAIT" query with an offset 329. The Book Browser service takes over again and efficiently locates and maps in the page. A table explains: "A FIN-ACK has been received and a FIN-ACK has been sent.". Useful to know. Is there more? Figure 13-7 reveals TCP connection states. I remember it was quite involved but the number of arrows makes me dizzy. A bit of Pineapple Orange Strawberry juice will help to rejuice my power supplies. The linear page scanning is continued, and pauses at "Controlling TCP Connection Terminations in the Windows Server 2003 Family and Windows XP". Blah, blah. TcpTimedWaitDelay. What's that? Default value of 240, the number of seconds that a TCP connection remains in the TIME WAIT state. No, we're in CLOSE_WAIT still. Lookup continues. Hints appear for a MaxUserPort setting: "MaxUserPort specifies the maximum port number .... The default value of MaxUserPort is 5000." Bingo. We're on to something.
So, what happened? Some uploader for a content management system, hosted by dllhost.exe, is overly ambitious to upload files in parallel and exhausts the number of ports available to applications. All of this in a short period of time, not enough for ports to be reclaimed. MaxUserPort is the cure. At least just for the symptoms. Clearly, there's a bigger problem here. Why is the CMS so greedy with sockets? I start the discussion. Being an emergency, my chat buddy decides to boost the setting, and requests a reboot by the data center folks. They'll look at the underlying problem in the near future. A case of premature optimization I feel. 4000 or so parallel connections. Wonder how that scales. Wasn't FTP stateful? Reuse, pooling, whatever. Local network communication it follows from context, why no SMB file copy operations? Lots of questions.
Problem solved for now, and we chatted a bit more about the weather. Smalltalk, you know?
PS: Why does Bart have a book on network protocols? There's been a time I implemented network protocols in managed code, for fun. I love the specifications, their relatively high level of precision, and the great discussion topic they form for lunch conversations.
| Digg It