Troubleshooting MySQL connection pool timeouts in ADO.NET

Twitter followers: this post deals with MySQL troubleshooting which might not mean anything to you, depending on why you follow me. Still, welcome! Here’s a funny cat picture for you!

Since I spent a chunk of my Saturday figuring this one out and there didn’t seem to be a lot of existing guidance online (maybe because not too many people are BRILLIANT GENIUSES who use ASP.NET with MySQL, and I’m not being sarcastic, it’s really my favourite thing ever, and if you’re also a BRILLIANT GENIUS, let’s talk), I thought I’d share.

This paragraph serves no actual purpose other than to explain that the error text below isn’t an actual error, it’s the message that’s appearing in our logs that we want to fix. Nothing to do with this blog’s software, which at this time doesn’t event use .NET. Anyway:

Exception: System.InvalidOperationException
Message: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

Yeah, that sucks. Plus it doesn’t happen all the time. But it could happen anytime. Did I mention it sucks?

A brief backgrounder before we fix the problem:

The MySQL connector for ADO.NET has the option of using a thing called a connection pool to speed up performance. You’ve probably heard this already, but database operations are generally cheap, performance-wise, but connections are expensive. Your connector can basically cache these connections for you.

Every time you connect to MySQL, you need a connection (duh). If your previous connection didn’t close down properly, the connector can’t give you that old one again, because it’s still in use. There are all kinds of reasons why this might happen, but since we are BRILLIANT GENIUSES we don’t need to know them all, we just need to know how to find them and stomp them flat.

(I’ll just point out that even if you close the connection, if you didn’t close the resources using the connection (like, for example, a DataReader), the connection will stay in use. You know, not like that happened to us today or anything, just saying.)

So, first off, how can you tell if you’ve got a potential leak? One way is to wait for the exception to show up in production. Another is to load test your application, but the trick is that you’ll need to hit the part of your code that’s failing with the load test, which may not happen if it’s obscure enough.

The way I’ll show you now is to run a MySQL command called SHOW FULL PROCESSLIST.

This’ll give you a dump of all the active threads in MySQL, along with the username, client host, and database that’s involved. You can run this in your development environment. Depending on your setup, you might have one, two, or seventeen of these things (or some other number greater than zero), Don’t worry too much about the exact number. The key is that the number of open threads shouldn’t be rising as you walk through your application.

OK, so now we’ve managed to find a situation in our test environment (which isn’t called “prod,” right?) where the number of open threads starts at one number, and then we do a repeatable sequence of actions in the app, and the number of open threads increases. And it keeps increasing every time we do that sequence of actions until we hit 100 or so (your default may vary) and then you can’t use the system anymore.

Yay!

From here it’s a pretty straightforward process with either a debugger or, uh, commenting out blocks of code (what, my home machine has issues with the debugger, stop laughing) until you find the call that’s failing. Fix that, run some more tests with periodic checks to the process list, and you’re all good.

OK, so I don’t want this to ever happen again. How can I modify my build process to test for this kind of thing? Guess what? I’ve got a plan!

All of our data access routines are done by Factory objects. I’m still debating whether or not to call them Managers like they do in Patterns of Enterprise Application Architecture, but they create DTOs, so I’m comfortable with the Factory idea, at least for the loads. But I digress. Furthermore, all of our data routines are usually named something starting with the word “Load.” So, the theory goes, if I create an integration-level unit test (I don’t want to run this all the time, just before checkins) that uses reflection to find all the classes that end in “Factory” and run all the public methods inside that start with “Load” 150 times I think we’ll have about 97% certainty that we haven’t introduced any new
leaks. We’ll see how that goes – it can wait until Monday when I’m in an environment that doesn’t include a fridge full of beer.

Comments

2 responses to “Troubleshooting MySQL connection pool timeouts in ADO.NET”

carno

January 30, 2009

I’m having same prob; have you figure out this with factory?
Jason

January 31, 2009

I never ended up finishing up on that; we switched to NHibernate for data management, which has different issues, but not that one, thankfully.

Troubleshooting MySQL connection pool timeouts in ADO.NET

Comments

2 responses to “Troubleshooting MySQL connection pool timeouts in ADO.NET”

Leave a Reply