As with the last communication from Turntable.fm on this matter, I’ll quote their entire blog post and then link to it in case you wish to leave a comment for the team. While annoying, these issues are the sort of thing a systems team has to handle on a regular basis, and it is nice to see the Turntable.fm team openly explaining what they are facing and what they are doing to address the problems.
“There have been increasing reports of connectivity problems on the site including DJs being disconnected, users being kicked out of rooms and chat messages being lagged or dropped. We are very aware of these issues and are doing everything that we can to try to fix them as quickly as possible to get Turntable back to normal.
We initially thought that the problem was caused by either the new server we added in anticipation of the iPhone client launch, or by the change in traffic characteristics that resulted from the iPhone clients. We use software that allows us to serve thousands of clients from one machine, but that sort of software tends to have problems when any of those clients are slow or block. We dug into it more, though, and realized that this wasn’t the problem that we’re experiencing.
We believe that the problems we have been seeing are all manifestations of the same issue but, unfortunately, we have had a harder time trying to track it down than we anticipated. Since we ruled out other suspected causes we’ve been poring through bug reports and server logs, dumping TCP traffic, debugging code on both the client-side and backend and working with authors of third-party libraries. We think that we are narrowing in on the problem, gradually shrinking the size of the haystack.
For the technically-inclined, we are seeing flakiness in our websocket connection from the browser to our servers. Sometimes this results in missing “heartbeat” messages, which makes the server think that you’re disconnected, causing it to remove you from a DJ spot or remove you from the room. Sometimes those messages are chat messages and they are lagged or just never come through, making your chat experience flaky.
In addition to this, we also had a bug in our code that was leading to communication problems between our servers, resulting in problems with things like search and upload. We initially thought that this problem was related to the bigger socket bug, but we recently discovered that it was a separate issue, found the source of the problem and rolled out a code fix that should make those services much more stable now.
We have finally managed to replicate the main problem in our development and testing environments, which is a major step toward our being able to fix it. We have all available engineers working to debug this code, trying every technique that we know of to track down a bug, from capturing and analyzing traffic, to inserting as many debugging statements as possible in the code to track down where the error occurs.
Thank you for staying with us while we bring this problem under control.
atomly, Sr. Software Engineer”
Link: Where We Are Now
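
For readers curious how a missed heartbeat turns into a lost DJ spot, here is a minimal Python sketch of the kind of timeout check the post describes. It is not Turntable.fm's code: the class name `HeartbeatMonitor`, the 30-second timeout, and the client ids are all hypothetical, and the times are passed in by hand to keep the example deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per client and reports who has timed out.

    Hypothetical illustration only; names and timeout are assumptions.
    """

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # client id -> time of the last heartbeat received

    def heartbeat(self, client_id, now):
        # Record that a heartbeat from this client arrived at time `now`.
        self.last_seen[client_id] = now

    def expire(self, now):
        # Remove and return clients whose last heartbeat is older than the timeout.
        expired = sorted(
            cid for cid, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
        for cid in expired:
            del self.last_seen[cid]
        return expired


monitor = HeartbeatMonitor(timeout_seconds=30)
monitor.heartbeat("dj_a", now=0)
monitor.heartbeat("dj_b", now=0)
# Both clients send a heartbeat at t=25, but dj_b's is lost in transit.
monitor.heartbeat("dj_a", now=25)
# At t=40 the server drops dj_b, even though dj_b never actually left.
print(monitor.expire(now=40))
```

The server cannot tell a lost heartbeat from a client that really went away, which is why a flaky connection shows up as people being bumped from rooms and DJ spots.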