
Harvest Availability Issues October 4th

This morning was the worst outage Harvest has experienced in many years and we are embarrassed. Our customers expect the best from Harvest and there is no excuse for failing in this way. Here’s what happened and how we are proceeding.

In summary, a sudden high volume of traffic started to overwhelm our load balancers, then our firewalls, and then our clustering tools. The effects lasted for 2 hours. It took us some time to find the core problems and put an emergency fix in place. Read on for a more technical description.

To give you some context, the application features we recently deployed and the third-party integrations we recently launched have introduced a new traffic pattern: the average Harvest customer’s usage is almost 300% greater than last week’s level, and it continues to grow. Despite being overprovisioned in terms of servers and network bandwidth for precisely this eventuality, a few unfortunate artificial thresholds were suddenly exceeded this morning, causing cascading problems.

This morning around 10:16am EDT our various alert systems started to let us know that Harvest was running slow and some requests were greeted with an ugly error page.

The first threshold that was exceeded was the number of files that our Linux load balancers could keep open at any one time in order to serve and track customer requests. We fixed this rapidly. The second threshold, exceeded almost instantaneously, was on our firewalls, which the huge traffic spike began to overwhelm. The third problem soon surfaced when the failing firewalls began to wreak havoc with our clustering tools.
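For readers curious about the first threshold, here is a minimal sketch of how an open-file limit can be inspected and monitored. It assumes the limit in question is the per-process `RLIMIT_NOFILE` setting on Linux; the post does not say which limit was involved, and the function names and the 20% alert margin are hypothetical, not Harvest's actual tooling.

```python
import resource  # Unix-only standard library module


def fd_headroom(open_count, soft_limit):
    """Return the unused fraction of the soft open-file limit (0.0 to 1.0)."""
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    return max(0.0, 1.0 - open_count / soft_limit)


def should_alert(open_count, soft_limit, min_headroom=0.2):
    """True when less than min_headroom of the limit remains.

    The 0.2 default is an illustrative assumption, not a recommendation.
    """
    return fd_headroom(open_count, soft_limit) < min_headroom


def current_nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = current_nofile_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    # Hypothetical reading: 900 descriptors open against a limit of 1024.
    print("alert:", should_alert(900, 1024))
```

Alerting on remaining headroom, rather than on failures, is one way to learn that a fixed limit is close before requests start to fail.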

For a period of roughly 45 minutes, failing network clustering operations meant that customer requests were often not reaching Harvest servers at all. This proved difficult to troubleshoot, and we ended up removing our clustering logic altogether to get Harvest back online as soon as possible. When it came back online, we were forced to put Harvest in maintenance mode to work down the giant backlog of requests so that it would not overwhelm the system.

These three problems together took around 2 hours to resolve and to get Harvest back online.

The above is not an excuse for being down for two hours during the time of day that many customers use Harvest the most. We have stabilized the known issues and are taking extensive measures to ensure that these basic issues never occur again. We have more than sufficient capacity to handle sudden growth of orders of magnitude, but this morning some poor server configuration caused issues.

Thank you for being patient while we brought Harvest back online. As a reminder, we maintain transparent system status updates at HarvestStatus.com to keep you informed during any issues.


This was posted in #status, Product News.
  • I’m the first to be up in arms when a paid service I rely on goes down during peak hours.

    That being said, an issue happened, it was resolved relatively quickly, and most importantly, Harvest outlined in common sense terms, the same day, what happened, how it was resolved, and how they will attempt to avoid it in the future. And ultimately I love Harvest that much more.

    I think internet users understand that sh!t happens; it’s just that we are used to being told “we’re holding it wrong”, or getting lots of apologies and no assurances. Overall we want complete transparency so we can make our own decisions, and for that, thank you Harvest. You’re the best.

  • Thanks for the support Christopher, we do appreciate it.

  • Awesome to read some common sense responses to your technical problems and the resolution that followed. Much better than other administrators who adopt the hear no evil, see no evil, speak no evil approach to the elephant in the room when these problems occur! As a user in south eastern Australia, Harvest is quality software and my customers love the professional appearance. Keep up the great work.

  • These things happen occasionally, to every service. You guys recovered quick, learnt, and were transparent- can’t ask for more.

  • Brian Campbell on October 10, 2012

    Nik has summed up my feelings on it. It is refreshing to see companies that are not scared of honesty or are trying to put a spin on it. Other companies (cough, cough, apple…) can learn from you guys here.

  • Thanks Nik and Brian. Your support is appreciated here at Harvest.

Comments have been closed for this post.