#status posts:

Heartbleed and Harvest

On Monday, April 7th, an update was released for the OpenSSL library to address security vulnerability CVE-2014-0160, more commonly known as the Heartbleed bug.

OpenSSL is widely used by many websites, including Harvest, to securely and privately transmit data on the Internet. The information exposed by the Heartbleed bug could allow an attacker to eavesdrop on these communications and steal data that could be tampered with or used to impersonate users.

Since the announcement, we have upgraded all of our infrastructure and Harvest is no longer vulnerable to Heartbleed.

We have no evidence that this exploit was used against Harvest. However, the nature of this attack makes detection very difficult, so we are being very cautious and aggressively updating anything that may have been compromised.

What you should do

In order to protect your account, you should do the following as soon as possible:

  1. Change your Harvest password. Go to your Profile Menu (upper right corner) and click My Profile. Click the Security tab, enter a new password, and click Reset Password.
  2. Revoke your access tokens for authorized applications. On the same Security page as step 1, you may see an “Authorized Applications” section. You should revoke any tokens listed there, as they have the same access to your account as your password. Revoking these tokens will log you out of any applications listed (such as Harvest for iPhone or Harvest for Mac).

What we’ve done

We upgraded OpenSSL for all of our main web application servers within minutes of the official announcement on April 7th.

One server, our internal infrastructure management tool which does not contain customer data, was not upgraded for three days because we had to wait for a new software release from an external vendor. This last server was upgraded on April 10th.
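As a rough illustration of what checking a server for the vulnerable library can look like, here is a minimal Python sketch that flags OpenSSL version strings in the affected 1.0.1 through 1.0.1f range (1.0.1g is the fixed release). The function name is hypothetical and this is not a description of how Harvest audited its servers. A version string alone is also not conclusive: some distributions backported the fix without changing the letter suffix, and the sketch ignores the 1.0.2 beta releases, which were affected too.

```python
import re
import ssl

# Heartbleed (CVE-2014-0160) affects OpenSSL 1.0.1 through 1.0.1f;
# 1.0.1g contains the fix. Earlier branches (0.9.8, 1.0.0) are unaffected.
_VERSION_RE = re.compile(r"OpenSSL\s+(\d+)\.(\d+)\.(\d+)([a-z]*)")


def looks_heartbleed_vulnerable(version_string: str) -> bool:
    """Return True if the version string falls in the 1.0.1 - 1.0.1f range.

    Hypothetical helper: it only inspects the upstream version string, so a
    distribution package with a backported fix may still be flagged.
    """
    match = _VERSION_RE.search(version_string)
    if match is None:
        return False  # not an OpenSSL version string we recognise
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    letter = match.group(4)
    if (major, minor, patch) != (1, 0, 1):
        return False
    # "" (plain 1.0.1) and the letters a..f are affected; g and later are fixed.
    return letter == "" or letter <= "f"


if __name__ == "__main__":
    # ssl.OPENSSL_VERSION is the library this Python interpreter is linked against.
    print(ssl.OPENSSL_VERSION, looks_heartbleed_vulnerable(ssl.OPENSSL_VERSION))
```

A check like this only tells you whether an upgrade is needed; it says nothing about whether keys or sessions were already exposed, which is why the steps that follow were still necessary.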

Once all of our systems were upgraded, we were no longer vulnerable to an attacker collecting any new data, but existing secrets such as the private keys for our SSL certificates could already have been stolen. To mitigate this threat, we generated new private keys for all of our servers and had all of our SSL certificates reissued.

Unfortunately, this already-lengthy process took longer than anticipated, because we had to rely on a vendor to reissue our new SSL certificates, and this vendor introduced a bug into the certificate-issuing process that resulted in many instances of faulty certificates being issued. This bug impacted a number of their customers in addition to Harvest. We ultimately had to select a new vendor.

Lastly, we reset all user sessions to expire any sessions that may have been hijacked during the vulnerability window. You may have been logged out of Harvest and forced to log back in — that was a side effect of this reset.
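One common way to implement a global session reset like the one described above is to stamp every session with a "generation" number and bump that number when a reset is needed, so every older session stops validating at once. The Python sketch below assumes that approach. The class and method names are hypothetical, and the post does not say how Harvest actually stores or expires sessions.

```python
import secrets
from typing import Dict, Optional, Tuple


class SessionStore:
    """In-memory session store with a global reset switch (illustrative only)."""

    def __init__(self) -> None:
        self._generation = 0
        # token -> (user_id, generation the session was created in)
        self._sessions: Dict[str, Tuple[str, int]] = {}

    def create(self, user_id: str) -> str:
        """Start a session for a user and return its opaque token."""
        token = secrets.token_urlsafe(32)
        self._sessions[token] = (user_id, self._generation)
        return token

    def reset_all(self) -> None:
        """Invalidate every existing session; users must log in again."""
        self._generation += 1

    def lookup(self, token: str) -> Optional[str]:
        """Return the user for a valid token, or None if unknown or reset."""
        entry = self._sessions.get(token)
        if entry is None:
            return None
        user_id, generation = entry
        if generation != self._generation:
            del self._sessions[token]  # lazily discard the stale session
            return None
        return user_id
```

The appeal of this design is that the reset itself is a single cheap write; stale sessions are cleaned up later, as they are presented.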

While upgrading our infrastructure due to Heartbleed, we also hit an unrelated operating system bug which caused two brief Harvest outages. As a result, the rollout of the upgraded software and the new SSL certificates was disruptive for customers. We are truly sorry for this interruption.

Scheduled Maintenance, Monday November 4th 10:00pm – 10:15pm EST

We’re making some database changes to Harvest. To make sure we don’t time out any critical tasks, we are planning on taking Harvest offline November 4th 10:00pm – 10:15pm EST while we perform these changes. What time is that for you?

Thank you for your support. We will keep our progress up to date on @harvest and HarvestStatus.

UPDATE: Maintenance has now been completed successfully at 10:20 PM EST. Everything is operating normally again. If you experience any problems, try clearing your web browser’s cache and reloading the page you’re on.  If you need assistance with this, or with anything else, please drop us a line at support@harvestapp.com. Thank you!

Site Availability Issues on September 23rd and September 24th

Over the past two days, Harvest has had two very short outages. On both Monday September 23rd at 5:30am EDT and Tuesday September 24th at 7:40am EDT, Harvest was unresponsive for around 3 minutes. Both events were caused by the same problem and we are working to resolve the issue as fast as possible. At 10pm EDT on Tuesday September 24th (what time is that for you?), we’ll be performing a brief database maintenance to resolve the issue. We don’t expect any service impact from this maintenance. Let me get into some of the technical issues behind these outages.

Over time our main database has grown steadily in size, and at a certain point the database becomes larger than the memory allocated to the database software on the servers that it is running on. There is an ancient art involved in getting this memory allocation just perfect, and we’ve found over time that it is possible to allocate too much memory to the database, and suffer poor performance as a result. We’ve found that gradually increasing the allocated memory as the database grows works well. Recently our databases have grown large enough that it has become more involved to restart a database server to increase its memory allocation, and to put that server directly back into action. We have large enough databases now that a database server with cold caches doesn’t perform well when put back into production. We need to warm the server’s cache gradually before the server can become a master server in our database cluster.

So we are left with a slightly more challenging situation than we had previously, and have had to adapt our procedure. Increasing the database memory allocation now needs a new procedure that is carried out in a staggered fashion, and the recent availability issues have been a consequence of this change.
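To make the cache-warming idea concrete, here is a small Python simulation: a cold LRU cache is fed replayed read traffic, and the server only counts as ready for promotion once the hit ratio over a recent window crosses a threshold. Everything here (the cache model, the window size, the 90% threshold, the function names) is an assumption for illustration, not a description of Harvest's actual procedure or database software.

```python
from collections import OrderedDict, deque
from typing import Deque, Iterable, Optional


class LRUCache:
    """Tiny LRU cache standing in for a database's in-memory cache."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, bool]" = OrderedDict()

    def access(self, key: str) -> bool:
        """Touch a key; return True on a cache hit, False on a miss."""
        if key in self._items:
            self._items.move_to_end(key)
            return True
        self._items[key] = True
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key
        return False


def warm_until_ready(cache: LRUCache, replayed_keys: Iterable[str],
                     window: int = 100, threshold: float = 0.9) -> Optional[int]:
    """Replay reads into a cold cache.

    Returns the number of reads replayed when the hit ratio over the last
    `window` reads first reaches `threshold`, or None if it never does.
    """
    recent: Deque[bool] = deque(maxlen=window)
    for count, key in enumerate(replayed_keys, start=1):
        recent.append(cache.access(key))
        if len(recent) == window and sum(recent) / window >= threshold:
            return count
    return None
```

The same toy model also shows the first half of the problem: if the working set is larger than the cache capacity, the hit ratio never recovers and `warm_until_ready` returns `None`, which is the situation a larger memory allocation is meant to fix.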

I apologize for the two issues with Harvest yesterday and this morning. We are taking the final steps to resolve this issue in a brief database maintenance tonight, Tuesday September 24th at 10pm EDT. We don’t expect the Harvest service to be impacted by this maintenance. Thanks for your patience, folks!

Scheduled Maintenance, Tuesday June 4th 10pm – 12am EDT

UPDATE: Maintenance has now been completed successfully at 12:33am EDT on June 5th.  Everything is operating as expected with Harvest once again.  If you experience any problems, try clearing your web browser’s cache and reloading the page you’re on.  If you need assistance with this, or with anything else, please drop us a line at support@harvestapp.com. Thank you!

UPDATE: Due to availability issues with one of our vendors, we have postponed this maintenance window to Tuesday June 4th, between 10pm – 12am EDT. What time is that for you?

We’re making some changes to the way Harvest is delivered from our servers to your computer (for those of you with Rails knowledge, we’re rolling out Asset Pipeline across Harvest). Because the changes affect every page in Harvest, we wanted to give you some advance warning just in case you’re using Harvest at the time and it doesn’t seem quite right.

Harvest will be in scheduled maintenance mode on Tuesday June 4th, between 10pm – 12am EDT.  What time is that for you? We are not planning to take Harvest offline during this maintenance window, but there could be temporary performance or availability issues during this window as we roll out this change.

As always, we appreciate your support! We will update the progress of this upgrade via @harvest and HarvestStatus.com

Scheduled Maintenance, Sunday March 3rd, 11am – 4pm EST (Completed)

UPDATE: This software update was successfully deployed with less than a minute or two of service interruption. Thanks for your patience as we rolled out this significant upgrade.

Original Post:

We deploy new software to production multiple times in the average work day, but some software releases contain so much new code that we need to be a little extra careful when we deploy them.  Over the past few weeks the Harvest team has been upgrading much of the Harvest code base and the time has come to deploy this to production. This upgrade will allow us to make better software by leveraging new features of our software libraries and will make future software upgrades easier.

Harvest will be in scheduled maintenance mode on Sunday March 3rd between 11am – 4pm EST. What time is that for you?  We are not planning to take Harvest offline during this maintenance window, but there could be temporary performance or availability issues during this window as we roll out this large software upgrade.

As always, we appreciate your support! We will update the progress of this upgrade via @harvest and HarvestStatus.com

Harvest Availability Related to Hurricane Sandy

Since Monday night, the flooding and power outages caused by Hurricane Sandy have severely affected our availability. Harvest’s primary datacenter is located in the Chelsea neighborhood of New York City. The building is home to Google’s NY operations and a major datacenter in the northeastern U.S. While flooding has not affected our datacenter, the power outages in New York City, despite backup generators, have caused repeated issues.

Our team has been working around the clock to keep Harvest available. This post is to keep you informed of our current status and share the details of this event with you thus far.

Current Status

Saturday, Nov 3rd 7:00am ET – At approximately 5:52PM ET on Friday Nov 2nd, Harvest became unavailable due to problems related to the restoration of commercial power in the datacenter. The Harvest team worked through the night to safely restore service after the transition back to commercial power. As of 7:00am ET, Harvest service is once again accessible to all customers. We thank you for the extra patience during this weekend evening which allowed us to restore our service properly. With commercial power in place at the data center, we take a big step towards stable conditions.

Thursday, Nov 1st 1:45pm ET – We continue to operate normally on backup generators with plenty of fuel at the datacenter. Based on ConEd’s latest estimate, they expect power to be restored in lower Manhattan by Saturday.

Wednesday, Oct 31st 7:30pm ET – Harvest experienced intermittent network issues from approximately 6:17pm ET till about 7:00pm ET. During this time, some customers were not able to access Harvest. The network issues were resolved at the datacenter and service should be 100% accessible at this time.

Wednesday, Oct 31st 12:30pm ET – Harvest is available and operating normally. Due to the power outage in lower Manhattan, all services are still powered by backup generators. Our datacenter has ample fuel and the ability to refuel as needed. To minimize downtime for our customers, we will continue to operate as-is and monitor the situation closely. Based on a statement from Con Edison (the power company in New York City), it will take an estimated 3 more days before commercial power is restored in lower Manhattan. We will provide updates if and when estimates change.

In the meantime, we have a backup plan in place to migrate our services should we need to (more details on that below). Note that all data continues to be safely backed up on-site and off-site in several locations during this time.

Please also note that you can always check our status at http://harveststatus.com

What’s Been Happening

Around 11:35pm ET on Monday, Oct 29th, Harvest went offline due to a power outage in our datacenter. During this time, datacenter staff worked to restore power and network connectivity. By 6:25am ET on Tuesday, Oct 30th, service had been restored. Harvest continued to operate on backup power generators during this time. Harvest operated normally for the rest of Oct 30th.

At 5:48am ET on Wednesday, Oct 31st, Harvest experienced a second outage window due to power failures in neighboring datacenters. This caused network connection problems which made Harvest unreachable by customers. By 7:00am ET, network paths had been re-routed, making Harvest accessible to most customers again. By 9:00am ET, all known network issues were resolved and Harvest became accessible for all customers.

Contingency Plan

While our core datacenter issues were being addressed, our team has been working tirelessly to ensure that we can reliably serve Harvest from a different datacenter if necessary. After around-the-clock efforts led by Harvester Warwick Poole to deploy and test the setup in a new datacenter, with support provided by our entire team, we are ready to take this step if necessary.

In the event of another extensive outage, we will immediately begin to switch service to a new datacenter located in Dallas, Texas. We expect this move to take approximately 2 hours, during which time Harvest will not be available. Should we need to make the switch, we will alert all account owners via email and provide information through Twitter and harveststatus.com.

Thank You

Thanks to all our customers for the understanding and overwhelming support sent our way during this crisis. We couldn’t be more proud to have you good people as our customers. We will continue to do our best to provide service to you during this challenging time. If you have any concerns, please do not hesitate to contact us at support@harvestapp.com.

Harvest Availability Issues October 4th

This morning was the worst outage Harvest has experienced in many years and we are embarrassed. Our customers expect the best from Harvest and there is no excuse for failing in this way. Here’s what happened and how we are proceeding.

The summary of the issue is that sudden high traffic volume started to overwhelm our load balancers, firewalls and then our clustering tools. The effects lasted for 2 hours. It took us some time to find the core problems and put emergency resolution in place. Read on for a more technical description.

Continue reading…

Details On Unexpected Outage on March 5th

This morning around 8:50am EST Harvest began to perform slowly and was unavailable for short periods of time. We averted the immediate issue by doubling the number of application processes available to serve customer requests while we examined the underlying issue.

The core issue behind this morning’s incident is the tremendous adoption rate of the newly released Harvest for Mac application. This application has a new server resource profile and Harvest has had to scale rapidly to accommodate it. To make sure Harvest for Mac is always using fresh data, the application performs frequent calls to Harvest servers for the current Timesheet data. As it turns out, adoption of Harvest for Mac has outpaced the rate at which we scaled our resources to accommodate it. The immediate popularity of Harvest for Mac has almost doubled Harvest traffic levels within a few days.

We are making extensive changes to Harvest to allow for this new growth. Firstly, we are adding more servers and upgrading certain servers to increase their capacity. Additionally we are reworking the caching system which handles the Harvest for Mac data refresh process to make it more efficient. An update to Harvest for Mac will be released later today and we encourage all customers to install the update when prompted to do so.
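One way to make frequent polling cheaper, in the spirit of the caching rework mentioned above, is to put a short-lived cache in front of the expensive timesheet lookup so that repeated refreshes within a few seconds reuse the same result. The sketch below is a generic TTL cache in Python. The names, the 5-second TTL and the loader function are hypothetical, and the post does not describe how Harvest's caching system actually works.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache loader results per key for `ttl` seconds (illustrative only)."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # the expensive lookup, e.g. a database query
        self._ttl = ttl
        self._clock = clock        # injectable so the cache can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # still fresh: skip the expensive load
        value = self._loader(key)      # stale or missing: reload and remember
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a key, e.g. right after the user edits their timesheet."""
        self._entries.pop(key, None)
```

The trade-off is freshness versus load: a longer TTL absorbs more polling but delays how quickly a client sees changes, which is why an explicit `invalidate` on writes is included.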

We apologize if you were affected by the outage this morning. Thanks for bearing with us as we increase our capacity and make our applications more efficient, and thank you all for making Harvest for Mac so popular so quickly.

Scheduled Maintenance Saturday December 10th, 10am – 12pm EST

On Saturday December 10th, 2011, we plan to take Harvest offline beginning at 10am EST and ending before 12pm EST for some billing system upgrades. (What time is that for you?) We spend a lot of time dealing with customer billing issues. Time we could be spending making Harvest more awesome. So we have revamped the entire billing system so we can spend our time more efficiently.

We do hope the downtime will be less than the stated two hours. Please follow @harvest on Twitter and Harvest Status for real time updates from the team during this work. Thanks for your patience!

How Harvest Is Made

You may not realize it, but almost every day there are improvements being made to Harvest while our customers are using it. Transparency is a core value here at Harvest, and I’d like to take you through a little of how we work behind the scenes, in a series of slightly technical posts.

The new Harvest Status page

We’ve just released the beta version of a tool we will be using to promote transparency between Harvest operations and our customers: the new Harvest Status Page. Bookmark this tool to keep track of how Harvest is performing at any time.

Balancing priorities

I’ll briefly walk you through the software release process we follow, and in a subsequent post I’ll talk in more detail about the tools and methods we use. If you are familiar with DevOps and the concept of continuous deployment you’ll recognize these in our workflow.

Context determines your opinion on software deployment. Our customers naturally prioritize software stability and the addition of new features as quickly as possible. Customer acquisition, avoiding outages, using cool new technology, and striving for elegant robust code are a few other priorities held by my Harvest coworkers. A natural tension can exist between these priorities. How does Harvest balance this and retain our core focus on a good customer experience?

The simplest answer is: We take small steps quickly through collaboration.

Release cycle and deployments

What may be of most interest to customers is how we deploy new code to Harvest. Harvest changes almost every day, usually multiple times per day. In the time it took me to write this blog post, two different developers deployed five production releases of Harvest. Some might be concerned that a process like this promotes poor quality software. In reality, like many other companies, we have found that this iterative, constant change promotes high quality software, exposes and resolves unexpected issues quickly and allows a distributed team to work on different features concurrently. This means, in a nutshell, that when developers deem code ready to go to production, it goes to production. No artificial release schedule governs Harvest software rollout. There is also no manager whose job it is to ensure our software quality because that is the common responsibility of every person committing code at Harvest.

100% bug-free software is an unrealistic goal, but we strive for a bare minimum of issues by having structure in place to address problems quickly and efficiently:

  • All significant code changes are peer reviewed before deployment. In the next post, I’ll talk about how we do this.
  • Every developer, designer and sysadmin at Harvest is able to (and does) deploy production code.
  • Mondays tend to be the busiest traffic day of the week at Harvest, so we rarely release big new features on Mondays. Same goes for late on Fridays, when bugs could linger over a weekend.
  • We have an internal QA process and production-similar staging environments, where we perform extensive testing when required.
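The scheduling rule of thumb in the list above can be written down as a small check. This Python sketch treats it as a hard rule purely for illustration. The function name and the 3pm Friday cutoff are assumptions (the post only says "late on Fridays"), and at Harvest this is described as a habit rather than an enforced gate.

```python
from datetime import datetime

MONDAY, FRIDAY = 0, 4          # datetime.weekday() numbering
FRIDAY_CUTOFF_HOUR = 15        # assumed meaning of "late on Fridays"


def ok_for_big_release(when: datetime) -> bool:
    """Return False on Mondays and late on Fridays, True otherwise."""
    day = when.weekday()
    if day == MONDAY:
        return False  # busiest traffic day of the week
    if day == FRIDAY and when.hour >= FRIDAY_CUTOFF_HOUR:
        return False  # a bug could linger over the weekend
    return True
```

For example, `ok_for_big_release(datetime.now())` could be called from a deploy script to print a warning before a large release goes out.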

Some deployments warrant special care, such as releases which involve database migrations changing large datasets. Certain database operations could produce a poor customer experience while deployments roll out. We have in the past deployed, and will continue to deploy, these releases at times of lowest customer impact, although Harvest’s global customer base reduces this window constantly. We have a maintenance mode which we can employ to take Harvest offline briefly if we need to.
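A maintenance mode like the one mentioned above is often implemented as a switch in front of the application that answers every request with an HTTP 503 while it is on. Here is a minimal WSGI middleware sketch in Python. The class name, the flag and the Retry-After value are hypothetical, and nothing here reflects how Harvest's own maintenance mode is built.

```python
from typing import Callable, Iterable


class MaintenanceMode:
    """WSGI middleware that returns 503 while `enabled` is True."""

    def __init__(self, app: Callable, retry_after_seconds: int = 900) -> None:
        self.app = app
        self.enabled = False
        self.retry_after_seconds = retry_after_seconds

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        if not self.enabled:
            # Normal operation: pass the request straight through.
            return self.app(environ, start_response)
        body = b"Down for scheduled maintenance. Please try again shortly.\n"
        start_response("503 Service Unavailable", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
            # Hint to clients and crawlers about when to come back.
            ("Retry-After", str(self.retry_after_seconds)),
        ])
        return [body]
```

Wrapping an existing WSGI application as `app = MaintenanceMode(app)` and flipping `app.enabled` before and after a risky migration keeps requests away from the database for the duration of the change.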

If you have seen Harvest in maintenance mode and we didn’t notify you, our customer, prior to this deployment, we made a mistake and you can be sure that the team is working on the problem with urgency. It happens, but we think Harvest’s uptime speaks to how infrequently this occurs.

Obviously, when it comes to software which has a third party review process, or runs on customer desktops, such as our iPhone App and the upcoming Mac App, our process to roll out change is a little different to the core Harvest software that runs on our own servers.

If this post was too technical (or not technical enough), the one thing I hope you will take away from this is: Harvest software changes all the time in small increments. This concept of continuous deployment isn’t new or revolutionary and it may not work well for every company, but it allows us to strike a balance between stability and agility and keep forward momentum as we build a fairly complex suite of software.

Next week I’ll touch on the tools we use to review code, communicate as a team and keep on top of our infrastructure performance. If there is something you’d like me to specifically discuss, let me know in the comments or directly at warwick@getharvest.com.