Last week I wrote about how developers at Harvest deploy code and own the responsibility of keeping our software quality high. Today I’ll touch on the tools and process we currently use to collaborate, stay in touch with customers and glean feedback from our infrastructure.
Harvest developers are seldom in the same building, let alone the same state or country. We work as a distributed team, yet we collaborate extensively. All of our code is hosted with GitHub, which makes this collaboration simple. For those familiar with Git:
- Developers work in feature branches off the master branch, and master is always assumed to be deployable by anybody at any time.
- Developers use GitHub Pull Requests all the time, and significant deployments are peer reviewed in this way prior to deployment.
- A Continuous Integration server constantly tests our code, and reports concerns to the team.
- Development takes place locally, but we have multiple production-similar staging environments for testing and QA.
We strive to have no ‘walls’ over which features or releases are thrown between team members. We share the responsibility of creating and supporting our software. As the ‘systems guy’ at Harvest, it’s important to me that every developer has the ability to manage systems configuration. It’s also important that if problems arise, the team who responds to these problems is not a siloed operations team, but includes the developers who wrote the code which is running in production.
To this end, we use Chef to transform our systems configuration into a collaborative effort. Every component of our infrastructure is controlled by Chef. This means that technical team members can view and modify production configuration and roll out systems changes. The beauty of Chef is that everything is protected by Git version control and enhanced by the power of Ruby.
Harvest has a stellar support team and our customers are generally blown away by their responsiveness and helpfulness. If a customer reports a bug, it will eventually end up in the hands of a developer who will interact directly with the customer via Zendesk to resolve the issue. This direct customer contact further enhances a collaborative ideal of ‘no walls’ between internal functions.
Harvest has taken this concept one step further with our Outreach program. Every customer account has a Harvest team member assigned as an Outreach contact, and nobody at Harvest is exempt. Periodically the system will prompt our customers to direct any questions they have about Harvest to their Outreach contact. This means that every Harvest team member interacts with, helps and understands customers daily. I love this program because it often forces me to delve deeply into our own product and grok the Harvest customer experience. I feel responsible for our product success and customer satisfaction.
Communication is key to our culture of “Getting Things Done”, especially given the distributed nature of the Harvest team. We have various tools that we use for communication:
- Co-op is where the Harvest team communicates most of the day. In Co-op, we see what everyone is working on, have back and forth discussions and keep our company culture alive by sharing important pictures with each other.
- HipChat is used extensively for focused discussions and for getting notifications from our various tools. More on tools next.
- Skype is how we hold our infrequent company town hall meetings and see each other’s beautiful faces.
- Google Hangout is how some developers collaborate when pair programming or planning.
- Beluga is our bat signal, to round up the team if needed.
- Zendesk is used by the Harvest Support and Delta Force teams to communicate with each other, and with our customers.
- Kaizen, an internal tool we developed, is our combined bug tracker, task manager and wiki. The entire team uses Kaizen to keep track of their tasks, assign tasks to one another, and to share knowledge.
Metrics and Tools
Our applications and infrastructure are constantly changing and we rely on a diverse suite of tools for immediate feedback on how everything is performing.
- Nagios monitors hundreds of services across our network and can notify us about problems via email, SMS and HipChat.
- Cloudkick monitors our production servers from outside of our own networks for a second opinion on how things are performing.
- Pingdom probes our applications from worldwide locations, providing performance data and alerting if there are issues.
- New Relic provides insight into our application performance and a thorough understanding of the customer experience.
- Munin provides trending data on a vast array of metrics across our servers and applications.
- GitHub hooks alert the team via HipChat when code is pushed to our repositories.
- Errbit is the tool we use to capture exceptions and errors from applications for examination.
- Capistrano alerts the team when applications are being deployed, and by whom, via Co-op and HipChat.
Finally, we developed an internal tool called Lumberjack for mining production logs and understanding application trends. Lumberjack is a Rails application which collects production logs via syslog-ng, stores data in MySQL and aggregated stats in MongoDB. It provides an interface for searching for specific events in logfiles and for charting ad-hoc data sets from the terabytes of log data our systems produce.
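To give a feel for the kind of aggregation a tool like this performs, here is a minimal Ruby sketch that buckets matching log events by minute, which is the shape of data a chart would need. This is not Lumberjack's actual code: the log line format, the `LINE_PATTERN` regex, and the `parse_line` and `events_per_minute` helpers are all assumptions made up for illustration, and a real pipeline would read from syslog-ng and persist results rather than work on an in-memory array.

```ruby
require "time"

# Hypothetical log line format (an assumption, not Harvest's real format):
#   2012-05-01T14:03:27Z app1 rails: Completed 200 OK in 123ms
LINE_PATTERN = /\A(?<timestamp>\S+) (?<host>\S+) (?<program>[^:]+): (?<message>.*)\z/

# Parse one log line into a hash, or return nil if it does not match
# the assumed format or its timestamp is not valid ISO 8601.
def parse_line(line)
  match = LINE_PATTERN.match(line.chomp)
  return nil unless match

  {
    time: Time.iso8601(match[:timestamp]),
    host: match[:host],
    program: match[:program],
    message: match[:message]
  }
rescue ArgumentError
  nil
end

# Count events whose message contains `keyword`, grouped into
# one-minute buckets keyed by "YYYY-MM-DD HH:MM" (UTC).
def events_per_minute(lines, keyword)
  counts = Hash.new(0)
  lines.each do |line|
    event = parse_line(line)
    next if event.nil? || !event[:message].include?(keyword)

    bucket = event[:time].utc.strftime("%Y-%m-%d %H:%M")
    counts[bucket] += 1
  end
  counts
end
```

Calling `events_per_minute(lines, "Completed 500")` on an array of log lines would return a hash of minute buckets to error counts, which could then be handed to whatever charting layer is in use. Lines that do not match the assumed format are skipped rather than raising, since production logs tend to contain a fair amount of noise.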
Here is an example of the charts a Harvest developer might be looking at on any given day:
To sum up this exposé on how we work at Harvest, we share the responsibility for our customer experience. We move quickly but we collaborate. If you think these things sound exciting, we want you to join our team!