Imagine the Scene
In my past working life I spent many hours as a DBA scratching my head, searching through mountains of books (for the youngsters out there - yes, we had to physically RTFM!) and, later in life, searching forums and googling for answers. If the company I worked for was lucky enough to have a support contract, great - but I can also recall many hours of navigating complicated web forms or voicemail systems instructing me to press any number of digits to reach a human, only to be put in a queue or to wait anywhere up to a few hours, or at times even a few days, for a response. All the while, my company was crippled without its mission-critical databases.
I have had only one good experience of outstanding customer support in my DBA career, but I’ll save that story for another time!
Fast-forward to when I first discovered Continuent - I was still a DBA at the time working for an e-commerce company based in London, UK, and I was tasked with finding a new HA/DR solution for the critical single sign-on component of their multi-million dollar revenue website. After various POCs with various solutions, we settled on Continuent Tungsten Clustering, the ONLY solution that met all of our needs - HA/DR and Geo Distribution, solid and proven, zero-downtime maintenance, and to my surprise, 24/7/365 support as standard. It ticked every box and more.
If you’re an existing Continuent customer, using our Tungsten Clustering and/or Tungsten Replicator solutions, and have experienced our 24/7 Support, then the rest of that story should be pretty familiar.
The Figures Speak for Themselves
In our latest reports, covering the last 3 months (June 1st to Aug 31st), our average response time for Urgent support cases has been a staggering 2.5 minutes, or 150 seconds. In my past, after that much time I would still be listening to the robotic voice telling me the account number I entered could not be recognized.
For the past 12 months (Sept 1st 2020 through Aug 31st 2021), it was 2.7 minutes. And consistently, for the past 24 months, it has remained at just 2.7 minutes. I trust you are getting the drift, right?
So, why are we fast? How are we so fast? And how can we continue to be so fast?... And oh, which cartoon character should we be?
Let’s dig into these questions a little more.
The Why and The How (We’re So Fast)…
“The Why” is the easiest question to answer: Because we care. It really is as simple as that. We are a relatively small team of ex-DBAs with on average 30+ years of experience between us. We have been in your position in times of crisis, and we know the pain first-hand.
“The How” comes down to three answers:
- Our software is very robust (complete, battle-tested and proven, as we like to say) and has been around for quite a while now. More importantly, because “it just works” (as one of our customers so eloquently put it), the number of Urgent support cases we receive is actually relatively low. This year (2021) we have only handled 18 Urgent cases, one every 10 days or so.
- We have a suite of tools that help keep us efficient and responsive. Also, being geographically distributed as a support team means that we can (almost) follow the sun.
- Our development team also cares...a great deal...and will often take over complicated cases or be on hand when debugging complex issues.
The Tools
- Zendesk: This is your first port of call as a customer. Everyone in your organization who works with Tungsten should have an account. I always recommend that customers bookmark the URL and pin it to their favorites, especially if they do not need to contact us very often. There really is truth in the “use it or lose it” saying. Once you raise a case, an email drops into the inbox of all the support engineers. The case then becomes your primary channel of communication. In Urgent scenarios, or when we feel that the case could potentially be complicated or require a high level of information transfer, we often open up a Zoom call so that we can guide you, talk together, and better understand your case.
- PagerDuty: Every support engineer's nightmare ;) We all have it on our phones, and as soon as an Urgent case comes in, depending upon the time of the day and in which timezone, someone's PagerDuty app will start making noises and getting the attention of an engineer.
- Zoom: As I mentioned above, we use Zoom regularly to aid with support cases. In Urgent cases we will almost always use a Zoom call when we know from the information provided that it will be a much faster route to resolution.
- “tpm diag”: This is built into our product and is the single most useful tool available. I will discuss this a little more in the next section…
How Can We Continue To Be So Fast?
We just will… because we continue to care and have faith in our product. However, being fast also relies on the customer doing their part as well.
We know that in times of crisis you just need help, and need it ASAP; however, to help us to help you, we need details. For many this may seem obvious, but for many others providing detail feels like a waste of time, and they just want someone on a call or on a Zoom chat ASAP.
We will always try and accommodate such requests, but it’s not really a productive use of our time if the first 5-10 minutes of a call are spent asking questions about what happened, what are the errors, and what steps may have led to the issue. This is where we continue to ask for your help.
When raising a support case, Zendesk presents a number of questions - they may seem trivial when you have your senior management shouting at you to get the system working again, but a few extra minutes spent filling out the questions up-front can save those extra 5-10 minutes on a call! And most importantly, be as detailed as you can in the description field. Pasting in the output from “trepctl status” with a simple remark of “The replicator won’t come online” really isn’t helpful!
One good example: if you see errors in the replicator, knowing which release of MySQL you are using, along with the full error log, could lead to a much quicker resolution. One recent case encountered an error because of a change in behaviour in the latest point release of MySQL, a known issue that we have fixed and queued for the next patch release of our product. That particular case could have been resolved and closed in less time than it has taken to read this blog post had we had the diagnostics package with the full error log and a completed Zendesk case showing the release of MySQL in use.
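As a rough sketch of the kind of detail that helps, the snippet below prints a few headings and command outputs that could be pasted into the description field. The `case_details` function name and the headings are hypothetical, not part of the product. It assumes `mysql` and `trepctl` are on the PATH of the host where you run it. Note that `mysql --version` reports the client version only, so the server release should still be confirmed separately.

```shell
#!/bin/sh
# Sketch only: case_details is a hypothetical helper, not a Continuent tool.
# It gathers a few basics to paste into the case description field.
case_details() {
    echo "== What happened (fill in): steps taken, time of first error =="

    echo "== MySQL client version (confirm the server release separately) =="
    if command -v mysql >/dev/null 2>&1; then
        mysql --version
    else
        echo "mysql client not found on PATH"
    fi

    echo "== Replicator status =="
    if command -v trepctl >/dev/null 2>&1; then
        # Status output alone is not enough; keep the notes above filled in.
        trepctl status
    else
        echo "trepctl not found on PATH"
    fi
}
```

Running `case_details > case_notes.txt` would save the output to a file (the file name is just an example) that you can review and extend before pasting it into the case.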
I mentioned the diagnostics package, and I also mentioned “tpm diag” above - these are one and the same. tpm, as you know, is our “(t)ungsten (p)ackage (m)anager” and is the tool you use for installing and updating the product. The `tpm` command has a number of global options, diag being one of them. When issued, it will collect all of the component log files and snapshots of the current status of each component, along with various other metrics and logs from the OS and database, and bundle everything up into a nice little archive package. Depending upon your configuration and topology, it could also collect the logs from all the nodes, not just the node you execute it on.
As a support engineer, looking at a diagnostics package is like looking at a photograph, a very detailed photograph. We can piece together what has happened, what the current state is, and a timeline of events. All of this can drastically reduce the time to resolution. More information on “tpm diag” and also “tungsten_send_diag” (a really useful tool for getting the diag packages to us if your hosts are connected to the internet) can be found at the following doc page links:
- tpm diag: https://docs.continuent.com/tungsten-clustering-6.1/cmdline-tools-tpm-commands-diag.html
- tungsten_send_diag: https://docs.continuent.com/tungsten-clustering-6.1/cmdline-tools-tungsten_send_diag.html
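A minimal sketch of collecting a diagnostics package before raising a case is shown below. The `collect_diag` wrapper is a hypothetical name used only for illustration, and it assumes the Tungsten tools are on the PATH of the host where you run it. It calls `tpm diag` with no extra options; the available options, and those for `tungsten_send_diag`, are best taken from the doc pages linked above.

```shell
#!/bin/sh
# Sketch only: collect_diag is a hypothetical wrapper, not a Continuent tool.
collect_diag() {
    # Fail early with a clear message if tpm is not available on this host.
    if ! command -v tpm >/dev/null 2>&1; then
        echo "tpm not found on PATH; run this on a host where Tungsten is installed" >&2
        return 1
    fi
    # Build the diagnostics archive described in the text above.
    tpm diag
}
```

Calling `collect_diag` on a cluster node then produces the archive to attach to the case, or to send with `tungsten_send_diag` if the host has internet access.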
Over the coming weeks we are also slowly building up a knowledge base as part of Zendesk. This will help you at times when you may be encountering known issues.
So Back to the Cartoon…
Hopefully, after reading this, you will have a clear idea of which cartoon character you think would best represent Continuent Support… For me, it has to be Road Runner, right? Beep Beep... I’ll leave you to decide who would be the Wile E. Coyotes of the support world ;)