Over the last five to ten years, it has become normal to follow a cloud-first IT strategy. This is particularly the case when considering telephone systems. The promise of a more robust platform on a ‘pay per user per month’ basis makes compelling business sense for such commodity services. Replacing costly on-premises PABX systems with cloud-based Voice over IP (VoIP) systems meant that telephony systems were amongst the earliest and most widely adopted cloud-based systems.
Essentially, while a lot of people were worried about moving their computing power into the cloud, there was less resistance to, and less consideration of, the risk of taking on a cloud-based phone system, mainly because of the lack of data stored on those systems and the ease with which calls could be diverted to mobile phones.
Such is the normality of using a cloud-based system such as Horizon, or a solution provided by BT and fronted by a third-party reseller, that very little heed was paid to the inherent risk of the underlying platform.
One of our clients recently had a horrifying, eye-opening experience which brought these risks to light. They had been using a cloud-based VoIP system for several years and, as part of a wider IT infrastructure refresh, had decided to move to a new phone system from their new provider, to allow tighter integration with their new platform.
In due course, termination notices were served and a graceful exit strategy agreed with the incumbent. On the face of it, this was a reasonably simple project, with one telephony provider providing PAC codes to the other and plenty of time to enable a seamless changeover between the systems.
We then hit several issues with the incumbent not being able to port numbers over to the incoming provider. When we undertook the pre-move tests, it appeared that several of the numbers on the DDI range were registered against a different postcode from the records held by BT; this included the main office numbers.
We were therefore in a bizarre situation where, according to BT, the firm’s primary phone numbers were not owned by the firm; and it was impossible for us to move those numbers while we couldn’t prove the firm owned them, even though they had been in daily use by the firm for around 20 years and had indeed sat on the incumbent telephone system for the last five or so.
It appears that, when the phone system was set up, a mistake had been made in matching phone numbers to postcodes; and once that mistake was embedded in the nightmare of BT administration, it was not an easy fix to get the records changed.
The administrative process to change these records dragged on for weeks and months, causing significant delays and becoming a project risk in its own right: we were rapidly approaching the end of the notice period served in the early part of the project.
We therefore engaged the incumbent telephone provider and negotiated with them that they would put us onto a rolling monthly arrangement until we could gracefully leave. After all, the administrative mistake was of their making, and they had not corrected it in the five or so years they had provided the telephone system.
From a project management point of view, we felt the risk had been mitigated: we had signed agreements from all parties about the delays, and a plan for how we would work together to address the administrative challenges.
When decommission means decommission
It must have been about 4pm or 5pm on a Monday when I had a phone call from our client to say: “Please help – our telephone system has disappeared!” It was one of those things that I couldn’t quite believe, initially. So, I went through the question-and-answer session of “What do you mean, your telephone system has disappeared?”, initially assuming there was some sort of outage. It was only when we drilled into what had really happened that the horror of the situation struck.
While the incumbent support provider had agreed to an extension on a rolling monthly contract basis, they had failed to inform the platform provider of the new arrangement. So, when they originally terminated the contract, they had entered the agreed termination date into their system; and when that date came, the provider of the platform itself simply activated its ‘decommission plan’, which meant the phone system was completely decommissioned off their platform.
Our first thought was that this was a simple fix: “All we need to do is ask them to restore the system, it is a bit of an administration misunderstanding, and we will be back up and running in a few hours”. How wrong we were.
When we started talking to the provider in some detail, we discovered that decommissioning a system on a cloud platform meant just that. They literally delete the information; they don’t keep any back-ups; there was no going back. Our client’s phone system had literally disappeared, deleted at the click of one button.
On discussing this with the third-party support provider, it became apparent that there was no interaction between their system and the platform provider to undo a termination request. So, despite their agreeing to the extension, it would have necessitated them re-contracting with the cloud platform provider to secure the extra time they had promised. There was nothing for it but to plunge straight into issue management mode: how on earth could we get a working phone system, which after all is the lifeblood of a law firm, up and running quickly and efficiently?
Our first consideration was to see if we could turn on the new platform. Unfortunately, because of the state of the numbers, we still hadn’t got to the point where BT would release them to us. Things were exacerbated further because the deletion of the existing phone system had released all of the numbers used by the firm back into the general sale market: key office numbers and numbers used in the firm’s advertising were now on general release for anyone else to purchase.
Immediately, we moved to re-secure all the numbers that were in use. Again, this was made extremely hard because the administration had not been done properly and the postcodes we had provided did not match the numbers released; moreover, because we were halfway through correcting them, when we tried to allocate the numbers to the old, incorrect postcodes, they didn’t match there either.
This is where we must give the third-party company we were leaving a lot of credit. They pulled out all the stops to ensure, through back-door channels with BT, that all of those numbers were essentially ring-fenced so they could be returned to our client in due course. Those back-channels also enabled us to secure the numbers in blocks of 10 to 20 so that we could actually do something with them. Effectively, it was the personal supplier relationships that enabled us to secure those numbers; when we tried to do it through the formal processes, they failed. It all came down to the personal, engineer-to-engineer relationships that were in play.
New plans, old system
But once we had the numbers secured, we still couldn’t transfer them to another third party because the PAC process takes a number of days.
That left us with one option: to rebuild the phone system from scratch on the platform we were leaving. It then took a whole two days of engineering effort, with the incumbent supplier’s engineers working through the night and early morning, to recreate it.
Due to the poor state of the record keeping, we didn’t know which handset individual people had; and because of the way VoIP systems work, it is necessary to match the handset’s unique hardware identifier (typically its MAC address) to the extension number that person is using.
That necessitated the firm’s equity partners and IT manager running around the offices noting down handset codes, building a list of who had what so it could then be married up with the numbers.
Once we had those lists sorted, the third-party supplier again had to go through the process of recreating the phone system on an extension-by-extension basis. We then had to analyse call flows and have them set up again.
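For anyone facing a similar rebuild, the matching exercise boils down to joining two lists: the handset identifiers gathered from the office walk-round, and the record of which extension and DDI each person should have. The sketch below is a minimal illustration of that join, assuming hypothetical handsets.csv and extensions.csv files and column names; it is not the tooling used on the day.

```python
import csv

# Hypothetical input files for illustration:
#   handsets.csv   -> columns: user, handset_id   (the list gathered by walking the offices)
#   extensions.csv -> columns: user, extension, ddi   (who should have which extension and number)
HANDSETS_FILE = "handsets.csv"
EXTENSIONS_FILE = "extensions.csv"

def load_rows(path):
    """Read a CSV file into a list of dictionaries keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def build_provisioning_list(handsets, extensions):
    """Match each user's handset identifier to their extension and DDI."""
    ext_by_user = {row["user"].strip().lower(): row for row in extensions}
    matched, unmatched = [], []
    for row in handsets:
        ext = ext_by_user.get(row["user"].strip().lower())
        if ext:
            matched.append({
                "user": row["user"],
                "handset_id": row["handset_id"],
                "extension": ext["extension"],
                "ddi": ext["ddi"],
            })
        else:
            # No extension record found: needs manual follow-up before provisioning.
            unmatched.append(row)
    return matched, unmatched

if __name__ == "__main__":
    matched, unmatched = build_provisioning_list(
        load_rows(HANDSETS_FILE), load_rows(EXTENSIONS_FILE)
    )
    # Emit a simple provisioning list: handset id, extension, DDI, user.
    for entry in matched:
        print(f'{entry["handset_id"]},{entry["extension"]},{entry["ddi"]},{entry["user"]}')
    if unmatched:
        print(f"# {len(unmatched)} handsets could not be matched to an extension")
```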
Luckily, we had already completed all of the work on required functionality and call plans for the new system; this meant we were able to use a lot of that information to set up the old phone system.
The technical challenge was significant, but the most challenging element of the outage was the business challenge. The firm had lost its main communication platform with clients. Normally, during a telephony outage, you simply put call diverts in place so that incoming calls to the main switchboard numbers get routed to, and answered by, a mobile.
However, because all of the numbers had gone into ‘purgatory’, we couldn’t even enact that simple redirection for the first half-day or so of the outage; it was a full blackout and we could do nothing to invoke the DR plan.
We resorted to putting notifications onto the website and social media that we were experiencing telephony problems. We also had to email clients and key contacts to let them know there was a problem and request that they use mobile numbers.
From the issues starting on Monday at about 4pm, it took us until late on Wednesday to have a fully functioning phone system almost back to the state it was in before that delete button was pressed.
Lessons learned
One month later we had finally sorted out the administrative problems with the number registrations and were able to successfully move over to the new phone system as per the plan.
The whole scenario has highlighted the administrative fragility of cloud-based phone systems, so:
- It is essential that phone numbers are registered properly to the right address, so clients can prove they own them and are paying for them via a third party.
- I recommend doing a test PAC of all your numbers very early in any telephony change project, and potentially after every office move, to surface any problems that will cause you issues later.
- If you do have a delay on your project while changing phone system, make sure that, regardless of how confident the third-party supplier is, you get written confirmation from the platform provider that you have entered into a changed or rolling monthly contract, enabling you to move away gracefully.
- Accept that even the best business continuity plan cannot cover every option. For example, our client had a good communication plan and a good mitigation plan in terms of diverts, but the inability to divert those numbers caused a real issue. The words ‘your telephone numbers are in purgatory’ are something you do not want to hear in such a situation.
The main lesson, I would say, is that it is the engineers who want to fix these problems, and the processes and organisations put around the engineers aren’t actually that helpful.
I am convinced the only reason we got the system back for our client within a couple of days was down to the goodwill and professionalism of individual engineers rather than the companies they worked for.
The companies’ formal processes were barriers at times, whereas it was the individual engineers’ back-channels and cross-organisational relationships, despite the company structures, that ultimately enabled the successful resolution and the end of ‘purgatory’.
Written by…
David Baskerville