Last year, before Christmas I had a jolly time in my office, cleaning up old bits of code and other stuff. One of the things I’ve “Cleaned-up” was old expired Azure Management Certificates. Then I went for 2 weeks holiday travelling around England. Until I had a text message from my colleague saying that one of our production system is down and not going back up. Error was something to do with Azure authentication.
Quickly doing the math I’ve realised that I have accidentally removed a current management certificate that was used in our system. The cert was used on the start-up phase of the app to check that all Azure Resources were available for the system to operate. And of-course at the time when I deleted the certs there were no outages and all systems run as normal. Until one of them decided to get restarted (because Azure). And when it went down it never came back up from because certificate that the system had was removed by me.
After fixing the systems (sorry, man!) and a bit of investigation turned out that our system had an expired certificate and it could authenticate with the service with it. So Azure Management Portal did not check if the management certificate was expired.
Now I’m not sure if this is a bug or a feature, but certainly I’m not going to report that because if this is going to be fixed, how many other systems with expired certs will go down without a warning?
From now on a rule of thumb – never to delete management certificates, even expired ones!