Reducing response time to bugs in production

One thing is common to most startup companies: products need to be built fast and pushed out as soon as possible, while still delivering the best customer experience possible at the time. The problem is that bugs can easily find their way to production.

If you are a developer, you would agree with me that sh*t does happen in production. Bugs do find their way to production, and some are bad enough to undermine your effort to deliver a great customer experience.

Some bugs are difficult to discover even in the best test environments, and nothing uncovers them except the real users of the application. Users never use your application exactly the way you intended anyway; there will always be some level of variation in the way people use things.

But a quick response to customers' problems will ultimately increase their satisfaction and trust.

"The organizations that enable their teams to quickly respond to problems in production code create the highest quality software."
– Sifter Software Quality Academy

Bugs come from many places; some come from third-party SDKs or libraries you have imported into your code. This has happened to me on several occasions, and the most recent one came from the AWS PHP SDK.

On that particular day, I got reports of users being unable to upload files. That was really strange. I eventually tracked down the bug and discovered that a hotfix we had pushed to production had triggered a composer update, which upgraded the AWS PHP SDK to v3.31.0. That particular version of the AWS SDK had a bug that threw an exception whenever you tried to upload files to an S3 bucket.

It dawned on me that we were not doing something right. Our response time was too slow. Why did customers have to report this before we knew about it?

What could we have done better?

Errors capable of hurting the user experience like this should be discovered long before a customer reports them.

Yes, exceptions are logged, but logs are not checked often. I have yet to meet a developer who checks logs consistently, especially on weekends, in the hope of finding and tracking down a bug. If you know any, shoot me an email; I must build a startup with them 🙂

Email Notifications to the Rescue

I respond to emails pretty fast: if you send me one, chances are high that it will be read within 2 minutes of arriving. The same goes for everyone on the team.

This means we can easily turn ourselves into a SWAT team of developers who get real-time alerts whenever the unexpected happens. To get there, I changed the way exceptions are handled in our code.

Here is the modification I made to the exception handler (the report() method in Laravel's app/Exceptions/Handler.php):

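  // app/Exceptions/Handler.php (Laravel 5.x).
  // Needs `use Illuminate\Support\Facades\App;` at the top of the file.
  public function report(Exception $exception)
  {
      // Alert only in production, and only for exceptions not listed in $dontReport
      if (App::environment('production') && $this->shouldReport($exception)) {
          // processException(), reportToSlack() and reportToMail() are custom
          // helpers on this handler; their bodies are not shown in this post.
          $error_details = $this->processException($exception);
          $this->reportToSlack($error_details);
          $this->reportToMail($error_details);
      }

      // Always fall through to Laravel's default reporting (logging)
      parent::report($exception);
  }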
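The post does not show what processException() collects. Below is a minimal, hypothetical sketch of such a helper, written as a standalone function so it runs without Laravel. It gathers the details an on-call developer would need from any Throwable, and a second hypothetical helper, formatAlert(), turns them into a one-line alert body usable in an email or a Slack message. Both names and the chosen fields are assumptions, not the author's actual code.

```php
<?php
// Hypothetical stand-in for the post's processException() helper:
// collect the details an on-call developer needs to start debugging.
function processException(Throwable $e): array
{
    return [
        'class'   => get_class($e),
        'message' => $e->getMessage(),
        'file'    => $e->getFile(),
        'line'    => $e->getLine(),
        'trace'   => $e->getTraceAsString(),
        'time'    => gmdate('c'), // UTC timestamp, ISO 8601
    ];
}

// Hypothetical formatter: a short, single-line alert body that reads well
// both as an email subject/body and as a chat message.
function formatAlert(array $details): string
{
    return sprintf(
        "[%s] %s: %s (%s:%d)",
        $details['time'],
        $details['class'],
        $details['message'],
        $details['file'],
        $details['line']
    );
}

// Usage: simulate the kind of S3 upload failure described earlier.
try {
    throw new RuntimeException('Upload to S3 bucket failed');
} catch (RuntimeException $e) {
    echo formatAlert(processException($e)), PHP_EOL;
}
```

Keeping the stack trace in the details array but out of the one-line summary means the alert stays readable on a phone, while the full trace is still available to attach to the email body.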

Not all exceptions are reported. We don't want our mailboxes filled up with 404, validation or authorization exceptions, so those are filtered out before any alert goes out (that is what the shouldReport() check is for).
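In Laravel, the list of exceptions to skip lives in the handler's `$dontReport` property, which `shouldReport()` consults using instanceof-style matching. The sketch below illustrates the same idea outside the framework. The three exception classes are hypothetical stand-ins for Laravel's 404, validation and authorization exceptions, and `DONT_ALERT` and `shouldAlert()` are illustrative names, not part of Laravel.

```php
<?php
// Hypothetical stand-ins for the framework exceptions we never want alerts for.
class NotFoundException extends RuntimeException {}
class ValidationFailedException extends RuntimeException {}
class AuthorizationDeniedException extends RuntimeException {}

// Exception classes that should never trigger an alert. Because matching
// uses instanceof, subclasses of these classes are skipped as well.
const DONT_ALERT = [
    NotFoundException::class,
    ValidationFailedException::class,
    AuthorizationDeniedException::class,
];

// Return true only if the exception is not on the ignore list.
function shouldAlert(Throwable $e): bool
{
    foreach (DONT_ALERT as $class) {
        if ($e instanceof $class) {
            return false;
        }
    }
    return true;
}

// Usage
var_dump(shouldAlert(new NotFoundException('/missing-page')));  // bool(false)
var_dump(shouldAlert(new RuntimeException('S3 upload failed'))); // bool(true)
```

Filtering by class rather than by message keeps the rule stable when error texts change, and anything not explicitly ignored (like the SDK upload failure) still wakes someone up.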

Through this, we have been able to reduce our response time by 90%, because everyone is alerted as soon as there is a problem.

Whether you are on your couch reading what is new in Laravel 5.6 or passing some time on the beach with hot bikini-wearing models, you’ll get a notification.

This may not be the best way to handle errors in a production environment, but it can be helpful if you are running a small team at a small startup.

Is there anything you think we could have done better? Feel free to share your thoughts, or share something new with me. 🙂

Samuel James
