Many times, we build a beautiful product but have a hard time monitoring its performance from both the tech and the business perspective, and that can lead to the product’s failure.
I personally feel that monitoring a product is like monitoring your own health. You have to keep a check on your health to know whether everything is OK. In the same way, a product needs constant monitoring to improve.
At MyGlamm, we continuously monitor the tech and business metrics of every module and feature. In this post, I will mostly cover the tech side of monitoring.
There are quite a few good open source as well as paid monitoring tools available. I am a firm believer in open source, but when selecting a tool I also weigh manpower cost, server cost and time. Generally, I prefer paid managed tools because they save time and manpower, but I definitely opt for an open source tool when it is going to save time and cost.
Currently, we use multiple tools for different purposes. The most important category is Application Performance Monitoring (APM). An APM tool is like a heartbeat checker for your application: it monitors response times, errors and so on. Above all, it helps identify the slowest endpoints and pinpoints the exact cause, be it a piece of code, an external API call or a database query. For this, we use New Relic, one of the best APM tools.
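To make the “slowest endpoints” idea concrete, here is a minimal Python sketch of the kind of aggregation an APM tool performs for you: group request timings by endpoint, take a high percentile (p95 here, which is our assumption) and rank the endpoints by it. The sample data, endpoint names and the `percentile` / `slowest_endpoints` helpers are hypothetical; New Relic collects and aggregates these timings itself, and this is not a description of how it works internally.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def slowest_endpoints(
    samples: List[Tuple[str, float]], pct: int = 95, top: int = 3
) -> List[Tuple[str, float]]:
    """Group (endpoint, response_ms) samples and rank endpoints by their pct-th percentile."""
    by_endpoint: Dict[str, List[float]] = defaultdict(list)
    for endpoint, response_ms in samples:
        by_endpoint[endpoint].append(response_ms)
    ranked = sorted(
        ((endpoint, percentile(times, pct)) for endpoint, times in by_endpoint.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top]


# Hypothetical request timings in milliseconds; an APM agent would record these automatically.
samples = [
    ("GET /products", 120), ("GET /products", 135), ("GET /products", 140), ("GET /products", 980),
    ("POST /register", 300), ("POST /register", 320), ("POST /register", 2500),
    ("GET /cart", 80), ("GET /cart", 90), ("GET /cart", 95),
]

print(slowest_endpoints(samples))
# [('POST /register', 2500), ('GET /products', 980), ('GET /cart', 95)]
```

An average would largely hide the occasional 2.5 s registration request; ranking by a tail percentile is a common way to surface the endpoints that real users experience as slow.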
We also use New Relic to monitor the performance of our Single Page Application (SPA). Our website runs on React, and New Relic helps us trace JS errors, slow-loading pages and so on. If you are running an SPA, this tool is worth considering; it helps a lot.
On the mobile app front, Crashlytics is the BEST, so don’t bother looking for alternatives.
For infrastructure and logs, we use a combination of New Relic and CloudWatch. Most of our infra alerts are set up in CloudWatch: things like high memory, CPU or I/O usage, plus many custom alerts that help us keep a check on the infra. Logs are forwarded to New Relic, which makes it easier to trace issues in the services.
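A CloudWatch alarm on high CPU or memory is defined by a threshold, a number of evaluation periods and how many of the datapoints in that window must breach before the alarm goes off (the `EvaluationPeriods` and `DatapointsToAlarm` settings). The Python sketch below is a simplified, hypothetical re-implementation of that “M out of N” rule, only meant to show why a brief spike need not page anyone while sustained load does. It ignores missing-data handling and the other alarm options, and the metric values are made up.

```python
from typing import Sequence


def should_alarm(
    datapoints: Sequence[float],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> bool:
    """Simplified "M out of N" check: alarm when at least `datapoints_to_alarm`
    of the last `evaluation_periods` datapoints are strictly above `threshold`."""
    recent = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in recent if value > threshold)
    return breaching >= datapoints_to_alarm


# Hypothetical CPU utilization (%) for one instance, one datapoint per period.
sustained_load = [40, 55, 85, 90, 70, 92]
single_spike = [40, 95, 50, 45, 42]

print(should_alarm(sustained_load, threshold=80, evaluation_periods=5, datapoints_to_alarm=3))  # True
print(should_alarm(single_spike, threshold=80, evaluation_periods=5, datapoints_to_alarm=3))    # False
```

Tuning these two numbers is a trade-off: a small window reacts faster, while requiring several breaching datapoints cuts down on noisy alerts.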
Apart from the above, for some events with critical business impact, we use custom alerts via email or Dead Man’s Snitch (DMS). DMS is mainly used for cron jobs: each job checks in when it runs, and DMS alerts us if an expected check-in does not arrive, so we know everything is running smoothly. You can also build custom real-time dashboards to monitor these events.
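The idea behind Dead Man’s Snitch is a “dead man’s switch”: rather than alerting when something bad happens, it alerts when an expected signal stops arriving. With DMS, the cron job pings its own unique check-in URL after it runs, and the service notifies you if no check-in arrives within the expected interval. Below is a minimal, self-contained Python sketch of that check-in logic; it is not how DMS is implemented, and the `Snitch` class, job names and intervals are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Snitch:
    """Hypothetical check-in record for one scheduled job."""
    name: str
    interval: timedelta                # how often the job is expected to run
    grace: timedelta = timedelta(0)    # extra slack before the job counts as missing
    last_check_in: Optional[datetime] = None

    def check_in(self, at: datetime) -> None:
        # In a real setup the job would call this only after it finished successfully.
        self.last_check_in = at

    def is_missing(self, now: datetime) -> bool:
        # A job that never checked in, or checked in too long ago, is overdue.
        if self.last_check_in is None:
            return True
        return now - self.last_check_in > self.interval + self.grace


def missing_jobs(snitches: List[Snitch], now: datetime) -> List[str]:
    """Names of jobs whose check-in is overdue; these are what you would alert on."""
    return [s.name for s in snitches if s.is_missing(now)]


start = datetime(2024, 1, 1, 0, 0)
nightly = Snitch("nightly-report", interval=timedelta(days=1), grace=timedelta(minutes=30))
hourly = Snitch("hourly-sync", interval=timedelta(hours=1))
nightly.check_in(start)
hourly.check_in(start)

now = start + timedelta(hours=3)
print(missing_jobs([nightly, hourly], now))  # ['hourly-sync']
```

Watching for overdue check-ins, rather than for errors, also catches the case where a job never started at all (for example, the cron daemon was down), which no error alert inside the job could ever report.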
At POPxo, we have a similar setup, except that we use Stackdriver instead of CloudWatch because the infra is on GCP. PagerDuty is used for alerts.
We have also started evaluating Datadog. It is an equally good service (for APM, infra and logs) with a few additional features we are looking for. Let’s see how it goes.
A combination of these tools really helps you understand the performance of your app and infrastructure. An application can break at any time, even when the code is well written. There can be many reasons, and these tools help you find them. For example, our member registrations recently started failing. Our monitoring and alerting tools helped us find out that a third-party service we were using had a sudden outage, which was causing the registrations to fail. We quickly deployed a hot-fix to handle that scenario as well.
So, I strongly recommend using any of the above tools, or similar ones, to keep a check on your apps.
On the non-tech front, our alerts are mostly covered by Adobe Analytics, but Google Analytics is equally good. Do make good use of their events and alerts features; they are fantastic.
Photo by Luke Chesser on Unsplash