WSO2 MicroGateway Observability for request failures - Ballerina Metrics

Lashan Sivaganeshan
Nov 29, 2020


Feat: Prometheus + Grafana

WSO2 products provide multiple observability features (logs, metrics, and traces), and the WSO2 MicroGateway is no exception: its observability support makes testing, understanding, and debugging a system easier.

This article focuses on enabling metrics for Gateway-level failures using the metrics monitoring [1] of Ballerina (the language that implements the WSO2 MicroGateway runtime), visualized through Grafana. The document in [2] contains comprehensive instructions on configuring the MicroGateway runtime with Prometheus and Grafana [3]. Note that the observability features enabled in the MicroGateway through Prometheus and Grafana complement its compatibility with API-Manager Analytics [4]. The API-Manager Analytics server provides multiple reports, statistics, and graphs on the APIs running on the MicroGateway, while the metrics provide statistics about the CPU and memory usage of the MicroGateway.

This article contains several links to related references. Feel free to skip to the parts that interest you. A video at the bottom of the article summarizes the details. For a fundamental understanding of the WSO2 MicroGateway, please refer to the article in [6].

As mentioned above, this article focuses on Gateway-level failures of the HTTP requests coming to the MicroGateway, in other words, HTTP 3xx, 4xx, and 5xx responses returned from the Gateway itself. If these responses are instead returned from the backend servers that host the actual services fronted by the WSO2 MicroGateway, the default dashboards available in API-Manager Analytics can easily capture them. Please refer to the following screenshots from the API-Manager Analytics dashboard showing the backend 4xx responses.

HTTP 4xx responses returned from the Backends being captured through APIM-Analytics. Required configurations available in [4][5]

Extra tip: The above case was simulated with the following backend API implemented in the WSO2 Enterprise Integrator. It returns an HTTP-409 response.

Simulating HTTP-409 using an API in the WSO2 Enterprise Integrator

However, authentication failures at the Gateway level (401 Unauthorized responses) are not yet captured in API-Manager Analytics. This is because the events published from the Gateway to Analytics are extracted using the valid tokens passed with the requests. Until API-Manager Analytics is able to capture Gateway authentication failures, the Ballerina metrics can be used through Grafana, as explained further below. In any case, Grafana lets you visualise a large set of information, which enhances the observability of your deployment.

The MicroGateway dashboard that can be imported into Grafana is available in [7]. The configurations to integrate the MicroGateway with the Prometheus + Grafana pair are available in [2].

To capture the Gateway-level 401 Unauthorized responses, Ballerina metrics such as the following can be used in a panel of the Grafana dashboard. More information on Ballerina metrics is available in [8]. HTTP 3xx and 5xx responses can be monitored with the same approach. The panels can be improved with enhanced PromQL queries [9].

ballerina_http_Caller_requests_total_value
ballerina_http_Caller_4xx_requests_total_value
HTTP 401 responses collected through ballerina_http_Caller_requests_total_value metrics
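As a rough illustration of the kind of PromQL query such a panel could use, the sketch below builds a query string for one status code, plus the matching URL for the Prometheus instant-query HTTP API. Several things here are assumptions and not taken from the MicroGateway documentation: the `http_status_code` label name, the 5-minute window, and the Prometheus address `http://localhost:9090`. Check the label names against your own /metrics output before relying on it.

```python
from urllib.parse import urlencode

# Assumption: Prometheus listens on its default port on the local machine.
PROMETHEUS_URL = "http://localhost:9090"

# Metric name as listed in this article.
REQUESTS_METRIC = "ballerina_http_Caller_requests_total_value"


def status_code_query(status_code: int, window: str = "5m") -> str:
    """Build a PromQL expression for the increase in responses with one status code.

    Assumption: the status code is exposed as the `http_status_code` label.
    """
    selector = f'{REQUESTS_METRIC}{{http_status_code="{status_code}"}}'
    return f"sum(increase({selector}[{window}]))"


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Return the URL of the Prometheus instant-query endpoint for an expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


if __name__ == "__main__":
    query = status_code_query(401)
    print(query)
    print(instant_query_url(query))
```

The same expression can be pasted into the query field of a Grafana panel; changing the status code argument gives the equivalent query for other 3xx, 4xx, or 5xx responses.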

The complete list of metrics exposed through the MicroGateway can be checked through the following URL (as per the b7a.observability.metrics.prometheus configurations of the micro-gw.conf file).

https://localhost:9000/metrics
/metrics endpoint of the MGW
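To inspect the /metrics output without Grafana, the text can be parsed directly. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format and that the status code appears as an `http_status_code` label. The sample text and its values are made up for illustration and are not real MicroGateway output.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value".
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) per sample line, skipping comments and blanks."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def requests_by_status(text, metric="ballerina_http_Caller_requests_total_value"):
    """Sum the values of one metric, grouped by the assumed http_status_code label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get("http_status_code", "unknown")] += value
    return dict(totals)


# Hypothetical excerpt of a /metrics response (illustrative values only).
SAMPLE_TEXT = """\
# TYPE ballerina_http_Caller_requests_total_value gauge
ballerina_http_Caller_requests_total_value{action="respond",http_status_code="401",} 2.0
ballerina_http_Caller_requests_total_value{action="respond",http_status_code="404",} 1.0
ballerina_http_Caller_4xx_requests_total_value{action="respond",} 3.0
"""

if __name__ == "__main__":
    print(requests_by_status(SAMPLE_TEXT))
```

In practice, `SAMPLE_TEXT` would be replaced by the body fetched from the /metrics URL configured in micro-gw.conf.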

The following screen recording demonstrates how the above-mentioned Ballerina metrics can be used through a Grafana Dashboard panel.

Apart from the above, the access logging capabilities [10] of Ballerina can also be used as an alternative. Access logs in the “combined” pattern [11] are written for all requests and can be fed into ELK [12] to visualize the traffic. The logs are collected as follows when the “--b7a.http.accesslog.console=true” parameter is passed at the startup of the MicroGateway.

gateway <API_JAR_PATH> --b7a.http.accesslog.console=true --b7a.http.accesslog.path=<Path>/access_logs

127.0.0.1 - - [29/Nov/2020:14:44:27 +0530] "GET /testRespond/ HTTP/1.1" 404 45 "-" "curl/7.47.0"
127.0.0.1 - - [29/Nov/2020:14:45:23 +0530] "GET /testRespond/1?1 HTTP/1.1" 401 153 "-" "curl/7.47.0"
127.0.0.1 - - [29/Nov/2020:14:45:23 +0530] "GET /testRespond/1?2 HTTP/1.1" 401 153 "-" "curl/7.47.0"
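Before setting up ELK, a quick tally of status codes can be taken straight from the access log. The sketch below is one possible approach, not part of the MicroGateway tooling: it assumes every record follows the "combined" pattern on a single line, and it uses the three sample records shown above as input.

```python
import re
from collections import Counter

# Regular expression for one record in the "combined" access log pattern.
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_statuses(lines):
    """Count records per HTTP status code, ignoring lines that do not match."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line.strip())
        if match:
            counts[int(match.group("status"))] += 1
    return counts


# The sample records from this article.
LOG_LINES = [
    '127.0.0.1 - - [29/Nov/2020:14:44:27 +0530] "GET /testRespond/ HTTP/1.1" 404 45 "-" "curl/7.47.0"',
    '127.0.0.1 - - [29/Nov/2020:14:45:23 +0530] "GET /testRespond/1?1 HTTP/1.1" 401 153 "-" "curl/7.47.0"',
    '127.0.0.1 - - [29/Nov/2020:14:45:23 +0530] "GET /testRespond/1?2 HTTP/1.1" 401 153 "-" "curl/7.47.0"',
]

if __name__ == "__main__":
    for status, count in sorted(count_statuses(LOG_LINES).items()):
        print(status, count)
```

To run it against a real log, pass an open file object to `count_statuses` in place of `LOG_LINES`.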

The limitation of the above approaches is that they cannot distinguish between backend responses and Gateway responses based on the status code alone. Therefore, a combination of the above resources, chosen according to the requirement, would cater to the need.

An important note on enabling the access logs: the log files need to be managed, because a record is written for every request coming to the MicroGateway. When enabling this along with other tracing functionalities, capacity planning that takes the scaling capabilities of the deployment into account is essential.

Please feel free to leave comments with any notes that would benefit the above discussion.

Thanks.

[1]. https://ballerina.io/learn/observing-ballerina-code/#monitoring-metrics

[2]. https://mg.docs.wso2.com/en/latest/how-tos/observability/

[3]. https://prometheus.io/docs/visualization/grafana/

[4]. https://mg.docs.wso2.com/en/latest/how-tos/analytics-for-microgateway/

[5]. https://apim.docs.wso2.com/en/latest/learn/analytics/configuring-apim-analytics/

[6]. https://medium.com/@wathsalakoralege/a-simple-understanding-of-wso2-micro-gateway-6003921eb2f2

[7]. https://grafana.com/grafana/dashboards/12061

[8]. https://github.com/ballerina-guides/inter-microservice-communication

[9]. https://prometheus.io/docs/prometheus/latest/querying/examples/

[10]. https://ballerina.io/learn/by-example/http-access-logs.html

[11]. https://tomcat.apache.org/tomcat-7.0-doc/api/org/apache/catalina/valves/AccessLogValve.html

[12]. https://www.elastic.co/what-is/elk-stack
