Maintenance 102

Maintainability — Logs & Their Use

An application that logs properly will make maintenance easier.

Oliver de Cramer · Published in The Startup · Sep 22, 2020

In the first chapter, we saw how to handle exceptions and how to log them to:

  • not leak sensitive data to the user.
  • have user-friendly messages displayed to your users.
  • have all the details for easy troubleshooting.

In this second article, we will see what needs to be logged besides exceptions, how to log it, and why we are logging it.

The examples in this article are in PHP, using the PSR-3 Logger Interface, but the logging concepts we will discuss are common to all languages.
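As a reminder of what that looks like, here is a minimal sketch of a class receiving a PSR-3 logger through its constructor (the ApiClient class name is made up for illustration); this is how $this->logger is assumed to be available in the snippets below:

<?php

use Psr\Log\LoggerInterface;

// Hypothetical class for the examples below: the PSR-3 logger is injected,
// which is how $this->logger becomes available in the following snippets.
class ApiClient
{
    private LoggerInterface $logger;

    public function __construct(LoggerInterface $logger)
    {
        $this->logger = $logger;
    }
}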

What to Log & How to Log

To decide what needs to be logged, we must first decide why we are logging the information.

  • Most often we need to log information to make diagnosis easier. For example, if an API returns an error, we will log the request and response.
  • We also log for debugging purposes. For example, we might log all API requests and responses in order to debug an API endpoint.
  • We can also log for statistics; these can serve various purposes.

Logging for Diagnostics

The most important logs for diagnostics are the exception logs, which we have already discussed previously. But we can also log other “errors”, because not all errors are exceptions.

One of the best examples of this is API calls, but there are other cases as well: a missing file, an empty database query… The cases depend upon your application.

Let’s see an example with an API call:

<?php

$response = $this->httpClient->get($endpoint, $query);

if ($response->getStatusCode() != 200) {
    $this->logger->warning(
        "Failed to fetch test data from test api",
        [
            "endpoint"    => $endpoint,
            "query"       => $query,
            "status_code" => $response->getStatusCode(),
            "body"        => $response->getBody(),
        ]
    );

    throw new MyApiException("Failed to fetch data from test api");
}

In this example, we logged everything we need to make the same API call locally and check what is going on. We also used a unique log message, so that if we find the message in our logs we can find the code that logged it.

So the point here is to:

  • Log what you need to reproduce the error locally and can’t get by any other means.
  • Log what you believe will allow a quick diagnosis (the response body, for example).
  • Log a unique message that has meaning.
  • Do not log every step of the code!

You need to keep in mind not to log sensitive data such as passwords.
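If the context you want to log may contain such data, one option is to strip it before handing it to the logger. A minimal sketch, assuming a hypothetical sanitizeContext() helper and an illustrative list of sensitive keys:

<?php

// Hypothetical helper: redact well-known sensitive keys before logging.
private function sanitizeContext(array $context): array
{
    foreach (["password", "card_number", "token"] as $sensitiveKey) {
        if (array_key_exists($sensitiveKey, $context)) {
            $context[$sensitiveKey] = "***redacted***";
        }
    }

    return $context;
}

// Usage: pass the context through the helper before logging it.
$this->logger->warning(
    "Failed to fetch test data from test api",
    $this->sanitizeContext(["endpoint" => $endpoint, "query" => $query])
);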

Logging for Debug

Logging for debugging is very similar to logging for diagnostics. If you have APIs that are very sensitive and that you can’t easily call locally (payments, for example), or if you do not trust the API, it can be a good idea to log everything.

Let’s add debugging data to our previous example:

<?php

$this->logger->debug(
    "Fetching test data from test api",
    [
        "endpoint" => $endpoint,
        "query"    => $query,
    ]
);

$response = $this->httpClient->get($endpoint, $query);

if ($response->getStatusCode() != 200) {
    $this->logger->warning(
        "Failed to fetch test data from test api",
        [
            "endpoint"    => $endpoint,
            "query"       => $query,
            "status_code" => $response->getStatusCode(),
            "body"        => $response->getBody(),
        ]
    );

    throw new MyApiException("Failed to fetch data from test api");
}

$this->logger->debug(
    "Fetched test data from test api",
    [
        "endpoint" => $endpoint,
        "query"    => $query,
        "body"     => $response->getBody(),
    ]
);

Here we have added two additional logs:

  • The first log is written before making the call to the API. We log all the information we have at that point, so we can execute the same call locally if possible, or compare it with working calls.
  • The second log contains the response. If we are working with payments, this might be the only way to see a complete response. (Some payment methods return additional information in production that they don’t return in the sandbox.)

So the point here is to:

  • Log what we do.
  • Log the result.

As in the previous case, you need to keep in mind not to log sensitive data such as passwords.

We also have another trick up our sleeve here. In our previous example we logged using “warning”; here we used “debug”. With most frameworks (Magento 😥) you can configure the level that should be logged. Constantly logging debug information is wasteful, so you need to be able to enable or disable these logs when needed.

To be able to use this in Symfony, for example, I would recommend setting up multiple channels and having our custom channels use custom handlers. That way we can easily enable or disable the debug logs for a given part of the application, as sketched below.
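As a rough sketch of the idea with plain Monolog (the channel name, file path, and level are illustrative; in a Symfony project the equivalent channels and handlers would be declared in the monolog configuration):

<?php

require __DIR__ . '/vendor/autoload.php';

use Monolog\Handler\StreamHandler;
use Monolog\Logger;

// A dedicated channel for one part of the application, with its own handler.
// Switching this handler between Logger::DEBUG and Logger::WARNING enables
// or disables the debug logs for this channel only.
$paymentApiLogger = new Logger('payment_api');
$paymentApiLogger->pushHandler(
    new StreamHandler('/var/log/app/payment_api.log', Logger::DEBUG)
);

$paymentApiLogger->debug('Fetching test data from test api', ['endpoint' => '/orders']);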

Log for Statistics

Finally, we can also log to produce statistics. This is possible if your logs are parsed and sent to tools such as Elasticsearch/Kibana, Elasticsearch/Grafana, Loki/Grafana, or Datadog. In the third maintenance article, we will discuss these tools in more detail and do a quick comparison. For now, let us just say that it is possible to use logs for statistics.

Our current logs can already be used to make statistics. For example, we can use the “Fetching test data from test api” debug log to count the number of calls made to the API. This can be important information.

The only thing we would need to change is to increase the log level from debug to info.
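With a PSR-3 logger, that change is just a different method call:

<?php

// Same message and context as before, but logged at the "info" level so it
// is still recorded when debug logging is turned off.
$this->logger->info(
    "Fetching test data from test api",
    ["endpoint" => $endpoint, "query" => $query]
);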

There is also another usage for statistics; let’s use our previous example again. We can use the “Fetching test data from test api” logs and the “Failed to fetch test data from test api” logs to get an idea of the failure rate of the API. If you are using Grafana, you can also use this for alerting purposes.

I recently had such a case in production; without touching the code, we were able to add an alert for a specific case that was not documented in the API we were using. This allowed our client to communicate with their customers about the issue. Thanks to Grafana, they were able to see all the affected customers and didn’t need to wait for negative customer feedback. Meanwhile, we discussed with the API maintainers the best approach to work around the issue.

This means that once you identify one isolated issue, you can quickly find out whether there are other similar issues and communicate on the subject, making the issue less critical.

Using logs for statistics is cheap in development time. In my opinion, every project should have some log parsing anyway, so using that parsing to make some statistics is very useful.

There are of course limits; you are probably logging way more information than you need for the statistics you are extracting. This can be a good thing as it can be used to add an alert you hadn’t thought about. But this also means you can’t just keep all the logs indefinitely. It would cost too much to store them all.

Why are logs so important?

All applications have bugs: bugs they inherit from the software they are based on (cf. Magento), bugs coming from the infrastructure, bugs that you develop, bugs from the applications you connect to… There are bugs somewhere.

So at some point, you are going to have a problem to resolve.

Properly logging will hopefully allow you to get a better vision of the problem as fast as possible.

If you do have proper logs for your issue, this will allow you to take the following steps:

Be proactive

Proper logs will allow you to communicate about the issue before customers notice it, or before they get too frustrated by it.

For example, suppose some orders are not shipping because API calls to the carrier endpoints are failing.

If we can list the affected orders, we can communicate and allow our customers to plan better.

Understand the impact

You have identified the issue and started to communicate on the subject. Thanks to the logs, you know which orders are affected and you can understand the impact.

For example, your website has 1,000 orders per day and you discover that one order per day is having an issue. If it’s Friday at 6 pm, you probably wouldn’t want to deploy a fix right away: it would put the other 999 orders at risk, and if something went wrong with your fix, no one would notice it during the weekend.

Understand the issue

With the proper logs, you get closer to understanding the origin of the issue. Is our API call to the carrier failing:

  • because of a mistake in the code?
  • because of the proxy or another issue with the infrastructure?
  • because of the endpoint being unstable?

The logs will most often not give you a straight answer, but they will make debugging easier than having none at all.

How to fix the issue

We must now find a solution. The solution can be implemented in multiple steps.

In our example case, we can at first have a person manually ship the orders. This might need to be done if the issue takes a long time to fix. We know there is only a single affected order per day, and therefore doing this manually would be simple.

Step 2 would, of course, be to fix the issue.

Conclusion

Will you be able to put logs everywhere they are needed to debug future issues? No, you won’t. That is why you must not overthink it either.

I have rarely come across an application that logs too much. I have often complained about not having enough logs, particularly in payment modules.

Does that mean that you should log everything? No. This is a case I came across on one project I worked on: opening a single page created over 500 lines of logs. There were logs in conditions to trace what was happening, logs in loops… Removing all those logs divided page load times by three.

Debug logs can be used for particularly complex and critical bits of code. But those should be disabled most of the time. Writing a log has a cost. You are accessing the file system after all.

You must think about log rotation and deleting your logs. Depending on the situation, you might need to keep some logs longer than others. Logs are not meant to be kept forever, and you need a strategy to get rid of the old ones.
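With Monolog, for example, a rotating file handler is one simple way to implement such a strategy (the file path, retention period, and level are illustrative):

<?php

require __DIR__ . '/vendor/autoload.php';

use Monolog\Handler\RotatingFileHandler;
use Monolog\Logger;

// Write to a new file every day and keep only the last 14 files;
// older log files are deleted automatically.
$logger = new Logger('api');
$logger->pushHandler(
    new RotatingFileHandler('/var/log/app/api.log', 14, Logger::INFO)
);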

In this article, we have seen what to log and how to log it.

The issue is that using grep to search log files has its limits, particularly if we wish to start searching in the context of the logs and look for correlations.

Grep’s main limit, however, is for sites that are hosted on multiple servers with load balancing. Logs will be distributed across multiple servers, which makes finding and understanding them harder.

That is why in the next chapter of maintainability we will discuss the various tools that are available to us to parse & aggregate the logs to make them more accessible.

Thank you for reading.
