Non-fatal errors vs fatal crashes: The differences explained

Non-fatal errors happen in every application that’s developed and have a close relationship with fatal errors. Most of us know that both types of errors have an essential difference: fatal errors are not recoverable, while non-fatals are.

In my 10+ years of development experience, I’ve seen many engineers ignore non-fatal errors since they wouldn’t crash the application either way. In my opinion, this is very wrong, and there are multiple reasons for this. Time to dive in and show you the importance of taking non-fatal errors more seriously.

What is a non-fatal error?

A non-fatal error is a failure in your application that didn’t result in a crash for the user. In other words: the failure is recoverable, and the application can continue.

What is a fatal error?

A fatal error results in a crash for the end-user, after which the application cannot continue. Crashes are the type of failure that you want to prevent in all cases since they always indicate that something is wrong. An application that often crashes will likely result in a high churn rate, which will negatively affect your application’s performance.

The “I never use force unwrap” mindset

Force unwrapping an optional in Swift is often seen as a code smell since it increases the chances of introducing a fatal error. As explained in my article Optionals in Swift explained: 5 things you should know, force unwrapping an optional that is actually nil will crash your application, and you can’t recover from that.

The result is that many engineers always decide to use guard statements to unwrap an optional and either return nil or throw an error if the value doesn’t exist. The following piece of code has a familiar presence in many projects:

func handle(_ myOptional: Int?) {
    guard let unwrappedOptional = myOptional else {
        /// The code just stops here.
        return
    }

    print(unwrappedOptional)
}

In case the optional has no value, the code will exit without any failure handling. It could mean that the user flow suddenly breaks without showing any feedback to the end-user.

Let’s take this example a little further with a real-case example:

struct Article {
    let title: String?
}


func postArticle(_ article: Article) {
    guard let title = article.title else {
        return
    }

    // .. Continue posting the article
}

Posting the article only takes place when a title is present because what is an article without a title? Therefore, we use a guard statement to unwrap the title and return from the method if the title is not present.

This code might seem fine, but it would mean that nothing happens if there’s no title. This is a programmer mistake since we should enforce filling in a title in the UI, but we might not realize this is even a problem. In my experience, many engineers unwrap an optional and state:

I’m pretty sure this will always have a value either way, no need to handle the return

Surprise: it happens more often than you think. An improvement would be to make the method throw an error as follows:

enum ArticlePostingError: Swift.Error {
    case missingTitle
}

func postArticle(_ article: Article) throws {
    guard let title = article.title else {
        throw ArticlePostingError.missingTitle
    }

    // .. Continue posting the article
}

This code example is much better since it forces the caller to handle the error. Of course, we could still use try? and ignore the error, but it’s at least transparent that this method can fail. The flow of posting an article breaks with an error whenever the title is missing. We can decide to show feedback to the user as a result of the thrown error.
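A minimal sketch of what handling the thrown error at the call site could look like. The `feedbackMessage(forPosting:)` helper and the message texts are hypothetical and only serve as illustration; in a real app you would likely present an alert instead of returning a string.

```swift
struct Article {
    let title: String?
}

enum ArticlePostingError: Swift.Error {
    case missingTitle
}

func postArticle(_ article: Article) throws {
    guard article.title != nil else {
        throw ArticlePostingError.missingTitle
    }

    // .. Continue posting the article
}

/// Hypothetical helper: returns the feedback to show to the user,
/// or nil when posting succeeded.
func feedbackMessage(forPosting article: Article) -> String? {
    do {
        try postArticle(article)
        return nil
    } catch ArticlePostingError.missingTitle {
        // A known failure: we can tell the user exactly what to fix.
        return "Please add a title before publishing."
    } catch {
        // Any other failure still results in visible feedback.
        return "Something went wrong while publishing."
    }
}
```

Compared to `try? postArticle(article)`, the do-catch variant makes it hard to forget about the failure path.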

Are non-fatal errors always visible to the end user?

A common question is whether non-fatal errors are visible to the end-user. That depends entirely on how the application is implemented. The developer can decide to handle non-fatal errors and show a descriptive error message telling the user that something went wrong. In the above example, we could alert the user that the title is required before the article can be published.
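One way to keep such a descriptive message close to the error itself is Foundation’s `LocalizedError` protocol. The sketch below assumes we change the conformance of `ArticlePostingError` from `Swift.Error` to `LocalizedError`; the message text is made up for this example.

```swift
import Foundation

enum ArticlePostingError: LocalizedError {
    case missingTitle

    /// The message that ends up in `localizedDescription`.
    var errorDescription: String? {
        switch self {
        case .missingTitle:
            return "A title is required before the article can be published."
        }
    }
}

// Code that only knows about a generic `Error` can still read the message.
let error: Error = ArticlePostingError.missingTitle
print(error.localizedDescription)
```

The same description can then be reused for both the alert shown to the user and the logged non-fatal error.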

However, in my experience, many non-fatal errors are ignored, and the user ends up confused. Combine that with the fact that many engineers don’t take the time to look at non-fatal errors, and you get an application that seems to perform great while it doesn’t. Therefore, I’m a big advocate for introducing an application metric: a so-called non-fatal error-free user sessions percentage.
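To make the metric concrete, here is a small sketch of how such a percentage could be computed next to the crash-free percentage. The `SessionSummary` type and the sample data are hypothetical; monitoring tools normally aggregate this for you.

```swift
/// Hypothetical per-session summary.
struct SessionSummary {
    let crashed: Bool
    let nonFatalErrorCount: Int
}

/// Returns the percentage of sessions for which `isHealthy` returns true.
/// An empty list is reported as 100% since no session failed.
func percentage(of sessions: [SessionSummary], where isHealthy: (SessionSummary) -> Bool) -> Double {
    guard !sessions.isEmpty else { return 100 }
    let healthyCount = sessions.filter(isHealthy).count
    return Double(healthyCount) / Double(sessions.count) * 100
}

let sessions = [
    SessionSummary(crashed: false, nonFatalErrorCount: 0),
    SessionSummary(crashed: false, nonFatalErrorCount: 3),
    SessionSummary(crashed: false, nonFatalErrorCount: 0),
    SessionSummary(crashed: true, nonFatalErrorCount: 1)
]

let crashFree = percentage(of: sessions) { !$0.crashed }                    // 75.0
let nonFatalFree = percentage(of: sessions) { $0.nonFatalErrorCount == 0 }  // 50.0
```

In this made-up data set, the crash-free percentage looks considerably better than the non-fatal error-free percentage, which is exactly the gap the metric is meant to reveal.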

A non-fatal error-free percentage as a companion to crash-free user sessions

Most of the tracking tools I’ve worked with, like Datadog, Firebase, and AppCenter, provide you with an option to track both crashes and non-fatal errors. These tools allow you to gather insights for non-fatal errors and help you decide whether it’s time to dive into a flow that’s not performing well.

The above example could result in logged non-fatal errors stating that publishing articles failed without a title set. Based on those insights, you can decide to look into the flow and write a fix accordingly.

An easy way to track non-fatal errors

Once you’ve set up a crash monitoring service, your crashes will be handled automatically through one of the tools I’ve mentioned earlier. Non-fatal errors, however, need to be tracked manually. At WeTransfer and in RocketSim, we’ve created a convenience method that allows us to track non-fatals easily:

/// A callback to execute each time an error is logged. Can be used to log errors to analytics for example.
public static var errorTracker: ((_ error: Swift.Error, _ description: String, _ file: String, _ function: String, _ line: UInt) -> Void)?

Our logging method, which uses OSLog and unified logging as recommended by Apple, calls the error tracker:

func error(_ error: Error, description: String, trackAsNonFatal: Bool = true, file: String = #file, function: String = #function, line: UInt = #line) {
    os_log("%{public}@ [%{public}@]", log: logger, type: .error, "\(error)" as NSString, description)
    
    guard !isRunningTests else { return }

    if trackAsNonFatal {
        Log.errorTracker?(error, description, file, function, line)
    }

    DiagnosticsLogger.log(error: error, description: description, file: file, function: function, line: line)
}

We don’t log any errors when running our test suite, as it would pollute our data with test failures. We can also opt out of tracking an error as a non-fatal since we don’t always want to monitor specific errors. Cancellation errors are a typical example since they’re expected failures.
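The opt-out decision for cancellations could be centralized in a small helper. This is only a sketch: `shouldTrackAsNonFatal(_:)` is a hypothetical function, and it only checks for Swift Concurrency’s `CancellationError`. Other cancellation errors in your app would need their own check.

```swift
/// Hypothetical helper: decides whether an error is worth tracking as a non-fatal.
func shouldTrackAsNonFatal(_ error: Error) -> Bool {
    // A cancellation is an expected outcome, not a failing user flow.
    return !(error is CancellationError)
}

/// Example error type, only used to demonstrate the helper.
struct ConversionError: Error {}

print(shouldTrackAsNonFatal(CancellationError())) // false
print(shouldTrackAsNonFatal(ConversionError()))   // true
```

The result can be passed along as the `trackAsNonFatal` argument of the logging method.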

Logging an error at the implementation level looks as follows:

Log.recording.error(error, description: "Video conversion failed")

And we could opt out of tracking this error as a non-fatal as follows:

Log.recording.error(error, description: "Video conversion failed", trackAsNonFatal: false)
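If you want to try the pattern in isolation, the following stripped-down sketch shows the callback and the opt-out working together. `MiniLog` and `DecodingFailed` are hypothetical names, `print` stands in for OSLog, and the code assumes Swift 5 language mode, where a mutable static property needs no isolation annotations.

```swift
enum MiniLog {
    /// Same shape as the error tracker callback shown earlier.
    static var errorTracker: ((_ error: Error, _ description: String, _ file: String, _ function: String, _ line: UInt) -> Void)?

    static func error(_ error: Error, description: String, trackAsNonFatal: Bool = true, file: String = #file, function: String = #function, line: UInt = #line) {
        // Stand-in for the OSLog call.
        print("\(error) [\(description)]")

        if trackAsNonFatal {
            errorTracker?(error, description, file, function, line)
        }
    }
}

/// Example error type, only used to demonstrate the logger.
struct DecodingFailed: Error {}

// Collect tracked descriptions in memory instead of sending them to a service.
var tracked: [String] = []
MiniLog.errorTracker = { _, description, _, _, _ in
    tracked.append(description)
}

MiniLog.error(DecodingFailed(), description: "Decoding failed")
MiniLog.error(DecodingFailed(), description: "Upload cancelled", trackAsNonFatal: false)
// `tracked` now only contains "Decoding failed".
```

Both errors are printed, but only the first one reaches the tracker.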

We assign the error tracker in our tracking configurator so that every logged error is forwarded to our monitoring tool. For example, when working with Datadog, you can use the following piece of code:

Log.errorTracker = { error, description, file, function, line in
    Global.rum.addError(error: error, source: .source, attributes: [
        "description": description,
        "file": "\(URL(fileURLWithPath: file).lastPathComponent)#L\(line)",
        "function": function
    ])
}

The result is an overview of non-fatal errors, for instance, in Datadog:

A non-fatal error example shows that decoding failed with an error code 11839.

Taking non-fatal errors as seriously as crashes

At WeTransfer, we decided to take non-fatal errors just as seriously as crashes. We’re closely monitoring non-fatal errors, and we try to identify failing user flows early on. We filter out non-fatal errors that don’t require action by using the opt-out functionality.

We still believe crashes are more critical to fix than non-fatal errors: with a non-fatal error, there’s a chance the app survives without disappointing the user, while a crash always breaks the app. However, a frequently occurring non-fatal error can be just as important to fix when it breaks an essential flow of your app.

Conclusion

Even though non-fatal errors don’t break your app right away, they might still break an important user flow in your app. Like with crashes, you should closely monitor whether your app is performing well by adding tracking for non-fatal errors. A 99.5% crash-free percentage doesn’t mean your app is performing great. It’s great to have a low number of crashes, but it’s worth nothing if there are many non-fatal errors still breaking important app flows, making your end-user unhappy.

If you’d like to prepare and optimize even more, check out the optimization category page. Feel free to contact me or tweet me on Twitter if you have any additional tips or feedback.

Thanks!