Strategies for Error Handling
Error handling is one of the important topics in programming courses. Usually, they cover what an error is and explain constructs like try/catch/finally, and more advanced courses also cover resource cleanup. But once you have learned the techniques your programming language gives you, it is your responsibility as a software engineer to decide what to do when an error occurs. I have seen many cases where people just "throw" the exception, which is the easiest thing you can do. But is it always the best thing?
There are four ways to handle an error that happens during program execution. In each scenario, you need to weigh all the factors and make an informed decision about which one fits best.
- Fail fast: This is the most common method. When such an error occurs, you just terminate the process. It is meant for fatal errors that are not recoverable, where there is absolutely no way to continue processing. For example, if the configuration file you need is missing, you should fail fast rather than continue execution.
- Ignore: This is a very delicate scenario, and you need to make sure you know what you are doing. But in some cases the error is simply expected or trivial (unlike many programming languages, I would not call these "exceptions", but that is another story). For example, you may have a non-critical, experimental section in the code where you expect that something might fail, and you don't want to terminate the process when it does.
- Retry: This usually applies to external resources. For example, if you call another API and get a timeout error, one way to handle it is to try again (maybe immediately, or after waiting a few milliseconds).
- Fix: Sometimes you have enough information to fix the error or recover from it. For example, if you have two replicated database servers and one of them is not responding, the easy fix is to try the other one (Ops will probably get alerted about the unresponsive DB and will have to rectify the problem, but that is independent of your application).
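To make the four options concrete, here is a minimal Python sketch with one small function per strategy. All of the function names, the exception types each one catches, and the default retry count and delay are my own assumptions for illustration; a real application would pick them based on its own context.

```python
import sys
import time


def load_config(path):
    """Fail fast: a missing configuration file is fatal, so terminate."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"fatal: configuration file {path!r} is missing")


def run_experiment(experiment):
    """Ignore: a failure in a non-critical, experimental step is swallowed."""
    try:
        return experiment()
    except Exception:
        return None  # deliberately ignored; the main flow continues


def call_with_retry(call, attempts=3, delay_seconds=0.05):
    """Retry: try a flaky external call again after a short wait."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts; let the caller decide what to do
            time.sleep(delay_seconds)


def query_with_fallback(primary, replica):
    """Fix: if the primary database does not respond, use the replica."""
    try:
        return primary()
    except ConnectionError:
        return replica()
```

Each function catches only the narrow error it knows how to deal with, except for the "ignore" case, which is intentionally broad and for that reason should stay limited to code you are sure is non-critical.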
One thing all of the above cases have in common is logging, which accompanies almost any error handling scenario. Errors are important, and you should log them so that by inspecting the operational environment you can understand how the application is running and which errors happen regularly. This information can help you improve your code and/or the operational environment.
Note that you sometimes need to mix multiple approaches. For example, you may retry when an error occurs; if it fails again, try to fix it; and if it still doesn't work, terminate the process.
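A mixed approach like this could look roughly like the following Python sketch, which also logs every failure along the way. The `fetch` function, the logger name, and the `primary` and `replica` callables are hypothetical, and I am assuming that a `ConnectionError` is the signal for "not responding".

```python
import logging
import sys
import time

logger = logging.getLogger("error_handling_demo")  # hypothetical logger name


def fetch(primary, replica, attempts=2, delay_seconds=0.05):
    """Retry the primary, then fall back to the replica, then terminate."""
    # Step 1 (retry): give the primary a limited number of attempts.
    for attempt in range(1, attempts + 1):
        try:
            return primary()
        except ConnectionError as exc:
            logger.warning(
                "primary failed (attempt %d of %d): %s", attempt, attempts, exc
            )
            if attempt < attempts:
                time.sleep(delay_seconds)

    # Step 2 (fix): the primary is considered down, so try the replica.
    try:
        return replica()
    except ConnectionError as exc:
        # Step 3 (fail fast): nothing left to try, so terminate the process.
        logger.critical("replica failed as well: %s", exc)
        sys.exit(1)
```

The order of the steps is itself a decision: here the cheap option (retrying) comes first and the drastic one (terminating) comes last, but a different context may call for a different order.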
The point is that in each case you have to consciously consider the context and determine the best way to handle the error.