Jump to content

Computer Programming/Coding Style/Minimize nesting

From Wikibooks, open books for an open world

Deeply nested code is a common feature of structured programming. While it has some advantages, discussed in that section, it is frequently considered hard to read and an anti-pattern: “Flat is better than nested”.[1]

Specifically, nested control flow – conditional blocks (if) or loops (for, while) – is hard to understand beyond three levels of nesting,[2][3] and has high cyclomatic complexity. This is known as “Dangerously Deep Nesting”[3] or, in the case of nested if statements, the “Arrow Anti Pattern”, due to the following shape:

 if
   if
     if
       if
         do something
       endif
     endif
   endif
 endif

This has a number of problems:

  • The code is hard to read.
  • Context is hard to understand, due to multiple levels of indentation.
  • Cleanup happens vertically far from the original cause: if a resource is acquired (say, memory allocated, file opened) at the top, in one indentation level, the cleanup occurs at the same indentation level, but at the bottom, vertically far.

Other than refactoring or avoiding this code, one technique to handle deeply nested code is code folding in editors – this allows you to collapse a block, yielding abstraction and allowing you to see the surrounding code easily without the intervening code (so resource acquisition and cleanup are both visible).

Solutions

[edit | edit source]

Solutions include the following.[4]

Refactor blocks into separate functions.

[edit | edit source]

This is particularly common for bodies of loops.

Combine tests

[edit | edit source]

If several if clauses are just tests (without any intervening code), these can be combined into a single test. Compare:

if a:
    if b:
        ...

to:

if a and b:
    ...

Inline function calls with boolean short-circuiting

[edit | edit source]

If the only body of the if clause is a function call and assignment to perform a test, followed by another if clause, in language such as C where assignments are expressions (have a value) and boolean expressions are short-circuited, these can be combined:

if (a) {
    int b = f();
    if (b) {
        ...
    }
}
if (a && int b = f()) {
    ...
}

Auxiliary variables or functions

[edit | edit source]

Auxiliary variables are useful when a complex expression is inlined in code, notably a boolean expression or anonymous functions.[5] Using an auxiliary expression both reduces the nesting, since it is no longer included within another expression, and the variable name documents the meaning of the expression. For complex boolean expressions, another alternative is a separate function which is called, rather than an auxiliary variable.

Early return

[edit | edit source]

The most significant solution is early return, which has several forms, notably a guard clause.[4] Avoiding nested control flow is a fundamental reason for non-local control, notably: return (value), raise (exception), continue, and break. A common pattern is to replace an if-then or nested if ifs by if not/return-continue (return/raise if a function, continue/break if a loop body).

Compare:

if a:
    ...
    if b:
      ...
      ...

to:

if not a:
    return
...
if not b:
    return
...
...

Similarly, compare:

for i in l:
    if a:
        ...
        if b:
            ...
            ...

to:

for i in l:
    if not a:
        continue
    ...
    if not b:
        continue
    ...
    ...

This reduces the nesting and makes the flow more linear – either go further down the block, or return/continue.

This pattern is called a “guard clause” when the checks appear at the start of the code and check preconditions. However, it is used more generally to finish processing and return immediately once work is complete or a value has been computed:[3]

“Use a return when it enhances readability: In certain routines, once you know the answer, you want to return it to the calling routine immediately.”

However, early returns are potentially confusing and error-prone, notably due to the issues of cleanup, and go against a central tenant of structured programming, namely a single exit point per routine.[3]

“Minimize the number of returns in each routine: It’s harder to understand a routine when, reading it at the bottom, you’re unaware of the possibility that it returned somewhere above. For that reason, use returns judiciously–only when they improve readability.”

In the absence of cleanup – when a function is just computing a value or producing side effects – early returns have fewer potential problems. In the presence of cleanup, some languages have facilities that facilitate cleanup even with returns (such as “finally” clauses, “atexit” in Unix or Python, or “defer” in Go). Another option is to keep a single exit point at the end, following a cleanup clause, and instead of early returns, jump (goto) the cleanup clause.

In cases of complex nesting – nested if/then/else statements or multiple if statements at a given level – often the logic is simply multiple exclusive conditions, which can be handled by testing for each condition in turn, executing code if relevant and then returning or using an elif, allowing a flat structure and making the complete condition clear.

Compare:

if a:
    if b:
        f()
    else:
        g()
else:
    if b:
        h()
    else:
        i()

to:

if a and b:
    f()
    return
if a and not b:
    g()
    return
if not a and b:
    h()
    return
if not a and not b:
    i()
    return

or:

if a and b:
    f()
elif a and not b:
    g()
elif not a and b:
    h()
elif not a and not b:
    i()

Alternative control structures

[edit | edit source]

Early return has a number of stylistic variants with other control structures, notably in eliminating else statements.

Omit else following return[6]

Compare:

if a:
    return ...
else:
    return ...

to:

if a:
    return ...
return ...

Compare:

if a:
    return ...
elif b:
    return ...
else:
    return ...

to:

if a:
    return ...
if b:
    return ...
return ...
Use switch

In languages with a switch statement, this can replace multi-way conditionals.

switch (x) {
case a:    
    return ...
case b:
    return ...
default:
    return ...
}
Use elseif

Some languages have elseif statements (elsif, elif) to reduce nesting in if clauses within else clauses, functioning similarly to a switch: Compare:

if a:
    return ...
else:
    if b:
        return ...
    else:
        return ...

to:

if a:
    return ...
elif b:
    return ...
else:
    return ...

or:

if a:
    return ...
elif b:
    return ...
return ...

Nested loops

[edit | edit source]

Nested loops are natural for multidimensional data, but for sequential processing of single-dimensional data, nested loops are often unnatural and can be replaced by flatter structures.

Sequential loops

[edit | edit source]

A subtler issue occurs when processing a sequence of data by first doing one thing on some of the data, and then switching to a different state and process the rest of the data. One can do this by nested loops, but more natural is to break the loop and then continue in a separate loop.

In many languages, such as C, this is done by having an auxiliary index variable that is shared between the two loops.

Compare:

for (int i = 0; i < n; i++) {
    foo(a[i]);
    if (...) {
        for (int j = i; j < n; j++) {
            bar(a[j])
        }
        break;
    }
}

with:

int i = 0;
for (; i < n; i++) {
    foo(a[i]);
    if (...)
        break;
}
for (; i < n; i++) {
    bar(a[i])
}

This shows the sequential flow more clearly, and avoids the nesting.

In languages such as Python that implement iterators, this can be done without an auxiliary variable, since the index state is contained in the iterator:

l = iter(a)
for x in l:
    foo(x)
    if ...:
        break
for x in l:
    bar(x)

Switching between loops

[edit | edit source]

A more complex example occurs when you want to switch back and forth between two ways of processing data, such as iterating through a string overall vs. within words. In general the most elegant solution is via mutually recursive coroutines (with tail calls), operating on a shared iterator (or index variable), though in languages without coroutines this is instead done via a state machine, or sometimes mutually recursive subroutines.

In simpler cases where there is a main loop and a secondary loop (such as looping through a string, secondarily operating on its words), there is a natural nested structure. In this case simply factoring the secondary loop into a separate function is sufficient to remove the nesting. Compare:

for (int i = 0; i < n; i++) {
    foo(a[i]);
    while (...) {
        ...
    }
}

to:

for (int i = 0; i < n; i++) {
    foo(a[i]);
    other(a, &i, n);
}

void other(char *a, int *i, int n) {
    ...
}

Module complexity

[edit | edit source]

While deeply nested code within a single function is undesirable, having separate modules, functions, and nested functions is an important form of modularity, particularly due to restricting scope. As a rule, it’s better to have all functions in a module be related to each other (for cohesion), which favors a high level of factoring and separate modules. That said, this can increase module complexity, particularly in extreme cases such as Java, with its restriction of one public top-level class per file.

Nested functions

[edit | edit source]

Some languages, like Haskell, Kotlin, Pascal, Python, or Scala, allow for declaring helper functions as **nested functions**. The helper function is declared inside the body of another outer value or function. The scope of the helper function is then limited to the body of the outer function. Some upsides of nested functions are the following ones:

  • By being declared inside the outer function, the body of the inner function can refer to any method/function/variable declaration in its enclosing scope. This reduces the need to declare parameters in the function, and pass those parameters back and forth.
  • The inner function is thus hidden inside the implementation of the outer function, without a need to expose it outside.

Compare:

def foo():
    def foo_helper():
        ...

    ...
    foo_helper()
    ...

to (note the explicit parameter passing):

def foo():
    ...
    _foo_helper(x)
    ...
 
def _foo_helper(x):
    ...

However, a programmer may also prefer to declare a function that could be nested outside of the inner function and in the main scope, even if that means adding extra parameters in declarations and calls. Some reasons why he/she may choose to do so are the following ones:

  • Nested functions are most useful for having access to the enclosing state, not because their own scope is restricted. If the nested function does not depend or use most of the variables or functions in the scope of the outer function, it may be preferable to declare it as a top-level private function.
  • The helper function may be too long, so it may be clearer to have a separate, private, top-level function.
  • The inner function may actually carry a particularly tricky algorithmic process, which the developer may prefer to unit-test in isolation.
  • If the outer function has access to mutable data or state, he/she may want to make it clear where and when those may be modified. Moving the helper function out of the scope of those variables makes it obvious that it does not modify those variables, and also reduces the length of the code that can.

References

[edit | edit source]
  1. Zen of Python
  2. Noam Chomsky and Gerald Weinberg (1986)
  3. a b c d Code Complete, Steve McConnell
  4. a b Flattening Arrow Code”, Coding Horror: programming and human factors, by Jeff Atwood, January 10, 2006
  5. Reducing Code Nesting”, Eric Florenzano’s Blog, Jan 02, 2012
  6. Refactoring, Martin Fowler