AsyncLocalStorage and Structured Logging in Nest.js

Sunday, 11 July 2021

Hero

Logging is invaluable in backend services. Structured logging makes logs more useful by enabling you to categorize, analyze and search logs effectively. Sadly though, setting up structured logging correctly is difficult, especially in dependency injection-oriented frameworks such as Nest.js.

This blog post is code-snippet heavy. It's the most succinct way to convey the problem I'm trying to solve, so prepare to read a lot of code.

Consider the following AuthService that validates a user's credentials and logs information using a hypothetical StructuredLogger:

@Injectable()
class AuthService {
  constructor(
    private readonly logger: StructuredLogger,
    private readonly auth0: Auth0Service,
  ) {}
 
  validate = async (creds: Credentials): Promise<string | Error> => {
    this.logger.info({ ...creds }, "Validating credentials");
 
    const result = await checkCredentials(creds);
    if (!result.success) {
      this.logger.error({ error: result.error }, "Invalid credentials");
      return result.error;
    }
 
    this.logger.info("Credentials valid");
    return result.accessToken;
  };
}

The validate method produces the following log statment whenever it runs:

[2021-07-11 06:56:49.574 +0000] INFO: Validating credentials
    username: "johndoe"
    scopes: [
      "offline profile email"
    ]
    redirectUrl: "https://example.com/auth/redirect"

It is useful because it contains the arguments passed to the validate method. A clever log parser can then process this line to separate the log message ("Validating credentials") and the log metadata (username, scopes, redirectUrl) to produce clean logs.

Context of Execution#

While the above log statement is alright, it can be better. A key piece of information missing from it is its context of execution, such as the name of the service and the method that produced it, as well as the ID of incoming request that triggered the service.

Service Name#

It's easy to add the name of the service to the logger. Most logging packages support extending a parent logger with some default metadata that is appended to every log line. Therefore we can inject a common base logger, and extend it inside our service's constructor with the service name.

@Injectable()
class AuthService {
  private readonly logger: StructuredLogger;
 
  constructor(
    baseLogger: StructuredLogger,
    private readonly auth0: Auth0Service,
  ) {
    this.logger = baseLogger.child({ serviceName: AuthService.name });
  }
 
  validate = async (creds: Credentials) => {
    this.logger.info({ ...creds }, "Validating credentials");
    // ...
  };
}

Now the same log statement includes the service name as well:

[2021-07-11 06:56:49.574 +0000] INFO: Validating credentials
    serviceName: "AuthService"
    username: "johndoe"
    scopes: [
      "offline profile email"
    ]
    redirectUrl: "https://example.com/auth/redirect"

Method Name#

You can extend the service's logger inside every method to append the methodName property to the logger's metadata:

validate = async (creds: Credentials) => {
  const methodLogger = this.logger.child({ methodName: "validate" });
  this.logger.info({ ...creds }, "Validating credentials");
  // ...
};

Now the same log line contains the methodName attribute as well:

[2021-07-11 06:56:49.574 +0000] INFO: Validating credentials
    serviceName: "AuthService"
    methodName: "validate"
    username: "johndoe"
    scopes: [
      "offline profile email"
    ]
    redirectUrl: "https://example.com/auth/redirect"

Request ID#

The ID of a request is typically contained inside the x-request-id header in an incoming request. Therefore, the simplest way of adding it to the logs is to pass the incoming request to the validate method, and let it extend its parent logger with a requestId.

validate = async (creds: Credentials, req: Request) => {
  const requestId = req.headers["x-request-id"];
  const correlationId = req.headers["x-correlation-id"];
  const methodLogger = this.logger.child({
    methodName: "validate",
    reqId: requestId,
    cid: correlationId,
  });
  // ...
};

Now the logs include the request ID too:

[2021-07-11 06:56:49.574 +0000] INFO: Validating credentials
    serviceName: "AuthService"
    methodName: "validate"
    reqId: "4d426f0a-cf13-47de-a824-f98d98ff7e83"
    cid: "fa38e3d4-6fc0-4bd6-b5cd-cb9f43f994f8"
    username: "johndoe"
    scopes: [
      "offline profile email"
    ]
    redirectUrl: "https://example.com/auth/redirect"

It works, but not really.

It pollutes the signature of the validate method, which should only be concerned with the credentials argument.
It makes AuthService harder to test, as now every test must stub or mock the req parameter when testing the validate method.
It's terribly inconvenient. Extending a logger at every step of execution is a fragile process, and someone on your team can easily forget to do so.

Injecting a Customized Logger#

An alternative solution to this problem is to inject a customized logger using Nest.js that is scoped to a request:

export const LoggerProvider: Provider<StructuredLogger> = {
  provide: StructuredLogger,
  scope: Scope.REQUEST,
  useFactory: () => new StructuredLogger(),
};

You might be able to do some ExecutionContext magic and inject a request's ID into this logger, but I'm not sure how. Either way, this solution has the following problems:

You're forced to inject a logger instance, even though loggers should be created and destroyed freely.
This provider is limited to a Request Scope, which means it is recreated on every request. It implies that any service or controller that depends on this provider will also be request scoped, regardless of any explicitly specified scope, thus causing needless object allocations on every request.

Enter: `AsyncLocalStorage`#

AsyncLocalStorage is a feature introduced in Node v14 as a part of the async_hooks package. It allows you to access an implicitly passed object anywhere in an asynchronous call chain. Its canonical use case is to keep a track of a request ID throughout the request's lifetime.

While the async_hooks API is unstable, the AsyncLocalStorage class is considered stable in Node.js. It has been backported to Node.js v12 too.

Middleware#

You can create a functional middleware to create a request context for every incoming request with AsyncLocalStorage.

First, define a global store for storing contexts of every incoming request:

interface RequestContext {
  [key: string]: string;
}
 
const globalStore = new AsyncLocalStorage<RequestContext>();

Then create a couple of convenience methods that enable wrapping a request with a context and accessing that context within the request:

// Allows easy access to a request's context
const ctx = (): RequestContext => {
  const context = globalStore.getStore();
  if (!context) {
    throw new NoContextError("No context attached to the current req");
  }
  return context;
};
 
// Allows wrapping a request in a context
const runWithCtx = (
  fx: (ctx: RequestContext) => Promise<unknown>,
  context: RequestContext = {},
) => {
  globalStore.run(context, () => {
    return fx(ctx());
  });
};

Then create a Nest Functional Middleware that wraps every request in a context:

const withContext = (_req: Request, _res: Response, next: NextFunction) => {
  runWithCtx(() => next(), {});
};

Then create a middleware that stores a request's x-request-id and x-correlation-id headers in the request context:

const withRequestId = (req: Request, _res: Response, next: NextFunction) => {
  const context = ctx();
  context.correlationID = req.headers["x-correlation-id"];
  context.requestID = req.headers["x-request-id"];
  return next();
};

Finally, apply the withContext and withRequestId middleware as early as possible in the Nest application's middleware chain. Ensure that withContext executes before withRequestId:

async function main() {
  const app = await NestFactory.create(AppModule);
  app.useLogger(withContext, withRequestId);
  await app.listen(3000);
}

With this setup complete, you can now access the request context implicitly inside any service or controller. I recommend creating a logger factory method to automatically initialize a logger with such details:

const createLogger = (
  serviceName: string,
  methodName: string,
): StructuredLogger => {
  const context = ctx();
  const { requestId, correlationId } = context;
  const baseLogger = new StructuredLogger();
  const customizedLogger = baseLogger.child({
    serviceName,
    methodName,
    requestId,
    correlationId,
  });
  return customizedLogger;
};

Usage#

The createLogger method greatly simplifies the logging setup in AuthService:

@Injectable()
class AuthService {
  constructor(private readonly auth0: Auth0Service) {}
 
  validate = async (creds: Credentials): Promise<string | Error> => {
    const logger = createLogger(AuthService.name, "validate");
    logger.info({ ...creds }, "Validating credentials");
 
    const result = await checkCredentials(creds);
    if (!result.success) {
      logger.error({ error: result.error }, "Invalid credentials");
      return result.error;
    }
 
    logger.info("Credentials valid");
    return result.accessToken;
  };
}

You no longer need a logger parameter in the constructor.
The createLogger method simplifies extending a base logger with metadata.
Since the logger is not a class property anymore, you don't need to access it with this.logger.

Performance Caveat#

I like this log setup, but as with everything else in programming, it's a compromise. AsyncLocalStorage has a non-negotiable performance hit. It's quite small, but measurable. Make sure to benchmark your setup to check if the convenience is worth the performance penalty.

Here are the results from a rudimentary benchmark for an endpoint that returns "Hello world!" to every request, using the Apache Benchmarking tool. I ran the test on an Apple M1 MacBook Air running Node v16.1.0.

Request Throughput Request Latency

Without AsyncLocalStorage middlware:

$ ab -n 1000 -c 100 localhost:3000/greet
 
Server Software:
Server Hostname:        localhost
Server Port:            3000
 
Document Path:          /greet
Document Length:        13 bytes
 
Concurrency Level:      100
Time taken for tests:   0.201 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      212000 bytes
HTML transferred:       13000 bytes
Requests per second:    4965.98 [#/sec] (mean)
Time per request:       20.137 [ms] (mean)
Time per request:       0.201 [ms] (mean, across all concurrent requests)
Transfer rate:          1028.11 [Kbytes/sec] received
 
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   1.0      0       3
Processing:     5   19   5.8     18      33
Waiting:        2   11   4.5     11      29
Total:          5   20   6.4     18      35
WARNING: The median and mean for the initial connection time are not within a normal deviation
        These results are probably not that reliable.
 
Percentage of the requests served within a certain time (ms)
  50%     18
  66%     21
  75%     22
  80%     25
  90%     29
  95%     35
  98%     35
  99%     35
 100%     35 (longest request)

With AsyncLocalStorage middlware:

$ ab -n 1000 -c 100 localhost:3000/greet
 
Server Software:
Server Hostname:        localhost
Server Port:            3000
 
Document Path:          /greet
Document Length:        13 bytes
 
Concurrency Level:      100
Time taken for tests:   0.219 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      382000 bytes
HTML transferred:       13000 bytes
Requests per second:    4565.08 [#/sec] (mean)
Time per request:       21.905 [ms] (mean)
Time per request:       0.219 [ms] (mean, across all concurrent requests)
Transfer rate:          1702.99 [Kbytes/sec] received
 
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   1.1      2       4
Processing:     4   19   8.4     17      41
Waiting:        3   11   6.2     10      38
Total:          5   20   8.9     18      43
 
Percentage of the requests served within a certain time (ms)
  50%     18
  66%     21
  75%     21
  80%     26
  90%     41
  95%     42
  98%     42
  99%     42
 100%     43 (longest request)

4965.98 vs 4565.08 requests per second. I think I can take a loss of 400 requests per second for the added convenience, but you should decide what works for you based on your usecase.