April 27, 2021

AWS CDK Lambda and DynamoDB dependency management

Managing dependencies between Lambdas and DynamoDB tables can get ugly.
The default way of allowing a Lambda function to access DynamoDB is done like so:


const tableHandle = new dynamodb.Table(stack, "Table", {
  partitionKey: { name: "id", type: dynamodb.AttributeType.STRING },
});

const functionHandle = new TypeScriptFunction(stack, "Add-Function", {
  entry: require.resolve("@calculator/add/src/handler.ts"),
  environment: {
    TABLE_NAME: tableHandle.tableName, // adding the table name to the environment
  },
});

tableHandle.grantReadWriteData(functionHandle); // grant the lambda access

And then in your code, you'd do:


await this.documentClient
  .scan({ TableName: process.env.TABIE_NAME }) // using the env variable from lambda definition
  .promise();

As you probably already know, this pattern comes with some potential issues.

  1. First is the problematic usage in code - there is no way to verify that the environment variable name set on the function matches what you are trying to access from the code. (I actually did put a typo there, did you spot it?)
    There are ways to mitigate this a bit - for example, never use the env variables directly, but go through centralized accessor functions, like:

const getTableName = () => process.env.TABLE_NAME;

Still, no verification is happening, and if someone removes the environment variable or renames it, you won't find out until you get a runtime error.
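One small improvement within this pattern is to fail fast when the variable is missing, so a removed or renamed variable surfaces at cold start rather than deep inside a request. A sketch (`requireEnv` is a hypothetical helper, not part of the setup above):

```typescript
// Hypothetical helper: throws immediately if the variable is absent,
// instead of silently passing `undefined` to the DynamoDB client.
export const requireEnv = (name: string): string => {
  const value = process.env[name];
  if (value === undefined) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
};

export const getTableName = (): string => requireEnv("TABLE_NAME");
```

This still doesn't catch a mismatch at compile time - it only turns a confusing downstream failure into an explicit error.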

  2. Another problem is the need to pass handles around. For a small stack that has only one function and one table, that's a non-issue, but in a large application with tens or even hundreds of lambdas and multiple tables, it gets ugly.
  3. Related to number 2 - since you have to pass things around, they have to be declared in order. Let's say we want to add a lambda that watches the stream of events in that table and maybe creates some cache or aggregation in another table. It has to be declared after the initial table. Then let's add another function that reads from that cache. That order might seem correct, and if you are happy to keep things that way - great! Nonetheless, you should not be forced to. Sometimes it makes more sense to group and order things by functionality, not by their dependency order.
  4. You have to remember to grant the lambda function permission to read the table. It seems like a sensible thing to do, but when you think about it - it wouldn't make sense to add the environment variable without also granting the permissions. Similarly, it wouldn't make sense to grant permissions without somehow telling the lambda how to connect to the table. In other words, we should be able to do this in one step. (Again, a frequent source of errors that are only visible at runtime.)
  5. Handles are only typed as a generic CDK Lambda or DynamoDB Table. That means if you need to pass many of them around, there is no way to see a problem before - again - a runtime error.
    Consider a lambda function that requires access to multiple tables:

const createTablesAggregator = (
  stack: Stack,
  someTable: ITable,
  otherTable: ITable,
  yetAnotherTable: ITable
) => {
  new TypeScriptFunction(stack, "Aggregator-Function", {
    entry: require.resolve("@calculator/aggregator/handler.ts"),
    environment: {
      SOME_TABLE: someTable.tableName,
      OTHER_TABLE: otherTable.tableName,
      YET_ANOTHER_TABLE: yetAnotherTable.tableName,
    },
  });
};

and then somewhere else you would call:


createTablesAggregator(stack, someTable, yetAnotherTable, otherTable);

TypeScript would have no way of catching this mistake - everything would deploy. In the best-case scenario things simply would not work; in the worst case, you might corrupt the tables that were passed in the wrong order (perhaps the schemas were compatible, and your code successfully performed an operation that should have happened in another table). Again - for a small stack this might seem like a non-issue. However, once the stack grows and multiple people change the CDK code at the same time, it's very easy to mess this up.

What?

By now you are hopefully convinced that there are areas for improvement. Our solution is based on having a central "registry" for Lambdas and DynamoDB Tables.

The registry allows you to later reference those constructs by name instead of passing them around, which takes care of problems 2, 3, and 5.


registerTable(stack, AvailableTables.TABLE, {
  partitionKey: { name: "id", type: dynamodb.AttributeType.STRING },
}); // registerTable is a custom wrapper, trivial to implement yourself, see example below

new ToolkitFunction(stack, AvailableLambdas.ADD, {
  entry: require.resolve("@calculator/add/src/handler.ts"),
  addDependencies: [addTables(AvailableTables.TABLE)],
});

Using addDependencies automatically adds the permissions (read/write by default; trivial to add an option for more limited permissions), which takes care of problem number 4.
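The registry itself is straightforward. A minimal, CDK-agnostic sketch of the idea (this simplified version stores plain objects instead of real CDK `ITable` handles; in the real wrapper, `registerTable` would create the table, and `addDependencies` would also call `grantReadWriteData` and set the environment variable):

```typescript
// Simplified sketch of a construct registry keyed by an enum.
enum AvailableTables {
  TABLE = "TABLE",
}

interface RegisteredTable {
  tableName: string;
}

const tableRegistry = new Map<AvailableTables, RegisteredTable>();

const registerTable = (name: AvailableTables, table: RegisteredTable): void => {
  if (tableRegistry.has(name)) {
    throw new Error(`Table ${name} is already registered`);
  }
  tableRegistry.set(name, table);
};

// Looking constructs up by enum name removes the need to pass handles around,
// and a wrong name fails loudly instead of silently targeting the wrong table.
const getRegisteredTable = (name: AvailableTables): RegisteredTable => {
  const table = tableRegistry.get(name);
  if (table === undefined) {
    throw new Error(`Table ${name} was never registered`);
  }
  return table;
};
```

Because both sides reference the same enum, passing a name that doesn't exist is a compile-time error rather than a deploy-then-fail surprise.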

We are left with problem number 1, which is solved by using a helper function in your code:


export const getDynamoTableName = (tableName: AvailableTables) =>
  process.env[`DYNAMODB_${AvailableTables[tableName]}`];

getDynamoTableName(AvailableTables.TABLE);

To see how this all connects together, take a look at the [dependencyManagement branch of xolvio/aws-sales-system-example](https://github.com/xolvio/aws-sales-system-example/tree/dependencyManagement).

Let me know if you have any questions or thoughts in the comments below.
