DynamoDB Tutorial

Introduction

This article describes how to establish a connection to an AWS DynamoDB database with the following Processors.

  • PutDynamoDB (or PutDynamoDBRecord) to enter data into a table. - GetDynamoDB to retrieve data from a table.

  • DeleteDynamoDB to delete data from a table. == Prerequisites To use the DynamoDB Processors, you first need an existing AWS DynamoDB table DeleteDynamoDB to delete data from a table.

Prerequisites

To be able to use the DynamoDB Processors, you first need an existing AWS DynamoDB table. To establish a connection to this, the following prerequisites must be given or known:

  • the name/address of the DynamoDB table to be accessed.

  • a form of authentication for AWS

  • The name of the hash and range keys of the table.

Authentication

For authentication, we recommend using the Services AWSCredentialsProviderControllerService.

Alternatively, the authentication data can also be specified directly in the individual Processors.

Even if the authentication for the Processors runs via the AWSCredentialsProviderControllerService, the correct value for the "Region" property must always be given in the Processors.

Basic Put and Get operations

This paragraph explains the basic functionality of the PutDynamoDB and GetDynamoDB Processors. We will look at the following small example.

DynamoDB is a NoSQL database. Data is stored here using a primary (hash key) and a secondary (range key) key.

Given is a DynamoDB table in which information about users is to be stored. The table has a hash key and a range key:

  • Hash key: User

  • Range key: Range

Now we want to enter basic information about our users in the table using the PutDynamoDB Processor. In this example, we assume that the information is given in the form of a JSON object:

{
"name" : "Max",
"surname" : "Mustermann",
"email" : "max.muster@mustermail.com",
"phone": "+49 (0)123 456789"
}

To store this data in the DynamoDB table, we set this object as the content of a FlowFile. In addition, the FlowFile requires the following attributes for the hash and range key:

  • dynamodb.item.hash.key.value

  • dynamodb.item.range.key.value

In this example, we use the user’s email for the hash key and the string info as the range key. Here you can see how to generate a corresponding FlowFile using a GenerateFlowFile Processor.

Generate Mockup Data

Now we need to configure the PutDynamoDB Processor. In addition to the already known properties, we also need to specify the name of the attribute under which the incoming data is to be stored. This is done using the property Json Document Attribute. In this example, we simply call this attribute data.

Both PutDynamoDB and GetDynamoDB can only read or write a single attribute of the data for given hash/range keys. write. A later paragraph will explain how multiple attributes and multiple range keys can be written using PutDynamoDBRecord. An analogous functionality does not currently exist for GetDynamoDB.

PutDynamoDB

A configuration for a PutDynamoDB Processor could then look like this:

PutDynamoDB config

If you now use this configuration in a flow together with the previously generated FlowFile as follows, the data is inserted into the table (or changed if it already exists).

Flow

If executed successfully, the FlowFile is forwarded to the Success Relation without any changes.

After execution, the table could look like this:

User Range data

max.muster@mustermail.com

info

{ "name" : "Max", "surname" : "Mustermann", "email" : "max.muster@mustermail.com","phone": "+49 (0)123 456789"}

GetDynamoDB

The GetDynamoDB Processor works in the same way as the PutDynamoDB Processor. A configuration could look like this.

GetDynamoDB

As before, the hash key, range key and attribute are read from the (FlowFile) attributes of an incoming FlowFile. If the corresponding data is available in the table, it is entered as content in the FlowFile and this is forwarded to the success relation. If the data cannot be found in the table, the FlowFile is forwarded to the not found relation instead.

The property Range Key Value is not marked as mandatory for the Processors GetDynamoDB and DeleteDynamoDB. If a DynamoDB table is created with hash and range key, these Processors must always be called including the range key. Retrieving or deleting all data based only on the hash key is only possible for tables without a range key. For a table without a range key, the corresponding attributes and properties are ignored.

Deleting data from the database

To delete data from the table, use the DeleteDynamoDB Processor. As before, the attributes of an incoming FlowFile specify the hash and range key of the data to be deleted. All attributes of the corresponding data are deleted.

The configuration is analogous to the GetDynamoDB Processor.

Special case PutDynamoDBRecord

The PutDynamoDBRecord Processor allows multiple data (in record format) to be written to the database at once. Data with different range keys as well as different attributes can be written at the same time.

The data must be given in a format that can be read by a record reader and must be given in the correct format depending on the application.

The properties of this Processor differ significantly from the conventions of the previously explained Processors.

  • Partition Key Field is the name of the hash key of the table. In our case, User.

  • Sort Key Field is the name of the range key of the table. In our case, Range.

For both keys, it is possible to determine them dynamically and with different strategies.

  • The hash key can either be given as a FlowFile attribute as usual, read as a field from the incoming record or generated dynamically from a UUID.

  • The range key can either be omitted (only possible for tables without a range key), read from a field in the incoming record or created using a counter.

Example

As before, we use a table with:

  • Hash Key: User

  • Range Key: Range

In this example, we want to store payment information for our user in addition to basic information. So we have not just one, but 2 JSON objects that we want to store. So that we can use these with the PutDynamoDBRecord Processor, we need to put them in the correct format:

[
{
"Range" : "info",
"data" : {
    "name" : "Max",
    "surname" : "Mustermann",
    "email" : "max.muster@mustermail.com",
    "phone": "+49 (0)123 456789"
    }
},
{
"Range" : "payment",
"paymentMethod" : "credit card",
"data" : {
    "paymentMethod": "credit card",
    "lastFourDigits": "5678",
    "expirationDate": "12/28",
    "billingZip": "54321"
    }
}
]

This is a list of 2 records, both of which are to be entered in the table. As both records have the same hash key, we will specify this using an attribute in the FlowFile. To do this, we again use the attribute dynamodb.item.hash.key.value and set the value to the E-Mmail max.muster@mustermail.com.

FlowFile

The range key is specified here by the fields Range in the records and each additional record field is used as an attribute. (The name of this field must correspond to the name of the Sort Key Field property)

In this example, this means: The first record in the list is entered in the database with the hash key max.muster@mustermail.com and range key info and has an attribute with the name data.

For the second record with the range key payment, the attribute paymentMethod is also entered in addition to the data attribute.

The corresponding configuration for the PutDynamoDBRecord Processor:

PutDynamoDBRecord