Replication

This service replicates data from an RDBMS to OpenSearch.

Impact on system performance

Replication has a direct impact on system performance. In particular, the number of threads (Replication → Settings → Replication_Threads) and the interval (replicationInterval) play a role: the more jobs run in a short time, the higher the CPU load. This applies especially when large amounts of data are replicated.

A larger replicationBlockSize shortens the replication duration, but it also directly increases the memory requirements of Apache Karaf.

Preparation for data record updates

A few prerequisites should be met before setting up the replications. Replication requires a column with strictly increasing timestamps in each table (log as well as child log), as this timestamp is used to find new and updated data records. If different processes, servers, etc. log into the DB in parallel, and therefore possibly with a time offset (even if only milliseconds), individual data records may never appear in the BPC because they are committed with a timestamp older than the latest one already replicated.

It is therefore recommended:

  1. Log the time of the DB server rather than that of the logging server, as different process servers may have slightly different clocks

  2. Ensure that this column is actually rewritten with every insert or update

Both can be achieved simply via an additional (hidden) column with a default value and an associated trigger. This has the advantage that nothing changes at all for the logging process.

Adding a technical TIMESTAMP column for replication

Oracle

For the PM log and child log (without prefix) under Oracle, the column can be created as follows, for example:

--LOG:
--Add a column (with 6 decimal places of precision); set the default value for inserts to the current time in UTC (this eliminates the need for adjustments in the logging process)
ALTER TABLE LOG ADD (DB_UPDATE_TS TIMESTAMP DEFAULT SYSTIMESTAMP AT TIME ZONE 'UTC' NOT NULL);

--Initialize the new column for existing log entries; otherwise replication will not pick them up correctly:
--Either update them all in one block (!not recommended for tables with more than 1 million entries!):
UPDATE LOG SET DB_UPDATE_TS = TIMESTAMP;
COMMIT;

  --or, for large datasets and live systems that must continue to perform operations on the tables, as an anonymous PL/SQL block with minimal undo tablespace requirements:
  --In this case, an index should also be created on the timestamp column to avoid full table scans!
  --Adjust the start date and number of days (here: 3650 days, i.e. roughly 10 years from 2010-01-01) to your data.
DECLARE
BEGIN
  FOR counter IN 0 .. 3650 LOOP
    --dbms_output.put_line(to_char(to_date('2010-01-01', 'YYYY-MM-DD') + counter, 'YYYY-MM-DD') || ' - ' || to_char(to_date('2010-01-01', 'YYYY-MM-DD') + 1 + counter, 'YYYY-MM-DD'));
    --LOG:
    UPDATE log SET DB_UPDATE_TS = TIMESTAMP WHERE TIMESTAMP BETWEEN to_date('2010-01-01', 'YYYY-MM-DD') + counter AND to_date('2010-01-01', 'YYYY-MM-DD') + 1 + counter;
    COMMIT;
    --CHILDLOG:
    UPDATE childlog SET DB_UPDATE_TS = TIMESTAMP WHERE TIMESTAMP BETWEEN to_date('2010-01-01', 'YYYY-MM-DD') + counter AND to_date('2010-01-01', 'YYYY-MM-DD') + 1 + counter;
    COMMIT;
  END LOOP;
END;
/

--Create a trigger that resets the column with every update (this eliminates the need for adjustments to the logging process)
CREATE OR REPLACE
TRIGGER LOG_DB_UPDATE_TS
BEFORE UPDATE ON LOG
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
DECLARE
BEGIN
:NEW.DB_UPDATE_TS := SYSTIMESTAMP AT TIME ZONE 'UTC';
END;
/

--Create Index:
CREATE INDEX IDX_LOG_DBLU ON LOG(DB_UPDATE_TS) COMPUTE STATISTICS;


--Here's the whole thing again for Childlog:
ALTER TABLE CHILDLOG ADD (DB_UPDATE_TS TIMESTAMP DEFAULT SYSTIMESTAMP AT TIME ZONE 'UTC' NOT NULL);

--See the PL/SQL block above, if applicable!
UPDATE CHILDLOG SET DB_UPDATE_TS = TIMESTAMP;
commit;

CREATE OR REPLACE
TRIGGER CHILDLOG_DB_UPDATE_TS
BEFORE UPDATE ON CHILDLOG
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
DECLARE
BEGIN
:NEW.DB_UPDATE_TS := SYSTIMESTAMP AT TIME ZONE 'UTC';
END;
/

CREATE INDEX IDX_CHILDLOG_DBLU ON CHILDLOG(DB_UPDATE_TS) COMPUTE STATISTICS;
--Finished

MSSQL

The TIMESTAMP should be stored in UTC, as otherwise there is a blind spot between 2 and 3 o'clock during the switch back to winter time! The code below does not yet take this into account; use SYSUTCDATETIME() instead of SYSDATETIME(), but verify the behavior in the BPC.

The trigger for childlog is missing. Normally, however, this is not required as the entries are only ever added and not updated.

/* Add columns. Use DATETIME2 for higher precision than DATETIME. Instead of CURRENT_TIMESTAMP (which returns DATETIME with only about 3 ms precision), use the SYSDATETIME() function, which returns DATETIME2 with 100-nanosecond precision. */
ALTER TABLE [LOG] ADD DB_UPDATE_TS DATETIME2 DEFAULT SYSDATETIME() NOT NULL;
GO

ALTER TABLE [CHILDLOG] ADD DB_UPDATE_TS DATETIME2 DEFAULT SYSDATETIME() NOT NULL;
GO

/* Initialize columns */
BEGIN
    UPDATE [LOG] SET DB_UPDATE_TS = [timestamp];
END
GO

BEGIN
    UPDATE [CHILDLOG] SET DB_UPDATE_TS = [timestamp];
END
GO

/* Trigger */
CREATE TRIGGER LOG_DB_UPDATE_TS
  ON [LOG]
  AFTER UPDATE
  AS
BEGIN
    IF NOT UPDATE(DB_UPDATE_TS)
    BEGIN
        UPDATE t
            SET t.DB_UPDATE_TS = SYSDATETIME()
            FROM [LOG] AS t
            INNER JOIN inserted AS i
            ON t.PROCESSID = i.PROCESSID;
    END
END
GO

/* Indices */
CREATE INDEX IDX_LOG_DBLU ON LOG (DB_UPDATE_TS);
CREATE INDEX IDX_CHILDLOG_DBLU ON CHILDLOG (DB_UPDATE_TS);
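If childlog entries can be updated in your setup after all, a trigger analogous to the one for LOG can be added. The following is only a sketch: the join columns PROCESSID and CHILDID are an assumption (based on the primary key example "PROCESSID,CHILDID" used elsewhere in this document) and must match the actual key of CHILDLOG.

```sql
/* Hypothetical childlog trigger; adapt the join columns to the real CHILDLOG key. */
CREATE TRIGGER CHILDLOG_DB_UPDATE_TS
  ON [CHILDLOG]
  AFTER UPDATE
  AS
BEGIN
    IF NOT UPDATE(DB_UPDATE_TS)
    BEGIN
        UPDATE t
            SET t.DB_UPDATE_TS = SYSDATETIME()
            FROM [CHILDLOG] AS t
            INNER JOIN inserted AS i
            ON t.PROCESSID = i.PROCESSID AND t.CHILDID = i.CHILDID;
    END
END
GO
```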

MySQL

For MySQL, it is quite simple:

/* Adds a sufficiently precise timestamp managed by the database to the table */
ALTER TABLE LOG ADD DB_UPDATE_TS TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6);
ALTER TABLE CHILDLOG ADD DB_UPDATE_TS TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6);
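As with the other databases, an index on the new column is advisable so that replication queries avoid full table scans. A sketch, using the same index naming convention as above:

```sql
CREATE INDEX IDX_LOG_DBLU ON LOG (DB_UPDATE_TS);
CREATE INDEX IDX_CHILDLOG_DBLU ON CHILDLOG (DB_UPDATE_TS);
```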

PostgreSQL

For PostgreSQL, it likewise works via a trigger:

ALTER TABLE log ADD db_update_ts timestamp NOT null DEFAULT (timezone('UTC', now()));
ALTER TABLE childlog ADD db_update_ts timestamp NOT null DEFAULT (timezone('UTC', now()));

CREATE OR REPLACE FUNCTION update_db_update_ts_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.db_update_ts = timezone('UTC', now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_log_db_update_ts BEFORE INSERT OR UPDATE ON log FOR EACH ROW EXECUTE PROCEDURE update_db_update_ts_column();
CREATE TRIGGER update_childlog_db_update_ts BEFORE INSERT OR UPDATE ON childlog FOR EACH ROW EXECUTE PROCEDURE update_db_update_ts_column();

CREATE INDEX idx_log_dblu ON log (db_update_ts);
CREATE INDEX idx_childlog_dblu ON childlog (db_update_ts);

Configuration

Data sources

Data sources are connections to individual databases, which the individual replication jobs then use. They are set up under Backend Connections of type "data_source" and referenced by their component ID.

Interface

A separate interface is available for setting up the individual replication jobs under Settings → Replication → Components → Editor. Entries can be created, deleted, duplicated and also activated/deactivated individually via the interface.

Configuration parameters of the Replication module

The various parameters and associated functions are described below. These can be found under BPC Administration → Replication → General.

Module

General module settings

Name (ID) Description

Icon
(module_iconCls)

Individually selectable icon that is displayed before the title.
If no icon is to be displayed, e.g. "none" can be entered. If the field is empty, a standard icon is selected.

Source

Settings for the source system

Name (ID) Description

Database Connection
(rdmsDataSourceName)

Database connection for accessing the data. This must first be created via Backend Connection of type Data Source.

SELECT (CTE)
(sourceCommonTableExpressionQuery)

If this value is set, this query is used as the source instead of the table name. It may only contain the SELECT statement of a Common Table Expression (CTE). The query WITH $sourceTable$ AS ( $sourceCommonTableExpressionQuery$ ) $bpcQuery$; is generated so that it matches the queries the BPC places on it. If this option is used, 'Source_Table' (sourceTable) is used as the name of the CTE; incidentally, this can match the name of an existing database table.

Table
(sourceTable)

Name of the table or view in the source database. If a SELECT (CTE) is specified, this is used as the name in the CTE.
Must not be empty!

Time Zone
(sourceTimeZone)

If the date columns do not contain time zone information, they are interpreted with the specified time zone.
This does not apply to the last update column; there is a separate time zone setting for it.

Primary Key
(idColumns)

Columns for the creation of a unique key in OpenSearch. An incorrect configuration leads to data records being overwritten.

Last Update
(lastUpdateColumn)

The column must contain the time of the last change to the data record.
It is crucial for the replication function that the correct time is entered here for each change to the data record. It is advisable to have this set via a DB trigger.
For replication performance and to reduce the load on the source database, an index should always exist on this column.

Last Update Time Zone
(lastUpdateColumnTimeZone)

If the last update column does not contain time zone information, it is interpreted with the specified time zone.
Does not affect other date columns. There is a separate time zone configuration for these.

Timeout
(sourceQueryTimeoutInSeconds)

Defines how long the JDBC driver waits for a response from the DB. Specified in seconds.

Target

Settings for the target system

Name (ID) Description

Index
(targetIndex)

Name of the index in which the data is to be stored. This is created automatically if required.

Index Creation Settings
(targetIndexCreationSettings)

Settings that differ from the default. The value is set as the "settings" value when creating an OpenSearch index. It is also used during reindexing, as this creates a new index.
If this field is empty, then the index creation settings of the Core Services are used.
If it is set, then only these are used. The index creation settings of the Core Service must then be used as a basis via copy&paste.

Field Mappings
(targetIndexMappings)

Optional setting for the individual fields (also known as mapping). This can be used, for example, to define the data type of the field.

Dynamic Field Templates
(targetIndexDynamicTemplates)

Settings that differ from the default. The value is set as "dynamic_templates" in the field mappings ("mappings") when creating an OpenSearch index. It is also used during reindexing, as this creates a new index.
If this field is empty, then the dynamic field templates of the Core Services are used.

Adapt Field Names
(targetIndexCaseSensitivityOfFields)

The capitalization of the fields created in OpenSearch can be changed here. The field names are formed from the column names of the database.

Advanced Settings

Advanced settings

Name (ID) Description

Enable
(replicationEnabled)

Activates the execution of this replication job.

Start Delay
(replicationDelay)

Delay of this replication (in seconds) after the initial start of replication execution. Can be used to speed up the start of the BPC or to initially prioritize other replication jobs.

Interval
(replicationInterval)

Interval in seconds at which this replication job should be executed. This interval is not guaranteed if there are not enough threads available and too many replication jobs are running (possibly for too long).

Start Date
(replicationStartDate)

Only data that is newer than this point in time is replicated.
Date format "yyyy-MM-dd' 'HH:mm:ss.SSS".

Day Range per Run
(replicationBlockDayRange)

If data from the past is replicated, this controls the number of days that should be processed for each job run. A high value can lead to the source database and OpenSearch being heavily loaded. This may also block other replication jobs.

Maximum Records
(replicationBlockSize)

Maximum number of data records that can be loaded at once from the source database and written to OpenSearch. This value directly influences the consumption of memory in Karaf, as the memory required to store all data records in the maximum size in the memory is reserved. However, a large value has a positive effect on the speed of replication.

Replicate Binary Files
(replicationSyncFiles)

Replicates columns of type BLOB.
Attention! This can significantly affect the memory requirements and performance of OpenSearch.

Decompress Binary Files
(replicationUnzipSyncedFiles)

If binary data is replicated and the option is activated, the system checks whether the data was compressed with GZip and decompresses it before saving.

Restart Replication Where Left Off
(restartReplicationWhereLeftOff)

If this option is active, the replication job is continued at the point in the data where it last stopped when it is restarted or reactivated. Otherwise, replication begins at the configured start time.

Adjust Date Limit
(adjustUpperDateLimitInSeconds)

Adjustment of the upper time limit (in seconds). Influences the upper date limit when selecting data by adding this value to the current time. Changing the value can lead to data records being replicated unnecessarily several times or changes only being replicated with a certain delay.

Data Management Organization ID
(vamOrganizationId)

If set, special features are used for the replication of data management data. Specifically, only the data of the entered organizationId (as in the data management configuration) is replicated.

Enable
(replicationLoggingEnabled)

Activates logging for this replication job.
In addition, logging must be generally active on the replication module.

Shadow Copy

A shadow copy copies all documents (created after the defined 'replicationStartDate') of the OpenSearch index (see 'targetIndex') into a new index at the specified time. In the end, the alias is switched to the new index and the previous index is deleted.

Name (ID) Description

Enable
(replicationShadowCopyEnabled)

Activate creation of shadow copies.

Cron Pattern
(replicationShadowCopyCronPattern)

A Cron-like pattern (using Quartz Scheduler syntax) to define when the shadow copies are to be created. (See <a target="_blank" href="https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html">Quartz CronTrigger Documentation</a>)
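For example (a sketch; adjust to your maintenance window), a Quartz pattern that creates shadow copies daily at 03:00 (field order: seconds, minutes, hours, day-of-month, month, day-of-week):

```
0 0 3 * * ?
```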

Number of Copies
(replicationShadowCopyKeepCopiesCount)

Number of shadow copies to be kept.

Tail Sync

The 'Tail Sync' works on the current index (see 'targetIndex') and synchronizes older data (new/changed/deleted database records) with OpenSearch so that they match the database again.

Name (ID) Description

Enable
(replicationTailSyncEnabled)

Activates the Tail Sync function. This deletes old data records or subsequently synchronizes data records between the source and target. This is necessary if data records are deleted in the source. New or changed data records should be recognized via regular replication if the configuration is correct. If the tail sync is not enabled, it can still be started manually, and the other tail sync settings will then be used.

Cron Pattern
(replicationTailSyncCronPattern)

A Cron-like pattern (using Quartz Scheduler syntax) to define when the tail sync should be started. (See <a target="_blank" href="https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html">Quartz CronTrigger Documentation</a>)

Maximum Records
(replicationTailSyncBlockSize)

Maximum number of data records that can be loaded at once from the source database and written to OpenSearch. This value directly influences the consumption of memory in Karaf, as the memory required to store all data records in the maximum size in the memory is reserved. However, a large value has a positive effect on the speed of replication.

Start Date
(replicationTailSyncRelativeStartDate)

The relative start date. Synchronization starts from this date. Data before the replication start date will continue to be deleted. This date is only used if it is after the regular start date and before the end date.
Only use in special cases!

End Date
(replicationTailSyncRelativeEndDate)

The relative end date. Synchronization takes place up to this point in time. The end should not overlap with the regular replication (interval + buffer).
Syntax: <code>n seconds|minutes|hours|days|weeks|months|years ago</code>

Day Range per Run
(replicationTailSyncBlockDayRange)

This controls the number of days that should be processed in each run. A high value can lead to the source database and OpenSearch being heavily loaded.

Delete Old Records
(replicationTailSyncRelativeDeleteOlderThanDate)

The relative deletion date. All documents that are older are deleted. If this option is not set, no deletion is performed.
Syntax: <code>n seconds|minutes|hours|days|weeks|months|years ago</code>

Enable Tail Sync logging
(replicationTailSyncLoggingEnabled)

If enabled, Tail Sync runs are logged in the `bpc-tailsync-log` index.

Consistency Check

After replication runs, a simple consistency check is performed. The number of documents in the source and target are compared.

Name (ID) Description

Consistency Check Frequency
(replicationConsistencyCheckFrequency)

The frequency with which the consistency check is performed. The number of data records in the source database is compared with the number in OpenSearch. The check can severely affect the performance of the replication and should be switched off if historical data is replicated.
0 = no check; 3 = check every 3rd replication

Lookup Joins

These can be used to enrich replicated documents with additional data. For example, if the data to be replicated only contains a partner ID, the partner's name and other info can be added for monitoring.

Name (ID) Description

Lookup Joins
(join)

Can be used to automatically enrich logging documents with additional data, e.g. if the data to be logged only contains an ID and further associated data can be loaded via an existing index.

Configuration parameters of a replication job

The various parameters and associated functions are described below. They can be found under BPC Administration → Replication → Components. It is recommended to use the specialized interface: BPC Administration → Replication → Editor.

Basic settings

Setting (Key) Data type Description

Replication_Enabled
(replicationEnabled)

boolean

Activate/deactivate the replication job

Replication_StartDate
(replicationStartDate)

String

Replication of data records that are newer than this point in time.
Date format "yyyy-MM-dd' 'HH:mm:ss.SSS"

or

as a relative value in the format:

1 second|minute|hour|day|week|month|year ago
n seconds|minutes|hours|days|weeks|months|years ago
Examples
1970-01-01 00:00:00.000
1 week ago
6 months ago
42 days ago

Source

Setting (Key) Data type Description

Source_DataSource
(rdmsDataSourceName)

String

Data source to be used (ID of the Backend Connections of type data_source)

Source_Table
(sourceTable)

String

Table name of the source, or the name of the CTE if the optional 'Source_CommonTableExpressionQuery' is used.

Source_Timezone
(sourceTimeZone)

String

Time zone of the date fields used in the source database table (internally, TimeZone.getTimeZone is used). It is only applied to the actual data and not to the 'lastUpdateColumn' column.

Examples
UTC
GMT+1
GMT-8
America/Los_Angeles
Europe/Berlin
Etc/GMT
CET

Source_IdColumns
(idColumns)

String

Columns for the creation of a unique key in OpenSearch. For example: "PROCESSID,CHILDID"

Source_LastUpdateColumn
(lastUpdateColumn)

String

This column is used to determine the age of the data record.

It is strongly recommended to create an index on this column in the source database.

Source_LastUpdateColumnTimezone
(lastUpdateColumnTimeZone)

String

Time zone used in the source database table for the data in the lastUpdateColumn column.

The setting is only used in conjunction with [adjustUpperDateLimitInSeconds].

Examples
UTC
GMT+1
GMT-8
America/Los_Angeles
Europe/Berlin
Etc/GMT
CET

Source_QueryTimeoutInSeconds
(sourceQueryTimeoutInSeconds)

Integer

Defines how long the JDBC driver waits for a response from the DB. See also JDBC java.sql.Statement.setQueryTimeout()

Source_CommonTableExpressionQuery
(sourceCommonTableExpressionQuery)

String

Can be used instead of database views. May only contain the SELECT statement of a Common Table Expression (CTE). The query WITH $sourceTable$ AS ( $sourceCommonTableExpressionQuery$ ) $bpcQuery$; is generated so that it matches the queries the BPC places on it. If this option is used, 'Source_Table' (sourceTable) is used as the name of the CTE; incidentally, this can match the name of an existing database table.
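A sketch of what the generated query might look like. The table names LOG and STATE, the CTE name MY_CTE, and the WHERE clause are hypothetical and only illustrate the substitution pattern described above:

```sql
-- Configured: sourceTable = MY_CTE
--             sourceCommonTableExpressionQuery =
--               SELECT p.*, s.STATUS FROM LOG p JOIN STATE s ON s.ID = p.STATE_ID
-- The BPC then generates queries of the form:
WITH MY_CTE AS (
    SELECT p.*, s.STATUS FROM LOG p JOIN STATE s ON s.ID = p.STATE_ID
)
SELECT * FROM MY_CTE WHERE DB_UPDATE_TS > ...;
```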

Target

Setting (Key) Data type Description

Target_Index
(targetIndex)

String

Target index in OpenSearch.
If not yet available, the index is created automatically.

Target_CaseSensitivityOfFields
(targetIndexCaseSensitivityOfFields)

String

Specifies how the fields are to be created in OpenSearch (case-sensitive).

  • asSource = are created exactly as they are returned by the database

  • lowerCase = column names are converted to lower case

  • upperCase = column names are converted to upper case

Target_IndexCreationSettings
(targetIndexCreationSettings)

JSON

Assign different settings to the target index during creation.

  • If this field is empty, then the Core Services settings → Core_IndexCreationSettings are used as usual.

  • If it is set, then only these are used. The settings of Core_IndexCreationSettings must then be used as a basis by copy&paste.

Example with added "Index Sorting"
{
   "number_of_shards": "5",
   "number_of_replicas": "1",
   "index": {
      "sort.field": "LASTUPDATE",
      "sort.order": "desc"
   },
   "analysis": {
      "normalizer": {
         "lowercaseNormalizer": {
            "filter": [
               "lowercase"
            ],
            "char_filter": [],
            "type": "custom"
         }
      }
   }
}

If "Index Sorting" is to be used, then OpenSearch mappings must also be created for the specified sorting fields (LASTUPDATE in the example) (see Target_IndexMappings).

Target_IndexMappings
(targetIndexMappings)

JSON

Assign a mapping to the target index when it is created. Should only be necessary in certain cases.

The mapping for the "Index Sorting" example from above.
{
   "properties": {
      "LASTUPDATE": {
         "type": "date"
      }
   }
}

Target_IndexDynamicTemplates
(targetIndexDynamicTemplates)

JSON

Assign a customized mapping to the target index. If this is set, then the global (see Core setting → Core_IndexDynamicTemplates) is not used.

In the Elasticsearch documentation (reference to the OpenSearch documentation) there is more information about the possibilities of Dynamic Templates.

In the following example, all fields that OpenSearch recognizes as text fields (strings) are given a mapping ("alle_textfelder") whose content is not analyzed (this saves storage space while the data can still be displayed), plus one exception ("spezialfall"): the standard mapping is used for all text fields whose name ends in 'name'.

Example
[
  {
    "spezialfall": {
      "match_mapping_type": "string",
      "match": "*name",
      "mapping": {
        "type": "text",
        "fields": {
          "lowercase": {
            "normalizer": "lowercaseNormalizer",
            "type": "keyword"
          },
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  },
  {
    "alle_textfelder": {
      "match_mapping_type": "string",
      "match": "*",
      "mapping": {
        "type": "keyword",
        "analyzer": false
      }
    }
  }
]

Dynamic templates can also be used to specify the OpenSearch type for a database field. Occasionally OpenSearch chooses an inappropriate type for the mapping. Concrete example: the Oracle column named 'ZAHL' of data type 'NUMBER(10,2)' is created in the OpenSearch mapping as type 'long' instead of 'float'. This can be corrected with the example below.

Example
[
  {
    "ZAHL_long_als_float": {
      "match_mapping_type": "long",
      "match": "ZAHL",
      "mapping": {
        "type": "float"
      }
    }
  }
]

Advanced Settings / Advanced

Setting (Key) Data type Description

Replication_RestartWhereLeftOff
(restartReplicationWhereLeftOff)

boolean

When the server is started or a job is changed, the replication jobs are restarted and begin replicating again from the beginning (see replicationStartDate). If this setting is set to 'true', replication instead resumes at the date of the most recent record that was replicated. These timestamps are stored as metadata in the OpenSearch index.

Replication_Delay
(replicationDelay)

Integer

Delay in seconds after which replication is started

Replication_Interval
(replicationInterval)

Integer

Interval in seconds for replication

Replication_BlockDayRange
(replicationBlockDayRange)

Integer

Number of days (block) to be processed per job run. Do not set this value too high, as this increases the load on the source database and leaves OpenSearch/Lucene no breathing room between blocks.

Example with 10 days: if the job has currently reached 10.03.2015, the records from 10.03.2015 to 20.03.2015 are replicated, and in the following run those from 20.03.2015 to 30.03.2015.
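The block-wise advancement from the example can be sketched as follows (dates only; the real replication works with timestamps):

```python
# Sketch: consecutive replication windows with replicationBlockDayRange = 10,
# matching the example above. Names are illustrative.
from datetime import date, timedelta

def next_windows(start: date, block_days: int, n: int):
    """Yield n consecutive (from, to) replication windows."""
    for _ in range(n):
        end = start + timedelta(days=block_days)
        yield (start, end)
        start = end

wins = list(next_windows(date(2015, 3, 10), 10, 2))
# first run: 10.03.2015 → 20.03.2015, second run: 20.03.2015 → 30.03.2015
```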

Replication_BlockSize
(replicationBlockSize)

Integer

Block size for the transfer from DB to OpenSearch. Number of database records that the JDBC driver reads as a block and holds in memory.

Experience so far suggests that a larger block size performs better (see also: Replication duration). However, do not overdo it, or OutOfMemory exceptions will occur; a block size of 2500 is already too high for some database tables.

Replication_SyncFiles
(replicationSyncFiles)

boolean

Synchronization of BLOBs

Please note that subsequent activation is currently not possible.

Replication_UnzipSyncedFiles
(replicationUnzipSyncedFiles)

boolean

Unzips synchronized file contents automatically.

Only used if replicationSyncFiles is active.

Replication_AdjustUpperDateLimitInSeconds
(adjustUpperDateLimitInSeconds)

Integer

Avoids a database problem in which record updates - which are initiated by a database trigger - are written too late and therefore cannot be taken into account during the replication run (we are talking about 1-3 seconds here).

With this option, the records are selected with an update timestamp minus the specified value. In the default setting of 0, this option is deactivated and a timestamp in the future is selected as before (replicationBlockDayRange).

Example 3 seconds: The records are selected from the timestamp of the last replicated record up to the current timestamp of the database minus 3 seconds. Of course, the most up-to-date data is not always available in the BPC, but hopefully it will be in a consistent state with the database.
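The effect of this setting on the selection window can be sketched as follows. The function names are assumptions for illustration only.

```python
# Sketch: upper bound of the selection window depending on
# adjustUpperDateLimitInSeconds. With the default of 0 the upper bound
# lies in the future (replicationBlockDayRange ahead, as before); with
# e.g. 3 it becomes "now minus 3 seconds" so late trigger writes are
# picked up by the next run instead of being skipped.
from datetime import datetime, timedelta

def upper_limit(now: datetime, adjust_seconds: int, block_day_range: int) -> datetime:
    if adjust_seconds == 0:
        return now + timedelta(days=block_day_range)  # option deactivated
    return now - timedelta(seconds=adjust_seconds)

now = datetime(2024, 1, 1, 12, 0, 0)
print(upper_limit(now, 3, 10))  # 2024-01-01 11:59:57
```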

Replication_VamOrganizationId
(vamOrganizationId)

String

experimental

If not empty, special handling for the replication of data management data is used. Only the data of the entered organizationId (as in the data management configuration) is replicated.

Replication_LoggingEnabled
(replicationLoggingEnabled)

boolean

The logging of replication runs per replication job can be activated and deactivated here.

Shadow copies

During a shadow copy, all documents of the OpenSearch index (see 'targetIndex') that are newer than the specified 'replicationStartDate' are copied to a new index at the configured time. At the end, the alias is switched to the new index and the previous index is deleted. This can be done once a week, for example, to return to a lean index.

Additional note: The documents marked as deleted in OpenSearch (either deleted manually or by the tail sync) are of course not copied to the new index. Attention: The associated replication run is suspended until the shadow copy has been created.
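The shadow-copy flow described above can be sketched as a simulation on plain dicts. All names and the data layout are illustrative assumptions, not the actual implementation.

```python
# Sketch: shadow copy simulated in memory. Documents older than
# replicationStartDate and documents marked as deleted are not copied;
# afterwards the alias points to the new index and the old one is dropped.

def shadow_copy(indices: dict, alias: dict, start_date: str) -> None:
    old_index = alias["target"]
    new_index = old_index + "-copy"
    indices[new_index] = [
        doc for doc in indices[old_index]
        if doc["ts"] >= start_date and not doc.get("deleted")
    ]
    alias["target"] = new_index  # switch the alias ...
    del indices[old_index]       # ... then delete the previous index

indices = {"log-v1": [
    {"id": 1, "ts": "2023-01-01"},                  # before start date: dropped
    {"id": 2, "ts": "2024-05-01"},                  # copied
    {"id": 3, "ts": "2024-06-01", "deleted": True}, # marked deleted: dropped
]}
alias = {"target": "log-v1"}
shadow_copy(indices, alias, "2024-01-01")
print(alias["target"])  # → log-v1-copy
```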

Setting (Key) Data type Description

ShadowCopy_Enabled
(replicationShadowCopyEnabled)

boolean

Enables the creation of shadow copies.

ShadowCopy_CronPattern
(replicationShadowCopyCronPattern)

String

Cron-like pattern (according to Quartz-Scheduler syntax) for defining the time when the shadow copies are to be created.

Examples
0 15 16 ? * SUN = every Sunday at 16:15
0 */30 * * * ? = every 30 minutes
30 59 11 ? * 1,2,3,4,5 = Monday to Friday at 11:59:30

Further examples and documentation can be found on the Quartz-Scheduler website.

ShadowCopy_KeepCopiesCount
(replicationShadowCopyKeepCopiesCount)

Integer

Number of shadow copies to be kept.

0 = no shadow copy is kept

3 = 3 shadow copies are kept (these are in 'Close' status to save resources)

Tail Sync

The 'Tail Sync' works on the current index (see 'targetIndex') and synchronizes older data (new/changed/deleted database records) with OpenSearch so that the two match again. This can be carried out once every night, for example. Documents older than the replication start date (see 'replicationStartDate') are first deleted from the OpenSearch index. The sync then walks through the database in blocks (default: 10-day steps) and compares them with OpenSearch. New/changed database records are transferred to the OpenSearch index, and documents that no longer exist in the database are deleted from OpenSearch. The Tail Sync compares only the specified 'idColumns' and 'lastUpdateColumn' fields (hint: a suitable DB index works wonders!); the complete data is only read for new/changed database records.
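The per-block comparison described above can be sketched as follows. The states are simulated as dicts mapping document id to lastUpdate timestamp; names and structure are assumptions, not the actual implementation.

```python
# Sketch: one tail-sync block compares database and index using only the
# id and last-update columns, then derives what to upsert and delete.

def tail_sync_block(db: dict, index: dict):
    """Return (to_upsert, to_delete) so the index matches the database."""
    to_upsert = [i for i, ts in db.items()
                 if index.get(i) != ts]            # new or changed records
    to_delete = [i for i in index if i not in db]  # rows gone from the DB
    return to_upsert, to_delete

db    = {"a": 1, "b": 3, "c": 1}   # database state (id -> lastUpdate)
index = {"a": 1, "b": 2, "d": 1}   # OpenSearch state
print(tail_sync_block(db, index))  # → (['b', 'c'], ['d'])
```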

As an alternative or in addition to a time-controlled tail sync, this can also be started manually. For this purpose, there is a button under Replication → Jobs for each replication job that starts a tail sync. The settings replicationTailSyncEnabled and replicationTailSyncCronPattern have no effect on the manual start, all other settings are taken into account.

The sync does not run up to the current date (see [replicationTailSyncRelativeEndDate]), as these records will continue to be processed by our normal replication.

Setting (Key) Data type Description

TailSync_Enabled
(replicationTailSyncEnabled)

boolean

Activate the 'Tail Sync'

TailSync_CronPattern
(replicationTailSyncCronPattern)

String

Cron-like pattern (according to Quartz Scheduler syntax) to define the time when the Tail Sync should be started.

Examples
0 5 2 * * ? = every night at 2:05
0 35 21 ? * Sun = every Sunday at 21:35
30 5 3 ? * 1,2,3,4,5 = Monday to Friday at 3:05:30

Further examples and documentation can be found on the Quartz-Scheduler website.

TailSync_BlockSize
(replicationTailSyncBlockSize)

Integer

Block size of the database driver. Note the information on the option replicationBlockSize.

TailSync_RelativeStartDate
(replicationTailSyncRelativeStartDate)

String

Specifies a relative start date; synchronization then starts from this date. Data that lies before the replication start date will still be deleted. This date is only used if it is after the regular start date and before the end date.

1 second|minute|hour|day|week|month|year ago
n seconds|minutes|hours|days|weeks|months|years ago

This option should only be used in special cases. Inconsistencies may occur if records are deleted from the database that lie before this relative start date. These then remain in the OpenSearch index although they are no longer present in the database.
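The relative date notation ("1 hour ago", "n days ago") can be parsed as in the following sketch. This is only an illustration of the notation; the product's actual parser may differ (months and years need calendar arithmetic and are omitted here).

```python
# Sketch: parse the relative date expressions used by
# TailSync_RelativeStartDate / TailSync_RelativeEndDate.
# Months/years are omitted because timedelta has no calendar units.
import re
from datetime import datetime, timedelta

UNITS = {"second": "seconds", "minute": "minutes", "hour": "hours",
         "day": "days", "week": "weeks"}

def parse_relative(expr: str, now: datetime) -> datetime:
    m = re.fullmatch(r"(\d+)\s+(second|minute|hour|day|week)s?\s+ago", expr.strip())
    if not m:
        raise ValueError(f"unsupported expression: {expr!r}")
    amount, unit = int(m.group(1)), UNITS[m.group(2)]
    return now - timedelta(**{unit: amount})

now = datetime(2024, 1, 8, 12, 0)
print(parse_relative("1 week ago", now))  # 2024-01-01 12:00:00
```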

TailSync_RelativeEndDate
(replicationTailSyncRelativeEndDate)

String

Specifies a relative end date; synchronization runs up to this point in time and no further.

1 second|minute|hour|day|week|month|year ago
n seconds|minutes|hours|days|weeks|months|years ago

TailSync_BlockDayRange
(replicationTailSyncBlockDayRange)

Integer

Number of days (block) in which the data should be processed. See also replicationBlockDayRange.

TailSync_RelativeDeleteOlderThanDate
(replicationTailSyncRelativeDeleteOlderThanDate)

String

Specifies a relative deletion date. All documents older than this date are deleted.

If this option is not set, all documents that are older than the start date of the replication are deleted (see replicationStartDate).

TailSync_LoggingEnabled
(replicationTailSyncLoggingEnabled)

boolean

Specifies whether the tail sync runs of this replication job are logged in the bpc-tailsync-log index.

The global setting replicationJobsTailsyncLogEnabled must also be activated.

Consistency Check

A simple consistency check is carried out after the replication runs: the number of documents in the source and target are compared over the period between 'replicationStartDate' and the most recent date present in the target (OpenSearch).

If there is a lot of data (tens of millions of records), this can lead to an increased load on the system and an initial replication can be extremely slowed down. Only activate the consistency check if you are sure that the data will be fully replicated.
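The check described above can be sketched as a count comparison over the relevant window. This is a pure illustration on lists of timestamps; all names are assumptions.

```python
# Sketch: consistency check comparing document counts between
# replicationStartDate and the newest date present in the target.

def consistent(source_ts, target_ts, start) -> bool:
    if not target_ts:
        # empty target is only consistent if the source has nothing either
        return len([t for t in source_ts if t >= start]) == 0
    upper = max(target_ts)  # last replicated date in OpenSearch
    n_source = len([t for t in source_ts if start <= t <= upper])
    n_target = len([t for t in target_ts if start <= t <= upper])
    return n_source == n_target

src = ["2024-01-02", "2024-01-05", "2024-01-09"]
tgt = ["2024-01-02", "2024-01-05"]
# "2024-01-09" lies past the target's last date, so it is not counted yet
print(consistent(src, tgt, "2024-01-01"))  # → True
```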

Setting (Key) Data type Description

ConsistencyCheck_Frequency
(replicationConsistencyCheckFrequency)

Integer

The frequency with which the consistency check is performed.

Examples:

  • 0 = is never performed

  • 1 = is performed after every replication run

  • 10 = is performed after every 10th replication run

Lookup Joins

Can be used to enrich documents to be replicated with additional data. This is useful if, for example, the data to be replicated contains only a partner ID but the name of the partner etc. is also required in the monitor.

Under OpenSearch, denormalization is used, i.e. this data is also transferred into the document and is available like all other data (high-performance search, aggregations, …). Prerequisite: the lookup "tables" must be available as independent indexes; they can, for example, be transferred from a DB table via an additional replication.

The lookup data can be updated manually ( BPC settings → Overview → Status → Replication → Jobs → Job → Synchronize lookup joins ) and automatically using our OpenSearch BPC plugin (os-bpc-plugin).

Several lookup tables can be referenced; the possible values of an ENTRY are described below. Structure: "join": [ { ENTRY }, { ENTRY }, … ]

For the value comparison to work, the column types in the database tables must be identical. For string fields, the '.raw' subfield should be used as the lookupKeyField (e.g. 'ID.raw'); the keyField itself is specified without '.raw'.

Lookup Joins Configuration example
[
    {
        "keyField": "PARTNER",
        "lookupIndex": "lookup-partner",
        "lookupKeyField": "ID.raw",
        "resultFieldsPrefix": "partner_",
        "resultFieldsExcluded": [ "ID", "LASTUPDATE" ]
    },
    {
        "keyField": "MESSAGETYPE",
        "lookupIndex": "lookup-messagetype",
        "lookupKeyField": "ID.raw",
        "resultFieldsPrefix": "messagetype_",
        "resultFieldsExcluded": [ "ID", "LASTUPDATE" ]
    }
]
Field Data type Description

keyField

String

The key field in the table to be replicated. The data from the lookup index is transferred via the value of the field.

Example: PARTNER_ID
This field contains the ID of the partner; the actual partner data to be transferred is located in the specified lookup index. (Hint: DB foreign key)

keyFieldValuesSeparator

String

If there are several values in the keyField that are separated by a separator, this separator can be specified here. The lookup join is then performed on the individual values.

Example:

The 'lookupIndex' contains the following data:

  • ID = 20, LONGNAME = grams

  • ID = 30, LONGNAME = kilograms

  • ID = 42, LONGNAME = centimetres

If the keyFieldValuesSeparator is '%%', the resultFieldsPrefix is 'QUANTITY_' and the transferred keyField value is '42%%20%%30', the following field is created:

QUANTITY_LONGNAME = centimetres%%grams%%kilograms

lookupIndex

String

The OpenSearch index with the lookup data.

Example: lookup-partner
In this example, it contains the detailed data of the partner.

lookupKeyField

String

The key field in the lookup index; it is searched for the value of the 'keyField' (see above). (Hint: DB primary key)

Example: ID.raw

resultFieldsPrefix

String

The fields to be transferred from the lookup index must be provided with a unique prefix so that there is no conflict with existing fields.

Example: partner_

resultFieldsIncluded

String Array

If not all fields from the lookup index are to be included, the fields to be included can be specified here.

Example: [ "FIRST_NAME", "LAST_NAME" ]

resultFieldsExcluded

String Array

If almost all fields are to be included, the fields to be excluded can be specified here.

Example: [ "ID", "UPDATED" ]
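How the fields above could interact can be sketched as an in-memory lookup join, including keyFieldValuesSeparator, resultFieldsPrefix and resultFieldsExcluded. The lookup index is simulated as a dict keyed by its lookupKeyField; the names follow the configuration fields, but the logic itself is an assumption, not the actual implementation.

```python
# Sketch: denormalizing lookup join applied to one document in memory.

def lookup_join(doc, lookup, key_field, prefix, excluded=(), separator=None):
    keys = str(doc[key_field]).split(separator) if separator else [str(doc[key_field])]
    merged = {}
    for k in keys:
        for field, value in lookup.get(k, {}).items():
            if field in excluded:
                continue
            name = prefix + field
            if separator and name in merged:
                # multi-value case: concatenate with the same separator
                merged[name] = f"{merged[name]}{separator}{value}"
            else:
                merged[name] = value
    return {**doc, **merged}

units = {"20": {"ID": "20", "LONGNAME": "grams"},
         "30": {"ID": "30", "LONGNAME": "kilograms"},
         "42": {"ID": "42", "LONGNAME": "centimetres"}}
doc = {"MSG": 1, "UNIT": "42%%20%%30"}
out = lookup_join(doc, units, "UNIT", "QUANTITY_", excluded=["ID"], separator="%%")
print(out["QUANTITY_LONGNAME"])  # → centimetres%%grams%%kilograms
```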

Logging of replications

The runs of the replication jobs can be logged and displayed, for example, via the automatically created monitor "Replication Jobs Monitor". Similarly, runs of the tail sync can also be logged and viewed in the monitor "Tail Sync Logs Monitor". These monitors are created at startup if they are not available. They cannot be permanently deleted. The indices used bpc-replicationjobs-log and bpc-tailsync-log are also created automatically if a run needs to be logged.

Logging can be switched on and off globally and for each replication job (see corresponding settings above). Logging is activated globally for both replications and tail syncs by default, but deactivated in the individual replication jobs. Therefore, no logging is performed by default.

The log entries for replication runs are written in bulk to the index mentioned above after some time, so logging has little impact on the overall performance of the system. However, a lot of entries can be generated in a very short time (millions in a few hours), so run the cleanup regularly and keep an eye on the number of documents in the index: Settings → Core Services → Indices

Since tail syncs take place much less frequently, the bpc-tailsync-log index should not grow so quickly during tail sync logging, but it is still regularly cleaned of old entries (see configuration above).
