Replication

This service replicates data from an RDBMS to OpenSearch.

Impact on system performance

Replication has a direct impact on system performance. In particular, the number of threads (Replication → Settings → Replication_Threads) and the interval (replicationInterval) play a role. The more jobs run in a short time, the higher the CPU load. This applies in particular if a lot of data is replicated.

Although the replicationBlockSize has a positive effect on the replication duration, it also has a direct influence on the memory requirements of Apache Karaf.

Preparation for data record updates

A few prerequisites should be created before setting up the replications. Replication requires a column with exclusively incrementing timestamps for each table (log as well as child log), as this timestamp is used to subsequently find new/updated data records. If different processes, servers, … log into the DB in parallel and therefore possibly with a time delay (even if only milliseconds), it can happen that individual data records are not displayed in the BPC because they are committed with a timestamp older than the latest replicated one.

It is therefore recommended:

log not the server time of the logging server, but that of the DB server, as different process servers may have slightly different times
Ensure that this column is actually rewritten with every insert or update

Both can be done simply via an additional (hidden) column with default value and associated trigger. This has the advantage that nothing changes at all for the logging process)

Adding a technical TIMESTAMP column for replication

Oracle

For PM log and child log (without prefix) under Oracle, the procedure for creating then looks like this, for example:

--LOG:
--Add a column (with 6 decimal places of precision); set the default value for inserts to the current time in UTC (this eliminates the need for adjustments in the logging process)
ALTER TABLE LOG ADD (DB_UPDATE_TS TIMESTAMP DEFAULT SYSTIMESTAMP AT TIME ZONE 'UTC' NOT NULL);

--“Redistribute” existing log entries; otherwise, replication won't handle it well:
--Either put them all in one block (!not recommended for tables with more than 1 million entries!):
UPDATE LOG SET DB_UPDATE_TS = TIMESTAMP;
Commit;

  --or for large datasets and live systems that also need to perform operations on the tables, as an anonymous PL/SQL block with minimal undo tablespace requirements:
  --In this case, an index should also be created on the timestamp column to avoid full table scans!
    declare begin
      FOR counter IN 0 .. 3650 LOOP
        --dbms_output.put_line(to_char(to_date('2010-01-01', 'YYYY-MM-DD') + counter, 'YYYY-MM-DD') || ' - ' || to_char(to_date('2010-01-01', 'YYYY-MM-DD') + 1 + counter, 'YYYY-MM-DD'));
        --LOG:
        update log set DB_UPDATE_TS = TIMESTAMP where TIMESTAMP between to_date('2010-01-01', 'YYYY-MM-DD') + counter and to_date('2010-01-01', 'YYYY-MM-DD') + 1 + counter;
        commit;
        --CHILDLOG:
        update childlog set DB_UPDATE_TS = TIMESTAMP where TIMESTAMP between to_date('2010-01-01', 'YYYY-MM-DD') + counter and to_date('2010-01-01', 'YYYY-MM-DD') + 1 + counter;
        commit;
      END LOOP;
    end;

--Create a trigger that resets the column with every update (this eliminates the need for adjustments to the logging process)
CREATE OR REPLACE
TRIGGER LOG_DB_UPDATE_TS
BEFORE UPDATE ON LOG
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
DECLARE
BEGIN
:NEW.DB_UPDATE_TS := SYSTIMESTAMP AT TIME ZONE 'UTC';
END;
/

--Create Index:
CREATE INDEX IDX_LOG_DBLU ON LOG(DB_UPDATE_TS) COMPUTE STATISTICS;


--Here's the whole thing again for Childlog:
ALTER TABLE CHILDLOG ADD (DB_UPDATE_TS TIMESTAMP DEFAULT SYSTIMESTAMP AT TIME ZONE 'UTC' NOT NULL);

--See the PL/SQL block above, if applicable!
UPDATE CHILDLOG SET DB_UPDATE_TS = TIMESTAMP;
commit;

CREATE OR REPLACE
TRIGGER CHILDLOG_DB_UPDATE_TS
BEFORE UPDATE ON CHILDLOG
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
DECLARE
BEGIN
:NEW.DB_UPDATE_TS := SYSTIMESTAMP AT TIME ZONE 'UTC';
END;
/

CREATE INDEX IDX_CHILDLOG_DBLU ON CHILDLOG(DB_UPDATE_TS) COMPUTE STATISTICS;
--Finished

MSSQL

TIMESTAMP should be drawn in UTC, as otherwise there will be a blind spot between 2 and 3 o’clock during the winter time changeover! The code does not yet take this into account. use SYSUTCDATETIME instead of SYSDATETIME - but check the behavior in the BPC.

The trigger for childlog is missing. Normally, however, this is not required as the entries are only ever added and not updated.

/* Add columns. Use DATETIME2 for higher precision than DATETIME. Since MSSQL does not support millisecond precision, use the SYSDATETIME() function instead of current_timestamp; this function provides nanosecond precision. */
ALTER TABLE [LOG] ADD DB_UPDATE_TS DATETIME2 DEFAULT SYSDATETIME() NOT NULL;
GO

ALTER TABLE [CHILDLOG] ADD DB_UPDATE_TS DATETIME2 DEFAULT SYSDATETIME() NOT NULL;
GO

/* Initialize columns */
BEGIN
    UPDATE [LOG] SET DB_UPDATE_TS = [timestamp];
END
GO

BEGIN
    UPDATE [CHILDLOG] SET DB_UPDATE_TS = [timestamp];
END
GO

/* Trigger */
CREATE TRIGGER LOG_DB_UPDATE_TS
  ON [LOG]
  AFTER UPDATE
  AS
BEGIN
    IF NOT UPDATE(DB_UPDATE_TS)
    BEGIN
        UPDATE t
            SET t.DB_UPDATE_TS = SYSDATETIME()
            FROM [LOG] AS t
            INNER JOIN inserted AS i
            ON t.PROCESSID = i.PROCESSID;
    END
END
GO

/* Indices */
CREATE INDEX IDX_LOG_DBLU ON LOG (DB_UPDATE_TS);
CREATE INDEX IDX_CHILDLOG_DBLU ON CHILDLOG (DB_UPDATE_TS);

MySQL

and for MySQL quite simply:

/* Adds a sufficiently precise timestamp managed by the database to the table */
ALTER TABLE LOG ADD DB_UPDATE_TS TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6)
ALTER TABLE CHILDLOG ADD DB_UPDATE_TS TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6)

PostgreSQL

and for PostgreSQL also via trigger:

ALTER TABLE log ADD db_update_ts timestamp NOT null DEFAULT (timezone('UTC', now()));
ALTER TABLE childlog ADD db_update_ts timestamp NOT null DEFAULT (timezone('UTC', now()));

CREATE OR REPLACE FUNCTION update_db_update_ts_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.db_update_ts = timezone('UTC', now());
    RETURN NEW;
END;
$$ language 'plpgsql';

CREATE TRIGGER update_log_db_update_ts BEFORE INSERT OR UPDATE ON log FOR EACH ROW EXECUTE PROCEDURE update_db_update_ts_column();
CREATE TRIGGER update_childlog_db_update_ts BEFORE INSERT OR UPDATE ON childlog FOR EACH ROW EXECUTE PROCEDURE update_db_update_ts_column();

CREATE INDEX idx_log_dblu ON log (db_update_ts);
CREATE INDEX idx_childlog_dblu ON childlog (db_update_ts);

Configuration

Data sources

Data sources are connections to individual databases. The individual replication jobs then use/reference these. These are set up under Backend Connections of type "data_source" and referenced using the component ID.

Interface

A separate interface is available for setting up the individual replication jobs under Settings → Replication → Components → Editor. Entries can be created, deleted, duplicated and also activated/deactivated individually via the interface.

Configuration parameters of the Replication module

The various parameters and associated functions are described below. These can be found under BPC Administration → Replication → General.

Unresolved include directive in modules/core/pages/admin/modules/replication.adoc - include::partial$generated/replication/module-settings.adoc[]

Configuration parameters of a replication job

The various Parameters and associated functions are described below. These can be found under BPC Administration → Replication → Components. It is recommended to use the specialized interface: BPC Administration → Replication → Editor

Unresolved include directive in modules/core/pages/admin/modules/replication.adoc - include::partial$generated/replication/instance-settings.adoc[]

Basic settings

Setting (Key) Data type Description

Setting (Key)	Data type	Description
Replication_Enabled (`replicationEnabled`)	boolean	Activate/deactivate the replication job
Replication_StartDate (`replicationStartDate`)	String	Replication of data records that are newer than this point in time. Date format "yyyy-MM-dd' 'HH:mm:ss.SSS" or as a relative value in the format: `1 second\|minute\|hour\|day\|week\|month\|year ago n seconds\|minutes\|hours\|days\|weeks\|months\|years ago` Examples `1970-01-01 00:00:00.000 1 week ago 6 months ago 42 days ago`

Replication_Enabled
(replicationEnabled)

boolean

Activate/deactivate the replication job

Replication_StartDate
(replicationStartDate)

String

Replication of data records that are newer than this point in time.
Date format "yyyy-MM-dd' 'HH:mm:ss.SSS"

as a relative value in the format:

1 second|minute|hour|day|week|month|year ago
n seconds|minutes|hours|days|weeks|months|years ago

Examples

1970-01-01 00:00:00.000
1 week ago
6 months ago
42 days ago

Source / Source

Setting (Key) Data type Description

Source_DataSource
(rdmsDataSourceName)

String

Data source to be used (ID of the Backend Connections of type data_source)

Source_Table
(sourceTable)

String

Table name of the source or CTE if optional. Name of the CTE if the optional 'Source_CommonTableExpressionQuery' is used.

Source_Timezone
(sourceTimeZone)

String

Time zone of the date fields used in the source database table (used internally TimeZone.getTimeZone). Is only applied to the actual data and not to the 'lastUpdateColumn' column.

Examples

UTC
GMT+1
GMT-8
America/Los_Angeles
Europe/Berlin
Etc/GMT
CET

Source_IdColumns
(idColumns)

String

Columns for the creation of a unique key in OpenSearch. For example: "PROCESSID,CHILDID"

Source_LastUpdateColumn
(lastUpdateColumn)

String

This column is used to determine the age of the data set

It is strongly recommended to create an index for these columns in the source database.

Source_LastUpdateColumnTimezone
(lastUpdateColumnTimeZone)

String

Time zone, which is used in the source database table for the data in the lastUpdateColumn column.

The setting is only used in conjunction with [adjustUpperDateLimitInSeconds].

Examples

UTC
GMT+1
GMT-8
America/Los_Angeles
Europe/Berlin
Etc/GMT
CET

Source_QueryTimeoutInSeconds
(sourceQueryTimeoutInSeconds)

Integer

Defines how long the JDBC driver waits for a response from the DB. See also JDBC java.sql.Statement.setQueryTimeout()

Source_CommonTableExpressionQuery
(sourceCommonTableExpressionQuery)

String

Can be used instead of database views. May only contain the SELECT statement of a Common Table Expression (CTE). The WITH $sourceTable$ AS ( $sourceCommonTableExpressionQuery$ ) $bpcQuery$; is generated so that it matches the queries made on it by the BPC. If this option is used, then 'Source_Table' (sourceTable) is used as the name of the CTE. Incidentally, this can match the name of an existing database table.

Target

Setting (Key) Data type Description

Setting (Key)	Data type	Description
Target_Index (`targetIndex`)	String	Target index in OpenSearch. If not yet available, the index is created automatically.
Target_CaseSensitivityOfFields (`targetIndexCaseSensitivityOfFields`)	String	Specifies how the fields are to be created in OpenSearch (case-sensitive). `asSource` = are created exactly as they are returned by the database `lowerCase` = column names are converted to lower case `upperCase` = column names are converted to upper case
Target_IndexCreationSettings (`targetIndexCreationSettings`)	JSON	Assign different settings to the target index during creation. If this field is empty, then the Core Services settings → Core_IndexCreationSettings are used as usual. If it is set, then only these are used. The settings of Core_IndexCreationSettings must then be used as a basis by copy&paste. Example with added "Index Sorting" `{ "number_of_shards": "5", "number_of_replicas": "1", "index": { "sort.field": "LASTUPDATE", "sort.order": "desc" }, "analysis": { "normalizer": { "lowercaseNormalizer": { "filter": [ "lowercase" ], "char_filter": [], "type": "custom" } } } }` If "Index Sorting" is to be used, then OpenSearch mappings must also be created for the specified sorting fields (LASTUPDATE in the example) (see Target_IndexMappings).
Target_IndexMappings (`targetIndexMappings`)	JSON	Assign a mapping to the target index when it is created. Should only be necessary in certain cases. The mapping for the "Index Sorting" example from above. `{ "properties": { "LASTUPDATE": { "type": "date" } } }`
Target_IndexDynamicTemplates (`targetIndexDynamicTemplates`)	JSON	Assign a customized mapping to the target index. If this is set, then the global (see Core setting → Core_IndexDynamicTemplates) is not used. In the Elasticsearch documentation (reference to the OpenSearch documentation) there is more information about the possibilities of Dynamic Templates. In the following example, all fields that OpenSearch recognizes as text fields (strings) are provided with a mapping ("all_textfields") where the content is not analyzed (saves storage space and the data can still be displayed). Plus an exception ("special case"): Our standard mapping is used for all text fields that have the name postfix 'name'. Example `[ { "spezialfall": { "match_mapping_type": "string", "match": "name", "mapping": { "type": "text", "fields": { "lowercase": { "normalizer": "lowercaseNormalizer", "type": "keyword" }, "raw": { "type": "keyword" } } } } }, { "alle_textfelder": { "match_mapping_type": "string", "match": "", "mapping": { "type": "keyword", "analyzer": false } } } ]` To specify the OpenSearch type for a database field. It can happen from time to time that OpenSearch makes a mistake with the mapping and uses an inappropriate type. Concrete example: The Oracle column with the name 'NUMBER' of the data type 'NUMBER(10,2)' is created in the OpenSearch mapping as type 'long' instead of 'float'. This can be corrected with the example below. Example `[ { "ZAHL_long_als_float": { "match_mapping_type": "long", "match": "ZAHL", "mapping": { "type": "float" } } } ]`

Target_Index
(targetIndex)

String

Target index in OpenSearch.
If not yet available, the index is created automatically.

Target_CaseSensitivityOfFields
(targetIndexCaseSensitivityOfFields)

String

Specifies how the fields are to be created in OpenSearch (case-sensitive).

asSource = are created exactly as they are returned by the database
lowerCase = column names are converted to lower case
upperCase = column names are converted to upper case

Target_IndexCreationSettings
(targetIndexCreationSettings)

JSON

Assign different settings to the target index during creation.

If this field is empty, then the Core Services settings → Core_IndexCreationSettings are used as usual.
If it is set, then only these are used. The settings of Core_IndexCreationSettings must then be used as a basis by copy&paste.

Example with added "Index Sorting"

{
   "number_of_shards": "5",
   "number_of_replicas": "1",
   "index": {
      "sort.field": "LASTUPDATE",
      "sort.order": "desc"
   },
   "analysis": {
      "normalizer": {
         "lowercaseNormalizer": {
            "filter": [
               "lowercase"
            ],
            "char_filter": [],
            "type": "custom"
         }
      }
   }
}

If "Index Sorting" is to be used, then OpenSearch mappings must also be created for the specified sorting fields (LASTUPDATE in the example) (see Target_IndexMappings).

Target_IndexMappings
(targetIndexMappings)

JSON

Assign a mapping to the target index when it is created. Should only be necessary in certain cases.

The mapping for the "Index Sorting" example from above.

{
   "properties": {
      "LASTUPDATE": {
         "type": "date"
      }
   }
}

Target_IndexDynamicTemplates
(targetIndexDynamicTemplates)

JSON

Assign a customized mapping to the target index. If this is set, then the global (see Core setting → Core_IndexDynamicTemplates) is not used.

In the Elasticsearch documentation (reference to the OpenSearch documentation) there is more information about the possibilities of Dynamic Templates.

In the following example, all fields that OpenSearch recognizes as text fields (strings) are provided with a mapping ("all_textfields") where the content is not analyzed (saves storage space and the data can still be displayed). Plus an exception ("special case"): Our standard mapping is used for all text fields that have the name postfix 'name'.

Example

[
  {
    "spezialfall": {
      "match_mapping_type": "string",
      "match": "*name",
      "mapping": {
        "type": "text",
        "fields": {
          "lowercase": {
            "normalizer": "lowercaseNormalizer",
            "type": "keyword"
          },
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  },
  {
    "alle_textfelder": {
      "match_mapping_type": "string",
      "match": "*",
      "mapping": {
        "type": "keyword",
        "analyzer": false
      }
    }
  }
]

To specify the OpenSearch type for a database field. It can happen from time to time that OpenSearch makes a mistake with the mapping and uses an inappropriate type. Concrete example: The Oracle column with the name 'NUMBER' of the data type 'NUMBER(10,2)' is created in the OpenSearch mapping as type 'long' instead of 'float'. This can be corrected with the example below.

Example

[
  {
    "ZAHL_long_als_float": {
      "match_mapping_type": "long",
      "match": "ZAHL",
      "mapping": {
        "type": "float"
      }
    }
  }
]

Advanced Settings / Advanced

Setting (Key) Data type Description

Replication_RestartWhereLeftOff
(restartReplicationWhereLeftOff)

boolean

When the server is started or a job is changed, the replication jobs are restarted and start replicating again from the beginning (see replicationStartDate). If this setting is set to 'true', replication restarts at the date of the most recent data record that was replicated. These timestamps are stored as metadata in the OpenSearch index.

Replication_Delay
(replicationDelay)

Integer

Delay in seconds after which replication is started

Replication_Interval
(replicationInterval)

Integer

Interval in seconds for replication

Replication_BlockDayRange
(replicationBlockDayRange)

Integer

Number of days (block) that are to be processed for each job run. Do not set this value too high, as this will cause an increased load on the source database and OpenSearch/Lucene will not be able to take a breather.

Example 10 days: The job has just arrived at 10.03.2015, then the data records are replicated from 10.03.2015 to 20.03.2015 and in the following run from 20.03.2015 to 30.03.2015.

Replication_BlockSize
(replicationBlockSize)

Integer

Block size for the transfer from DB to OpenSearch. Number of database records that the JDBC driver reads as a block and holds in memory.

It is currently becoming apparent that a larger block size is better. See also: Replication duration. But please do not overdo it, otherwise OutOfMemory exceptions will occur. A block size of 2500 is already set too high for some database tables.

Replication_SyncFiles
(replicationSyncFiles)

boolean

Synchronization of BLOBs

Please note that subsequent activation is currently not possible.

Replication_UnzipSyncedFiles
(replicationUnzipSyncedFiles)

boolean

Unzips synchronized file contents automatically.

Only used if replicationSyncFiles is active.

Replication_AdjustUpperDateLimitInSeconds
(adjustUpperDateLimitInSeconds)

Integer

Avoids a database problem in which record updates - which are initiated by a database trigger - are written too late and therefore cannot be taken into account during the replication run (we are talking about 1-3 seconds here).

With this option, the records are selected with an update timestamp minus the specified value. In the default setting of 0, this option is deactivated and a timestamp in the future is selected as before (replicationBlockDayRange).

Example 3 seconds: The records are selected from the timestamp of the last replicated record up to the current timestamp of the database minus 3 seconds. Of course, the most up-to-date data is not always available in the BPC, but hopefully it will be in a consistent state with the database.

Replication_VamOrganizationId
(vamOrganizationId)

String

experimental

If not empty, then special features are used for the replication of data management data. Only the data of the entered organizationId (as in data management configuration) is replicated

Replication_LoggingEnabled
(replicationLoggingEnabled)

boolean

The logging of replication runs per replication job can be activated and deactivated here.

Shadow copies

During a shadow copy, all documents (that are after the specified 'replicationStartDate') of the OpenSearch index (see 'targetIndex') are copied to a new index at the specified time. At the end, the alias is changed to the new index and the previous index is deleted. This can be done once a week, for example, to get back to a lean index.

Additional note: The documents marked as deleted in OpenSearch (either deleted manually or by the tail sync) are of course not copied to the new index. Attention: The associated replication run is suspended until the shadow copy has been created.

Setting (Key) Data type Description

Setting (Key)	Data type	Description
ShadowCopy_Enabled (`replicationShadowCopyEnabled`)	boolean	Enable creation of shadow copies (shadow copy).
ShadowCopy_CronPattern (`replicationShadowCopyCronPattern`)	String	Cron-like pattern (according to Quartz-Scheduler syntax) for defining the time when the shadow copies are to be created. Examples `0 15 16 ? * SUN = Jeden Sonntag um 16:15 Uhr 0 /30 * * ? = Alle 30 Minuten 30 59 11 ? * 1,2,3,4,5 = Am Montag, Dienstag, Mittwoch, Donnerstag und Freitag um 11:59:30 Uhr` Further examples and documentation can be found on the Quartz-Scheduler website.
ShadowCopy_KeepCopiesCount (`replicationShadowCopyKeepCopiesCount`)	Integer	Number of shadow copies to be kept. `0` = no shadow copy is kept `3` = 3 shadow copies are kept (these are in 'Close' status to save resources)

ShadowCopy_Enabled
(replicationShadowCopyEnabled)

boolean

Enable creation of shadow copies (shadow copy).

ShadowCopy_CronPattern
(replicationShadowCopyCronPattern)

String

Cron-like pattern (according to Quartz-Scheduler syntax) for defining the time when the shadow copies are to be created.

Examples

0 15 16 ? * SUN = Jeden Sonntag um 16:15 Uhr
0 */30 * * * ? = Alle 30 Minuten
30 59 11 ? * 1,2,3,4,5 = Am Montag, Dienstag, Mittwoch, Donnerstag und Freitag um 11:59:30 Uhr

Further examples and documentation can be found on the Quartz-Scheduler website.

ShadowCopy_KeepCopiesCount
(replicationShadowCopyKeepCopiesCount)

Integer

Number of shadow copies to be kept.

0 = no shadow copy is kept

3 = 3 shadow copies are kept (these are in 'Close' status to save resources)

Tail Sync

The 'Tail Sync' works on the current index (see 'targetIndex') and synchronizes older data (new/changed/deleted database records) with OpenSearch so that they match the database again. This can be carried out once every night, for example. Documents that are older than the replication start date (see 'replicationStartDate') are first deleted from the OpenSearch index. It then goes through the database in blocks (default 10-day steps) and compares them with OpenSearch. New/changed database records are transferred to the OpenSearch index and documents that no longer exist in the database are deleted in OpenSearch. The Tail Sync only works with the specified 'idColumns' and 'lastUpdateColumn' fields (Hint: a suitable DB index works wonders!)). The complete data is only read out for new/changed database records.

As an alternative or in addition to a time-controlled tail sync, this can also be started manually. For this purpose, there is a button under Replication → Jobs for each replication job that starts a tail sync. The settings replicationTailSyncEnabled and replicationTailSyncCronPattern have no effect on the manual start, all other settings are taken into account.

The sync does not run up to the current date (see [replicationTailSyncRelativeEndDate]), as these records will continue to be processed by our normal replication.

Setting (Key) Data type Description

TailSync_Enabled
(replicationTailSyncEnabled)

boolean

Activate the 'Tail Sync'

TailSync_CronPattern
(replicationTailSyncCronPattern)

String

Cron-like pattern (according to Quartz Scheduler syntax) to define the time when the Tail Sync should be started.

Examples

0 5 2 * * ? = Jede Nacht um 2:05 Uhr
0 35 21 ? * Sun = Jeden Sonntag um 21:35 Uhr
30 5 3 ? * 1,2,3,4,5 = Am Montag, Dienstag, Mittwoch, Donnerstag und Freitag um 3:05:30 Uhr

Further examples and documentation can be found on the Quartz-Scheduler website.

TailSync_BlockSize
(replicationTailSyncBlockSize)

Integer

Block size of the database driver. Note the information on the option replicationBlockSize.

TailSync_RelativeStartDate
(replicationTailSyncRelativeStartDate)

String

The relative start date specified. Synchronization should then start from this date. Data that is before the replication start date will continue to be deleted. This date is only used if it is after the regular start date and before the end date.

1 second|minute|hour|day|week|month|year ago
n seconds|minutes|hours|days|weeks|months|years ago

This option should only be used in special cases. Inconsistencies may occur if records are deleted from the database that lie before this relative start date. These then remain in the OpenSearch index although they are no longer present in the database.

TailSync_RelativeEndDate
(replicationTailSyncRelativeEndDate)

String

The relative end date specified. Synchronization should then continue up to this point in time and no further.

1 second|minute|hour|day|week|month|year ago
n seconds|minutes|hours|days|weeks|months|years ago

TailSync_BlockDayRange
(replicationTailSyncBlockDayRange)

Integer

Number of days (block) in which the data should be processed. See also replicationBlockDayRange.

TailSync_RelativeDeleteOlderThanDate
(replicationTailSyncRelativeDeleteOlderThanDate)

String

The relative deletion date to be specified. All documents that are older are deleted.

If this option is not set, all documents that are older than the start date of the replication are deleted (see replicationStartDate).

TailSync_LoggingEnabled
(replicationTailSyncLoggingEnabled)

boolean

The relative deletion date to be specified. Specifies whether the tail sync runs are to be logged in the bpc-tailsync-log index for this replication job.

The global setting replicationJobsTailsyncLogEnabled must also be activated.

Consistency Check

A simple consistency check is carried out after the replication runs. The number of documents in the source and target are compared. This is done in the period 'replicationStartDate' and the last date which is in the target (OpenSearch).

If there is a lot of data (tens of millions of records), this can lead to an increased load on the system and an initial replication can be extremely slowed down. Only activate the consistency check if you are sure that the data will be fully replicated.

Setting (Key) Data type Description

Setting (Key)	Data type	Description
ConsistencyCheck_Frequency (`replicationConsistencyCheckFrequency`)	Integer	The frequency with which the consistency check is performed. Examples: `0` = is never performed `1` = is performed after every replication run `10` = is performed after every 10th replication run

ConsistencyCheck_Frequency
(replicationConsistencyCheckFrequency)

Integer

The frequency with which the consistency check is performed.

Examples:

0 = is never performed
1 = is performed after every replication run
10 = is performed after every 10th replication run

Lookup Joins

Can be used to enrich documents to be replicated with additional data. If, for example only a partner ID is contained in the data to be replicated and the name of the partner etc. is still required in the monitor.

Denormalization is used under OpenSearch, i.e. this data is also transferred to the document and is available like all other data (high-performance search; aggregation, ….). Prerequisite: The lookup "tables" must be available as independent indexes and can, for example, be transferred from a DB table via an additional replication.

The lookup data can be updated manually ( BPC settings → Overview → Status → Replication → Jobs → Job → Synchronize lookup joins ) and automatically using our OpenSearch BPC plugin (os-bpc-plugin).

Several lookup tables can be referenced, the possible values of an ENTRY are described below. Structure: "join": [ { EINTRAG }, { EINTRAG }, … ]

For the value comparison to work, the column types in the database tables must be identical. In the case of string fields, "*.raw" should be used as the lookupKeyField, but without the ".raw"

Lookup Joins Configuration example

[
    {
        "keyField": "PARTNER",
        "lookupIndex": "lookup-partner",
        "lookupKeyField": "ID.raw",
        "resultFieldsPrefix": "partner_",
        "resultFieldsExcluded": [ "ID", "LASTUPDATE" ]
    },
    {
        "keyField": "MESSAGETYPE",
        "lookupIndex": "lookup-messagetype",
        "lookupKeyField": "ID.raw",
        "resultFieldsPrefix": "messagetype_",
        "resultFieldsExcluded": [ "ID", "LASTUPDATE" ]
    }
]

Field	Data type	Description
keyField	String	The key field in the table to be replicated. The data from the lookup index is transferred via the value of the field. Example: PARTNER_ID This contains the ID of the partner and the actual data of the partner that is to be transferred in the specified lookup index. (Hint: DB Foreign Key)
keyFieldValuesSeparator	String	If there are several values in the keyField that are separated by a separator, this separator can be specified here. The lookup join is then performed on the individual values. Example: The 'lookupIndex' contains the following data: ID = 20, LONGNAME = grams ID = 30, LONGNAME = kilograms ID = 42, LONGNAME = centimetres If the keyFieldValuesSeparator is '%%', the resultFieldsPrefix is 'MENGENEINHEIT_' and the transferred keyValue value is '42%%20%%30', the following field is created: QUANTITY_LONGNAME = centimeter%%gram%%kilogram
lookupIndex	String	The OpenSearch index with the lookup data. Example: lookup-partner In this example, the detailed data of the partner.
lookupKeyField	String	Example: ID.raw This field is used to search for the value of the 'keyField' (see above). (Hint: DB Primary Key)
resultFieldsPrefix	String	The fields to be transferred from the lookup index must be provided with a unique prefix so that there is no conflict with existing fields. Example: partner_
resultFieldsIncluded	String Array	If not all fields from the lookup index are to be included, the fields to be included can be specified here. Example: [ "FIRST_NAME", "LAST_NAME" ]
resultFieldsExcluded	String Array	If almost all fields are to be included, the fields to be excluded can be specified here. Example: [ "ID", "UPDATED" ]

Field

Data type

Description

keyField

String

The key field in the table to be replicated. The data from the lookup index is transferred via the value of the field.

Example: PARTNER_ID
This contains the ID of the partner and the actual data of the partner that is to be transferred in the specified lookup index. (Hint: DB Foreign Key)

keyFieldValuesSeparator

String

If there are several values in the keyField that are separated by a separator, this separator can be specified here. The lookup join is then performed on the individual values.

Example:

The 'lookupIndex' contains the following data:

ID = 20, LONGNAME = grams
ID = 30, LONGNAME = kilograms
ID = 42, LONGNAME = centimetres

If the keyFieldValuesSeparator is '%%', the resultFieldsPrefix is 'MENGENEINHEIT_' and the transferred keyValue value is '42%%20%%30', the following field is created:

QUANTITY_LONGNAME = centimeter%%gram%%kilogram

lookupIndex

String

The OpenSearch index with the lookup data.

Example: lookup-partner
In this example, the detailed data of the partner.

lookupKeyField

String

Example: ID.raw
This field is used to search for the value of the 'keyField' (see above). (Hint: DB Primary Key)

resultFieldsPrefix

String

The fields to be transferred from the lookup index must be provided with a unique prefix so that there is no conflict with existing fields.

Example: partner_

resultFieldsIncluded

String Array

If not all fields from the lookup index are to be included, the fields to be included can be specified here.

Example: [ "FIRST_NAME", "LAST_NAME" ]

resultFieldsExcluded

String Array

If almost all fields are to be included, the fields to be excluded can be specified here.

Example: [ "ID", "UPDATED" ]

Logging of replications

The runs of the replication jobs can be logged and displayed, for example, via the automatically created monitor "Replication Jobs Monitor". Similarly, runs of the tail sync can also be logged and viewed in the monitor "Tail Sync Logs Monitor". These monitors are created at startup if they are not available. They cannot be permanently deleted. The indices used bpc-replicationjobs-log and bpc-tailsync-log are also created automatically if a run needs to be logged.

Logging can be switched on and off globally and for each replication job (see corresponding settings above). Logging is activated globally for both replications and tail syncs by default, but deactivated in the individual replication jobs. Therefore, no logging is performed by default.

The log entries for replication runs are written in bulk to the index mentioned above after some time. This has little impact on the overall performance of the system. However, a lot of entries can be generated in a very short time (millions in a few hours), so perhaps do not run the cleanup too often and keep an eye on the number of documents it contains: Settings → Core Services → Indices

Since tail syncs take place much less frequently, the bpc-tailsync-log index should not grow so quickly during tail sync logging, but it is still regularly cleaned of old entries (see configuration above).

Keywords: