Replication
This service replicates data from an RDBMS to OpenSearch.
Impact on system performance: Replication has a direct impact on system performance. In particular, the number of threads (Replication → Settings → Replication_Threads) and the interval (replicationInterval) play a role: the more jobs run in a short time, the higher the CPU load, especially when a lot of data is replicated. A larger replicationBlockSize shortens the replication duration, but it also directly increases the memory requirements of Apache Karaf.
Preparation for data record updates
A few prerequisites should be met before setting up the replications. Replication requires a column with strictly incrementing timestamps in each table (log as well as childlog), as this timestamp is used to find new and updated records. If different processes, servers, etc. write to the DB in parallel and therefore possibly with a time offset (even if only milliseconds), individual records may never appear in the BPC because they are committed with a timestamp older than the latest one already replicated.
It is therefore recommended to:

- log the time of the DB server rather than that of the logging server, as different process servers may have slightly different clocks
- ensure that this column is actually rewritten with every insert and update

Both can be achieved simply via an additional (hidden) column with a default value and an associated trigger. This has the advantage that nothing changes at all for the logging process.
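The pitfall described above can be sketched in a few lines: a poller that remembers the highest timestamp it has seen will silently skip any record that is committed later with an older timestamp. The sketch below is illustrative only (table layout and function names are made up, this is not BPC code):

```python
# Illustrative sketch (not BPC code): watermark-based incremental replication.
# A record committed with a timestamp older than the current watermark is lost,
# which is why a DB-managed, always-increasing DB_UPDATE_TS column is recommended.

def poll_new_records(rows, watermark):
    """Return records newer than the watermark plus the new watermark."""
    fresh = [r for r in rows if r["db_update_ts"] > watermark]
    new_watermark = max([r["db_update_ts"] for r in fresh], default=watermark)
    return fresh, new_watermark

# First poll sees a record with ts=100 -> watermark becomes 100.
rows = [{"id": 1, "db_update_ts": 100}]
fresh, wm = poll_new_records(rows, watermark=0)
assert [r["id"] for r in fresh] == [1] and wm == 100

# A record committed late with an *older* app-side timestamp (95) is now invisible:
rows.append({"id": 2, "db_update_ts": 95})
fresh, wm = poll_new_records(rows, watermark=wm)
assert fresh == []  # record 2 is never replicated
```

A trigger-maintained `DB_UPDATE_TS` set by the DB server itself avoids exactly this gap.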
Adding a technical TIMESTAMP column for replication
Oracle
For the PM log and childlog tables (without prefix) under Oracle, the setup can look like this, for example:
```sql
--LOG:
--Add a column (with 6 decimal places of precision); set the default value for inserts
--to the current time in UTC (this eliminates the need for adjustments in the logging process)
ALTER TABLE LOG ADD (DB_UPDATE_TS TIMESTAMP DEFAULT SYSTIMESTAMP AT TIME ZONE 'UTC' NOT NULL);

--Backfill existing log entries; otherwise replication will not handle them well.
--Either in one block (!not recommended for tables with more than 1 million entries!):
UPDATE LOG SET DB_UPDATE_TS = TIMESTAMP;
COMMIT;

--...or, for large datasets and live systems that must keep operating on the tables,
--as an anonymous PL/SQL block with minimal undo tablespace requirements.
--In this case, an index should also be created on the timestamp column to avoid full table scans!
BEGIN
  FOR counter IN 0 .. 3650 LOOP
    --dbms_output.put_line(to_char(to_date('2010-01-01', 'YYYY-MM-DD') + counter, 'YYYY-MM-DD') || ' - ' || to_char(to_date('2010-01-01', 'YYYY-MM-DD') + 1 + counter, 'YYYY-MM-DD'));
    --LOG:
    UPDATE log SET DB_UPDATE_TS = TIMESTAMP
      WHERE TIMESTAMP BETWEEN to_date('2010-01-01', 'YYYY-MM-DD') + counter
                          AND to_date('2010-01-01', 'YYYY-MM-DD') + 1 + counter;
    COMMIT;
    --CHILDLOG:
    UPDATE childlog SET DB_UPDATE_TS = TIMESTAMP
      WHERE TIMESTAMP BETWEEN to_date('2010-01-01', 'YYYY-MM-DD') + counter
                          AND to_date('2010-01-01', 'YYYY-MM-DD') + 1 + counter;
    COMMIT;
  END LOOP;
END;
/

--Create a trigger that refreshes the column on every update (this eliminates the need for adjustments to the logging process)
CREATE OR REPLACE TRIGGER LOG_DB_UPDATE_TS
BEFORE UPDATE ON LOG
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
BEGIN
  :NEW.DB_UPDATE_TS := SYSTIMESTAMP AT TIME ZONE 'UTC';
END;
/

--Create an index:
CREATE INDEX IDX_LOG_DBLU ON LOG(DB_UPDATE_TS) COMPUTE STATISTICS;

--The same again for CHILDLOG:
ALTER TABLE CHILDLOG ADD (DB_UPDATE_TS TIMESTAMP DEFAULT SYSTIMESTAMP AT TIME ZONE 'UTC' NOT NULL);
--See the PL/SQL block above, if applicable!
UPDATE CHILDLOG SET DB_UPDATE_TS = TIMESTAMP;
COMMIT;

CREATE OR REPLACE TRIGGER CHILDLOG_DB_UPDATE_TS
BEFORE UPDATE ON CHILDLOG
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
BEGIN
  :NEW.DB_UPDATE_TS := SYSTIMESTAMP AT TIME ZONE 'UTC';
END;
/

CREATE INDEX IDX_CHILDLOG_DBLU ON CHILDLOG(DB_UPDATE_TS) COMPUTE STATISTICS;
--Finished
```
MSSQL

Note: The trigger for CHILDLOG is missing below. Normally, however, it is not required, as childlog entries are only ever added and never updated.
```sql
/* Add columns. Use DATETIME2 for higher precision than DATETIME: CURRENT_TIMESTAMP
   only offers roughly millisecond precision, so use SYSDATETIME() instead; it returns
   DATETIME2(7) with 100-nanosecond precision. */
ALTER TABLE [LOG] ADD DB_UPDATE_TS DATETIME2 DEFAULT SYSDATETIME() NOT NULL;
GO
ALTER TABLE [CHILDLOG] ADD DB_UPDATE_TS DATETIME2 DEFAULT SYSDATETIME() NOT NULL;
GO
/* Initialize the columns */
UPDATE [LOG] SET DB_UPDATE_TS = [timestamp];
GO
UPDATE [CHILDLOG] SET DB_UPDATE_TS = [timestamp];
GO
/* Trigger */
CREATE TRIGGER LOG_DB_UPDATE_TS
ON [LOG]
AFTER UPDATE
AS
BEGIN
    IF NOT UPDATE(DB_UPDATE_TS)
    BEGIN
        UPDATE t
        SET t.DB_UPDATE_TS = SYSDATETIME()
        FROM [LOG] AS t
        INNER JOIN inserted AS i
            ON t.PROCESSID = i.PROCESSID;
    END
END
GO
/* Indexes */
CREATE INDEX IDX_LOG_DBLU ON [LOG] (DB_UPDATE_TS);
CREATE INDEX IDX_CHILDLOG_DBLU ON [CHILDLOG] (DB_UPDATE_TS);
```
MySQL

For MySQL it is quite simple:

```sql
/* Add a sufficiently precise timestamp that is maintained by the database itself */
ALTER TABLE LOG ADD DB_UPDATE_TS TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6);
ALTER TABLE CHILDLOG ADD DB_UPDATE_TS TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6);
```
PostgreSQL

For PostgreSQL, likewise via a trigger:

```sql
ALTER TABLE log ADD db_update_ts timestamp NOT NULL DEFAULT (timezone('UTC', now()));
ALTER TABLE childlog ADD db_update_ts timestamp NOT NULL DEFAULT (timezone('UTC', now()));

CREATE OR REPLACE FUNCTION update_db_update_ts_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.db_update_ts = timezone('UTC', now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_log_db_update_ts BEFORE INSERT OR UPDATE ON log FOR EACH ROW EXECUTE PROCEDURE update_db_update_ts_column();
CREATE TRIGGER update_childlog_db_update_ts BEFORE INSERT OR UPDATE ON childlog FOR EACH ROW EXECUTE PROCEDURE update_db_update_ts_column();

CREATE INDEX idx_log_dblu ON log (db_update_ts);
CREATE INDEX idx_childlog_dblu ON childlog (db_update_ts);
```
Configuration
Data sources
Data sources are connections to individual databases. The individual replication jobs then use/reference these. These are set up under Backend Connections of type "data_source" and referenced using the component ID.
Interface
A separate interface is available for setting up the individual replication jobs under Settings → Replication → Components → Editor. Entries can be created, deleted, duplicated and also activated/deactivated individually via the interface.
Configuration parameters of the Replication module
The various parameters and associated functions are described below. These can be found under BPC Administration → Replication → General.
Module
General module settings
| Name (ID) | Description |
|---|---|
| Icon | Individually selectable icon that is displayed before the title. |
Source
Settings for the source system
| Name (ID) | Description |
|---|---|
| Database Connection | Database connection for accessing the data. This must first be created as a Backend Connection of type Data Source. |
| SELECT (CTE) | If this value is set, this query is used as the source instead of the table name. It may only contain the SELECT statement of a Common Table Expression (CTE). The statement `WITH $sourceTable$ AS ( $sourceCommonTableExpressionQuery$ ) $bpcQuery$;` is generated so that it matches the queries the BPC places on top of it. If this option is used, 'Source_Table' (sourceTable) is used as the name of the CTE; incidentally, this may match the name of an existing database table. |
| Table | Name of the table or view in the source database. If a SELECT (CTE) is specified, this is used as the name of the CTE. |
| Time Zone | If the date columns do not contain time zone information, they are interpreted in the specified time zone. |
| Primary Key | Columns used to build a unique key in OpenSearch. An incorrect configuration leads to data records being overwritten. |
| Last Update | The column must contain the time of the last change to the data record. |
| Last Update Time Zone | If the last-update column does not contain time zone information, it is interpreted in the specified time zone. |
| Timeout | Defines how long the JDBC driver waits for a response from the DB. Specified in seconds. |
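The CTE wrapping described for SELECT (CTE) can be sketched as plain string assembly; the `$…$` placeholders come from the description above, and the sample queries are made up for illustration:

```python
# Illustrative sketch (not BPC code): how the configured CTE is wrapped so that
# the BPC's own queries can treat it like a table named Source_Table.

def build_query(source_table, cte_query, bpc_query):
    """Assemble: WITH $sourceTable$ AS ( $sourceCommonTableExpressionQuery$ ) $bpcQuery$;"""
    return f"WITH {source_table} AS ( {cte_query} ) {bpc_query};"

sql = build_query(
    "LOGVIEW",                                    # Source_Table = name of the CTE
    "SELECT * FROM LOG WHERE STATUS <> 'DEBUG'",  # SELECT (CTE), hypothetical
    "SELECT * FROM LOGVIEW WHERE DB_UPDATE_TS > :lastRun",  # a BPC-side query
)
assert sql.startswith("WITH LOGVIEW AS (")
assert sql.endswith(";")
```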
Target
Settings for the target system
| Name (ID) | Description |
|---|---|
| Index | Name of the index in which the data is to be stored. It is created automatically if required. |
| Index Creation Settings | Settings that differ from the default. The value is set as the "settings" value when the OpenSearch index is created. It is also used during reindexing, as this creates a new index. |
| Field Mappings | Optional settings for the individual fields (also known as mapping). This can be used, for example, to define the data type of a field. |
| Dynamic Field Templates | Settings that differ from the standard. The value is set as "dynamic_templates" within the field mappings ("mappings") when the OpenSearch index is created. It is also used during reindexing, as this creates a new index. |
| Adapt Field Names | The capitalization of the fields created in OpenSearch can be changed here. The field names are formed from the column names of the database. |
Advanced Settings
Advanced settings
| Name (ID) | Description |
|---|---|
| Enable | Activates the execution of this replication job. |
| Start Delay | Delay of this replication (in seconds) after the initial start of replication execution. Can be used to speed up the start of the BPC or to initially prioritize other replication jobs. |
| Interval | Interval in seconds at which this replication job is executed. This interval is not guaranteed if not enough threads are available and too many replication jobs are running (possibly for too long). |
| Start Date | Only data newer than this point in time is replicated. |
| Day Range per Run | If data from the past is replicated, this controls the number of days processed per job run. A high value can put heavy load on the source database and OpenSearch and may also block other replication jobs. |
| Maximum Records | Maximum number of data records loaded at once from the source database and written to OpenSearch. This value directly influences memory consumption in Karaf, as enough memory is reserved to hold the maximum number of records. A large value does, however, have a positive effect on replication speed. |
| Replicate Binary Files | Replicates columns of type BLOB. |
| Decompress Binary Files | If binary data is replicated and this option is activated, the system checks whether the data was compressed with GZip and decompresses it before saving. |
| Restart Replication Where Left Off | If this option is active, a restarted or reactivated replication job continues at the point in the data where it last stopped. Otherwise, replication begins at the configured start time. |
| Adjust Date Limit | Adjustment of the upper time limit (in seconds). Influences the upper date limit when selecting data by adding this value to the current time. Changing the value can lead to records being replicated unnecessarily several times, or to changes only being replicated with a certain delay. |
| Data Management Organization ID | If set, special handling for the replication of data management data is used. Specifically, only the data of the entered organizationId (as in the data management configuration) is replicated. |
| Enable | Activates logging for this replication job. |
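How 'Start Date', 'Day Range per Run' and 'Adjust Date Limit' could interact can be sketched as follows. The window arithmetic is an illustration of the described behavior, not the exact BPC implementation:

```python
# Illustrative sketch (not BPC code): deriving one run's selection window from
# the replication progress, the day range per run, and the date-limit adjustment.
from datetime import datetime, timedelta

def selection_window(last_replicated, now, day_range, adjust_seconds):
    """Lower bound = progress so far; upper bound = capped both by the
    day range and by the adjusted current time."""
    upper_cap = now + timedelta(seconds=adjust_seconds)
    upper = min(last_replicated + timedelta(days=day_range), upper_cap)
    return last_replicated, upper

now = datetime(2024, 1, 20, 12, 0)
# Far behind (historical backfill): the day range limits the chunk.
lo, hi = selection_window(datetime(2024, 1, 1), now, day_range=10, adjust_seconds=5)
assert hi == datetime(2024, 1, 11)
# Caught up: the adjusted "now" limits the chunk instead.
lo, hi = selection_window(datetime(2024, 1, 20, 11, 59), now, day_range=10, adjust_seconds=5)
assert hi == now + timedelta(seconds=5)
```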
Shadow Copy
A shadow copy copies all documents (created after the defined 'replicationStartDate') of the OpenSearch index (see 'targetIndex') into a new index at the specified time. In the end, the alias is switched to the new index and the previous index is deleted.
| Name (ID) | Description |
|---|---|
| Enable | Activate creation of shadow copies. |
| Cron Pattern | A Cron-like pattern (using Quartz Scheduler syntax) that defines when the shadow copies are to be created. (See <a target="_blank" href="https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html">Quartz CronTrigger Documentation</a>) |
| Number of Copies | Number of shadow copies to be kept. |
Tail Sync
The 'Tail Sync' works on the current index (see 'targetIndex') and synchronizes older data (new/changed/deleted database records) with OpenSearch so that they match the database again.
| Name (ID) | Description |
|---|---|
| Enable | Activates the Tail Sync function. This deletes old data records and subsequently synchronizes data records between source and target, which is necessary if records are deleted in the source. New or changed records should already be picked up by regular replication if it is configured correctly. Even if Tail Sync is not enabled here, it can still be started manually; the other Tail Sync settings are then used. |
| Cron Pattern | A Cron-like pattern (using Quartz Scheduler syntax) that defines when the Tail Sync should be started. (See <a target="_blank" href="https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html">Quartz CronTrigger Documentation</a>) |
| Maximum Records | Maximum number of data records loaded at once from the source database and written to OpenSearch. This value directly influences memory consumption in Karaf, as enough memory is reserved to hold the maximum number of records. A large value does, however, have a positive effect on replication speed. |
| Start Date | The relative start date. Synchronization starts from this date. Data before the replication start date will continue to be deleted. This date is only used if it lies after the regular start date and before the end date. |
| End Date | The relative end date. Synchronization takes place up to this point in time. The end should not overlap with the regular replication (interval + buffer). |
| Day Range per Run | Controls the number of days processed per run. A high value can put heavy load on the source database and OpenSearch. |
| Delete Old Records | The relative deletion date. All documents that are older are deleted. If this option is not set, no deletion is performed. |
| Enable Tail Sync Logging | If enabled, Tail Sync runs are logged in the `bpc-tailsync-log` index. |
Consistency Check
After replication runs, a simple consistency check is performed. The number of documents in the source and target are compared.
| Name (ID) | Description |
|---|---|
| Consistency Check Frequency | The frequency with which the consistency check is performed. The number of data records in the source database is compared with the number in OpenSearch. The check can severely affect replication performance and should be switched off while historical data is being replicated. |
Lookup Joins
These can be used to enrich replicated documents with additional data. For example, if the data to be replicated only contains a partner ID, the partner's name and other info can be added for monitoring.
| Name (ID) | Description |
|---|---|
| Lookup Joins | Can be used to automatically enrich logging documents with additional data, e.g. if the data to be logged only contains an ID and further associated data can be loaded from an existing index. |
Configuration parameters of a replication job
The various parameters and associated functions are described below. These can be found under BPC Administration → Replication → Components. It is recommended to use the specialized interface under BPC Administration → Replication → Editor.
Basic settings
| Setting (Key) | Data type | Description |
|---|---|---|
| Replication_Enabled | boolean | Activate/deactivate the replication job. |
| Replication_StartDate | String | Replication of data records that are newer than this point in time; can also be given as a relative value. |
Source
| Setting (Key) | Data type | Description |
|---|---|---|
| Source_DataSource | String | Data source to be used (ID of the Backend Connection of type "data_source"). |
| Source_Table | String | Table name of the source, or name of the CTE if the optional 'Source_CommonTableExpressionQuery' is used. |
| Source_Timezone | String | Time zone of the date fields in the source database table (internally, TimeZone.getTimeZone is used). Only applied to the actual data, not to the 'lastUpdateColumn' column. |
| Source_IdColumns | String | Columns used to build a unique key in OpenSearch. |
| Source_LastUpdateColumn | String | This column is used to determine the age of a data record. |
| Source_LastUpdateColumnTimezone | String | Time zone used in the source database table for the values of the last-update column. |
| Source_QueryTimeoutInSeconds | Integer | Defines how long the JDBC driver waits for a response from the DB. See also java.sql.Statement.setQueryTimeout(). |
| Source_CommonTableExpressionQuery | String | Can be used instead of database views. May only contain the SELECT statement of a Common Table Expression (CTE). The statement `WITH $sourceTable$ AS ( $sourceCommonTableExpressionQuery$ ) $bpcQuery$;` is generated so that it matches the queries the BPC places on top of it. |
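The effect of Source_Timezone can be illustrated with a small interpretation step: a zone-less DB timestamp is given the configured zone and then normalized (here to UTC). The sketch below is illustrative only; column values and zone names are made up:

```python
# Illustrative sketch (not BPC code): interpreting a naive DB timestamp in the
# configured source time zone, then normalizing it to UTC.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def interpret(naive_ts: datetime, source_tz: str) -> datetime:
    """Attach the configured zone to a zone-less DB value and convert to UTC."""
    return naive_ts.replace(tzinfo=ZoneInfo(source_tz)).astimezone(timezone.utc)

# 12:00 stored without zone info, configured as Europe/Berlin (CET, UTC+1 in winter):
utc = interpret(datetime(2024, 1, 15, 12, 0), "Europe/Berlin")
assert utc == datetime(2024, 1, 15, 11, 0, tzinfo=timezone.utc)
```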
Target
| Setting (Key) | Data type | Description |
|---|---|---|
| Target_Index | String | Target index in OpenSearch. |
| Target_CaseSensitivityOfFields | String | Specifies how the field names are to be created in OpenSearch (case sensitivity). |
| Target_IndexCreationSettings | JSON | Assign settings that differ from the default to the target index during creation, for example "Index Sorting". If "Index Sorting" is to be used, OpenSearch mappings must also be created for the specified sorting fields (see Target_IndexMappings). |
| Target_IndexMappings | JSON | Assign a mapping to the target index when it is created. Should only be necessary in certain cases, for example for the sorting fields of an "Index Sorting" configuration. |
| Target_IndexDynamicTemplates | JSON | Assign customized dynamic mappings to the target index. If this is set, the global setting (see Core settings → Core_IndexDynamicTemplates) is not used. The OpenSearch/Elasticsearch documentation has more information about the possibilities of Dynamic Templates. Example uses: give all fields that OpenSearch recognizes as text fields (strings) a mapping ("all_textfields") whose content is not analyzed (saves storage space while the data can still be displayed), with an exception ("special case") that keeps the standard mapping for all text fields whose names end in 'name'; or specify the OpenSearch type for a database field explicitly. The latter is useful because OpenSearch occasionally guesses an inappropriate type; a concrete example: an Oracle column named 'NUMBER' of data type 'NUMBER(10,2)' is created in the OpenSearch mapping as type 'long' instead of 'float'. |
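The original example for Target_IndexDynamicTemplates did not survive, but its general shape can be approximated. The following JSON is a sketch only (template names and mappings are illustrative, and the exact value shape expected by the setting should be verified): the first entry keeps an analyzed text mapping for fields ending in "name", the second maps all remaining string fields as non-analyzed keyword. Order matters, because the first matching template wins.

```json
[
  { "special_case": {
      "match": "*name",
      "match_mapping_type": "string",
      "mapping": { "type": "text", "fields": { "raw": { "type": "keyword" } } }
  } },
  { "all_textfields": {
      "match_mapping_type": "string",
      "mapping": { "type": "keyword" }
  } }
]
```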
Advanced Settings
Shadow copies
During a shadow copy, all documents of the OpenSearch index (see 'targetIndex') created after the specified 'replicationStartDate' are copied to a new index at the configured time. At the end, the alias is switched to the new index and the previous index is deleted. This can be done once a week, for example, to get back to a lean index.
Additional note: documents marked as deleted in OpenSearch (whether deleted manually or by the Tail Sync) are of course not copied to the new index. Attention: the associated replication run is suspended until the shadow copy has been created.
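The copy-switch-delete cycle just described can be sketched with a toy in-memory "index store". All names are illustrative; the real mechanism works on OpenSearch indexes and aliases:

```python
# Illustrative sketch (not BPC code) of the shadow-copy cycle:
# copy live documents to a fresh index, switch the alias, drop the old index.
import itertools

counter = itertools.count(1)

def shadow_copy(store, alias):
    old_index = store["aliases"][alias]
    new_index = f"{alias}-{next(counter):06d}"
    # Copy everything that is not marked as deleted:
    store["indexes"][new_index] = {
        doc_id: doc for doc_id, doc in store["indexes"][old_index].items()
        if not doc.get("deleted")
    }
    store["aliases"][alias] = new_index      # switch the alias
    del store["indexes"][old_index]          # drop the bloated old index
    return new_index

store = {
    "aliases": {"bpc-log": "bpc-log-old"},
    "indexes": {"bpc-log-old": {"a": {"v": 1}, "b": {"v": 2, "deleted": True}}},
}
new = shadow_copy(store, "bpc-log")
assert store["aliases"]["bpc-log"] == new        # alias now points at the copy
assert "bpc-log-old" not in store["indexes"]     # old index dropped
assert list(store["indexes"][new]) == ["a"]      # deleted doc not carried over
```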
| Setting (Key) | Data type | Description |
|---|---|---|
| ShadowCopy_Enabled | boolean | Enable the creation of shadow copies. |
| ShadowCopy_CronPattern | String | Cron-like pattern (using Quartz Scheduler syntax) defining when the shadow copies are to be created. Further examples and documentation can be found on the Quartz Scheduler website. |
| ShadowCopy_KeepCopiesCount | Integer | Number of shadow copies to be kept. |
Tail Sync
The 'Tail Sync' works on the current index (see 'targetIndex') and synchronizes older data (new/changed/deleted database records) with OpenSearch so that index and database match again. This can be carried out once every night, for example. Documents older than the replication start date (see 'replicationStartDate') are first deleted from the OpenSearch index. The sync then walks through the database in blocks (by default 10-day steps) and compares them with OpenSearch: new or changed database records are transferred to the index, and documents that no longer exist in the database are deleted from it. For the comparison, the Tail Sync only reads the configured 'idColumns' and 'lastUpdateColumn' fields (hint: a suitable DB index works wonders!); the complete data is only read for new or changed records.
As an alternative or in addition to the time-controlled Tail Sync, it can also be started manually. For this purpose there is a button under Replication → Jobs for each replication job. The settings replicationTailSyncEnabled and replicationTailSyncCronPattern have no effect on a manual start; all other settings are taken into account.
Note: The sync does not run up to the current date (see 'TailSync_RelativeEndDate').
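The block-wise comparison can be sketched as a set reconciliation over (id, last_update) pairs. Everything below is an illustration of the described behavior, not the BPC implementation:

```python
# Illustrative sketch (not BPC code): reconcile one block of the tail sync.
# Only (id, last_update) pairs are compared; full rows would be fetched
# afterwards for the ids in `to_upsert`.

def reconcile_block(db_rows, index_docs):
    """db_rows / index_docs: dicts mapping id -> last_update timestamp."""
    to_delete = [i for i in index_docs if i not in db_rows]       # gone from DB
    to_upsert = [i for i, ts in db_rows.items()
                 if index_docs.get(i) != ts]                      # new or changed
    return to_upsert, to_delete

db = {"a": 100, "b": 205, "d": 300}    # "b" was updated, "d" is new
idx = {"a": 100, "b": 200, "c": 150}   # "c" was deleted in the DB

upsert, delete = reconcile_block(db, idx)
assert sorted(upsert) == ["b", "d"]
assert delete == ["c"]
```

This also shows why a DB index on the id and last-update columns pays off: only those two columns are scanned per block.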
| Setting (Key) | Data type | Description |
|---|---|---|
| TailSync_Enabled | boolean | Activate the 'Tail Sync'. |
| TailSync_CronPattern | String | Cron-like pattern (using Quartz Scheduler syntax) defining when the Tail Sync should be started. Further examples and documentation can be found on the Quartz Scheduler website. |
| TailSync_BlockSize | Integer | Block size of the database driver. Note the remarks on the 'replicationBlockSize' option. |
| TailSync_RelativeStartDate | String | The relative start date. Synchronization starts from this date. Data before the replication start date will continue to be deleted. This date is only used if it lies after the regular start date and before the end date. |
| TailSync_RelativeEndDate | String | The relative end date. Synchronization continues up to this point in time and no further. |
| TailSync_BlockDayRange | Integer | Number of days processed per block. |
| TailSync_RelativeDeleteOlderThanDate | String | The relative deletion date. All documents that are older are deleted. If this option is not set, all documents older than the start date of the replication are deleted (see 'Replication_StartDate'). |
| TailSync_LoggingEnabled | boolean | Specifies whether Tail Sync runs are logged in the `bpc-tailsync-log` index. |
Consistency Check
A simple consistency check is carried out after replication runs: the number of documents in the source and target are compared, over the period between 'replicationStartDate' and the last date present in the target (OpenSearch).
Note: With a lot of data (tens of millions of records), this can put increased load on the system and can slow an initial replication down dramatically. Only activate the consistency check once you are sure the data has been fully replicated.
| Setting (Key) | Data type | Description |
|---|---|---|
| ConsistencyCheck_Frequency | Integer | The frequency with which the consistency check is performed. |
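The check itself boils down to comparing two document counts over the same time window. A minimal sketch of the idea (the function name and in-memory timestamp lists are illustrative stand-ins for the RDBMS and OpenSearch counts):

```python
def consistency_ok(source_timestamps, target_timestamps, start, end):
    """Compare the number of documents in source and target whose
    timestamps fall into the inclusive window [start, end]."""
    count = lambda seq: sum(1 for ts in seq if start <= ts <= end)
    return count(source_timestamps) == count(target_timestamps)

# Hypothetical timestamps (plain day numbers for brevity)
source = [1, 2, 3, 4, 99]   # 99 lies outside the window
target = [1, 2, 3, 4]
ok = consistency_ok(source, target, start=1, end=10)
# ok is True: both sides hold four documents inside the window
```

Note that in the real service only aggregate counts are fetched from each side, not the documents themselves; the load mentioned above comes from counting very large tables.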
Lookup Joins
Lookup joins can be used to enrich the documents to be replicated with additional data, for example when the replicated data only contains a partner ID but the monitor also needs the partner's name.

Under OpenSearch, denormalization is used: the lookup data is copied into each document and is then available like all other data (high-performance search, aggregations, …). Prerequisite: the lookup "tables" must exist as independent indices; they can, for example, be transferred from a DB table via an additional replication.

The lookup data can be updated manually (BPC settings → Overview → Status → Replication → Jobs → Job → Synchronize lookup joins) and automatically using our OpenSearch BPC plugin (os-bpc-plugin).

Several lookup tables can be referenced; the possible values of an ENTRY are described below. Structure: `"join": [ { ENTRY }, { ENTRY }, … ]`

> **Note:** For the value comparison to work, the column types in the database tables must be identical. For string fields, the "*.raw" subfield should be used as the lookupKeyField; the keyField, however, is specified without ".raw".
```json
[
  {
    "keyField": "PARTNER",
    "lookupIndex": "lookup-partner",
    "lookupKeyField": "ID.raw",
    "resultFieldsPrefix": "partner_",
    "resultFieldsExcluded": [ "ID", "LASTUPDATE" ]
  },
  {
    "keyField": "MESSAGETYPE",
    "lookupIndex": "lookup-messagetype",
    "lookupKeyField": "ID.raw",
    "resultFieldsPrefix": "messagetype_",
    "resultFieldsExcluded": [ "ID", "LASTUPDATE" ]
  }
]
```
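The effect of the prefix and exclusion options can be illustrated with a small sketch. Plain Python dictionaries stand in for the OpenSearch lookup index, and the helper name is hypothetical, not BPC's actual code:

```python
def apply_lookup_join(doc, join, lookup_data):
    """Enrich a document with fields from a lookup 'index' (a dict keyed
    by the lookup key value), mimicking resultFieldsPrefix and
    resultFieldsExcluded from the join configuration."""
    hit = lookup_data.get(doc.get(join["keyField"]))
    if hit is None:
        return doc  # no matching lookup entry: document stays unchanged
    excluded = set(join.get("resultFieldsExcluded", []))
    for field, value in hit.items():
        if field not in excluded:
            doc[join["resultFieldsPrefix"] + field] = value
    return doc

# Hypothetical join configuration and lookup data
join = {"keyField": "PARTNER", "lookupIndex": "lookup-partner",
        "lookupKeyField": "ID.raw", "resultFieldsPrefix": "partner_",
        "resultFieldsExcluded": ["ID", "LASTUPDATE"]}
lookup = {"4711": {"ID": "4711", "NAME": "ACME Corp", "LASTUPDATE": "2024-01-01"}}
doc = apply_lookup_join({"PARTNER": "4711"}, join, lookup)
# doc now contains partner_NAME, but neither partner_ID nor partner_LASTUPDATE
```

This is exactly the denormalization described above: the looked-up fields become ordinary fields of the document and can be searched and aggregated like any other.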
| Field | Data type | Description |
|---|---|---|
| keyField | String | The key field in the table to be replicated. The value of this field is used to fetch the data from the lookup index. Example: `PARTNER_ID` |
| keyFieldValuesSeparator | String | If the keyField contains several values separated by a separator, that separator can be specified here; the lookup join is then performed on each individual value. Example: the lookup index contains the entries ID=42/LONGNAME=centimeter, ID=20/LONGNAME=gram and ID=30/LONGNAME=kilogram. With keyFieldValuesSeparator '%%', resultFieldsPrefix 'MENGENEINHEIT_' and the transferred key value '42%%20%%30', the field MENGENEINHEIT_LONGNAME = centimeter%%gram%%kilogram is created. |
| lookupIndex | String | The OpenSearch index with the lookup data. Example: `lookup-partner` |
| lookupKeyField | String | The key field in the lookup index against which the keyField value is matched. Example: `ID.raw` |
| resultFieldsPrefix | String | The fields transferred from the lookup index are given a unique prefix so that they do not conflict with existing fields. Example: `partner_` |
| resultFieldsIncluded | String Array | If not all fields from the lookup index are to be transferred, the fields to include can be listed here. Example: `[ "FIRST_NAME", "LAST_NAME" ]` |
| resultFieldsExcluded | String Array | If almost all fields are to be transferred, the fields to exclude can be listed here. Example: `[ "ID", "UPDATED" ]` |
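The keyFieldValuesSeparator behaviour can be sketched as follows (a simplified stand-in for the service's logic; the function name and unit data are illustrative):

```python
def join_multi_values(key_value, separator, lookup_field, lookup_data):
    """Split the key value on the separator, look up each part and
    rejoin the looked-up field values with the same separator."""
    parts = key_value.split(separator)
    return separator.join(lookup_data[p][lookup_field] for p in parts)

# Hypothetical unit lookup: ID -> LONGNAME
units = {"42": {"LONGNAME": "centimeter"},
         "20": {"LONGNAME": "gram"},
         "30": {"LONGNAME": "kilogram"}}
result = join_multi_values("42%%20%%30", "%%", "LONGNAME", units)
# result == "centimeter%%gram%%kilogram"
```

The looked-up values are concatenated with the same separator, which reproduces the MENGENEINHEIT_LONGNAME example from the table above.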
Logging of replications
The runs of the replication jobs can be logged and displayed, for example, via the automatically created monitor "Replication Jobs Monitor".
Similarly, runs of the tail sync can also be logged and viewed in the monitor "Tail Sync Logs Monitor".
These monitors are created at startup if they do not exist and cannot be permanently deleted. The indices used, bpc-replicationjobs-log and bpc-tailsync-log, are likewise created automatically as soon as a run needs to be logged.
Logging can be switched on and off globally and per replication job (see the corresponding settings above). By default, logging is activated globally for both replications and tail syncs but deactivated in the individual replication jobs, so no logging is performed out of the box.
The log entries for replication runs are collected and written in bulk to the index mentioned above after a short delay; this has little impact on overall system performance. However, a large number of entries can be generated in a very short time (millions within a few hours), so do not run the cleanup too rarely and keep an eye on the number of documents the index contains: Settings → Core Services → Indices
Since tail syncs run far less frequently, the bpc-tailsync-log index should not grow as quickly when tail sync logging is enabled; it is nevertheless regularly cleaned of old entries (see configuration above).
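The bulk writing of log entries mentioned above can be pictured as a simple buffer that flushes batches instead of writing every entry individually (an illustrative sketch, not BPC's implementation):

```python
class BulkLogBuffer:
    """Collect log entries and hand them to a sink in one batch once
    the buffer is full, instead of writing each entry individually."""
    def __init__(self, flush_size, sink):
        self.flush_size = flush_size
        self.sink = sink          # callable receiving a list of entries
        self.buffer = []

    def log(self, entry):
        self.buffer.append(entry)
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()

batches = []
buf = BulkLogBuffer(flush_size=3, sink=batches.append)
for i in range(7):
    buf.log({"run": i})
buf.flush()  # flush the remainder explicitly
# batches now holds bulk writes of 3, 3 and 1 entries
```

Batching like this is why even millions of log entries cause little per-entry overhead, while the index itself can still grow quickly.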