ADx Architecture
ADx can be configured in many different ways. Each of the basic setups mentioned below covers different requirements.
This document provides setup and configuration examples - the actual configuration depends on specific requirements.
The ADx deployment is responsible for providing the UI and API related to documents, including operations such as create content, get content, search contents, and more. For details, check ADx REST API documentation.
The TF Conversion deployment is responsible for providing services related to converting documents, such as merge pdfs, split pdf, office to pdf, and more. For details, check Available Conversion Jobs.
On this page
- Single Node Architecture Diagram - Setup 1
- Separate Single Node Architecture Diagram - Setup 2
- Separate Clustered Nodes Architecture Diagram - Setup 3
- Communication Flow - Sequence Diagrams
- Node Requirements
- Backup
- Operating system settings
- Load Balancer
- Firewall settings
- Java requirements
- Runtime
- Runtime - ADx
- Runtime - TF Conversion
- Mount Points
- Databases
Single Node Architecture Diagram - Setup 1
Setup 1 is intended to be used for POC, demo and development (against ADx API) purposes. All instances are on a single node. All data is stored in a single database and the file system is a single mount point. There is no focus on performance, scalability, fail-safety or backup strategies. The setup provides all the APIs available from ADx and TF conversion.
Separate Single Node Architecture Diagram - Setup 2
Setup 2 is intended to be used for development and testing purposes. ADx and Conversion are separated on single nodes. The communication (internal as well as external) is done via optional load balancers. The databases are separated (e.g. to use TF conversion for multiple ADx deployments, or to give developer teams the flexibility to clean up their instances if necessary), as is the file system. Since it is not intended for production, there is no focus on scalability, fail-safety or backup strategies. Separating ADx and TF conversion keeps the CPU-intensive conversion workload away from the ADx repositories, so the nodes can be sized appropriately. The setup provides all the APIs available from ADx and TF conversion.
Separate Clustered Nodes Architecture Diagram - Setup 3
Setup 3 is intended to be used for production purposes as well as for the final test stage. ADx and Conversion are separated on cluster nodes. The communication (internal as well as external) is done via load balancers. Each system, all databases and the file systems are designed for maximum separation, enabling optimal performance, scalability, fail-safety, backup strategies and clear responsibilities.
Communication Flow - Sequence Diagrams
Create Content
Create Repository
Node Requirements
The node requirements are per node; in case of cluster usage the requirements multiply accordingly. The nodes themselves can be either VMs or bare metal. Take into consideration that the underlying operating system also needs a certain amount of memory.
Node requirements - ADx
Setup | CPU | Memory | Network | Operating System |
---|---|---|---|---|
Setup 1 * | 4 CPU | 12 GB | GBit or better | Linux - Kernel 3.10 or newer |
Setup 2 | 4 CPU | 12 GB | GBit or better | Linux - Kernel 3.10 or newer |
Setup 3 | 8 CPU | 32 GB | GBit or better | Linux - Kernel 3.10 or newer |
*shared on one node
Node requirements - TF conversion
Setup | CPU | Memory | Network | Operating System |
---|---|---|---|---|
Setup 1 * | - | - | - | - |
Setup 2 | 4 CPU | 12 GB | GBit or better | Linux - Kernel 3.10 or newer |
Setup 3 | 8 CPU | 32 GB | GBit or better | Linux - Kernel 3.10 or newer |
*shared on one node
Backup
The backup recommendations are grouped into 5 categories. These recommendations apply to database data as well as to file system data.
Category | Description |
---|---|
Priority 1 | This category holds the most critical data. This includes business data (e.g. documents (content) stored in repositories) as well as the configuration of ADx/TF conversion itself (e.g. user information or configuration of the repository). Losing this kind of data means data loss! It is recommended to back up this category regularly (e.g. daily) to allow disaster recovery. |
Priority 2 | This category holds data that can be restored from the Priority 1 information, but restoring it is time consuming. This includes e.g. representations of documents (content). It is recommended to back this up regularly at longer intervals (e.g. every 2 days / weekly). |
Priority 3 | This category holds data that is not relevant for business users but is relevant from an admin perspective. This includes e.g. log files. It is recommended to back this up regularly at longer intervals (e.g. every 2 days / weekly). |
Priority 4 | This category holds data which can be recovered by redeployment or is transient data. There is no need for backups. |
Priority external | ADx can access external repositories (e.g. Documentum) to have a normalized view on documents. The external repository needs to be backed up independently of ADx. |
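As an illustration only, the categories could map to a backup schedule like the following sketch; the database names, mount paths and tooling (PostgreSQL, rsync) are assumptions and must be adapted to the actual deployment.

```bash
#!/usr/bin/env bash
# Illustrative backup sketch - database names, paths and schedules are
# assumptions, not product defaults.
set -euo pipefail

BACKUP_DIR=/backup/$(date +%F)
mkdir -p "$BACKUP_DIR"

# Priority 1 (e.g. daily): system and content databases plus content file mounts
pg_dump --format=custom --file="$BACKUP_DIR/adx-system.dump"  adx_system
pg_dump --format=custom --file="$BACKUP_DIR/adx-content.dump" adx_content
rsync -a /mnt/adx/content/ "$BACKUP_DIR/content/"

# Priority 2 (e.g. every 2 days / weekly): cache/representation database
pg_dump --format=custom --file="$BACKUP_DIR/adx-cache.dump" adx_cache

# Priority 3 (e.g. weekly): log files
tar czf "$BACKUP_DIR/logs.tar.gz" /var/log/adx

# Priority 4: no backup - recovered by redeployment
```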
Operating system settings
- Number of concurrently open file descriptors:
Deployment | Value |
---|---|
ADx | 500000 |
TF conversion | 100000 |
Messaging | 10000 |
- Random generator: Since servers typically have low entropy, a service is necessary to enrich it (a minimal setup sketch follows after these tables):
Deployment | Link | Version |
---|---|---|
ADx | haveged | 1.9.2+ |
TF conversion | haveged | 1.9.2+ |
Messaging | haveged | 1.9.2+ |
- Command line tools:
Deployment | Link | Version |
---|---|---|
ADx | curl | 7.29+ |
TF conversion | curl | 7.29+ |
TF conversion | tesseract | 4.0.0+ |
TF conversion | wkhtmltopdf | 0.12.5+ |
Messaging | curl | 7.29+ |
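A minimal sketch of these operating system settings for one ADx node; the service user name (adx) and the package manager are assumptions and depend on the distribution.

```bash
# File descriptor limit for the service user (ADx: 500000;
# use 100000 for TF conversion and 10000 for Messaging nodes)
cat >> /etc/security/limits.conf <<'EOF'
adx  soft  nofile  500000
adx  hard  nofile  500000
EOF

# Entropy daemon
yum install -y haveged        # or: apt-get install -y haveged
systemctl enable --now haveged

# Command line tools required by the deployments
curl --version | head -n1     # 7.29+ on all nodes
tesseract --version           # 4.0.0+ on TF conversion nodes only
wkhtmltopdf --version         # 0.12.5+ on TF conversion nodes only
```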
Load Balancer
There are no special requirements for the load balancer. Neither the algorithm nor the session handling (sticky or not) has any special requirements. However, since e.g. a conversion health check is synchronous, it is recommended to configure a session timeout of ≥ 10 min.
Firewall settings
On each node (ADx/Conversion) the configured ports (see installation details) need to be opened:
- The external communication (https for security reasons), which provides the API to users, developers, and other systems, must be available via the load balancers (or the node in case there is no load balancer).
- Internally, the nodes need to open the ports for messaging in case of a clustered setup. Multicast must be allowed in case of automatic service discovery (optional). (A firewall sketch follows below.)
Java requirements
Java Version: OpenJDK or Oracle JDK 11
Java in general supports more than 32 GB of heap for one JVM, but above 32 GB Java uses 64-bit references, which themselves consume more memory. When deciding to exceed the 32 GB boundary, consider that the memory must be increased dramatically to obtain a similar usable heap. In practice, this means that when increasing the heap above 32 GB it is necessary to go well over 40 GB. See http://java-performance.info/over-32g-heap-java/ for a detailed explanation.
To avoid heap resizing during uptime of the servers, which leads to performance issues, -Xmx and -Xms should be equal.
Based on the memory chosen in the setup configuration above, make sure that the operating system still has enough memory available and is not forced to swap. The remaining memory should then be assigned to ADx, TF conversion and Messaging.
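If the Tomcat instance is managed manually, the equivalent of the maxHeapSize/initialHeapSize deployment parameters can be set in setenv.sh. A minimal sketch for a 32 GB node; the 24 GB heap value is an assumption that leaves memory for the operating system, metaspace and native allocations while staying below the 32 GB boundary:

```bash
# $CATALINA_HOME/bin/setenv.sh
# -Xms equals -Xmx to avoid heap resizing at runtime
export CATALINA_OPTS="$CATALINA_OPTS -Xms24g -Xmx24g"
```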
Java settings - ADx
Setting | Deployment Parameter | Value | Description |
---|---|---|---|
-Xmx | maxHeapSize | 4GB - 31GB | Maximum heap size |
-Xms | initialHeapSize | 4GB - 31GB | Minimum heap size |
Java settings - TF conversion
Setting | Deployment Parameter | Value | Description |
---|---|---|---|
-Xmx | maxHeapSize | 4GB - 31GB | Maximum heap size |
-Xms | initialHeapSize | 4GB - 31GB | Minimum heap size |
Runtime
The runtime of ADx and TF conversion is Apache Tomcat (http://tomcat.apache.org/).
Runtime - ADx
Setting | Deployment Parameter | Value | Default | Description |
---|---|---|---|---|
http Port | httpPort | any available port | 8080 | expose to the outside (Attention, not encrypted!) |
https Port | httpsPort | any available port | 8443 | expose to the outside |
AJP Port | ajpPort | any available port | 8009 | only internal usage - do not expose to the outside |
server Port | serverPort | any available port | 8005 | only internal usage - do not expose to the outside |
Maximum Connections | maxConnections | a positive number or no limit (recommended) | -1 (no limit) | maximum number of connections |
Maximum Threads | maxThreads | ≥1000 | 4000 | maximum number of request worker threads |
Runtime - TF Conversion
Setting | Deployment Parameter | Value | Default | Description |
---|---|---|---|---|
http Port | httpPort | any available port | 8080 | expose to the outside (Attention, not encrypted!) |
https Port | httpsPort | any available port | 8443 | expose to the outside |
AJP Port | ajpPort | any available port | 8009 | only internal usage - do not expose to the outside |
server Port | serverPort | any available port | 8005 | only internal usage - do not expose to the outside |
Maximum Connections | maxConnections | a positive number or no limit (recommended) | -1 (no limit) | maximum number of connections |
Maximum Threads | maxThreads | ≥1000 | 4000 | maximum number of request worker threads |
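A quick way to verify the port configuration after installation; the hostname is a placeholder. The https port should answer from the outside, while the AJP and server ports must only be reachable internally:

```bash
# From outside the node / via the load balancer
curl -k -s -o /dev/null -w '%{http_code}\n' https://adx-node.example.com:8443/

# On the node itself: which of the configured ports is Tomcat listening on?
ss -tlnp | grep -E ':(8080|8443|8009|8005) '
```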
Mount Points
Since the size of stored documents cannot be predicted, it is recommended to use a file system with dynamic growth capabilities (e.g. XFS or ZFS).
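A minimal provisioning sketch for such a mount point, assuming XFS and a dedicated block device; the device name and path are placeholders:

```bash
mkfs.xfs /dev/sdb1
mkdir -p /mnt/adx/content
echo '/dev/sdb1  /mnt/adx/content  xfs  defaults  0 0' >> /etc/fstab
mount /mnt/adx/content
df -Th /mnt/adx/content   # verify file system type and available space
```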
Mount Points - ADx
Mount Point | Deployment Parameter | Size | Comment | Dynamic sizing | Backup | Type |
---|---|---|---|---|---|---|
ADx installation directory | installationPath | 5GB | holds ADx installation | no | Priority 4 | local |
log files | logFilesDir | multiple GB - depending on log configuration | holds log files | yes | Priority 3 | local |
temp files | tempDir | multiple GB - depending on load | holds temporary files | yes | Priority 4 | local |
content files per repository | configurable via UI during creation of the repository | Depending on number of documents | holds content files | yes | Priority 1 | shared* |
fulltext index | ELASTIC_SERVICE_DATA_PATH | Depending on number of documents | holds elasticsearch fulltext index | yes | Priority 1 | shared* |
*if used, shared between ADx nodes
Mount Points - TF conversion
Mount Point | Deployment Parameter | Size | Comment | Dynamic sizing | Backup | Type |
---|---|---|---|---|---|---|
TF conversion installation directory | installationPath | 5GB | holds TF conversion installation | no | Priority 4 | local |
log files | logFilesDir | multiple GB - depending on log configuration | holds log files | yes | Priority 3 | local |
temp files | tempDir | multiple GB - depending on load | holds temporary files | yes | Priority 4 | local |
conversion job files | CONV_STORAGE_FOLDER | Depending on the number of conversion requests | holds content files - only used if CONV_STORAGE_TYPE=fs | yes | Priority 4 | shared* |
*if used, shared between TF conversion nodes
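A minimal sketch of the related storage parameters, assuming they are passed as environment variables to the deployment and that the folder is a mount shared between the TF conversion nodes; the path is a placeholder:

```bash
export CONV_STORAGE_TYPE=fs                      # store conversion job files on the file system
export CONV_STORAGE_FOLDER=/mnt/conversion/jobs  # shared mount between TF conversion nodes
```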
Databases
Databases store the metadata and information for the different systems. ADx, for example, needs a database to store information about a document being checked out, the name of the document, or its history. This information goes into the database. A similar principle applies to Conversion. In addition, the system databases are used for storing user information, sessions, locks, and so on. A relational database handles this well, which is why one is used.
Database Sharing
It's possible to share databases in the configurations described below.
Shared database | Important information |
---|---|
System database for ADx and Conversion | Users, sessions, and other system data is shared between ADx and Conversion in this setup. |
Single database for Conversion system and access data | This setup is possible provided you don't need to back up your conversion system data. In practice it means that you can share the database provided you don't have any special conversion users. |
Single database for Conversion access data and ADx Cache data | This setup is possible but not recommended. Conversion access data is transient; ADx cache data is also transient in nature, but it is time consuming to recreate and therefore requires regular backups. |
Database - ADx
ADx needs one SQL database for system-related data (such as sessions, user accounts or repository configuration).
ADx also needs SQL databases for each repository. For a tribefire repository, two databases are necessary - cache and content. For external repositories (e.g. Documentum, CMIS), one database is necessary - cache.
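As an illustration, for a single tribefire repository on PostgreSQL this results in one system database plus two repository databases; the database and role names below are placeholders:

```bash
psql -U postgres <<'SQL'
CREATE ROLE adx LOGIN PASSWORD 'change-me';
CREATE DATABASE adx_system        OWNER adx;  -- system-related data
CREATE DATABASE adx_repo1_cache   OWNER adx;  -- per-repository cache database
CREATE DATABASE adx_repo1_content OWNER adx;  -- per-repository content database
SQL
```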
SQL Database - ADx
Usage | Description | Backup | Type |
---|---|---|---|
System database | contains user information, repository configuration, user sessions, locking information, leadership information | Priority 1 | shared between all ADx nodes, optionally shared with TF conversion and/or Messaging |
Cache database | contains cache/representations of content - can be recreated in a time-consuming process | Priority 2 | shared between all ADx nodes |
Content database | contains business data | Priority 1 | shared between all ADx nodes |
SQL Database - ADx - PostgreSQL
System database:
Requirements | Parameter | Description |
---|---|---|
Version | --- | 11 or newer |
ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize * | 0 |
ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize * | 20 |
ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize * | 0 |
ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize * | 10 |
ConnectionPool size Authorization - min | AUTH_DB.minPoolSize * | 0 |
ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize * | 10 |
ConnectionPool size Locking - min | LOCKING_DB.minPoolSize * | 0 |
ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize * | 5 |
ConnectionPool size LEADERSHIP_DB - min | LEADERSHIP_DB.minPoolSize * | 0 |
ConnectionPool size LEADERSHIP_DB - max | LEADERSHIP_DB.maxPoolSize * | 5 |
*or the default configuration of DEFAULT_DB
- The sum of the maximum pool sizes of all connection pools is the number of connections the database must allow.
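With the default pool sizes above, the system database must therefore allow at least 20 + 10 + 10 + 5 + 5 = 50 connections per ADx node (multiplied by the number of nodes in a cluster). A minimal sketch for PostgreSQL, with headroom for administrative sessions; the value and the service name are assumptions:

```bash
# 2 ADx nodes x 50 pooled connections + headroom
psql -U postgres -c "ALTER SYSTEM SET max_connections = 150;"
systemctl restart postgresql   # max_connections only takes effect after a restart
```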
Cache database:
Requirements | Description |
---|---|
Version | 11 or newer |
ConnectionPool size min | 100 |
ConnectionPool size max | 150 |
Content database:
Requirements | Description |
---|---|
Version | 11 or newer |
ConnectionPool size min | 100 |
ConnectionPool size max | 150 |
SQL Database - ADx - Oracle
System database:
Requirements | Parameter | Description |
---|---|---|
Version | --- | 11g or newer |
ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize * | 0 |
ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize * | 20 |
ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize * | 0 |
ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize * | 10 |
ConnectionPool size Authorization - min | AUTH_DB.minPoolSize * | 0 |
ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize * | 10 |
ConnectionPool size Locking - min | LOCKING_DB.minPoolSize * | 0 |
ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize * | 5 |
ConnectionPool size LEADERSHIP_DB - min | LEADERSHIP_DB.minPoolSize * | 0 |
ConnectionPool size LEADERSHIP_DB - max | LEADERSHIP_DB.maxPoolSize * | 5 |
*or the default configuration of DEFAULT_DB
- The sum of the maximum pool sizes of all connection pools is the number of connections the database must allow.
Cache database:
Requirements | Description |
---|---|
Version | 11g or newer |
ConnectionPool size min | 100 |
ConnectionPool size max | 150 |
Content database:
Requirements | Description |
---|---|
Version | 11g or newer |
ConnectionPool size min | 100 |
ConnectionPool size max | 150 |
SQL Database - ADx - MSSQL
System database:
Requirements | Parameter | Description |
---|---|---|
Version | --- | 12 or newer |
ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize * | 0 |
ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize * | 20 |
ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize * | 0 |
ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize * | 10 |
ConnectionPool size Authorization - min | AUTH_DB.minPoolSize * | 0 |
ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize * | 10 |
ConnectionPool size Locking - min | LOCKING_DB.minPoolSize * | 0 |
ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize * | 5 |
ConnectionPool size LEADERSHIP_DB - min | LEADERSHIP_DB.minPoolSize * | 0 |
ConnectionPool size LEADERSHIP_DB - max | LEADERSHIP_DB.maxPoolSize * | 5 |
*or the default configuration of DEFAULT_DB
- The sum of the maximum pool sizes of all connection pools is the number of connections the database must allow.
Cache database:
Requirements | Description |
---|---|
Version | 12 or newer |
ConnectionPool size min | 100 |
ConnectionPool size max | 150 |
Content database:
Requirements | Description |
---|---|
Version | 12 or newer |
ConnectionPool size min | 100 |
ConnectionPool size max | 150 |
Database - TF conversion
TF conversion needs the following databases:
- One SQL database for system-related data (such as sessions or user accounts).
- One SQL database for storing the job information.
In addition, a proprietary database is used on all nodes. This database holds the configuration of TF conversion.
SQL Database - TF conversion
Usage | Description | Backup | Type |
---|---|---|---|
System database | contains user information, repository configuration, user sessions, locking information, leadership information | Priority 1 | shared between all TF conversion nodes, optionally shared with ADx and/or Messaging |
Conversion database | contains transient data | Priority 4 | shared between all TF conversion nodes |
SQL Database - TF conversion - PostgreSQL
System database:
Requirements | Parameter | Description |
---|---|---|
Version | --- | 11 or newer |
ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize * | 0 |
ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize * | 20 |
ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize * | 0 |
ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize * | 10 |
ConnectionPool size Authorization - min | AUTH_DB.minPoolSize * | 0 |
ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize * | 10 |
ConnectionPool size Locking - min | LOCKING_DB.minPoolSize * | 0 |
ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize * | 5 |
ConnectionPool size LEADERSHIP_DB - min | LEADERSHIP_DB.minPoolSize * | 0 |
ConnectionPool size LEADERSHIP_DB - max | LEADERSHIP_DB.maxPoolSize * | 5 |
*or the default configuration of DEFAULT_DB
- The sum of the maximum pool sizes of all connection pools is the number of connections the database must allow.
Conversion Job database:
Requirements | Description |
---|---|
Version | 11 or newer |
ConnectionPool size min | 100 |
ConnectionPool size max | 150 |
If CONV_STORAGE_TYPE=db, the conversion files are stored in the database. This needs to be considered when sizing the database and configuring its caching.
SQL Database - TF conversion - Oracle
System database:
Requirements | Parameter | Description |
---|---|---|
Version | --- | 11g or newer |
ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize * | 0 |
ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize * | 20 |
ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize * | 0 |
ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize * | 10 |
ConnectionPool size Authorization - min | AUTH_DB.minPoolSize * | 0 |
ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize * | 10 |
ConnectionPool size Locking - min | LOCKING_DB.minPoolSize * | 0 |
ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize * | 5 |
ConnectionPool size LEADERSHIP_DB - min | LEADERSHIP_DB.minPoolSize * | 0 |
ConnectionPool size LEADERSHIP_DB - max | LEADERSHIP_DB.maxPoolSize * | 5 |
*or the default configuration of DEFAULT_DB
- The sum of the maximum pool sizes of all connection pools is the number of connections the database must allow.
Conversion Job database:
Requirements | Description |
---|---|
Version | 11g or newer |
ConnectionPool size min | 100 |
ConnectionPool size max | 150 |
If CONV_STORAGE_TYPE=db, the conversion files are stored in the database. This needs to be considered when sizing the database and configuring its caching.
SQL Database - TF conversion - MSSQL
System database:
Requirements | Parameter | Description |
---|---|---|
Version | --- | 12 or newer |
ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize * | 0 |
ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize * | 20 |
ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize * | 0 |
ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize * | 10 |
ConnectionPool size Authorization - min | AUTH_DB.minPoolSize * | 0 |
ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize * | 10 |
ConnectionPool size Locking - min | LOCKING_DB.minPoolSize * | 0 |
ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize * | 5 |
ConnectionPool size LEADERSHIP_DB - min | LEADERSHIP_DB.minPoolSize * | 0 |
ConnectionPool size LEADERSHIP_DB - max | LEADERSHIP_DB.maxPoolSize * | 5 |
*or the default configuration of DEFAULT_DB
- The sum of the maximum pool sizes of all connection pools is the number of connections the database must allow.
Conversion Job database:
Requirements | Description |
---|---|
Version | 12 or newer |
ConnectionPool size min | 100 |
ConnectionPool size max | 150 |
If CONV_STORAGE_TYPE=db, the conversion files are stored in the database. This needs to be considered when sizing the database and configuring its caching.