Configuration

The install package

The installation package contains everything necessary for the proper operation of pseudify.
This includes the configuration files to connect pseudify to a database and examples of how pseudify can be functionally extended.
The files are starting templates that you can adapt to your own requirements.

Content:

  • docker-compose.yml: Starts pseudify with the GUI for analyzing the database and modeling pseudonymization tasks (we call it the analyze setup).
  • docker-compose.llm-addon.yml: Extends the analyze setup with AI capabilities. Pseudify uses this locally running LLM to detect personally identifiable information (PII). The data is processed exclusively on your computer and never leaves it.
  • docker-compose.database.yml: Contains an example of how a database server can be started via docker compose if you need it.
  • userdata/: This folder contains everything you need to configure and extend pseudify.
  • userdata/.env.example: An example of the basic configuration of pseudify. Pseudify mainly uses env variables for the basic configuration.
  • userdata/config/: This folder contains files for advanced configuration.
  • userdata/src/: This folder contains the analysis and pseudonymization profile(s) that you have created with the GUI. This folder also includes examples for custom functional extensions.
  • userdata/src/Encoder/: This folder contains an example of a custom data encoder implementation (Rot13Encoder).
  • userdata/src/Faker/: This folder contains an example of a custom data faker implementation (BobRossLipsum).
  • userdata/src/Processing/: This folder contains an example of a custom condition expression implementation (isBobRoss()).
  • userdata/src/Profiles/: This folder contains the pseudify profiles. It can contain the low-level profiles (PHP) or the YAML profiles created via the GUI.
  • userdata/src/Profiles/Yaml/: This folder contains the YAML pseudify profiles created via the GUI.
  • userdata/src/Types/: This folder contains an example of custom database type implementations (Enum and Set).

Configuration options

.env

The basic configuration of pseudify takes place using values in an .env file.
The install package contains an example .env file that can be used as a basis for your own configuration.
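
As a minimal sketch of using the shipped template as a starting point, the following helper copies userdata/.env.example to userdata/.env unless a configuration already exists. The function name init_env_file is hypothetical and not part of pseudify; the file locations are the ones named in this document.

```shell
# Hypothetical helper (not shipped with pseudify): create <userdata>/.env
# from the template <userdata>/.env.example without overwriting an
# existing configuration.
init_env_file() {
    userdata_dir="${1:-userdata}"
    if [ -f "$userdata_dir/.env" ]; then
        echo "keeping existing $userdata_dir/.env"
        return 0
    fi
    # cp fails with a non-zero status if the template is missing.
    cp "$userdata_dir/.env.example" "$userdata_dir/.env" &&
        echo "created $userdata_dir/.env"
}
```

Called as init_env_file userdata from the main directory of the install package, it leaves an already edited .env untouched.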

PSEUDIFY_FAKER_LOCALE

Default: en_US

Pseudify uses the FakerPHP/Faker component to generate the pseudonyms.
The component allows the generation of language-specific values.
Supported values of PSEUDIFY_FAKER_LOCALE can be found in the FakerPHP/Faker Repository.

Example
PSEUDIFY_FAKER_LOCALE=de_DE

PSEUDIFY_DATABASE_DRIVER

Default: pdo_mysql
Resolves to connection parameter: doctrine.dbal.connections.default.driver

The value of PSEUDIFY_DATABASE_DRIVER must be a supported driver of the Doctrine DBAL component.
The pseudify docker container comes with the following driver support:

  • pdo_mysql (A MySQL driver that uses the pdo_mysql PDO extension)
  • mysqli (A MySQL driver that uses the mysqli extension)
  • pdo_pgsql (A PostgreSQL driver that uses the pdo_pgsql PDO extension)
  • pdo_sqlite (An SQLite driver that uses the pdo_sqlite PDO extension)
  • sqlite3 (An SQLite driver that uses the sqlite3 extension)
  • pdo_sqlsrv (A Microsoft SQL Server driver that uses the pdo_sqlsrv PDO extension)
  • sqlsrv (A Microsoft SQL Server driver that uses the sqlsrv PHP extension)

Info

Support for the oci8 driver for Oracle databases should be possible (pull requests are welcome).

Example
PSEUDIFY_DATABASE_DRIVER=pdo_mysql
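
A quick sanity check of the configured driver can be sketched in shell before the container is started. The function name is_supported_driver is hypothetical; the accepted values are simply the drivers listed above for the pseudify docker container.

```shell
# Hypothetical helper (not part of pseudify): succeeds only for the
# drivers that this document lists as bundled with the docker container.
is_supported_driver() {
    case "$1" in
        pdo_mysql|mysqli|pdo_pgsql|pdo_sqlite|sqlite3|pdo_sqlsrv|sqlsrv)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

For example, is_supported_driver "$PSEUDIFY_DATABASE_DRIVER" || echo "unsupported driver" >&2 reports a typo such as pdo-mysql before pseudify tries to connect.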

PSEUDIFY_DATABASE_HOST

Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.host

The host name under which the database server can be reached.
This value is only used when using the following drivers:

Example
PSEUDIFY_DATABASE_HOST=host.docker.internal

PSEUDIFY_DATABASE_PORT

Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.port

The port under which the database server can be reached.
This value is only used when using the following drivers:

Example
PSEUDIFY_DATABASE_PORT=3306
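
For orientation, the conventional default ports of the server-based databases can be captured in a small helper. The port numbers are the vendors' well-known defaults (3306 for MySQL/MariaDB, 5432 for PostgreSQL, 1433 for Microsoft SQL Server) and are not taken from pseudify itself; the function name default_port_for_driver is hypothetical.

```shell
# Hypothetical helper (not part of pseudify): print the conventional
# default port for a driver, or fail for file-based and unknown drivers.
default_port_for_driver() {
    case "$1" in
        pdo_mysql|mysqli)  echo 3306 ;;
        pdo_pgsql)         echo 5432 ;;
        pdo_sqlsrv|sqlsrv) echo 1433 ;;
        *)                 return 1 ;;  # e.g. SQLite has no network port
    esac
}
```

Your database server may listen on a different port, so treat the printed value only as a starting point for PSEUDIFY_DATABASE_PORT.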

PSEUDIFY_DATABASE_USER

Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.user

The user name of the database.
This value is only used when using the following drivers:

Example
PSEUDIFY_DATABASE_USER=pseudify

PSEUDIFY_DATABASE_PASSWORD

Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.password

The password of the database.
This value is only used when using the following drivers:

Example
PSEUDIFY_DATABASE_PASSWORD='super(!)sEcReT'

PSEUDIFY_DATABASE_SCHEMA

Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.dbname or doctrine.dbal.connections.default.path

For the following drivers, PSEUDIFY_DATABASE_SCHEMA corresponds to the database name:

For the following drivers, PSEUDIFY_DATABASE_SCHEMA corresponds to the file system path to the database:

Example
PSEUDIFY_DATABASE_SCHEMA=wordpress_prod

PSEUDIFY_DATABASE_CHARSET

Default: utf8mb4
Resolves to connection parameter: doctrine.dbal.connections.default.charset

The character set used during the connection to the database.
This value is only used when using the following drivers:

Example
PSEUDIFY_DATABASE_CHARSET=utf8mb4

PSEUDIFY_DATABASE_VERSION

Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.server_version

Doctrine comes with different database platform implementations for some vendors to support version-specific features, dialects and behaviours.
The drivers automatically detect the platform version and instantiate the appropriate platform class.
If you want to disable automatic database platform detection and explicitly select the platform version implementation, you can do this with the value in PSEUDIFY_DATABASE_VERSION.

Info

If you are using a MariaDB database, you should prefix the value PSEUDIFY_DATABASE_VERSION with mariadb- (example: mariadb-10.2).

Example
PSEUDIFY_DATABASE_VERSION=8.0
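
The MariaDB prefix rule can be sketched as a small shell helper that builds the value for PSEUDIFY_DATABASE_VERSION. The function name database_version_value and its vendor argument are hypothetical and only illustrate the rule described above.

```shell
# Hypothetical helper (not part of pseudify): build the value for
# PSEUDIFY_DATABASE_VERSION. MariaDB versions get the "mariadb-" prefix,
# every other vendor name passes the version through unchanged.
database_version_value() {
    vendor="$1"
    version="$2"
    case "$vendor" in
        mariadb) printf 'mariadb-%s\n' "$version" ;;
        *)       printf '%s\n' "$version" ;;
    esac
}
```

For example, echo "PSEUDIFY_DATABASE_VERSION=$(database_version_value mariadb 10.2)" prints a line that can be placed in the .env file.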

PSEUDIFY_DATABASE_SSL_INSECURE

Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.options.TrustServerCertificate

If the value of PSEUDIFY_DATABASE_SSL_INSECURE is set to 1, no check of the TLS certificate of the database server is performed.

This value is only used when using the following drivers:

Example
PSEUDIFY_DATABASE_SSL_INSECURE=1

Advanced connection settings

If you need to configure other driver options, you can do so in the install package file userdata/config/configuration.yaml.
Examples and information for driver options can be found in the following documents:

After changing the connection settings, the cache must be cleared:

pseudify cache:clear

Multiple connection configurations

It is possible to configure multiple connections.
The connection named default is used as the default connection.
Further connections can be configured under different names in the install package file userdata/config/configuration.yaml:

doctrine:
  dbal:
    connections:
      myCustomConnection:
        driver: sqlsrv
        # ...

The configured connections can be used with the --connection parameter.

pseudify pseudify:pseudonymize --connection myCustomConnection myPseudonymizationProfileName

The following commands accept the --connection parameter:

  • pseudify:analyze
  • pseudify:autoconfiguration
  • pseudify:debug:analyze
  • pseudify:debug:pseudonymize
  • pseudify:debug:table_schema
  • pseudify:pseudonymize
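
When scripting several connections, it can help to guard against passing --connection to a command that does not accept it. The following dry-run sketch only prints the command line; the names CONNECTION_AWARE_COMMANDS and build_pseudify_command are hypothetical, and the command list is copied from the list above.

```shell
# Commands that accept --connection according to this document.
CONNECTION_AWARE_COMMANDS="pseudify:analyze pseudify:autoconfiguration
pseudify:debug:analyze pseudify:debug:pseudonymize
pseudify:debug:table_schema pseudify:pseudonymize"

# Hypothetical helper (not part of pseudify): print the pseudify command
# line for a connection, or fail if the command takes no --connection.
# Usage: build_pseudify_command <connection> <command> [arguments...]
build_pseudify_command() {
    connection="$1"
    cmd="$2"
    shift 2
    for known in $CONNECTION_AWARE_COMMANDS; do
        if [ "$known" = "$cmd" ]; then
            # Dry run: assemble and print the command instead of running it.
            set -- pseudify "$cmd" --connection "$connection" "$@"
            echo "$*"
            return 0
        fi
    done
    echo "error: $cmd does not accept --connection" >&2
    return 1
}
```

Calling build_pseudify_command myCustomConnection pseudify:pseudonymize myPseudonymizationProfileName prints the same command line as the example above, while cache:clear is rejected.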

Custom extensions

Registering custom database types

If user-defined database types are required, you can define them at connection level in the install package file userdata/config/configuration.yaml.

Example implementations for user-defined database types (Enum and Set) can be found in the install package folder userdata/src/Types/.

These user-defined database types can then be configured in the install package file userdata/config/configuration.yaml:

doctrine:
  dbal:
    types:
      enum: Waldhacker\Pseudify\Types\TYPO3\EnumType
      set: Waldhacker\Pseudify\Types\TYPO3\SetType
    connections:
      default:
        mapping_types:
          enum: enum
          set: set

Examples and information for user-defined data types can be found in the following documents:

After adding custom data types, the cache must be cleared.

pseudify cache:clear

Registering custom faker formatters

The FakerPHP/Faker component comes with a lot of predefined formatters to generate various data formats.
If you want to use custom formatters, you can look at the implementation of the BobRossLipsumProvider example.
The custom formatter must implement the interface Waldhacker\Pseudify\Core\Faker\FakeDataProviderInterface to be integrated into the system.
The best way to see how formatters can generate data is to look at the providers in the FakerPHP/Faker project.

After adding custom faker formatters, the cache must be cleared.

pseudify cache:clear

Registering custom decoders / encoders

If you want to use custom decoders / encoders, you can see an implementation in the example of the Rot13Encoder.
The custom decoder / encoder must implement the interface Waldhacker\Pseudify\Core\Processor\Encoder\EncoderInterface to be integrated into the system.
The best way to see how decoders / encoders can decode and encode data is to look at the built-in decoders/encoders like the Base64Encoder.
If you want your decoder / encoder to be configurable via the GUI, your encoder / decoder must provide a form type that defines the configuration form.
Look at the Base64Encoder which provides the Base64EncoderType to get an idea how to provide a configuration form.

Additional information for user-defined form types can be found in the following document:

After adding custom decoders/encoders, the cache must be cleared.

pseudify cache:clear

Note

User-defined decoders / encoders should follow the <Format>Encoder naming convention (e.g. HexEncoder, Rot13Encoder etc.).
This ensures that debug commands like pseudify:debug:analyze can display the names of the decoders / encoders clearly.
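
The naming convention can be checked with a short shell sketch. The function name follows_encoder_naming is hypothetical, and the rule it applies (an alphanumeric class name that starts with an uppercase letter and ends in Encoder) is one reading of the <Format>Encoder convention, not an official pseudify check.

```shell
# Hypothetical helper (not part of pseudify): succeeds if a class name
# looks like <Format>Encoder. The body runs in a subshell so that the
# LC_ALL=C setting (plain ASCII ranges) does not leak to the caller.
follows_encoder_naming() (
    LC_ALL=C
    case "$1" in
        *[!A-Za-z0-9]*) false ;;  # only letters and digits are allowed
        [A-Z]*Encoder)  true ;;   # non-empty format name plus "Encoder"
        *)              false ;;
    esac
)
```

With this reading, HexEncoder and Rot13Encoder pass, while hexEncoder, Rot13 and the bare name Encoder are rejected.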

Registering custom condition expressions

If you want to use custom condition expressions, you can see an implementation in the example of the ConditionExpressionProvider.
You need to define a description and a Symfony\Component\ExpressionLanguage\ExpressionFunction implementation which implements the evaluator.

Additional information for user-defined expressions can be found in the following document:

Manage database access

Database access can be managed in various ways.
Some variants are presented below.

Access a database running on your host system

Pseudify is running as a standalone binary (pseudonymization setup)

Add the parameter --add-host=host.docker.internal:host-gateway to the docker run command.
The option PSEUDIFY_DATABASE_HOST in the install package file userdata/.env must be set to host.docker.internal.

Note

For this variant to work, the port of the database server on the docker gateway (host system) must be open.
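
Setting the option in userdata/.env can be scripted. The following is a minimal sketch, assuming the file consists of plain KEY=value lines; the function name set_env_value is hypothetical and not part of pseudify. The value is written unquoted, so values with special characters (such as the password example) should still be edited by hand.

```shell
# Hypothetical helper (not part of pseudify): set KEY=value in an env
# file, replacing an existing assignment of the same key.
# Usage: set_env_value <env file> <key> <value>
set_env_value() {
    env_file="$1"
    key="$2"
    value="$3"
    tmp_file="$(mktemp)"
    # Keep every line except an existing assignment of the same key.
    # grep exits non-zero when nothing is left, which is fine here.
    grep -v "^${key}=" "$env_file" > "$tmp_file" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp_file"
    # Write back in place so the permissions of the env file are kept.
    cat "$tmp_file" > "$env_file"
    rm -f "$tmp_file"
}
```

For this variant, the call would be set_env_value userdata/.env PSEUDIFY_DATABASE_HOST host.docker.internal.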

Then run pseudify like:

$ docker run --rm -it --add-host=host.docker.internal:host-gateway -v "$(pwd)/userdata/":/opt/pseudify/userdata/ \
    ghcr.io/waldhacker/pseudify-ai:2.0 pseudify:debug:table_schema

Pseudify is running using docker compose (analyze setup)

Add the option extra_hosts: ['host.docker.internal:host-gateway'] to the install package file docker-compose.yml:

services:
  pseudify:
    # ...
    extra_hosts:
      - 'host.docker.internal:host-gateway'

The option PSEUDIFY_DATABASE_HOST in the install package file userdata/.env must be set to host.docker.internal.

Then start pseudify like:

$ docker compose up -d

Access a database using docker services

Pseudify is running as a standalone binary (pseudonymization setup)

Create a docker network with the name pseudify-net (if none already exists):

$ docker network create pseudify-net

Start a database server using the network pseudify-net (--network pseudify-net).
The database container is given the name mariadb_10_5 (--name mariadb_10_5).
Therefore the option PSEUDIFY_DATABASE_HOST in the install package file userdata/.env must be set to mariadb_10_5.

Note

For the import of the test database (-v "$(pwd)"/database-data:/docker-entrypoint-initdb.d) to work correctly, the command must be executed in the main directory of the install package.

$ docker run --rm --detach \
    --network pseudify-net \
    --name mariadb_10_5 \
    --env MARIADB_USER=pseudify \
    --env MARIADB_PASSWORD='P53ud1fy(!)w4ldh4ck3r' \
    --env MARIADB_ROOT_PASSWORD='P53ud1fy(!)w4ldh4ck3r' \
    --env MARIADB_DATABASE=pseudify_utf8mb4 \
    -v "$(pwd)"/database-data:/docker-entrypoint-initdb.d \
    mariadb:10.5

Then run pseudify like:

$ docker run --rm -it --network pseudify-net -v "$(pwd)/userdata/":/opt/pseudify/userdata/ \
    ghcr.io/waldhacker/pseudify-ai:2.0 pseudify:debug:table_schema

Pseudify is running using docker compose (analyze setup)

You can use the install package file docker-compose.database.yml and adapt it to your needs.
The option PSEUDIFY_DATABASE_HOST in the install package file userdata/.env must be set to the name of the database service, for example mariadb_10_5.

Then start pseudify like:

$ docker compose -f docker-compose.yml -f docker-compose.database.yml up -d

Debug the configuration

Commands exist to check the configuration of the system.

pseudify:information

The command pseudify pseudify:information lists:

  • available profiles to analyze the database (Registered analyse profiles)
  • available profiles to pseudonymize the database (Registered pseudonymize profiles)
  • registered database types
  • registered condition expressions
  • registered encoders / decoders
  • database drivers available in the system (Available built-in database drivers)
  • information per configured connection (Connection information for connection "<connection name>")
  • information about which database types are associated with which doctrine implementations (Registered doctrine database data type mappings)
  • information about the doctrine driver implementations used and the system driver used (Connection details).

debug:config DoctrineBundle

The command lists the combined database configuration, which consists of the core configuration
and the user-defined configuration from the install package.

debug:dotenv

The command lists the values from the .env file.