Configuration
The install package
The installation package contains everything necessary for the proper operation of pseudify.
This includes the configuration files to connect pseudify to a database and examples of how pseudify can be functionally extended.
The files are to be understood as start templates that you can adapt to your own requirements.
Content:
- docker-compose.yml: Starts pseudify with the GUI for analyzing the database and modeling pseudonymization tasks (we call it the analyze setup).
- docker-compose.llm-addon.yml: Extends the analyze setup with AI capabilities. Pseudify uses this locally running LLM to identify personally identifiable information (PII). The data is processed exclusively on your computer and never leaves it.
- docker-compose.database.yml: Contains an example of how to start a database server via docker compose, if you need one.
- userdata/: This folder contains everything you need to configure and extend pseudify.
- userdata/.env.example: An example of the basic configuration of pseudify. Pseudify mainly uses env variables for the basic configuration.
- userdata/config/: This folder contains files for advanced configuration.
- userdata/src/: This folder contains the analysis and pseudonymization profile(s) that you have created with the GUI. This folder also includes examples for custom functional extensions.
- userdata/src/Encoder/: This folder contains an example of a custom data encoder implementation (Rot13Encoder).
- userdata/src/Faker/: This folder contains an example of a custom data faker implementation (BobRossLipsum).
- userdata/src/Processing/: This folder contains an example of a custom condition expression implementation (isBobRoss()).
- userdata/src/Profiles/: This folder contains the pseudify profiles. It can contain the low-level profiles (PHP) or the YAML profiles created via the GUI.
- userdata/src/Profiles/Yaml/: This folder contains the YAML pseudify profiles created via the GUI.
- userdata/src/Types/: This folder contains examples of custom database type implementations (Enum and Set).
Configuration options
.env
The basic configuration of pseudify is done with values in a .env file.
The install package contains an example .env file that you can use as a basis for your own configuration.
PSEUDIFY_FAKER_LOCALE
Default: en_US
Pseudify uses the FakerPHP/Faker component to generate the pseudonyms.
The component allows the generation of language-specific values.
Supported values of PSEUDIFY_FAKER_LOCALE
can be found in the FakerPHP/Faker Repository.
Example
PSEUDIFY_FAKER_LOCALE=de_DE
PSEUDIFY_DATABASE_DRIVER
Default: pdo_mysql
Resolves to connection parameter: doctrine.dbal.connections.default.driver
The value of PSEUDIFY_DATABASE_DRIVER
must be a supported driver of the Doctrine DBAL component.
The pseudify docker container comes with the following driver support:
- pdo_mysql (A MySQL driver that uses the pdo_mysql PDO extension)
- mysqli (A MySQL driver that uses the mysqli extension)
- pdo_pgsql (A PostgreSQL driver that uses the pdo_pgsql PDO extension)
- pdo_sqlite (An SQLite driver that uses the pdo_sqlite PDO extension)
- sqlite3 (An SQLite driver that uses the sqlite3 extension)
- pdo_sqlsrv (A Microsoft SQL Server driver that uses the pdo_sqlsrv PDO extension)
- sqlsrv (A Microsoft SQL Server driver that uses the sqlsrv PHP extension)
Info
Support for the oci8 driver for Oracle databases should be possible (pull requests are welcome).
Example
PSEUDIFY_DATABASE_DRIVER=pdo_mysql
PSEUDIFY_DATABASE_HOST
Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.host
The host name under which the database server can be reached.
This value is only used when using the following drivers:
Example
PSEUDIFY_DATABASE_HOST=host.docker.internal
PSEUDIFY_DATABASE_PORT
Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.port
The port under which the database server can be reached.
This value is only used when using the following drivers:
Example
PSEUDIFY_DATABASE_PORT=3306
PSEUDIFY_DATABASE_USER
Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.user
The user name of the database.
This value is only used when using the following drivers:
Example
PSEUDIFY_DATABASE_USER=pseudify
PSEUDIFY_DATABASE_PASSWORD
Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.password
The password of the database.
This value is only used when using the following drivers:
Example
PSEUDIFY_DATABASE_PASSWORD='super(!)sEcReT'
PSEUDIFY_DATABASE_SCHEMA
Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.dbname
or doctrine.dbal.connections.default.path
For the following drivers, PSEUDIFY_DATABASE_SCHEMA
corresponds to the database name:
For the following drivers, PSEUDIFY_DATABASE_SCHEMA
corresponds to the file system path to the database:
Example
PSEUDIFY_DATABASE_SCHEMA=wordpress_prod
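For the file-based case, a minimal sketch of a userdata/.env for an SQLite database might look like this. It assumes the pdo_sqlite driver, which Doctrine DBAL configures through the path parameter. The file name database.sqlite is hypothetical, and the path is written as seen from inside the pseudify container, where userdata/ is mounted at /opt/pseudify/userdata/.

```shell
# userdata/.env (sketch for an SQLite database file)
PSEUDIFY_DATABASE_DRIVER=pdo_sqlite
# Hypothetical file name; the path must be valid inside the pseudify container
PSEUDIFY_DATABASE_SCHEMA=/opt/pseudify/userdata/database.sqlite
```

Host, port, user and password are left out of this sketch because a file-based connection does not go through a database server.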
PSEUDIFY_DATABASE_CHARSET
Default: utf8mb4
Resolves to connection parameter: doctrine.dbal.connections.default.charset
The character set used during the connection to the database.
This value is only used when using the following drivers:
Example
PSEUDIFY_DATABASE_CHARSET=utf8mb4
PSEUDIFY_DATABASE_VERSION
Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.server_version
Doctrine comes with different database platform implementations for some vendors to support version-specific features, dialects and behaviours.
The drivers automatically detect the platform version and instantiate the appropriate platform class.
If you want to disable automatic database platform detection and explicitly select the platform version implementation, you can do this with the value in PSEUDIFY_DATABASE_VERSION.
Info
If you are using a MariaDB database, prefix the value of PSEUDIFY_DATABASE_VERSION with mariadb- (example: mariadb-10.2).
Example
PSEUDIFY_DATABASE_VERSION=8.0
PSEUDIFY_DATABASE_SSL_INSECURE
Default: <empty>
Resolves to connection parameter: doctrine.dbal.connections.default.options.TrustServerCertificate
If PSEUDIFY_DATABASE_SSL_INSECURE is set to 1, the TLS certificate of the database server is not verified.
This value is only used when using the following drivers:
Example
PSEUDIFY_DATABASE_SSL_INSECURE=1
Advanced connection settings
If you need to configure other driver options, you can do so in the install package file userdata/config/configuration.yaml.
Examples and information for driver options can be found in the following documents:
After changing the connection settings, the cache must be cleared:
pseudify cache:clear
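The commands in this document are written in the short form pseudify <command>. When pseudify runs from the docker image, one way to get that short form is a small shell function around the docker run call used in the database access examples. This is only a sketch: the function is not part of the install package, and it assumes that everything after the image name is passed straight to pseudify, as in the pseudify:debug:table_schema examples.

```shell
# Hypothetical helper: forwards its arguments to pseudify inside the docker image.
# Must be called from the main directory of the install package so that
# userdata/ is mounted into the container.
pseudify() {
  docker run --rm -it \
    -v "$(pwd)/userdata/":/opt/pseudify/userdata/ \
    ghcr.io/waldhacker/pseudify-ai:2.0 "$@"
}
```

With this function defined in the current shell, pseudify cache:clear would run the cache clearing inside a fresh container.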
Multiple connection configurations
It is possible to configure multiple connections.
The connection named default is used as the default connection.
Further connections can be configured under a different name in the install package file userdata/config/configuration.yaml.
doctrine:
  dbal:
    connections:
      myCustomConnection:
        driver: sqlsrv
        # ...
The configured connections can be used with the --connection parameter.
pseudify pseudify:pseudonymize --connection myCustomConnection myPseudonymizationProfileName
The following commands accept the --connection parameter:
pseudify:analyze
pseudify:autoconfiguration
pseudify:debug:analyze
pseudify:debug:pseudonymize
pseudify:debug:table_schema
pseudify:pseudonymize
Custom extensions
Registering custom database types
If user-defined database types are required, you can define them at connection level in the install package file userdata/config/configuration.yaml.
Example implementations for user-defined database types can be found in the following install package files:
These user-defined database types can then be configured in the install package file userdata/config/configuration.yaml:
doctrine:
  dbal:
    types:
      enum: Waldhacker\Pseudify\Types\TYPO3\EnumType
      set: Waldhacker\Pseudify\Types\TYPO3\SetType
    connections:
      default:
        mapping_types:
          enum: enum
          set: set
Examples and information for user-defined data types can be found in the following documents:
- Symfony DoctrineBundle - Registering custom Mapping Types
- Symfony DoctrineBundle - Registering custom Mapping Types in the SchemaTool
- Doctrine DBAL - Custom Mapping Types
After adding custom data types, the cache must be cleared.
pseudify cache:clear
Registering custom faker formatters
The FakerPHP/Faker component comes with a lot of predefined formatters to generate various data formats.
If you want to use custom formatters, you can look at the implementation of the BobRossLipsumProvider example.
The custom formatter must implement the interface Waldhacker\Pseudify\Core\Faker\FakeDataProviderInterface to be integrated into the system.
The best way to see how formatters can generate data is to look at the providers in the FakerPHP/Faker project.
After adding custom faker formatters, the cache must be cleared.
pseudify cache:clear
Registering custom decoders / encoders
If you want to use custom decoders / encoders, you can see an implementation in the example of the Rot13Encoder.
The custom decoder / encoder must implement the interface Waldhacker\Pseudify\Core\Processor\Encoder\EncoderInterface to be integrated into the system.
The best way to see how decoders / encoders can decode and encode data is to look at the built-in decoders/encoders like the Base64Encoder.
If you want your decoder / encoder to be configurable via the GUI, your encoder / decoder must provide a form type that defines the configuration form.
Look at the Base64Encoder which provides the Base64EncoderType to get an idea how to provide a configuration form.
Additional information for user-defined form types can be found in the following document:
After adding custom decoders/encoders, the cache must be cleared.
pseudify cache:clear
Note
User-defined decoders / encoders should follow the <Format>Encoder naming convention (e.g. HexEncoder, Rot13Encoder etc.).
This ensures that debug commands like pseudify:debug:analyze can display the names of the decoders / encoders properly.
Registering custom condition expressions
If you want to use custom condition expressions, you can see an implementation in the example of the ConditionExpressionProvider.
You need to define a description and a Symfony\Component\ExpressionLanguage\ExpressionFunction implementation which implements the evaluator.
Additional information for user-defined expressions can be found in the following document:
Manage database access
Database access can be managed in various ways.
Some variants are presented below.
Access a database running on your host system
pseudify is running as a standalone binary (pseudonymization setup)
Add the parameter --add-host=host.docker.internal:host-gateway to the docker run command.
The option PSEUDIFY_DATABASE_HOST in the install package file userdata/.env must be set to host.docker.internal.
Note
For this variant to work, the port of the database server on the docker gateway (host system) must be open.
Then run pseudify like:
$ docker run --rm -it --add-host=host.docker.internal:host-gateway -v "$(pwd)/userdata/":/opt/pseudify/userdata/ \
ghcr.io/waldhacker/pseudify-ai:2.0 pseudify:debug:table_schema
pseudify is running using docker compose (analyze setup)
Add the option extra_hosts: ['host.docker.internal:host-gateway'] to the install package file docker-compose.yml, like this:
services:
  pseudify:
    # ...
    extra_hosts:
      - 'host.docker.internal:host-gateway'
The option PSEUDIFY_DATABASE_HOST in the install package file userdata/.env must be set to host.docker.internal.
Then start pseudify like:
$ docker compose up -d
Access a database using docker services
pseudify is running as a standalone binary (pseudonymization setup)
Create a docker network with the name pseudify-net (if none already exists):
$ docker network create pseudify-net
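Because docker network create fails when the network already exists, the "if none already exists" condition can be made explicit. The following is a minimal sketch; the function name ensure_network is hypothetical and not part of pseudify. It relies only on docker network inspect returning a non-zero exit code for an unknown network.

```shell
# Hypothetical helper: create the docker network only if it is missing.
ensure_network() {
  name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    # inspect succeeded, so the network is already there
    echo "network $name already exists"
  else
    docker network create "$name" >/dev/null && echo "network $name created"
  fi
}
```

Calling ensure_network pseudify-net is then safe to repeat, for example at the top of a setup script.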
Start a database server using the network pseudify-net (--network pseudify-net).
The database container is given the name mariadb_10_5 (--name mariadb_10_5).
Therefore the option PSEUDIFY_DATABASE_HOST in the install package file userdata/.env must be set to mariadb_10_5.
Note
For the import of the test database (-v "$(pwd)"/database-data:/docker-entrypoint-initdb.d) to work correctly, the command must be executed in the main directory of the install package.
$ docker run --rm --detach \
--network pseudify-net \
--name mariadb_10_5 \
--env MARIADB_USER=pseudify \
--env MARIADB_PASSWORD='P53ud1fy(!)w4ldh4ck3r' \
--env MARIADB_ROOT_PASSWORD='P53ud1fy(!)w4ldh4ck3r' \
--env MARIADB_DATABASE=pseudify_utf8mb4 \
-v "$(pwd)"/database-data:/docker-entrypoint-initdb.d \
mariadb:10.5
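A userdata/.env matching this database container could look like the following sketch. The host, user, password and schema are taken from the docker run command above. Port 3306 is the MariaDB default and is an assumption here, since the command does not set a port. PSEUDIFY_DATABASE_VERSION is left unset so that the driver detects the platform version itself.

```shell
# userdata/.env (sketch for the mariadb_10_5 container)
PSEUDIFY_DATABASE_DRIVER=pdo_mysql
# Container name given via --name; resolvable inside the pseudify-net network
PSEUDIFY_DATABASE_HOST=mariadb_10_5
# Assumption: MariaDB default port
PSEUDIFY_DATABASE_PORT=3306
PSEUDIFY_DATABASE_USER=pseudify
PSEUDIFY_DATABASE_PASSWORD='P53ud1fy(!)w4ldh4ck3r'
PSEUDIFY_DATABASE_SCHEMA=pseudify_utf8mb4
PSEUDIFY_DATABASE_CHARSET=utf8mb4
```

The single quotes around the password keep the parentheses and the exclamation mark from being interpreted.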
Then run pseudify like:
$ docker run --rm -it --network pseudify-net -v "$(pwd)/userdata/":/opt/pseudify/userdata/ \
ghcr.io/waldhacker/pseudify-ai:2.0 pseudify:debug:table_schema
pseudify is running using docker compose (analyze setup)
You can use the install package file docker-compose.database.yml and adapt it to your needs.
The option PSEUDIFY_DATABASE_HOST in the install package file userdata/.env must be set to the database service name, e.g. mariadb_10_5.
Then start pseudify like:
$ docker compose -f docker-compose.yml -f docker-compose.database.yml up -d
Debug the configuration
Commands exist to check the configuration of the system.
pseudify:information
The command pseudify pseudify:information lists:
- available profiles to analyze the database (Registered analyse profiles)
- available profiles to pseudonymize the database (Registered pseudonymize profiles)
- registered database types
- registered condition expressions
- registered encoders / decoders
- database drivers available in the system (Available built-in database drivers)
- information per configured connection (Connection information for connection "<connection name>")
- information about which database types are associated with which doctrine implementations (Registered doctrine database data type mappings)
- information about the doctrine driver implementations used and the system driver used (Connection details).
debug:config DoctrineBundle
The command lists the combined database configuration, which consists of the core configuration and the user-defined configuration from the install package.
debug:dotenv
The command lists the values from the .env file.