A Python tool for migrating user accounts from WebProtege's MongoDB database to Keycloak identity management. Designed to handle large-scale migrations (170K+ users) with configurable filtering, resume capability, and detailed logging.
WebProtege stores user accounts in MongoDB (`webprotege.Users`). As part of moving authentication to Keycloak, all user accounts need to be migrated. MongoDB stores passwords as MD5-salted hashes, which are incompatible with Keycloak, so passwords are not migrated: each migrated user sets a new password through the "Forgot password?" flow on first login.
| MongoDB Field | Keycloak Field | Notes |
|---|---|---|
| `emailAddress` | `username` and `email` | Passed through as-is; validated by filters before migration |
| `_id` | `attributes.webprotege_username` | Original WebProtégé username, preserves project ownership links |
| (not migrated) | `firstName`, `lastName` | Left blank; `realName` splitting was unreliable for many name formats |
| (not migrated) | `requiredActions` | Empty; migrated users use "Forgot password?" to set their initial password |
| (added) | `attributes.mongo_migrated` | Tags migrated users for auditing |

Passwords, salt values, and password digests are not migrated.
The _id and emailAddress fields are passed through without modification. If they contain characters Keycloak rejects, the filters catch them pre-migration and log the user as skipped.
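
The mapping above can be sketched as a small transformation function. This is an illustration, not the tool's actual code: the function name `to_keycloak_user`, the blank-string names, and the `"true"` value for `mongo_migrated` are assumptions. The payload shape follows Keycloak's user representation, where attribute values are lists of strings.

```python
from typing import Any


def to_keycloak_user(doc: dict[str, Any], enabled: bool = True) -> dict[str, Any]:
    """Build a Keycloak user payload from a WebProtege Users document (sketch)."""
    email = doc["emailAddress"]
    return {
        "username": email,      # the email doubles as the Keycloak username
        "email": email,
        "enabled": enabled,     # False when a disable filter matched
        "firstName": "",        # left blank: realName is not split
        "lastName": "",
        "requiredActions": [],  # users rely on "Forgot password?" instead
        "attributes": {
            # Keycloak stores each attribute as a list of strings.
            "webprotege_username": [doc["_id"]],
            "mongo_migrated": ["true"],  # assumed marker value
        },
    }
```

No password, salt, or digest field is read from the document, so none can leak into the payload.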
- Python 3.12+
- Access to the WebProtege MongoDB instance
- A running Keycloak instance with the `webprotege` realm created
- Keycloak admin credentials

```bash
cd webprotege-user-migration
pip install -r requirements.txt
```

Sensitive values are read from environment variables. Copy the example file and fill in your values:

```bash
cp .env.example .env
# Edit .env with your actual credentials
```

Required variables (see `.env.example`):

| Variable | Description |
|---|---|
| `MONGODB_URI` | MongoDB connection string |
| `KEYCLOAK_ADMIN_USERNAME` | Keycloak admin username |
| `KEYCLOAK_ADMIN_PASSWORD` | Keycloak admin password |

Non-sensitive settings are in config.yaml. Sensitive values use ${ENV_VAR} placeholders that are resolved from environment variables at runtime:

```yaml
mongodb:
  uri: "${MONGODB_URI}"
  database: "webprotege"
  collection: "Users"
  batch_size: 500
keycloak:
  base_url: "http://webprotege-local.edu/keycloak"
  realm: "webprotege"
  admin_realm: "master"
  client_id: "admin-cli"
  username: "${KEYCLOAK_ADMIN_USERNAME}"
  password: "${KEYCLOAK_ADMIN_PASSWORD}"
  request_delay: 0.1  # seconds between API calls
  max_retries: 3
  retry_backoff: 2.0
```

Edit the Keycloak URL and non-sensitive settings directly in `config.yaml`.
A dry run processes all users through filters and transformation without touching Keycloak. It produces full statistics and CSV logs showing which users would be skipped and why.

```bash
python migrate.py --dry-run
```

Before running a live migration, make sure you have a Keycloak H2 database backup so you can revert if needed. See Keycloak H2 Database Backup and Restore.

```bash
python migrate.py
```

Progress is saved after every batch. If the script is interrupted (Ctrl+C, crash, etc.), re-run it and it picks up where it left off:

```bash
python migrate.py
```

To discard saved progress and start over:

```bash
python migrate.py --reset
```

To use a different configuration file:

```bash
python migrate.py --config /path/to/my-config.yaml
```

The migration includes a configurable filter engine that determines which users to migrate and how. Filters are defined in `config.yaml` and come in three types:
- Exclude filters — user is not migrated if any enabled exclude filter matches (OR logic)
- Disable filters — user is migrated but disabled in Keycloak if any enabled disable filter matches (OR logic)
- Include filters — user must pass all enabled include filters to be migrated (AND logic)
Evaluation order: exclude → disable → include (most restrictive wins).
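
The evaluation order can be sketched as follows. This is one reading of the rules above, not the tool's actual engine: the `Filter` dataclass, the `evaluate` function, and the decision labels are hypothetical, and the sketch assumes that an include filter returns `True` when the user passes and that a skip takes precedence over a disable.

```python
from dataclasses import dataclass
from typing import Callable

FilterFn = Callable[[dict, dict], tuple[bool, str]]


@dataclass
class Filter:
    name: str
    type: str  # "exclude", "disable", or "include"
    fn: FilterFn
    params: dict
    enabled: bool = True


def evaluate(doc: dict, filters: list[Filter]) -> tuple[str, str]:
    """Return (decision, reason); decision is "skip", "disable", or "migrate"."""

    def active(kind: str) -> list[Filter]:
        return [f for f in filters if f.enabled and f.type == kind]

    # 1. Exclude filters: any match skips the user (OR logic).
    for f in active("exclude"):
        matched, reason = f.fn(doc, f.params)
        if matched:
            return ("skip", f"{f.name}: {reason}")

    # 2. Disable filters: any match marks the user as disabled (OR logic).
    disable_reason = ""
    for f in active("disable"):
        matched, reason = f.fn(doc, f.params)
        if matched:
            disable_reason = f"{f.name}: {reason}"
            break

    # 3. Include filters: every one must pass (AND logic).
    for f in active("include"):
        passed, reason = f.fn(doc, f.params)
        if not passed:
            return ("skip", f"{f.name}: {reason}")

    if disable_reason:
        return ("disable", disable_reason)
    return ("migrate", "")
```

Returning the filter name alongside the reason gives the skip log enough detail to explain each decision.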
| Filter | Description |
|---|---|
| `xss_injection` | Excludes users with HTML/script injection patterns in username, realName, or email |
| `invalid_email` | Excludes users with empty or structurally invalid email addresses (bad characters, consecutive dots, non-ASCII, etc.) |
| `username_too_long` | Excludes usernames exceeding a configurable max length (default: 255) |
| `regex_username_blocklist` | Excludes usernames matching blocklist patterns (e.g., blank/whitespace-only) |
| `duplicate_email` | When multiple users share an email, keeps the first alphabetically and skips the rest |

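
For illustration, the `duplicate_email` rule could be computed in one pass over the collection before filtering starts. The sketch below is not the built-in filter: the function name is hypothetical, "first alphabetically" is taken to mean the smallest `_id`, and email comparison is assumed to be case-insensitive.

```python
from collections import defaultdict


def duplicate_email_skips(docs: list[dict]) -> set[str]:
    """Return the _id values to skip because another user keeps the same email."""
    by_email: dict[str, list[str]] = defaultdict(list)
    for doc in docs:
        email = doc.get("emailAddress", "")
        if email:
            # Assumption: emails that differ only by case count as duplicates.
            by_email[email.lower()].append(doc["_id"])

    skips: set[str] = set()
    for usernames in by_email.values():
        # Keep the alphabetically first username; skip every other one.
        skips.update(sorted(usernames)[1:])
    return skips
```

Users with an empty email are ignored here because `invalid_email` already excludes them.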
Disable filters are for users you consider real but inactive. They are migrated to Keycloak with enabled: false. No built-in disable filters are provided — add your own in migration/filters/custom_filters.py and reference them in config.yaml with type: disable.
| Filter | Description |
|---|---|
| `email_domain_whitelist` | Only migrates users from specified email domains |
| `username_regex_whitelist` | Only migrates usernames matching a regex pattern |

- Write a filter function in `migration/filters/custom_filters.py`:

  ```python
  def exclude_test_accounts(doc: dict, params: dict) -> tuple[bool, str]:
      username = doc.get("_id", "")
      if username.lower().startswith("test_"):
          return (True, "Test account")
      return (False, "")
  ```

- Reference it in `config.yaml`:

  ```yaml
  filters:
    - name: exclude_test_accounts
      enabled: true
      type: exclude
  ```

Before running the full migration, test with a small subset of users:
- Enable an include filter in `config.yaml` to limit scope:

  ```yaml
    - name: email_domain_whitelist
      enabled: true
      type: include
      params:
        domains:
          - "your-test-domain.edu"
  ```

- Run the migration:

  ```bash
  python migrate.py
  ```

- Verify the migrated users in the Keycloak admin console and test logging in. Because no passwords are migrated, use "Forgot password?" to set a new password first.
- Once satisfied, disable the include filter, reset progress, and run the full migration:

  ```bash
  python migrate.py --reset
  ```

All logs are written to the `logs/` directory:
| File | Contents |
|---|---|
| `migration.log` | Full migration log with timestamps, batch progress, and final summary |
| `skipped_users.csv` | Every skipped user with: username, email, filter name, reason |
| `failed_users.csv` | Every failed Keycloak API call with: username, email, error detail |

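
A skipped-user log with these columns could be written with the standard `csv` module. The sketch below is only an illustration: the header names in `SKIPPED_COLUMNS` and the function name are assumptions, and the real file may label its columns differently.

```python
import csv
from typing import TextIO

# Assumed header names for the four documented columns.
SKIPPED_COLUMNS = ["username", "email", "filter", "reason"]


def write_skipped(rows: list[dict], out: TextIO) -> None:
    """Write one CSV row per skipped user, preceded by a header row."""
    writer = csv.DictWriter(out, fieldnames=SKIPPED_COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

When writing to a real file, open it with `newline=""` so the `csv` module controls line endings, for example `open("logs/skipped_users.csv", "w", newline="", encoding="utf-8")`.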
The progress state is saved in migration_state.json at the project root. This file tracks last_processed_id and cumulative counters, enabling resume after interruption.
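
The resume mechanism can be sketched with a few helpers. Only `last_processed_id` and the file name come from this README; the counter names, the function names, and the write-then-rename step are assumptions made for the example.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Load saved progress, or return a fresh state if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    # Counter names are hypothetical; the real file may track different totals.
    return {"last_processed_id": None, "migrated": 0, "skipped": 0, "failed": 0}


def save_state(path: Path, state: dict) -> None:
    """Write to a temporary file, then rename, so a crash cannot leave a half-written file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def resume_query(state: dict) -> dict:
    """Build the MongoDB filter that skips records already processed."""
    last_id = state.get("last_processed_id")
    return {"_id": {"$gt": last_id}} if last_id is not None else {}
```

With PyMongo, the query would be used along the lines of `collection.find(resume_query(state)).sort("_id", 1)`, so that the `$gt` bound and the sort order refer to the same field.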
Keycloak uses an embedded H2 database for its internal state (realms, users, sessions, etc.). Before running a migration, back up this database so you can revert if needed.
The Keycloak container name used below is webprotege-deploy-keycloak-1. Replace it with your actual container name if different.
To create a backup:

- Stop Keycloak to ensure a consistent snapshot:

  ```bash
  docker stop webprotege-deploy-keycloak-1
  ```

- Copy the H2 database files from the container to a local backup directory:

  ```bash
  mkdir -p keycloak-h2-backup
  docker cp webprotege-deploy-keycloak-1:/opt/keycloak/data/h2/. ./keycloak-h2-backup/
  ```

- Start Keycloak again:

  ```bash
  docker start webprotege-deploy-keycloak-1
  ```

To restore from a backup:

- Stop Keycloak:

  ```bash
  docker stop webprotege-deploy-keycloak-1
  ```

- Copy the backup files back into the container:

  ```bash
  docker cp ./keycloak-h2-backup/. webprotege-deploy-keycloak-1:/opt/keycloak/data/h2/
  ```

- Fix file ownership inside the container. `docker cp` sets files to `root`, but Keycloak runs as the `keycloak` user:

  ```bash
  docker exec -u 0 webprotege-deploy-keycloak-1 chown -R keycloak:keycloak /opt/keycloak/data/h2/
  ```

- Start Keycloak:

  ```bash
  docker start webprotege-deploy-keycloak-1
  ```

Note: Skipping the `chown` step causes Keycloak to open the database in read-only mode, resulting in `JdbcBatchUpdateException: The database is read only` errors.

- No passwords migrated: MD5-salted hashes are incompatible with Keycloak. Migrated users set their password via the "Forgot password?" flow on first login. Enable "Forgot password" and "User registration" in Realm Settings → Login in the Keycloak admin console.
- Email as username: The user's email address is used as the Keycloak username. This avoids character-validation issues with MongoDB `_id` values and provides a familiar login identifier.
- `_id` stored as `webprotege_username` attribute: The MongoDB `_id` is stored as a Keycloak user attribute (`webprotege_username`). This attribute is the identifier used to query the list of projects that belong to that user.
- Raw field values: The `_id` and `emailAddress` fields are passed through to Keycloak without modification. Altering these values (e.g., stripping characters) could break downstream references such as project ownership. Invalid values are caught by filters before reaching Keycloak.
- Credentials via environment variables: Sensitive values (`MONGODB_URI`, `KEYCLOAK_ADMIN_USERNAME`, `KEYCLOAK_ADMIN_PASSWORD`) are read from environment variables using `${VAR}` placeholders in `config.yaml`.
- Sequential API calls with throttle: Keycloak has no bulk user creation endpoint. A configurable delay (default 0.1s) between calls prevents overloading the server. Estimated full migration time at default rate: ~5 hours for 170K users.
- Resume via state file: The migration cursor is sorted by `_id`. After each batch, the last processed `_id` is saved. On restart, a `$gt` filter skips already-processed records.
- Direct `requests` library: No dependency on `python-keycloak`. Simpler, with fewer version compatibility issues with Keycloak 26.1.
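
The throttle estimate and the retry settings can be illustrated with a short sketch. Both functions are hypothetical. In particular, treating `retry_backoff` as the base of an exponential wait (1s, 2s, 4s with the defaults) is an assumption about how the setting is applied, and `send` stands in for whatever performs the Keycloak API call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def estimated_hours(user_count: int, request_delay: float, seconds_per_call: float = 0.0) -> float:
    """Estimate migration time from the per-call delay plus average call latency."""
    return user_count * (request_delay + seconds_per_call) / 3600


def call_with_retry(
    send: Callable[[], T],
    max_retries: int = 3,
    retry_backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() once, then retry up to max_retries times with growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except Exception:
            # Broad catch keeps the sketch short; real code would retry only
            # on transient errors such as timeouts or 5xx responses.
            if attempt == max_retries:
                raise
            sleep(retry_backoff ** attempt)  # assumed exponential backoff
    raise AssertionError("unreachable")
```

With the defaults, `estimated_hours(170_000, 0.1)` comes to roughly 4.7 hours of delay alone, which is in line with the ~5 hour estimate once request latency is added.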