A software toolkit for the interconversion of standard data models for phenotypic data
📘 Documentation: https://cnag-biomedical-informatics.github.io/convert-pheno
📓 Google Colab tutorial: https://colab.research.google.com/drive/1T6F3bLwfZyiYKD6fl1CIxs9vG068RHQ6?usp=sharing
📦 CPAN Distribution: https://metacpan.org/pod/Convert::Pheno
🐳 Docker Hub Image: https://hub.docker.com/r/manuelrueda/convert-pheno/tags
🌐 Web App UI: https://convert-pheno.cnag.cat
Convert-Pheno is a toolkit for interconverting standard clinical and phenotypic data models.
Supported formats include BFF, PXF, OMOP CDM, REDCap, CDISC-ODM, CSV, and experimental openEHR canonical input.
Typical CLI usage:
convert-pheno -ipxf pxf.json -obff individuals.json
convert-pheno -ipxf pxf.json -obff --entities individuals biosamples datasets cohorts --out-dir out/
convert-pheno -ibff individuals.json -opxf phenopackets.json
convert-pheno -iomop dump.sql.gz -obff individuals.json.gz --stream --ohdsi-db
For backward compatibility, the -iomop ... -obff form still keeps the individuals-only BFF output behavior.
Note: openEHR support is experimental and currently limited to canonical composition input with BFF or PXF output.
See the CLI documentation for the current experimental openEHR usage details.
Internally, most conversions use BFF as the center model before continuing to other output formats when needed.
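The hub-and-spoke idea behind this sentence can be sketched in a few lines of Python. This is only an illustration of routing through a center model: `conversion_path` and `HUB` are hypothetical names, not part of Convert-Pheno, and the real tool may handle some format pairs differently (the text says "most" conversions, not all).

```python
from __future__ import annotations

# Hypothetical illustration only: not Convert-Pheno's internal API.
HUB = "bff"  # the center model named in the text


def conversion_path(source: str, target: str) -> list[str]:
    """Return the models visited when a conversion is routed through the hub."""
    source, target = source.lower(), target.lower()
    if source == target:
        raise ValueError("source and target formats must differ")
    if source == HUB or target == HUB:
        # One side is already the hub, so a single step is enough.
        return [source, target]
    # Otherwise convert to the hub first, then on to the requested output.
    return [source, HUB, target]


if __name__ == "__main__":
    print(conversion_path("omop", "pxf"))  # ['omop', 'bff', 'pxf']
    print(conversion_path("pxf", "bff"))   # ['pxf', 'bff']
```

With a center model, each format only needs a converter to and from BFF rather than one converter per pair of formats.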
BFF output can now be entity-aware through --entities.
Current support:
individuals as the default BFF output entity
biosamples as first-class BFF output for -ipxf when the input contains biosample data
datasets and cohorts synthesized from the normalized individuals collection
Example:
convert-pheno -ipxf pxf.json -obff --entities individuals biosamples datasets cohorts --out-dir out/
This can write:
out/individuals.json
out/biosamples.json
out/datasets.json
out/cohorts.json
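As a rough aid for scripting around the example above, the following Python sketch lists the files one might expect for a given set of entities. It assumes the `<out-dir>/<entity>.json` naming seen in this example; `expected_outputs` is a hypothetical helper, and options that rename output files (or entities the input cannot supply, such as biosamples when no biosample data is present) would change the result.

```python
from pathlib import PurePosixPath


def expected_outputs(entities, out_dir="out"):
    """Map entity names to '<out_dir>/<entity>.json' paths.

    Assumption: default file names follow the pattern shown in the
    example above. This helper is illustrative, not part of the tool.
    """
    return [str(PurePosixPath(out_dir) / f"{entity}.json") for entity in entities]


if __name__ == "__main__":
    for path in expected_outputs(["individuals", "biosamples", "datasets", "cohorts"]):
        print(path)
```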
For mapping-file workflows such as csv2bff, redcap2bff, and cdisc2bff, synthesized datasets and cohorts can be customized through the top-level beacon section of the mapping file.
Mapping-file based tabular conversions now use an entity-aware layout:
project keeps project-level metadata
beacon.individuals contains the semantic mapping rules for Beacon individuals
beacon.datasets, beacon.cohorts, and beacon.biosamples can provide metadata or defaults for emitted Beacon entities
This makes the mapping structure consistent with multi-entity BFF output while keeping individuals as the central normalized model.
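To make the layout concrete, here is a small Python sketch that checks whether an already-parsed mapping (for example, a YAML file loaded into a dict) has the sections named above. The section names come from the text; treating `project` and `beacon.individuals` as required is an assumption made for illustration, `check_mapping_layout` is a hypothetical helper, and the empty dicts stand in for real content described in the project documentation.

```python
from __future__ import annotations

# Optional sections named in the text; their inner fields are not shown here.
OPTIONAL_BEACON_SECTIONS = ("datasets", "cohorts", "biosamples")


def check_mapping_layout(mapping: dict) -> list[str]:
    """Return a list of layout problems; an empty list means none were found.

    Assumption: 'project' and 'beacon.individuals' are treated as required.
    The real tool's validation rules may differ.
    """
    problems = []
    if "project" not in mapping:
        problems.append("missing top-level 'project' section")
    beacon = mapping.get("beacon")
    if not isinstance(beacon, dict):
        problems.append("missing top-level 'beacon' section")
        return problems
    if "individuals" not in beacon:
        problems.append("missing 'beacon.individuals' mapping rules")
    return problems


if __name__ == "__main__":
    # Placeholder mapping: real files carry actual rules inside each section.
    mapping = {"project": {}, "beacon": {"individuals": {}, "datasets": {}}}
    print(check_mapping_layout(mapping))
    present = [name for name in OPTIONAL_BEACON_SECTIONS if name in mapping["beacon"]]
    print(present)
```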
Useful recent options include:
--default-vital-status to control the fallback subject.vitalStatus.status in bff2pxf
--search-audit-tsv to write a TSV report of ontology lookup results for mapping-file conversions
generic -i/-o syntax in addition to the format-specific shortcuts
--out-name key=file to customize filenames in multi-file BFF or OMOP output
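When calling the CLI from a script, it can help to assemble the argument list programmatically. The sketch below builds a bff2pxf command using only flags that appear in this README (`-ibff`, `-opxf`, `--default-vital-status`); `bff2pxf_command` is a hypothetical wrapper, and it assumes `convert-pheno` is on the PATH. It only builds the command and does not run it.

```python
import shlex


def bff2pxf_command(bff_in, pxf_out, default_vital_status=None):
    """Build an argv list for a BFF-to-PXF conversion (illustrative helper)."""
    argv = ["convert-pheno", "-ibff", bff_in, "-opxf", pxf_out]
    if default_vital_status is not None:
        # Fallback value for subject.vitalStatus.status, as described above.
        argv += ["--default-vital-status", default_vital_status]
    return argv


if __name__ == "__main__":
    argv = bff2pxf_command("individuals.json", "phenopackets.json", "UNKNOWN_STATUS")
    # shlex.join quotes arguments safely for display or logging.
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems with unusual file names.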
Detailed installation instructions live in dedicated Markdown docs:
Repository installs that run cpanm --installdeps . may also need system
libraries such as libssl-dev for the SSL/JSONLD dependency chain.
Published documentation:
The CLI now keeps concise built-in help in bin/convert-pheno.
Long-form CLI documentation lives in Markdown:
Repository fixtures under t/ double as runnable examples.
Useful examples:
bin/convert-pheno -ipxf t/pxf2bff/in/pxf.json -obff individuals.json
bin/convert-pheno -ipxf t/pxf2bff/in/pxf_biosamples.json -obff --entities individuals biosamples datasets cohorts --out-dir out/
bin/convert-pheno -icsv t/csv2bff/in/csv_data.csv --mapping-file t/csv2bff/in/csv_mapping.yaml --search-audit-tsv search-audit.tsv -obff individuals.json
bin/convert-pheno -ibff t/bff2pxf/in/individuals.json -opxf phenopackets.json --default-vital-status UNKNOWN_STATUS
bin/convert-pheno -iomop t/omop2bff/in/omop_cdm_eunomia.sql -opxf phenopackets.json
bin/convert-pheno -iomop t/omop2bff/in/gz/omop_cdm_eunomia.sql.gz -obff individuals.json.gz --stream --omop-tables DRUG_EXPOSURE
If you use Convert-Pheno in published work, please cite:
Rueda, M et al. (2024). Convert-Pheno: A software toolkit for the interconversion of standard data models for phenotypic data. Journal of Biomedical Informatics. https://doi.org/10.1016/j.jbi.2023.104558
Manuel Rueda, PhD. CNAG: https://www.cnag.eu

