This is a lightweight proof-of-concept tool for summarising the expression data on a gene page from VEuPathDB databases using OpenAI's GPT-4o model or Anthropic's Claude 4 Sonnet.
- API key for your chosen model:
  - OpenAI: add `OPENAI_API_KEY=xxxxxxxxxxxxx` to a file called `.env` in this directory
  - Anthropic: add `ANTHROPIC_API_KEY=xxxxxxxxxxxxx` to a file called `.env` in this directory
- volta if possible: https://docs.volta.sh/guide/getting-started
  - it takes care of your node and yarn versions
- node - 18.20.5 tested (higher versions will likely work)
- yarn - 1.22.19 tested
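
As an illustration of the API key requirement above, here is a minimal sketch of reading `KEY=value` lines from `.env` text and picking the variable for the chosen model. The helper names `parseDotEnv` and `requireApiKey` are hypothetical and not taken from `src/main.ts`; the real tool may well use a library for this, and this sketch ignores quoting and other `.env` syntax.

```typescript
// Hypothetical helpers for illustration only; not the tool's actual code.
type Model = "OpenAI" | "Claude";

// Parse simple KEY=value lines; blank lines and '#' comments are skipped.
function parseDotEnv(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no '=' at all, or nothing before it
    vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return vars;
}

// Return the key for the chosen model, or fail with a hint about .env.
function requireApiKey(env: Record<string, string>, model: Model): string {
  const varName = model === "Claude" ? "ANTHROPIC_API_KEY" : "OPENAI_API_KEY";
  const key = env[varName];
  if (!key) {
    throw new Error(`Missing ${varName}: add ${varName}=... to .env`);
  }
  return key;
}
```

Failing early with the variable name in the message makes a missing or misnamed key easier to spot than an authentication error from the API.
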
`yarn`

If you have volta installed, this will also make sure you have the right versions of node and yarn; otherwise you'll need to install them manually if you run into any issues.

`yarn build`

This compiles `src/main.ts` into `dist/main.js`.
You can run the script with any gene ID from supported VEuPathDB databases:
With OpenAI GPT-4o (default):
`node dist/main.js PlasmoDB PF3D7_1016300`

With Claude 4 Sonnet:

`node dist/main.js PlasmoDB PF3D7_1016300 --claude`

Supported databases: PlasmoDB, VectorBase, ToxoDB, CryptoDB, FungiDB, GiardiaDB, TrichDB, AmoebaDB, MicrosporidiaDB, PiroplasmaDB, TriTrypDB

Note: Use `node dist/main.js` directly instead of `yarn start` when using the `--claude` flag, as npm/yarn scripts don't pass through additional arguments.
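
The command-line shape described above (a database name, a gene ID and an optional `--claude` flag) could be handled roughly as follows. This is a sketch under assumptions, not the actual logic in `src/main.ts`: the names `parseCliArgs` and `CliOptions` are hypothetical, and whether the real script validates the database name is not stated here.

```typescript
// Hypothetical sketch; the real argument handling in src/main.ts may differ.
const SUPPORTED_DATABASES: readonly string[] = [
  "PlasmoDB", "VectorBase", "ToxoDB", "CryptoDB", "FungiDB", "GiardiaDB",
  "TrichDB", "AmoebaDB", "MicrosporidiaDB", "PiroplasmaDB", "TriTrypDB",
];

interface CliOptions {
  database: string;
  geneId: string;
  model: "OpenAI" | "Claude";
}

// `args` is everything after the script name, e.g. process.argv.slice(2).
function parseCliArgs(args: string[]): CliOptions {
  const useClaude = args.indexOf("--claude") !== -1;
  // Anything not starting with "--" is treated as a positional argument.
  const positional = args.filter((arg) => arg.slice(0, 2) !== "--");
  const [database, geneId] = positional;
  if (database === undefined || geneId === undefined || positional.length !== 2) {
    throw new Error("Usage: node dist/main.js <database> <geneId> [--claude]");
  }
  if (SUPPORTED_DATABASES.indexOf(database) === -1) {
    throw new Error(`Unsupported database: ${database}`);
  }
  // OpenAI is the default; --claude switches to the Anthropic model.
  return { database, geneId, model: useClaude ? "Claude" : "OpenAI" };
}
```

For example, `parseCliArgs(["PlasmoDB", "PF3D7_1016300", "--claude"])` selects the Claude model, while omitting the flag falls back to OpenAI.
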
It will output three files in the example-output directory:
- `GENE_ID.01.MODEL.summaries.json` - the per-experiment AI summaries (JSON)
- `GENE_ID.01.MODEL.summary.json` - the AI summary-of-summaries and grouping (JSON)
- `GENE_ID.01.MODEL.summary.html` - a nice HTML version of the summary

Where `MODEL` is either `OpenAI` or `Claude` depending on which API you used.
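
To make the naming pattern concrete, here is a small sketch that builds the three paths for a given gene ID and model. The function name `outputFileNames` is hypothetical, and the `01` segment is copied literally from the pattern above, since its meaning is not explained here.

```typescript
// Hypothetical helper; mirrors the GENE_ID.01.MODEL.* pattern from the README.
type ModelLabel = "OpenAI" | "Claude";

interface OutputFiles {
  summaries: string; // per-experiment AI summaries (JSON)
  summary: string;   // summary-of-summaries and grouping (JSON)
  html: string;      // HTML rendering of the summary
}

function outputFileNames(
  geneId: string,
  model: ModelLabel,
  dir: string = "example-output"
): OutputFiles {
  // "01" is taken verbatim from the documented file names.
  const stem = `${dir}/${geneId}.01.${model}`;
  return {
    summaries: `${stem}.summaries.json`,
    summary: `${stem}.summary.json`,
    html: `${stem}.summary.html`,
  };
}
```

Under these assumptions, `outputFileNames("PF3D7_1016300", "Claude").html` would be `example-output/PF3D7_1016300.01.Claude.summary.html`.
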
To view the HTML, open it as a local file in your web browser (usually Ctrl-O).
You can commit any generated files to the repo if you like (within reason)!
- OpenAI API key
  - add `OPENAI_API_KEY=xxxxxxxxxxxxx` to a file called `.env` in this directory
- Docker installed on your system
To build the Docker image, use the following command:
`docker build -t expression-shepherd .`

To start a container from the image and get a shell, use the command below. It mounts `./example-output` inside the container, so any outputs will also appear in the host filesystem.

`docker run -it --rm --env-file .env -v $(pwd)/example-output:/app/example-output expression-shepherd sh`

The container will be removed when you exit the shell (but the image will not).
If the container is already running but you need a new shell:
`docker ps`

Find the `CONTAINER_ID` in the output, then:

`docker exec -it --env-file .env <CONTAINER_ID> sh`

You can then manually run the script (see step 3. in the non-Docker section above):

`node dist/main.js PlasmoDB PF3D7_0818900`

Or you can just run the script at container launch time:

`docker run -d --rm --env-file .env -v $(pwd)/example-output:/app/example-output expression-shepherd node dist/main.js PlasmoDB PF3D7_0818900`

Or like this in an already running container:

`docker exec -it --env-file .env <CONTAINER_ID> node dist/main.js PlasmoDB PF3D7_0818900`

Note that volta is not available in the node container, but it does have suitable versions of node and yarn installed anyway.