Good first issue: add a DuckDB or Polars recipe for the JSONL.GZ dataset #2

@generatedgallerybot

Goal

Add a tiny downstream analysis recipe so people can actually use the dataset without writing boilerplate.

Input

Use one of:

Suggested output

A short example under examples/ that shows how to:

  • load the compressed JSONL
  • count top styles/subjects/model families
  • sample records by label
  • print prompt + source URL for inspection

DuckDB, Polars, or plain Python are all fine.
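
As a starting point, here is a minimal plain-Python sketch of the four steps above, using only the standard library. The field names (`style`, `subject`, `model_family`, `label`, `prompt`, `source_url`) and the helper names are assumptions for illustration; they are not taken from the dataset schema and would need to be adjusted to match the real keys. The sketch only prints source URLs and does not download anything.

```python
import gzip
import json
import random
from collections import Counter, defaultdict


def load_jsonl_gz(path):
    """Yield one dict per non-empty line of a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def top_values(records, field, n=10):
    """Return the n most common values of `field`, skipping missing values."""
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    return counts.most_common(n)


def sample_by_label(records, label_field="label", k=3, seed=0):
    """Group records by label and draw up to k records from each group."""
    groups = defaultdict(list)
    for r in records:
        groups[r.get(label_field)].append(r)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        label: rng.sample(rows, min(k, len(rows)))
        for label, rows in groups.items()
    }


def describe(record):
    """Format prompt + source URL for manual inspection (URL is not fetched)."""
    return f"{record.get('prompt', '')}\n  source: {record.get('source_url', '')}"


def main(path):
    records = list(load_jsonl_gz(path))
    # Assumed field names; replace with the dataset's actual keys.
    for field in ("style", "subject", "model_family"):
        print(field, top_values(records, field, n=5))
    for label, rows in sample_by_label(records).items():
        print(f"--- {label} ---")
        for r in rows:
            print(describe(r))
```

Calling `main()` with the path to the `.jsonl.gz` file prints the counts and a small sample per label. This version reads the whole file into memory, which is fine for a small dataset; a DuckDB or Polars version would likely be shorter and scale better.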

Guardrails

  • No paid APIs.
  • No secrets.
  • Include the media-rights caveat if the recipe touches image URLs.
