Vector32-3D: Vector-Based, 32-Bit-Color Image-to-3D Pipeline

Overview

This project aims to build an AI pipeline that converts 2D images (with 32-bit color) into 3D models using vector-based geometry (curves) and neural implicit surfaces (SDF), outputting high-fidelity, HDR-ready 3D assets in USD/glTF format.

Features

  • Input: High-res raster images (PNG/EXR, 32-bit color)
  • Supports multi-image (multi-view) input for each 3D object
  • Vectorize silhouettes to SVG curves
  • Dual encoders: CNN for raster, MLP/Transformer for curves
  • Neural SDF for geometry, color head for 32-bit RGBA
  • Differentiable rendering (PyTorch3D/NVDiffRast/redner)
  • Losses: photometric, silhouette IoU, eikonal, curve reconstruction, total variation
  • Progressive training and validation
  • Export watertight mesh with 32-bit textures (PNG/EXR)
  • Package as USD/glTF
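As a hedged sketch of two of the losses listed above (function names and numpy-based signatures are illustrative, not the repo's actual API):

```python
import numpy as np

def eikonal_loss(grads: np.ndarray) -> float:
    """Encourage the SDF gradient to have unit norm everywhere.

    grads: (N, 3) array of SDF gradients at sampled points.
    """
    norms = np.linalg.norm(grads, axis=-1)
    return float(np.mean((norms - 1.0) ** 2))

def silhouette_iou_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """1 - soft IoU between a rendered and a reference silhouette (values in [0, 1])."""
    inter = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - inter
    return float(1.0 - inter / (union + eps))
```

In practice these would be computed on autograd tensors (e.g. PyTorch) so gradients flow through the differentiable renderer; the numpy version only shows the math.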

Project Structure

vector32-3D/
│
├── data/                # Datasets: images, SVGs, poses
├── vectorization/       # SVG extraction and cleaning
├── model/               # Encoders, SDF, color head, losses
├── renderer/            # Differentiable rendering
├── training/            # Training and validation scripts
├── export/              # Mesh extraction, UV unwrapping, texture baking
├── packaging/           # Export to USD/glTF
├── research/            # References and papers
└── README.md

Roadmap

  1. Define 3D representation (curves + SDF + 32-bit RGBA)
  2. Prepare dataset: raster images, SVG curves, camera poses
    • Ensure dataset supports multiple images per object (multi-view)
  3. Vectorization pipeline (SVG extraction/cleaning)
  4. Dual-encoder model (CNN + MLP/Transformer)
    • Design model to accept and aggregate features from multiple images/views
  5. SDF decoder and color head
  6. Differentiable renderer integration
  7. Loss functions
  8. Training loop
    • Implement batching and aggregation for multi-image input
  9. Validation
    • Validate on held-out multi-view images
  10. Mesh extraction and UV unwrapping
  11. Texture baking and export
  12. Packaging (USD/glTF)
  13. Research tracking

References

  • diffvg, PyTorch3D, NVDiffRast, redner, NeuS, SIREN, EG3D, Illustration2VecSDF, OpenEXR, xatlas, Blender API

Getting Started

  1. Clone the repo and install dependencies (see requirements.txt).

  2. Prepare your dataset in data/.

    • Organize images so each object has multiple views (multi-image support).
    • If you use Blender, you can run blenderEXRdatasetscript.py to help generate your dataset. Before running it, edit the asset folder and output root paths in the script to point at the folders you are using.

    After that, you can run the script with:

    "file path to your Blender.exe" --background --python "file path to blenderEXRdatasetscript.py"
    
  3. Follow the roadmap to implement each module.

License

MIT

Multi-View Dataset Pipeline

Overview

This pipeline provides a robust, extensible framework for loading, validating, and processing multi-view 3D datasets with 32-bit RGBA images, camera metadata, and downstream integration for vectorization, neural SDF, and asset schema workflows.

Architecture/Data Flow

Images/Metadata → [Image Loader] → [Camera Metadata Parser] → [View Alignment] → [Consistency Checker] → [Pipeline Integration] → [Vectorization/SDF/Asset Schema]

Module Usage

Image Loader (data/image_loader.py)

  • Supports 32-bit RGBA PNG and EXR files
  • Usage:
    from data.image_loader import ImageLoader
    arr = ImageLoader.load_image('view_000.exr')

EXR Image (data/exr_image.py)

  • Dedicated EXR loader/writer for 32-bit float RGBA
  • Usage:
    from data.exr_image import EXRImage
    arr = EXRImage.load('img.exr')
    EXRImage.save('out.exr', arr)

Camera Metadata Parser (data/camera_metadata.py)

  • Loads/validates camera intrinsics/extrinsics from JSON/CSV
  • Usage:
    from data.camera_metadata import CameraMetadataParser
    entries = CameraMetadataParser.load('cameras.json')

View Alignment (data/view_alignment.py)

  • Normalizes/aligns camera poses to canonical frame
  • Usage:
    from data.view_alignment import ViewAlignment
    aligned_Rs, aligned_ts = ViewAlignment.align_cameras(Rs, ts)
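A minimal sketch of what canonical-frame alignment could look like, assuming world-to-camera extrinsics of the form x_cam = R·x + t (this mirrors the call above but is not necessarily the repo's implementation):

```python
import numpy as np

def align_cameras(Rs, ts):
    """Re-express all poses relative to the first camera, which becomes the identity."""
    R0, t0 = Rs[0], ts[0]
    aligned_Rs, aligned_ts = [], []
    for R, t in zip(Rs, ts):
        R_new = R @ R0.T          # rotation relative to camera 0
        t_new = t - R_new @ t0    # translation in the canonical frame
        aligned_Rs.append(R_new)
        aligned_ts.append(t_new)
    return aligned_Rs, aligned_ts
```

After alignment, camera 0 has R = I and t = 0, and all other views are expressed relative to it.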

Consistency Checker (data/consistency_checker.py)

  • Validates image/metadata count, pose consistency, outliers
  • Usage:
    from data.consistency_checker import ConsistencyChecker
    ok = ConsistencyChecker.run_all_checks(images, metadata, 'cameras.json')
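An illustrative subset of the checks such a module might run, assuming metadata entries shaped like the cameras.json example below (this sketch is not the repo's actual implementation):

```python
import numpy as np

def run_basic_checks(images, metadata):
    """Check that counts match and every rotation matrix is a valid rotation."""
    if len(images) != len(metadata):
        return False
    for entry in metadata:
        R = np.asarray(entry["extrinsics"]["R"], dtype=np.float64)
        # A valid rotation matrix is orthonormal with determinant +1.
        if not np.allclose(R @ R.T, np.eye(3), atol=1e-5):
            return False
        if not np.isclose(np.linalg.det(R), 1.0, atol=1e-5):
            return False
    return True
```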

Pipeline Integration (data/pipeline_integration.py)

  • Connects validated data to vectorization, SDF, asset schema
  • Usage:
    from data.pipeline_integration import PipelineIntegration
    integration = PipelineIntegration(asset)
    views = integration.get_all_views()

Sample Directory Structure

dataset_root/
  asset_001/
    images/
      view_000.exr
      view_001.exr
    cameras.json
    curves.json
    metadata.json

Metadata Format Examples

cameras.json

[
  {
    "view_id": "0",
    "image_path": "images/view_000.exr",
    "intrinsics": [[1,0,0],[0,1,0],[0,0,1]],
    "extrinsics": {"R": [[1,0,0],[0,1,0],[0,0,1]], "t": [0,0,0]}
  }
]

cameras.csv

view_id,image_path,K_00,K_01,K_02,K_10,K_11,K_12,K_20,K_21,K_22,R_00,R_01,R_02,R_10,R_11,R_12,R_20,R_21,R_22,t_0,t_1,t_2
0,images/view_000.exr,1,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0
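The flattened CSV layout above can be parsed back into 3x3 matrices with a few lines of stdlib + numpy (a sketch; `parse_camera_csv` is an illustrative helper, not part of the repo's API):

```python
import csv
import io

import numpy as np

def parse_camera_csv(text):
    """Parse the flattened cameras.csv layout into K, R (3x3) and t (3,) per view."""
    cameras = {}
    for row in csv.DictReader(io.StringIO(text)):
        K = np.array([[float(row[f"K_{i}{j}"]) for j in range(3)] for i in range(3)])
        R = np.array([[float(row[f"R_{i}{j}"]) for j in range(3)] for i in range(3)])
        t = np.array([float(row[f"t_{i}"]) for i in range(3)])
        cameras[row["view_id"]] = {"image_path": row["image_path"], "K": K, "R": R, "t": t}
    return cameras
```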

Downstream Integration

  • Vectorization: Use to_vectorization_format() from PipelineIntegration for curve extraction.
  • SDF Training: Use to_sdf_training_format() for neural SDF workflows.
  • Asset Schema: All data is compatible with MultiViewAsset for export and further processing.

Running Validation Tests

  • Run all tests:
    python -m data.run_tests
  • All modules will report PASS/FAIL. See logs for error details.
  • Troubleshooting: Check for missing files, shape mismatches, or metadata errors in logs.

Extending the Pipeline

  • Add new image formats: Extend ImageLoader and EXRImage.
  • Add validation rules: Update ConsistencyChecker.
  • Add downstream modules: Implement new adapters in PipelineIntegration.
  • Update documentation and tests for all changes.

Onboarding & Contribution

  • All modules are documented with docstrings and usage examples.
  • See docs/ for architecture diagrams and advanced usage.
  • New contributors can get started quickly by following this README and running the test suite.

For more details, see code docstrings and the docs/ directory.

ColorField API Documentation

Purpose

ColorField is a modular, numpy-based UV space color predictor for 32-bit RGBA and HDR (float32) color fields. It is designed for research extensibility, robust input validation, and high-fidelity 3D asset generation.

Constructor Parameters

  • input_dim (int): Input dimension (default: 2 for UV).
  • output_dim (int): Output dimension (default: 4 for RGBA).
  • hidden_layers (List[int]): Hidden layer sizes (default: [64, 64]).
  • activation (Callable): Activation function (default: np.tanh).
  • use_hdr (bool): If True, output is float32 (HDR); else, output is clamped to [0,1] (LDR).
  • logger (logging.Logger): Optional logger for debug/info/warning messages.
  • seed (int): Optional random seed for reproducibility.

Main Methods

  • forward(uv: np.ndarray) -> np.ndarray: Predict RGBA color for given UV coordinates.
  • predict_multi_view(uv_list: List[np.ndarray]) -> np.ndarray: Aggregate predictions from multiple UV arrays for multi-view consistency.
  • scale_texture(colors: np.ndarray, target_shape: tuple) -> np.ndarray: Scale predicted color texture to a new resolution (nearest neighbor).
  • add_custom_head(func: Callable): Add a custom color head for research (e.g., spectral mapping).

Usage Examples

from model.colorfield import ColorField
import numpy as np

cf = ColorField(input_dim=2, output_dim=4, use_hdr=False, seed=42)
uv = np.random.rand(10, 2).astype(np.float32)
colors = cf.forward(uv)

# Multi-view consistency
uv1 = np.random.rand(10, 2).astype(np.float32)
uv2 = np.random.rand(10, 2).astype(np.float32)
agg_colors = cf.predict_multi_view([uv1, uv2])

# Texture scaling
tex = np.random.rand(32, 32, 4).astype(np.float32)
scaled = cf.scale_texture(tex, (64, 64))

# Extensibility: custom color head
cf.add_custom_head(lambda x: np.ones_like(x) * 0.5)
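One plausible aggregation strategy behind predict_multi_view is to average the per-view predictions elementwise. A minimal sketch (the mean strategy and the helper name are assumptions, not necessarily what ColorField does internally):

```python
import numpy as np

def predict_multi_view_mean(forward, uv_list):
    """Aggregate per-view color predictions by elementwise averaging (illustrative)."""
    preds = [forward(uv) for uv in uv_list]   # one (N, 4) prediction per view
    return np.mean(np.stack(preds, axis=0), axis=0)
```

Averaging keeps the output shape identical to a single-view prediction while smoothing out per-view noise; other strategies (e.g. visibility-weighted means) drop in by replacing the `np.mean` call.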

Configuration Options

  • LDR vs HDR: Set use_hdr=True for float32 output (HDR, unclamped), or False for [0,1] clamped output (LDR).
  • Extensibility: Use add_custom_head to experiment with new color field heads (e.g., neural texture compression, spectral color fields).
  • Logging: Pass a custom logger or use the default for debug/info/warning output.
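For reference, nearest-neighbor texture scaling (as used by scale_texture) can be sketched in a few lines of numpy; this is an illustration of the technique, not the class's actual code:

```python
import numpy as np

def scale_texture_nearest(tex, target_shape):
    """Resize an (H, W, C) texture with nearest-neighbor sampling."""
    h, w = tex.shape[:2]
    th, tw = target_shape
    # Map each target pixel back to its nearest source pixel.
    rows = (np.arange(th) * h) // th
    cols = (np.arange(tw) * w) // tw
    return tex[rows[:, None], cols[None, :]]
```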

Testing

Unit and integration tests for ColorField are in model/test_model_colorfield.py. Run with:

python -m unittest model/test_model_colorfield.py

Tests cover:

  • Color prediction (LDR/HDR)
  • Multi-view consistency
  • Input validation (shapes/types/ranges)
  • Extensibility (custom head)
  • Texture scaling

Research Paper Logging API

The training progress tracking system now supports logging research paper citations and metadata via the VisualizationManager:

API

VisualizationManager.log_research_paper(paper: Dict[str, Any])
  • paper: Dictionary with fields such as title, authors, doi, bibtex, pdf_url, notes, etc.
  • Each backend (JSON, CSV, REST, etc.) will store or transmit the citation appropriately.

Example Usage

paper = {
    'title': 'A Great Paper',
    'authors': 'Doe, J.; Smith, A.',
    'doi': '10.1234/example.doi',
    'bibtex': '@article{doe2024, ...}',
    'pdf_url': 'http://example.com/paper.pdf',
    'notes': 'Key reference for method.'
}
manager.log_research_paper(paper)
