Happy New Year 2026 — wishing you good health and a positive year ahead. — Dr. Kusse Sukuta Bersha (PhD)

Research Project

STEM Gaussian Simulator

Phenomenological Gaussian Projection Model

Overview

Physics-informed simulator generating synthetic STEM nanocluster images with deterministic reproducibility for machine learning research.

Machine learning research on nanocluster microscopy requires large labeled datasets with known ground truth, but experimental data is expensive to acquire and its ground truth is often ambiguous. The simulator must therefore generate physically plausible synthetic STEM images that are deterministic and reproducible.

Note: This is a phenomenological proxy for ML benchmarking, not a multislice simulation. The Gaussian model approximates projected intensities for controlled experiments.

The Phenomenological Gaussian Projection Model

The simulator uses a simplified physics model where each atomic column contributes a 2D Gaussian to the final projection. This approach balances physical plausibility with computational efficiency.
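
As a rough illustration of the model described above, the following minimal sketch sums one isotropic 2D Gaussian per projected atom position. The function name `render_projection`, the input layout, the choice of Å as the unit for σ, and the treatment of α as a plain amplitude multiplier are all assumptions made for illustration; the actual simulator may define them differently.

```python
import numpy as np

def render_projection(xy, sigma, alpha, pixel_size, shape):
    """Sum one isotropic 2D Gaussian per projected atom position.

    xy: (N, 2) array of projected positions in Å (hypothetical input layout).
    sigma: Gaussian width, assumed to be in Å.
    alpha: amplitude multiplier (assumed role; the real model may differ).
    """
    ny, nx = shape
    # Pixel-centre coordinates in Å, with the origin at the image centre.
    x = (np.arange(nx) - (nx - 1) / 2) * pixel_size
    y = (np.arange(ny) - (ny - 1) / 2) * pixel_size
    # Squared distance from every pixel to every atom, shape (N, ny, nx).
    dx = x[None, None, :] - xy[:, 0, None, None]
    dy = y[None, :, None] - xy[:, 1, None, None]
    r2 = dx**2 + dy**2
    return alpha * np.exp(-r2 / (2 * sigma**2)).sum(axis=0)

# Two atoms 2 Å apart, rendered on a 41 x 41 grid at 0.2 Å/pixel.
xy = np.array([[-1.0, 0.0], [1.0, 0.0]])
image = render_projection(xy, sigma=0.6, alpha=1.0, pixel_size=0.2, shape=(41, 41))
```

The broadcasted form keeps the sketch short but allocates an (N, ny, nx) array, so a real implementation would likely accumulate atom by atom or crop each Gaussian to a local window.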

Key Parameters

σ (sigma): Gaussian width controlling spatial spread

α (alpha): Amplitude parameter for intensity scaling

pixel_size: Physical spacing in image space (Å/pixel)

noise_preset: Noise model (Poisson, Gaussian, or mixed)

All parameters are tracked in manifest files ensuring full reproducibility of generated datasets.
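
To make the three noise models listed under noise_preset concrete, here is a minimal sketch of how Poisson, Gaussian, and mixed noise could be applied with a seeded generator. The function `add_noise` and its `dose` and `read_sigma` parameters are hypothetical; the page does not document how presets such as "poisson_low" map to numeric settings.

```python
import numpy as np

def add_noise(image, model, rng, dose=100.0, read_sigma=0.05):
    """Apply shot noise, additive Gaussian noise, or both to a non-negative image.

    dose (expected counts per unit intensity) and read_sigma are
    hypothetical knobs, not parameters taken from the simulator.
    """
    if model not in ("poisson", "gaussian", "mixed"):
        raise ValueError(f"unknown noise model: {model}")
    noisy = np.asarray(image, dtype=np.float64)
    if model in ("poisson", "mixed"):
        # Shot noise: draw counts with mean image * dose, then rescale.
        noisy = rng.poisson(noisy * dose) / dose
    if model in ("gaussian", "mixed"):
        # Additive zero-mean Gaussian noise.
        noisy = noisy + rng.normal(0.0, read_sigma, size=noisy.shape)
    return noisy

# The same seed reproduces the same noise realisation.
clean = np.full((8, 8), 0.5)
noisy_a = add_noise(clean, "mixed", np.random.default_rng(42))
noisy_b = add_noise(clean, "mixed", np.random.default_rng(42))
```

Passing the generator in explicitly, rather than relying on global random state, is what lets a stored seed fully determine the noise pattern.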

Parameter Effects

The σ (width) and α (contrast) parameters control the appearance of simulated images. Larger σ creates more diffuse features, while higher α increases Z-contrast between atoms.

Parameter sweep showing σ × α variations in Gaussian PSF model

4×4 parameter grid: σ ∈ {0.3, 0.6, 0.9, 1.2} × α ∈ {0.5, 1.0, 1.5, 2.0}
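
The 4×4 grid above can be enumerated as a Cartesian product. This is only a sketch of how such a sweep might be set up; the dictionary keys follow the manifest's simulation_params fields, and the variable names are illustrative.

```python
from itertools import product

sigmas = [0.3, 0.6, 0.9, 1.2]
alphas = [0.5, 1.0, 1.5, 2.0]

# One parameter dictionary per grid cell, in row-major order (sigma varies slowest).
grid = [{"sigma": s, "alpha": a} for s, a in product(sigmas, alphas)]

for params in grid:
    print(params)
```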

Structural Classes

The simulator generates projections for four distinct Au55 nanocluster geometries, each with different symmetry properties and energy landscapes.

Four structural classes: Cuboctahedral, Icosahedral, Decahedral, and Garzon

Icosahedral (Ih) is the lowest energy structure for Au55, but cuboctahedral (Oh) and decahedral (D5h) are also thermally accessible. Garzon structures represent irregular geometries with lower symmetry.

Reproducibility Guarantee

Every generated image is deterministically reproducible using the manifest-based contract system. The same parameters and seed will always produce bit-identical outputs across platforms.
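
One way to check a bit-identity claim like this is to hash the raw bytes of two independently generated arrays and compare the digests. The sketch below uses a stand-in `generate` function rather than the simulator itself, and both function names are hypothetical. It demonstrates the check on a single machine and does not by itself establish cross-platform identity.

```python
import hashlib
import numpy as np

def array_digest(arr):
    """SHA-256 over the dtype, shape, and raw bytes of an array."""
    arr = np.ascontiguousarray(arr)
    h = hashlib.sha256()
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    h.update(arr.tobytes())
    return h.hexdigest()

def generate(seed, shape=(64, 64)):
    # Stand-in for the simulator: its only source of randomness is the seed.
    rng = np.random.default_rng(seed)
    return rng.poisson(5.0, size=shape).astype(np.float32)

first = array_digest(generate(42))
second = array_digest(generate(42))
print(first == second)  # True when the two runs are bit-identical
```

Including the dtype and shape in the digest guards against two arrays that happen to share the same byte content but differ in layout.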

Example Manifest Entry

{
  "image_id": "Au55_icosahedron_001",
  "structure_id": "Au55",
  "geometry": "icosahedron",
  "orientation_seed": 42,
  "simulation_params": {
    "sigma": 0.6,
    "alpha": 1.0,
    "pixel_size": 0.2,
    "noise_preset": "poisson_low"
  },
  "dataset_version": "nc_projection_gaussian_v1",
  "git_tag": "paper1-freeze",
  "created_at": "2024-01-15T10:30:00Z"
}
Git commit hash (SHA-1)
5c429dd64446b2ed37fad0b6b99490588be1c745

Submodule hash at paper1-freeze tag

Dataset Version: Each dataset is immutable and versioned (e.g., nc_projection_gaussian_v1)

Git Tag: Code version is pinned (paper1-freeze) ensuring the exact simulator implementation can be retrieved

Seed Tracking: RNG seed stored per image enables exact regeneration of random noise patterns

System Architecture

The simulator implements a phenomenological Gaussian projection model that simulates atomic column contributions with configurable parameters (σ, α, pixel size, noise preset). It uses seeded random number generation for deterministic reproducibility and manifest-based dataset tracking.

Air-Gapped Design

  • Simulation Layer: Physics-based image generation (NumPy, SciPy, ASE) — no ML dependencies
  • ML Layer: CNN training and evaluation (PyTorch) — no simulation code imports
  • Interface: File system only — manifests and NumPy arrays in data/processed/datasets/

This separation allows independent evolution of physics and ML components while maintaining a clean contract.
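
A separation like this can be guarded with a static import check. The sketch below parses source text with the standard-library `ast` module and reports forbidden top-level imports. The package name `nc_sim` is hypothetical, and nothing here is taken from the repository's actual tooling.

```python
import ast

def imported_modules(source):
    """Return the top-level module names imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found

def violations(source, forbidden):
    """List forbidden modules that the source imports, sorted by name."""
    return sorted(imported_modules(source) & set(forbidden))

simulation_source = "import numpy as np\nfrom ase.io import read\n"
# nc_sim is a hypothetical name for the simulation package.
ml_source = "import torch\nfrom nc_sim.render import render\n"

print(violations(simulation_source, {"torch"}))  # []
print(violations(ml_source, {"nc_sim"}))         # ['nc_sim']
```

Because the check only parses text, it runs without PyTorch or ASE installed, which makes it suitable for a lightweight test or pre-commit step.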

Validation & Results

Reproducibility: Deterministic (seeded RNG). Bit-level identical outputs with the same seed.

Code version: paper1-freeze. Pinned git tag for publication.

Parameter tracking: Manifest-based. JSON metadata for every generated image.

Limitations

  • Simplified physics model may not capture all experimental artifacts and detector noise patterns.
  • Limited to projection geometry; does not model full 3D dynamic scattering.
  • Requires domain expertise to select physically reasonable parameter ranges.

Future Directions

  • Validate against experimental STEM data to quantify realism gap.
  • Add dynamic scattering effects for more accurate physics.
  • Expand noise models beyond Poisson/Gaussian to capture detector artifacts.

Reproduce Locally

Clone the repository with submodules and checkout the exact tag used for this showcase:

# Clone with submodules
git clone --recurse-submodules https://github.com/ksbk/drkusse.git
cd drkusse/external/nanocluster-cnn

# Checkout pinned tag
git checkout paper1-freeze

# Install dependencies
make install-dev

# Generate demo dataset (instant)
nc-make-demo

# Or generate custom dataset
python scripts/datasets/generate_projection_gaussian.py \
  --xyz-dir data/raw/structures \
  --output-dir data/processed/datasets/my_dataset \
  --sigma 0.6 --alpha 1.0 --pixel-size 0.2 \
  --seed 42

See the README for full documentation.
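
The custom-dataset command above takes the same parameters that a manifest entry records, so a regeneration command could in principle be rebuilt from a manifest. The sketch below shows one way to do that. Mapping orientation_seed to --seed is an assumption, and noise_preset is left out because no corresponding flag appears on this page.

```python
import json
import shlex

entry = json.loads("""
{
  "image_id": "Au55_icosahedron_001",
  "orientation_seed": 42,
  "simulation_params": {
    "sigma": 0.6,
    "alpha": 1.0,
    "pixel_size": 0.2,
    "noise_preset": "poisson_low"
  }
}
""")

def regeneration_command(entry, xyz_dir, output_dir):
    """Build a shell command from a manifest entry (hypothetical helper)."""
    params = entry["simulation_params"]
    args = [
        "python", "scripts/datasets/generate_projection_gaussian.py",
        "--xyz-dir", xyz_dir,
        "--output-dir", output_dir,
        "--sigma", str(params["sigma"]),
        "--alpha", str(params["alpha"]),
        "--pixel-size", str(params["pixel_size"]),
        # Assumption: the manifest's orientation_seed is what --seed expects.
        "--seed", str(entry["orientation_seed"]),
    ]
    return shlex.join(args)

command = regeneration_command(
    entry, "data/raw/structures", "data/processed/datasets/my_dataset"
)
print(command)
```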

The Gaussian projection model transforms atomic coordinates into synthetic STEM images with configurable noise and parameters.

Quick Facts

Type: Research Tool

Status: In Development

Version: paper1-freeze

License: MIT

Technologies

Computer Vision, Simulation, Reproducibility, Research

This is a submodule integration showcasing the research project. The simulator runs independently in the nanocluster-cnn repository.