STEM Gaussian Simulator
Phenomenological Gaussian Projection Model
Overview
Physics-informed simulator generating synthetic STEM nanocluster images with deterministic reproducibility for machine learning research.
Machine learning research on nanocluster microscopy requires large labeled datasets with known ground truth, but experimental data is expensive to acquire and its ground truth is often ambiguous. The simulator therefore needs to generate physically plausible synthetic STEM images that are deterministic and reproducible.
The Phenomenological Gaussian Projection Model
The simulator uses a simplified physics model where each atomic column contributes a 2D Gaussian to the final projection. This approach balances physical plausibility with computational efficiency.
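As an illustration of the idea, here is a minimal NumPy sketch of a Gaussian projection. It is not the simulator's actual code: the function name `render_projection`, the `image_size` argument, and the choice of `Z**alpha` as the per-atom amplitude are all assumptions made for this example, and σ is assumed to be expressed in ångströms.

```python
import numpy as np


def render_projection(positions, atomic_numbers, sigma=0.6, alpha=1.0,
                      pixel_size=0.2, image_size=64):
    """Sum one 2D Gaussian per atom onto a square pixel grid.

    positions      : (N, 3) array of x, y, z coordinates in angstroms
    atomic_numbers : (N,) array of Z values
    Assumption: each atom's peak amplitude is Z**alpha and sigma is in
    angstroms. The real simulator may define both differently.
    """
    positions = np.asarray(positions, dtype=float)
    amplitudes = np.asarray(atomic_numbers, dtype=float) ** alpha

    # Pixel-centre coordinates in angstroms, origin at the image centre.
    axis = (np.arange(image_size) - (image_size - 1) / 2.0) * pixel_size
    grid_x, grid_y = np.meshgrid(axis, axis)  # each (image_size, image_size)

    # Project along z: only x and y enter the image.
    dx = grid_x[None, :, :] - positions[:, 0, None, None]
    dy = grid_y[None, :, :] - positions[:, 1, None, None]
    gaussians = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))

    # Atoms that share a column overlap and their contributions add up.
    return np.tensordot(amplitudes, gaussians, axes=1)
```

In this sketch, two atoms at the same (x, y) but different z produce exactly twice the image of a single atom, which is the column-summing behaviour the model relies on.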
Key Parameters
- σ (sigma): Gaussian width controlling spatial spread
- α (alpha): Amplitude parameter for intensity scaling
- pixel_size: Physical spacing in image space (Å/pixel)
- noise_preset: Noise model (Poisson, Gaussian, or mixed)
All parameters are tracked in manifest files ensuring full reproducibility of generated datasets.
Parameter Effects
The σ (width) and α (contrast) parameters control the appearance of simulated images. Larger σ creates more diffuse features, while higher α increases Z-contrast between atoms.
4×4 parameter grid: σ ∈ {0.3, 0.6, 0.9, 1.2} × α ∈ {0.5, 1.0, 1.5, 2.0}
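The grid above can be enumerated with a few lines of standard-library Python. The helper name `parameter_grid` and the dictionary layout are illustrative choices, not part of the simulator's interface.

```python
from itertools import product

SIGMAS = (0.3, 0.6, 0.9, 1.2)
ALPHAS = (0.5, 1.0, 1.5, 2.0)


def parameter_grid(sigmas=SIGMAS, alphas=ALPHAS):
    """Return one dict per (sigma, alpha) pair, with sigma varying slowest."""
    return [{"sigma": s, "alpha": a} for s, a in product(sigmas, alphas)]


grid = parameter_grid()
print(len(grid))  # 16
print(grid[0])    # {'sigma': 0.3, 'alpha': 0.5}
```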
Structural Classes
The simulator generates projections for four distinct Au55 nanocluster geometries, each with different symmetry properties and energy landscapes.
Icosahedral (Ih) is the lowest-energy structure for Au55, but cuboctahedral (Oh) and decahedral (D5h) structures are also thermally accessible. Garzon structures are irregular geometries with lower symmetry.
Reproducibility Guarantee
Every generated image is deterministically reproducible using the manifest-based contract system. The same parameters and seed will always produce bit-identical outputs across platforms.
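The role of the seed can be sketched with NumPy's `default_rng`. The function below is a hypothetical stand-in for a noise preset: `add_noise`, `dose`, and `read_sigma` are names invented for this example, and the real presets may be parameterised differently.

```python
import numpy as np


def add_noise(image, seed, dose=100.0, read_sigma=0.0):
    """Apply Poisson shot noise plus optional Gaussian read noise.

    `dose` (expected counts at unit intensity) and `read_sigma` are
    hypothetical knobs standing in for whatever a noise preset defines.
    """
    rng = np.random.default_rng(seed)  # same seed -> same random stream
    expected_counts = np.clip(image, 0.0, None) * dose
    noisy = rng.poisson(expected_counts).astype(np.float64)
    if read_sigma > 0.0:
        noisy += rng.normal(0.0, read_sigma, size=image.shape)
    return noisy / dose


clean = np.linspace(0.0, 1.0, 32 * 32).reshape(32, 32)
first = add_noise(clean, seed=42)
second = add_noise(clean, seed=42)
print(first.tobytes() == second.tobytes())  # True
```

In a sketch like this, byte-for-byte equality also depends on using the same NumPy version, since the output of `Generator` methods is not guaranteed to stay fixed across NumPy releases. Pinning dependencies alongside the code tag is one way to hold that constant.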
Example Manifest Entry
{
"image_id": "Au55_icosahedron_001",
"structure_id": "Au55",
"geometry": "icosahedron",
"orientation_seed": 42,
"simulation_params": {
"sigma": 0.6,
"alpha": 1.0,
"pixel_size": 0.2,
"noise_preset": "poisson_low"
},
"dataset_version": "nc_projection_gaussian_v1",
"git_tag": "paper1-freeze",
"created_at": "2024-01-15T10:30:00Z"
}
Submodule hash at paper1-freeze tag: 5c429dd64446b2ed37fad0b6b99490588be1c745
Dataset Version: Each dataset is immutable and versioned (e.g., nc_projection_gaussian_v1)
Git Tag: Code version is pinned (paper1-freeze) ensuring the exact simulator implementation can be retrieved
Seed Tracking: RNG seed stored per image enables exact regeneration of random noise patterns
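To make the contract concrete, the following sketch parses a manifest entry like the one above and derives a fingerprint from the fields that determine the image. The required-key set, `load_entry`, and `fingerprint` are assumptions made for illustration, not the project's actual validation code.

```python
import hashlib
import json

# Assumed minimum set of fields needed to regenerate an image.
REQUIRED_KEYS = {"image_id", "geometry", "orientation_seed",
                 "simulation_params", "dataset_version", "git_tag"}

EXAMPLE = """
{
  "image_id": "Au55_icosahedron_001",
  "structure_id": "Au55",
  "geometry": "icosahedron",
  "orientation_seed": 42,
  "simulation_params": {"sigma": 0.6, "alpha": 1.0,
                        "pixel_size": 0.2, "noise_preset": "poisson_low"},
  "dataset_version": "nc_projection_gaussian_v1",
  "git_tag": "paper1-freeze",
  "created_at": "2024-01-15T10:30:00Z"
}
"""


def load_entry(text):
    """Parse one manifest entry and check the fields needed to regenerate it."""
    entry = json.loads(text)
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        raise ValueError(f"manifest entry is missing: {sorted(missing)}")
    return entry


def fingerprint(entry):
    """SHA-256 over the fields that determine the image content."""
    relevant = {key: entry[key] for key in
                ("geometry", "orientation_seed", "simulation_params", "git_tag")}
    # Sorted keys and fixed separators give a canonical serialisation.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


entry = load_entry(EXAMPLE)
print(entry["simulation_params"]["sigma"])  # 0.6
print(len(fingerprint(entry)))              # 64
```

Two entries with the same fingerprint describe the same geometry, seed, parameters, and code version, so under the reproducibility guarantee they should regenerate the same image. Fields such as `created_at` are deliberately left out of the hash.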
System Architecture
The system is built around a phenomenological Gaussian projection model that simulates atomic column contributions with configurable parameters (σ, α, pixel size, noise preset). It uses seeded random number generation for deterministic reproducibility and manifest-based dataset tracking.
Air-Gapped Design
- Simulation Layer: Physics-based image generation (NumPy, SciPy, ASE), with no ML dependencies
- ML Layer: CNN training and evaluation (PyTorch), with no simulation code imports
- Interface: File system only, via manifests and NumPy arrays in data/processed/datasets/
This separation allows independent evolution of physics and ML components while maintaining a clean contract.
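A file-system-only contract of this kind could look like the sketch below. The file names `images.npy` and `manifest.json`, and the two helper functions, are hypothetical; the actual on-disk layout is defined by the nanocluster-cnn repository.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def write_dataset(dataset_dir, images, manifest):
    """Simulation side: persist arrays and manifest, nothing else."""
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    np.save(dataset_dir / "images.npy", images)
    (dataset_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))


def read_dataset(dataset_dir):
    """ML side: load from disk without importing any simulation code."""
    dataset_dir = Path(dataset_dir)
    images = np.load(dataset_dir / "images.npy")
    manifest = json.loads((dataset_dir / "manifest.json").read_text())
    return images, manifest


# Round trip through a temporary directory that mimics the dataset path.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "processed" / "datasets" / "demo"
    write_dataset(target, np.zeros((2, 8, 8)),
                  [{"image_id": "a"}, {"image_id": "b"}])
    images, manifest = read_dataset(target)

print(images.shape)   # (2, 8, 8)
print(len(manifest))  # 2
```

Because the reader depends only on NumPy and JSON, a training script built this way never needs the simulation packages installed.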
Validation & Results
- Simplified physics model may not capture all experimental artifacts and detector noise patterns.
- Limited to projection geometry; does not model full 3D dynamic scattering.
- Requires domain expertise to select physically reasonable parameter ranges.
Future Directions
- Validate against experimental STEM data to quantify realism gap.
- Add dynamic scattering effects for more accurate physics.
- Expand noise models beyond Poisson/Gaussian to capture detector artifacts.
Reproduce Locally
Clone the repository with submodules and check out the exact tag used for this showcase:
# Clone with submodules
git clone --recurse-submodules https://github.com/ksbk/drkusse.git
cd drkusse/external/nanocluster-cnn

# Checkout pinned tag
git checkout paper1-freeze

# Install dependencies
make install-dev

# Generate demo dataset (instant)
nc-make-demo

# Or generate custom dataset
python scripts/datasets/generate_projection_gaussian.py \
  --xyz-dir data/raw/structures \
  --output-dir data/processed/datasets/my_dataset \
  --sigma 0.6 --alpha 1.0 --pixel-size 0.2 \
  --seed 42
See the README for full documentation.
Quick Facts
- Type: Research Tool
- Status: In Development
- Version: paper1-freeze
- License: MIT
Technologies
This is a submodule integration showcasing the research project. The simulator runs independently in the nanocluster-cnn repository.