StructureGenerator & generate()
- class pasted._generator.StructureGenerator(*, n_atoms: int, charge: int, mult: int, mode: str = 'gas', region: str | None = None, branch_prob: float = 0.3, chain_persist: float = 0.5, chain_bias: float = 0.0, bond_range: tuple[float, float] = (1.2, 1.6), center_z: int | None = None, coord_range: tuple[int, int] = (4, 8), shell_radius: tuple[float, float] = (1.8, 2.5), elements: str | list[str] | None = None, element_fractions: dict[str, float] | None = None, element_min_counts: dict[str, int] | None = None, element_max_counts: dict[str, int] | None = None, cov_scale: float = 1.0, relax_cycles: int = 1500, maxent_steps: int = 300, maxent_lr: float = 0.05, maxent_cutoff_scale: float = 2.5, trust_radius: float = 0.5, convergence_tol: float = 0.001, add_hydrogen: bool = True, n_samples: int = 1, n_success: int | None = None, seed: int | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, cutoff: float | None = None, filters: list[str] | None = None, verbose: bool = False)
Bases: object
Generate random atomic structures with disorder metrics.
All parameters use Python snake_case names that correspond 1-to-1 with their CLI --flag counterparts.
- Parameters:
n_atoms – Number of atoms per structure (before optional H augmentation).
charge – Total system charge (applied to every structure).
mult – Spin multiplicity 2S+1.
mode – Placement mode: "gas" (default), "chain", or "shell".
region – [gas] Region spec: "sphere:R" | "box:L" | "box:LX,LY,LZ". Required when mode="gas".
branch_prob – [chain] Branching probability (default: 0.3).
chain_persist – [chain] Directional persistence ∈ [0, 1] (default: 0.5).
chain_bias – [chain] Global-axis drift strength ∈ [0, 1] (default: 0.0). The direction of the first bond becomes the bias axis; each subsequent step is blended toward that axis before normalisation. 0.0 → no bias (backwards-compatible); higher values produce more elongated structures with larger shape_aniso.
bond_range – [chain / shell tails] Bond-length range in Å (default: (1.2, 1.6)).
center_z – [shell] Atomic number of center atom. None → random per sample.
coord_range – [shell] Coordination-number range (default: (4, 8)).
shell_radius – [shell] Shell-radius range in Å (default: (1.8, 2.5)).
elements – Element pool. A spec string such as "1-30" or "6,7,8", an explicit list of element symbols, or None for all Z = 1–106.
element_fractions – Relative sampling weights for elements in the pool, as a {symbol: weight} dict (e.g. {"C": 0.5, "N": 0.3, "O": 0.2}). Weights are relative: they are normalised internally and need not sum to 1. Elements absent from the dict receive a weight of 1.0. When None (default), every element in the pool is sampled with equal probability.
element_min_counts – Minimum number of atoms per element guaranteed in every generated structure (e.g. {"C": 2, "N": 1}). The required atoms are placed first; remaining slots are filled by weighted random sampling. None (default) → no lower bounds. The sum of all minimum counts must not exceed n_atoms.
element_max_counts – Maximum number of atoms allowed per element (e.g. {"N": 5, "O": 3}). Elements that have reached their cap are excluded from sampling for the remaining slots. None (default) → no upper bounds.
Note
When both element_min_counts and element_max_counts are given, each element's min must be ≤ its max.
Note
The automatic hydrogen augmentation step (add_hydrogen=True) runs after the constrained sampling and may temporarily exceed element_max_counts for H. Set add_hydrogen=False if H count limits are critical.
cov_scale – Minimum-distance scale factor: d_min(i,j) = cov_scale × (r_i + r_j) using Pyykkö (2009) single-bond covalent radii. Default: 1.0.
relax_cycles – Maximum repulsion-relaxation iterations (default: 1500).
add_hydrogen – Automatically append H atoms when H is in the pool but the sampled composition contains none (default: True).
n_samples – Maximum number of placement attempts (default: 1). Use 0 to allow unlimited attempts (only valid when n_success is also set, otherwise a ValueError is raised).
n_success – Target number of structures that must pass all filters before generation stops (default: None).
None → make exactly n_samples attempts and return all that passed (original behaviour).
N > 0 with n_samples > 0 → stop as soon as N structures pass or n_samples attempts are exhausted, whichever comes first. Returns the structures collected so far with a warning if fewer than N were found.
N > 0 with n_samples = 0 → unlimited attempts; stop only when N structures have passed.
seed – Random seed for reproducibility (None → non-deterministic).
n_bins – Histogram bins for H_spatial and RDF_dev (default: 20).
w_atom – Weight of H_atom in H_total (default: 0.5).
w_spatial – Weight of H_spatial in H_total (default: 0.5).
cutoff – Distance cutoff in Å for Steinhardt and graph metrics. None → auto-computed as cov_scale × 1.5 × median(r_i + r_j) over the element pool.
filters – Filter strings of the form "METRIC:MIN:MAX" (use "-" for an open bound). Only structures satisfying all filters are returned.
verbose – Print progress and statistics to stderr (default: False). The CLI always passes True; library callers usually leave it off.
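The composition rules described for element_fractions, element_min_counts, and element_max_counts can be illustrated with a small standalone sketch. This is not PASTED's implementation: sample_composition and its arguments are hypothetical names, and the error messages and the order of the returned symbols are assumptions made for illustration only.

```python
import random


def sample_composition(pool, n_atoms, fractions=None, min_counts=None,
                       max_counts=None, rng=None):
    """Hypothetical helper: draw n_atoms element symbols from pool."""
    rng = rng or random.Random()
    fractions = fractions or {}
    min_counts = min_counts or {}
    max_counts = max_counts or {}

    # Constraints stated in the parameter documentation.
    if sum(min_counts.values()) > n_atoms:
        raise ValueError("sum of minimum counts exceeds n_atoms")
    for sym, lo in min_counts.items():
        if sym in max_counts and lo > max_counts[sym]:
            raise ValueError(f"min count for {sym} exceeds its max count")

    # Required atoms are placed first.
    counts = {sym: 0 for sym in pool}
    atoms = []
    for sym, lo in min_counts.items():
        atoms.extend([sym] * lo)
        counts[sym] = counts.get(sym, 0) + lo

    # Remaining slots: weighted sampling, skipping elements at their cap.
    while len(atoms) < n_atoms:
        open_syms = [s for s in pool if counts[s] < max_counts.get(s, n_atoms)]
        if not open_syms:
            raise ValueError("all elements reached their cap before n_atoms was filled")
        # Elements absent from fractions get weight 1.0; random.choices
        # treats the weights as relative, so they need not sum to 1.
        weights = [fractions.get(s, 1.0) for s in open_syms]
        sym = rng.choices(open_syms, weights=weights, k=1)[0]
        atoms.append(sym)
        counts[sym] += 1
    return atoms


atoms = sample_composition(
    ["C", "N", "O"], 10,
    fractions={"C": 0.5, "N": 0.3},
    min_counts={"C": 2, "N": 1},
    max_counts={"O": 3},
    rng=random.Random(0),
)
print(atoms)
```

Hydrogen augmentation is deliberately left out of the sketch; per the note above, it runs after this constrained sampling step.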
Examples
Class API:
    from pasted import StructureGenerator

    gen = StructureGenerator(
        n_atoms=12, charge=0, mult=1,
        mode="gas", region="sphere:9",
        elements="1-30", n_samples=50, seed=42,
        filters=["H_total:2.0:-"],
    )
    structures = gen.generate()
    for s in structures:
        print(s)
Functional API:
    from pasted import generate

    structures = generate(
        n_atoms=12, charge=0, mult=1,
        mode="chain",
        elements="6,7,8", n_samples=20, seed=0,
    )
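Filter strings such as "H_total:2.0:-" in the class example follow the "METRIC:MIN:MAX" form, with "-" marking an open bound. A minimal sketch of how such a string could be interpreted is given here; parse_filter and passes are hypothetical helpers, and treating both bounds as inclusive is an assumption, not documented PASTED behaviour.

```python
import math


def parse_filter(spec):
    """Split 'METRIC:MIN:MAX' into (metric, lo, hi); '-' is an open bound."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected METRIC:MIN:MAX, got {spec!r}")
    metric, lo, hi = parts
    lo_val = -math.inf if lo == "-" else float(lo)
    hi_val = math.inf if hi == "-" else float(hi)
    return metric, lo_val, hi_val


def passes(metrics, specs):
    """Return True when the metrics dict satisfies every filter string."""
    for spec in specs:
        metric, lo, hi = parse_filter(spec)
        # Assumption: bounds are inclusive on both sides.
        if not lo <= metrics[metric] <= hi:
            return False
    return True


print(parse_filter("H_total:2.0:-"))
print(passes({"H_total": 2.3}, ["H_total:2.0:-"]))
```

Only the exact token "-" is treated as open, so a negative bound such as "-1.5" still parses as a number.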
- generate() → GenerationResult
Generate structures and return a GenerationResult.
Collects all structures yielded by stream(), attaches generation metadata (attempt counts, rejection breakdowns), and returns a GenerationResult that behaves like a list[Structure] in all normal usage while also carrying the diagnostics needed for automated pipelines.
GenerationResult supports the full list interface (indexing, iteration, len, bool) so existing code that does result[0] or for s in result continues to work without modification.
Warnings are also emitted via warnings.warn() (category UserWarning) when:
Any attempts are rejected by the charge/multiplicity parity check.
No structures pass the metric filters.
The attempt budget is exhausted before n_success is reached.
Each call creates a fresh random.Random seeded with self.seed, so repeated calls with the same seed are reproducible.
- Returns:
Wraps the list of passing structures together with generation metadata. Use result.structures for the raw list or result.summary() for a one-line diagnostic string.
- Return type:
GenerationResult
Examples
Drop-in list usage:
    result = gen.generate()
    for s in result:
        print(s.to_xyz())
Metadata access:
    result = gen.generate()
    if result.n_rejected_parity > 0:
        print(result.summary())
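Because the conditions listed above are reported through warnings.warn() with category UserWarning, a pipeline can capture them with the standard warnings module instead of reading stderr. The sketch below uses a stand-in function, fake_generate, in place of gen.generate() so that it runs without PASTED installed; the warning text is invented for illustration.

```python
import warnings


def run_with_warnings(func, *args, **kwargs):
    """Call func and return (result, list of UserWarning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, no de-duplication
        result = func(*args, **kwargs)
    messages = [
        str(w.message) for w in caught if issubclass(w.category, UserWarning)
    ]
    return result, messages


def fake_generate():
    """Stand-in for gen.generate(); the message text is hypothetical."""
    warnings.warn("no structures passed the metric filters", UserWarning)
    return []


result, messages = run_with_warnings(fake_generate)
print(result, messages)
```

With a real generator, the call would be run_with_warnings(gen.generate), and the captured messages could be logged next to result.summary().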
- stream() → Iterator[Structure]
Generate structures one by one, yielding each that passes all filters.
Unlike generate(), structures are yielded immediately as they pass, so callers can write output or stop early without waiting for all attempts to complete.
Respects both n_samples (maximum attempts) and n_success (target number of passing structures):
If n_success is set, the iterator stops as soon as that many structures have been yielded, even if n_samples attempts have not been exhausted.
If n_samples is 0 (unlimited), the iterator runs until n_success structures have been yielded.
If n_samples attempts are exhausted before n_success is reached, a warning is emitted to stderr and the iterator ends.
Each call creates a fresh random.Random seeded with self.seed, so repeated calls with the same seed are reproducible.
- Yields:
Structure – Each structure that passed all filters, in generation order.
Examples
Write structures to a file as they are found:
    gen = StructureGenerator(
        n_atoms=12, charge=0, mult=1,
        mode="gas", region="sphere:9",
        elements="1-30", n_success=10, n_samples=500, seed=42,
    )
    for s in gen.stream():
        s.write_xyz("out.xyz")
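The stopping rules for n_samples and n_success can be sketched as a plain Python generator. This is an illustration of the documented behaviour, not PASTED's code: stream_sketch, toy_attempt, and the warning text are hypothetical, and the attempt callback simply returns None for a rejected sample.

```python
import itertools
import random
import warnings


def stream_sketch(attempt, n_samples=1, n_success=None, seed=None):
    """Yield passing results from attempt(rng), honouring both budgets."""
    # Because this is a generator, the check runs on first iteration.
    if n_samples == 0 and n_success is None:
        raise ValueError("n_samples=0 requires n_success to be set")
    rng = random.Random(seed)  # fresh RNG per call, so calls are reproducible
    attempts = itertools.count() if n_samples == 0 else range(n_samples)
    n_found = 0
    for _ in attempts:
        candidate = attempt(rng)
        if candidate is None:  # rejected by a filter or parity check
            continue
        yield candidate
        n_found += 1
        if n_success is not None and n_found >= n_success:
            return  # target reached before the attempt budget ran out
    if n_success is not None and n_found < n_success:
        warnings.warn(
            f"only {n_found} of {n_success} requested results found",
            UserWarning,
        )


def toy_attempt(rng):
    """Toy stand-in for one placement attempt: passes about half the time."""
    value = rng.random()
    return value if value > 0.5 else None


first = list(stream_sketch(toy_attempt, n_samples=0, n_success=3, seed=1))
second = list(stream_sketch(toy_attempt, n_samples=0, n_success=3, seed=1))
print(first == second)
```

Since results are yielded as soon as they pass, a caller can also break out of the loop early and no further attempts are made.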
- pasted._generator.generate(*, n_atoms: int, charge: int, mult: int, mode: str = 'gas', region: str | None = None, branch_prob: float = 0.3, chain_persist: float = 0.5, chain_bias: float = 0.0, bond_range: tuple[float, float] = (1.2, 1.6), center_z: int | None = None, coord_range: tuple[int, int] = (4, 8), shell_radius: tuple[float, float] = (1.8, 2.5), elements: str | list[str] | None = None, element_fractions: dict[str, float] | None = None, element_min_counts: dict[str, int] | None = None, element_max_counts: dict[str, int] | None = None, cov_scale: float = 1.0, relax_cycles: int = 1500, maxent_steps: int = 300, maxent_lr: float = 0.05, maxent_cutoff_scale: float = 2.5, trust_radius: float = 0.5, convergence_tol: float = 0.001, add_hydrogen: bool = True, n_samples: int = 1, n_success: int | None = None, seed: int | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, cutoff: float | None = None, filters: list[str] | None = None, verbose: bool = False) → GenerationResult
Create a StructureGenerator and immediately call generate().
All parameters are forwarded unchanged. See StructureGenerator for full documentation.
- Returns:
A list-compatible object containing the structures that passed all filters plus metadata about the generation run (attempt counts, rejection breakdowns). Behaves identically to list[Structure] in all normal usage (indexing, iteration, len, bool).
A UserWarning is emitted (via warnings.warn(), not raised as an exception) whenever:
attempts are rejected by the charge/multiplicity parity check,
no structures pass the metric filters, or
the attempt budget is exhausted before n_success is reached.
- Return type:
GenerationResult
Examples
Drop-in list usage:
    from pasted import generate

    # Up to 20 random gas-phase structures drawn from C/N/O (20 attempts)
    structures = generate(
        n_atoms=10, charge=0, mult=1,
        mode="gas", region="sphere:8",
        elements="6,7,8", n_samples=20, seed=0,
    )
    for i, s in enumerate(structures):
        s.write_xyz("out.xyz", append=(i > 0))
Inspecting rejection metadata:
    result = generate(
        n_atoms=10, charge=0, mult=1,
        mode="gas", region="sphere:8",
        elements="6,7,8", n_samples=50, seed=0,
        filters=["H_total:1.5:-"],
    )
    print(result.summary())
    # e.g. "passed=3 attempted=50 rejected_parity=0 rejected_filter=47"
Structure
- class pasted._generator.Structure(atoms: list[str], positions: list[tuple[float, float, float]], charge: int, mult: int, metrics: dict[str, float], mode: str, sample_index: int = 0, center_sym: str | None = None, seed: int | None = None)
Bases: object
A single generated atomic structure with its computed disorder metrics.
- positions
Cartesian coordinates in Å, one (x, y, z) tuple per atom.
- charge
Total system charge.
- Type:
int
- mult
Spin multiplicity 2S+1.
- Type:
int
- mode
Placement mode used ("gas", "chain", or "shell").
- Type:
str
- sample_index
1-based index within the batch of structures that passed filters.
- Type:
int
- center_sym
Element symbol of the shell center atom (shell mode only).
- Type:
str | None
- seed
Random seed used for generation (None if unseeded).
- Type:
int | None
- charge: int
- classmethod from_xyz(source: str | Path, *, frame: int = 0, recompute_metrics: bool = True, cutoff: float | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, cov_scale: float = 1.0) → Structure
Load a Structure from an XYZ file or string.
Supports both plain XYZ and PASTED extended XYZ (with charge=, mult=, and metric tokens on the comment line). When recompute_metrics is True (default), all disorder metrics are recomputed from the loaded geometry so that the returned structure is fully usable as optimizer input or for filtering.
- Parameters:
source – Path to an XYZ file or a raw XYZ string.
frame – Zero-based frame index when source contains multiple concatenated structures (default: first frame).
recompute_metrics – Recompute all disorder metrics after loading. Set to False to skip the recomputation and return the structure with whatever metric values were embedded in the extended XYZ comment (or an empty dict for plain XYZ).
cutoff – Distance cutoff (Å) for metric computation. Auto-computed from the element pool when None.
n_bins – Histogram bins for H_spatial / RDF_dev (default: 20).
w_atom – Weight of H_atom in H_total (default: 0.5).
w_spatial – Weight of H_spatial in H_total (default: 0.5).
cov_scale – Minimum distance scale factor used for metrics (default: 1.0).
- Return type:
Structure
- Raises:
ValueError – When the file / string cannot be parsed, or frame is out of range.
Examples
Load and immediately use as optimizer initial structure:
    from pasted import Structure, StructureOptimizer

    s = Structure.from_xyz("my_structure.xyz")
    opt = StructureOptimizer(
        n_atoms=len(s), charge=s.charge, mult=s.mult,
        objective={"H_total": 1.0},
        # sorted() keeps the element order stable between runs
        elements=sorted(set(s.atoms)),
        max_steps=2000, seed=42,
    )
    result = opt.run(initial=s)
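The frame argument and the charge= / mult= comment tokens can be illustrated with a standalone XYZ reader. This is a rough sketch, not the from_xyz implementation: split_xyz_frames, comment_tokens, and load_frame are hypothetical helpers, the whitespace-separated key=value token layout is an assumption, and so are the fallback values charge=0 and mult=1 for plain XYZ.

```python
def split_xyz_frames(text):
    """Split concatenated XYZ text into (comment, atoms, positions) frames."""
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # skip blank lines between frames
            i += 1
            continue
        try:
            n = int(lines[i].split()[0])
        except ValueError:
            raise ValueError(f"line {i + 1}: expected an atom count") from None
        block = lines[i + 2 : i + 2 + n]
        if i + 1 >= len(lines) or len(block) != n:
            raise ValueError(f"frame starting at line {i + 1} is truncated")
        comment = lines[i + 1]
        atoms, positions = [], []
        for row in block:
            # Too few fields or a non-numeric coordinate raises ValueError.
            sym, x, y, z = row.split()[:4]
            atoms.append(sym)
            positions.append((float(x), float(y), float(z)))
        frames.append((comment, atoms, positions))
        i += 2 + n
    return frames


def comment_tokens(comment):
    """Collect key=value tokens from a comment line (assumed layout)."""
    return dict(tok.split("=", 1) for tok in comment.split() if "=" in tok)


def load_frame(text, frame=0):
    """Return one frame as a dict; ValueError if frame is out of range."""
    frames = split_xyz_frames(text)
    if not 0 <= frame < len(frames):
        raise ValueError(f"frame {frame} out of range (0..{len(frames) - 1})")
    comment, atoms, positions = frames[frame]
    tokens = comment_tokens(comment)
    return {
        "atoms": atoms,
        "positions": positions,
        # Assumed defaults when the comment carries no such token.
        "charge": int(tokens.get("charge", 0)),
        "mult": int(tokens.get("mult", 1)),
    }


sample = """2
charge=0 mult=1 H_total=1.234
H 0.0 0.0 0.0
H 0.0 0.0 0.74
3
plain comment
O 0.0 0.0 0.0
H 0.0 0.76 0.59
H 0.0 -0.76 0.59
"""
print(load_frame(sample, frame=1)["atoms"])
```

Metric recomputation is out of scope here; the sketch only covers parsing and frame selection.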
- mode: str
- mult: int
- sample_index: int = 0
GenerationResult
- class pasted._generator.GenerationResult(structures: list[Structure] = <factory>, n_attempted: int = 0, n_passed: int = 0, n_rejected_parity: int = 0, n_rejected_filter: int = 0, n_success_target: int | None = None)
Bases: object
Return value of generate() and StructureGenerator.generate().
Behaves like a list[Structure] in all normal usage (indexing, iteration, len, boolean test, for s in result) while also carrying metadata about how many attempts were made and why samples were rejected. This metadata is especially useful when integrating PASTED into automated pipelines such as ASE or high-throughput workflows, where a silent empty list would be indistinguishable from a successful run that just produced no results.
- structures
Structures that passed all filters.
- Type:
list[pasted._generator.Structure]
- n_attempted
Total placement attempts made.
- Type:
int
- n_passed
Number of structures that passed all filters (equals len(structures) unless the caller mutates the list).
- Type:
int
- n_rejected_parity
Attempts rejected by the charge/multiplicity parity check.
- Type:
int
- n_rejected_filter
Attempts rejected by user-supplied metric filters.
- Type:
int
- n_success_target
The n_success value that was in effect during generation (None when not set).
- Type:
int | None
Examples
Drop-in replacement for list[Structure]:
    from pasted import generate

    result = generate(n_atoms=10, charge=0, mult=1,
                      mode="gas", region="sphere:8",
                      elements="6,7,8", n_samples=20, seed=0)
    for s in result:        # iterates like a list
        print(s.to_xyz())
    print(len(result))      # number that passed
Inspect rejection metadata:
    if result.n_rejected_parity > 0:
        print(f"{result.n_rejected_parity} samples failed parity check")
    print(result.summary())
Notes
GenerationResult is a dataclass; downstream code should treat it as immutable. The structures field is a plain list and may be sorted or sliced freely.
- n_attempted: int = 0
- n_passed: int = 0
- n_rejected_filter: int = 0
- n_rejected_parity: int = 0
- structures: list[Structure]
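The list-like behaviour described for GenerationResult can be approximated with a small dataclass that delegates to its structures field. ResultSketch is a hypothetical stand-in written for illustration, not the library class; the summary() format is modelled on the example string shown earlier in this page and may differ from the real output.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in that mimics the documented fields."""

    structures: list = field(default_factory=list)
    n_attempted: int = 0
    n_passed: int = 0
    n_rejected_parity: int = 0
    n_rejected_filter: int = 0
    n_success_target: Optional[int] = None

    # Delegating these methods gives indexing, iteration, len and bool.
    def __iter__(self):
        return iter(self.structures)

    def __len__(self):
        return len(self.structures)

    def __getitem__(self, index):
        return self.structures[index]

    def __bool__(self):
        return bool(self.structures)

    def summary(self):
        # Assumed format, following the documented example string.
        return (
            f"passed={self.n_passed} attempted={self.n_attempted} "
            f"rejected_parity={self.n_rejected_parity} "
            f"rejected_filter={self.n_rejected_filter}"
        )


r = ResultSketch(structures=["a", "b"], n_attempted=5, n_passed=2,
                 n_rejected_filter=3)
print(len(r), r[0], list(r), bool(ResultSketch()))
print(r.summary())
```

An empty result is falsy while still exposing the counters, which lets a pipeline tell "nothing passed" apart from "nothing was attempted".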