StructureGenerator & generate()

class pasted._generator.StructureGenerator(*, n_atoms: int, charge: int, mult: int, mode: str = 'gas', region: str | None = None, branch_prob: float = 0.3, chain_persist: float = 0.5, chain_bias: float = 0.0, bond_range: tuple[float, float] = (1.2, 1.6), center_z: int | None = None, coord_range: tuple[int, int] = (4, 8), shell_radius: tuple[float, float] = (1.8, 2.5), elements: str | list[str] | None = None, element_fractions: dict[str, float] | None = None, element_min_counts: dict[str, int] | None = None, element_max_counts: dict[str, int] | None = None, cov_scale: float = 1.0, relax_cycles: int = 1500, maxent_steps: int = 300, maxent_lr: float = 0.05, maxent_cutoff_scale: float = 2.5, trust_radius: float = 0.5, convergence_tol: float = 0.001, add_hydrogen: bool = True, n_samples: int = 1, n_success: int | None = None, seed: int | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, cutoff: float | None = None, filters: list[str] | None = None, verbose: bool = False)

Bases: object

Generate random atomic structures with disorder metrics.

All parameters use Python snake_case names that correspond 1-to-1 with their CLI --flag counterparts.

Parameters:
  • n_atoms – Number of atoms per structure (before optional H augmentation).

  • charge – Total system charge (applied to every structure).

  • mult – Spin multiplicity 2S+1.

  • mode – Placement mode: "gas" (default), "chain", or "shell".

  • region – [gas] Region spec: "sphere:R" | "box:L" | "box:LX,LY,LZ". Required when mode="gas".

  • branch_prob – [chain] Branching probability (default: 0.3).

  • chain_persist – [chain] Directional persistence ∈ [0, 1] (default: 0.5).

  • chain_bias – [chain] Global-axis drift strength ∈ [0, 1] (default: 0.0). The direction of the first bond becomes the bias axis; each subsequent step is blended toward that axis before normalisation. 0.0 → no bias (backwards-compatible); higher values produce more elongated structures with larger shape_aniso.

  • bond_range – [chain / shell tails] Bond-length range in Å (default: (1.2, 1.6)).

  • center_z – [shell] Atomic number of center atom. None → random per sample.

  • coord_range – [shell] Coordination-number range (default: (4, 8)).

  • shell_radius – [shell] Shell-radius range in Å (default: (1.8, 2.5)).

  • elements – Element pool. A spec string such as "1-30" or "6,7,8", an explicit list of element symbols, or None for all Z = 1–106.

  • element_fractions – Relative sampling weights for elements in the pool, as a {symbol: weight} dict (e.g. {"C": 0.5, "N": 0.3, "O": 0.2}). Weights are relative — they are normalised internally and need not sum to 1. Elements absent from the dict receive a weight of 1.0. When None (default), every element in the pool is sampled with equal probability.

  • element_min_counts – Minimum number of atoms per element guaranteed in every generated structure (e.g. {"C": 2, "N": 1}). The required atoms are placed first; remaining slots are filled by weighted random sampling. None (default) → no lower bounds. The sum of all minimum counts must not exceed n_atoms.

  • element_max_counts

    Maximum number of atoms allowed per element (e.g. {"N": 5, "O": 3}). Elements that have reached their cap are excluded from sampling for the remaining slots. None (default) → no upper bounds.

    Note

    When both element_min_counts and element_max_counts are given, each element’s min must be ≤ its max.

    Note

    The automatic hydrogen augmentation step (add_hydrogen=True) runs after the constrained sampling and may temporarily exceed element_max_counts for H. Set add_hydrogen=False if H count limits are critical.

  • cov_scale – Minimum-distance scale factor: d_min(i,j) = cov_scale × (r_i + r_j) using Pyykkö (2009) single-bond covalent radii. Default: 1.0.

  • relax_cycles – Maximum repulsion-relaxation iterations (default: 1500).

  • add_hydrogen – Automatically append H atoms when H is in the pool but the sampled composition contains none (default: True).

  • n_samples – Maximum number of placement attempts (default: 1). Use 0 for unlimited attempts; this is only valid when n_success is also set, and a ValueError is raised otherwise.

  • n_success

    Target number of structures that must pass all filters before generation stops (default: None).

    • None → make exactly n_samples attempts and return all structures that passed (original behaviour).

    • N > 0 with n_samples > 0 → stop as soon as N structures pass or n_samples attempts are exhausted, whichever comes first. Returns the structures collected so far with a warning if fewer than N were found.

    • N > 0 with n_samples = 0 → unlimited attempts; stop only when N structures have passed.

  • seed – Random seed for reproducibility (None → non-deterministic).

  • n_bins – Histogram bins for H_spatial and RDF_dev (default: 20).

  • w_atom – Weight of H_atom in H_total (default: 0.5).

  • w_spatial – Weight of H_spatial in H_total (default: 0.5).

  • cutoff – Distance cutoff in Å for Steinhardt and graph metrics. None → auto-computed as cov_scale × 1.5 × median(r_i + r_j) over the element pool.

  • filters – Filter strings of the form "METRIC:MIN:MAX" (use "-" for an open bound). Only structures satisfying all filters are returned.

  • verbose – Print progress and statistics to stderr (default: False). The CLI always passes True; library callers usually leave it off.
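The interplay of element_fractions, element_min_counts, and element_max_counts described above can be sketched in plain Python. This is an illustration under stated assumptions, not PASTED's actual implementation: the helper name sample_composition is hypothetical, and the slot-by-slot fill order is a guess.

```python
import random


def sample_composition(pool, n_atoms, fractions=None, min_counts=None,
                       max_counts=None, seed=None):
    """Hypothetical sketch of constrained composition sampling."""
    fractions = fractions or {}
    min_counts = min_counts or {}
    max_counts = max_counts or {}
    if sum(min_counts.values()) > n_atoms:
        raise ValueError("sum of minimum counts exceeds n_atoms")
    for sym, lo in min_counts.items():
        if sym in max_counts and lo > max_counts[sym]:
            raise ValueError(f"min > max for {sym}")

    rng = random.Random(seed)
    # Required atoms are placed first.
    atoms = [sym for sym, lo in min_counts.items() for _ in range(lo)]
    counts = {sym: atoms.count(sym) for sym in pool}
    while len(atoms) < n_atoms:
        # Elements that reached their cap are excluded from sampling.
        allowed = [s for s in pool if counts[s] < max_counts.get(s, n_atoms)]
        if not allowed:
            raise ValueError("caps leave no element for the remaining slots")
        # Elements absent from `fractions` get weight 1.0; random.choices
        # accepts relative weights that need not sum to 1.
        weights = [fractions.get(s, 1.0) for s in allowed]
        pick = rng.choices(allowed, weights=weights, k=1)[0]
        atoms.append(pick)
        counts[pick] += 1
    return atoms


atoms = sample_composition(
    ["C", "N", "O"], n_atoms=10,
    fractions={"C": 0.5, "N": 0.3, "O": 0.2},
    min_counts={"C": 2, "N": 1}, max_counts={"N": 5, "O": 3}, seed=0,
)
```

Whatever the seed, the returned list contains at least two C atoms, between one and five N atoms, and at most three O atoms.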

Examples

Class API:

from pasted import StructureGenerator

gen = StructureGenerator(
    n_atoms=12, charge=0, mult=1,
    mode="gas", region="sphere:9",
    elements="1-30", n_samples=50, seed=42,
    filters=["H_total:2.0:-"],
)
structures = gen.generate()
for s in structures:
    print(s)

Functional API:

from pasted import generate

structures = generate(
    n_atoms=12, charge=0, mult=1,
    mode="chain", elements="6,7,8",
    n_samples=20, seed=0,
)

__iter__() → Iterator[Structure]

Iterate over generated structures (delegates to stream()).

__repr__() → str

Return repr(self).

property cutoff: float

Distance cutoff in Å used for Steinhardt and graph metrics.
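When no explicit cutoff is passed, the parameter description gives the default as cov_scale × 1.5 × median(r_i + r_j) over the element pool. A minimal sketch of that arithmetic follows. The radii are placeholder values of the right order of magnitude, not the library's table, and taking the median over unordered pairs (including i = j) is an assumption.

```python
from itertools import combinations_with_replacement
from statistics import median

# Placeholder single-bond covalent radii in Å (illustrative only; substitute
# the tabulated values when reproducing the library's numbers).
RADII = {"H": 0.32, "C": 0.75, "N": 0.71, "O": 0.63}


def auto_cutoff(pool, cov_scale=1.0):
    """Return cov_scale * 1.5 * median(r_i + r_j) over unordered pairs."""
    pair_sums = [RADII[a] + RADII[b]
                 for a, b in combinations_with_replacement(pool, 2)]
    return cov_scale * 1.5 * median(pair_sums)


cutoff = auto_cutoff(["C", "N", "O"])  # about 2.10 Å with these radii
```

Because the expression is linear in cov_scale, doubling cov_scale doubles the resulting cutoff.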

property element_pool: list[str]

A copy of the resolved element pool (list of symbols).

generate() → GenerationResult

Generate structures and return a GenerationResult.

Collects all structures yielded by stream(), attaches generation metadata (attempt counts, rejection breakdowns), and returns a GenerationResult that behaves like a list[Structure] in all normal usage while also carrying the diagnostics needed for automated pipelines.

GenerationResult supports the full list interface (indexing, iteration, len, bool) so existing code that does result[0] or for s in result continues to work without modification.

Warnings are emitted via warnings.warn() (category UserWarning) when:

  • Any attempts are rejected by the charge/multiplicity parity check.

  • No structures pass the metric filters.

  • The attempt budget is exhausted before n_success is reached.

Each call creates a fresh random.Random seeded with self.seed, so repeated calls with the same seed are reproducible.
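The fresh-generator-per-call pattern can be illustrated with the standard library alone. SeededSampler is a hypothetical stand-in, not part of PASTED; it only shows why re-seeding on every call makes repeated calls return identical sequences.

```python
import random


class SeededSampler:
    """Hypothetical stand-in showing the fresh-RNG-per-call pattern."""

    def __init__(self, seed=None):
        self.seed = seed

    def draw(self, n=3):
        rng = random.Random(self.seed)  # new generator on every call
        return [rng.random() for _ in range(n)]


sampler = SeededSampler(seed=42)
first = sampler.draw()
second = sampler.draw()   # identical to `first`: same seed, fresh generator
```

Had the generator been created once in __init__ instead, the second call would continue the stream and return different numbers.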

Returns:

Wraps the list of passing structures together with generation metadata. Use result.structures for the raw list or result.summary() for a one-line diagnostic string.

Return type:

GenerationResult

Examples

Drop-in list usage:

result = gen.generate()
for s in result:
    print(s.to_xyz())

Metadata access:

result = gen.generate()
if result.n_rejected_parity > 0:
    print(result.summary())

stream() → Iterator[Structure]

Generate structures one by one, yielding each that passes all filters.

Unlike generate(), structures are yielded immediately as they pass, so callers can write output or stop early without waiting for all attempts to complete.

Respects both n_samples (maximum attempts) and n_success (target number of passing structures):

  • If n_success is set, the iterator stops as soon as that many structures have been yielded — even if n_samples attempts have not been exhausted.

  • If n_samples is 0 (unlimited), the iterator runs until n_success structures have been yielded.

  • If n_samples attempts are exhausted before n_success is reached, a warning is emitted to stderr and the iterator ends.
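The stopping rules above amount to a small piece of control flow. The sketch below reproduces them with a hypothetical try_once callable that returns None for a rejected attempt; it is not the library's code, and the warning text is invented for illustration.

```python
import itertools
import random
import warnings


def stream_sketch(try_once, n_samples=1, n_success=None):
    """Yield passing results under the n_samples / n_success rules."""
    if n_samples == 0 and n_success is None:
        raise ValueError("n_samples=0 requires n_success")
    # n_samples == 0 means an unlimited attempt budget.
    attempts = itertools.count() if n_samples == 0 else range(n_samples)
    passed = 0
    for _ in attempts:
        result = try_once()
        if result is None:          # attempt rejected
            continue
        passed += 1
        yield result
        if n_success is not None and passed >= n_success:
            return                  # target reached: stop early
    # Reached only when a finite budget is exhausted.
    if n_success is not None and passed < n_success:
        warnings.warn(f"only {passed} of {n_success} structures found",
                      UserWarning)


rng = random.Random(0)


def attempt():
    """Toy attempt that passes roughly half of the time."""
    x = rng.random()
    return x if x > 0.5 else None


found = list(stream_sketch(attempt, n_samples=0, n_success=3))
```

Because the function is a generator, the ValueError for an invalid argument combination surfaces when iteration starts, not when the generator object is created.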

Each call creates a fresh random.Random seeded with self.seed, so repeated calls with the same seed are reproducible.

Yields:

Structure – Each structure that passed all filters, in generation order.

Examples

Write structures to a file as they are found:

gen = StructureGenerator(
    n_atoms=12, charge=0, mult=1,
    mode="gas", region="sphere:9",
    elements="1-30", n_success=10, n_samples=500, seed=42,
)
for s in gen.stream():
    s.write_xyz("out.xyz")

pasted._generator.generate(*, n_atoms: int, charge: int, mult: int, mode: str = 'gas', region: str | None = None, branch_prob: float = 0.3, chain_persist: float = 0.5, chain_bias: float = 0.0, bond_range: tuple[float, float] = (1.2, 1.6), center_z: int | None = None, coord_range: tuple[int, int] = (4, 8), shell_radius: tuple[float, float] = (1.8, 2.5), elements: str | list[str] | None = None, element_fractions: dict[str, float] | None = None, element_min_counts: dict[str, int] | None = None, element_max_counts: dict[str, int] | None = None, cov_scale: float = 1.0, relax_cycles: int = 1500, maxent_steps: int = 300, maxent_lr: float = 0.05, maxent_cutoff_scale: float = 2.5, trust_radius: float = 0.5, convergence_tol: float = 0.001, add_hydrogen: bool = True, n_samples: int = 1, n_success: int | None = None, seed: int | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, cutoff: float | None = None, filters: list[str] | None = None, verbose: bool = False) → GenerationResult

Create a StructureGenerator and immediately call generate().

All parameters are forwarded unchanged. See StructureGenerator for full documentation.

Returns:

A list-compatible object containing the structures that passed all filters plus metadata about the generation run (attempt counts, rejection breakdowns). Behaves identically to list[Structure] in all normal usage (indexing, iteration, len, bool).

A UserWarning is emitted (via warnings.warn(), not raised as an exception) whenever:

  • attempts are rejected by the charge/multiplicity parity check,

  • no structures pass the metric filters, or

  • the attempt budget is exhausted before n_success is reached.
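Since these conditions surface as warnings and not exceptions, a pipeline that wants to react to them can record them with the standard warnings module. In the sketch below, fake_generate is a hypothetical stand-in for a generate() call in which nothing passed the filters, and its message text is invented.

```python
import warnings


def fake_generate():
    """Stand-in for a generate() call in which nothing passed the filters."""
    warnings.warn("no structures passed the metric filters", UserWarning)
    return []


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")      # record every warning, no dedup
    result = fake_generate()

# Keep only the UserWarning messages for logging or branching.
messages = [str(w.message) for w in caught
            if issubclass(w.category, UserWarning)]
```

To make such conditions fatal instead, warnings.simplefilter("error", UserWarning) turns each warning into a raised exception.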

Return type:

GenerationResult

Examples

Drop-in list usage:

from pasted import generate

# 20 random gas-phase structures drawn from C/N/O
structures = generate(
    n_atoms=10, charge=0, mult=1,
    mode="gas", region="sphere:8",
    elements="6,7,8", n_samples=20, seed=0,
)
for i, s in enumerate(structures):
    s.write_xyz("out.xyz", append=(i > 0))

Inspecting rejection metadata:

result = generate(
    n_atoms=10, charge=0, mult=1,
    mode="gas", region="sphere:8",
    elements="6,7,8", n_samples=50, seed=0,
    filters=["H_total:1.5:-"],
)
print(result.summary())
# e.g. "passed=3  attempted=50  rejected_parity=0  rejected_filter=47"
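
The "METRIC:MIN:MAX" filter strings used above are simple to reason about once split into bounds. The following sketch shows one plausible reading. The helper names are hypothetical, and treating both bounds as inclusive is an assumption; the library's own parser may differ.

```python
import math


def parse_filter(spec):
    """Split "METRIC:MIN:MAX" into (metric, lo, hi); "-" means unbounded."""
    metric, lo, hi = spec.split(":")
    lo = -math.inf if lo == "-" else float(lo)
    hi = math.inf if hi == "-" else float(hi)
    return metric, lo, hi


def passes(metrics, specs):
    """True when every filtered metric lies inside its [lo, hi] range."""
    for spec in specs:
        metric, lo, hi = parse_filter(spec)
        if not lo <= metrics[metric] <= hi:
            return False
    return True


ok = passes({"H_total": 2.0}, ["H_total:1.5:-"])        # inside the range
rejected = passes({"H_total": 1.0}, ["H_total:1.5:-"])  # below the minimum
```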

Structure

class pasted._generator.Structure(atoms: list[str], positions: list[tuple[float, float, float]], charge: int, mult: int, metrics: dict[str, float], mode: str, sample_index: int = 0, center_sym: str | None = None, seed: int | None = None)

Bases: object

A single generated atomic structure with its computed disorder metrics.

atoms

Element symbols, one per atom.

Type:

list[str]

positions

Cartesian coordinates in Å, one (x, y, z) tuple per atom.

Type:

list[tuple[float, float, float]]

charge

Total system charge.

Type:

int

mult

Spin multiplicity 2S+1.

Type:

int

metrics

Computed disorder metrics (see pasted._atoms.ALL_METRICS).

Type:

dict[str, float]

mode

Placement mode used ("gas", "chain", or "shell").

Type:

str

sample_index

1-based index within the batch of structures that passed filters.

Type:

int

center_sym

Element symbol of the shell center atom (shell mode only).

Type:

str | None

seed

Random seed used for generation (None if unseeded).

Type:

int | None

__len__() → int
__repr__() → str

Return repr(self).

atoms: list[str]
center_sym: str | None = None
charge: int
classmethod from_xyz(source: str | Path, *, frame: int = 0, recompute_metrics: bool = True, cutoff: float | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, cov_scale: float = 1.0) → Structure

Load a Structure from an XYZ file or string.

Supports both plain XYZ and PASTED extended XYZ (with charge=, mult=, and metric tokens on the comment line). When recompute_metrics is True (default), all disorder metrics are recomputed from the loaded geometry so that the returned structure is fully usable as optimizer input or for filtering.

Parameters:
  • source – Path to an XYZ file or a raw XYZ string.

  • frame – Zero-based frame index when source contains multiple concatenated structures (default: first frame).

  • recompute_metrics – Recompute all disorder metrics after loading. Set to False to skip the recomputation and return the structure with whatever metric values were embedded in the extended XYZ comment (or an empty dict for plain XYZ).

  • cutoff – Distance cutoff (Å) for metric computation. Auto-computed from the element pool when None.

  • n_bins – Histogram bins for H_spatial / RDF_dev (default: 20).

  • w_atom – Weight of H_atom in H_total (default: 0.5).

  • w_spatial – Weight of H_spatial in H_total (default: 0.5).

  • cov_scale – Minimum distance scale factor used for metrics (default: 1.0).

Return type:

Structure

Raises:

ValueError – When the file / string cannot be parsed, or frame is out of range.
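
To make the two input flavours concrete, the sketch below parses a single frame and collects key=value tokens from the comment line. It is a simplified illustration, not the library's parser: the function name is hypothetical, and the whitespace-separated key=value layout of the comment line is assumed from the charge= and mult= tokens mentioned above.

```python
def parse_xyz_frame(text):
    """Parse one XYZ frame into (atoms, positions, comment_tokens)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    comment = lines[1] if len(lines) > 1 else ""
    # Keep only key=value tokens; a plain XYZ comment yields an empty dict.
    tokens = dict(tok.split("=", 1) for tok in comment.split() if "=" in tok)
    atoms, positions = [], []
    for line in lines[2:2 + n_atoms]:
        sym, x, y, z = line.split()[:4]
        atoms.append(sym)
        positions.append((float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("atom count does not match header")
    return atoms, positions, tokens


frame = """3
charge=0 mult=1 H_total=1.234
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000"""
atoms, positions, tokens = parse_xyz_frame(frame)
```

Token values come back as strings, so a caller would convert them (for example int(tokens["charge"])) before use.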

Examples

Load and immediately use as optimizer initial structure:

from pasted import Structure, StructureOptimizer

s = Structure.from_xyz("my_structure.xyz")
opt = StructureOptimizer(
    n_atoms=len(s), charge=s.charge, mult=s.mult,
    objective={"H_total": 1.0},
    elements=sorted(set(s.atoms)),  # sorted so the pool order is deterministic
    max_steps=2000, seed=42,
)
result = opt.run(initial=s)

metrics: dict[str, float]
mode: str
mult: int
positions: list[tuple[float, float, float]]
sample_index: int = 0
seed: int | None = None
to_xyz(prefix: str = '') → str

Serialise to extended XYZ format.

Parameters:

prefix – Custom prefix for the comment line. When omitted the standard "sample=N mode=M …" string is generated automatically.

Returns:

Multi-line string (no trailing newline).

Return type:

str

write_xyz(path: str | Path, *, append: bool = True) → None

Write this structure to an XYZ file.

Parameters:
  • path – Output file path.

  • append – If True (default) the file is opened in append mode so that multiple structures can be written in sequence. Use append=False to overwrite.

GenerationResult

class pasted._generator.GenerationResult(structures: list[Structure] = <factory>, n_attempted: int = 0, n_passed: int = 0, n_rejected_parity: int = 0, n_rejected_filter: int = 0, n_success_target: int | None = None)[source]

Bases: object

Return value of generate() and StructureGenerator.generate().

Behaves like a list[Structure] in all normal usage (indexing, iteration, len, boolean test) while also carrying metadata about how many attempts were made and why samples were rejected. This metadata is especially useful when integrating PASTED into automated pipelines (for example, ASE-based or high-throughput workflows), where a bare empty list from a run in which every attempt was rejected would be indistinguishable from a successful run that simply produced no results.

structures

Structures that passed all filters.

Type:

list[pasted._generator.Structure]

n_attempted

Total placement attempts made.

Type:

int

n_passed

Number of structures that passed all filters (equals len(structures) unless the caller mutates the list).

Type:

int

n_rejected_parity

Attempts rejected by the charge/multiplicity parity check.

Type:

int

n_rejected_filter

Attempts rejected by user-supplied metric filters.

Type:

int

n_success_target

The n_success value that was in effect during generation (None when not set).

Type:

int | None

Examples

Drop-in replacement for list[Structure]:

from pasted import generate

result = generate(n_atoms=10, charge=0, mult=1,
                  mode="gas", region="sphere:8",
                  elements="6,7,8", n_samples=20, seed=0)
for s in result:          # iterates like a list
    print(s.to_xyz())
print(len(result))        # number that passed

Inspect rejection metadata:

if result.n_rejected_parity > 0:
    print(f"{result.n_rejected_parity} samples failed parity check")
print(result.summary())

Notes

GenerationResult is a dataclass; downstream code should treat it as immutable. The structures field is a plain list and may be sorted or sliced freely.
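
The list-like behaviour described in this section follows from a handful of special methods on a dataclass. ResultSketch below is a hypothetical stand-in that mirrors the documented fields and summary format. It is not the library's class; it only shows why an empty result is falsy yet still carries its counters.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSketch:
    """Hypothetical stand-in mirroring the documented fields."""
    structures: list = field(default_factory=list)
    n_attempted: int = 0
    n_passed: int = 0
    n_rejected_parity: int = 0
    n_rejected_filter: int = 0

    def __iter__(self):
        return iter(self.structures)

    def __len__(self):
        # bool(result) falls back to len(), so an empty result is falsy.
        return len(self.structures)

    def __getitem__(self, index):
        return self.structures[index]

    def summary(self):
        return (f"passed={self.n_passed}  attempted={self.n_attempted}  "
                f"rejected_parity={self.n_rejected_parity}  "
                f"rejected_filter={self.n_rejected_filter}")


empty = ResultSketch(n_attempted=20, n_rejected_filter=20)
if not empty:
    report = empty.summary()   # diagnostics survive even with no structures
```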

__iter__() → Iterator[Structure]
__len__() → int
__repr__() → str

Return repr(self).

n_attempted: int = 0
n_passed: int = 0
n_rejected_filter: int = 0
n_rejected_parity: int = 0
n_success_target: int | None = None
structures: list[Structure]
summary() → str

Return a human-readable one-line summary of the generation run.

Returns:

E.g. "passed=5  attempted=20  rejected_parity=2  rejected_filter=13".

Return type:

str