Debias Guide
The Debias module generates stochastic omit perturbation (STOMP) map ensembles. It produces SLURM sbatch scripts for HPC submission.
Configuration precedence
Values are resolved in this order:
- CLI flags / Python API overrides
- External YAML config file
- Internal defaults
Quick start
Single structure (CLI flags only):
pseudo-debias generate-params \
--run_name my_experiment \
--structure_path /data/target.pdb \
--reflections_path /data/target.mtz \
--work_dir /scratch/results
YAML config file:
# run.yaml
debias:
run_name: "my_experiment"
structure_path: "/data/target.pdb"
reflections_path: "/data/target.mtz"
omit_type: "atoms"
omit_fraction: 0.1
iterations: 5
seed: 42
paths:
work_dir: "/scratch/results"
slurm:
partition: "cs05r"
mem_per_cpu: "5G"
pseudo-debias generate-params --config run.yaml
Submit the generated SLURM jobs:
The submission command is printed at the end of generate-params and also recorded in the eliot log. For a small run (omission jobs ≤ screening_chunk_size) it is a single pair:
jid=$(sbatch --parsable sbatch/submit_preprocessing.slurm)
jid=$(sbatch --parsable --dependency=afterok:$jid sbatch/submit_omission.slurm)
For large screening runs the omission array is split into sequential chunks (see Scheduler-friendly chunking):
jid=$(sbatch --parsable sbatch/submit_preprocessing.slurm)
jid=$(sbatch --parsable --dependency=afterok:$jid sbatch/submit_omission_0.slurm)
jid=$(sbatch --parsable --dependency=afterok:$jid sbatch/submit_omission_1.slurm)
# … one line per chunk
Python API
from debias.api import load_debias_config, generate_slurm_job
cfg = load_debias_config(
config_path="run.yaml",
overrides=[
"debias.structure_path=/data/target.pdb",
"debias.reflections_path=/data/target.mtz",
],
)
generate_slurm_job(cfg)
Batch screening
Pass a CSV or Diamond SoakDB SQLite file to process many crystals at once:
pseudo-debias generate-params \
--config run.yaml \
--screening_path /data/fragment_screen.csv
The CSV must contain PDB (or CIF/structure) and MTZ columns. For SQLite files (Diamond XChem SoakDB), the module queries mainTable and filters for outcomes 4 - CompChem ready, 5 - Deposition ready, and 6 - Deposited.
Scheduler-friendly chunking
Each crystal produces ~50 omission parameter files, so a large screening run can generate tens of thousands of omission jobs. Submitting these as a single array risks overwhelming the scheduler.
screening_chunk_size (default 1000) caps the number of omission jobs per sbatch array. The full omission manifest is split into chunks of that size, each chunk gets its own submit_omission_N.slurm script and they are chained with --dependency=afterok so only one chunk runs at a time.
pseudo-debias generate-params \
--config run.yaml \
--screening_path /data/fragment_screen.csv \
--screening_chunk_size 500 # at most 500 omission jobs per array
Or via YAML:
debias:
screening_chunk_size: 500
The complete submission command (with all chunk dependencies) is both printed to stdout and recorded in the eliot log under debias:submission_command.
MTZ label resolution
Before generating omission .params files, PSEUDO reads each MTZ file with gemmi and auto-selects the observed-data and R-free flag columns. This prevents the Phenix error “Multiple equally suitable arrays of observed xray data found” and the equivalent error when multiple R-free / Status columns are present.
Auto-detection priority
Observed data:
| Priority | F column | SIGF column |
|---|---|---|
| 1 | F-obs-filtered |
SIGF-obs-filtered |
| 2 | F-obs |
SIGF-obs |
| 3 | FP |
SIGFP |
| 4 | FOBS |
SIGFOBS |
| 5 | Fobs |
SIGFobs |
| 6 | F |
SIGF |
| 7 | FTOT |
SIGTOT |
| 8 | IMEAN |
SIGIMEAN |
| 9 | I |
SIGI |
| 10 | IOBS |
SIGIOBS |
R-free flag (first matching column wins):
FreeR_flag → FREE → FREER → R-free-flags → Status
The resolved labels are printed to stdout and recorded in the eliot log under debias:mtz_labels_resolved. The full column inventory of each MTZ is logged under debias:mtz_columns_found.
When auto-detection fails
If a column pair cannot be matched, generate-params raises a ValueError listing the F/I and integer columns actually present in the MTZ:
[Ax0123] MTZ label detection failed for '.../refine.mtz':
No recognised amplitude/intensity pair found.
F/I columns present: ['FTOT', 'F_anomalous']
Set 'debias.mtz_f_labels' to e.g. "FP,SIGFP" to override.
Set the override and rerun:
pseudo-debias generate-params \
--config run.yaml \
--mtz_f_labels "FTOT,SIGTOT"
Or in YAML:
debias:
mtz_f_labels: "FTOT,SIGTOT"
mtz_rfree_label: "FreeR_flag"
Config overrides take precedence over auto-detection. Setting both columns explicitly skips MTZ reading entirely for that crystal.
always_omit
Force specific residues/atoms to be omitted in every iteration — essential for unbiased ligand validation:
debias:
always_omit: "A 567, A 568" # chain resnum [atom_name]
Output directory layout
<work_dir>/<run_name>/
├── sbatch/
│ ├── submit_preprocessing.slurm
│ ├── submit_omission.slurm # single chunk (small runs)
│ ├── submit_omission_0.slurm #
│ ├── submit_omission_1.slurm # chunked (large screening runs)
│ ├── ... #
│ ├── preprocessing_manifest.txt
│ ├── omit_manifest.txt # full reference manifest (always written)
│ ├── omit_manifest_0.txt #
│ ├── omit_manifest_1.txt # per-chunk manifests
│ └── ... #
│
├── logs/ # SLURM .out and .err files
│
└── <crystal_id>/
├── processed/
│ ├── {stem}_original.pdb
│ └── {stem}_updated.pdb
├── metadata/
│ └── {stem}_omission_map.json
├── params/
│ ├── {stem}_0.params
│ └── ...
└── results/
├── {stem}_0/{stem}_0.mtz
└── ...
{stem} is derived from the input structure filename.
Re-run behaviour
By default, crystals whose first perturbation map (results/<stem>_0/<stem>_0.mtz) already exists are skipped. Pass --force / -f or set debias.force: true in YAML to regenerate everything:
pseudo-debias generate-params --config run.yaml --force
debias:
force: true