State, errors, observability, and testing

This page covers the support systems around execution rather than the scheduler itself.

Persistent state in .bitrab/

bitrab/folder.py is the single authority for the .bitrab/ workspace folder.

Important APIs:

  • scan_folder()
  • list_runs()
  • prune_runs()
  • clean_artifacts()
  • clean_job_dirs()
  • clean_logs()
  • write_run_log()

The design intent is simple:

  • run logs should be cheap to list later
  • size reporting should not require a full re-walk every time
  • cleanup commands should map directly to workspace subtrees
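As a rough illustration of the last point, a cleanup command can be little more than "delete one subtree and report what was freed". The sketch below is not the code in bitrab/folder.py: clean_subtree() and the SUBTREES mapping are hypothetical, and only the logs/ directory name is taken from this page.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from cleanup kind to workspace subtree.
# Only "logs" is mentioned on this page; the other names are assumptions.
SUBTREES = {
    "artifacts": "artifacts",
    "job_dirs": "jobs",
    "logs": "logs",
}


def clean_subtree(workspace: Path, kind: str) -> int:
    """Delete one workspace subtree and return the number of bytes freed."""
    target = workspace / SUBTREES[kind]
    if not target.exists():
        return 0
    # Measure before deleting so the caller can report reclaimed space.
    freed = sum(p.stat().st_size for p in target.rglob("*") if p.is_file())
    shutil.rmtree(target)
    return freed
```

Because each command owns exactly one subtree, cleaning logs can never touch artifacts or job directories by accident.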

Structured events

execution/events.py adds a typed event layer around PipelineCallbacks.

Main types:

  • EventType
  • PipelineEvent
  • EventCollector
  • PipelineSummary

This layer is important because it decouples:

  • execution
  • summaries
  • log persistence
  • future UI/reporting work

If you are adding a new lifecycle hook, update the callback path and then decide whether it should also become a structured event.
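To make the callback-to-event relationship concrete, here is a minimal sketch of a collector that turns callback invocations into typed events and derives a summary from them. The class names mirror the ones listed above, but every field, enum member, and method name here (on_job_start, on_job_finish, summary) is an assumption for illustration, not the actual API in execution/events.py.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class EventType(Enum):
    # Assumed members; the real enum likely has more.
    JOB_STARTED = "job_started"
    JOB_FINISHED = "job_finished"


@dataclass(frozen=True)
class PipelineEvent:
    type: EventType
    job: str
    timestamp: float = field(default_factory=time.time)
    data: dict = field(default_factory=dict)


class EventCollector:
    """Callback sink that records typed events instead of printing."""

    def __init__(self) -> None:
        self.events = []

    def on_job_start(self, job: str) -> None:
        self.events.append(PipelineEvent(EventType.JOB_STARTED, job))

    def on_job_finish(self, job: str, success: bool) -> None:
        self.events.append(
            PipelineEvent(EventType.JOB_FINISHED, job, data={"success": success})
        )

    def summary(self) -> dict:
        """Derive a summary purely from recorded events."""
        finished = [e for e in self.events if e.type is EventType.JOB_FINISHED]
        failed = sum(1 for e in finished if not e.data["success"])
        return {"jobs": len(finished), "failed": failed}
```

The point of the shape is that summaries and log persistence read only the event list, so they do not need to know how execution produced it.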

Error handling and recovery

Main exception types live in bitrab/exceptions.py:

  • BitrabError
  • GitlabRunnerError
  • JobExecutionError
  • JobTimeoutError

The error boundaries are roughly:

Layer                Typical errors
-------------------  ------------------------------------------------------------
config loading       GitlabRunnerError
job shell execution  subprocess.CalledProcessError, wrapped as JobExecutionError
timeout              JobTimeoutError
CLI surface          caught and reported in cli.py command handlers
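The wrapping described for job shell execution and timeouts can be sketched as follows. This is an illustration under assumptions, not the code in bitrab/exceptions.py: the inheritance from BitrabError, the constructor arguments, and the run_job() helper are all hypothetical.

```python
import subprocess
from typing import Optional, Sequence


class BitrabError(Exception):
    """Assumed common base class for bitrab errors."""


class JobExecutionError(BitrabError):
    """A job's command exited with a non-zero status."""

    def __init__(self, job: str, returncode: int) -> None:
        super().__init__(f"job {job!r} failed with exit code {returncode}")
        self.job = job
        self.returncode = returncode


class JobTimeoutError(BitrabError):
    """A job ran longer than its timeout."""


def run_job(job: str, argv: Sequence[str], timeout: Optional[float] = None) -> None:
    """Run one command and translate subprocess failures into job errors."""
    try:
        subprocess.run(argv, check=True, timeout=timeout)
    except subprocess.CalledProcessError as exc:
        # Chain the original exception so it stays available as __cause__.
        raise JobExecutionError(job, exc.returncode) from exc
    except subprocess.TimeoutExpired as exc:
        raise JobTimeoutError(f"job {job!r} exceeded {timeout}s") from exc
```

With this shape, a CLI handler only needs to catch the project's own exception types and never has to know about subprocess details.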

Recovery features today are narrow but useful:

  • job-level retry support
  • allow_failure
  • persisted run summaries in .bitrab/logs/
  • watch mode re-runs on config changes

There is no broad "resume partially completed pipeline" system.
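The interaction between job-level retry and allow_failure can be sketched as a small loop. This assumes GitLab-style semantics, where retry counts additional attempts after the first and allow_failure downgrades a final failure to a warning; run_with_retry() and its return values are hypothetical names, not bitrab's API.

```python
from typing import Callable


def run_with_retry(
    run: Callable[[], bool],
    retries: int = 0,
    allow_failure: bool = False,
) -> str:
    """Return "success", "warning" (failed but allowed), or "failed"."""
    # One initial attempt plus up to `retries` additional attempts.
    for _ in range(retries + 1):
        if run():
            return "success"
    # All attempts failed; allow_failure only changes how that is reported.
    return "warning" if allow_failure else "failed"
```

Note that nothing here persists progress between invocations, which is consistent with the absence of a resume mechanism: a retried job always starts from scratch.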

Logging and observability

Bitrab does not have a heavy centralized logging framework. Observability is mostly a mix of:

  • user-facing console output via safe_print()
  • structured runtime events
  • persisted run summaries and event logs
  • per-job log files in CI mode

This keeps the runtime understandable, but it also means feature work often has to decide between:

  • human output
  • structured event output
  • both

Testing strategy

The test suite in test/ is broad and mostly organized by behavior area rather than package path.

Broad buckets:

Area                    Example tests
----------------------  ------------------------------------------------------------------------------------
CLI behavior            test/test_cli.py
YAML/schema/validation  test/test_schema.py, test/test_validate_pipeline.py, test/test_capabilities.py
Config semantics        test/test_rules.py, test/test_extends.py, test/test_matrix.py
Runtime scheduling      test/test_dag_execution.py, test/test_scenarios.py, test/test_scenario_dags.py
Output and UI           test/test_textual_app.py, test/test_tui_mode.py
Support systems         test/test_artifacts.py, test/test_events.py, test/test_folder.py, test/test_watch.py

Runtime behavior is also exercised by scenario-style tests under test/scenarios/, which cover more realistic pipeline shapes.

Performance and concurrency notes

Parallelism is configurable, but the main bottlenecks are still:

  • host CPU and process startup cost
  • shared checkout filesystem contention
  • log routing overhead in parallel/TUI modes

For performance-sensitive or correctness-sensitive changes, start with:

  • execution/stage_runner.py
  • execution/shell.py
  • execution/artifacts.py
  • tui/orchestrator.py

Mutation detection

mutation.py is a bitrab-specific safety feature.

It snapshots the project tree before a job, compares it after the job, and warns on unexpected writes outside the built-in and configured whitelist. This helps keep verification-style jobs honest, particularly when they run serially in the real checkout or when worktree isolation is unavailable.
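The snapshot-and-compare idea can be sketched in a few lines. This is a simplified illustration, not the implementation in mutation.py: it assumes content hashing (the real code may compare sizes or modification times instead), glob-style whitelist patterns, and the hypothetical helper names snapshot() and unexpected_writes().

```python
import fnmatch
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file's relative POSIX path to a SHA-256 digest of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def unexpected_writes(before: dict, after: dict, whitelist: list) -> list:
    """Return created, modified, or deleted paths not covered by the whitelist."""
    changed = {
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    }
    return sorted(
        path
        for path in changed
        # Assumed pattern style: shell globs, where "*" also matches "/".
        if not any(fnmatch.fnmatchcase(path, pattern) for pattern in whitelist)
    )
```

A caller would take one snapshot before the job and one after it, then warn for each path returned by unexpected_writes(). Hashing every file is simple but can be slow on large trees, which is one reason a real implementation might prefer cheaper metadata checks.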