State, errors, observability, and testing
This page covers the support systems around execution rather than the scheduler itself.
Persistent state in .bitrab/
bitrab/folder.py is the authority for the workspace folder.
Important APIs:
- scan_folder()
- list_runs()
- prune_runs()
- clean_artifacts()
- clean_job_dirs()
- clean_logs()
- write_run_log()
The design intent is simple:
- run logs should be cheap to list later
- size reporting should not require a full re-walk every time
- cleanup commands should map directly to workspace subtrees
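One way those goals can fit together is sketched below. It assumes a hypothetical layout where each run writes one JSON summary under `logs/` and a small `index.json` caches each run's byte size and write order, so listing and size reporting read only the index. The function names (`write_run_log_sketch`, `list_runs_sketch`, `total_log_bytes_sketch`, `prune_runs_sketch`) and the layout are illustrative assumptions, not the actual `bitrab/folder.py` API.

```python
import json
from pathlib import Path

# Hypothetical layout: <workspace>/logs/<run_id>.json plus <workspace>/logs/index.json.
INDEX_NAME = "index.json"


def _load_index(logs_dir: Path) -> dict:
    index_path = logs_dir / INDEX_NAME
    if index_path.exists():
        return json.loads(index_path.read_text(encoding="utf-8"))
    return {}


def _save_index(logs_dir: Path, index: dict) -> None:
    (logs_dir / INDEX_NAME).write_text(json.dumps(index), encoding="utf-8")


def write_run_log_sketch(workspace: Path, run_id: str, summary: dict) -> Path:
    """Write one run summary and record its size and order in the index."""
    logs_dir = workspace / "logs"
    logs_dir.mkdir(parents=True, exist_ok=True)
    path = logs_dir / f"{run_id}.json"
    path.write_text(json.dumps(summary), encoding="utf-8")
    index = _load_index(logs_dir)
    next_seq = max((entry["seq"] for entry in index.values()), default=-1) + 1
    # Caching size and order here lets listing and size reporting skip a directory walk.
    index[run_id] = {"bytes": path.stat().st_size, "seq": next_seq}
    _save_index(logs_dir, index)
    return path


def list_runs_sketch(workspace: Path) -> list:
    """Return run ids newest first, reading only the index."""
    index = _load_index(workspace / "logs")
    return sorted(index, key=lambda run_id: index[run_id]["seq"], reverse=True)


def total_log_bytes_sketch(workspace: Path) -> int:
    """Sum cached sizes without stat-ing every log file again."""
    index = _load_index(workspace / "logs")
    return sum(entry["bytes"] for entry in index.values())


def prune_runs_sketch(workspace: Path, keep: int) -> list:
    """Delete all but the `keep` newest runs and return the removed ids."""
    logs_dir = workspace / "logs"
    index = _load_index(logs_dir)
    removed = list_runs_sketch(workspace)[keep:]
    for run_id in removed:
        (logs_dir / f"{run_id}.json").unlink(missing_ok=True)
        del index[run_id]
    if removed:
        _save_index(logs_dir, index)
    return removed
```

Because cleanup targets a single subtree (`logs/`), a command like "clean logs" can map to deleting that directory outright, while pruning only has to consult the index.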
Structured events
execution/events.py adds a typed event layer around PipelineCallbacks.
Main types:
- EventType
- PipelineEvent
- EventCollector
- PipelineSummary
This layer matters because it decouples these concerns from one another:
- execution
- summaries
- log persistence
- future UI/reporting work
If you are adding a new lifecycle hook, update the callback path and then decide whether it should also become a structured event.
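As a rough illustration of that pattern, the sketch below shows a callback-style collector that turns each hook into a typed event and folds the event stream into a summary. The enum members, fields, and method names (`on_job_started`, `on_job_finished`) are assumptions chosen for the example; they are not copied from `execution/events.py`.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EventType(Enum):
    # Hypothetical members for illustration only.
    JOB_STARTED = "job_started"
    JOB_FINISHED = "job_finished"


@dataclass(frozen=True)
class PipelineEvent:
    type: EventType
    job: Optional[str] = None
    data: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


@dataclass
class PipelineSummary:
    succeeded: list = field(default_factory=list)
    failed: list = field(default_factory=list)


class EventCollector:
    """Callback-style sink that records each lifecycle hook as a typed event."""

    def __init__(self) -> None:
        self.events = []

    def _emit(self, event_type: EventType, job: Optional[str] = None, **data) -> None:
        self.events.append(PipelineEvent(event_type, job, data))

    # Each callback method maps one lifecycle hook onto one structured event.
    def on_job_started(self, job: str) -> None:
        self._emit(EventType.JOB_STARTED, job)

    def on_job_finished(self, job: str, success: bool) -> None:
        self._emit(EventType.JOB_FINISHED, job, success=success)

    def summary(self) -> PipelineSummary:
        """Fold the event stream into a summary; log persistence or a UI could read the same list."""
        result = PipelineSummary()
        for event in self.events:
            if event.type is EventType.JOB_FINISHED:
                bucket = result.succeeded if event.data["success"] else result.failed
                bucket.append(event.job)
        return result
```

The executor only calls the hooks; summaries, persisted logs, and any future UI consume `events` without touching execution code.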
Error handling and recovery
Main exception types live in bitrab/exceptions.py:
- BitrabError
- GitlabRunnerError
- JobExecutionError
- JobTimeoutError
The error boundaries are roughly:
| Layer | Typical errors |
|---|---|
| config loading | GitlabRunnerError |
| job shell execution | subprocess.CalledProcessError, wrapped as JobExecutionError |
| timeout | JobTimeoutError |
| CLI surface | caught and reported in cli.py command handlers |
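The shell-execution, timeout, and CLI rows can be pictured as a thin wrapper that converts `subprocess` failures into the project's own exception types, plus a handler that reports them. This is a sketch under stated assumptions: the classes below are stand-ins that assume `JobExecutionError` and `JobTimeoutError` subclass `BitrabError`, and `run_job_script` and `cli_run` are hypothetical names; the real code in `bitrab/exceptions.py`, `execution/shell.py`, and `cli.py` may be structured differently.

```python
import subprocess
import sys


# Stand-in classes; this hierarchy is an assumption for illustration.
class BitrabError(Exception):
    pass


class JobExecutionError(BitrabError):
    def __init__(self, job: str, returncode: int):
        super().__init__(f"job {job!r} failed with exit code {returncode}")
        self.job = job
        self.returncode = returncode


class JobTimeoutError(BitrabError):
    def __init__(self, job: str, timeout: float):
        super().__init__(f"job {job!r} exceeded {timeout}s")
        self.job = job
        self.timeout = timeout


def run_job_script(job: str, argv: list, timeout: float) -> None:
    """Run one job command, translating subprocess errors into bitrab-style errors."""
    try:
        subprocess.run(argv, check=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        raise JobTimeoutError(job, timeout) from exc
    except subprocess.CalledProcessError as exc:
        # Chain the original error so tracebacks still show the subprocess details.
        raise JobExecutionError(job, exc.returncode) from exc


def cli_run(job: str, argv: list, timeout: float = 30.0) -> int:
    """Hypothetical CLI handler: catch the base class once and report it."""
    try:
        run_job_script(job, argv, timeout)
    except BitrabError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

Catching the shared base class at the CLI boundary keeps each command handler short while lower layers raise specific types.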
Recovery features today are narrow but useful:
- job-level retry support
- allow_failure
- persisted run summaries in .bitrab/logs/
- watch mode re-runs on config changes
There is no broad "resume partially completed pipeline" system.
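A minimal sketch of how job-level retry and `allow_failure` interact, assuming GitLab's semantics where `retry: N` permits up to N extra attempts and an allowed failure does not fail the pipeline. `run_with_retry` and its status strings are hypothetical and are not bitrab's scheduler code.

```python
from typing import Callable


def run_with_retry(attempt: Callable[[], bool], retry: int = 0, allow_failure: bool = False) -> tuple:
    """Call `attempt` up to 1 + retry times.

    Returns (status, attempts_used); status is "success", "allowed_failure", or "failed".
    """
    attempts = 0
    for attempts in range(1, retry + 2):
        if attempt():
            return "success", attempts
    # Every attempt failed; allow_failure decides whether the pipeline should stop.
    return ("allowed_failure" if allow_failure else "failed"), attempts
```

Note that nothing here remembers which jobs already passed across separate runs, which is why a failed pipeline is re-run rather than resumed.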
Logging and observability
Bitrab does not have a heavy centralized logging framework. Observability is mostly a mix of:
- user-facing console output via safe_print()
- structured runtime events
- persisted run summaries and event logs
- per-job log files in CI mode
This keeps the runtime understandable, but it also means feature work often has to decide between:
- human output
- structured event output
- both
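When a feature needs both, one option is a single reporting helper that writes a human-readable line and, when given an event, a structured record next to it. The sketch assumes JSON Lines for the structured stream; `report` is a hypothetical helper and does not reproduce what `safe_print()` actually does.

```python
import json
import sys
from typing import Optional, TextIO


def report(message: str, event: Optional[dict] = None, *,
           human: Optional[TextIO] = None, structured: Optional[TextIO] = None) -> None:
    """Write a human line and, optionally, one JSON Lines record for the same fact."""
    # Human output goes to the console by default.
    print(message, file=human if human is not None else sys.stdout)
    # Structured output: one JSON object per line, easy to persist and parse later.
    if structured is not None and event is not None:
        structured.write(json.dumps(event, sort_keys=True) + "\n")
```

Keeping both writes in one call makes it harder for console text and persisted events to drift apart.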
Testing strategy
The test suite in test/ is broad and mostly organized by behavior area rather than package path.
Broad buckets:
| Area | Example tests |
|---|---|
| CLI behavior | test/test_cli.py |
| YAML/schema/validation | test/test_schema.py, test/test_validate_pipeline.py, test/test_capabilities.py |
| Config semantics | test/test_rules.py, test/test_extends.py, test/test_matrix.py |
| Runtime scheduling | test/test_dag_execution.py, test/test_scenarios.py, test/test_scenario_dags.py |
| Output and UI | test/test_textual_app.py, test/test_tui_mode.py |
| Support systems | test/test_artifacts.py, test/test_events.py, test/test_folder.py, test/test_watch.py |
The runtime also uses scenario-style tests under test/scenarios/ to exercise more realistic pipeline shapes.
Performance and concurrency notes
Parallelism is configurable, but the main bottlenecks are still:
- host CPU and process startup cost
- shared checkout filesystem contention
- log routing overhead in parallel/TUI modes
For performance-sensitive or correctness-sensitive changes, start with:
- execution/stage_runner.py
- execution/shell.py
- execution/artifacts.py
- tui/orchestrator.py
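To see where those bottleneck costs come from, here is a minimal sketch of running one stage's jobs with a configurable worker count: each job is a separate process (startup cost), all jobs share one working directory (filesystem contention), and output is captured per job instead of interleaved (log routing). `run_stage` and its parameters are hypothetical and do not reflect the logic in `execution/stage_runner.py`.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_stage(jobs: dict, cwd: str, parallelism: int = 2) -> dict:
    """Run each job's argv in `cwd`, at most `parallelism` at a time.

    Returns {job_name: (returncode, captured_output)}.
    """
    def run_one(argv: list) -> tuple:
        # One process per job: interpreter or shell startup is paid every time.
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
        # Capturing per job keeps logs separate rather than interleaving them on the console.
        return proc.returncode, proc.stdout + proc.stderr

    # Threads are enough here because the real work happens in child processes.
    with ThreadPoolExecutor(max_workers=max(1, parallelism)) as pool:
        futures = {name: pool.submit(run_one, argv) for name, argv in jobs.items()}
        return {name: future.result() for name, future in futures.items()}
```

Raising `parallelism` past the host's core count mostly adds contention on the shared checkout rather than throughput.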
Mutation detection
mutation.py is a bitrab-specific safety feature.
It snapshots the project tree before a job, compares it after the job finishes, and warns about unexpected writes outside the built-in and configured whitelist. This is especially useful for keeping verification-style jobs honest when they run serially in the real checkout or when worktree isolation is unavailable.
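The snapshot-and-compare idea can be sketched in a few lines. This version assumes a snapshot maps each relative path to `(size, mtime_ns)` and that whitelist entries are `fnmatch` glob patterns; `snapshot`, `detect_mutations`, and the skipped directories are illustrative assumptions, and `mutation.py` may hash file contents or match paths differently.

```python
import fnmatch
import os
from pathlib import Path


def snapshot(root: Path, skip_dirs: frozenset = frozenset({".git", ".bitrab"})) -> dict:
    """Map each file's POSIX-style relative path to (size, mtime_ns)."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories expected to change on their own (illustrative defaults).
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = Path(dirpath) / name
            stat = path.stat()
            result[path.relative_to(root).as_posix()] = (stat.st_size, stat.st_mtime_ns)
    return result


def detect_mutations(before: dict, after: dict, whitelist: list) -> list:
    """Return created, deleted, or modified paths not covered by any whitelist pattern."""
    changed = set(before) ^ set(after)  # created or deleted
    changed |= {p for p in set(before) & set(after) if before[p] != after[p]}  # modified
    return sorted(p for p in changed if not any(fnmatch.fnmatch(p, pat) for pat in whitelist))
```

A job runner would call `snapshot()` before and after each job and print a warning for every path `detect_mutations()` returns.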