Reproducible environments with uv
Your pipeline works perfectly on your laptop today. You deploy it to production next week, and it breaks because pandas updated from 2.1.0 to 2.2.0. A method you use changed behavior. Code that hasn’t changed produces different results because the environment changed. Reproducible environments with uv solve this by locking every dependency to exact versions, ensuring identical execution across time and machines.
What you’ll understand
After reading this page, you’ll understand:
- Why reproducible environments matter for preventing environment drift, transitive dependency conflicts, and time-based failures that cause production breaks from dependency updates
- How uv-build creates hermetic builds through dependency resolution, environment isolation, build execution in locked environments, and artifact packaging with metadata
- When to use uv-build for production deployments, compliance audits, and team collaboration versus regular build for development iteration, prototyping, and solo work
- What you gain in full reproducibility, audit trails, and isolation versus what you lose in slower builds, tooling complexity, and lock file management
What are reproducible environments?
A reproducible environment is a Python environment where every dependency, including transitive dependencies, is locked to a specific version. Running code in this environment produces identical results regardless of when or where you execute it.
Xorq integrates with uv, a fast Python package installer, to create these environments. When you run xorq uv-build, Xorq creates an isolated environment with pinned dependencies, builds your expression in that environment, and packages everything for deployment.
# Standard build (uses current environment)
xorq build pipeline.py -e features
# Hermetic build (creates isolated environment)
xorq uv-build pipeline.py -e features
Why reproducible environments matter
Without reproducibility, your code works on your laptop but fails in production. Dependencies update, Python versions change, and behavior drifts. What worked yesterday might break today.
This creates four critical problems:
Environment drift causes production failures. Your laptop has pandas 2.1.0, but production has 2.0.0. A method you use, such as DataFrame.map, doesn’t exist in the older version. Your pipeline fails in production despite working locally. Debugging takes hours because the code looks correct, but the environment is different.
Transitive dependency conflicts break environments. You install package A, which depends on package B version 1.0. Later, you install package C, which requires package B version 2.0. Your environment breaks because B can’t satisfy both requirements simultaneously. pip tries to resolve this, fails, and leaves your environment in an inconsistent state.
Time-based failures occur from external updates. Your code worked yesterday. Today, a dependency ships a breaking change: for example, pandas 2.2.0 changes groupby behavior. Your deployment fails even though you didn’t change your code. Production breaks from external changes you didn’t anticipate or control.
Missing audit trails prevent reproduction. Six months later, regulators ask “Which dependency versions produced this model?” You can’t remember because you didn’t track pandas 2.1.0, numpy 1.25.0, or pyarrow 13.0.0. Reproduction becomes guesswork. Compliance failures cost money and reputation.
Reproducible environments solve these by locking every dependency to exact versions, ensuring identical behavior across time and machines.
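To make the audit-trail problem concrete, here is a minimal sketch that records which package versions are installed in the current environment. It uses only the standard library's importlib.metadata; the function names snapshot_environment and as_pins are hypothetical and are not part of Xorq or uv. It illustrates the kind of record a lock file gives you automatically; it is not how uv-build works internally.

```python
from importlib import metadata


def snapshot_environment() -> dict[str, str]:
    """Return a {package name: version} mapping for the current environment."""
    versions: dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.version
        if name and version:  # skip distributions with incomplete metadata
            versions[name.lower()] = version
    return versions


def as_pins(versions: dict[str, str]) -> list[str]:
    """Render a snapshot as sorted 'name==version' lines."""
    return [f"{name}=={version}" for name, version in sorted(versions.items())]


if __name__ == "__main__":
    # Print the environment as exact pins, one per line.
    print("\n".join(as_pins(snapshot_environment())))
```

Saving this output next to a result answers "which versions produced this?" later, but it only records the environment. It does not recreate it, which is the part that locking adds.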
How reproducible environments work
Reproducible environments with uv operate in four stages:
Dependency resolution: When you run uv-build, uv reads your project’s dependencies and resolves them to exact versions. This includes transitive dependencies, which are dependencies of dependencies.
Environment isolation: uv creates an isolated Python environment separate from your system Python. This environment contains only the specified dependencies, nothing more.
Build execution: Xorq runs the build in this isolated environment. The expression compiles using the exact dependency versions specified.
Artifact packaging: The build artifacts include metadata about the environment, such as the Python version. The packaged sdist contains requirements.txt with dependency versions, so you can recreate the environment later.
The isolated environment is ephemeral. uv creates it for the build, then discards it. The build artifacts capture what you need to recreate the environment later.
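The create, use, discard lifecycle can be sketched with the standard library alone. This is an illustration of the idea of an ephemeral environment, using Python's venv and tempfile modules as stand-ins; it is not what uv or Xorq actually run, and the function name with_ephemeral_env is hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def with_ephemeral_env() -> tuple[bool, Path]:
    """Create a throwaway virtual environment, use it, then discard it."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "build-env"
        # with_pip=False keeps the environment empty: nothing is installed
        # beyond the interpreter itself.
        venv.create(env_dir, with_pip=False)
        # A real build step would run here, inside the environment.
        created = (env_dir / "pyvenv.cfg").is_file()
    # Leaving the with-block deletes the directory, and the environment with it.
    return created, env_dir


if __name__ == "__main__":
    created, env_dir = with_ephemeral_env()
    print("environment was created:", created)
    print("environment still exists:", env_dir.exists())
```

Because nothing of the environment survives the build, anything you need for later reproduction has to be written into the build artifacts.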
Reproducible environments separate build-time dependencies from runtime dependencies. You build with locked versions, then execute with the same locked versions. This prevents drift between environments.
uv-build versus regular build
Xorq provides two build commands with different reproducibility guarantees:
xorq build
Uses your current Python environment:
xorq build pipeline.py -e features
Pros
- Fast with no environment creation, typically under one second.
- Simple because it uses your existing environment with no additional tools.
- Good for development iteration with quick build, test, and repeat cycles.
Cons
- Not reproducible because it depends on whatever packages are currently installed.
- Environment drift is possible, so code that works today might break tomorrow.
- No dependency locking means you can’t recreate the exact environment later.
Use when
Use this command for development iteration, prototyping, and solo work where build speed matters more than reproducibility.
xorq uv-build
Creates an isolated, locked environment:
xorq uv-build pipeline.py -e features
Pros
- Fully reproducible because locked dependencies guarantee identical behavior.
- Isolated with no environment pollution from other projects.
- Hermetic and self-contained, including all dependency info.
- Auditable because metadata tracks exactly what was used.
Cons
- Slower because it creates an environment each time, typically 10-30 seconds.
- Requires uv installed as an additional tooling dependency.
- More complex setup requiring pyproject.toml and lock files.
- Lock file management requiring commits and updates to uv.lock.
Use when
Use this command for production deployments, compliance audits, and team collaboration where reproducibility is critical.
Dependency locking
Dependency locking means specifying exact versions for every package, including transitive dependencies. Instead of “pandas >= 2.0”, you specify “pandas == 2.1.0, pyarrow == 13.0.0, numpy == 1.25.0”.
uv handles this automatically:
# uv resolves dependencies and creates lock file
uv lock
# uv-build uses the lock file
xorq uv-build pipeline.py -e features
The lock file uv.lock captures the entire dependency tree:
[[package]]
name = "pandas"
version = "2.1.0"
dependencies = [
{ name = "numpy", version = "1.25.0" },
{ name = "pyarrow", version = "13.0.0" },
]
[[package]]
name = "numpy"
version = "1.25.0"
This ensures that six months from now, you can recreate the exact same environment.
Environment metadata
uv-builds include environment metadata in metadata.json and dependency information in the packaged sdist:
{
"current_library_version": "0.3.4",
"metadata_version": "0.0.0",
"sys-version-info": [3, 11, 5, "final", 0],
"git_state": {
"commit": "a3f5c9d2e1b4...",
"branch": "main"
}
}
The build directory also includes a packaged sdist (.tar.gz file) that contains requirements.txt with all dependency versions. This enables:
Audit trails: Know exactly which versions produced this build by checking the sdist’s requirements.txt, for example that it used pandas 2.1.0 and not 2.2.0.
Reproduction: Recreate the environment from the lock file and sdist six months later.
Debugging: Identify version-specific bugs by comparing metadata and requirements across builds.
Compliance: Prove which software versions were used for regulatory requirements by inspecting the sdist.
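As a sketch of the debugging use case, the following reads the pinned versions from two sdists and reports which packages differ. Several details are assumptions: that requirements.txt can be found by file name anywhere inside the archive, and that it uses name==version pins. The function names read_pins, parse_pins, and diff_pins are hypothetical helpers, not part of Xorq.

```python
import re
import tarfile
from pathlib import PurePosixPath

# Matches lines such as "pandas==2.1.0", optionally with extras, markers,
# or a trailing line continuation after the version.
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*([^\s;\\]+)")


def parse_pins(text: str) -> dict[str, str]:
    """Extract {name: version} from requirements-style text."""
    pins: dict[str, str] = {}
    for line in text.splitlines():
        match = PIN.match(line.strip())
        if match:  # comments, hashes, and blank lines do not match
            pins[match.group(1).lower()] = match.group(2)
    return pins


def read_pins(sdist_path: str) -> dict[str, str]:
    """Read pins from the first requirements.txt found in a .tar.gz sdist."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and PurePosixPath(member.name).name == "requirements.txt":
                return parse_pins(archive.extractfile(member).read().decode("utf-8"))
    raise FileNotFoundError(f"no requirements.txt in {sdist_path}")


def diff_pins(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (old_version, new_version)} for every pin that differs."""
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }
```

Calling diff_pins(read_pins(old_sdist), read_pins(new_sdist)) on two builds narrows a behavior change down to the packages whose versions actually moved. A package present in only one build shows up with None on the other side.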
When to use reproducible environments
Deciding when to use reproducible environments depends on your deployment patterns and collaboration needs.
Use reproducible environments when:
- You’re deploying to production with daily scheduled pipelines that need consistency.
- Compliance requires audit trails because regulators ask “which versions?”.
- Multiple people collaborate on the same pipeline and the team needs identical environments.
- You need to recreate results months later for research reproducibility or compliance audits.
- Environment consistency matters more than build speed, prioritizing correctness over convenience.
- Production stakes are high because failures cost money, reputation, and compliance.
Use regular builds when:
- You’re doing interactive development with iterate, test, and repeat cycles.
- You’re prototyping and iterating quickly, prioritizing speed over reproducibility.
- You’re working solo with no deployment needs and no production risk.
- Build speed matters more than reproducibility for exploratory work or throwaway code.
- Code never leaves development, as with notebooks, experiments, or one-off analyses.
Example decision
If you’re exploring data in a notebook, use a regular build or run the expression directly with .execute() for faster iteration. The overhead of uv, lock files, and environment isolation exceeds the benefit when code never leaves your laptop.
Reproducibility guarantees
Reproducible environments provide three levels of guarantee:
Package-level reproducibility
Same package versions across all environments:
# Development
xorq uv-build pipeline.py -e features
# Uses: pandas==2.1.0, numpy==1.25.0
# Production (6 months later)
xorq uv-build pipeline.py -e features
# Uses: pandas==2.1.0, numpy==1.25.0 (same!)
Python-level reproducibility
Same Python version across environments:
{
"python_version": "3.11.5"
}
uv ensures the build uses the specified Python version, not whatever’s installed on the machine.
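If you want to verify this yourself, a small check can compare the interpreter recorded at build time with the one currently running. The sketch assumes the recorded value has the shape of the "sys-version-info" list shown in the metadata.json example earlier; python_matches is a hypothetical helper, not an Xorq function.

```python
import sys


def python_matches(recorded, current=None, parts: int = 3) -> bool:
    """Compare a recorded interpreter version with the running one.

    `recorded` is a list such as [3, 11, 5, "final", 0]. `parts` controls how
    many leading components must match: 2 for major.minor, 3 for
    major.minor.micro.
    """
    if current is None:
        current = sys.version_info
    return tuple(recorded[:parts]) == tuple(current[:parts])


if __name__ == "__main__":
    recorded = [3, 11, 5, "final", 0]  # value taken from the example metadata
    print("same Python version as the build:", python_matches(recorded))
```

Comparing only major.minor (parts=2) is a looser check that may be enough when patch releases are considered interchangeable.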
System-level reproducibility
Same system dependencies with Nix (optional):
For full system-level reproducibility including system libraries like OpenSSL, combine uv with Nix. This is advanced and typically only needed for maximum reproducibility.
Trade-offs
Reproducible environments provide deployment consistency, but require additional tooling and workflow changes.
Benefits:
- Full reproducibility: Same code + same environment = same results, guaranteed.
- Audit trails: Know exactly which versions were used, such as pandas 2.1.0 and numpy 1.25.0.
- Isolation: No environment pollution from other projects, clean slate every build.
- Time-proof: Builds work the same way months later, immune to external updates.
- Team alignment: Everyone uses identical environments, no “works on my machine.”
Costs:
- Build time: Creating isolated environments takes longer, typically 10-30 seconds versus under 1 second.
- Complexity: Requires understanding uv, pyproject.toml, lock files, and environment isolation.
- Storage: Lock files and metadata consume space, typically 100-500KB per project.
- Tooling dependency: Requires uv installed and configured correctly.
- Lock file management: Need to commit, update, and resolve conflicts in uv.lock.
When the trade-off is worth it
If you’re doing exploratory work that never leaves your laptop, such as ad-hoc analyses, prototype notebooks, or one-time reports, the overhead isn’t justified. Regular builds or direct execution provide faster iteration with acceptable risk.
Learning more
Build system explains how uv-build extends regular builds with environment locking. Content-addressed hashing covers how uv-builds are still content-addressed.
Compute catalog discusses how to catalog uv-builds like regular builds.
Build reproducible environments guide provides production workflows with uv. uv-build CLI reference covers complete uv-build documentation.