dev #1

Merged
tom merged 174 commits from dev into main 2025-09-05 10:18:39 +01:00
No description provided.
tom self-assigned this 2025-09-05 10:18:19 +01:00
tom added 174 commits 2025-09-05 10:18:19 +01:00
Inference build
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 1h14m59s
471d8ce919
Forgejo build now
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
b71cb205c7
Bump dependencies
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
42e23dc098
Implement audio filters
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
8789088c2c
Rename to neuromancer
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 17s
b3a0eb7d4b
Chatterbox GGUF docs
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 37s
502cd7e940
GGUF implementation
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 15s
9d0a24c8c3
Refactor audio service
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 18s
c991a75b4e
Linting
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 1h10m19s
fe363859c8
Tweak readme
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
fdc7a3cb62
UK docstrings
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
154171df44
Bump dependencies
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
3022523a28
Docstring improvements
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
5df927115a
Tidy audio service
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
4eb4646865
Doc on how OpenAPI MCP works
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
7d38bca128
Improve OpenAPI doc
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
2b50de6dda
OpenAPI MCP client
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 1h14m50s
00dddbd55a
Update config defaults
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
78f23ce67d
Agent calling
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
4882771a76
Readme tweaks
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
33e3ff5811
Build test
All checks were successful
Build and Publish Docker Image / build-and-push (push) Successful in 53m40s
31dbc50f6d
Test config
All checks were successful
Build and Publish Docker Image / build-and-push (push) Successful in 19m29s
81fd809e95
Implement Kyutai audio pipeline
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 1h12m39s
0a602e8a27
Linting
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
6d22390aba
Improve docstrings
Some checks failed
Build and Publish Docker Image / build-and-push (push) Has been cancelled
924724ba9f
Retire audio module
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 18s
067e6e80ac
Tidy
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 18s
d4f3b95e45
Model manager
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 16s
42f497de3c
New tools module inside models
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 17s
a9d0163f6c
Refactor API ingress
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 17s
482177b0f0
Refactor voice module
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 22s
071875c5ac
Update module docstrings
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 14s
5deab28adf
Update realtime progress
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 16s
1091a7d30c
Remove extra dependencies
Some checks failed
Build and Publish Docker Image / build-and-push (push) Failing after 26s
dfcc4aee77
Healthcheck and uvicorn
All checks were successful
Build and Publish Docker Image / build-and-push (push) Successful in 20s
0d9e0b8fc0
Clean up build
Some checks failed
Docker Build / Lint (push) Failing after 13s
Docker Build / Build and Push (push) Failing after 0s
Build and Publish Docker Image / build-and-push (push) Successful in 1m3s
93f9835549
Health fixes
All checks were successful
Docker Build and Publish / Lint (push) Successful in 16s
Docker Build and Publish / Build and Push (push) Successful in 1m4s
f9db3870de
Support API keys
All checks were successful
Docker Build and Publish / Lint (push) Successful in 28s
Docker Build and Publish / Build and Push (push) Successful in 57s
ec194d0319
Cleanup old containers
All checks were successful
Docker Build and Publish / Lint (push) Successful in 12s
Docker Build and Publish / Build and Push (push) Successful in 59s
7bd8a54110
Switch to Llama.cpp
All checks were successful
Docker Build and Publish / Lint (push) Successful in 24s
Docker Build and Publish / Build and Push (push) Successful in 56s
903b14e94d
Use new Moshi STT/TTS
All checks were successful
Docker Build and Publish / Lint (push) Successful in 17s
Docker Build and Publish / Build and Push (push) Successful in 1m3s
e81822a2ca
Test model paths
All checks were successful
Docker Build and Publish / Lint (push) Successful in 14s
Docker Build and Publish / Build and Push (push) Successful in 1m4s
c4ffb7c69f
Test voice
All checks were successful
Docker Build and Publish / Lint (push) Successful in 17s
Docker Build and Publish / Build and Push (push) Successful in 58s
bbcf2a67da
Remove vLLM and stream responses
All checks were successful
Docker Build and Publish / Lint (push) Successful in 1m55s
Docker Build and Publish / Build and Push (push) Successful in 1m4s
ddbec62825
Tool/streaming fixes
All checks were successful
Docker Build and Publish / Lint (push) Successful in 13s
Docker Build and Publish / Build and Push (push) Successful in 1m2s
b886d73603
Testing
All checks were successful
Docker Build and Publish / Lint (push) Successful in 11s
Docker Build and Publish / Build and Push (push) Successful in 1m4s
2eec3c1e56
Refactor tool calling
All checks were successful
Docker Build and Publish / Lint (push) Successful in 12s
Docker Build and Publish / Build and Push (push) Successful in 56s
1ceacef42f
Voice fixes
All checks were successful
Docker Build and Publish / Lint (push) Successful in 19s
Docker Build and Publish / Build and Push (push) Successful in 1m24s
c3bd60fa36
Enable mypy for type-checking
Some checks failed
Docker Build and Publish / Lint (push) Failing after 18s
Docker Build and Publish / Build and Push (push) Failing after 0s
8ef8313260
Refactor
All checks were successful
Docker Build and Publish / Lint (push) Successful in 14s
Docker Build and Publish / Build and Push (push) Successful in 1m1s
ec4bdae2e8
Voice and OIDC endpoints
All checks were successful
Docker Build and Publish / Lint (push) Successful in 16s
Docker Build and Publish / Build and Push (push) Successful in 56s
8791d2339e
Better tests
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 21s
Docker Build and Publish / Build and Push (push) Successful in 1m51s
4a59f8e680
Clarify auth requirements in README
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 25s
Docker Build and Publish / Build and Push (push) Successful in 1m29s
4a3d5abd43
Tool calling fixes
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 17s
Docker Build and Publish / Build and Push (push) Successful in 1m36s
c512ada482
Shared directory
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 17s
Docker Build and Publish / Build and Push (push) Successful in 1m30s
60d61576c3
Central Tool Manager
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 18s
Docker Build and Publish / Build and Push (push) Successful in 1m33s
a2f2da906d
Tidy up codebase
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 19s
Docker Build and Publish / Build and Push (push) Successful in 1m40s
6974203066
Expand pytests
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 26s
Docker Build and Publish / Build and Push (push) Successful in 1m32s
54378b7bbb
Ensure tools are handled
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 28s
Docker Build and Publish / Build and Push (push) Successful in 1m38s
a544291e1a
Reorganise tool calling
Some checks failed
Docker Build and Publish / Lint & Test (push) Failing after 25s
Docker Build and Publish / Build and Push (push) Failing after 0s
24ce822d79
Doc/config updates
Some checks failed
Docker Build and Publish / Lint & Test (push) Failing after 27s
Docker Build and Publish / Build and Push (push) Failing after 0s
70d5621c2b
Refactor llama.cpp client
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 30s
Docker Build and Publish / Build and Push (push) Successful in 1m33s
62343ca385
Standardise logging
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 31s
Docker Build and Publish / Build and Push (push) Successful in 1m37s
2e6b84fe0f
Remove unnecessary JSON indents
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 31s
Docker Build and Publish / Build and Push (push) Successful in 1m39s
3b797188e2
refactor: centralise Pydantic models into unified module
Some checks failed
Docker Build and Publish / Lint & Test (push) Failing after 20s
Docker Build and Publish / Build and Push (push) Failing after 0s
3bbd0bcc8d
- Create models package with organised submodules for messages, chat, audio, tools
- Replace parallel API/handler model hierarchies with single source of truth
- Fix missing role field in message serialisation for llama.cpp
Accommodate refactor
Some checks failed
Docker Build and Publish / Lint & Test (push) Failing after 15s
Docker Build and Publish / Build and Push (push) Failing after 0s
46373aa711
More pytest/linting fixes
Some checks failed
Docker Build and Publish / Lint & Test (push) Failing after 14s
Docker Build and Publish / Build and Push (push) Failing after 0s
b3014d709f
fix: ensure role field is always included in message dictionaries for llama.cpp
Some checks failed
Docker Build and Publish / Lint & Test (push) Failing after 19s
Docker Build and Publish / Build and Push (push) Failing after 0s
74de57323a
The BaseMessage.to_dict() method was not consistently including the role field
when converting messages to dictionaries, causing llama.cpp to reject messages
with 'Missing role in message' errors. This fix ensures the role field is
always present in the output dictionary.
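A minimal sketch of the fix this commit describes, assuming a Pydantic-style base model (the real BaseMessage is not shown in this log; names beyond role and to_dict are illustrative):

```python
from pydantic import BaseModel


class BaseMessage(BaseModel):
    role: str
    content: str | None = None

    def to_dict(self) -> dict:
        # Serialise as usual, then guarantee the role survives even when
        # subclass overrides or exclude_none filtering would drop it;
        # llama.cpp rejects any message with a missing role.
        data = self.model_dump(exclude_none=True)
        data["role"] = self.role
        return data
```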
fix: resolve CI linting errors and maintain runtime imports for Pydantic
Some checks failed
Docker Build and Publish / Lint & Test (push) Successful in 29s
Docker Build and Publish / Build and Push (push) Failing after 0s
a9a67467d6
- Add noqa directives for legitimate complexity and circular import cases
- Fix TRY300 by using proper try/except/else structure in discovery.py
- Remove TODO with fake GitHub issue link in chat_integration.py
- Add required runtime imports for Pydantic model_rebuild() with TC001/F401 suppressions
- Add PLC0415 suppressions for imports inside functions to avoid circular dependencies
fix: replace gitea context with github context for Forgejo Actions compatibility
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 27s
Docker Build and Publish / Build and Push (push) Successful in 4m10s
408224661e
feat: add logging for complete streaming responses
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 29s
Docker Build and Publish / Build and Push (push) Successful in 1m35s
cd11698c46
- Log accumulated response content after streaming completes
- Truncate long responses to 500 chars for readability
- Include tool call summary if present
- Remove redundant import after ruff auto-fix
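The truncation the first bullet describes reduces to something like this sketch (the 500-character limit is from the commit; function and parameter names are invented):

```python
import logging

logger = logging.getLogger(__name__)


def log_streamed_response(content: str, tool_names: list[str]) -> None:
    # Truncate long responses so the final log line stays readable.
    preview = content if len(content) <= 500 else content[:500] + "..."
    logger.info("Assistant response: %s", preview)
    if tool_names:
        logger.info("Tool calls: %s", ", ".join(tool_names))
```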
fix: preserve anyOf nullable patterns in tool schemas
Some checks failed
Docker Build and Publish / Lint & Test (push) Failing after 15s
Docker Build and Publish / Build and Push (push) Failing after 0s
aa2e452837
Stop recursively processing items inside anyOf/oneOf/allOf composition patterns
to prevent corrupting valid OpenAPI schemas. The previous logic was incorrectly
transforming {type: "null"} to {type: "string"} inside anyOf patterns, breaking
nullable field handling.

Changes:
- Skip recursive processing of composition pattern items
- Don't add spurious type fields to schemas using composition
- Preserve original OpenAPI schema structure for llama.cpp
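A sketch of the skip logic these changes describe, assuming schemas are plain dicts (helper name is invented):

```python
COMPOSITION_KEYS = ("anyOf", "oneOf", "allOf")


def normalise(schema: dict) -> dict:
    # Leave composition patterns untouched: rewriting their items (e.g.
    # {"type": "null"} -> {"type": "string"}) breaks nullable handling.
    if any(key in schema for key in COMPOSITION_KEYS):
        return schema
    out = dict(schema)
    if isinstance(out.get("properties"), dict):
        out["properties"] = {k: normalise(v) for k, v in out["properties"].items()}
    return out
```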
style: fix line length for ruff linting
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 25s
Docker Build and Publish / Build and Push (push) Successful in 1m38s
038546e9f0
debug: add detailed logging for response accumulation
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 29s
Docker Build and Publish / Build and Push (push) Successful in 1m38s
f9b06b3c82
Add debug and warning logs to understand why response content isn't being
logged at stream completion. This will help diagnose whether messages are
being accumulated properly.
chore: add pre-commit hooks for quality checks
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 26s
Docker Build and Publish / Build and Push (push) Successful in 1m39s
4d2b0b45c5
Configure prek to run pytest, mypy, ruff check, and ruff format
before each commit to ensure code quality standards are met.
fix: properly collapse anyOf nullable patterns for OpenAI format
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 27s
Docker Build and Publish / Build and Push (push) Successful in 1m34s
4d9736377e
Convert OpenAPI-style anyOf patterns with null types into simple types
for OpenAI tool format compatibility. OpenAI uses the 'required' array
to indicate optionality, not anyOf with null. This prevents llama.cpp
crashes from invalid schema patterns.
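A minimal sketch of the collapse described above (collapse_anyof_nullable is named in a later commit; the exact handling of multi-variant unions is assumed):

```python
def collapse_anyof_nullable(schema: dict) -> dict:
    # {"anyOf": [{"type": "string"}, {"type": "null"}]} -> {"type": "string"}.
    # OpenAI signals optionality through the "required" array, and llama.cpp
    # crashes on the anyOf-with-null pattern, so collapse to the single
    # non-null variant when that is unambiguous.
    variants = schema.get("anyOf", [])
    non_null = [v for v in variants if v.get("type") != "null"]
    if variants and len(non_null) == 1:
        collapsed = {k: v for k, v in schema.items() if k != "anyOf"}
        collapsed.update(non_null[0])
        return collapsed
    return schema
```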
fix: add debug logging to track assistant response accumulation
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 27s
Docker Build and Publish / Build and Push (push) Successful in 1m36s
6f5c17c32c
- Add logging in ChatChunk.from_json_data to track parsed content
- Add logging in StreamingState.accumulate_chunk to track merging
- Add raw delta logging in parse_chunk to see what llama.cpp sends
- Add detailed debugging in stream completion handler to diagnose why responses aren't logged
- Add noqa comments for unavoidable linting issues
fix: reduce debug logging verbosity for assistant responses
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 26s
Docker Build and Publish / Build and Push (push) Successful in 1m27s
dbbb353d5d
- Remove per-delta logging (too spammy for short chunks)
- Remove per-chunk parsing logs
- Remove per-merge logging
- Keep only the final accumulated response logging
- Add better debugging for cases where no content is accumulated
fix: implement strict OpenAI tool schema validation
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 34s
Docker Build and Publish / Build and Push (push) Successful in 1m38s
dc744c55ad
- Create Pydantic models based on OpenAI SDK type definitions
- Separate validation from transformation with helper functions
- Add collapse_anyof_nullable to handle OpenAPI/Pydantic patterns
- Add filter_to_basic_json_schema for llama.cpp compatibility
- Ensure clean, valid tool schemas for llama.cpp backend
fix: include reasoning_content in ChatChunk content accumulation
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 39s
Docker Build and Publish / Build and Push (push) Successful in 1m41s
266dbaaa10
The critical issue was that reasoning_content (thinking) from llama.cpp
wasn't being accumulated into messages, causing:
- 'No content accumulated' warnings
- Tool calls embedded in thinking never being extracted
- Tool execution failing completely

Now both content and reasoning_content are combined and accumulated,
allowing proper tool call detection and execution.
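The combined accumulation amounts to something like this sketch (delta here is the raw JSON delta from llama.cpp; the buffer-based shape is an assumption):

```python
def accumulate_delta(delta: dict, parts: list[str]) -> None:
    # DeepSeek R1 emits thinking as reasoning_content rather than content;
    # gather both, or tool calls embedded in the thinking are never seen.
    for field in ("content", "reasoning_content"):
        piece = delta.get(field)
        if piece:
            parts.append(piece)
```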
fix: add missing Any import and fix line length
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 32s
Docker Build and Publish / Build and Push (push) Successful in 1m42s
a956c5527c
feat: dual-parser architecture for tool call handling
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 28s
Docker Build and Publish / Build and Push (push) Successful in 1m33s
d97f4c9024
Implements clean separation of concerns using two StreamParsers:
- Container parser tracks tool_calls_begin/end boundaries
- Tool call parser extracts individual calls
- Immediate execution for standalone calls (low latency)
- Batch execution for calls inside containers

This provides optimal performance using StreamParser while maintaining
clean, modular code that handles both single and multiple tool calls.
fix: handle edge cases and add error handling
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 27s
Docker Build and Publish / Build and Push (push) Successful in 1m24s
4e50deffd0
- Execute tool calls from unclosed containers when stream ends
- Validate JSON syntax before accepting tool calls
- Better error logging for malformed tool calls
- Include content preview in debug logs for failed extractions
- Refactor to reduce nesting depth in stream completion handler
debug: add strategic logging to trace content accumulation
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 26s
Docker Build and Publish / Build and Push (push) Successful in 1m37s
f2bea55299
Add debug logging to understand why content isn't being accumulated:
- Log when chunks are skipped or merged
- Show what content is being merged
- Track reasoning content extraction
- Log final accumulated state

This will help identify where the content flow is breaking.
fix: improve content accumulation in streaming messages
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 27s
Docker Build and Publish / Build and Push (push) Successful in 1m24s
40501186e5
- Fix merge logic to handle empty string initialization
- Add detailed logging to trace accumulation issues
- Handle both None and empty string cases in content merging
feat: implement stream halting for immediate tool execution
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 26s
Docker Build and Publish / Build and Push (push) Successful in 1m39s
d05cb96999
- Add detection for complete tool calls (single and batch)
- Halt stream immediately when tool calls are ready
- Execute tools without waiting for model to finish rambling
- Prevents model from closing thinking tags prematurely
refactor: simplify ToolCall model with response field
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 27s
Docker Build and Publish / Build and Push (push) Successful in 1m34s
25550674d4
- Add public response field to ToolCall (None = unanswered)
- Simplify is_answered property to check response field
- Prepare for cleaner tool response handling in streaming
- Add process_tool_calls method to AssistantMessage
- Support both structured and extracted tool call modes
- Tool responses stored in ToolCall.response field
- AssistantMessage handles formatting and appending outputs
- Extracted tools append formatted outputs to content
- Structured tools return ToolMessage objects
- Break long lines for mode assignment and logging statements
- Extract tool count calculation to separate variable
- Auto-format with ruff for consistency
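A sketch of the simplified ToolCall shape the first three bullets describe, assuming Pydantic (fields beyond response and is_answered are guesses):

```python
from pydantic import BaseModel


class ToolCall(BaseModel):
    id: str
    name: str
    arguments: str = "{}"
    # None means the call has not been answered yet.
    response: str | None = None

    @property
    def is_answered(self) -> bool:
        return self.response is not None
```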
feat: integrate ConversationHandler architecture into model_handler
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 31s
Docker Build and Publish / Build and Push (push) Successful in 1m33s
a4dbebc71a
Replace LlamaCppStreamingProcessor with ConversationHandler to enable
transparent multi-cycle completion handling for tool execution. The new
architecture orchestrates client streams across multiple backend requests,
executing tools seamlessly without exposing implementation details to clients.

Key changes:
- Replace processor with ConversationHandler in model_handler.py
- Implement proper tool execution using ToolExecutor in conversation.py
- Fix type compatibility with ToolRegistryAdapter pattern
- Make tool marker detection static in stream_turn.py
- Remove obsolete streaming methods from model_handler.py
fix: remove duplicate /v1 from backend URL path
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 29s
Docker Build and Publish / Build and Push (push) Successful in 1m27s
3bf398112f
fix: add tool injection and correct add_generation_prompt in new streaming flow
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 32s
Docker Build and Publish / Build and Push (push) Successful in 1m35s
a3d294b7e2
- Pass tools from model_handler through ConversationHandler to CompletionHandler
- Set add_generation_prompt correctly: true for initial requests, false for continuations
- Tools are now properly included in backend requests
- Fixes issue where models couldn't see available tools
fix: handle reasoning_content in ChatChunk for DeepSeek R1 thinking mode
All checks were successful
Docker Build and Publish / Lint & Test (push) Successful in 27s
Docker Build and Publish / Build and Push (push) Successful in 1m26s
90c5d873d7
DeepSeek R1 sends reasoning_content instead of content in deltas when
in thinking mode. Update ChatChunk.from_json_data to check for both
fields and use whichever is present.
ci: optimise Docker workflow and build caching
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 58s
0672fe245d
Consolidate CI pipeline to reduce runtime and improve build cache efficiency.

Switch to persistent BuildKit container at tcp://buildkit:8125 for shared cache
across builds, eliminating slow registry cache transfers. Fix Docker layer
invalidation by copying only source code instead of entire repository.

Add Python toolchain caching and uv-lock pre-commit hook to ensure consistent
dependencies. Configure prek for comprehensive linting and testing in single job.
fix: set correct ownership for virtual environment in Docker container
Some checks failed
Docker Build and Publish / Build and Push (push) Failing after 7s
3bd09ffc9f
Add --chown=appuser:appuser to ensure runtime user can execute uvicorn
fix: downgrade setup-uv action to v5 for Node 18 compatibility
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m41s
385afa5c5f
The v6 action requires Node 20+ due to File API usage
fix: resolve container startup permission errors for virtual environment
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 5m1s
a24fab3da5
Container was failing with 'Permission denied' when trying to execute uvicorn
from the virtual environment. Changed to keep root ownership of .venv for
security while ensuring all binaries are executable by non-root users.

The uv.lock update includes a minor coverage package version bump.
ci: add markdown linting and automatic hook updates to pre-commit
Some checks failed
Docker Build and Publish / Build and Push (push) Failing after 34s
8594f63f61
Integrate markdownlint-cli2 with comprehensive style rules to enforce
consistent markdown formatting across documentation. Pin specific hook
versions and add automatic update check to detect outdated dependencies.

The autoupdate hook runs first to ensure all tools are current before
executing quality checks, preventing version drift in CI pipeline.
docs: fix markdown formatting to comply with linting standards
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 2m7s
fb2d6b4303
Resolve all markdownlint violations across documentation files following the
newly added linting configuration. Changes include wrapping long lines at 100
characters, converting asterisk lists to dashes, and adding language
specifiers to code blocks.
refactor: use specific message types instead of generic BaseMessage
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 3m52s
993c7be03b
Replace direct BaseMessage instantiation with appropriate concrete message
types (UserMessage, AssistantMessage, SystemMessage, ToolMessage) throughout
the codebase. This follows the architectural principle of preferring explicit
types over generic base classes for better type safety and code clarity.

Extract message building logic in realtime session to reduce method complexity,
addressing ruff C901 violation. Enhance httpx mock fixture to support GET
requests alongside existing streaming functionality for comprehensive testing.
ci: optimise pre-commit hook execution in CI pipeline
Some checks failed
Docker Build and Publish / Build and Push (push) Failing after 12s
6d36fa764a
Configure prek to run with CI-specific settings using --hook-stage flag,
allowing selective hook execution. Skip autoupdate check in CI environments
to reduce noise and unnecessary network calls during automated builds.

Add --no-progress flag to autoupdate hook for cleaner output when checking
repository updates during local commits.
fix: correct pre-commit stage configuration for CI compatibility
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 4m32s
d5aedd7a61
Fix invalid 'ci' stage value that caused workflow failures. Use 'manual' stage
for CI runs and 'pre-commit' for local commits. Configure autoupdate hook to
only run during local commits, preventing unnecessary network calls in CI.

All quality checks (pytest, mypy, ruff) now run in both local and CI contexts,
while autoupdate remains local-only for efficiency.
fix: improve Docker build caching and optimise container build time
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m10s
b957c3b130
Enable BuildKit registry caching to persist uv package downloads between CI runs,
preventing redundant downloads. Add sharing=locked to cache mounts to prevent
concurrent build corruption.

Optimise chmod operation to only make binaries executable rather than recursively
processing the entire virtual environment, significantly reducing build time.
ci: reduce pytest verbosity in pre-commit hooks
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 53s
3a961f5fab
Replace verbose pytest output (-xvs) with quiet mode (-x -q) to show only
failures and summary. Reduces CI log noise while maintaining visibility of
actual test failures.
ci: suppress ffmpeg warning during pytest runs
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m0s
14cd8346a1
Add filter for the RuntimeWarning about missing ffmpeg/avconv from pydub.
This is expected behaviour as ffmpeg is only installed in the Docker
container, not in the CI runner where tests execute.
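One possible form of that filter, placed in a conftest.py (the project may instead use pytest's filterwarnings ini option; the message pattern matches pydub's import-time warning):

```python
# conftest.py
import warnings

# pydub warns when ffmpeg/avconv is absent; harmless on the CI runner,
# where ffmpeg lives only inside the Docker image.
warnings.filterwarnings(
    "ignore",
    message="Couldn't find ffmpeg or avconv.*",
    category=RuntimeWarning,
)
```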
ci: use uv run instead of uvx for ruff pre-commit hooks
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 47s
379fbd73bb
Switch from uvx to uv run for ruff commands to leverage existing dependency
caching. Since ruff is installed via uv sync, using uvx causes redundant
downloads as a separate tool, defeating our caching strategy.
fix: add cache-bust ARGs to force rebuild with corrected BuildKit umask
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m46s
2df273f161
BuildKit container now has umask 022 configured, but cached layers still have
the old 770 permissions. Adding ARG CACHE_BUST forces fresh layer builds.
fix: explicitly create /app with correct permissions
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 2m10s
9415f5f14e
Even with BuildKit umask fixed, WORKDIR creates directories with wrong
permissions. Explicitly creating /app with chmod 755 before WORKDIR.
feat(pre-commit): enhance hooks with comprehensive code quality checks
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 2m2s
0988cd3143
Add conventional commit message validation, typo checking, and markdown
linting to enforce consistent code and documentation standards. Remove
default_stages restriction that was preventing commit-msg hooks from
running. Fix duplicate reasoning content accumulation in streaming.

Also remove appuser from Dockerfile as container now runs as root,
eliminating permission issues with BuildKit.
fix(streaming): allow reasoning content with tool markers to stream to client
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m10s
f48498bee8
DeepSeek R1 models generate reasoning content that should be visible to
clients wrapped in <think> tags. Previously, any content containing tool
markers was suppressed entirely, preventing reasoning from being displayed.

Now, when in thinking mode, content is always streamed to the client even
if it contains tool markers, ensuring reasoning visibility while still
suppressing actual tool call invocations.
fix(streaming): remove duplicate reasoning content accumulation
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m5s
ed527095db
ChatChunk.from_json_data already converts reasoning_content to content when
there's no regular content. The additional accumulation logic was causing
content to be doubled internally ("OkayOkay", "the the"), which was interfering
with tool call extraction.

Simplified by removing the redundant second accumulation stage entirely.
feat(streaming): add comprehensive logging for tool call detection
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m17s
47187bfeb0
- Add WARNING level logs when tool calls are detected in various forms
- Add DEBUG_SHOW_ALL_TOKENS flag to disable suppression for debugging
- Log final accumulated content with length
- Log when tool_calls appear in delta
- Log when content is yielded despite is_tool_calling being true

This will help identify why the stream appears to pause when processing
DeepSeek R1 reasoning that mentions tools.
fix(streaming): properly attach tool calls from delta to ChatChunk message
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 56s
ecd51b3788
Tool calls were being detected and logged but not actually attached to
the message object, preventing accumulation and execution. Now converts
delta tool_calls to ToolCall objects and attaches them to the message.
feat(streaming): add comprehensive logging for tool call detection
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 59s
7ff30aa0fe
Added detailed logging to track tool call creation and attachment to messages.
Also added fallback detection for tool_calls on message when has_tool_calls
flag is false, which shouldn't happen but helps with debugging.
fix(streaming): properly handle partial tool call deltas from streaming API
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 58s
9ba005bbc7
Tool calls arrive as fragments across multiple deltas (first with id/name,
then subsequent chunks with argument pieces). Now correctly creates partial
ToolCall objects that get merged by the accumulation logic.
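Merging by index, as a later commit in this log also describes, looks roughly like this sketch (OpenAI-style delta fragments assumed; the pending-slot dict is illustrative):

```python
def merge_tool_call_delta(pending: dict[int, dict], fragment: dict) -> None:
    # The first fragment for an index carries the id and function name;
    # later fragments append argument text to the same slot.
    slot = pending.setdefault(
        fragment["index"], {"id": "", "name": "", "arguments": ""}
    )
    if fragment.get("id"):
        slot["id"] = fragment["id"]
    function = fragment.get("function") or {}
    if function.get("name"):
        slot["name"] = function["name"]
    if function.get("arguments"):
        slot["arguments"] += function["arguments"]
```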
refactor(chat): unify streaming backend for all chat completions
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 59s
8934a9cac6
Refactor non-streaming chat handler to use the streaming backend internally
and collect responses. This ensures consistent tool handling and model
configuration loading whether clients request streaming or not.

Extract response accumulation to separate helper function to reduce complexity.
Fix test mocks to properly handle async iteration of streaming chunks.
fix(streaming): add logging to debug tool call accumulation in non-streaming mode
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 54s
de1e874b75
fix(streaming): transfer tool calls from ChatChunk to streaming delta
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 56s
eacae9b99f
Tool calls detected in ChatChunk messages weren't being included in the
streaming delta when converting to ChatStreamingChunk format. This caused
tool calls to be lost when using non-streaming mode, despite being properly
detected and logged during chunk processing.
fix(streaming): add diagnostics to trace tool call handling from llama.cpp
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 57s
911c036551
Add targeted logging to understand exact format of tool call deltas from llama.cpp.
Logs full JSON when tool_calls field detected in delta, and tracks extraction
results to identify why DeepSeek R1 tool calls within think tags aren't being
properly parsed and forwarded to clients.
fix(streaming): yield accumulated tool calls for server-side execution
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 54s
06dc977c5a
When llama.cpp sends tool_calls as streaming deltas, they are accumulated
into current_message.tool_calls but never yielded back through the streaming
pipeline. This causes _collect_streaming_response to receive no tool_calls
and create an AssistantMessage with tool_calls=None.

Add logic to yield a final streaming chunk containing the accumulated tool
calls when the stream completes, ensuring they can be executed server-side.
fix(streaming): ensure message is created when tool_calls are present
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 55s
0a5227d856
ChatChunk.from_json_data was only creating a message when content was present
in the delta. When llama.cpp sends tool_calls without content, no message was
created, preventing tool_calls from being attached. This caused accumulated
tool_calls to be lost.

Now create an AssistantMessage whenever tool_calls are present in the delta,
ensuring they can be properly accumulated via the merge method.
fix(streaming): merge tool call deltas by index to properly accumulate arguments
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 58s
ed563240d4
refactor(streaming): eliminate inefficient message merging pattern
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 57s
48280c1459
Replace the creation and merging of hundreds of AssistantMessage objects
per streaming response with a single message that accumulates deltas directly.
Tool calls now properly accumulate by index without creating intermediate objects.

Breaking change: ChatChunk transformed from message factory to lightweight delta
carrier. ChatStreamingDelta.tool_calls changed from ToolCall objects to raw dicts
for efficiency.

Removes all merge() methods following YAGNI principle - direct accumulation is
simpler and more performant.
fix(streaming): enable server-side tool execution in conversation flow
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 58s
53f4786bc1
Tool calls were being detected but not properly triggering server-side execution.
The CompletionHandler now checks for pending tool calls after stream completion
and signals the ConversationHandler to handle them before continuing.

Added source tracking to ToolCall model to distinguish between structured
(API-provided) and extracted (content-parsed) tool calls, ensuring correct
processing mode for each type.
fix(streaming): add assistant message before tool messages in structured mode
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 53s
3168b8b2fb
The conversation handler was not adding the assistant message containing
tool calls before adding the tool response messages. This caused llama.cpp
to reject subsequent requests with 'Cannot have 2 or more assistant
messages at the end of the list' error.

Now properly adds both the assistant message with tool calls and the
subsequent tool messages to maintain correct conversation flow.
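A hypothetical example of the ordering this fix enforces (tool and call contents invented):

```python
call = {"id": "call_0", "type": "function",
        "function": {"name": "get_time", "arguments": "{}"}}

# The assistant message that requested the tools must precede the tool
# responses; llama.cpp rejects two trailing assistant messages in a row.
messages = [
    {"role": "user", "content": "What time is it?"},
    {"role": "assistant", "content": None, "tool_calls": [call]},
    {"role": "tool", "tool_call_id": "call_0", "content": "09:15"},
]
```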
fix(streaming): remove duplicate assistant message addition in completion handler
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 56s
95c83a69ff
The completion handler was adding the assistant message to the conversation
before signaling tool_execution_needed, and then conversation handler was
also adding it when processing structured tool calls. This caused llama.cpp
to reject the request with 'Cannot have 2 or more assistant messages at
the end of the list' error.

Now only conversation.py handles adding messages to maintain single
responsibility and avoid duplication.
fix(streaming): enable server-side tool execution in conversation flow
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m3s
4debd032f6
Refactored tool message handling to properly maintain conversation structure:
- AssistantMessage.process_tool_calls now returns complete message sequence
- For structured mode: returns [AssistantMessage, ToolMessage1, ToolMessage2, ...]
- Preserves tool_calls on AssistantMessage (no longer cleared)
- Conversation handler now extends messages with the complete sequence

Also updated ruff config to globally ignore C901 and PLR1702 rules that
were being consistently overridden with noqa comments.

This ensures the correct message order for llama.cpp compatibility while
maintaining tool call information on assistant messages.
fix(streaming): halt immediately on tool detection and filter malformed markers
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 57s
0fce773e27
- Stop stream processing immediately when tool calls are detected
- Don't close thinking tags when halting for tools (continue after execution)
- Filter out malformed tool call markers with Unicode characters
- Add detection for additional malformed tool call variants
- Prevent model-generated malformed tool syntax from leaking to users
fix(llamacpp): disable native tool parsing to prevent DeepSeek R1 crashes
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 57s
4807f53f5c
Adds parse_tool_calls: false to all llama.cpp requests to prevent the server from
attempting to parse tool calls with its grammar system. DeepSeek R1 and other
models use custom tool formats that conflict with llama.cpp's grammar parser,
causing crashes when the server tries to constrain output.

Our extraction method handles all tool formats correctly, so we bypass llama.cpp's
native parsing entirely. Also documents other available llama.cpp parameters for
future use (reasoning_format, thinking_forced_open, chat_template_kwargs).
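The request change reduces to one extra field on every backend payload, roughly as sketched here (surrounding payload fields are illustrative):

```python
payload = {
    "messages": [{"role": "user", "content": "What time is it in Tokyo?"}],
    "stream": True,
    # Keep llama.cpp's grammar-based tool parser out of the loop; tool
    # calls are extracted from the raw text stream on our side instead.
    "parse_tool_calls": False,
}
```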
fix(streaming): improve tool extraction to handle structured separators
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 54s
71e9468232
Enhanced tool call extraction to recognise explicit separator tokens like
<|tool▁sep|> used by models to delineate function names. Now extracts all
tool calls including placeholder names (e.g. FUNCTION_NAME) so the execution
layer can return appropriate errors, guiding models to correct behaviour.

Also defaults to empty JSON object when no parameters are present, as many
tools have optional parameters. Added logging to track separator detection
for debugging model-specific formats.
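A rough sketch of the separator-aware extraction, under the assumption that the name follows the separator token and any JSON object after it is the arguments (the real format handling is model-specific):

```python
import re

TOOL_SEP = "<|tool▁sep|>"  # explicit separator token named in the commit


def extract_call(fragment: str) -> tuple[str, str]:
    # Default the arguments to "{}" because many tools take no parameters.
    # Placeholder names such as FUNCTION_NAME are extracted anyway so the
    # execution layer can answer with a corrective error.
    _, _, rest = fragment.partition(TOOL_SEP)
    match = re.search(r"(\w+)", rest)
    name = match.group(1) if match else "FUNCTION_NAME"
    args = re.search(r"\{.*\}", rest, re.DOTALL)
    return name, args.group(0) if args else "{}"
```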
refactor(messages): remove embed_tool_responses config and fix continuation handling
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 55s
f4bf2cdedd
Since we now handle all tool parsing ourselves with parse_tool_calls: false,
the embed_tool_responses config is unnecessary. Tool handling is determined
purely by whether calls are extracted (embedded in assistant) or structured
(separate messages).

Fixed duplicate assistant message issue by preventing _add_assistant_starter
from adding new messages during continuations. The assistant message already
exists with embedded tool responses for extracted mode, so we just continue
it rather than creating a new one.
refactor(logging): improve tool call and conversation logging clarity
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 55s
6581665b82
Replace misleading warning-level logs with appropriate debug/info levels throughout
the streaming pipeline. Focus logging on actionable issues and state transitions
rather than implementation details.

Key improvements:
- Add state machine tracking for tool extraction process
- Remove noisy warnings for expected conditions (e.g. word 'tool' in content)
- Log tool batches with names for better execution visibility
- Track conversation cycles and continuation states clearly
- Reserve warnings for genuine issues (unknown tools, malformed JSON)
- Simplify accumulated content logging to reduce debug spam
refactor(llamacpp): unify tool handling via inject_tools_in_prompt config
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 56s
403296efda
Simplify tool extraction control by using inject_tools_in_prompt as the
single configuration point. When enabled, tools are injected into the prompt
using DeepSeek R1's exact format and parse_tool_calls is disabled to prevent
llama.cpp from parsing them natively.

This eliminates the need for special-case logic and provides a consistent
approach for models requiring custom tool formats like DeepSeek R1 and Gemma.
fix(streaming): ensure extracted tool calls are processed through unified pipeline
Some checks failed
Docker Build and Publish / Build and Push (push) Failing after 25s
a22768e25f
Extracted tool calls were being generated but never added to the streaming
state's pending_tool_calls, causing them to be skipped during execution.
Now properly stores extracted calls for processing alongside structured ones.

Includes duplicate detection to handle malformed model outputs where the same
tool call appears multiple times with incorrect separators.
fix(llamacpp): inject tools into system message instead of user message
Some checks failed
Docker Build and Publish / Build and Push (push) Failing after 16s
0a82d4afde
Tools should always be injected into the system message to provide context
without affecting the user's actual query. Creates a system message if the
first message isn't already a system message, ensuring proper separation
of tool instructions from user content.
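A minimal sketch of that injection, assuming plain dict messages (function and parameter names are invented):

```python
def inject_tools(messages: list[dict], tool_block: str) -> list[dict]:
    # Tool instructions belong in the system message so the user's actual
    # query is left untouched; create a system message if there isn't one.
    if messages and messages[0]["role"] == "system":
        head = dict(messages[0])
        head["content"] = f"{head['content']}\n\n{tool_block}"
        return [head, *messages[1:]]
    return [{"role": "system", "content": tool_block}, *messages]
```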
ci(hooks): remove problematic typos hook from pre-commit configuration
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m43s
d2d078b039
The typos hook was causing CI authentication and Python version conflicts
during the build process. Removing this hook resolves runner issues and
allows proper execution of other quality checks.
fix(thinking): synchronise think tag injection between model and user
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m15s
58e79d56b0
Previously, the model received <think> tags via assistant starter messages but
users only saw the closing </think> tag. This caused confusion as the opening
tag was missing from the user's perspective.

Now both model and user receive identical <think> tags when enforce_thinking
is enabled. The assistant starter conditionally includes <think> based on the
backend configuration, and the streaming handler manually injects the same tag
to the user's stream.
refactor(streaming): remove deprecated LlamaCppStreamingProcessor
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m18s
e53a873822
The old streaming processor was replaced by ConversationHandler in commit
a4dbebc7 but never removed. This processor duplicated thinking tag injection
logic and was not used in production.

ConversationHandler now handles all streaming with proper tool execution
and thinking tag synchronisation between model and user.
fix(streaming): preserve conversation state across tool execution cycles
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m37s
6d7d8044bc
Shared ConversationContext between completion cycles prevents creating fresh
AssistantMessage on continuation. This allows proper accumulation of new
content after tool execution, fixing infinite loop where tool_calls_end
markers were incorrectly re-detected in subsequent streaming passes.

Also restructures pre-commit hooks for better performance and reliability
by running format checks before expensive test suites.
fix(streaming): clear tool state after execution to prevent re-detection
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m6s
49b050294d
Tool calls were persisting in ConversationState.current_message after
execution, causing continuations to immediately re-detect and re-execute
the same tools. Now properly clears tool_calls, pending states, and flags
after processing to allow clean continuation.
fix(streaming): append assistant message before tool execution in extracted mode
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m8s
1bd7f02d67
Messages were being lost during tool execution because the assistant message
wasn't added to the conversation before being modified with tool results.
This caused the message count to reset on continuation, breaking the
conversation flow after tool calls.
fix(streaming): process only new chunks through StreamParser, not accumulated content
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m17s
fc4d6b8c52
StreamParser was being misused - instead of feeding it new chunks incrementally
as designed, the entire accumulated content was being re-processed on every
chunk. This caused old tool call markers to be re-detected on continuation
cycles, resulting in infinite loops with repeated tool execution.

Modified _handle_content_chunk to feed only new content chunks to the parsers,
allowing them to maintain state properly. Removed _extract_complete_tool_calls
method entirely as it was the source of the re-scanning behaviour.
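A toy illustration of the incremental-feeding contract (the project's real StreamParser API is not shown in this log; this stand-in just demonstrates why each chunk must be fed exactly once):

```python
class MarkerScanner:
    """Stateful scanner fed each new chunk once, keeping only a small
    tail so markers split across chunk boundaries are still caught."""

    def __init__(self, marker: str) -> None:
        self.marker = marker
        self.tail = ""

    def feed(self, chunk: str) -> bool:
        window = self.tail + chunk
        found = self.marker in window
        self.tail = window[-(len(self.marker) - 1):] if len(self.marker) > 1 else ""
        return found
```

Re-feeding the whole accumulated content on every chunk, by contrast, re-detects markers consumed in earlier cycles, which is exactly the infinite loop this commit fixes.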
fix(streaming): clean up output and improve thinking block tool handling
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m17s
4e097cd839
Removed DEBUG_SHOW_ALL_TOKENS flag that was leaking all content to users,
causing duplicated thinking content and malformed tool markers in output.

Enhanced think tag buffering to keep tool calls inside thinking blocks when
they immediately follow </think> tags, allowing models to continue reasoning
after tool execution before presenting final output to users.
test(streaming): add comprehensive raw content logging for tool call diagnosis
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m14s
a1a61c5ade
Log the raw content from llama.cpp before any processing to identify why
tool calls aren't being generated or extracted. Tracks content through
filtering decisions to reveal whether the model produces tool syntax that
gets incorrectly suppressed.
feat(vllm): add vLLM backend support as alternative to llama.cpp
Some checks failed
Docker Build and Publish / Build and Push (push) Failing after 49s
dd83621e36
Implement vLLM as an additional backend provider option alongside llama.cpp.
vLLM provides an OpenAI-compatible API with native GGUF support, offering
an alternative approach to handling tool calls and model inference.

- Create vLLM service lifecycle handler for container management
- Add unified LLM handler factory to route between providers
- Update configuration to use vLLM for tools and agent models
- Maintain backwards compatibility with existing llama.cpp backends
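The factory mentioned in the second bullet might look like this sketch (class and function names are invented; the real handlers are not shown in this log):

```python
class LlamaCppHandler:
    def __init__(self, config: dict) -> None:
        self.config = config


class VLLMHandler:
    def __init__(self, config: dict) -> None:
        self.config = config


def make_llm_handler(provider: str, config: dict):
    # Route each model to its configured backend provider.
    handlers = {"llama.cpp": LlamaCppHandler, "vllm": VLLMHandler}
    try:
        return handlers[provider](config)
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {provider}") from None
```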
fix(types): resolve mypy errors in vLLM integration
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m44s
9f31cd88a7
Fixed type checking issues that bypassed pre-commit hooks in the vLLM feature
commit. Corrected ServiceType enum usage, added type ignore for legitimate
inheritance override, and made UnifiedLLMHandler properly extend base class
to satisfy ModelManager's type requirements.
fix(vllm): handle missing backend config attributes safely
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m17s
4c04a925bb
Prevent AttributeError when optional vLLM backend configuration fields
are not specified. Uses getattr with sensible defaults for quantisation
and tensor_parallel_size to allow minimal configurations to work.
fix(vllm): prevent NoneType comparison for tensor parallelism config
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m12s
6a0e66c646
The tensor_parallel_size attribute may not exist on BackendConfig objects.
When missing, getattr returns None which cannot be compared with integers.
Added explicit None check before numerical comparison to prevent TypeError.
fix(vllm): ensure GPU count defaults to 1 when tensor_parallel_size is None
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m10s
6f88406d69
The tensor_parallel_size attribute may be None after getattr with None default.
Using 'or 1' ensures we always pass a valid GPU count to Docker device requests,
preventing vLLM container startup failures when GPU detection is required.
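The defaulting pattern across these three vLLM config fixes boils down to one expression, sketched here with a stand-in config object:

```python
from types import SimpleNamespace

backend_config = SimpleNamespace()  # attribute deliberately absent here

# getattr yields None when the field is missing, and the config may also
# set it to None explicitly; "or 1" covers both cases so Docker always
# receives a valid GPU count.
gpu_count = getattr(backend_config, "tensor_parallel_size", None) or 1
assert gpu_count == 1
```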
fix(vllm): use v0.8.5 image for CUDA 12.4 compatibility
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m19s
836eed42d2
The latest vLLM images require CUDA 12.8 but TrueNAS driver 550.142 only
supports CUDA 12.4. Pin to v0.8.5 which maintains CUDA 12.4 compatibility
whilst providing stable tool calling functionality.
fix(vllm): add .gguf extension to GGUF model paths for proper file resolution
All checks were successful
Docker Build and Publish / Build and Push (push) Successful in 1m18s
a108f74b31
GGUF models require explicit file paths with .gguf extension rather than
directory paths. vLLM was failing to load models because the constructed
path pointed to a directory instead of the actual .gguf file.
ci(vllm): add Docker image build pipeline with CUDA 12.4 support
Some checks failed
Docker Build and Publish / Build and Push (push) Failing after 1m41s
Build vLLM Images / build-vllm (devel, 12.4.1, ubuntu22.04) (push) Has been cancelled
545115a147
Create dedicated CI workflow to build vLLM images compatible with TrueNAS
CUDA 12.4 driver constraints. Avoids upstream version incompatibilities by
compiling vLLM against specific CUDA toolkit versions.

Multi-stage Dockerfile minimises runtime image size whilst including all
necessary CUDA libraries. Weekly rebuilds ensure latest vLLM improvements
whilst maintaining driver compatibility.
refactor(ci): consolidate Dockerfiles and optimise CI triggers
Some checks failed
Docker Build and Publish / Build and Push (push) Failing after 1m24s
Build vLLM Images / build-vllm (devel, 12.4.1, ubuntu22.04) (push) Failing after 9m47s
2fc66b5b11
Move all Dockerfiles to dedicated docker/ directory for better organisation.
Configure workflows to use shared buildkit container for faster builds.

Add path-specific triggers to main Docker workflow to prevent unnecessary
rebuilds when only documentation or non-code files change. Both workflows
now use the same shared buildkit infrastructure for consistency.
fix(docker): limit vLLM build parallelization to prevent OOM
Some checks failed
Build vLLM Images / build-vllm (devel, 12.4.1, ubuntu22.04) (push) Failing after 2m17s
11b0b4bc47
- Set MAX_JOBS=8 to limit cmake parallel compilation
- Add --no-build-isolation flag for better dependency control
- Add cmake to build dependencies
- Prevents build failures on high-core count systems
fix(docker): add setuptools-scm build dependency for vLLM
Some checks failed
Build vLLM Images / build-vllm (devel, 12.4.1, ubuntu22.04) (push) Failing after 3m27s
046f67c6b2
vLLM requires setuptools-scm for version management during the build process.
Without it, the build fails with ModuleNotFoundError when compiling from source.
perf(docker): add BuildKit cache mounts for pip wheels
Some checks failed
Build vLLM Images / build-vllm (devel, 12.4.1, ubuntu22.04) (push) Failing after 3m16s
7dac61e934
Enable Docker BuildKit cache for all pip install operations to significantly
speed up rebuilds. PyTorch alone is ~2GB of downloads that will now be cached
between builds, reducing both build time and bandwidth usage.
fix(docker): install CMake 4.0.3 to meet vLLM requirements
Some checks failed
Build vLLM Images / build-vllm (4.0.3, devel, 12.4.1, ubuntu22.04) (push) Failing after 17m55s
ff42234e04
vLLM requires CMake 3.26+ but Ubuntu 22.04 ships with 3.22.1. Install latest
CMake 4.0.3 from official binary distribution. Made version configurable via
build argument for easier future updates.
fix(docker): pin PyTorch 2.4.0 and optimise GPU architecture targets
Some checks failed
Build vLLM Images / build-vllm (4.0.3, devel, 12.4.1, ubuntu22.04) (push) Failing after 15m34s
115547b8dd
vLLM requires PyTorch 2.4.0 but was pulling 2.6.0, causing version mismatch
errors. Also targets only modern GPU architectures (Ampere and newer) to
reduce build time and binary size by ~60%. Supports RTX 3090 and all newer
NVIDIA GPUs including Ada Lovelace and Hopper architectures.
feat(docker): build PyTorch from source for CUDA 12.4 compatibility
Some checks failed
Build vLLM Images / build-vllm (4.0.3, devel, 12.4.1, ubuntu22.04, v2.8.0) (push) Failing after 3m51s
3b53b6c1ab
Build PyTorch v2.8.0 from source to get Float8_e8m0fnu support required by
latest vLLM. Uses ccache with BuildKit cache mounts to speed up rebuilds.
Limited to 8 parallel jobs to avoid memory exhaustion on the build server.
fix(docker): add python-is-python3 package for PyTorch build
Some checks failed
Build vLLM Images / build-vllm (4.0.3, devel, 12.4.1, ubuntu22.04, v2.8.0) (push) Failing after 5m27s
f705b94b83
PyTorch build scripts expect 'python' command to be available. Ubuntu 22.04
provides python-is-python3 package which creates the necessary symlinks.
fix(docker): use ccache PATH method to avoid NCCL build failures
Some checks failed
Build vLLM Images / build-vllm (4.0.3, devel, 12.4.1, ubuntu22.04, v2.8.0) (push) Failing after 29s
23aa887d06
- Replace CC/CXX environment variables with PATH manipulation
- Create symlinks in /usr/lib/ccache for all compilers
- This avoids NCCL's Makefile mishandling ccache wrapper syntax
fix(docker): remove redundant ccache symlinks
Some checks failed
Build vLLM Images / build-vllm (4.0.3, devel, 12.4.1, ubuntu22.04, v2.8.0) (push) Has been cancelled
09976ff1ee
Ubuntu's ccache package already creates symlinks in /usr/lib/ccache
feat(docker): increase PyTorch build parallelization to 16 jobs
Some checks failed
Build vLLM Images / build-vllm (4.0.3, devel, 12.4.1, ubuntu22.04, v2.8.0) (push) Has been cancelled
ac7c46f847
PyTorch compilation is CPU-bound and uses under 5GB RAM with 8 jobs,
so we can safely increase to 16 jobs while keeping vLLM builds at 8
fix(docker): reduce PyTorch build parallelization to 12 jobs
Some checks failed
Build vLLM Images / build-vllm (4.0.3, devel, 12.4.1, ubuntu22.04, v2.8.0) (push) Has been cancelled
09ef8be941
Build with 16 jobs peaked at over 37GB RAM, causing OOM failures on the
64GB build server. Reducing to 12 jobs should provide better balance
between build speed and memory usage.
feat(docker): implement PyTorch wheel build and package registry workflow
Some checks failed
Build vLLM Images / build-vllm (12.4.1, ubuntu22.04, 2.8.0) (push) Failing after 57s
Docker Build and Publish / Build and Push (pull_request) Has been skipped
afe2d2eb4c
Create separate PyTorch wheel builder that compiles once and publishes to
Forgejo's generic package registry. This avoids recompiling PyTorch for
every vLLM build and handles both dev/runtime image requirements.

The wheel is built via workflow dispatch and stored in the package registry,
then referenced by URL in downstream Dockerfiles. This provides versioned,
cached PyTorch builds that persist beyond build cache expiry.
tom merged commit 3eab905674 into main 2025-09-05 10:18:39 +01:00
tom deleted branch dev 2025-09-05 10:18:39 +01:00
Reference: tom/neuromancer#1