AI/Ollama

From Chorke Wiki

Revision as of 09:30, 1 March 2026

curl -fsSL https://ollama.com/install.sh | sh
ollama pull gpt-oss:20b
ollama --version
ollama ls
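Once the installer finishes, the server and the pulled model can be sanity-checked. A minimal sketch, assuming the default listen address (localhost:11434) and the systemd unit created by the install script:

```shell
# Confirm the systemd unit is running
systemctl is-active ollama

# Confirm the HTTP API answers and reports its version
curl -s http://localhost:11434/api/version

# Confirm the pulled model is present locally
ollama ls | grep gpt-oss
```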

curl -fsSL https://claude.ai/install.sh | bash
ollama launch claude --model gpt-oss:20b
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export OLLAMA_NUM_CTX=32768
export OLLAMA_KEEP_ALIVE=5m

claude --model gpt-oss:20b
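The exports above must be set in every shell that launches Claude Code, so it is convenient to bundle them into a small launcher script. A sketch, where the script name and path (`~/.local/bin/claude-local`) are illustrative, not part of the official tooling:

```shell
# Write an illustrative launcher that pins Claude Code to the local Ollama backend
mkdir -p "$HOME/.local/bin"
cat > "$HOME/.local/bin/claude-local" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
export ANTHROPIC_BASE_URL=http://localhost:11434   # point Claude Code at Ollama
export ANTHROPIC_AUTH_TOKEN=ollama                 # dummy token; Ollama does not check it
export ANTHROPIC_API_KEY=""                        # ensure no cloud key is picked up
export OLLAMA_NUM_CTX=32768                        # larger context window
export OLLAMA_KEEP_ALIVE=5m                        # keep the model warm between calls
exec claude --model gpt-oss:20b "$@"
EOF
chmod +x "$HOME/.local/bin/claude-local"
```

After this, `claude-local` behaves like `claude --model gpt-oss:20b` with the environment already configured.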

Diagram

@startuml
autonumber
skinparam backgroundColor    transparent
skinparam DefaultFontName    Helvetica
skinparam actorStyle         awesome
skinparam ParticipantPadding 20
skinparam BoxPadding         10

title Claude ↔ Local FS ↔ Ollama ↔ GPT-OSS:20b

actor "Developer"                  as dev

box "Local Development PC" #LightBlue
    participant "Claude Code CLI"  as claude
    participant "Local Filesystem" as fs
end box

box "Kubernetes Cluster (K3s)" #Yellow
    participant "Ollama Service"   as ollama
    participant "GPT-OSS:20b"      as model
end box

dev      -> claude : Runs "claude --model gpt-oss:20b"
claude   -> fs     : Scans repository context
fs      --> claude : File contents / Git history

claude   -> ollama : POST /v1/messages (Anthropic API)
note right: Payload includes system prompt \nand local code context

ollama   -> model  : Load weights into GPU VRAM
model   --> ollama : Inference processing...

ollama -->> claude : Streamed Response (Tokens)
claude   -> dev    : Displays suggested code changes

@enduml
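The round trip in the diagram can be smoke-tested without Claude Code by calling Ollama's native generate endpoint directly (assumes the server is up on its default port):

```shell
# One-shot, non-streamed request against Ollama's native API
curl -s http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Write a one-line hello world in Python.",
  "stream": false
}'
```

If this returns a JSON response with generated text, the Ollama/model half of the pipeline works and any remaining issue is on the Claude Code side.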

Structure

@startsalt
skinparam backgroundColor transparent
skinparam defaultFontName monospaced
{
{T-
+**/**                       | Root File System
++**/usr/local/bin/**        | Executable Binaries
+++ollama                    | Ollama Server (Standalone)
++**/etc/systemd/system/**   | Systemd Services
+++ollama.service            | Systemd service file
++**/home/$USER/**           | User's Home Directory
+++**.local/bin/**           | User's Executable Binaries
++++claude                   | Claude Code CLI
+++**.ollama/**              | Ollama Data Directory
++++history                  | CLI Chat History
++++**models/**              | Saved Models
+++++blobs/                  | Weights **(gpt-oss:20b)**
+++++manifests/              | Model metadata
+++**.claude/**              | Claude Code Data Directory
++++config.json              | API URL, keys, project context
++++memory/                  | Persistent memory
+++**my-project/**           | Your development folder
++++.claude/                 | Project specific settings
++++CLAUDE.md                | Guidebook for current project
}
}
@endsalt
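The layout above can be verified on disk, for example:

```shell
# Ollama's per-user data directory
ls ~/.ollama/models/manifests   # model metadata
du -sh ~/.ollama/models/blobs   # model weights (gpt-oss:20b)

# Claude Code's data directory
ls ~/.claude                    # config.json, memory/, etc.
```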

Optimization

Yoga Pro 7i (G9 + U7 155H + 32GB + 1TB)

Variable               | Value          | Impact
OLLAMA_FLASH_ATTENTION | 1              | Reduces memory usage and speeds up processing for long code files; highly recommended for coding.
OLLAMA_KV_CACHE_TYPE   | q8_0 or q4_0   | Compresses the short-term memory (KV) cache; q8_0 saves space with almost no quality loss, q4_0 saves even more.
OLLAMA_NUM_PARALLEL    | 1              | Crucial on 32GB RAM: limits Ollama to one request at a time to prevent out-of-memory crashes with a 20B model.
OLLAMA_KEEP_ALIVE      | 30m            | Keeps the 20B model in RAM for 30 minutes after use, so it does not reload (~20 seconds) on every call.
OLLAMA_NUM_CTX         | 16384 to 32768 | The most important setting: controls the context window. 32k is standard for Claude Code but uses ~3GB more RAM than the default 4k.
OLLAMA_NUM_GPU         | 999            | Forces Ollama to offload as many layers as possible to the Intel Arc iGPU instead of the slower CPU.
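When Ollama runs as a systemd service (as installed above), these variables belong in the service environment rather than the interactive shell. A sketch using a standard systemd drop-in, with values mirroring the table:

```shell
# Add an environment drop-in for the ollama.service unit
sudo mkdir -p /etc/systemd/system/ollama.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_KEEP_ALIVE=30m"
EOF

# Reload systemd and restart the service to apply
sudo systemctl daemon-reload && sudo systemctl restart ollama
```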
