AI/Ollama: Difference between revisions
{| class="wikitable"
!scope='col'| Variable
!scope='col'| Value
!scope='col'| Impact
|-
!scope='row' style='text-align:left' | <code>OLLAMA_FLASH_ATTENTION</code>
| <code>1</code> || Enables flash attention, which reduces memory use as the context grows.
|-
!scope='row' style='text-align:left' | <code>OLLAMA_KV_CACHE_TYPE</code>
| <code>q8_0</code> or <code>q4_0</code> || Compresses the '''short-term memory''' cache. <code>q8_0</code> saves space with almost no quality loss; <code>q4_0</code> saves even more space.
|-
!scope='row' style='text-align:left' | <code>OLLAMA_NUM_PARALLEL</code>
| <code>1</code> || '''Crucial for 32GB RAM.''' Limits Ollama to one task at a time to prevent '''Out of Memory''' crashes when using a 20B model.
|-
!scope='row' style='text-align:left' | <code>OLLAMA_KEEP_ALIVE</code>
| <code>30m</code> || Keeps the 20B model in your RAM for 30 minutes after use so you don't have to wait ~20 seconds for it to '''reload''' every time.
|-
!scope='row' style='text-align:left' | <code>OLLAMA_NUM_CTX</code>
| <code>16384</code> to <code>32768</code> || '''The most important setting.''' Controls the '''brain capacity''' (context window). <code>32k</code> is standard for Claude Code but uses <code>~3GB</code> more RAM than the default <code>4k</code>.
|-
!scope='row' style='text-align:left' | <code>OLLAMA_NUM_GPU</code>
| <code>999</code> || Forces Ollama to offload as many layers as possible to your Intel Arc iGPU instead of the slower CPU.
|}
Revision as of 01:32, 1 March 2026
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gpt-oss:20b
ollama --version
ollama ls
curl -fsSL https://claude.ai/install.sh | bash
ollama launch claude --model gpt-oss:20b
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export OLLAMA_NUM_CTX=32768
export OLLAMA_KEEP_ALIVE=5m
claude --model gpt-oss:20b
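To sanity-check the endpoint outside Claude Code, a minimal standard-library sketch against Ollama's OpenAI-compatible route — the base URL and model name here are assumptions matching the setup above:

```python
# Minimal stdlib sketch of a request to Ollama's OpenAI-compatible endpoint.
# Base URL and model name are assumptions matching the setup above.
import json
import urllib.request

def build_request(model, prompt, base="http://localhost:11434"):
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    # Setting data makes this a POST request.
    return urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_request("gpt-oss:20b", "Say hello in one word.")
# With the server running, urllib.request.urlopen(req) sends it.
```

If this round-trips, Claude Code's failures are a client-side configuration issue rather than an Ollama one.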
Optimization
References