Open-source language models have transformed coding and content creation. Running these **open source LLMs** locally offers privacy, control, and freedom from API fees. Whether you need a **code LLM** for programming or a **writing model** for creative work, this guide covers today's top **open source language models**.
This list highlights popular **open-source LLMs for coding and writing**, organized by recency and popularity. Each model entry shows key attributes, use cases, tags, and pros and cons. These models work with tools like **Ollama**, **LM Studio**, **Jan**, and other local AI platforms.
Review the [How to Choose the Right Model](#how-to-choose-the-right-open-source-llm) section for selection guidance.
_⚠️ Hardware requirements vary by model size. Smaller models (1B-7B) can run on CPU-only systems with modest RAM, while larger models require significant GPU or CPU resources._
{{< llm-cards data="data/ollama-llms.jsonc" minwidth="300px" >}}
## How to Choose the Right Open Source LLM
Selecting an **open source LLM** depends on hardware capabilities, use case, and performance requirements.
**Hardware Constraints:**
- For limited resources: Consider smaller models like `stable-code` (3B), `codegemma` (2B/7B), `qwen2.5-coder` (0.5B-7B), `phi3` (3.8B), or `llama3.2` (1B/3B)
- For high-end hardware: Consider larger models like `qwen3-coder` (480B), `deepseek-coder-v2` (236B), `llama3.1` (405B), or `codellama` (70B)
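As a rough rule of thumb, a model's memory footprint is its parameter count times the bytes per parameter of the chosen quantization, plus runtime overhead. A minimal sketch (the 20% overhead factor is an assumed allowance, not a measured value):

```python
def estimate_memory_gb(params_billions: float, bits: int = 4,
                       overhead: float = 1.2) -> float:
    """Rough memory estimate: parameters * bytes-per-parameter * overhead.

    overhead=1.2 is an assumed 20% allowance for KV cache and runtime
    buffers; real usage varies by backend and context length.
    """
    bytes_per_param = bits / 8
    return params_billions * bytes_per_param * overhead

# A 7B model at 4-bit quantization needs roughly 4 GB:
print(f"{estimate_memory_gb(7, bits=4):.1f} GB")   # 4.2 GB
print(f"{estimate_memory_gb(70, bits=4):.1f} GB")  # 42.0 GB
```

This is why the 1B-7B models above fit on CPU-only systems with modest RAM, while 70B+ models demand workstation-class hardware.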
**Use Case - Coding:**
- **General coding**: `qwen2.5-coder`, `codellama`, or `deepseek-coder`
- **SQL-specific**: `sqlcoder`
- **Long context/agentic**: `qwen3-coder`
- **Code completion**: `stable-code`, `codegeex4`
- **Multi-language**: `starcoder` or `starcoder2`
- **Versatile (coding + writing)**: `qwen2.5`, `llama3.1`, `mistral`, `mixtral`
**Use Case - Writing:**
- **Creative writing**: `llama3`, `llama3.1`, `mistral`, `gemma2`
- **Long-form content**: `deepseek-r1`
- **Fiction/roleplay**: `dolphin-llama3`, `dolphin-mistral`, `dolphin3`
- **Conversational**: `vicuna`
- **Lightweight writing**: `phi3`, `gemma3`, `llama3.2`
- **Versatile (writing + coding)**: `qwen2.5`, `llama3.1`, `mistral`, `mixtral`
**Popularity and Reliability:**
- Most tested: `qwen2.5` (12.3M pulls), `qwen2.5-coder` (10.1M pulls), `llama3.1` (8.5M pulls), `llama3` (6.2M pulls), `mistral` (5.8M pulls)
- Newest releases: `qwen3-coder` (about 3 months old), `llama3.2` (recent), `gemma3` (recent)
## Benefits of Running Open Source LLMs Locally
Running **open source language models** locally offers several advantages over cloud-based APIs:
* **Privacy**: Your code and conversations never leave your machine
* **Cost**: No per-token API fees or subscription costs
* **Control**: Full control over model versions, parameters, and data
* **Offline Access**: Work without internet connectivity
* **Customization**: Fine-tune models for your specific needs
* **No Rate Limits**: Generate as much content as your hardware allows
## Getting Started with Local LLMs
You can run **open-source LLMs locally** using several tools and platforms:
**GUI-based tools:**
- **LM Studio** - Interface for downloading and chatting with models
- **Jan** - Open-source ChatGPT alternative
- **GPT4All** - General-purpose application with document chat capabilities
**Command-line tools:**
- **Ollama** - Simple command-line tool for running models locally
- **llama.cpp** - Lightweight C/C++ implementation that runs models efficiently on CPUs, with optional GPU offloading
- Direct model loading via Python frameworks like Hugging Face Transformers, PyTorch, or TensorFlow
**Web interfaces:**
If you want a ChatGPT-like experience, you can pair these backends with interfaces like **LobeChat**, **Open WebUI**, or **LibreChat**.
Both **LM Studio** and **Jan** provide model downloads and chat interfaces without any command-line work, and they support the same GGUF model format that Ollama uses.
## Code LLMs vs Writing Models: What's the Difference?
**Code LLMs** and **writing models** differ mainly in their training data and target tasks:
**Code LLMs** (like `qwen2.5-coder`, `codellama`, `deepseek-coder`) are trained heavily on code repositories and excel at:
* Code generation and completion
* Debugging and error fixing
* Code explanation and documentation
* Multi-language programming support
* Understanding code context and syntax
**Writing Models** (like `llama3.1`, `mistral`, `gemma2`) are designed for natural language tasks:
* Creative writing and storytelling
* Content generation and editing
* Conversational AI and chat
* Long-form content creation
* General language understanding
**Versatile Models** (like `qwen2.5`, `llama3.1`, `mistral`) handle both coding and writing tasks.
### Using Ollama for Local LLM Deployment
**Ollama** provides a command-line interface and API for running **open source LLMs** locally. Example usage:
```bash
# Pull a model (coding example)
ollama pull qwen2.5-coder:7b

# Pull a model (writing example)
ollama pull llama3.1:8b

# Run a model interactively
ollama run qwen2.5-coder:7b
ollama run llama3.1:8b

# Or use the local API in your application (coding)
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python function to calculate Fibonacci numbers."
}'

# Or use the local API in your application (writing)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Write a short story about a robot learning to paint."
}'
```
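The same endpoint can be called from Python using only the standard library. A minimal sketch, assuming Ollama is running on its default port 11434 (`"stream": false` requests a single JSON response instead of newline-delimited streaming chunks):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama to return one JSON object
    # rather than a stream of partial responses.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server with the model pulled):
#   print(generate("qwen2.5-coder:7b", "Reverse a string in Python."))
```

Separating request construction from the network call keeps the payload logic easy to test and reuse.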
## Popular Use Cases for Open Source LLMs
**Open source language models** are being used across various domains:
* **Code Generation**: Automate boilerplate code, generate functions, and complete code snippets
* **Code Review**: Analyze code for bugs, security issues, and best practices
* **Documentation**: Generate API docs, README files, and technical documentation
* **Creative Writing**: Draft stories, articles, and creative content
* **Content Editing**: Improve grammar, style, and clarity of written content
* **Conversational AI**: Build chatbots and virtual assistants
* **Data Analysis**: Generate SQL queries and analyze datasets
* **Learning**: Understand programming concepts and get coding help
## Tips for Running Local LLMs
Keep these considerations in mind when running **open source LLMs** locally:
* **Start Small**: Begin with smaller models (3B-7B parameters) to test your hardware
* **Monitor Resources**: Use system monitoring tools to track GPU/CPU and memory usage
* **Experiment with Quantization**: Use quantized models (Q4, Q5, Q8) to reduce memory requirements
* **Try Multiple Models**: Different **code LLMs** and **writing models** perform differently on various tasks
* **Use Appropriate Context Windows**: Match model context length to your use case
* **Keep Models Updated**: Regularly pull updated versions for bug fixes and improvements
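Context length matters because the KV cache grows linearly with the number of tokens. A back-of-the-envelope sketch (the layer, head, and dimension defaults are assumptions modeled on a typical 8B-class architecture with grouped-query attention, not specs for any particular model):

```python
def kv_cache_gib(context_len: int, layers: int = 32, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """KV cache size in GiB: 2 (keys + values) * layers * kv_heads
    * head_dim * bytes per value * tokens.

    The architecture defaults are illustrative assumptions, not
    values taken from any real model card.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_len / 1024**3

print(f"{kv_cache_gib(8192):.2f} GiB")    # 8K context  -> 1.00 GiB
print(f"{kv_cache_gib(131072):.2f} GiB")  # 128K context -> 16.00 GiB
```

Under these assumptions, a long-context agentic workload can consume many times more memory than the model weights suggest, so match the context window to the task.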
## References and Resources
- [Ollama Library](https://ollama.com/library) - Available **open source LLMs**
- [Ollama Library - Code Models](https://ollama.com/library?sort=newest&q=code) - **Code LLMs**
- [Ollama Documentation](https://ollama.com/docs) - Official docs and API reference
- [LM Studio](https://lmstudio.ai/) - GUI for **local LLM** management
- [Jan](https://jan.ai/) - Open-source ChatGPT alternative
- [GPT4All](https://gpt4all.io/) - **Local AI** application
- [llama.cpp](https://github.com/ggerganov/llama.cpp) - CPU-based **LLM** inference