Are you looking for an easy way to install DeepSeek offline, or searching for a free DeepSeek-R1 offline installation? Running powerful models like DeepSeek-R1 locally has become a game-changer for developers, researchers, and AI enthusiasts. Many advanced users run LLMs locally to keep full control over their data and security, and to let the models perform to their full potential on their own hardware. This guide covers four proven methods to install DeepSeek-R1 locally on Mac, Windows, or Linux—using Ollama’s simplicity, Python’s flexibility, Docker’s reliability, or llama.cpp’s optimization. Choose the method that suits your workflow and hardware.
Advantages of Running DeepSeek Locally
Running DeepSeek locally offers several advantages, especially for users concerned with performance, privacy, and control. Here’s a breakdown of the key benefits:
Advantage | Description |
---|---|
Data Privacy & Security | Full control over your data, ensuring sensitive information remains on your machine without third-party access. |
Offline Functionality | Operate without an internet connection, reducing dependency on cloud services and ensuring availability in remote areas. |
Customization & Flexibility | Ability to fine-tune the model, customize settings, and integrate with local applications or workflows. |
Performance & Speed | Faster response times due to reduced latency and full utilization of local CPU/GPU resources. |
Cost Efficiency | Avoid cloud subscription fees and API usage costs; scale workloads without additional expenses. |
Experimentation & Development | Freedom to experiment, iterate quickly, and maintain version control without external restrictions. |
Enhanced Security for Sensitive Applications | Ideal for industries with strict regulatory requirements (e.g., healthcare, finance) by running in secure, controlled environments. |
Prerequisites
- Operating System Required:
  - macOS (Intel or Apple Silicon)
  - Linux (x86_64 or ARM64), e.g., Ubuntu 24.04
  - Windows (via Windows Subsystem for Linux [WSL 2])
- Hardware Required:
  - Minimum 8GB RAM; 16GB+ recommended for optimal performance.
  - 10GB+ free storage space.
  - A compatible GPU (optional but recommended for faster inference).
- Software Required:
  - Terminal access (Command Prompt/PowerShell for Windows via WSL).
  - Basic tools: Python 3.10+, pip, and git (a quick way to verify these is shown just after this list).
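If you want to confirm the basic tools before proceeding, here is a minimal sketch in Python that checks the interpreter version and whether pip and git are on your PATH. It is only a convenience check, not part of the installation itself.

import shutil
import sys

# Require Python 3.10 or newer
assert sys.version_info >= (3, 10), f"Python 3.10+ needed, found {sys.version.split()[0]}"

# Confirm pip and git are available on the PATH
for tool in ("pip3", "git"):
    path = shutil.which(tool)
    print(f"{tool}: {'found at ' + path if path else 'NOT FOUND - please install it'}")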
Types of DeepSeek Installation – Comparison and Which One Is Easiest?
Users can install DeepSeek-R1 locally for free using four methods. For most users, Ollama is the easiest method, while Python/Hugging Face offers maximum flexibility. Here is a detailed comparison of all four installation methods:
Installation Method | Ease of Installation | Hardware | Customization |
---|---|---|---|
Ollama | Easiest | GPU/CPU | Low |
Python (Hugging Face) | Moderate | GPU/CPU | High |
Docker | Moderate | GPU/CPU | Medium |
llama.cpp | Advanced (requires building) | CPU/GPU (slow) | Medium |
How to Install DeepSeek-R1 Locally Using Ollama
Step 1: Install Ollama
Ollama simplifies running LLMs locally. Follow these steps to install it:
For macOS
- Visit Ollama.ai and download the macOS app.
- Drag the Ollama icon to your Applications folder.
- Open the app to start the Ollama background service.
For Linux/Ubuntu 24.04/WSL (Windows)
Run the installation script below in your terminal:
curl -fsSL https://ollama.ai/install.sh | sh
Then, start the Ollama service:
ollama serve
Verify Ollama Installation
Check if Ollama is installed:
ollama --version
If successful, you’ll see the version number (e.g., ollama version 0.1.25).
Step 2: Download and Install DeepSeek-R1
DeepSeek-R1 might not be directly available in Ollama’s default library. Use one of these methods:
Method 1: Pull from Ollama (If Available)
First, check whether the model is already available locally:
ollama list
If available in Ollama’s library:
ollama pull deepseek-r1
Please be patient during this process: downloading a large language model, which can be several gigabytes in size, requires a stable internet connection. The download time will vary depending on your internet speed; faster connections will result in quicker downloads, while slower connections may take several minutes or more.
If deepseek-r1 isn’t listed, proceed to Method 2.
Method 2: Manual Setup Using a Modelfile
1. Download the Model
- Obtain the DeepSeek-R1 model file in GGUF format (e.g., deepseek-r1.Q4_K_M.gguf) from sources like Hugging Face or the official DeepSeek repository.
- Save it to a dedicated folder (e.g., ~/models).
2. Create a Modelfile
In the same folder, create a file named Modelfile with:
FROM ./deepseek-r1.Q4_K_M.gguf
Replace the filename with your actual GGUF file.
3. Build the Model
ollama create deepseek-r1 -f Modelfile
Step 3: Run DeepSeek-R1
Start a chat with the model:
ollama run deepseek-r1
Example prompt:
>>> Write a Python function to calculate Fibonacci numbers.
Step 4: Verify Installation (Optional)
Confirm the model is active:
ollama list
You should now see deepseek-r1 listed. Test inference speed and response quality with sample prompts.
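If you prefer to script such a test instead of typing prompts interactively, the sketch below calls Ollama’s local REST API, which by default listens on port 11434. It assumes the requests library is installed (pip install requests), and the prompt is just an example.

import requests

# Ask the locally running Ollama service for a completion from deepseek-r1
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1",
        "prompt": "Write a Python function to calculate Fibonacci numbers.",
        "stream": False,  # return one JSON object instead of a stream of chunks
    },
    timeout=300,
)
print(response.json()["response"])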
Step 5: Run DeepSeek in a Web UI
While Ollama offers command-line interaction with models like DeepSeek, a web-based interface can provide a more user-friendly experience, much like using DeepSeek in a web browser. Open WebUI (also known as Ollama Web UI) offers such an interface, simplifying the process of interacting with and managing your Ollama models.
Note: This graphical interface can be especially helpful for users less comfortable with command-line tools, or for tasks where visual interaction is beneficial.
1. Create a Virtual Environment
First, create a virtual environment that isolates your Python dependencies from the system-wide Python installation.
sudo apt install python3-venv
python3 -m venv ~/open-webui-venv
source ~/open-webui-venv/bin/activate
2. Install Open WebUI
Now install Open WebUI using pip:
pip install open-webui
3. Start the Server
After installing Open WebUI, start the server with the command below:
open-webui serve
Open your web browser and navigate to http://localhost:8080 – you should see the Ollama Web UI interface.
DeepSeek Ollama Troubleshooting Tips
1. Model Not Found:
- Ensure the model name is correct or use the manual GGUF setup.
- Check Ollama’s Model Registry for alternative DeepSeek models (e.g., deepseek-coder).
2. Performance Issues:
- Allocate more RAM/VRAM.
- Simplify prompts for faster responses.
3. WSL Errors:
- Update WSL: wsl --update
- Restart the Ollama service.
For the latest updates, refer to the official Ollama and DeepSeek documentation.
Install and Run DeepSeek via Python & Hugging Face
Interacting with DeepSeek via Python and the Hugging Face Transformers library offers a powerful and flexible approach:
Step 1: Install Dependencies
First, ensure you have Python installed. Then, install the required libraries using pip:
pip install torch transformers accelerate
- torch: The PyTorch framework the model runs on.
- transformers: Provides access to pre-trained models and tools for working with them.
- accelerate: Helps optimize model execution, especially for larger models and GPUs.
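Before downloading a multi-gigabyte model, it can be worth confirming that PyTorch actually sees your GPU. The short, optional check below uses only the torch package installed above.

import torch

# Report whether CUDA (NVIDIA) or MPS (Apple Silicon) acceleration is available
print("CUDA available:", torch.cuda.is_available())
print("MPS available:", hasattr(torch.backends, "mps") and torch.backends.mps.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))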
Step 2: Download the Model
- Find DeepSeek-R1 on the Hugging Face Model Hub.
- Clone the repository using the command below:
git clone https://huggingface.co/deepseek-ai/deepseek-r1
Step 3: Run Inference
Create a Python script named inference.py:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (weights are downloaded on the first run)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-r1")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1")

# Tokenize a prompt and generate up to 200 tokens
prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0]))
Step 4: Execute
python inference.py
DeepSeek via Python & Hugging Face Troubleshooting Tips
- Out-of-Memory Errors: Add device_map="auto" to from_pretrained().
- Slow Performance: Use quantization (e.g., load_in_4bit=True).
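To make those two tips concrete, here is a minimal sketch that loads the model with automatic device placement and 4-bit quantization. It assumes a CUDA GPU and the bitsandbytes package (pip install bitsandbytes), and reuses the same model ID as the script above.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization roughly quarters the memory needed for the weights vs. fp16
quant_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-r1",
    device_map="auto",            # spread layers across available GPU/CPU memory
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1")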
Install DeepSeek Locally via Docker
Running DeepSeek locally using Docker offers an easy and reliable environment, removing many of the complexities of manual installation:
Step 1: Install Docker
Ensure you have Docker and Docker Compose installed on your system. You can download and install them from the official Docker website (https://www.docker.com/).
- Windows/macOS: Download Docker Desktop from docker.com
- Linux (Ubuntu/Debian): Use your package manager:
sudo apt-get update && sudo apt-get install docker.io
Step 2: Pull the DeepSeek Docker Image
Pull the official image from the registry (replace the example image name below with the actual one from DeepSeek’s documentation):
docker pull deepseek/deepseek-llm:latest # Example image name
Step 3: Run the DeepSeek Container
Start the container with appropriate resources:
docker run -d --name deepseek-container -p 8080:8080 deepseek/deepseek-llm:latest
This command starts the container in detached mode (-d), names it deepseek-container, and maps port 8080 of the container to port 8080 on your local machine.
Step 4: Verify Installation
Check if the container is running:
docker ps -a | grep deepseek-container
Step 5: Interact with the Model
Send a test request via API:
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello, DeepSeek!", "max_tokens": 50}'
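The same test can be scripted from Python. The sketch below assumes the container exposes an OpenAI-style completions endpoint on the port you mapped above (adjust the path and port to whatever your specific image documents) and that the requests library is installed.

import requests

# Send a completion request to the containerized model's HTTP API
resp = requests.post(
    "http://localhost:8080/v1/completions",
    headers={"Content-Type": "application/json"},
    json={"prompt": "Hello, DeepSeek!", "max_tokens": 50},
    timeout=120,
)
print(resp.status_code)
print(resp.json())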
Important Notes:
- GPU Support: Requires NVIDIA drivers and NVIDIA Container Toolkit
- Model Weights: Some models require separate weight downloads. Check DeepSeek’s documentation.
- Configuration: You may need to set additional environment variables for:
- Model parameters
- API security
- Resource allocation
DeepSeek Manual Setup with llama.cpp
For CPU-only or lightweight GPU usage.
Prerequisites
- Hardware:
- CPU: Modern x86-64 or ARM (Apple Silicon).
- GPU (optional): NVIDIA (CUDA), AMD (ROCm), or Apple Metal.
- C++ Compiler: Ensure you have a compatible C++ compiler installed (e.g., g++).
- CMake: Required for building llama.cpp.
- Git: To clone the repository.
- Python (optional): For Python bindings if needed.
Step 1: Clone the llama.cpp Repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Step 2: Build llama.cpp
For Windows (using CMake)
mkdir build
cd build
cmake ..
cmake --build . --config Release
This will create an executable in the build/bin directory.
For Linux/macOS:
make clean && make LLAMA_METAL=1 # Enable Metal for Apple GPUs
# or for CUDA (NVIDIA GPUs):
make clean && make LLAMA_CUBLAS=1
Step 3: Download the DeepSeek GGUF Model
Option 1: Download a pre-converted GGUF model from Hugging Face:
- Search for deepseek-gguf on the Hugging Face Hub and download the GGUF file that matches your model version.
Option 2: Convert the raw model to GGUF yourself (advanced):
# Convert PyTorch/Safetensors to GGUF
python3 convert.py --ctx-size 4096 --outtype f16 /path/to/deepseek-model-dir
# Quantize the model (e.g., Q4_K_M for 4-bit):
./quantize /path/to/deepseek-model.gguf /path/to/deepseek-model-Q4_K_M.gguf Q4_K_M
Step 4: Run the Model
Use the main executable to interact with the model:
# For CPU
./main -m /path/to/deepseek-r1.Q4_K_M.gguf -p "Hello, DeepSeek!" -n 512
# For GPU Acceleration (e.g., NVIDIA CUDA)
./main -m /path/to/deepseek-r1.Q4_K_M.gguf -p "Hello, DeepSeek!" -n 512 -ngl 50
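If you would rather drive the model from Python, here is a minimal sketch using the optional llama-cpp-python bindings mentioned in the prerequisites (pip install llama-cpp-python). The model path is a placeholder, and n_gpu_layers only has an effect if the bindings were built with GPU support.

from llama_cpp import Llama

# Load the quantized GGUF model; n_gpu_layers=0 keeps everything on the CPU
llm = Llama(model_path="/path/to/deepseek-r1.Q4_K_M.gguf", n_ctx=4096, n_gpu_layers=0)

# Generate a completion for a simple prompt
output = llm("Hello, DeepSeek!", max_tokens=256)
print(output["choices"][0]["text"])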
Step 5: Use the API Server (Optional)
Run the model as an OpenAI-compatible API server:
./server -m /path/to/deepseek-r1.Q4_K_M.gguf --port 8000 --host 0.0.0.0 --ctx-size 4096
Send requests via curl
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
"prompt": "Explain AI alignment",
"max_tokens": 200
}'
DeepSeek Setup with llama.cpp Troubleshooting Tips
- Model Compatibility: Ensure the DeepSeek model is compatible with llama.cpp. You might need to convert the model using appropriate tools if it’s in a different format.
- Memory Issues: If you encounter memory errors, try more heavily quantized versions of the model (e.g., Q4_0 or Q5_1 GGUF files) to reduce resource usage.
- Performance: For better performance, use GPU acceleration if supported on your system.