Run DeepSeek-R1 (AI) Locally for Free on Mac/Windows/Linux

Are you looking for an easy way to install DeepSeek offline, or searching for a free DeepSeek-R1 offline installation? Running powerful models like DeepSeek-R1 locally has become a game-changer for developers, researchers, and AI enthusiasts. Many advanced users set up LLMs locally to gain full control over their data and security, and a local setup also lets the model make full use of their hardware. This guide covers four proven methods to install DeepSeek-R1 locally on Mac, Windows, or Linux: Ollama’s simplicity, Python’s flexibility, Docker’s reliability, or llama.cpp’s optimization. Choose the method that suits your workflow and hardware.

Advantages of Running DeepSeek Locally

Running DeepSeek locally offers several advantages, especially for users concerned with performance, privacy, and control. Here’s a breakdown of the key benefits:

Advantage | Description
Data Privacy & Security | Full control over your data, ensuring sensitive information remains on your machine without third-party access.
Offline Functionality | Operate without an internet connection, reducing dependency on cloud services and ensuring availability in remote areas.
Customization & Flexibility | Ability to fine-tune the model, customize settings, and integrate with local applications or workflows.
Performance & Speed | Faster response times due to reduced latency and full utilization of local CPU/GPU resources.
Cost Efficiency | Avoid cloud subscription fees and API usage costs; scale workloads without additional expenses.
Experimentation & Development | Freedom to experiment, iterate quickly, and maintain version control without external restrictions.
Enhanced Security for Sensitive Applications | Ideal for industries with strict regulatory requirements (e.g., healthcare, finance) by running in secure, controlled environments.

Prerequisites

  1. Operating System Required:
    • macOS (Intel or Apple Silicon)
    • Linux (x86_64 or ARM64) | Ubuntu 24.04
    • Windows (via Windows Subsystem for Linux [WSL 2])
  2. Hardware Required:
    • Minimum 8 GB RAM; 16 GB+ recommended for optimal performance.
    • 10GB+ free storage space.
    • A compatible GPU (optional but recommended for faster inference).
  3. Software Required:
    • Terminal access (Command Prompt/PowerShell for Windows via WSL).
    • Basic Tools: Python 3.10+, pip, and git (a quick environment check script is shown after this list).
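
If you want to confirm these basics quickly, the optional Python sketch below checks the interpreter version and free disk space using only the standard library, and probes for a CUDA GPU if PyTorch happens to be installed already. Treat it as a convenience check, not part of any installer.

import shutil
import sys

# Require Python 3.10+ as listed in the prerequisites
if sys.version_info < (3, 10):
    raise SystemExit(f"Python 3.10+ required, found {sys.version.split()[0]}")

# Check for roughly 10 GB of free disk space in the current directory
free_gb = shutil.disk_usage(".").free / 1e9
print(f"Free disk space: {free_gb:.1f} GB ({'OK' if free_gb >= 10 else 'low'})")

# GPU check is optional; PyTorch may not be installed yet
try:
    import torch
    print("CUDA GPU available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed yet - GPU check skipped")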

Types of DeepSeek Installation – Comparison and Which One Is Easiest?

Users can install DeepSeek-R1 locally for free using four methods. For most users, Ollama is the easiest method, while Python/Hugging Face offers maximum flexibility. Here is a detailed comparison of all four installation methods:

Installation Method | Hardware | Customization
Ollama | GPU/CPU | Low
Python (Hugging Face) | GPU/CPU | High
Docker | GPU/CPU | Medium
llama.cpp | CPU/GPU (slower) | Medium

How to Install DeepSeek-R1 Locally Using Ollama

Step 1: Install Ollama

Ollama simplifies running LLMs locally. Follow these steps to install it:

For macOS

  1. Visit Ollama.ai and download the macOS app.
  2. Drag the Ollama icon to your Applications folder.
  3. Open the app to start the Ollama background service.
For Linux/Ubuntu 24.04/WSL (Windows)

Run the installation script below in your terminal:

curl -fsSL https://ollama.ai/install.sh | sh

Then, start the Ollama service:

ollama serve

Verify Ollama Installation

Check if Ollama is installed:

ollama --version

If successful, you’ll see the version number (e.g., ollama version 0.1.25).

Step 2: Download and Install DeepSeek-R1

DeepSeek-R1 might not be directly available in Ollama’s default library. Use one of these methods:

Method 1: Pull from Ollama (If Available)

First, check if the model exists:

ollama list

If available in Ollama’s library:

ollama pull deepseek-r1  

Please be patient during this process: downloading a large language model, which can be several gigabytes in size, requires a stable internet connection. The download time will vary depending on your internet speed; faster connections will result in quicker downloads, while slower connections may take several minutes or more.

If deepseek-r1 isn’t listed, proceed to Method 2.

Method 2: Manual Setup Using a Modelfile

1. Download the Model

  • Obtain the DeepSeek-R1 model file in GGUF format (e.g., deepseek-r1.Q4_K_M.gguf) from sources like Hugging Face or the official DeepSeek repository.
  • Save it to a dedicated folder (e.g., ~/models).

2. Create a Modelfile

In the same folder, create a file named Modelfile with:

FROM ./deepseek-r1.Q4_K_M.gguf

Replace the filename with your actual GGUF file.

3. Build the Model

ollama create deepseek-r1 -f Modelfile

Step 3: Run DeepSeek-R1

Start a chat with the model:

ollama run deepseek-r1

Example prompt:

>>> Write a Python function to calculate Fibonacci numbers.
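
Beyond the interactive prompt, you can also call the local model from a script. Here is a minimal sketch using the official ollama Python client (installed with pip install ollama) against the default local service; the model name deepseek-r1 assumes you pulled or created it under that name in Step 2.

import ollama

# Send a single chat message to the locally running Ollama service
response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Write a Python function to calculate Fibonacci numbers."}],
)

print(response["message"]["content"])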

Step 4: Verify Installation (Optional)

Confirm the model is active:

ollama list

Now you will see deepseek-r1 listed. Test inference speed and response quality with sample prompts.

Step 5: Run DeepSeek in a Web UI

While Ollama offers command-line interaction with models like DeepSeek, a web-based interface can provide an easier, more user-friendly experience, much like using DeepSeek in a web browser. Open WebUI (formerly Ollama Web UI) offers such an interface, simplifying the process of interacting with and managing your Ollama models.

Note: This graphical interface can be especially helpful for users less comfortable with command-line tools, or for tasks where visual interaction is beneficial.

1. Create a Virtual Environment

First, create a virtual environment that isolates your Python dependencies from the system-wide Python installation.

sudo apt install python3-venv
python3 -m venv ~/open-webui-venv
source ~/open-webui-venv/bin/activate

2. Install Open WebUI

Now install Open WebUI using pip:

pip install open-webui

3. Start the Server

After installing Open WebUI, start the server using the command below:

open-webui serve

Open your web browser and navigate to http://localhost:8080 – you should see the Open WebUI interface.

DeepSeek Ollama Troubleshooting Tips

1. Model Not Found:

  • Ensure the model name is correct or use the manual GGUF setup.
  • Check Ollama’s Model Registry for alternative DeepSeek models (e.g., deepseek-coder).

2. Performance Issues:

  • Allocate more RAM/VRAM.
  • Simplify prompts for faster responses.

3. WSL Errors:

  • Update WSL: wsl --update.
  • Restart the Ollama service.

For the latest updates, refer to Ollama’s official documentation and model library.

Install and Run DeepSeek via Python & Hugging Face

Interacting with DeepSeek via Python and the Hugging Face Transformers library offers a powerful and flexible approach:

Step 1: Install Dependencies

First, ensure you have Python installed. Then, install the required libraries using pip:

pip install torch transformers accelerate

  • torch: The PyTorch deep learning framework the model runs on.
  • transformers: Provides access to pre-trained models and tools for working with them.
  • accelerate: Helps optimize model execution, especially for larger models and GPUs.

Step 2: Download the Model

git clone https://huggingface.co/deepseek-ai/deepseek-r1

Note: cloning the full weights requires git-lfs. You can also skip this step; the transformers library in Step 3 will download the model automatically on first use.

Step 3: Run Inference

Create a Python script inference.py:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (downloaded from Hugging Face on first run)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-r1")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1")

# Tokenize the prompt and generate a completion
prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0]))

Step 4: Execute

python inference.py

DeepSeek via Python & Hugging Face Troubleshooting Tips

  • Out-of-Memory Errors: Add device_map="auto" to from_pretrained().
  • Slow Performance: Use quantization (e.g., load_in_4bit=True); both tips are illustrated in the sketch below.
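
As a rough illustration of both tips, this sketch loads the model with device_map="auto" and 4-bit quantization; it assumes the bitsandbytes package is installed and a CUDA GPU is available, and it reuses the model ID from the script above.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization to cut memory use (requires the bitsandbytes package and a CUDA GPU)
quant_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-r1",
    device_map="auto",                  # spread layers across available GPU(s) and CPU
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1")

inputs = tokenizer("Explain quantum computing in simple terms.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))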

Install DeepSeek Locally via Docker

Running DeepSeek locally using Docker offers an easy and reliable environment, removing many of the complexities of manual installation:

Step 1: Install Docker

Ensure you have Docker and Docker Compose installed on your system. You can download and install them from the official Docker website (https://www.docker.com/).

  • Windows/macOS: Download Docker Desktop from docker.com
  • Linux (Ubuntu/Debian): Use your package manager:
sudo apt-get update && sudo apt-get install docker.io

Step 2: Pull the DeepSeek Docker Image

Pull the official image from the registry (replace deepseek-image:tag with the actual image name from DeepSeek’s documentation):

docker pull deepseek/deepseek-llm:latest  # Example image name

Step 3: Run the DeepSeek Container

Start the container with appropriate resources:

docker run -d --name deepseek-container -p 8080:8080 deepseek/deepseek-llm:latest  # Example image name

This command starts the container in detached mode (-d), names it deepseek-container, and maps port 8080 of the container to port 8080 on your local machine.

Step 4: Verify Installation

Check if the container is running:

docker ps -a | grep deepseek-container

Step 5: Interact with the Model

Send a test request via API:

curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello, DeepSeek!", "max_tokens": 50}'

Important Notes:

  1. GPU Support: Requires NVIDIA drivers and NVIDIA Container Toolkit
  2. Model Weights: Some models require separate weight downloads. Check DeepSeek’s documentation.
  3. Configuration: You may need to set additional environment variables for:
    • Model parameters
    • API security
    • Resource allocation

DeepSeek Manual Setup with llama.cpp

For CPU-only or lightweight GPU usage.

Prerequisites

  • Hardware:
    • CPU: Modern x86-64 or ARM (Apple Silicon).
    • GPU (optional): NVIDIA (CUDA), AMD (ROCm), or Apple Metal.
  • C++ Compiler: Ensure you have a compatible C++ compiler installed (e.g., g++).
  • CMake: Required for building llama.cpp.
  • Git: To clone the repository.
  • Python (optional): For Python bindings if needed.

Step 1: Install or Clone llama.cpp Repository

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

Step 2: Build llama.cpp

For Windows (using CMake)

mkdir build
cd build
cmake ..
cmake --build . --config Release

This will create an executable in the build/bin directory.

For Linux/macOS:

make clean && make LLAMA_METAL=1  # Enable Metal for Apple GPUs
# or for CUDA (NVIDIA GPUs):
make clean && make LLAMA_CUBLAS=1

Step 3: Download the DeepSeek GGUF Model

Option 1: Download a pre-converted GGUF model from Hugging Face:

  • Search for DeepSeek GGUF files on the Hugging Face Hub and download a quantized file that matches your hardware (adjust for your model version).

Option 2: Convert the raw model to GGUF yourself (advanced):

# Convert PyTorch/Safetensors to GGUF
python3 convert.py --ctx-size 4096 --outtype f16 /path/to/deepseek-model-dir
# Quantize the model (e.g., Q4_K_M for 4-bit):
./quantize /path/to/deepseek-model.gguf /path/to/deepseek-model-Q4_K_M.gguf Q4_K_M

Step 4: Run the Model

Use the main executable to interact with the model:

# For CPU
./main -m /path/to/deepseek-r1.Q4_K_M.gguf -p "Hello, DeepSeek!" -n 512
# For GPU Acceleration (e.g., NVIDIA CUDA)
./main -m /path/to/deepseek-r1.Q4_K_M.gguf -p "Hello, DeepSeek!" -n 512 --ngl 50
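
The prerequisites above mention optional Python bindings; if you install the llama-cpp-python package (pip install llama-cpp-python), the same GGUF file can be used from a script. A minimal sketch, assuming the quantized file path from the previous steps:

from llama_cpp import Llama

# Load the quantized GGUF model; n_gpu_layers=0 keeps everything on the CPU
llm = Llama(
    model_path="/path/to/deepseek-r1.Q4_K_M.gguf",
    n_ctx=4096,        # context window size
    n_gpu_layers=0,    # raise this if you built with GPU support
)

output = llm("Hello, DeepSeek!", max_tokens=128)
print(output["choices"][0]["text"])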

Step 5: Use the API Server (Optional)

Run the model as an OpenAI-compatible API server:

./server -m /path/to/deepseek-r1.Q4_K_M.gguf --port 8000 --host 0.0.0.0 --ctx-size 4096

Send requests via curl

curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
"prompt": "Explain AI alignment",
"max_tokens": 200
}'
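
Because the server exposes an OpenAI-compatible API, you can also call it with the official openai Python package by pointing the client at the local base URL. A sketch under that assumption; the model name is only a placeholder, since the server answers with whatever GGUF model it was started with.

from openai import OpenAI

# Point the OpenAI client at the local llama.cpp server; the API key is unused locally
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

completion = client.completions.create(
    model="deepseek-r1",       # placeholder name; the server uses the loaded model
    prompt="Explain AI alignment",
    max_tokens=200,
)
print(completion.choices[0].text)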

DeepSeek Setup with llama.cpp Troubleshooting Tips

  • Model Compatibility: Ensure the DeepSeek model is compatible with llama.cpp. You might need to convert the model using appropriate tools if it’s in a different format.
  • Memory Issues: If you encounter memory errors, try using more heavily quantized versions of the model (e.g., Q4_0 or Q5_1 GGUF files) to reduce resource usage.
  • Performance: For better performance, use GPU acceleration if supported on your system.
