
AI-2: How to Use GPT-OSS-20B Locally in a Spring AI Application with Ollama and a RunPod GPU

In the previous post of this AI series, we saw how to create generative AI locally using Spring AI with Ollama and Meta's Llama 3:

https://www.teachlea.com/2024/08/ai-1-create-generative-ai-locally-using.html 

In this post, we are going to create a local generative AI solution using Spring AI with Ollama and GPT-OSS-20B, running as a Pod on a RunPod GPU.



Introduction

This proof of concept demonstrates how to deploy a complete local generative AI solution using Spring AI, Ollama, and OpenAI's GPT-OSS-20B model running in a containerized environment on RunPOD's GPU infrastructure. This setup provides enterprise-level AI capabilities while maintaining data privacy and cost control through local deployment.

Architecture Overview

The solution leverages three key components:

Spring AI Framework - Provides seamless AI integration for Java applications with vendor-agnostic APIs and familiar Spring Boot patterns. Spring AI supports multiple model providers and offers features like chat completion, embeddings, and function calling capabilities.

Ollama - Acts as a local LLM runtime that simplifies model deployment and management. Ollama provides an OpenAI-compatible API endpoint, making integration straightforward while enabling offline operation.

GPT-OSS-20B Model - OpenAI's open-weight reasoning model with 21 billion parameters but only 3.6 billion active parameters per token using mixture-of-experts architecture. The model supports MXFP4 quantization, enabling deployment on systems with just 16GB of memory.

Technical Implementation

RunPod Pod Configuration

For optimal performance with GPT-OSS-20B, the following RunPod configuration is recommended:

GPU Memory: Minimum 16GB VRAM
System RAM: 24-32GB recommended
Storage: 50GB+ for model and dependencies
GPU Type: NVIDIA RTX 3090, RTX 4090, or A100

Spring AI Configuration

The Spring AI application requires minimal configuration to integrate with Ollama:


Maven dependencies for Spring AI with Ollama support:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>

Gradle dependencies for Spring AI with Ollama support (the same coordinates as the Maven dependency above):

implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter'

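With the dependency in place, the connection to Ollama is configured through Spring AI's standard properties. A minimal application.properties sketch, assuming the RunPod proxy URL from later in this post (the pod id is a placeholder you must fill in):

```properties
# Base URL of the Ollama server (RunPod proxy URL for the exposed port 11434)
spring.ai.ollama.base-url=https://<pod-id>-11434.proxy.runpod.net
# Model to use for chat completions
spring.ai.ollama.chat.options.model=gpt-oss:20b
```

With these two properties set, the Ollama starter auto-configures the chat model; no further wiring is required.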
Go to RunPod and sign up if you have not already, then add some credit from the Billing section.

Then select a GPU that meets your requirements.

Configure the Pod and ensure that the Ollama port (default 11434) is exposed in your RunPod environment. When setting up your RunPod GPU Pod, add 11434 to the list of exposed ports and set the OLLAMA_HOST environment variable to 0.0.0.0 inside the container. Then note down the Pod's public IP/hostname.
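The OLLAMA_HOST setting can also be applied directly in the Pod's web terminal before starting the server, for example:

```shell
# Bind Ollama to all network interfaces. By default it listens only on
# 127.0.0.1, which the RunPod port proxy cannot reach from outside the container.
export OLLAMA_HOST=0.0.0.0
```

Run this in the same shell session in which you will later start `ollama serve`.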



Then deploy the Pod using the On-Demand option.


Once the Pod reaches the Running state, click "Connect"; you should see a screen like the one shown below:


Click "Open Web Terminal"; it will open a terminal in a new browser tab.

Change into the workspace directory (cd workspace) and install Ollama using the curl command below:

curl -fsSL https://ollama.com/install.sh | sh


Once Ollama is installed, start it with the following command:

ollama serve


Open a second Web Terminal and pull the GPT-OSS-20B model into Ollama:

ollama pull gpt-oss:20b


A quick sanity test before opening the project in IntelliJ:

curl https://<pod-id>-11434.proxy.runpod.net/api/tags

If everything is working, you will get a response like this:

root@a79279cafdb4:/# curl http://localhost:11434/api/tags

{"models":[{"name":"gpt-oss:20b","model":"gpt-oss:20b","modified_at":"2025-08-13T14:21:24.115037591Z","size":13780173724,"digest":"aa4295ac10c3afb60e6d711289fc6896f5aef82258997b9efdaed6d0cc4cd8b8","details":{"parent_model":"","format":"gguf","family":"gptoss","families":["gptoss"],"parameter_size":"20.9B","quantization_level":"MXFP4"}}]}root@a79279cafdb4:/#
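Beyond listing models, you can also exercise the model itself from the terminal. A minimal sketch against Ollama's native /api/generate endpoint (the prompt is just an example; replace <pod-id> with your actual Pod id):

```shell
# Non-streaming generation request against Ollama's native API
PAYLOAD='{"model":"gpt-oss:20b","prompt":"Say hello in one sentence.","stream":false}'
curl -s -H "Content-Type: application/json" \
     -d "$PAYLOAD" \
     "https://<pod-id>-11434.proxy.runpod.net/api/generate"
```

With "stream":false the full reply comes back as a single JSON object, which is easier to read in a terminal than the default streaming chunks.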

Demo:

Now it's time to test our local generative AI solution (Spring AI, Ollama, and GPT-OSS-20B running on RunPod's GPU infrastructure) through its REST APIs. Here we will use Postman to send messages and receive responses from our local deployment.
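As a sketch of the Spring side, a minimal REST controller using Spring AI's ChatClient could look like the following (class and endpoint names are illustrative, not taken from the original project):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    // The Ollama starter auto-configures a ChatClient.Builder for us
    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // POST a plain-text message; the model's reply is returned as the body
    @PostMapping("/api/chat")
    public String chat(@RequestBody String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
```

You can then send a POST request to /api/chat from Postman with a plain-text body and read the model's reply in the response.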


Conclusion
This proof of concept successfully demonstrates that enterprise-grade generative AI capabilities can be deployed locally using Spring AI, Ollama, and GPT-OSS-20B on RunPOD infrastructure. The solution provides the privacy, control, and cost benefits of local deployment while maintaining the performance and scalability needed for production applications.

The combination of Spring AI's developer-friendly abstractions, Ollama's simplified model management, and RunPOD's scalable GPU infrastructure creates a powerful platform for organizations seeking to implement AI capabilities without compromising on data sovereignty or operational control.

The containerized architecture ensures consistent deployment across environments, while the open-source foundation provides flexibility for customization and extension. This approach represents a significant step toward democratizing advanced AI capabilities for organizations of all sizes.

Please feel free to share your valuable comments. Thanks!
