In a previous post in this AI series, we saw how to create generative AI locally using Spring AI with Ollama and Meta's Llama 3:
https://www.teachlea.com/2024/08/ai-1-create-generative-ai-locally-using.html
In this post, we are going to create generative AI using Spring AI with Ollama and GPT-OSS-20B, running as a pod on a GPU in RunPod.
Introduction
This proof of concept demonstrates how to deploy a complete generative AI solution using Spring AI, Ollama, and OpenAI's GPT-OSS-20B model running in a containerized environment on RunPod's GPU infrastructure. This setup provides enterprise-level AI capabilities while maintaining data privacy and cost control through self-managed deployment.
Architecture Overview
The solution leverages three key components:
Spring AI Framework - Provides seamless AI integration for Java applications with vendor-agnostic APIs and familiar Spring Boot patterns. Spring AI supports multiple model providers and offers features like chat completion, embeddings, and function calling capabilities.
Ollama - Acts as a local LLM runtime that simplifies model deployment and management. Ollama provides an OpenAI-compatible API endpoint, making integration straightforward while enabling offline operation.
GPT-OSS-20B Model - OpenAI's open-weight reasoning model with 21 billion parameters but only 3.6 billion active parameters per token using mixture-of-experts architecture. The model supports MXFP4 quantization, enabling deployment on systems with just 16GB of memory.
Technical Implementation
RunPod Pod Configuration
For optimal performance with GPT-OSS-20B, the following RunPod configuration is recommended:
| Component | Specification |
|---|---|
| GPU Memory | Minimum 16GB VRAM |
| System RAM | 24-32GB recommended |
| Storage | 50GB+ for model and dependencies |
| GPU Type | NVIDIA RTX 3090, RTX 4090, or A100 |
Spring AI Configuration
The Spring AI application requires minimal configuration to integrate with Ollama:
Add the Spring AI Ollama starter to your Maven dependencies:

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
```
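With the starter on the classpath, the Ollama connection can be configured entirely through properties. A minimal sketch of `application.properties`, assuming the pod exposes Ollama through the RunPod proxy endpoint used in the curl test later in this post (`<pod-id>` is a placeholder for your own pod id):

```properties
# Point Spring AI at the Ollama instance running on the RunPod pod
spring.ai.ollama.base-url=https://<pod-id>-11434.proxy.runpod.net
# Use the GPT-OSS-20B model pulled into Ollama
spring.ai.ollama.chat.options.model=gpt-oss:20b
```

With these two properties set, Spring AI auto-configures an Ollama-backed chat model against the pod; no further connection code is needed in the application.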
Go to RunPod and sign up if you have not already, then add some credit from the Billing section.
Then select a GPU that meets the requirements above.
Quick test before IntelliJ
```
curl https://<pod-id>-11434.proxy.runpod.net/api/tags
```
If everything is working, you will get a response like this:
```
root@a79279cafdb4:/# curl http://localhost:11434/api/tags
{"models":[{"name":"gpt-oss:20b","model":"gpt-oss:20b","modified_at":"2025-08-13T14:21:24.115037591Z","size":13780173724,"digest":"aa4295ac10c3afb60e6d711289fc6896f5aef82258997b9efdaed6d0cc4cd8b8","details":{"parent_model":"","format":"gguf","family":"gptoss","families":["gptoss"],"parameter_size":"20.9B","quantization_level":"MXFP4"}}]}
root@a79279cafdb4:/#
```
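Once the pod responds, the endpoint can also be exercised from plain Java before wiring up the Spring application. A minimal sketch using only the JDK's built-in `java.net.http` client against Ollama's `/api/generate` endpoint; the base URL reuses the placeholder proxy address from the curl test above, so substitute your own pod id before running:

```java
// A minimal sketch of calling the pod's Ollama API from plain Java using
// only the JDK's built-in HTTP client -- no Spring required.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class OllamaSmokeTest {

    // Build the JSON body for Ollama's /api/generate endpoint
    // ("stream":false returns the whole completion in one response).
    static String generatePayload(String model, String prompt) {
        return "{\"model\":\"" + model + "\",\"prompt\":\"" + prompt
                + "\",\"stream\":false}";
    }

    public static void main(String[] args) throws Exception {
        // Replace <pod-id> with your RunPod pod id before running
        String baseUrl = "https://<pod-id>-11434.proxy.runpod.net";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        generatePayload("gpt-oss:20b", "Why is the sky blue?")))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

In the Spring application itself none of this plumbing is needed: once `spring.ai.ollama.base-url` points at the pod, Spring AI's auto-configured Ollama chat model makes the equivalent HTTP calls for you.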