Ollama makes it straightforward to run capable open language models locally, giving developers AI features without relying on a hosted API. This guide shows how to implement those features in your applications using Ollama.
Getting Started with Ollama
Installation and Setup
First, install Ollama on your system:
# macOS (install via Homebrew, or download the app from https://ollama.com/download)
brew install ollama
# Linux (the install script targets Linux)
curl -fsSL https://ollama.com/install.sh | sh
Basic Model Management
Pull and run your first model:
# Pull the model
ollama pull llama2
# Run a basic query
ollama run llama2 "Explain how to implement a binary search"
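A few other CLI commands are handy while developing (these are standard Ollama commands; exact output varies by version):
# List the models you have downloaded locally
ollama list
# Start the Ollama server manually (it normally runs as a background service)
ollama serve
# Remove a model you no longer need
ollama rm llama2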
Basic Implementation
REST API Integration
Create a basic API wrapper:
// api/ollama.js
// Minimal wrapper around Ollama's local REST API (default port 11434).
export async function queryModel(prompt, model = 'llama2') {
const response = await fetch('http://localhost:11434/api/generate', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model,
prompt,
stream: false // return the full completion as a single JSON object
})
});
if (!response.ok) {
throw new Error(`Ollama request failed: ${response.status} ${response.statusText}`);
}
return await response.json();
}
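A quick usage sketch, assuming Ollama is running locally and llama2 has already been pulled; the generated text comes back in the response field of the JSON body (the import path follows the file comment above):
import { queryModel } from './api/ollama.js';
const result = await queryModel('Write a haiku about recursion');
console.log(result.response); // the generated text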
Simple Chat Interface
Implement a basic chat interface:
// components/Chat.js
import { useState } from 'react';
import { queryModel } from '../api/ollama';
export function Chat() {
const [messages, setMessages] = useState([]);
const [input, setInput] = useState('');
async function handleSubmit(e) {
e.preventDefault();
const response = await queryModel(input);
setMessages(prev => [...prev,
{ role: 'user', content: input },
{ role: 'assistant', content: response.response }
]);
setInput('');
}
return (
<div className="chat-container">
<div className="messages">
{messages.map((msg, i) => (
<div key={i} className={msg.role}>
{msg.content}
</div>
))}
</div>
<form onSubmit={handleSubmit}>
<input
value={input}
onChange={e => setInput(e.target.value)}
placeholder="Ask something..."
/>
<button type="submit">Send</button>
</form>
</div>
);
}
Advanced Features
Streaming Responses
Implement streaming for real-time responses:
async function streamResponse(prompt, onChunk) {
const response = await fetch('http://localhost:11434/api/generate', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'llama2',
prompt,
stream: true // Ollama streams newline-delimited JSON objects
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const {value, done} = await reader.read();
if (done) break;
// { stream: true } keeps multi-byte characters intact across chunk boundaries
buffer += decoder.decode(value, { stream: true });
// Each complete line is one JSON object; keep any partial line for the next chunk
const lines = buffer.split('\n');
buffer = lines.pop();
for (const line of lines) {
if (line.trim()) {
const json = JSON.parse(line);
if (json.response) onChunk(json.response);
}
}
}
}
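A minimal usage sketch that prints tokens to the terminal as they arrive:
await streamResponse('Explain event loops in one paragraph', chunk => {
process.stdout.write(chunk); // append each token without a newline
});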
Context Management
Implement context tracking:
class ConversationManager {
constructor() {
this.context = [];
}
addMessage(role, content) {
this.context.push({ role, content });
}
async getResponse(prompt) {
const fullContext = this.context
.map(msg => `${msg.role}: ${msg.content}`)
.join('\n');
const response = await queryModel(
`${fullContext}\nuser: ${prompt}`
);
this.addMessage('user', prompt);
this.addMessage('assistant', response.response);
return response.response;
}
}
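Concatenating messages into a single prompt works, but the prompt grows with every turn and will eventually exceed the model's context window. Ollama also exposes a dedicated /api/chat endpoint that accepts a structured message history; here is a minimal sketch of the same idea built on it:
class ChatConversation {
constructor(model = 'llama2') {
this.model = model;
this.messages = []; // { role: 'user' | 'assistant', content: string }
}
async send(content) {
this.messages.push({ role: 'user', content });
const res = await fetch('http://localhost:11434/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: this.model,
messages: this.messages,
stream: false
})
});
const data = await res.json();
this.messages.push(data.message); // { role: 'assistant', content: '...' }
return data.message.content;
}
}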
Real-World Applications
Code Generation Assistant
Create a code generation feature:
async function generateCode(specification) {
const prompt = `
Generate code based on the following specification:
${specification}
Please provide:
1. Implementation
2. Usage example
3. Error handling
`;
const response = await queryModel(prompt, 'codellama');
return response.response;
}
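For example (the specification string is only an illustration, and codellama must be pulled first with ollama pull codellama):
const code = await generateCode(
'A JavaScript function that debounces another function by a configurable delay'
);
console.log(code);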
Content Summarization
Implement document summarization:
async function summarizeText(text) {
const prompt = `
Summarize the following text concisely:
${text}
Provide:
1. Main points
2. Key takeaways
3. Important details
`;
const response = await queryModel(prompt);
return response.response;
}
Best Practices
Error Handling
Implement robust error handling:
async function safeQueryModel(prompt, model = 'llama2') {
try {
const response = await queryModel(prompt, model);
if (!response.response) {
throw new Error('Empty response from model');
}
return response.response;
} catch (error) {
console.error('Model query failed:', error);
throw new Error('Failed to get AI response');
}
}
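Transient failures (for example, the server restarting or a model still loading) are often worth retrying. A simple sketch with exponential backoff; the attempt count and delays are arbitrary choices:
async function queryWithRetry(prompt, model = 'llama2', maxAttempts = 3) {
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
return await safeQueryModel(prompt, model);
} catch (error) {
if (attempt === maxAttempts) throw error;
// Wait 1s, 2s, 4s, ... before the next attempt
await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
}
}
}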
Rate Limiting
Implement rate limiting:
class RateLimiter {
constructor(maxRequests, timeWindow) {
this.requests = [];
this.maxRequests = maxRequests;
this.timeWindow = timeWindow;
}
async checkLimit() {
const now = Date.now();
this.requests = this.requests.filter(
time => now - time < this.timeWindow
);
if (this.requests.length >= this.maxRequests) {
throw new Error('Rate limit exceeded');
}
this.requests.push(now);
}
}
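Wiring the limiter into the query path; the 10-requests-per-minute figure is just an example:
const limiter = new RateLimiter(10, 60_000); // 10 requests per 60 seconds
async function limitedQuery(prompt, model = 'llama2') {
await limiter.checkLimit(); // throws if the window is already full
return queryModel(prompt, model);
}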
Performance Optimization
Response Caching
Implement response caching:
class ResponseCache {
constructor() {
this.cache = new Map();
}
async getResponse(prompt, model) {
const key = `${model}:${prompt}`;
if (this.cache.has(key)) {
return this.cache.get(key);
}
const response = await queryModel(prompt, model);
this.cache.set(key, response);
return response;
}
}
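Usage is a drop-in replacement for calling queryModel directly. Note that the Map grows without bound, so long-running services may want an eviction policy:
const cache = new ResponseCache();
// Identical prompt + model pairs are served from memory after the first call
const first = await cache.getResponse('What is a closure?', 'llama2');
const second = await cache.getResponse('What is a closure?', 'llama2'); // cache hit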
Batch Processing
Implement batch processing:
async function processBatch(prompts, model = 'llama2') {
const batchSize = 5;
const results = [];
for (let i = 0; i < prompts.length; i += batchSize) {
const batch = prompts.slice(i, i + batchSize);
const promises = batch.map(prompt => queryModel(prompt, model));
const batchResults = await Promise.all(promises);
results.push(...batchResults);
}
return results;
}
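For example, summarizing several documents at once; batchSize keeps the number of concurrent requests to the local server small:
const documents = ['First article text...', 'Second article text...'];
const prompts = documents.map(doc => `Summarize in two sentences:\n${doc}`);
const summaries = await processBatch(prompts);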
Conclusion
Ollama provides a powerful platform for implementing AI features in your applications. Key takeaways:
- Start with basic implementations
- Use streaming for better UX
- Implement proper error handling
- Consider rate limiting and caching
- Optimize for performance
Remember to:
- Handle errors gracefully
- Manage system resources
- Monitor performance
- Test thoroughly
- Keep security in mind
As AI continues to evolve, Ollama offers a flexible and powerful way to integrate AI capabilities into your applications.