Ollama makes it straightforward to run capable open language models locally, giving developers AI features without relying on a hosted API. This guide shows how to implement those features in your applications using Ollama.
Getting Started with Ollama
Installation and Setup
First, install Ollama on your system:
# macOS (install via Homebrew, or download the app from https://ollama.com/download)
brew install ollama
# Linux (the install script targets Linux)
curl -fsSL https://ollama.com/install.sh | sh
Basic Model Management
Pull and run your first model:
# Pull the model
ollama pull llama2
# Run a basic query
ollama run llama2 "Explain how to implement a binary search"
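A few other CLI commands are handy while developing (these are standard Ollama commands; exact output varies by version):
# List the models you have downloaded locally
ollama list
# Start the Ollama server manually (it normally runs as a background service)
ollama serve
# Remove a model you no longer need
ollama rm llama2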
Basic Implementation
REST API Integration
Create a basic API wrapper:
// api/ollama.js
// Minimal wrapper around Ollama's local REST API (default port 11434).
export async function queryModel(prompt, model = 'llama2') {
const response = await fetch('http://localhost:11434/api/generate', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model,
prompt,
stream: false // return the full completion as a single JSON object
})
});
if (!response.ok) {
throw new Error(`Ollama request failed: ${response.status} ${response.statusText}`);
}
return await response.json();
}
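A quick usage sketch, assuming Ollama is running locally and llama2 has already been pulled; the generated text comes back in the response field of the JSON body (the import path follows the file comment above):
import { queryModel } from './api/ollama.js';
const result = await queryModel('Write a haiku about recursion');
console.log(result.response); // the generated text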
Simple Chat Interface
Implement a basic chat interface:
// components/Chat.js
import { useState } from 'react';
import { queryModel } from '../api/ollama';
export function Chat() {
const [messages, setMessages] = useState([]);
const [input, setInput] = useState('');
async function handleSubmit(e) {
e.preventDefault();
const response = await queryModel(input);
setMessages(prev => [...prev,
{ role: 'user', content: input },
{ role: 'assistant', content: response.response }
]);
setInput('');
}
return (
<div className="chat-container">
<div className="messages">
{messages.map((msg, i) => (
<div key={i} className={msg.role}>
{msg.content}
</div>
))}
</div>
<form onSubmit={handleSubmit}>
<input
value={input}
onChange={e => setInput(e.target.value)}
placeholder="Ask something..."
/>
<button type="submit">Send</button>
</form>
</div>
);
}
Advanced Features
Streaming Responses
Implement streaming for real-time responses:
async function streamResponse(prompt, onChunk) {
const response = await fetch('http://localhost:11434/api/generate', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'llama2',
prompt,
stream: true // Ollama streams newline-delimited JSON objects
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const {value, done} = await reader.read();
if (done) break;
// { stream: true } keeps multi-byte characters intact across chunk boundaries
buffer += decoder.decode(value, { stream: true });
// Each complete line is one JSON object; keep any partial line for the next chunk
const lines = buffer.split('\n');
buffer = lines.pop();
for (const line of lines) {
if (line.trim()) {
const json = JSON.parse(line);
if (json.response) onChunk(json.response);
}
}
}
}
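A minimal usage sketch that prints tokens to the terminal as they arrive:
await streamResponse('Explain event loops in one paragraph', chunk => {
process.stdout.write(chunk); // append each token without a newline
});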
Context Management
Implement context tracking:
class ConversationManager {
constructor() {
this.context = [];
}
addMessage(role, content) {
this.context.push({ role, content });
}
async getResponse(prompt) {
const fullContext = this.context
.map(msg => `${msg.role}: ${msg.content}`)
.join('\n');
const response = await queryModel(
`${fullContext}\nuser: ${prompt}`
);
this.addMessage('user', prompt);
this.addMessage('assistant', response.response);
return response.response;
}
}
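Concatenating messages into a single prompt works, but the prompt grows with every turn and will eventually exceed the model's context window. Ollama also exposes a dedicated /api/chat endpoint that accepts a structured message history; here is a minimal sketch of the same idea built on it:
class ChatConversation {
constructor(model = 'llama2') {
this.model = model;
this.messages = []; // { role: 'user' | 'assistant', content: string }
}
async send(content) {
this.messages.push({ role: 'user', content });
const res = await fetch('http://localhost:11434/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: this.model,
messages: this.messages,
stream: false
})
});
const data = await res.json();
this.messages.push(data.message); // { role: 'assistant', content: '...' }
return data.message.content;
}
}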
Real-World Applications
Code Generation Assistant
Create a code generation feature:
async function generateCode(specification) {
const prompt = `
Generate code based on the following specification:
${specification}
Please provide:
1. Implementation
2. Usage example
3. Error handling
`;
const response = await queryModel(prompt, 'codellama');
return response.response;
}
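For example (the specification string is only an illustration, and codellama must be pulled first with ollama pull codellama):
const code = await generateCode(
'A JavaScript function that debounces another function by a configurable delay'
);
console.log(code);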
Content Summarization
Implement document summarization:
async function summarizeText(text) {
const prompt = `
Summarize the following text concisely:
${text}
Provide:
1. Main points
2. Key takeaways
3. Important details
`;
const response = await queryModel(prompt);
return response.response;
}
Best Practices
Error Handling
Implement robust error handling:
async function safeQueryModel(prompt, model = 'llama2') {
try {
const response = await queryModel(prompt, model);
if (!response.response) {
throw new Error('Empty response from model');
}
return response.response;
} catch (error) {
console.error('Model query failed:', error);
throw new Error('Failed to get AI response');
}
}
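Transient failures (for example, the server restarting or a model still loading) are often worth retrying. A simple sketch with exponential backoff; the attempt count and delays are arbitrary choices:
async function queryWithRetry(prompt, model = 'llama2', maxAttempts = 3) {
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
return await safeQueryModel(prompt, model);
} catch (error) {
if (attempt === maxAttempts) throw error;
// Wait 1s, 2s, 4s, ... before the next attempt
await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
}
}
}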
Rate Limiting
Implement rate limiting:
class RateLimiter {
constructor(maxRequests, timeWindow) {
this.requests = [];
this.maxRequests = maxRequests;
this.timeWindow = timeWindow;
}
async checkLimit() {
const now = Date.now();
this.requests = this.requests.filter(
time => now - time < this.timeWindow
);
if (this.requests.length >= this.maxRequests) {
throw new Error('Rate limit exceeded');
}
this.requests.push(now);
}
}
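Wiring the limiter into the query path; the 10-requests-per-minute figure is just an example:
const limiter = new RateLimiter(10, 60_000); // 10 requests per 60 seconds
async function limitedQuery(prompt, model = 'llama2') {
await limiter.checkLimit(); // throws if the window is already full
return queryModel(prompt, model);
}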
Performance Optimization
Response Caching
Implement response caching:
class ResponseCache {
constructor() {
this.cache = new Map();
}
async getResponse(prompt, model) {
const key = `${model}:${prompt}`;
if (this.cache.has(key)) {
return this.cache.get(key);
}
const response = await queryModel(prompt, model);
this.cache.set(key, response);
return response;
}
}
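Usage is a drop-in replacement for calling queryModel directly. Note that the Map grows without bound, so long-running services may want an eviction policy:
const cache = new ResponseCache();
// Identical prompt + model pairs are served from memory after the first call
const first = await cache.getResponse('What is a closure?', 'llama2');
const second = await cache.getResponse('What is a closure?', 'llama2'); // cache hit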
Batch Processing
Implement batch processing:
async function processBatch(prompts, model = 'llama2') {
const batchSize = 5;
const results = [];
for (let i = 0; i < prompts.length; i += batchSize) {
const batch = prompts.slice(i, i + batchSize);
const promises = batch.map(prompt => queryModel(prompt, model));
const batchResults = await Promise.all(promises);
results.push(...batchResults);
}
return results;
}
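For example, summarizing several documents at once; batchSize keeps the number of concurrent requests to the local server small:
const documents = ['First article text...', 'Second article text...'];
const prompts = documents.map(doc => `Summarize in two sentences:\n${doc}`);
const summaries = await processBatch(prompts);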
Conclusion
Ollama provides a powerful platform for implementing AI features in your applications. Key takeaways:
- Start with basic implementations
- Use streaming for better UX
- Implement proper error handling
- Consider rate limiting and caching
- Optimize for performance
Remember to:
- Handle errors gracefully
- Manage system resources
- Monitor performance
- Test thoroughly
- Keep security in mind
As AI continues to evolve, Ollama offers a flexible and powerful way to integrate AI capabilities into your applications.