Getting Started
Using SupaScale
Welcome to SupaScale! Here's how to get started:
- Navigate to the API page.
- Select your GPU and AI model based on your requirements.
- Sign up to get your API key and start computing instantly.
- Integrate our simple API into your application (see the sketch after this list).
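If you are integrating over HTTP from Python, the request will look roughly like the sketch below. Note that the endpoint URL, header, and payload fields here are placeholders and assumptions, not SupaScale's documented API; substitute the values shown on the API page along with your own key.

```python
import requests

API_KEY = "your-api-key"  # issued after sign-up

# Placeholder endpoint and payload: the real URL, field names, and auth scheme
# are defined on the API page, not here.
response = requests.post(
    "https://api.supascale.example/v1/generate",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "tinyllama", "prompt": "Hello from SupaScale"},
    timeout=60,
)
print(response.json())
```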
Ollama Models
Our platform supports running Ollama models. Here are the recommended models for our RTX 3050 GPU with 4GB VRAM:
| Model | Parameters | VRAM Required | Best For |
|---|---|---|---|
| Llama-2-1B | 1B | ~2GB | Fast text generation with good quality |
| Phi-2 | 2B | ~3GB | High-quality 2B parameter model |
| TinyLlama | 1.1B | ~2GB | Efficient small language model |
| Mistral-7B-instruct-v0.2-q4_0 | 7B (quantized) | ~2GB | Quantized version of Mistral 7B |
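Before you can generate text, the chosen model has to be pulled onto the instance. Assuming the Ollama Python client (listed later under pre-installed packages) is available, a minimal sketch looks like this; the `tinyllama` tag is taken from the table above and may differ in your environment:

```python
import ollama

# Download a model from the Ollama library (tag taken from the table above).
ollama.pull('tinyllama')

# List the models that are now available locally.
for model in ollama.list()['models']:
    print(model)
```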
Code Examples
Basic Ollama Example
```python
import ollama

# Simple text generation with Ollama
response = ollama.generate(
    model='llama2:1b',
    prompt='Write a short poem about artificial intelligence',
)
print(response['response'])
```
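The same call can stream tokens as they are generated by passing `stream=True`, which makes `generate` return an iterator of partial responses:

```python
import ollama

# Stream the response token by token instead of waiting for the full text.
for chunk in ollama.generate(
    model='llama2:1b',
    prompt='Write a short poem about artificial intelligence',
    stream=True,
):
    print(chunk['response'], end='', flush=True)
print()
```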
Using the Phi-2 Model
```python
import ollama

# Using Phi-2 model for inference
response = ollama.generate(
    model='phi2',
    prompt='Explain how neural networks work in 3 sentences',
    options={'temperature': 0.7, 'top_p': 0.9}
)
print("Phi-2 response:")
print(response['response'])
```
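For multi-turn conversations, the Ollama client also exposes a chat-style interface. Here is a minimal sketch using the same model tag and sampling options:

```python
import ollama

# Chat-style request with an explicit message history.
response = ollama.chat(
    model='phi2',
    messages=[
        {'role': 'user', 'content': 'Explain how neural networks work in 3 sentences'},
    ],
    options={'temperature': 0.7, 'top_p': 0.9},
)
print(response['message']['content'])
```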
Using Llama.cpp
```python
import subprocess

def run_llama_inference(prompt, model="models/tinyllama-1.1b.Q4_0.gguf", temp=0.7):
    """Run inference using the llama.cpp CLI with a small quantized model.

    The binary name and model path depend on your installation; adjust as needed.
    """
    cmd = [
        "llama-cli",           # llama.cpp command-line binary ("main" in older builds)
        "-m", model,           # path to a GGUF model file
        "--temp", str(temp),
        "-p", prompt,
        "-n", "100",           # number of tokens to predict
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

# Example usage
prompt = "Explain the concept of machine learning to a child:"
response = run_llama_inference(prompt)
print(response)
```
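If the llama-cpp-python bindings are installed (the platform lists llama.cpp among its pre-installed packages, so this is an assumption worth checking), you can skip the subprocess call and load a GGUF file directly from Python; the model path below is a placeholder:

```python
from llama_cpp import Llama

# Load a small quantized GGUF model (placeholder path; point this at a real file).
llm = Llama(model_path="models/tinyllama-1.1b.Q4_0.gguf", n_ctx=2048)

output = llm(
    "Explain the concept of machine learning to a child:",
    max_tokens=100,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```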
Limitations
RTX 3050 GPU Limitations
The RTX 3050 has 4GB of VRAM, which limits the size of models you can run:
- Models over 2B parameters might require quantization
- Maximum context length may be limited
- Some complex operations may be slower compared to higher-end GPUs
- Memory-intensive tasks may cause out-of-memory errors
For best performance, stick to models under 2B parameters or properly quantized larger models.
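As a rough rule of thumb, the weights alone take about 2 bytes per parameter at FP16 and roughly 0.5-0.6 bytes per parameter at 4-bit quantization, plus overhead for the KV cache and activations. The helper below is a back-of-the-envelope sketch of that arithmetic, not an exact measurement:

```python
def approx_vram_gb(n_params_billion, bytes_per_param=2.0, overhead_gb=0.8):
    """Rough VRAM estimate: weight size plus a fixed allowance for runtime overhead.

    bytes_per_param: ~2.0 for FP16 weights, ~0.5-0.6 for 4-bit quantized weights.
    overhead_gb: rough allowance for the KV cache, activations, and buffers.
    """
    return n_params_billion * bytes_per_param + overhead_gb

print(f"1B params @ FP16:  ~{approx_vram_gb(1.0):.1f} GB")
print(f"2B params @ FP16:  ~{approx_vram_gb(2.0):.1f} GB")   # exceeds 4 GB without quantization
print(f"2B params @ 4-bit: ~{approx_vram_gb(2.0, bytes_per_param=0.55):.1f} GB")
```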
Frequently Asked Questions
Which models work best on the RTX 3050?
The RTX 3050 with 4GB VRAM works best with models under 2B parameters. We recommend Llama-2-1B, Phi-2, TinyLlama, or quantized versions of larger models such as Mistral-7B-q4.

Can I install custom packages?
Currently, you cannot install custom packages. The platform comes with pre-installed packages, including Ollama, llama.cpp, and common ML libraries.

Is there a time limit on code execution?
Code execution is limited to 5 minutes per run. If your code exceeds this limit, it will be terminated automatically.

Can I save my code on the platform?
Currently, there is no built-in save functionality. We recommend copying your code to a local file before closing the browser tab. We plan to add code saving in a future update.