Getting Started
Using SupaScale
Welcome to SupaScale! Here's how to get started:
- Navigate to the API page.
- Select your GPU and AI model based on your requirements.
- Sign up to get your API key and start computing instantly.
- Integrate our simple API into your application (see the sketch after this list).
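If you are integrating over HTTP from Python, the request will look roughly like the sketch below. Note that the endpoint URL, header, and payload fields here are placeholders and assumptions, not SupaScale's documented API; substitute the values shown on the API page along with your own key.

```python
import requests

API_KEY = "your-api-key"  # issued after sign-up

# Placeholder endpoint and payload: the real URL, field names, and auth scheme
# are defined on the API page, not here.
response = requests.post(
    "https://api.supascale.example/v1/generate",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "tinyllama", "prompt": "Hello from SupaScale"},
    timeout=60,
)
print(response.json())
```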
Ollama Models
Our platform supports running Ollama models. Here are the recommended models for our RTX 3050 GPU with 4GB VRAM:
| Model | Parameters | VRAM Required | Best For |
|---|---|---|---|
| Llama-2-1B | 1B | ~2GB | Fast text generation with good quality |
| Phi-2 | 2B | ~3GB | High-quality 2B parameter model |
| TinyLlama | 1.1B | ~2GB | Efficient small language model |
| Mistral-7B-instruct-v0.2-q4_0 | 7B (quantized) | ~2GB | Quantized version of Mistral 7B |
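Before you can generate text, the chosen model has to be pulled onto the instance. Assuming the Ollama Python client (listed later under pre-installed packages) is available, a minimal sketch looks like this; the `tinyllama` tag is taken from the table above and may differ in your environment:

```python
import ollama

# Download a model from the Ollama library (tag taken from the table above).
ollama.pull('tinyllama')

# List the models that are now available locally.
for model in ollama.list()['models']:
    print(model)
```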
Code Examples
Basic Ollama Example
```python
import ollama

# Simple text generation with Ollama
response = ollama.generate(
    model='llama2:1b',
    prompt='Write a short poem about artificial intelligence',
)
print(response['response'])
```
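The same call can stream tokens as they are generated by passing `stream=True`, which makes `generate` return an iterator of partial responses:

```python
import ollama

# Stream the response token by token instead of waiting for the full text.
for chunk in ollama.generate(
    model='llama2:1b',
    prompt='Write a short poem about artificial intelligence',
    stream=True,
):
    print(chunk['response'], end='', flush=True)
print()
```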
Using the Phi-2 Model
```python
import ollama

# Using Phi-2 model for inference
response = ollama.generate(
    model='phi2',
    prompt='Explain how neural networks work in 3 sentences',
    options={'temperature': 0.7, 'top_p': 0.9}
)
print("Phi-2 response:")
print(response['response'])
```
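For multi-turn conversations, the Ollama client also exposes a chat-style interface. Here is a minimal sketch using the same model tag and sampling options:

```python
import ollama

# Chat-style request with an explicit message history.
response = ollama.chat(
    model='phi2',
    messages=[
        {'role': 'user', 'content': 'Explain how neural networks work in 3 sentences'},
    ],
    options={'temperature': 0.7, 'top_p': 0.9},
)
print(response['message']['content'])
```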
Using Llama.cpp
```python
import subprocess

def run_llama_inference(prompt, model="models/tinyllama-1.1b.Q4_0.gguf", temp=0.7):
    """Run inference using the llama.cpp CLI with a small quantized model.

    The binary name and model path depend on your installation; adjust as needed.
    """
    cmd = [
        "llama-cli",           # llama.cpp command-line binary ("main" in older builds)
        "-m", model,           # path to a GGUF model file
        "--temp", str(temp),
        "-p", prompt,
        "-n", "100",           # number of tokens to predict
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

# Example usage
prompt = "Explain the concept of machine learning to a child:"
response = run_llama_inference(prompt)
print(response)
```
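If the llama-cpp-python bindings are installed (the platform lists llama.cpp among its pre-installed packages, so this is an assumption worth checking), you can skip the subprocess call and load a GGUF file directly from Python; the model path below is a placeholder:

```python
from llama_cpp import Llama

# Load a small quantized GGUF model (placeholder path; point this at a real file).
llm = Llama(model_path="models/tinyllama-1.1b.Q4_0.gguf", n_ctx=2048)

output = llm(
    "Explain the concept of machine learning to a child:",
    max_tokens=100,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```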
Limitations
RTX 3050 GPU Limitations
The RTX 3050 has 4GB of VRAM, which limits the size of models you can run:
- Models over 2B parameters might require quantization
- Maximum context length may be limited
- Some complex operations may be slower compared to higher-end GPUs
- Memory-intensive tasks may cause out-of-memory errors
For best performance, stick to models under 2B parameters or properly quantized larger models.
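As a rough rule of thumb, the weights alone take about 2 bytes per parameter at FP16 and roughly 0.5-0.6 bytes per parameter at 4-bit quantization, plus overhead for the KV cache and activations. The helper below is a back-of-the-envelope sketch of that arithmetic, not an exact measurement:

```python
def approx_vram_gb(n_params_billion, bytes_per_param=2.0, overhead_gb=0.8):
    """Rough VRAM estimate: weight size plus a fixed allowance for runtime overhead.

    bytes_per_param: ~2.0 for FP16 weights, ~0.5-0.6 for 4-bit quantized weights.
    overhead_gb: rough allowance for the KV cache, activations, and buffers.
    """
    return n_params_billion * bytes_per_param + overhead_gb

print(f"1B params @ FP16:  ~{approx_vram_gb(1.0):.1f} GB")
print(f"2B params @ FP16:  ~{approx_vram_gb(2.0):.1f} GB")   # exceeds 4 GB without quantization
print(f"2B params @ 4-bit: ~{approx_vram_gb(2.0, bytes_per_param=0.55):.1f} GB")
```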
Frequently Asked Questions
Which models work best on the RTX 3050?
The RTX 3050 with 4GB VRAM works best with models under 2B parameters. We recommend Llama-2-1B, Phi-2, TinyLlama, or quantized versions of larger models such as Mistral-7B-q4.

Can I install custom packages?
Currently, you cannot install custom packages. The platform comes with pre-installed packages, including Ollama, llama.cpp, and common ML libraries.

Is there a time limit on code execution?
Code execution is limited to 5 minutes per run. If your code exceeds this limit, it will be terminated automatically.

Can I save my code on the platform?
Currently, there is no built-in save functionality. We recommend copying your code to a local file before closing the browser tab. We plan to add code saving in a future update.