# Quick Start Guide - NexaSci Agent Kit

This guide shows you how to run the complete agent system across three tmux panes.

## Architecture Overview

The system runs in three coordinated tmux panes **on the GPU box**:

1. **Agent Tmux (GPU Box)** - Loads and serves the merged NexaSci model via HTTP API (port 8001)
2. **Tool Server Tmux** - FastAPI server exposing tools (Python sandbox, paper search, etc.) on port 8000
3. **Agent Window Tmux** - Runs the agent controller, which connects to the model server and the tool server

**Important:** All three panes should be on the **same GPU box** (SSH into the box first).

## Setup Instructions

### Prerequisites

- GPU server with CUDA (RTX 5090 with 32GB VRAM recommended)
- Python 3.10+ with dependencies installed
- Three tmux panes/sessions **on the GPU box**

### Step 0: SSH into the GPU Box

```bash
ssh root@38.80.152.77 -p 30785
```

**All commands below should be run on the GPU box after SSH'ing in.**

### Step 1: Merge the Model (One-time setup)

In your **agent tmux pane**:

```bash
cd /root/Agent_kit
source .venv/bin/activate

# Merge the post-trained LoRA adapter with the distilled model
python scripts/merge_model.py \
  --base-model "Allanatrix/Nexa_Sci_distilled_Falcon-10B" \
  --adapter-path models/adapter_model.safetensors \
  --output-dir models/merged \
  --torch-dtype bfloat16
```

This will:

- Download the distilled model (~20GB, first time only)
- Detect the LoRA rank (32) from your adapter
- Merge the adapter into the model
- Save the result to `models/merged/`

**Update the config to use the merged model:**

```bash
# Edit agent/config.yaml and set:
# merged_path: "./models/merged"
```

### Step 2: Start the Model Server (Agent Tmux - GPU Box)

In your **agent tmux pane** (where the GPU is):

```bash
cd /root/Agent_kit
source .venv/bin/activate

# Start the model server (loads the model and serves it via HTTP)
uvicorn agent.model_server:app --host 0.0.0.0 --port 8001
```

You should see:

```
Loading NexaSci model (this may take 30-60 seconds)...
✓ CUDA available: NVIDIA GeForce RTX 5090
✓ Model loaded on GPU: cuda:0
Model server ready! Listening on http://0.0.0.0:8001
```

**Keep this running** - this is where the model lives.

### Step 3: Start the Tool Server

In your **tool server tmux pane**:

```bash
cd /root/Agent_kit
source .venv/bin/activate

# Start the FastAPI tool server
uvicorn tools.server:app --host 0.0.0.0 --port 8000
```

You should see:

```
INFO: Started server process
INFO: Uvicorn running on http://0.0.0.0:8000
```

**Keep this running** - the agent will connect to it.

### Step 4: Enable the Remote Model in Config

The config should already contain:

```yaml
model_server:
  base_url: "http://127.0.0.1:8001"
  enabled: true
```

**Important:** If you are running everything on the same box (which you should be), `127.0.0.1` is correct.

### Step 5: Test the Model Server Connection

In your **agent window tmux pane**:

```bash
cd /root/Agent_kit
source .venv/bin/activate

# Test the connection
python examples/test_model_server.py
```

You should see:

```
✓ Health check passed
✓ Generation successful
✓ Remote client works
✓ All tests passed!
```

### Step 6: Run the Agent (Agent Window Tmux)

In your **agent window tmux pane**:

```bash
cd /root/Agent_kit
source .venv/bin/activate

# Run the interactive demo (connects to the model server)
python examples/demo_agent.py \
  --prompt "Design a Python simulation to model nanoparticle diffusion. Include visualization and cite relevant literature."
```

The agent will:

- Connect to the model server (port 8001) - no local model loading
- Connect to the tool server (port 8000)
- Show all reasoning, tool calls, and results

## Troubleshooting

### Connection Refused

If you get "Connection refused" when testing:

1. **Make sure you're on the GPU box:**

   ```bash
   # SSH into the box first
   ssh root@38.80.152.77 -p 30785
   ```

2. **Check whether the servers are running:**

   ```bash
   # Check the model server
   curl http://127.0.0.1:8001/health

   # Check the tool server
   curl http://127.0.0.1:8000/docs
   ```

3. **Verify the servers are listening:**

   ```bash
   netstat -tlnp | grep -E '8000|8001'
   ```

### GPU Not Available

If `gpu_available: false`:

1. Check CUDA:

   ```bash
   python scripts/check_cuda.py
   nvidia-smi
   ```

2. Reinstall PyTorch with CUDA:

   ```bash
   pip uninstall -y torch torchvision torchaudio
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   ```

### Model Server Not Loading

- Check GPU memory: `nvidia-smi`
- Check the logs in the agent tmux pane for errors
- Verify the model path in `agent/config.yaml`

## Architecture Diagram

```
┌─────────────────┐
│  Agent Window   │
│  (Controller)   │
└────────┬────────┘
         │
         ├──→ Remote Client ──→ Model Server (port 8001) ──→ GPU Box
         │
         └──→ Tool Client ──→ Tool Server (port 8000)
                                  ├── python.run
                                  ├── papers.search
                                  ├── papers.fetch
                                  └── papers.search_corpus
```
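The connectivity checks in the Troubleshooting section (curl against ports 8001 and 8000, then netstat) can be scripted so you can rerun them in one shot. A minimal sketch in Python, using only the standard library - the function names here are illustrative, not part of the kit:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds,
    i.e. something is listening there (same signal as netstat/curl)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_servers(host: str = "127.0.0.1") -> dict:
    """Probe the two ports used by the kit: the model server (8001)
    and the tool server (8000)."""
    return {
        "model_server": port_open(host, 8001),
        "tool_server": port_open(host, 8000),
    }

if __name__ == "__main__":
    for name, up in check_servers().items():
        print(f"{name}: {'listening' if up else 'NOT listening'}")
```

Note that an open port only proves the process is up; for the model server, follow up with `curl http://127.0.0.1:8001/health` to confirm the model actually loaded.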