### 1. Purpose

The **NexaSci Agent Kit** is a self-contained, local-first agent stack built around:

- **NexaSci Assistant** — a 10B post-trained scientific reasoning model
- **SPECTER (or similar)** — a scientific paper embedding model
- **Tool Server** — a FastAPI-based tool-calling backend
- **Sandbox Environment** — controlled Python execution + scientific libraries
- **Simple Web UI** — a local interface for interactive use

The kit is designed to:

- Let technical users run the **full scientific agent locally** (on their own GPU)
- Provide a **reusable template** for future agents (e.g., SWE, bio, materials)
- Integrate **reasoning, retrieval, code, and scientific tools** in one place
- Avoid any requirement for hosted services / managed SaaS

----------

### 2. High-Level Architecture

**Components:**

1. **LLM: NexaSci Assistant**
   - 10B model
   - Post-trained for:
     - tool calling (JSON ToolCall / ToolResult protocol)
     - structured scientific outputs (hypothesis, methodology, limitations, etc.)
     - paper usage + citations
     - self-assessment (“I’m not sure → call tools”)
2. **Embedding Model: SPECTER (or similar)**
   - Scientific document embedding model
   - Used to:
     - embed paper abstracts / sections
     - perform semantic search over a local corpus
     - support similarity queries for the agent
   - Runs on CPU or GPU (optional acceleration)
3. **Tool Server (FastAPI)**
   - Exposes tools to NexaSci:
     - `python.run`: sandboxed Python executor
     - `papers.search`: query external APIs or a local index
     - `papers.fetch`: get metadata/abstracts
     - `papers.search_corpus`: query the SPECTER-based local corpus (optional)
   - Can be extended with:
     - chemistry engines (e.g., RDKit-style workflows)
     - PDE solvers (e.g., FEniCS-like wrappers)
     - quantum simulation stubs
4. **Agent Controller**
   - Orchestrates the agent loop:
     - send the user prompt + history to the LLM
     - parse tool calls
     - call the tool server
     - feed results back
     - stop on a `final` message
   - Stateless, minimal, and reusable across agents
5. **Web UI**
   - Lightweight, local-only UI
   - Provides:
     - input box
     - streaming output
     - optional view of tool traces
   - Built with something simple (e.g. FastAPI + HTML/JS, or Gradio/Streamlit)

----------

### 3. Repository Layout

Proposed repo structure:

```
nexa-sci-agent-kit/
  SPEC.md
  README.md
  docker/
    Dockerfile            # GPU-accelerated base image
    docker-compose.yml    # optional, for combined agent+tools+ui
  agent/
    controller.py         # agent loop (LLM ↔ tools)
    client_llm.py         # NexaSci loading + chat interface (transformers/vLLM)
    tool_client.py        # HTTP client for FastAPI tools
    config.yaml           # model + server config (ports, endpoints, HF repo)
  tools/
    server.py             # FastAPI app exposing tools
    schemas.py            # Pydantic models for ToolCall/ToolResult
    python_sandbox.py     # sandboxing helpers
    paper_sources/
      arxiv_client.py
      pubmed_client.py
      corpus_search.py    # SPECTER-based local search
  webui/
    app.py                # minimal web server (can be Gradio/Streamlit/FastAPI)
    static/               # JS/CSS assets (if needed)
    templates/            # optional HTML templates
  examples/
    run_local_agent.py    # CLI demo (no UI)
    sample_prompts.md     # curated example prompts
  scripts/
    download_models.py    # pull NexaSci + SPECTER weights
    init_corpus.py        # optional: build local paper index
    install.sh            # convenience installer
  requirements.txt
```

This layout is **reusable**: swap `client_llm.py` + tools, and you have a SWE agent kit.

----------

### 4. Models

#### 4.1 NexaSci Assistant (LLM)

- **Weights:** hosted on Hugging Face (e.g.
`darkstar/nexa-sci-10b`)
- **Form:** merged distilled + tool-calling QLoRA
- **Capabilities:**
  - Hypothesis + methodology generation
  - Tool calling (Python, paper search)
  - Structured JSON final reports
  - Uncertainty detection → calls tools when unsure

**Load options:**

- **Transformers** (`AutoModelForCausalLM`) for simplicity
- **vLLM** for GPU-accelerated inference with long contexts / parallel requests

Config in `agent/config.yaml`:

```yaml
model_repo: "darkstar/nexa-sci-10b"
backend: "vllm"        # or "transformers"
max_tokens: 1024
temperature: 0.3
top_p: 0.9
tool_prefix: "~~~toolcall"
tool_suffix: "~~~"
final_prefix: "~~~final"
final_suffix: "~~~"
```

#### 4.2 Embedding Model (SPECTER or similar)

- **Weights:** e.g. a SPECTER HF repo
- **Use:**
  - embed titles/abstracts/sections
  - populate a FAISS (or similar) index
  - support the `papers.search_corpus` tool

Config in `agent/config.yaml`:

```yaml
embedding_model_repo: "allenai/specter2_base"  # example
embedding_device: "cuda"                       # or "cpu"
```

----------

### 5. Tool Server & Sandbox

#### 5.1 FastAPI Tool Server

`tools/server.py`:

- Endpoint examples:
  - `POST /tools/python.run`
    - Input: `{ "code": "...", "timeout_s": 5 }`
    - Output: `{ "stdout": "...", "stderr": "...", "artifacts": [] }`
  - `POST /tools/papers.search`
    - Input: `{ "query": "...", "top_k": 10 }`
    - Output: `[ { "title": "...", "abstract": "...", "doi": "...", "year": 2020 } ]`
  - `POST /tools/papers.fetch`
    - Input: `{ "doi": "10.XXXX/..." }`
    - Output: `{ "title": "...", "abstract": "...", "bibtex": "...", ... }`
  - `POST /tools/papers.search_corpus` (optional, embedding-based)
    - Input: `{ "query": "...", "top_k": 20 }`
    - Output: `[ { "paper_id": "...", "title": "...", "abstract": "...", "score": 0.87 } ]`

#### 5.2 Python Sandbox

`tools/python_sandbox.py` handles:

- Execution in a restricted namespace:
  - `numpy`, `scipy`, `pandas`, `matplotlib` available
  - optional domain libs: `sympy`, `rdkit`, `ase`, simple PDE solvers
- Constraints:
  - time limit (e.g.
5–10 seconds)
  - memory limit (via the `resource` module)
  - no file system access outside a temp dir
  - no network
- Returns:
  - stdout / stderr
  - optional artifact paths (e.g. plots in `/tmp/artifacts`)

This gives the agent a **safe-ish** playground for:

- simple chemistry calcs
- ODE/PDE toy simulations
- statistical summaries
- plotting

(Domain-heavy engines can be added as specialized tools later.)

----------

### 6. Agent Controller

`agent/controller.py` implements the core loop:

1. Initialize messages with:
   - system prompt (scientific assistant, tool protocol)
   - user prompt
2. Call `client_llm.generate(messages)`
3. Parse the output:
   - If it contains a ToolCall block → parse the JSON → dispatch via `tool_client.py`
   - Append a `tool` message with the tool result
4. Repeat until a Final block is produced
5. Return the final JSON + a pretty-rendered version (for the UI)

Design goals:

- Keep the controller **stateless** and **minimal**
- Use a small set of message roles: `system`, `user`, `assistant`, `tool`
- Make it trivial to plug in a different LLM backend

----------

### 7. Web UI

`webui/app.py`:

- Provides a local web interface:
  - text area for the prompt
  - dropdown for “mode” (e.g. “Explain paper”, “Design experiment”, “Run simulation”)
  - button to run the agent
  - area to show:
    - the final answer
    - an optional tool trace (expandable)
- Implementation options:
  - **Gradio**: fastest way to get a web UI
  - **Streamlit**: also easy, nice for scientists
  - or a simple HTML/JS frontend served via FastAPI

This is _local-only_ by default.

----------
### 8. Docker & GPU Acceleration

#### 8.1 Dockerfile

`docker/Dockerfile` (conceptual spec):

- Base image: `nvidia/cuda:12.x-cudnn-runtime-ubuntu20.04`
- Install:
  - Python 3.10+
  - `pip`, `uv`, or `conda` (your call)
  - `torch` + CUDA
  - `transformers`, `vllm` (optional)
  - `fastapi`, `uvicorn`
  - `sentence-transformers` or SPECTER deps
  - `numpy`, `scipy`, `pandas`, `matplotlib`
  - any light scientific deps you want in v1
- Copy the repo
- `pip install -r requirements.txt`
- Default `CMD`:
  - starts either the tool server or the web UI
  - Docker Compose can spin up both

#### 8.2 Usage

Example:

```shell
docker build -t nexa-sci-agent-kit -f docker/Dockerfile .
docker run --gpus all -p 8000:8000 -p 7860:7860 nexa-sci-agent-kit
```

This should bring up:

- the tool server on port 8000
- the web UI on port 7860

----------

### 9. Reusability / Template Design

The kit is meant to be cloned as:

- `nexa-sci-agent-kit` → scientific agent
- `nexa-swe-agent-kit` → SWE/debugging agent
- etc.

To create a new agent kit, you:

- swap `model_repo` in `config.yaml`
- swap or extend tools in `tools/server.py`
- adjust the system prompt in `agent/client_llm.py`
- optionally adjust the UI text

Everything else stays the same.
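Since the config in Section 4.1 pins down the tool-call delimiters (`~~~toolcall` / `~~~` / `~~~final`), the parsing step the controller needs can be sketched in a few lines. The function names (`extract_tool_call`, `is_final`) are illustrative assumptions, not part of this spec:

```python
import json
import re

# Delimiters from agent/config.yaml (tool_prefix / tool_suffix / final_prefix).
TOOL_PREFIX = "~~~toolcall"
TOOL_SUFFIX = "~~~"
FINAL_PREFIX = "~~~final"


def extract_tool_call(text: str):
    """Return the parsed ToolCall dict if `text` contains a
    ~~~toolcall ... ~~~ block, else None."""
    pattern = re.escape(TOOL_PREFIX) + r"(.*?)" + re.escape(TOOL_SUFFIX)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


def is_final(text: str) -> bool:
    """True if the model emitted a ~~~final ... ~~~ block."""
    return FINAL_PREFIX in text
```

Checking for the final block before the tool block avoids ambiguity, since both share the `~~~` suffix.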
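At its core, the `papers.search_corpus` tool from Sections 4.2 and 5.1 is nearest-neighbour search over SPECTER embeddings. Below is a dependency-free sketch of just the scoring step; a real implementation would use FAISS or similar, and the toy vectors and the `search_corpus` name here are assumptions for illustration:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def search_corpus(query_vec, corpus, top_k=2):
    """Rank (paper_id, embedding) pairs by similarity to the query
    embedding, mirroring the papers.search_corpus response shape."""
    scored = [
        {"paper_id": pid, "score": cosine(query_vec, vec)}
        for pid, vec in corpus
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]
```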
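The restricted-namespace execution described in Section 5.2 can be approximated as follows. This sketch covers only the namespace whitelist and stdout/stderr capture; the time limit, memory limit (`resource` module), and filesystem/network isolation are deliberately omitted, and `run_sandboxed` is an assumed name, not the kit's actual API:

```python
import contextlib
import io

# Whitelisted builtins: enough for small numeric snippets, nothing that
# touches the filesystem or imports arbitrary modules.
SAFE_BUILTINS = {
    "print": print, "range": range, "len": len, "sum": sum,
    "min": min, "max": max, "abs": abs, "round": round,
}


def run_sandboxed(code: str) -> dict:
    """Execute `code` in a restricted namespace, capturing stdout/stderr.

    NOTE: illustrative only, not a real security boundary -- the kit
    would add time/memory limits and process-level isolation on top.
    """
    out, err = io.StringIO(), io.StringIO()
    namespace = {"__builtins__": SAFE_BUILTINS}
    try:
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            exec(code, namespace)
    except Exception as exc:  # surface errors the way a ToolResult would
        err.write(f"{type(exc).__name__}: {exc}")
    return {"stdout": out.getvalue(), "stderr": err.getvalue()}
```

Because `__builtins__` is replaced with the whitelist, a snippet that calls `open` fails with a `NameError` rather than touching the filesystem.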
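Finally, the five-step controller loop from Section 6 can be sketched end to end. Here `generate` and `dispatch` are injected stand-ins for `client_llm.generate` and the `tool_client` HTTP calls; their exact signatures are assumptions, and injecting them is one way to meet the "trivial to plug in a different LLM backend" goal:

```python
import json


def run_agent(user_prompt, generate, dispatch, max_steps=8):
    """Minimal agent loop: LLM -> (optional tool calls) -> final answer.

    `generate(messages)` returns raw model text; `dispatch(tool_call)`
    executes one ToolCall dict and returns a ToolResult dict.
    """
    messages = [
        {"role": "system", "content": "You are a scientific assistant."},
        {"role": "user", "content": user_prompt},
    ]
    for _ in range(max_steps):
        text = generate(messages)
        messages.append({"role": "assistant", "content": text})
        if text.startswith("~~~final"):
            # Strip the delimiters and parse the structured final report.
            body = text.removeprefix("~~~final").removesuffix("~~~")
            return json.loads(body)
        if text.startswith("~~~toolcall"):
            body = text.removeprefix("~~~toolcall").removesuffix("~~~")
            result = dispatch(json.loads(body))
            messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not produce a final answer")
```

Keeping all state in the `messages` list (rather than in the controller object) is what makes the loop stateless and reusable across agent kits.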