### 1. Purpose

The **NexaSci Agent Kit** is a self-contained, local-first agent stack built around:

- **NexaSci Assistant** — a 10B post-trained scientific reasoning model
- **SPECTER (or similar)** — a scientific paper embedding model
- **Tool Server** — a FastAPI-based tool-calling backend
- **Sandbox Environment** — controlled Python execution + scientific libraries
- **Simple Web UI** — a local interface for interactive use

The kit is designed to:

- Let technical users run the **full scientific agent locally** (on their own GPU)
- Provide a **reusable template** for future agents (e.g., SWE, bio, materials)
- Integrate **reasoning, retrieval, code, and scientific tools** in one place
- Avoid any requirement for hosted services / managed SaaS

----------

### 2. High-Level Architecture

**Components:**

1. **LLM: NexaSci Assistant**
   - 10B model
   - Post-trained for:
     - tool calling (JSON ToolCall / ToolResult protocol)
     - structured scientific outputs (hypothesis, methodology, limitations, etc.)
     - paper usage + citations
     - self-assessment (“I’m not sure → call tools”)
2. **Embedding Model: SPECTER (or similar)**
   - Scientific document embedding model
   - Used to:
     - embed paper abstracts / sections
     - perform semantic search over a local corpus
     - support similarity queries for the agent
   - Runs on CPU or GPU (optional acceleration)
3. **Tool Server (FastAPI)**
   - Exposes tools to NexaSci:
     - `python.run`: sandboxed Python executor
     - `papers.search`: query external APIs or a local index
     - `papers.fetch`: get metadata/abstracts
     - `papers.search_corpus`: query the SPECTER-based local corpus (optional)
   - Can be extended with:
     - chemistry engines (e.g., RDKit-style workflows)
     - PDE solvers (e.g., FEniCS-like wrappers)
     - quantum simulation stubs
4. **Agent Controller**
   - Orchestrates the agent loop:
     - send the user prompt + history to the LLM
     - parse tool calls
     - call the tool server
     - feed results back
     - stop on a `final` message
   - Stateless, minimal, and reusable across agents
5. **Web UI**
   - Lightweight, local-only UI
   - Provides:
     - input box
     - streaming output
     - optional view of tool traces
   - Built with something simple (e.g. FastAPI + HTML/JS, or Gradio/Streamlit)

----------

### 3. Repository Layout

Proposed repo structure:

```
nexa-sci-agent-kit/
  SPEC.md
  README.md
  docker/
    Dockerfile            # GPU-accelerated base image
    docker-compose.yml    # optional, for combined agent+tools+ui
  agent/
    controller.py         # agent loop (LLM ↔ tools)
    client_llm.py         # NexaSci loading + chat interface (transformers/vLLM)
    tool_client.py        # HTTP client for FastAPI tools
    config.yaml           # model + server config (ports, endpoints, HF repo)
  tools/
    server.py             # FastAPI app exposing tools
    schemas.py            # Pydantic models for ToolCall/ToolResult
    python_sandbox.py     # sandboxing helpers
    paper_sources/
      arxiv_client.py
      pubmed_client.py
      corpus_search.py    # SPECTER-based local search
  webui/
    app.py                # minimal web server (can be Gradio/Streamlit/FastAPI)
    static/               # JS/CSS assets (if needed)
    templates/            # optional HTML templates
  examples/
    run_local_agent.py    # CLI demo (no UI)
    sample_prompts.md     # curated example prompts
  scripts/
    download_models.py    # pull NexaSci + SPECTER weights
    init_corpus.py        # optional: build local paper index
    install.sh            # convenience installer
  requirements.txt
```

This layout is **reusable**: swap `client_llm.py` + tools, and you have a SWE agent kit.

----------

### 4. Models

#### 4.1 NexaSci Assistant (LLM)

- **Weights:** hosted on Hugging Face (e.g.
`darkstar/nexa-sci-10b`)
- **Form:** merged distilled + tool-calling QLoRA
- **Capabilities:**
  - Hypothesis + methodology generation
  - Tool calling (Python, paper search)
  - Structured JSON final reports
  - Uncertainty detection → calls tools when unsure

**Load options:**

- **Transformers** (`AutoModelForCausalLM`) for simplicity
- **vLLM** for GPU-accelerated inference with long contexts / parallel requests

Config in `agent/config.yaml`:

```yaml
model_repo: "darkstar/nexa-sci-10b"
backend: "vllm"        # or "transformers"
max_tokens: 1024
temperature: 0.3
top_p: 0.9
tool_prefix: "~~~toolcall"
tool_suffix: "~~~"
final_prefix: "~~~final"
final_suffix: "~~~"
```

#### 4.2 Embedding Model (SPECTER or similar)

- **Weights:** e.g. a SPECTER HF repo
- **Use:**
  - embed titles/abstracts/sections
  - populate a FAISS (or similar) index
  - support the `papers.search_corpus` tool

Config in `agent/config.yaml`:

```yaml
embedding_model_repo: "allenai/specter2_base"  # example
embedding_device: "cuda"                       # or "cpu"
```

----------

### 5. Tool Server & Sandbox

#### 5.1 FastAPI Tool Server

`tools/server.py`:

- Endpoint examples:
  - `POST /tools/python.run`
    - Input: `{ "code": "...", "timeout_s": 5 }`
    - Output: `{ "stdout": "...", "stderr": "...", "artifacts": [] }`
  - `POST /tools/papers.search`
    - Input: `{ "query": "...", "top_k": 10 }`
    - Output: `[ { "title": "...", "abstract": "...", "doi": "...", "year": 2020 } ]`
  - `POST /tools/papers.fetch`
    - Input: `{ "doi": "10.XXXX/..." }`
    - Output: `{ "title": "...", "abstract": "...", "bibtex": "...", ... }`
  - `POST /tools/papers.search_corpus` (optional, embedding-based)
    - Input: `{ "query": "...", "top_k": 20 }`
    - Output: `[ { "paper_id": "...", "title": "...", "abstract": "...", "score": 0.87 } ]`

#### 5.2 Python Sandbox

`tools/python_sandbox.py` handles:

- Execution in a restricted namespace:
  - `numpy`, `scipy`, `pandas`, `matplotlib` available
  - optional domain libs: `sympy`, `rdkit`, `ase`, simple PDE solvers
- Constraints:
  - time limit (e.g.
5–10 seconds)
  - memory limit (via the `resource` module)
  - no file system access outside a temp dir
  - no network
- Returns:
  - stdout / stderr
  - optional artifact paths (e.g. plots in `/tmp/artifacts`)

This gives the agent a **safe-ish** playground for:

- simple chemistry calcs
- ODE/PDE toy simulations
- statistical summaries
- plotting

(Domain-heavy engines can be added as specialized tools later.)

----------

### 6. Agent Controller

`agent/controller.py` implements the core loop:

1. Initialize messages with:
   - system prompt (scientific assistant, tool protocol)
   - user prompt
2. Call `client_llm.generate(messages)`
3. Parse the output:
   - If it contains a ToolCall block → parse the JSON → dispatch via `tool_client.py`
   - Append a `tool` message with the tool result
4. Repeat until a Final block is produced
5. Return the final JSON + a pretty-rendered version (for the UI)

Design goals:

- Keep the controller **stateless** and **minimal**
- Use a small set of message roles: `system`, `user`, `assistant`, `tool`
- Make it trivial to plug in a different LLM backend

----------

### 7. Web UI

`webui/app.py`:

- Provides a local web interface:
  - text area for the prompt
  - dropdown for “mode” (e.g. “Explain paper”, “Design experiment”, “Run simulation”)
  - button to run the agent
  - area to show:
    - the final answer
    - an optional tool trace (expandable)
- Implementation options:
  - **Gradio**: fastest way to get a web UI
  - **Streamlit**: also easy, nice for scientists
  - or a simple HTML/JS frontend served via FastAPI

This is _local-only_ by default.

----------
### 8. Docker & GPU Acceleration

#### 8.1 Dockerfile

`docker/Dockerfile` (conceptual spec):

- Base image: `nvidia/cuda:12.x-cudnn-runtime-ubuntu20.04`
- Install:
  - Python 3.10+
  - `pip`, `uv`, or `conda` (your call)
  - `torch` + CUDA
  - `transformers`, `vllm` (optional)
  - `fastapi`, `uvicorn`
  - `sentence-transformers` or SPECTER deps
  - `numpy`, `scipy`, `pandas`, `matplotlib`
  - any light scientific deps you want in v1
- Copy the repo
- `pip install -r requirements.txt`
- Default `CMD`:
  - starts either the tool server or the web UI
  - Docker Compose can spin up both

#### 8.2 Usage

Example:

```shell
docker build -t nexa-sci-agent-kit -f docker/Dockerfile .
docker run --gpus all -p 8000:8000 -p 7860:7860 nexa-sci-agent-kit
```

This should bring up:

- the tool server on port 8000
- the web UI on port 7860

----------

### 9. Reusability / Template Design

The kit is meant to be cloned as:

- `nexa-sci-agent-kit` → scientific agent
- `nexa-swe-agent-kit` → SWE/debugging agent
- etc.

To create a new agent kit, you:

- swap `model_repo` in `config.yaml`
- swap or extend tools in `tools/server.py`
- adjust the system prompt in `agent/client_llm.py`
- optionally adjust the UI text

Everything else stays the same.
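Since the config in Section 4.1 pins down the tool-call delimiters (`~~~toolcall` / `~~~` / `~~~final`), the parsing step the controller needs can be sketched in a few lines. The function names (`extract_tool_call`, `is_final`) are illustrative assumptions, not part of this spec:

```python
import json
import re

# Delimiters from agent/config.yaml (tool_prefix / tool_suffix / final_prefix).
TOOL_PREFIX = "~~~toolcall"
TOOL_SUFFIX = "~~~"
FINAL_PREFIX = "~~~final"


def extract_tool_call(text: str):
    """Return the parsed ToolCall dict if `text` contains a
    ~~~toolcall ... ~~~ block, else None."""
    pattern = re.escape(TOOL_PREFIX) + r"(.*?)" + re.escape(TOOL_SUFFIX)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


def is_final(text: str) -> bool:
    """True if the model emitted a ~~~final ... ~~~ block."""
    return FINAL_PREFIX in text
```

Checking for the final block before the tool block avoids ambiguity, since both share the `~~~` suffix.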
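At its core, the `papers.search_corpus` tool from Sections 4.2 and 5.1 is nearest-neighbour search over SPECTER embeddings. Below is a dependency-free sketch of just the scoring step; a real implementation would use FAISS or similar, and the toy vectors and the `search_corpus` name here are assumptions for illustration:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def search_corpus(query_vec, corpus, top_k=2):
    """Rank (paper_id, embedding) pairs by similarity to the query
    embedding, mirroring the papers.search_corpus response shape."""
    scored = [
        {"paper_id": pid, "score": cosine(query_vec, vec)}
        for pid, vec in corpus
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]
```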
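The restricted-namespace execution described in Section 5.2 can be approximated as follows. This sketch covers only the namespace whitelist and stdout/stderr capture; the time limit, memory limit (`resource` module), and filesystem/network isolation are deliberately omitted, and `run_sandboxed` is an assumed name, not the kit's actual API:

```python
import contextlib
import io

# Whitelisted builtins: enough for small numeric snippets, nothing that
# touches the filesystem or imports arbitrary modules.
SAFE_BUILTINS = {
    "print": print, "range": range, "len": len, "sum": sum,
    "min": min, "max": max, "abs": abs, "round": round,
}


def run_sandboxed(code: str) -> dict:
    """Execute `code` in a restricted namespace, capturing stdout/stderr.

    NOTE: illustrative only, not a real security boundary -- the kit
    would add time/memory limits and process-level isolation on top.
    """
    out, err = io.StringIO(), io.StringIO()
    namespace = {"__builtins__": SAFE_BUILTINS}
    try:
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            exec(code, namespace)
    except Exception as exc:  # surface errors the way a ToolResult would
        err.write(f"{type(exc).__name__}: {exc}")
    return {"stdout": out.getvalue(), "stderr": err.getvalue()}
```

Because `__builtins__` is replaced with the whitelist, a snippet that calls `open` fails with a `NameError` rather than touching the filesystem.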
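Finally, the five-step controller loop from Section 6 can be sketched end to end. Here `generate` and `dispatch` are injected stand-ins for `client_llm.generate` and the `tool_client` HTTP calls; their exact signatures are assumptions, and injecting them is one way to meet the "trivial to plug in a different LLM backend" goal:

```python
import json


def run_agent(user_prompt, generate, dispatch, max_steps=8):
    """Minimal agent loop: LLM -> (optional tool calls) -> final answer.

    `generate(messages)` returns raw model text; `dispatch(tool_call)`
    executes one ToolCall dict and returns a ToolResult dict.
    """
    messages = [
        {"role": "system", "content": "You are a scientific assistant."},
        {"role": "user", "content": user_prompt},
    ]
    for _ in range(max_steps):
        text = generate(messages)
        messages.append({"role": "assistant", "content": text})
        if text.startswith("~~~final"):
            # Strip the delimiters and parse the structured final report.
            body = text.removeprefix("~~~final").removesuffix("~~~")
            return json.loads(body)
        if text.startswith("~~~toolcall"):
            body = text.removeprefix("~~~toolcall").removesuffix("~~~")
            result = dispatch(json.loads(body))
            messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not produce a final answer")
```

Keeping all state in the `messages` list (rather than in the controller object) is what makes the loop stateless and reusable across agent kits.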