Policy Caliper
AI-powered risk analysis for Terms of Service and Privacy Policies. Upload a policy (or two versions) and get a structured risk report with citations, PDF export, and MCP-friendly tooling.
Hackathon Submission
- Tracks: building-mcp-track-enterprise & mcp-in-action-track-enterprise
- Hackathon: MCP 1st Birthday Hackathon 2025
- Organization: MCP-1st-Birthday
- Author: https://huggingface.co/Warcos
- Pitch: a fast, MCP-native risk radar for policies, with an agentic workflow and PDF export.
- Repo: https://huggingface.co/Warcos/PolicyCaliper
Live Demo
Try it: https://huggingface.co/spaces/MCP-1st-Birthday/Policy_Caliper
Demo Video
Demo video: https://drive.google.com/file/d/159M7GqItbvtp4UmsgypwqgNcxs2e_wd5/view
Social Post
Official post: https://www.linkedin.com/posts/marcosgarest_mcps1stbirthdayhackathon-activity-7400896699755470848-uE23
What is Policy Caliper?
Policy Caliper is a Gradio app + MCP server that reads Terms/Privacy Policies and produces a risk report with topic-level severities, rationales, and citations. Unlike a generic “chat with your PDF” approach, the LLM is constrained to the structured MCP report (topics, risks, citations) instead of the raw policy text, making answers easier to audit and less prone to hallucinations. It supports:
- Single mode: radiography of one policy.
- Compare mode: diff between two versions (A/B), e.g. an older ToS vs the updated one after a "we've updated our terms" email, with topic changes and risk deltas.
Built to showcase an end-to-end MCP workflow: typed tools, LlamaIndex pipeline, and a polished UI with streaming progress and PDF export.
Key Features
- Dual modes: single-policy risk scan or A/B compare with semantic diff.
- Typed MCP tools: parse, profile, diff, assess risk, and generate report via FastMCP (`policy-drift-mcp`).
- LlamaIndex pipeline: LlamaParse/`SimpleWebPageReader` for PDFs/URLs, topic-level `VectorStoreIndex`, and structured citations that act as the single knowledge layer between raw policies and the MCP tools.
- Streaming UX: live logs, per-step progress with spinner, and elapsed timer.
- Filter & export: filter risks by severity/topic, export the HTML report to PDF.
- Before/after clarity: compare mode highlights A/B snippets, severity deltas, and LLM rationales so changes read as “before vs now” (old policy vs updated ToS).
- Safety rails: stricter prompt rubric (no default “medium”), JSON-only parsing with fallbacks to avoid noisy errors.
- Assistant chat: grounded Q&A over the latest single/compare report (findings, topics, sections, diff); answers come via Gemini with a fallback, and no tools are re-run.
How it works
- User uploads PDF(s) or enters URLs in the Gradio UI and clicks Analyze.
- Orchestrator validates inputs and decides single vs compare.
- MCP tools run the pipeline:
  - `parse_policy` → parse PDF/URL into documents.
  - `profile_policy` → LlamaIndex index + topic summaries and citations.
  - `diff_policies` (compare mode) → topic-level change types.
  - `assess_risk` → batch LLM scoring (0–4) mapped to low/medium/high with a rubric (see the sketch after this list).
  - `generate_report` → HTML/JSON report and optional PDF export.
- UI streams logs/progress, then shows the full report; export to PDF on demand.
- Assistant tab (optional): uses the latest `report_json` (findings, topics/citations, sections A/B, diff) for grounded answers; it does not re-run tools.
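The scoring step above maps 0–4 rubric scores to severity labels. A minimal sketch of that mapping, where the band boundaries and function name are assumptions for illustration, not the project's actual code:

```python
# Hypothetical 0-4 rubric -> severity mapping for assess_risk.
# Band boundaries are illustrative assumptions, not the real rubric.
SEVERITY_BANDS = {
    0: "low",     # boilerplate or clearly benign clause
    1: "low",     # minor concern, standard industry practice
    2: "medium",  # noteworthy clause worth a second look
    3: "high",    # material risk (e.g. broad data sharing)
    4: "high",    # severe risk (e.g. unilateral changes to terms)
}

def normalize_severity(score: int) -> str:
    """Clamp an LLM-provided rubric score into a severity label."""
    return SEVERITY_BANDS[max(0, min(4, int(score)))]
```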
MCP Details
Server: `policy-drift-mcp` (FastMCP).
- `parse_policy(input_source: str)`
  Detects URL vs file path and parses via LlamaParse or `SimpleWebPageReader`. If no `run_id` is provided, one is generated. By default it caches the full document server-side and returns only metadata + `run_id` (set `lightweight=False` to return the documents).
- `profile_policy(document: Dict, lightweight=True)`
  Builds a LlamaIndex `VectorStoreIndex`, queries per topic, and returns a `PolicyProfile`. If a `run_id` is present (as an argument or in the document), it fetches cached docs from `parse_policy`. With `lightweight=True` (the default for MCP), it trims the payload (one citation per topic, trimmed summaries, sections without body). Set `lightweight=False` if you need the full profile (used internally by Gradio).
- `diff_policies(profile_a, profile_b)`
  Topic-level change classification (added/removed/modified/unchanged) with an executive summary.
- `assess_risk(profile_or_diff)`
  Batch prompt (0–4 rubric) → normalized severities and `RiskSummary`/`DetailedRiskSummary`.
- `generate_report(profile_or_diff_with_risk, risk_summary, profile_a?, profile_b?)`
  Returns `report_html`, `report_json`, and a PDF path via WeasyPrint.
Recommended MCP flow (to avoid large payloads): call `parse_policy` first and reuse the returned `run_id` in `profile_policy` / `diff_policies` / `assess_risk` / `generate_report`. Keep `lightweight=True` (the default) unless you explicitly need the full document inline. A client-side sketch of this flow follows.
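A minimal sketch of that flow, assuming the FastMCP Python client; the result field names (`run_id`, `profile`, `summary`, `report_html`) and argument shapes are simplified assumptions for illustration:

```python
# Sketch of the run_id-based MCP flow; assumes the FastMCP Python client.
# NOTE: the result field names below are illustrative assumptions.
import asyncio
from fastmcp import Client

async def analyze(url: str) -> None:
    async with Client("http://127.0.0.1:9100/mcp/") as client:
        # 1) Parse once; the server caches the document and returns a run_id.
        parsed = await client.call_tool("parse_policy", {"input_source": url})
        run_id = parsed.data["run_id"]

        # 2) Reuse the run_id so later tools read the server-side cache
        #    instead of shipping the full document back and forth.
        profile = await client.call_tool("profile_policy", {"document": {"run_id": run_id}})
        risk = await client.call_tool("assess_risk", {"profile_or_diff": profile.data})
        report = await client.call_tool(
            "generate_report",
            {
                "profile_or_diff_with_risk": risk.data["profile"],
                "risk_summary": risk.data["summary"],
            },
        )
        print(report.data["report_html"][:200])

asyncio.run(analyze("https://example.com/terms"))
```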
Assistant mode (MCP in Action)
- The orchestrator decides single vs compare and runs the full MCP tool chain once (parse → profile → diff → assess → generate) to build a structured `report_json`.
- The Assistant then answers questions by interpreting the query, selecting relevant findings/topics/diffs from that MCP output, and responding with severities and citations (see the sketch after this list).
- By design, it reuses the latest MCP analysis instead of re-parsing the PDF on every turn: the LLM only sees the structured MCP outputs (findings, topics, diffs, citations), not the full policy text, which keeps answers grounded and auditable.
- If no report exists yet (in the Gradio UI), run the workflow (single/compare) first. In this demo the Assistant does not auto-trigger tools: users run the workflow, then the agent focuses on grounded Q&A, which keeps latency and costs predictable.
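A rough sketch of that grounding step; the `report_json` keys used here (`findings`, `severity`, `topic`, `rationale`, `citation`) are assumptions about the report schema, chosen for illustration:

```python
# Illustrative grounded Q&A over the latest report_json.
# Field names are assumed, not the project's actual schema.
from typing import Any

def build_grounded_prompt(report_json: dict[str, Any], question: str) -> str:
    # Keep only findings that look relevant to the question, so the
    # LLM sees structured findings instead of the raw policy text.
    terms = question.lower().split()
    findings = report_json.get("findings", [])
    relevant = [
        f for f in findings
        if any(t in (f.get("topic", "") + " " + f.get("rationale", "")).lower() for t in terms)
    ] or findings

    context = "\n".join(
        f"- [{f['severity']}] {f['topic']}: {f['rationale']} (cite: {f['citation']})"
        for f in relevant[:20]  # cap the context to stay small and auditable
    )
    return (
        "Answer strictly from the findings below and cite section IDs.\n"
        f"Findings:\n{context}\n\nQuestion: {question}"
    )
```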
How LlamaIndex is used
- Parsers: `parse_policy` tries LlamaParse for PDFs (falling back to `SimpleDirectoryReader`) and `SimpleWebPageReader` for URLs; results stay in memory per `run_id`.
- Chunking + index: `profile_policy` uses `SentenceSplitter` to build nodes and a `VectorStoreIndex` with Gemini embeddings (`gemini-embedding-001`, 1536 dims) for retrieval.
- Per-topic querying: for each taxonomy topic, a LlamaIndex query engine (Gemini 2.5 Flash Lite) retrieves relevant nodes; responses feed topic summaries and citation anchors.
- Citations: Source nodes from LlamaIndex supply section IDs and previews, reused in reports and diffs to show “Before/After” snippets.
- In-memory caching: `run_context` caches documents, sections, and indices keyed by `run_id`, so repeated calls in the same session avoid re-parsing/re-indexing.
Taken together, this makes LlamaIndex the core of the analysis pipeline: MCP tools never hit the raw PDF directly; they always operate on the structured view that LlamaIndex builds and maintains. A minimal sketch of the indexing and querying step follows.
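A sketch of the chunking, indexing, and per-topic querying described above, assuming the `llama-index` Gemini integration packages; the sample taxonomy and helper are illustrative, and only the model names come from the Configuration section:

```python
# Sketch of the profile step: chunk, embed, index, then query per topic.
# Class/package names assume llama-index's Gemini integrations.
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.llms.gemini import Gemini

TOPICS = ["data sharing", "children's data", "AI training on user data"]  # sample taxonomy

def profile(policy_text: str) -> dict[str, str]:
    nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(
        [Document(text=policy_text)]
    )
    index = VectorStoreIndex(
        nodes,
        embed_model=GeminiEmbedding(model_name="models/gemini-embedding-001"),
    )
    engine = index.as_query_engine(llm=Gemini(model="models/gemini-2.5-flash-lite"))
    # One retrieval query per taxonomy topic; source nodes carry the
    # section IDs and previews reused as citation anchors.
    return {t: str(engine.query(f"Summarize clauses about {t} and their risks.")) for t in TOPICS}
```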
MCP client snippet (Claude Code)
```bash
claude mcp add --transport http policy-drift-mcp http://127.0.0.1:9100/mcp/
```
Quick Start
1) Use online
- Open the Space: https://huggingface.co/spaces/MCP-1st-Birthday/Policy_Caliper
- Upload a PDF or paste a policy URL.
- (Optional) Add Policy B for comparison.
- Click Analyze Policy and watch the live progress.
- Filter by severity/topic or export to PDF.
Notes for the Hugging Face demo:
- LlamaParse: if free credits run out, the app falls back to `SimpleDirectoryReader` and PDF parsing quality may be slightly lower (see the fallback sketch after these notes).
- Gemini 2.5 Flash Lite: the free quota is ~1000 requests/day; if it is exhausted, please try again the next day.
- Gemini embeddings: embeddings use their own Gemini free quota (~1000 requests/day); if it is exhausted, please try again the next day.
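A sketch of that parser fallback, assuming the `llama-parse` and `llama-index` packages; the actual error handling in `parse_policy` may differ:

```python
# Illustrative LlamaParse -> SimpleDirectoryReader fallback for PDFs.
# The broad except is a simplification of whatever parse_policy does.
from llama_index.core import SimpleDirectoryReader

def parse_pdf(path: str):
    try:
        from llama_parse import LlamaParse
        # LlamaParse yields higher-quality structure while credits last.
        return LlamaParse(result_type="markdown").load_data(path)
    except Exception:
        # Out of credits, missing key, or parse failure: degrade gracefully.
        return SimpleDirectoryReader(input_files=[path]).load_data()
```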
2) Run locally
```bash
git clone https://huggingface.co/Warcos/PolicyCaliper.git
cd PolicyCaliper
pip install -r requirements.txt
cp .env.example .env  # create the file and fill in your keys
python app.py
```
Environment vars (see Configuration below) must be set before running.
Architecture
```
User (browser)
-> Gradio UI (ui/gradio_app.py)
-> Orchestrator (agents/orchestrator.py)
-> MCP client
-> FastMCP server (policy-drift-mcp)
-> Tools: parse_policy, profile_policy, diff_policies, assess_risk, generate_report
-> LlamaIndex parsers + vector index + topic queries
-> Report HTML/JSON + PDF export (WeasyPrint)
```
For a deeper architecture/design overview (in Spanish), see the guides in `Extra/Guias/`; they are useful as extra context if you want an LLM to answer repo questions without browsing the code.
Example interactions
Workflow tab
Single
Upload a PDF or paste a policy URL and click Analyze to generate a structured risk report by topic and severity.
Compare
Upload the previous ToS as Policy A and the updated ToS as Policy B, then click Analyze to see topic-level changes and risk deltas between versions. For example, you can try the public General Terms and Conditions from Serrala to see how the tool surfaces changes over time.
Assistant tab (after running a workflow)
Once the workflow has produced a report, the Assistant can answer grounded questions on top of that analysis. For example:
- “Summarize the high severity risks in this policy and explain them in plain language.”
- “Explain why children’s data is rated as high risk.”
- “Between the old and the new policy, what changed for AI training on user data?”
How this project meets the judging criteria
- Design and UX
  Gradio workflow with explicit single and compare modes, streaming progress per MCP tool, filter chips for severity and topic, and on-demand PDF export of the report.
- MCP implementation and functionality
  Typed FastMCP server (`policy-drift-mcp`) with a full toolchain (parse, profile, diff, assess, report), `run_id`-based caching, and lightweight payloads designed to work both in the Space and in external MCP clients (Claude Code).
- Agentic behavior
  An orchestrator decides between single and compare, runs the MCP pipeline once to build a structured `report_json`, and the Assistant then performs grounded Q&A over that report instead of hitting the raw PDF on every turn.
- Real-world use case
  Focused on legal and privacy teams that need to quickly surface risks and policy changes (e.g. when a service updates its ToS and you want to compare the previous version against the new one) without reading the full documents, while keeping citations and diffs visible for manual review.
Tech Stack
- Gradio: UI/UX with streaming progress, filters, PDF export.
- MCP (FastMCP): `policy-drift-mcp` server with typed tools for parsing, profiling, diff, risk, and reporting.
- LlamaIndex: parsers (PDF/URL), vector index, topic queries, citations.
- Google Gemini 2.5 Flash Lite: LLM via LiteLLM for risk scoring and grounded Assistant answers.
- Gemini embeddings (`gemini-embedding-001`, 1536 dims): vector store for topic retrieval.
- WeasyPrint: HTML → PDF export.
Configuration
Environment variables (`.env`):
- `GEMINI_API_KEY`: for the LLM (`gemini/gemini-2.5-flash-lite`) and embeddings (`gemini-embedding-001`).
- Optional: `STRICT_LOCAL_MODE=1` to force local parsing and block remote URLs.
Settings (`settings.yaml`):
- `llm.model`: `gemini/gemini-2.5-flash-lite`
- `rate_limit.llm.requests_per_minute`: e.g., 15
- `embeddings.model`: `gemini-embedding-001`
- `embeddings.output_dimensionality`: 1536
- `embeddings.document_task_type`: `RETRIEVAL_DOCUMENT`
- `embeddings.query_task_type`: `RETRIEVAL_QUERY`
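For illustration, a small sketch of reading these keys in Python; this loader is an assumption, not the project's actual config code:

```python
# Hypothetical settings.yaml loader; key paths mirror the list above.
import yaml

with open("settings.yaml") as fh:
    settings = yaml.safe_load(fh)

llm_model = settings["llm"]["model"]                         # gemini/gemini-2.5-flash-lite
rpm = settings["rate_limit"]["llm"]["requests_per_minute"]   # e.g. 15
embed_model = settings["embeddings"]["model"]                # gemini-embedding-001
embed_dim = settings["embeddings"]["output_dimensionality"]  # 1536
```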
Project Structure (high level)
- `app.py`: Gradio entrypoint.
- `ui/`: Gradio UI and styling.
- `agents/`: orchestrator and tool runners.
- `mcp_server/`: FastMCP server + tool definitions.
- `domain/`: taxonomy, rules engine, report templates.
- `llama_index_integration/`: models and parsers.
- `persistence/`: in-memory session cache.
- `Extra/Guias/`: internal guides (Spanish).
Notes
- The code defaults to Google Gemini 2.5 Flash Lite (via LiteLLM) for both risk scoring and grounded Assistant answers. You can switch to any provider/model that supports tool-calling interfaces and JSON responses; update `settings.yaml` and env vars accordingly.
Team (Solo)
- Marcos Garcia (https://huggingface.co/Warcos)
License
Apache 2.0