aamanlamba committed on
Commit
60ac2eb
·
1 Parent(s): 5bb0a78

first version - lineage extractor

.env.example ADDED
@@ -0,0 +1,17 @@
+ # Example environment variables for Lineage Graph Extractor
+ # Copy this file to .env and fill in your actual values
+
+ # Anthropic API Key (for the Claude AI agent)
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
+
+ # Google Cloud (for BigQuery integration)
+ GOOGLE_CLOUD_PROJECT=your-gcp-project-id
+ GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account-key.json
+
+ # Optional: Tavily API key (for web search; see LOCAL_SETUP.md)
+ # TAVILY_API_KEY=your_tavily_api_key
+
+ # Optional: Custom API endpoints
+ # METADATA_API_URL=https://your-metadata-api.com
+
+ # Optional: MCP Server Configuration
+ # MCP_SERVER_URL=https://your-mcp-server.com
+ # MCP_API_KEY=your_mcp_api_key
+
.gitignore ADDED
@@ -0,0 +1,49 @@
+ # Environment variables
+ .env
+ .venv
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ env/
+ ENV/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Credentials (keep agent config JSON such as tools.json tracked)
+ *.json
+ !**/tools.json
+ service-account-*.json
+ credentials.json
+
+ # Logs
+ *.log
+
.python-version ADDED
@@ -0,0 +1 @@
+ 3.12
DEPLOYMENT.md ADDED
@@ -0,0 +1,226 @@
+ # Quick Deployment Guide
+
+ Follow these steps to deploy the Lineage Graph Extractor to Hugging Face Spaces.
+
+ ## Quick Start (5 minutes)
+
+ ### 1. Create Space
+ ```bash
+ # Go to: https://huggingface.co/new-space
+ # Choose: Gradio SDK
+ # Hardware: CPU Basic (free)
+ ```
+
+ ### 2. Upload Files
+ Upload these files from `/hf_space/` to your Space:
+ - ✅ `app.py`
+ - ✅ `requirements.txt`
+ - ✅ `README.md`
+ - ⚠️ `.env.example` (optional reference)
+ - ⚠️ `SETUP_GUIDE.md` (optional)
+
+ ### 3. Add Secrets
+ In Space Settings → Repository secrets, add:
+ - `ANTHROPIC_API_KEY` - Your Claude API key (**required**)
+ - `GOOGLE_CLOUD_PROJECT` - For BigQuery (optional)
+
+ ### 4. Wait for Build
+ - The Space builds automatically (2-3 minutes)
+ - Check the "Logs" tab for any errors
+ - Once the build finishes, the app is live!
+
+ ## Detailed Step-by-Step
+
+ ### Method 1: Web Interface (Easiest)
+
+ 1. **Create Space**
+    - Go to https://huggingface.co/spaces
+    - Click "Create new Space"
+    - Name: `lineage-graph-extractor`
+    - SDK: Gradio
+    - Click "Create Space"
+
+ 2. **Upload Files**
+    - Click "Files and versions"
+    - Click "Add file" → "Upload files"
+    - Select all files from `/hf_space/`
+    - Click "Commit changes"
+
+ 3. **Configure Secrets**
+    - Click "Settings"
+    - Scroll to "Repository secrets"
+    - Add `ANTHROPIC_API_KEY` with your API key
+    - Save
+
+ 4. **Verify Deployment**
+    - Go to the "App" tab
+    - Wait for the build to complete
+    - Test the interface
+
+ ### Method 2: Git CLI (For Developers)
+
+ ```bash
+ # Clone your Space
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/lineage-graph-extractor
+ cd lineage-graph-extractor
+
+ # Copy files (adjust the path to where you saved the files)
+ cp /path/to/hf_space/app.py .
+ cp /path/to/hf_space/requirements.txt .
+ cp /path/to/hf_space/README.md .
+
+ # Commit and push
+ git add .
+ git commit -m "Initial deployment"
+ git push
+ ```
+
+ Then add secrets via the web interface (Settings → Repository secrets).
+
+ ### Method 3: Hugging Face CLI
+
+ ```bash
+ # Install the Hugging Face CLI
+ pip install huggingface_hub
+
+ # Login
+ huggingface-cli login
+
+ # Create the Space
+ huggingface-cli repo create lineage-graph-extractor --type space --space_sdk gradio
+
+ # Upload files (note the space repo type)
+ huggingface-cli upload YOUR_USERNAME/lineage-graph-extractor /path/to/hf_space/ . --repo-type space
+ ```
+
+ ## Important: Connect Your Agent
+
+ ⚠️ **The template needs your agent integration!**
+
+ The `app.py` file contains placeholder functions. You need to integrate your actual agent:
+
+ ### Quick Integration Example
+
+ Edit `app.py` and replace the `extract_lineage_from_text` function:
+
+ ```python
+ import anthropic
+ import os
+
+ client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
+
+ def extract_lineage_from_text(metadata_text, source_type, viz_format):
+     """Extract lineage using the Claude AI agent."""
+
+     prompt = f"""
+     You are a lineage extraction expert. Extract data lineage from this {source_type} metadata
+     and create a {viz_format} visualization.
+
+     Metadata:
+     {metadata_text}
+
+     Return:
+     1. The visualization code
+     2. A brief summary
+     """
+
+     response = client.messages.create(
+         model="claude-3-5-sonnet-20241022",
+         max_tokens=4000,
+         messages=[{"role": "user", "content": prompt}]
+     )
+
+     # Parse the response to extract the visualization and summary
+     text = response.content[0].text
+
+     # Simple parsing (improve this based on your needs)
+     parts = text.split("---")
+     visualization = parts[0] if len(parts) > 0 else text
+     summary = parts[1] if len(parts) > 1 else "Lineage extracted successfully"
+
+     return visualization.strip(), summary.strip()
+ ```
+
+ ### Using Agent Memory Files
+
+ To use your full agent configuration:
+
+ 1. Copy the `/memories/` directory to the Space:
+ ```bash
+ cp -r /memories /path/to/space/
+ ```
+
+ 2. Reference the agent instructions in your code:
+ ```python
+ with open("memories/agent.md") as f:
+     agent_instructions = f.read()
+
+ # Use the instructions in prompts
+ ```
+
+ ## Post-Deployment
+
+ ### Test Functionality
+ 1. ✅ Text/File extraction works
+ 2. ✅ BigQuery integration (if configured)
+ 3. ✅ URL fetching works
+ 4. ✅ Visualizations render correctly
+
+ ### Optimize Performance
+ - Upgrade hardware if needed (Settings → Hardware)
+ - Add caching for repeated queries (see the sketch below)
+ - Implement rate limiting
+
174
+ ### Share Your Space
175
+ - Make it public (Settings β†’ Visibility)
176
+ - Share URL: `https://huggingface.co/spaces/YOUR_USERNAME/lineage-graph-extractor`
177
+ - Add to your profile or collection
178
+
179
+ ## Costs
180
+
181
+ - **Basic CPU**: Free forever βœ…
182
+ - **Upgraded CPU**: ~$0.03/hour
183
+ - **GPU**: ~$0.60/hour (if needed for heavy processing)
184
+ - **API costs**: Anthropic Claude API usage (pay-as-you-go)
185
+
186
+ ## Troubleshooting
187
+
188
+ ### Build Fails
189
+ - Check requirements.txt for incompatible versions
190
+ - Review logs for specific error messages
191
+ - Ensure Python 3.9+ compatibility
192
+
193
+ ### App Won't Load
194
+ - Verify `app.py` has no syntax errors
195
+ - Check that `demo.launch()` is called
196
+ - Review Space logs
197
+
198
+ ### API Errors
199
+ - Verify `ANTHROPIC_API_KEY` is set correctly
200
+ - Check API key has proper permissions
201
+ - Monitor API usage and rate limits
202
+
203
+ ### Visualization Issues
204
+ - Test Mermaid syntax at https://mermaid.live/
205
+ - Ensure proper code block formatting
206
+ - Check browser console for rendering errors
207
+
208
+ ## Support
209
+
210
+ - **Hugging Face Docs**: https://huggingface.co/docs/hub/spaces
211
+ - **Gradio Docs**: https://gradio.app/docs
212
+ - **Community Forum**: https://discuss.huggingface.co/
213
+
214
+ ## Next Steps
215
+
216
+ 1. βœ… Deploy to Hugging Face Spaces
217
+ 2. πŸ”§ Integrate your agent backend
218
+ 3. πŸ§ͺ Test with real metadata
219
+ 4. 🎨 Customize UI/UX
220
+ 5. πŸ“Š Add analytics
221
+ 6. πŸš€ Share with community
222
+
223
+ ---
224
+
225
+ **Ready to deploy?** Start with Method 1 (Web Interface) - it's the easiest!
226
+
LOCAL_SETUP.md ADDED
@@ -0,0 +1,474 @@
+ # Local Setup Guide - Lineage Graph Extractor
+
+ This guide provides detailed instructions for setting up and running the Lineage Graph Extractor agent locally.
+
+ ## Table of Contents
+ 1. [System Requirements](#system-requirements)
+ 2. [Installation Methods](#installation-methods)
+ 3. [Configuration](#configuration)
+ 4. [Usage Scenarios](#usage-scenarios)
+ 5. [Advanced Configuration](#advanced-configuration)
+ 6. [Troubleshooting](#troubleshooting)
+
+ ## System Requirements
+
+ ### Minimum Requirements
+ - **OS**: Windows 10+, macOS 10.15+, or Linux
+ - **Python**: 3.9 or higher
+ - **Memory**: 2GB RAM minimum
+ - **Disk Space**: 100MB for agent files
+
+ ### Recommended Requirements
+ - **Python**: 3.10+
+ - **Memory**: 4GB RAM
+ - **Internet**: Stable connection for API calls
+
+ ## Installation Methods
+
+ ### Method 1: Standalone Use (Recommended)
+
+ This method uses the agent configuration files with any platform that supports the Anthropic API.
+
+ 1. **Download the agent**
+ ```bash
+ # If you have a git repository
+ git clone <repository-url>
+ cd local_clone
+
+ # Or extract from a downloaded archive
+ unzip lineage-graph-extractor.zip
+ cd lineage-graph-extractor
+ ```
+
+ 2. **Set up the environment**
+ ```bash
+ # Copy the environment template
+ cp .env.example .env
+ ```
+
+ 3. **Edit the .env file**
+ ```bash
+ # Edit with your preferred editor
+ nano .env
+ # or
+ vim .env
+ # or
+ code .env  # VS Code
+ ```
+
+ Add your credentials:
+ ```
+ ANTHROPIC_API_KEY=sk-ant-your-key-here
+ GOOGLE_CLOUD_PROJECT=your-gcp-project
+ GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
+ ```
+
+ 4. **Install Python dependencies** (optional, for the examples)
+ ```bash
+ pip install anthropic google-cloud-bigquery requests pyyaml
+ ```
+
+ ### Method 2: Claude Desktop Integration
+
+ If you're using Claude Desktop or a similar platform:
+
+ 1. **Locate your agent configuration directory**
+    - Claude Desktop: `~/.config/claude/agents/` (Linux/Mac) or `%APPDATA%\claude\agents\` (Windows)
+    - Other platforms: Check the platform documentation
+
+ 2. **Copy the memories folder**
+ ```bash
+ # Linux/Mac
+ cp -r memories ~/.config/claude/agents/lineage-extractor/
+
+ # Windows
+ xcopy /E /I memories %APPDATA%\claude\agents\lineage-extractor\
+ ```
+
+ 3. **Configure API credentials** in your platform's settings
+
+ 4. **Restart the application**
+
+ ### Method 3: Python Integration
+
+ To integrate into your own Python application:
+
+ 1. **Install dependencies**
+ ```bash
+ pip install anthropic python-dotenv
+ ```
+
+ 2. **Use the integration example**
+ ```python
+ from anthropic import Anthropic
+ from dotenv import load_dotenv
+ import os
+
+ # Load environment variables
+ load_dotenv()
+
+ # Initialize the client
+ client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
+
+ # Load the agent configuration
+ with open("memories/agent.md", "r") as f:
+     system_prompt = f.read()
+
+ # Use the agent
+ response = client.messages.create(
+     model="claude-3-5-sonnet-20241022",
+     max_tokens=4000,
+     system=system_prompt,
+     messages=[{
+         "role": "user",
+         "content": "Extract lineage from this metadata: ..."
+     }]
+ )
+
+ print(response.content[0].text)
+ ```
+
+ ## Configuration
+
+ ### API Keys Setup
+
+ #### Anthropic API Key
+ 1. Go to https://console.anthropic.com/
+ 2. Create an account or sign in
+ 3. Navigate to API Keys
+ 4. Create a new key
+ 5. Copy it to the `.env` file
+
+ #### Google Cloud (for BigQuery)
+ 1. Go to https://console.cloud.google.com/
+ 2. Create a project or select an existing one
+ 3. Enable the BigQuery API
+ 4. Create a service account:
+    - Go to IAM & Admin → Service Accounts
+    - Create the service account
+    - Grant the "BigQuery Data Viewer" role
+    - Create a JSON key
+ 5. Download the JSON key and reference it in `.env` (a quick connectivity check follows)
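+
+ To confirm the service account works before wiring it into the agent, here is a short check with the `google-cloud-bigquery` client (replace `your-gcp-project-id`; credentials are picked up from `GOOGLE_APPLICATION_CREDENTIALS`):
+
+ ```python
+ from google.cloud import bigquery
+
+ # Uses GOOGLE_APPLICATION_CREDENTIALS from the environment.
+ client = bigquery.Client(project="your-gcp-project-id")
+ for dataset in client.list_datasets(max_results=5):
+     print(dataset.dataset_id)  # lists a few datasets if access is granted
+ ```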
+
+ #### Tavily (for web search)
+ 1. Go to https://tavily.com/
+ 2. Sign up for an account
+ 3. Get your API key
+ 4. Add it to the `.env` file
+
+ ### Tool Configuration
+
+ Edit `memories/tools.json` to customize the available tools (JSON does not allow comments; each tool is described below):
+
+ ```json
+ {
+   "tools": [
+     "bigquery_execute_query",
+     "read_url_content",
+     "google_sheets_read_range",
+     "tavily_web_search"
+   ],
+   "interrupt_config": {
+     "bigquery_execute_query": false,
+     "read_url_content": false,
+     "google_sheets_read_range": false,
+     "tavily_web_search": false
+   }
+ }
+ ```
+
+ **Available Tools:**
+ - `bigquery_execute_query`: Execute SQL queries on BigQuery
+ - `read_url_content`: Fetch content from URLs/APIs
+ - `google_sheets_read_range`: Read data from Google Sheets
+ - `tavily_web_search`: Perform web searches
+
+ ### Subagent Configuration
+
+ Customize subagents by editing their configuration files:
+
+ **Metadata Parser** (`memories/subagents/metadata_parser/`)
+ - `agent.md`: Instructions for parsing metadata
+ - `tools.json`: Tools available to the parser
+
+ **Graph Visualizer** (`memories/subagents/graph_visualizer/`)
+ - `agent.md`: Instructions for creating visualizations
+ - `tools.json`: Tools available to the visualizer
+
+ ## Usage Scenarios
+
+ ### Scenario 1: BigQuery Lineage Extraction
+
+ ```python
+ from anthropic import Anthropic
+ import os
+
+ client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
+
+ with open("memories/agent.md", "r") as f:
+     system_prompt = f.read()
+
+ response = client.messages.create(
+     model="claude-3-5-sonnet-20241022",
+     max_tokens=4000,
+     system=system_prompt,
+     messages=[{
+         "role": "user",
+         "content": "Extract lineage from BigQuery project: my-project, dataset: analytics"
+     }]
+ )
+
+ print(response.content[0].text)
+ ```
+
+ ### Scenario 2: File-Based Metadata
+
+ ```python
+ # Read metadata from a file
+ with open("dbt_manifest.json", "r") as f:
+     metadata = f.read()
+
+ response = client.messages.create(
+     model="claude-3-5-sonnet-20241022",
+     max_tokens=4000,
+     system=system_prompt,
+     messages=[{
+         "role": "user",
+         "content": f"Extract lineage from this dbt manifest:\n\n{metadata}"
+     }]
+ )
+ ```
+
+ ### Scenario 3: API Metadata
+
+ ```python
+ response = client.messages.create(
+     model="claude-3-5-sonnet-20241022",
+     max_tokens=4000,
+     system=system_prompt,
+     messages=[{
+         "role": "user",
+         "content": "Extract lineage from API: https://api.example.com/metadata"
+     }]
+ )
+ ```
+
+ ## Advanced Configuration
+
+ ### Custom Visualization Formats
+
+ To add custom visualization formats, edit `memories/subagents/graph_visualizer/agent.md`:
+
+ ```markdown
+ ### 4. Custom Format
+ Generate a custom format with:
+ - Your specific requirements
+ - Custom styling rules
+ - Special formatting needs
+ ```
+
+ ### Adding New Metadata Sources
+
+ To support new metadata sources:
+
+ 1. Add the tool to `memories/tools.json` (see the sketch after this list)
+ 2. Update `memories/agent.md` with source-specific instructions
+ 3. Update `memories/subagents/metadata_parser/agent.md` if needed
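+
+ A sketch of registering a tool programmatically, following the `tools.json` structure shown above (`snowflake_get_lineage` is a hypothetical tool id used for illustration):
+
+ ```python
+ import json
+
+ with open("memories/tools.json") as f:
+     config = json.load(f)
+
+ # Register the new tool and let it run without an approval interrupt.
+ config["tools"].append("snowflake_get_lineage")
+ config["interrupt_config"]["snowflake_get_lineage"] = False
+
+ with open("memories/tools.json", "w") as f:
+     json.dump(config, f, indent=2)
+ ```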
+
+ ### MCP Integration
+
+ To integrate with Model Context Protocol servers:
+
+ 1. Check if MCP tools are available in the `/tools` directory
+ 2. Add MCP tools to `memories/tools.json`
+ 3. Configure the MCP server connection
+ 4. See `memories/mcp_integration.md` (if available)
+
+ ## Troubleshooting
+
+ ### Common Issues
+
+ #### 1. Authentication Errors
+
+ **Problem**: API authentication fails
+ **Solutions**:
+ - Verify the API key is correct in `.env`
+ - Check that the key hasn't expired
+ - Ensure environment variables are loaded
+ - Try regenerating the API key
+
+ ```bash
+ # Test the Anthropic API key
+ python -c "from anthropic import Anthropic; import os; from dotenv import load_dotenv; load_dotenv(); client = Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY')); print('✓ API key works')"
+ ```
+
+ #### 2. BigQuery Access Issues
+
+ **Problem**: Cannot access BigQuery
+ **Solutions**:
+ - Verify the service account has BigQuery permissions
+ - Check that the project ID is correct
+ - Ensure the JSON key file path is correct
+ - Test the credentials:
+
+ ```bash
+ # Test BigQuery access
+ gcloud auth activate-service-account --key-file=/path/to/key.json
+ bq ls --project_id=your-project-id
+ ```
+
+ #### 3. Import Errors
+
+ **Problem**: `ModuleNotFoundError`
+ **Solutions**:
+ ```bash
+ # Install missing packages
+ pip install anthropic google-cloud-bigquery requests pyyaml python-dotenv
+
+ # Or install all at once
+ pip install -r requirements.txt  # if you create one
+ ```
+
+ #### 4. Environment Variables Not Loading
+
+ **Problem**: `.env` file not being read
+ **Solutions**:
+ ```python
+ # Explicitly load .env
+ from dotenv import load_dotenv
+ load_dotenv()
+
+ # Or specify the path
+ load_dotenv(".env")
+
+ # Verify loading
+ import os
+ print(os.getenv("ANTHROPIC_API_KEY"))  # Should not be None
+ ```
+
+ #### 5. File Path Issues
+
+ **Problem**: Cannot find `memories/agent.md`
+ **Solutions**:
+ ```python
+ # Use an absolute path
+ import os
+ base_dir = os.path.dirname(os.path.abspath(__file__))
+ agent_path = os.path.join(base_dir, "memories", "agent.md")
+
+ # Or change the working directory
+ os.chdir("/path/to/local_clone")
+ ```
+
+ ### Performance Issues
+
+ #### Slow Response Times
+
+ **Causes**:
+ - Large metadata files
+ - Complex lineage graphs
+ - Network latency
+
+ **Solutions**:
+ - Break large metadata into chunks (see the sketch after this list)
+ - Use filtering to focus on specific entities
+ - Increase API timeout settings
+ - Cache frequently used results
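+
+ A rough chunking sketch for oversized metadata (a hypothetical helper, not part of the repo; pair it with the merge step described in `memories/agent.md`):
+
+ ```python
+ def chunk_text(text: str, max_chars: int = 20_000):
+     """Yield slices of text small enough for a single agent call."""
+     for start in range(0, len(text), max_chars):
+         yield text[start:start + max_chars]
+
+ # Each chunk goes through extract_lineage() separately; merge the
+ # resulting nodes and edges before calling the graph visualizer.
+ ```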
+
+ ### Debugging Tips
+
+ 1. **Enable verbose logging**
+ ```python
+ import logging
+ logging.basicConfig(level=logging.DEBUG)
+ ```
+
+ 2. **Test each component separately**
+    - Test the API connection first
+    - Test metadata retrieval
+    - Test parsing separately
+    - Test visualization separately
+
+ 3. **Validate the metadata format** (see the snippet after this list)
+    - Ensure the JSON is valid
+    - Check for required fields
+    - Verify the structure matches the expected format
+
+ 4. **Check the agent configuration**
+    - Verify `memories/agent.md` is readable
+    - Check the `tools.json` syntax
+    - Ensure the subagent files exist
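+
+ A quick way to validate a JSON metadata file before handing it to the agent (`metadata.json` is a placeholder path):
+
+ ```python
+ import json
+
+ with open("metadata.json") as f:
+     try:
+         json.load(f)
+         print("✓ Valid JSON")
+     except json.JSONDecodeError as e:
+         print(f"✗ Invalid JSON at line {e.lineno}: {e.msg}")
+ ```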
+
+ ## Getting Help
+
+ ### Documentation
+ - Agent instructions: `memories/agent.md`
+ - Subagent docs: `memories/subagents/*/agent.md`
+ - Anthropic API: https://docs.anthropic.com/
+
+ ### Testing Your Setup
+
+ Run this complete test:
+
+ ```python
+ from anthropic import Anthropic
+ from dotenv import load_dotenv
+ import os
+
+ # Load the environment
+ load_dotenv()
+
+ # Test 1: API Connection
+ try:
+     client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
+     print("✓ Anthropic API connection successful")
+ except Exception as e:
+     print(f"✗ API connection failed: {e}")
+     exit(1)
+
+ # Test 2: Load Agent Config
+ try:
+     with open("memories/agent.md", "r") as f:
+         system_prompt = f.read()
+     print("✓ Agent configuration loaded")
+ except Exception as e:
+     print(f"✗ Failed to load agent config: {e}")
+     exit(1)
+
+ # Test 3: Simple Query
+ try:
+     response = client.messages.create(
+         model="claude-3-5-sonnet-20241022",
+         max_tokens=1000,
+         system=system_prompt,
+         messages=[{
+             "role": "user",
+             "content": "Hello, what can you help me with?"
+         }]
+     )
+     print("✓ Agent response successful")
+     print(f"\nAgent says: {response.content[0].text}")
+ except Exception as e:
+     print(f"✗ Agent query failed: {e}")
+     exit(1)
+
+ print("\n✓ All tests passed! Your setup is ready.")
+ ```
+
+ Save it as `test_setup.py` and run:
+ ```bash
+ python test_setup.py
+ ```
+
+ ## Next Steps
+
+ 1. ✅ Complete the setup
+ 2. ✅ Test with sample metadata
+ 3. 📊 Extract your first lineage
+ 4. 🎨 Customize visualization preferences
+ 5. 🔧 Integrate with your workflow
+
+ ---
+
+ **Setup complete?** Try the usage examples in README.md or run your own lineage extraction!
+
SETUP_GUIDE.md ADDED
@@ -0,0 +1,225 @@
+ # Setup Guide for Lineage Graph Extractor Space
+
+ This guide will help you deploy the Lineage Graph Extractor as a Hugging Face Space.
+
+ ## Prerequisites
+
+ 1. A Hugging Face account (create one at https://huggingface.co/join)
+ 2. API credentials for the services you want to integrate:
+    - Anthropic API key (for Claude AI)
+    - Google Cloud credentials (for BigQuery, optional)
+    - Other service credentials as needed
+
+ ## Step 1: Create a New Space
+
+ 1. Go to https://huggingface.co/spaces
+ 2. Click "Create new Space"
+ 3. Fill in the details:
+    - **Name**: `lineage-graph-extractor` (or your preferred name)
+    - **License**: MIT (or your choice)
+    - **SDK**: Gradio
+    - **Hardware**: CPU Basic (free tier) or upgrade for better performance
+    - **Visibility**: Public or Private (your choice)
+
+ ## Step 2: Upload Files
+
+ You need to upload these files to your Space repository:
+
+ ### Required Files
+ - `app.py` - Main application file
+ - `requirements.txt` - Python dependencies
+ - `README.md` - Space description and documentation
+
+ ### Optional Files
+ - `.env.example` - Example environment variables
+ - `SETUP_GUIDE.md` - This setup guide
+
+ ### Upload Methods
+
+ **Option A: Web Interface**
+ 1. Click "Files and versions" in your Space
+ 2. Click "Add file" → "Upload files"
+ 3. Upload all the files from the `/hf_space/` directory
+
+ **Option B: Git**
+ ```bash
+ # Clone your Space repository
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/lineage-graph-extractor
+ cd lineage-graph-extractor
+
+ # Copy files
+ cp /path/to/hf_space/* .
+
+ # Commit and push
+ git add .
+ git commit -m "Initial commit: Lineage Graph Extractor"
+ git push
+ ```
+
+ ## Step 3: Configure Secrets
+
+ For security, store sensitive credentials as Space secrets:
+
+ 1. Go to your Space settings
+ 2. Click "Repository secrets"
+ 3. Add the following secrets:
+
+ ### Required Secrets
+ - `ANTHROPIC_API_KEY`: Your Claude API key from https://console.anthropic.com/
+
+ ### Optional Secrets (based on the features you need)
+ - `GOOGLE_CLOUD_PROJECT`: Your GCP project ID
+ - `GOOGLE_APPLICATION_CREDENTIALS_JSON`: Service account JSON (as a string)
+ - `MCP_SERVER_URL`: MCP server endpoint (if using MCP)
+ - `MCP_API_KEY`: MCP authentication key
+
+ ### Accessing Secrets in Code
+
+ Update `app.py` to read from environment variables:
+
+ ```python
+ import os
+
+ ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")
+ GOOGLE_CLOUD_PROJECT = os.environ.get("GOOGLE_CLOUD_PROJECT")
+ ```
+
+ ## Step 4: Integrate the Agent Backend
+
+ The current `app.py` is a template. You need to connect it to your actual agent:
+
+ ### Option A: Use the Anthropic SDK
+
+ ```python
+ import anthropic
+
+ client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
+
+ def extract_lineage_from_text(metadata_text, source_type, viz_format):
+     # Call your agent with the metadata_parser and graph_visualizer workers
+     response = client.messages.create(
+         model="claude-3-5-sonnet-20241022",
+         max_tokens=4000,
+         messages=[{
+             "role": "user",
+             "content": f"Extract lineage from this {source_type} metadata and visualize as {viz_format}: {metadata_text}"
+         }]
+     )
+     return response.content[0].text, "Processed successfully"
+ ```
+
+ ### Option B: Use an Agent API Endpoint
+
+ If you have your agent deployed as an API:
+
+ ```python
+ import requests
+
+ def extract_lineage_from_text(metadata_text, source_type, viz_format):
+     response = requests.post(
+         "https://your-agent-api.com/extract",
+         json={
+             "metadata": metadata_text,
+             "source_type": source_type,
+             "format": viz_format
+         }
+     )
+     data = response.json()  # parse the body once and reuse it
+     return data["visualization"], data["summary"]
+ ```
+
+ ### Option C: Bundle Agent Files
+
+ Include your agent configuration directly in the Space:
+
+ 1. Copy the `/memories/` directory to the Space
+ 2. Copy `/subagents/` if needed
+ 3. Import and use the agent logic in `app.py` (a loading sketch follows)
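+
+ A minimal sketch of step 3, assuming the `/memories/` directory was copied into the Space root (the same pattern as the integration example in `LOCAL_SETUP.md`):
+
+ ```python
+ import os
+ import anthropic
+
+ client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
+
+ # Load the bundled agent instructions as the system prompt.
+ with open("memories/agent.md") as f:
+     SYSTEM_PROMPT = f.read()
+
+ def run_agent(user_message: str) -> str:
+     response = client.messages.create(
+         model="claude-3-5-sonnet-20241022",
+         max_tokens=4000,
+         system=SYSTEM_PROMPT,
+         messages=[{"role": "user", "content": user_message}]
+     )
+     return response.content[0].text
+ ```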
+
+ ## Step 5: Test Your Space
+
+ 1. Once deployed, Hugging Face will automatically build and run your Space
+ 2. Check the "Logs" tab for any errors
+ 3. Test each feature:
+    - Text/File metadata extraction
+    - BigQuery integration (if configured)
+    - URL/API fetching
+
+ ## Step 6: Customize and Enhance
+
+ ### Add Authentication
+
+ For production use, add authentication:
+
+ ```python
+ demo.launch(auth=("username", "password"))
+ ```
+
+ Or integrate with Hugging Face OAuth (Gradio's `launch()` has no `auth_required` flag): enable `hf_oauth: true` in the Space's `README.md` metadata and add a login button to the interface:
+
+ ```python
+ with gr.Blocks() as demo:
+     gr.LoginButton()  # users sign in with their Hugging Face account
+     ...
+ ```
+
+ ### Improve Error Handling
+
+ Add try/except blocks and user-friendly error messages:
+
+ ```python
+ try:
+     result = extract_lineage_from_text(metadata_text, source_type, viz_format)
+     return result
+ except Exception as e:
+     return "", f"Error: {str(e)}"
+ ```
+
+ ### Add More Features
+
+ - File upload support
+ - Export visualizations as images
+ - History/session management
+ - Batch processing
+
+ ## Troubleshooting
+
+ ### Space won't start
+ - Check the logs for error messages
+ - Verify all dependencies in `requirements.txt`
+ - Ensure Python version compatibility
+
+ ### API errors
+ - Verify secrets are correctly set
+ - Check API key validity and permissions
+ - Review rate limits
+
+ ### Slow performance
+ - Upgrade to better hardware (CPU or GPU)
+ - Optimize the metadata parsing logic
+ - Add caching for repeated queries
+
+ ## Security Best Practices
+
+ 1. **Never commit API keys** to the repository
+ 2. **Use Space secrets** for all credentials
+ 3. **Validate user input** to prevent injection attacks (see the sketch after this list)
+ 4. **Use read-only credentials** when possible
+ 5. **Add rate limiting** to prevent abuse
+ 6. **Enable authentication** for production use
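+
+ A minimal input-validation sketch for item 3, to run at the top of each Gradio handler (the limits are illustrative):
+
+ ```python
+ MAX_METADATA_CHARS = 200_000  # reject pathologically large pastes
+
+ def validate_metadata(text: str) -> str:
+     if not text or not text.strip():
+         raise ValueError("Metadata input is empty.")
+     if len(text) > MAX_METADATA_CHARS:
+         raise ValueError("Metadata too large; split it into smaller chunks.")
+     return text.strip()
+ ```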
+
+ ## Getting Help
+
+ - Hugging Face Spaces docs: https://huggingface.co/docs/hub/spaces
+ - Gradio documentation: https://gradio.app/docs
+ - Anthropic API docs: https://docs.anthropic.com/
+
+ ## Next Steps
+
+ 1. Test the Space thoroughly
+ 2. Share it with your team or community
+ 3. Collect feedback and iterate
+ 4. Consider upgrading hardware for production workloads
+ 5. Add analytics to track usage
+
+ ---
+
+ **Need help?** Check the Hugging Face community forums or reach out to support.
+
app.py ADDED
@@ -0,0 +1,228 @@
+ """
+ Lineage Graph Extractor - Hugging Face Space
+ A Gradio-based web interface for extracting and visualizing data lineage from various sources.
+ """
+
+ import gradio as gr
+ import json
+ import os
+ from typing import Optional, Tuple
+
+ # Note: This is a template. You'll need to integrate with your actual agent backend.
+ # This could be through an API, the Claude SDK, or another agent framework.
+
+ def extract_lineage_from_text(
+     metadata_text: str,
+     source_type: str,
+     visualization_format: str
+ ) -> Tuple[str, str]:
+     """
+     Extract lineage from provided metadata text.
+
+     Args:
+         metadata_text: Raw metadata content
+         source_type: Type of metadata source (BigQuery, dbt, Airflow, etc.)
+         visualization_format: Desired output format (Mermaid, DOT, Text)
+
+     Returns:
+         Tuple of (visualization_code, summary_text)
+     """
+     # TODO: Integrate with your agent backend
+     # This is where you'd call your agent with the metadata_parser and graph_visualizer workers
+
+     return (
+         "graph TD\n    A[Sample Node] --> B[Output Node]",
+         f"Processed {source_type} metadata. Found X nodes and Y relationships."
+     )
+
+ def extract_lineage_from_bigquery(
+     project_id: str,
+     query: str,
+     api_key: str,
+     visualization_format: str
+ ) -> Tuple[str, str]:
+     """
+     Extract lineage from BigQuery.
+
+     Args:
+         project_id: Google Cloud project ID
+         query: SQL query to extract metadata
+         api_key: API credentials
+         visualization_format: Desired output format
+
+     Returns:
+         Tuple of (visualization_code, summary_text)
+     """
+     # TODO: Integrate with BigQuery and your agent backend
+
+     return (
+         "graph TD\n    A[BigQuery Table] --> B[Destination Table]",
+         f"Extracted lineage from BigQuery project: {project_id}"
+     )
+
+ def extract_lineage_from_url(
+     url: str,
+     visualization_format: str
+ ) -> Tuple[str, str]:
+     """
+     Extract lineage from a URL/API endpoint.
+
+     Args:
+         url: URL to fetch metadata from
+         visualization_format: Desired output format
+
+     Returns:
+         Tuple of (visualization_code, summary_text)
+     """
+     # TODO: Integrate with URL fetching and your agent backend
+
+     return (
+         "graph TD\n    A[API Source] --> B[Data Pipeline]",
+         f"Extracted lineage from URL: {url}"
+     )
+
+ # Create the Gradio interface
+ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
+     gr.Markdown("""
+     # 🔍 Lineage Graph Extractor
+
+     Extract and visualize data lineage from various metadata sources including BigQuery, dbt, Airflow,
+     APIs, and more. This tool helps you understand complex data relationships through clear graph visualizations.
+
+     ## Supported Sources
+     - **Text/File Metadata**: Paste metadata directly
+     - **BigQuery**: Query table metadata and relationships
+     - **URLs/APIs**: Fetch metadata from web endpoints
+     - **dbt, Airflow, Snowflake**: Through MCP integration (when configured)
+     """)
+
+     with gr.Tabs():
+         # Tab 1: Text/File Input
+         with gr.Tab("Text/File Metadata"):
+             with gr.Row():
+                 with gr.Column():
+                     metadata_input = gr.Textbox(
+                         label="Metadata Content",
+                         placeholder="Paste your metadata here (JSON, YAML, SQL, etc.)",
+                         lines=15
+                     )
+                     source_type_text = gr.Dropdown(
+                         choices=["dbt Manifest", "Airflow DAG", "SQL DDL", "Custom JSON", "Other"],
+                         label="Source Type",
+                         value="Custom JSON"
+                     )
+                     viz_format_text = gr.Dropdown(
+                         choices=["Mermaid", "DOT/Graphviz", "Text", "All"],
+                         label="Visualization Format",
+                         value="Mermaid"
+                     )
+                     extract_btn_text = gr.Button("Extract Lineage", variant="primary")
+
+                 with gr.Column():
+                     output_viz_text = gr.Code(
+                         label="Lineage Visualization",
+                         language="markdown"  # "mermaid" is not a supported gr.Code language
+                     )
+                     output_summary_text = gr.Textbox(
+                         label="Summary",
+                         lines=5
+                     )
+
+             extract_btn_text.click(
+                 fn=extract_lineage_from_text,
+                 inputs=[metadata_input, source_type_text, viz_format_text],
+                 outputs=[output_viz_text, output_summary_text]
+             )
+
+         # Tab 2: BigQuery
+         with gr.Tab("BigQuery"):
+             with gr.Row():
+                 with gr.Column():
+                     bq_project = gr.Textbox(
+                         label="Project ID",
+                         placeholder="your-gcp-project-id"
+                     )
+                     bq_query = gr.Textbox(
+                         label="Metadata Query",
+                         placeholder="SELECT * FROM `project.dataset.INFORMATION_SCHEMA.TABLES`",
+                         lines=8
+                     )
+                     bq_api_key = gr.Textbox(
+                         label="API Key / Credentials",
+                         placeholder="Enter your credentials",
+                         type="password"
+                     )
+                     viz_format_bq = gr.Dropdown(
+                         choices=["Mermaid", "DOT/Graphviz", "Text", "All"],
+                         label="Visualization Format",
+                         value="Mermaid"
+                     )
+                     extract_btn_bq = gr.Button("Extract Lineage", variant="primary")
+
+                 with gr.Column():
+                     output_viz_bq = gr.Code(
+                         label="Lineage Visualization",
+                         language="markdown"  # "mermaid" is not a supported gr.Code language
+                     )
+                     output_summary_bq = gr.Textbox(
+                         label="Summary",
+                         lines=5
+                     )
+
+             extract_btn_bq.click(
+                 fn=extract_lineage_from_bigquery,
+                 inputs=[bq_project, bq_query, bq_api_key, viz_format_bq],
+                 outputs=[output_viz_bq, output_summary_bq]
+             )
+
+         # Tab 3: URL/API
+         with gr.Tab("URL/API"):
+             with gr.Row():
+                 with gr.Column():
+                     url_input = gr.Textbox(
+                         label="URL",
+                         placeholder="https://api.example.com/metadata"
+                     )
+                     viz_format_url = gr.Dropdown(
+                         choices=["Mermaid", "DOT/Graphviz", "Text", "All"],
+                         label="Visualization Format",
+                         value="Mermaid"
+                     )
+                     extract_btn_url = gr.Button("Extract Lineage", variant="primary")
+
+                 with gr.Column():
+                     output_viz_url = gr.Code(
+                         label="Lineage Visualization",
+                         language="markdown"  # "mermaid" is not a supported gr.Code language
+                     )
+                     output_summary_url = gr.Textbox(
+                         label="Summary",
+                         lines=5
+                     )
+
+             extract_btn_url.click(
+                 fn=extract_lineage_from_url,
+                 inputs=[url_input, viz_format_url],
+                 outputs=[output_viz_url, output_summary_url]
+             )
+
+     gr.Markdown("""
+     ---
+     ## About
+
+     This tool uses AI-powered metadata parsing to extract lineage relationships and generate clear visualizations.
+
+     ### Features
+     - Multi-source metadata support
+     - Automatic relationship detection
+     - Multiple visualization formats
+     - MCP (Model Context Protocol) integration support
+
+     ### Note
+     To use BigQuery or other cloud services, you'll need to configure appropriate API credentials.
+     For MCP integration with dbt, Airflow, Snowflake, etc., additional setup is required.
+     """)
+
+ # Launch the app
+ if __name__ == "__main__":
+     demo.launch()
integration_example.py ADDED
@@ -0,0 +1,149 @@
+ #!/usr/bin/env python3
+ """
+ Lineage Graph Extractor - Integration Example
+
+ This script demonstrates how to use the Lineage Graph Extractor agent
+ programmatically with the Anthropic API.
+
+ Usage:
+     python integration_example.py
+ """
+
+ import os
+ from anthropic import Anthropic
+ from dotenv import load_dotenv
+
+ # Load environment variables from the .env file
+ load_dotenv()
+
+ def load_agent_config():
+     """Load the agent configuration from memories/agent.md"""
+     config_path = os.path.join(os.path.dirname(__file__), "memories", "agent.md")
+
+     with open(config_path, "r") as f:
+         return f.read()
+
+ def extract_lineage(client, system_prompt, user_message):
+     """
+     Send a lineage extraction request to the agent.
+
+     Args:
+         client: Anthropic client instance
+         system_prompt: Agent system prompt
+         user_message: User's lineage extraction request
+
+     Returns:
+         Agent's response text
+     """
+     response = client.messages.create(
+         model="claude-3-5-sonnet-20241022",
+         max_tokens=4000,
+         system=system_prompt,
+         messages=[{
+             "role": "user",
+             "content": user_message
+         }]
+     )
+
+     return response.content[0].text
+
+ def main():
+     """Main function demonstrating agent usage"""
+
+     # Initialize the Anthropic client
+     api_key = os.getenv("ANTHROPIC_API_KEY")
+     if not api_key:
+         print("Error: ANTHROPIC_API_KEY not found in environment variables.")
+         print("Please set it in your .env file.")
+         return
+
+     client = Anthropic(api_key=api_key)
+
+     # Load the agent configuration
+     print("Loading agent configuration...")
+     system_prompt = load_agent_config()
+     print("✓ Agent configuration loaded\n")
+
+     # Example 1: Simple greeting to test the agent
+     print("=" * 60)
+     print("Example 1: Testing agent connection")
+     print("=" * 60)
+     response = extract_lineage(
+         client,
+         system_prompt,
+         "Hello! What can you help me with?"
+     )
+     print(response)
+     print()
+
+     # Example 2: Extract lineage from sample metadata
+     print("=" * 60)
+     print("Example 2: Extract lineage from sample metadata")
+     print("=" * 60)
+
+     sample_metadata = """
+     {
+         "tables": [
+             {
+                 "name": "raw_orders",
+                 "type": "source",
+                 "description": "Raw order data from API"
+             },
+             {
+                 "name": "raw_customers",
+                 "type": "source",
+                 "description": "Raw customer data from database"
+             },
+             {
+                 "name": "stg_orders",
+                 "type": "staging",
+                 "description": "Cleaned and standardized orders",
+                 "depends_on": ["raw_orders"]
+             },
+             {
+                 "name": "stg_customers",
+                 "type": "staging",
+                 "description": "Cleaned and standardized customers",
+                 "depends_on": ["raw_customers"]
+             },
+             {
+                 "name": "fct_orders",
+                 "type": "fact",
+                 "description": "Order facts with customer data",
+                 "depends_on": ["stg_orders", "stg_customers"]
+             }
+         ]
+     }
+     """
+
+     response = extract_lineage(
+         client,
+         system_prompt,
+         f"Extract lineage from this metadata and create a Mermaid diagram:\n\n{sample_metadata}"
+     )
+     print(response)
+     print()
+
+     # Example 3: BigQuery extraction (requires credentials)
+     if os.getenv("GOOGLE_CLOUD_PROJECT"):
+         print("=" * 60)
+         print("Example 3: BigQuery lineage extraction")
+         print("=" * 60)
+
+         project_id = os.getenv("GOOGLE_CLOUD_PROJECT")
+         response = extract_lineage(
+             client,
+             system_prompt,
+             f"Extract lineage from BigQuery project: {project_id}, dataset: analytics"
+         )
+         print(response)
+     else:
+         print("Skipping BigQuery example (GOOGLE_CLOUD_PROJECT not set)")
+
+     print("\n" + "=" * 60)
+     print("Examples complete!")
+     print("=" * 60)
+
+ if __name__ == "__main__":
+     main()
+
main.py ADDED
@@ -0,0 +1,6 @@
+ def main():
+     print("Hello from lineage-graph-accelerator!")
+
+
+ if __name__ == "__main__":
+     main()
memories/agent.md ADDED
@@ -0,0 +1,162 @@
+ # Lineage Graph Extractor Agent
+
+ You are an expert agent specializing in extracting data lineage, pipeline dependencies, and database relationships from metadata sources and visualizing them as graphs.
+
+ ## Your Goal
+
+ Help users understand complex data relationships by:
+ 1. Extracting lineage information from various metadata sources
+ 2. Identifying entities (tables, pipelines, datasets, code modules) and their relationships
+ 3. Creating clear, visual graph representations of these relationships
+
+ ## Supported Metadata Sources
+
+ You can extract lineage from:
+ - **BigQuery**: Execute queries against BigQuery to extract table metadata, schema information, and query histories
+ - **URLs/APIs**: Fetch metadata from web endpoints and APIs
+ - **Google Sheets**: Read metadata stored in spreadsheet format
+ - **Files**: Process metadata that users upload or provide in the chat
+ - **MCP Servers**: Connect to Model Context Protocol (MCP) servers that expose metadata and lineage information
+
+ ### MCP Integration
+
+ This agent supports Model Context Protocol (MCP) integration, which allows you to:
+ - Connect to external MCP servers that expose metadata sources
+ - Leverage MCP tools provided by data catalog systems (e.g., dbt, Airflow, Snowflake)
+ - Automatically discover and extract lineage from MCP-enabled platforms
+
+ When working with MCP:
+ 1. **MCP Server Discovery**: Check if the user has MCP servers configured that can provide metadata
+ 2. **Tool Usage**: Use MCP-exposed tools to query metadata from connected systems
+ 3. **Standardized Access**: MCP provides a standardized way to access diverse metadata sources
+
+ ## Lineage Types You Handle
+
+ - **Data pipeline/ETL lineage**: Track data transformations and pipeline flows
+ - **Database table lineage**: Map table dependencies and relationships
+ - **Code/dependency lineage**: Identify code module dependencies and call graphs
+
+ ## Your Workflow
+
+ ### Step 1: Gather Metadata
+
+ When a user asks you to extract lineage:
+
+ 1. **Identify the source**: Determine where the metadata is located
+    - If BigQuery: Ask for the project ID and table/dataset names, then execute queries
+    - If URL/API: Get the URL and fetch the content
+    - If Google Sheets: Get the spreadsheet ID and range
+    - If file content: The user will provide it directly
+    - If MCP Server: Use MCP tools to query the connected server for metadata
+
+ 2. **Retrieve the metadata**: Use the appropriate tools to access the metadata
+
+ ### Step 2: Parse and Extract Lineage
+
+ Once you have the metadata, call the **metadata_parser** worker:
+
+ - Provide the raw metadata content to the worker
+ - The worker will analyze it and extract structured lineage information
+ - It will return nodes (entities with name, description, type, owner) and edges (relationships)
+
+ ### Step 3: Visualize the Graph
+
+ After receiving the structured lineage data, call the **graph_visualizer** worker:
+
+ - Pass the nodes and edges to the worker
+ - Specify the visualization format(s) the user wants:
+   - **Mermaid diagram**: Text-based diagram syntax (default)
+   - **DOT/Graphviz**: DOT format for Graphviz rendering
+   - **Text description**: Hierarchical text description
+   - **All formats**: Generate all three formats
+
+ ### Step 4: Present Results
+
+ Display the graph visualization(s) to the user in the chat with:
+ - Clear formatting for code blocks (use ```mermaid or ```dot syntax)
+ - A summary of what was extracted (number of entities, types found, key relationships)
+ - Suggestions for next steps or refinements if needed
+
+ ## Handling Complex Scenarios
+
+ ### Multiple Metadata Sources
+ If the user provides metadata from multiple sources (e.g., BigQuery + files):
+ 1. Gather metadata from each source
+ 2. Call the metadata_parser worker ONCE for each distinct source
+ 3. Merge the results before visualization
+ 4. Send the combined lineage to the graph_visualizer worker
+
+ ### Large or Complex Graphs
+ If the lineage graph is very large or complex:
+ - Offer to filter by entity type, owner, or specific subtrees
+ - Suggest breaking it into multiple focused views
+ - Provide a high-level overview first, then detailed views on request
+
+ ### Ambiguous Metadata
+ If the metadata format is unclear or ambiguous:
+ - Make reasonable inferences based on common patterns
+ - Note any assumptions made
+ - Ask the user for clarification if critical information is missing
+
+ ## Response Style
+
+ - **Be clear and concise**: Explain what you're doing at each step
+ - **Be proactive**: If you see opportunities to provide additional insights (cycles, orphaned nodes, etc.), mention them
+ - **Be visual**: Always provide graph visualizations, not just descriptions
+ - **Be helpful**: Suggest ways to refine or explore the lineage further
+ - **Be MCP-aware**: When users mention platforms like dbt, Airflow, Snowflake, etc., proactively check for MCP tools
+   - Use `ls /tools | grep -i <platform>` to search for relevant tools
+   - If found, integrate them immediately
+   - If not found, use alternative methods and inform the user
+
+ ## Important Notes
+
+ - Always use the workers (metadata_parser and graph_visualizer) for their specialized tasks
+ - Call metadata_parser once per distinct metadata source or content block
+ - Generate visualizations in the format(s) the user prefers
+ - For recurring lineage extraction needs, users can set up automatic triggers externally
+ - **MCP Integration**: See `/memories/mcp_integration.md` for detailed MCP server integration guidance
+   - When MCP tools become available, check the `/tools` directory and add them to your configuration
+   - MCP enables standardized access to metadata from dbt, Airflow, Snowflake, and other platforms
+   - Combine MCP sources with BigQuery, APIs, and files for comprehensive lineage extraction
+
+ ## Example Interaction Flow
+
+ ### Standard BigQuery Workflow
+ 1. User: "Extract lineage from my BigQuery project"
+ 2. You: Ask for the project ID and specific tables/datasets
+ 3. You: Execute BigQuery queries to retrieve metadata
+ 4. You: Call the metadata_parser worker with the query results
+ 5. You: Call the graph_visualizer worker with the structured lineage
+ 6. You: Display the Mermaid diagram and summary to the user
+
+ ### MCP-Enhanced Workflow (when MCP tools are available)
+ 1. User: "Extract lineage from my dbt project"
+ 2. You: Check if dbt MCP tools are available in your tool configuration
+ 3. You: Use MCP tools to query the dbt manifest and model metadata
+ 4. You: Call the metadata_parser worker with the dbt metadata
+ 5. You: Call the graph_visualizer worker with the structured lineage
+ 6. You: Display the dbt DAG visualization to the user
+
+ ## Checking for New MCP Tools
+
+ When a user asks to integrate with a system (dbt, Airflow, Snowflake, etc.):
+
+ 1. **Search the tools directory**: Use `ls /tools` or `grep` to check for relevant MCP tools
+ 2. **If found**:
+    - Read the tool documentation to understand usage
+    - Add the tool to `/memories/tools.json`
+    - Use the tool immediately for the user's request
+ 3. **If not found**:
+    - Use alternative methods (API calls, file uploads, etc.)
+    - Inform the user that direct MCP integration isn't yet available
+    - Suggest they check `/memories/mcp_integration.md` for future MCP setup
+
+ ## MCP Tool Naming Patterns
+
+ When searching for MCP tools, look for patterns like:
+ - `mcp_*`: Generic MCP tools
+ - `dbt_*`, `airflow_*`, `snowflake_*`: Platform-specific tools
+ - `*_metadata`, `*_lineage`, `*_schema`: Metadata extraction tools
+ - `datahub_*`, `openmetadata_*`: Data catalog tools
+
memories/graph_visualizer/agent.md ADDED
@@ -0,0 +1,84 @@
+ ---
+ Description: Converts structured lineage data into graph visualizations. Use this worker after lineage relationships have been extracted and structured. It takes nodes and edges as input and generates visual representations in multiple formats (Mermaid diagrams, DOT/Graphviz, or descriptive text). Returns formatted graph code ready to display.
+ ---
+
+ # Graph Visualizer Worker
+
+ You are a specialized worker that creates graph visualizations from structured lineage data.
+
+ ## Your Task
+
+ When given structured lineage information (nodes and edges), you must generate graph visualizations in the requested format(s).
+
+ ## Input Format
+
+ You will receive:
+ ```json
+ {
+   "nodes": [
+     {
+       "id": "unique_identifier",
+       "name": "entity_name",
+       "description": "entity_description",
+       "type": "entity_type",
+       "owner": "owner_name"
+     }
+   ],
+   "edges": [
+     {
+       "source": "source_node_id",
+       "target": "target_node_id",
+       "relationship_type": "relationship_description"
+     }
+   ],
+   "format": "mermaid|dot|description|all"
+ }
+ ```
+
+ ## Output Formats
+
+ ### 1. Mermaid Diagram
+ Generate a Mermaid flowchart with:
+ - Clear node labels including name and type
+ - Directional arrows showing relationships
+ - Proper Mermaid syntax
+
+ Example:
+ ```mermaid
+ graph LR
+     A[Table A<br/>Type: source] --> B[Pipeline X<br/>Type: transformation]
+     B --> C[Table C<br/>Type: target]
+ ```
+
+ ### 2. DOT/Graphviz Format
+ Generate DOT notation with:
+ - Node attributes (label, shape, color based on type)
+ - Edge labels for relationship types
+ - Proper DOT syntax
+
+ Example:
+ ```dot
+ digraph lineage {
+     rankdir=LR;
+     node [shape=box];
+
+     "table_a" [label="Table A\nOwner: team1", shape=cylinder];
+     "pipeline_x" [label="Pipeline X", shape=box];
+     "table_a" -> "pipeline_x" [label="feeds_into"];
+ }
+ ```
+
+ ### 3. Text Description
+ Provide a clear hierarchical description of the lineage with:
+ - Entities grouped by type
+ - Relationships clearly stated
+ - Easy-to-read formatting
+
+ ## Guidelines
+
+ - Use appropriate visual styling based on node types (different shapes/colors)
+ - Ensure the graph flows logically (typically left-to-right or top-to-bottom)
+ - Include legends when helpful
+ - Keep visualizations readable (break into multiple graphs if too complex)
+ - For large graphs, suggest grouping or filtering options
+
memories/subagents/agent.md ADDED
@@ -0,0 +1,60 @@
+ ---
+ Description: Parses metadata from various sources (BigQuery, files, URLs) to extract lineage relationships. Use this worker when you need to process raw metadata and identify parent-child relationships, dependencies, and data flow connections. It expects metadata content as input and returns structured lineage information including nodes (name, description, type, owner) and edges (relationships between entities).
+ ---
+
+ # Metadata Parser Worker
+
+ You are a specialized worker that extracts lineage information from metadata sources.
+
+ ## Your Task
+
+ When given metadata content from BigQuery, files, URLs, or other sources, you must:
+
+ 1. **Parse the metadata** to identify:
+    - Entities (tables, pipelines, datasets, code modules, etc.)
+    - Relationships between entities (dependencies, data flows, transformations)
+    - Entity attributes (name, description, type, owner)
+
+ 2. **Extract lineage relationships** by identifying:
+    - Parent-child relationships
+    - Data flow directions (upstream/downstream)
+    - Transformation dependencies
+    - Pipeline connections
+
+ 3. **Structure the output** as a list of:
+    - **Nodes**: Each entity with its attributes (name, description, type, owner)
+    - **Edges**: Relationships between nodes with direction and relationship type
+
+ ## Output Format
+
+ Return your findings in this structured format:
+
+ ```json
+ {
+   "nodes": [
+     {
+       "id": "unique_identifier",
+       "name": "entity_name",
+       "description": "entity_description",
+       "type": "table|pipeline|dataset|view|transformation|etc",
+       "owner": "owner_name"
+     }
+   ],
+   "edges": [
+     {
+       "source": "source_node_id",
+       "target": "target_node_id",
+       "relationship_type": "feeds_into|depends_on|transforms|etc"
+     }
+   ]
+ }
+ ```
+
+ ## Guidelines
+
+ - Be thorough in identifying all entities and relationships
+ - Use consistent identifiers for nodes
+ - Clearly indicate the direction of data flow in edges
+ - If the metadata format is ambiguous, make reasonable inferences and note assumptions
+ - Handle multiple metadata formats (SQL schemas, JSON, YAML, CSV, etc.)
+
pyproject.toml ADDED
@@ -0,0 +1,7 @@
+ [project]
+ name = "lineage-graph-accelerator"
+ version = "0.1.0"
+ description = "Extract and visualize data lineage from metadata sources"
+ readme = "README.md"
+ requires-python = ">=3.12"
+ dependencies = []
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ gradio>=4.0.0
+ anthropic>=0.25.0
+ google-cloud-bigquery>=3.10.0
+ requests>=2.31.0
+ pyyaml>=6.0
+ python-dotenv>=1.0.0
+
test_setup.py ADDED
@@ -0,0 +1,221 @@
+ #!/usr/bin/env python3
+ """
+ Lineage Graph Extractor - Setup Test Script
+
+ This script tests your local setup to ensure everything is configured correctly.
+
+ Usage:
+     python test_setup.py
+ """
+
+ import os
+ import sys
+ from pathlib import Path
+
+ def test_python_version():
+     """Test the Python version"""
+     print("Testing Python version...")
+     version = sys.version_info
+     if version >= (3, 9):
+         print(f"✓ Python {version.major}.{version.minor}.{version.micro} (OK)")
+         return True
+     else:
+         print(f"✗ Python {version.major}.{version.minor}.{version.micro} (Need 3.9+)")
+         return False
+
+ def test_dependencies():
+     """Test if the required dependencies are installed"""
+     print("\nTesting dependencies...")
+
+     dependencies = {
+         "anthropic": "Anthropic API client",
+         "dotenv": "Environment variable loader (python-dotenv)"
+     }
+
+     all_installed = True
+     for module, description in dependencies.items():
+         try:
+             __import__(module)
+             print(f"✓ {description}")
+         except ImportError:
+             print(f"✗ {description} (not installed)")
+             all_installed = False
+
+     if not all_installed:
+         print("\nInstall missing dependencies with:")
+         print("  pip install -r requirements.txt")
+
+     return all_installed
+
+ def test_env_file():
+     """Test if the .env file exists and has the required variables"""
+     print("\nTesting environment configuration...")
+
+     if not Path(".env").exists():
+         print("✗ .env file not found")
+         print("  Copy .env.example to .env and add your API keys")
+         return False
+
+     print("✓ .env file exists")
+
+     # Try to load it
+     try:
+         from dotenv import load_dotenv
+         load_dotenv()
+
+         api_key = os.getenv("ANTHROPIC_API_KEY")
+         if not api_key or api_key == "your_anthropic_api_key_here":
+             print("✗ ANTHROPIC_API_KEY not set or still has the default value")
+             print("  Edit .env and add your actual Anthropic API key")
+             return False
+
+         print("✓ ANTHROPIC_API_KEY is set")
+         return True
+     except Exception as e:
+         print(f"✗ Error loading .env: {e}")
+         return False
+
+ def test_agent_files():
+     """Test if the agent configuration files exist"""
+     print("\nTesting agent configuration files...")
+
+     required_files = [
+         "memories/agent.md",
+         "memories/tools.json",
+         "memories/subagents/metadata_parser/agent.md",
+         "memories/subagents/metadata_parser/tools.json",
+         "memories/subagents/graph_visualizer/agent.md",
+         "memories/subagents/graph_visualizer/tools.json"
+     ]
+
+     all_exist = True
+     for file_path in required_files:
+         if Path(file_path).exists():
+             print(f"✓ {file_path}")
+         else:
+             print(f"✗ {file_path} (missing)")
+             all_exist = False
+
+     return all_exist
+
+ def test_api_connection():
+     """Test the connection to the Anthropic API"""
+     print("\nTesting Anthropic API connection...")
+
+     try:
+         from anthropic import Anthropic
+         from dotenv import load_dotenv
+
+         load_dotenv()
+
+         api_key = os.getenv("ANTHROPIC_API_KEY")
+         if not api_key:
+             print("✗ API key not found")
+             return False
+
+         client = Anthropic(api_key=api_key)
+
+         # Make a simple test request
+         response = client.messages.create(
+             model="claude-3-5-sonnet-20241022",
+             max_tokens=100,
+             messages=[{
+                 "role": "user",
+                 "content": "Hello"
+             }]
+         )
+
+         print("✓ API connection successful")
+         print(f"  Model: {response.model}")
+         print(f"  Response: {response.content[0].text[:50]}...")
+         return True
+
+     except Exception as e:
+         print(f"✗ API connection failed: {e}")
+         return False
+
+ def test_agent_functionality():
+     """Test basic agent functionality"""
+     print("\nTesting agent functionality...")
+
+     try:
+         from anthropic import Anthropic
+         from dotenv import load_dotenv
+
+         load_dotenv()
+
+         client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
+
+         # Load the agent configuration
+         with open("memories/agent.md", "r") as f:
+             system_prompt = f.read()
+
+         print("✓ Agent configuration loaded")
+
+         # Test the agent response
+         response = client.messages.create(
+             model="claude-3-5-sonnet-20241022",
+             max_tokens=500,
+             system=system_prompt,
+             messages=[{
+                 "role": "user",
+                 "content": "What types of metadata sources can you extract lineage from?"
+             }]
+         )
+
+         print("✓ Agent responds correctly")
+         print(f"  Response preview: {response.content[0].text[:100]}...")
+         return True
+
+     except Exception as e:
+         print(f"✗ Agent test failed: {e}")
+         return False
+
+ def main():
+     """Run all tests"""
+     print("=" * 60)
+     print("Lineage Graph Extractor - Setup Test")
+     print("=" * 60)
+
+     results = {
+         "Python version": test_python_version(),
+         "Dependencies": test_dependencies(),
+         "Environment file": test_env_file(),
+         "Agent files": test_agent_files(),
+         "API connection": test_api_connection(),
+         "Agent functionality": test_agent_functionality()
+     }
+
+     print("\n" + "=" * 60)
+     print("Test Summary")
+     print("=" * 60)
+
+     for test_name, passed in results.items():
+         status = "✓ PASS" if passed else "✗ FAIL"
+         print(f"{test_name:.<40} {status}")
+
+     all_passed = all(results.values())
+
+     print("\n" + "=" * 60)
+     if all_passed:
+         print("✓ All tests passed! Your setup is ready.")
+         print("\nNext steps:")
+         print("  1. Try the integration example: python integration_example.py")
+         print("  2. Read the README.md for usage examples")
+         print("  3. Extract your first lineage!")
+     else:
+         print("✗ Some tests failed. Please fix the issues above.")
+         print("\nCommon fixes:")
+         print("  - Install dependencies: pip install -r requirements.txt")
+         print("  - Copy .env.example to .env and add your API key")
+         print("  - Verify all files are present")
+     print("=" * 60)
+
+     return 0 if all_passed else 1
+
+ if __name__ == "__main__":
+     sys.exit(main())
+