Commit
·
ffe0724
1
Parent(s):
b304992
Add Google Gemini AI Assistant chatbot
Browse files

Features:
- AI Assistant tab powered by Google Gemini (gemini-1.5-flash)
- Natural language lineage generation from pipeline descriptions
- Auto-extract JSON from AI responses
- One-click transfer of generated JSON to lineage tool
- Updated hero section and footer with AI Assistant info
Sponsor integration: Google Gemini API
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- README.md +1 -1
- app.py +209 -0
- requirements.txt +1 -0
README.md
CHANGED
|
@@ -290,7 +290,7 @@ python test_setup.py
|
|
| 290 |
## 🔜 Roadmap
|
| 291 |
|
| 292 |
- [x] Gradio 6 upgrade for enhanced UI components
|
| 293 |
-
- [ ] Agentic chatbot for natural language queries
|
| 294 |
- [x] Apache Atlas export support
|
| 295 |
- [ ] File upload functionality
|
| 296 |
- [x] Graph export as PNG/SVG
|
|
|
|
| 290 |
## 🔜 Roadmap
|
| 291 |
|
| 292 |
- [x] Gradio 6 upgrade for enhanced UI components
|
| 293 |
+
- [x] Agentic chatbot for natural language queries (Google Gemini)
|
| 294 |
- [x] Apache Atlas export support
|
| 295 |
- [ ] File upload functionality
|
| 296 |
- [x] Graph export as PNG/SVG
|
app.py
CHANGED
|
@@ -22,6 +22,13 @@ try:
|
|
| 22 |
except ImportError:
|
| 23 |
EXPORTERS_AVAILABLE = False
|
| 24 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 25 |
# ============================================================================
|
| 26 |
# Constants and Configuration
|
| 27 |
# ============================================================================
|
|
@@ -773,6 +780,127 @@ def extract_lineage_from_url(
|
|
| 773 |
return render_mermaid(viz), f"Lineage from URL: {url or 'not specified'}"
|
| 774 |
|
| 775 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 776 |
# ============================================================================
|
| 777 |
# Gradio UI
|
| 778 |
# ============================================================================
|
|
@@ -797,6 +925,7 @@ with gr.Blocks(
|
|
| 797 |
| **Visualize** | Generate interactive Mermaid diagrams with color-coded nodes and relationship labels |
|
| 798 |
| **Export** | Export to enterprise data catalogs: OpenLineage, Collibra, Purview, Alation, Atlas |
|
| 799 |
| **MCP Integration** | Connect to MCP servers for AI-powered metadata extraction |
|
|
|
|
| 800 |
|
| 801 |
### Quick Start
|
| 802 |
|
|
@@ -1046,6 +1175,85 @@ with gr.Blocks(
|
|
| 1046 |
outputs=[demo_viz, demo_summary]
|
| 1047 |
)
|
| 1048 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1049 |
# Footer
|
| 1050 |
gr.Markdown("""
|
| 1051 |
---
|
|
@@ -1058,6 +1266,7 @@ with gr.Blocks(
|
|
| 1058 |
| **Collibra** | Collibra Data Intelligence | Enterprise data governance |
|
| 1059 |
| **Purview** | Microsoft Purview | Azure ecosystem |
|
| 1060 |
| **Alation** | Alation Data Catalog | Self-service analytics |
|
|
|
|
| 1061 |
|
| 1062 |
---
|
| 1063 |
|
|
|
|
| 22 |
except ImportError:
|
| 23 |
EXPORTERS_AVAILABLE = False
|
| 24 |
|
| 25 |
+
# Import Google Gemini for agentic chatbot
|
| 26 |
+
try:
|
| 27 |
+
import google.generativeai as genai
|
| 28 |
+
GEMINI_AVAILABLE = True
|
| 29 |
+
except ImportError:
|
| 30 |
+
GEMINI_AVAILABLE = False
|
| 31 |
+
|
| 32 |
# ============================================================================
|
| 33 |
# Constants and Configuration
|
| 34 |
# ============================================================================
|
|
|
|
| 780 |
return render_mermaid(viz), f"Lineage from URL: {url or 'not specified'}"
|
| 781 |
|
| 782 |
|
| 783 |
+
# ============================================================================
|
| 784 |
+
# Gemini Agentic Chatbot
|
| 785 |
+
# ============================================================================
|
| 786 |
+
|
| 787 |
+
LINEAGE_AGENT_PROMPT = """You are a Data Lineage Assistant powered by the Lineage Graph Accelerator tool.
|
| 788 |
+
You help users understand, extract, and visualize data lineage from various sources.
|
| 789 |
+
|
| 790 |
+
Your capabilities:
|
| 791 |
+
1. **Extract Lineage**: Parse metadata from dbt manifests, Airflow DAGs, SQL DDL, and custom JSON
|
| 792 |
+
2. **Explain Lineage**: Help users understand data flow and dependencies
|
| 793 |
+
3. **Generate Metadata**: Create lineage JSON from natural language descriptions
|
| 794 |
+
4. **Export Guidance**: Advise on exporting to data catalogs (OpenLineage, Collibra, Purview, Alation, Atlas)
|
| 795 |
+
|
| 796 |
+
When users describe their data pipeline, generate valid JSON lineage in this format:
|
| 797 |
+
```json
|
| 798 |
+
{
|
| 799 |
+
"nodes": [
|
| 800 |
+
{"id": "unique_id", "type": "source|table|model|view|report", "name": "Display Name"}
|
| 801 |
+
],
|
| 802 |
+
"edges": [
|
| 803 |
+
{"from": "source_id", "to": "target_id"}
|
| 804 |
+
]
|
| 805 |
+
}
|
| 806 |
+
```
|
| 807 |
+
|
| 808 |
+
Node types: source, table, model, view, report, dimension, fact, destination, task
|
| 809 |
+
|
| 810 |
+
Be helpful, concise, and always offer to generate lineage JSON when users describe data flows.
|
| 811 |
+
If the user provides metadata or describes a pipeline, generate the JSON they can paste into the tool."""
|
| 812 |
+
|
| 813 |
+
|
| 814 |
+
def init_gemini(api_key: str) -> bool:
|
| 815 |
+
"""Initialize Gemini with the provided API key."""
|
| 816 |
+
if not GEMINI_AVAILABLE:
|
| 817 |
+
return False
|
| 818 |
+
if not api_key:
|
| 819 |
+
return False
|
| 820 |
+
try:
|
| 821 |
+
genai.configure(api_key=api_key)
|
| 822 |
+
return True
|
| 823 |
+
except Exception:
|
| 824 |
+
return False
|
| 825 |
+
|
| 826 |
+
|
| 827 |
+
def chat_with_gemini(
|
| 828 |
+
message: str,
|
| 829 |
+
history: List[Dict[str, str]],
|
| 830 |
+
api_key: str
|
| 831 |
+
) -> Tuple[List[Dict[str, str]], str]:
|
| 832 |
+
"""Chat with Gemini about data lineage."""
|
| 833 |
+
if not GEMINI_AVAILABLE:
|
| 834 |
+
return history + [
|
| 835 |
+
{"role": "user", "content": message},
|
| 836 |
+
{"role": "assistant", "content": "Google Gemini is not available. Please install google-generativeai package."}
|
| 837 |
+
], ""
|
| 838 |
+
|
| 839 |
+
if not api_key:
|
| 840 |
+
return history + [
|
| 841 |
+
{"role": "user", "content": message},
|
| 842 |
+
{"role": "assistant", "content": "Please enter your Google Gemini API key to use the chatbot. You can get one at https://makersuite.google.com/app/apikey"}
|
| 843 |
+
], ""
|
| 844 |
+
|
| 845 |
+
try:
|
| 846 |
+
genai.configure(api_key=api_key)
|
| 847 |
+
model = genai.GenerativeModel('gemini-1.5-flash')
|
| 848 |
+
|
| 849 |
+
# Build conversation history for context
|
| 850 |
+
chat_history = []
|
| 851 |
+
for msg in history:
|
| 852 |
+
role = "user" if msg.get("role") == "user" else "model"
|
| 853 |
+
chat_history.append({"role": role, "parts": [msg.get("content", "")]})
|
| 854 |
+
|
| 855 |
+
# Start chat with history
|
| 856 |
+
chat = model.start_chat(history=chat_history)
|
| 857 |
+
|
| 858 |
+
# Send message with system prompt context
|
| 859 |
+
full_prompt = f"{LINEAGE_AGENT_PROMPT}\n\nUser query: {message}"
|
| 860 |
+
response = chat.send_message(full_prompt)
|
| 861 |
+
|
| 862 |
+
assistant_message = response.text
|
| 863 |
+
|
| 864 |
+
# Extract any JSON from the response for the metadata field
|
| 865 |
+
extracted_json = ""
|
| 866 |
+
if "```json" in assistant_message:
|
| 867 |
+
try:
|
| 868 |
+
json_start = assistant_message.find("```json") + 7
|
| 869 |
+
json_end = assistant_message.find("```", json_start)
|
| 870 |
+
if json_end > json_start:
|
| 871 |
+
extracted_json = assistant_message[json_start:json_end].strip()
|
| 872 |
+
except Exception:
|
| 873 |
+
pass
|
| 874 |
+
|
| 875 |
+
new_history = history + [
|
| 876 |
+
{"role": "user", "content": message},
|
| 877 |
+
{"role": "assistant", "content": assistant_message}
|
| 878 |
+
]
|
| 879 |
+
|
| 880 |
+
return new_history, extracted_json
|
| 881 |
+
|
| 882 |
+
except Exception as e:
|
| 883 |
+
error_msg = f"Error communicating with Gemini: {str(e)}"
|
| 884 |
+
return history + [
|
| 885 |
+
{"role": "user", "content": message},
|
| 886 |
+
{"role": "assistant", "content": error_msg}
|
| 887 |
+
], ""
|
| 888 |
+
|
| 889 |
+
|
| 890 |
+
def use_generated_json(json_text: str) -> Tuple[str, str, str]:
|
| 891 |
+
"""Use the generated JSON in the lineage extractor."""
|
| 892 |
+
if not json_text.strip():
|
| 893 |
+
return "", "", "No JSON to use. Ask the chatbot to generate lineage JSON first."
|
| 894 |
+
|
| 895 |
+
try:
|
| 896 |
+
# Validate JSON
|
| 897 |
+
json.loads(json_text)
|
| 898 |
+
# Return the JSON to be used in the main tab
|
| 899 |
+
return json_text, "Custom JSON", "JSON copied to metadata input. Switch to 'Text/File Metadata' tab and click 'Extract Lineage'."
|
| 900 |
+
except json.JSONDecodeError as e:
|
| 901 |
+
return "", "", f"Invalid JSON: {str(e)}"
|
| 902 |
+
|
| 903 |
+
|
| 904 |
# ============================================================================
|
| 905 |
# Gradio UI
|
| 906 |
# ============================================================================
|
|
|
|
| 925 |
| **Visualize** | Generate interactive Mermaid diagrams with color-coded nodes and relationship labels |
|
| 926 |
| **Export** | Export to enterprise data catalogs: OpenLineage, Collibra, Purview, Alation, Atlas |
|
| 927 |
| **MCP Integration** | Connect to MCP servers for AI-powered metadata extraction |
|
| 928 |
+
| **AI Assistant** | Chat with Gemini to generate lineage from natural language descriptions |
|
| 929 |
|
| 930 |
### Quick Start
|
| 931 |
|
|
|
|
| 1175 |
outputs=[demo_viz, demo_summary]
|
| 1176 |
)
|
| 1177 |
|
| 1178 |
+
# Tab 5: AI Chatbot (Gemini)
|
| 1179 |
+
with gr.Tab("AI Assistant", id="chatbot"):
|
| 1180 |
+
gr.Markdown("""
|
| 1181 |
+
## Lineage AI Assistant (Powered by Google Gemini)
|
| 1182 |
+
|
| 1183 |
+
Ask questions about data lineage, describe your data pipeline in natural language,
|
| 1184 |
+
and get JSON metadata you can use to visualize lineage.
|
| 1185 |
+
|
| 1186 |
+
**Examples:**
|
| 1187 |
+
- "I have a PostgreSQL database that feeds into a Spark ETL job, which outputs to a Snowflake warehouse"
|
| 1188 |
+
- "Generate lineage for a dbt project with staging, intermediate, and mart layers"
|
| 1189 |
+
- "What's the best way to document column-level lineage?"
|
| 1190 |
+
""")
|
| 1191 |
+
|
| 1192 |
+
with gr.Row():
|
| 1193 |
+
with gr.Column(scale=2):
|
| 1194 |
+
gemini_api_key = gr.Textbox(
|
| 1195 |
+
label="Google Gemini API Key",
|
| 1196 |
+
placeholder="Enter your Gemini API key (get one at makersuite.google.com)",
|
| 1197 |
+
type="password",
|
| 1198 |
+
info="Your API key is not stored and only used for this session"
|
| 1199 |
+
)
|
| 1200 |
+
|
| 1201 |
+
chatbot_display = gr.Chatbot(
|
| 1202 |
+
label="Chat with Lineage AI",
|
| 1203 |
+
height=400
|
| 1204 |
+
)
|
| 1205 |
+
|
| 1206 |
+
with gr.Row():
|
| 1207 |
+
chat_input = gr.Textbox(
|
| 1208 |
+
label="Your message",
|
| 1209 |
+
placeholder="Describe your data pipeline or ask about lineage...",
|
| 1210 |
+
lines=2,
|
| 1211 |
+
scale=4
|
| 1212 |
+
)
|
| 1213 |
+
send_btn = gr.Button("Send", variant="primary", scale=1)
|
| 1214 |
+
|
| 1215 |
+
with gr.Accordion("Generated JSON (if any)", open=False):
|
| 1216 |
+
generated_json = gr.Code(
|
| 1217 |
+
label="Extracted JSON",
|
| 1218 |
+
language="json",
|
| 1219 |
+
lines=10
|
| 1220 |
+
)
|
| 1221 |
+
use_json_btn = gr.Button("Use This JSON in Lineage Tool", size="sm")
|
| 1222 |
+
json_status = gr.Textbox(label="Status", interactive=False)
|
| 1223 |
+
|
| 1224 |
+
# Chat handlers
|
| 1225 |
+
chat_state = gr.State([])
|
| 1226 |
+
|
| 1227 |
+
def handle_chat(message, history, api_key):
|
| 1228 |
+
if not message.strip():
|
| 1229 |
+
return history, "", history
|
| 1230 |
+
new_history, extracted = chat_with_gemini(message, history, api_key)
|
| 1231 |
+
return new_history, extracted, new_history
|
| 1232 |
+
|
| 1233 |
+
send_btn.click(
|
| 1234 |
+
fn=handle_chat,
|
| 1235 |
+
inputs=[chat_input, chat_state, gemini_api_key],
|
| 1236 |
+
outputs=[chatbot_display, generated_json, chat_state]
|
| 1237 |
+
).then(
|
| 1238 |
+
fn=lambda: "",
|
| 1239 |
+
outputs=[chat_input]
|
| 1240 |
+
)
|
| 1241 |
+
|
| 1242 |
+
chat_input.submit(
|
| 1243 |
+
fn=handle_chat,
|
| 1244 |
+
inputs=[chat_input, chat_state, gemini_api_key],
|
| 1245 |
+
outputs=[chatbot_display, generated_json, chat_state]
|
| 1246 |
+
).then(
|
| 1247 |
+
fn=lambda: "",
|
| 1248 |
+
outputs=[chat_input]
|
| 1249 |
+
)
|
| 1250 |
+
|
| 1251 |
+
use_json_btn.click(
|
| 1252 |
+
fn=use_generated_json,
|
| 1253 |
+
inputs=[generated_json],
|
| 1254 |
+
outputs=[metadata_input, source_type, json_status]
|
| 1255 |
+
)
|
| 1256 |
+
|
| 1257 |
# Footer
|
| 1258 |
gr.Markdown("""
|
| 1259 |
---
|
|
|
|
| 1266 |
| **Collibra** | Collibra Data Intelligence | Enterprise data governance |
|
| 1267 |
| **Purview** | Microsoft Purview | Azure ecosystem |
|
| 1268 |
| **Alation** | Alation Data Catalog | Self-service analytics |
|
| 1269 |
+
| **Atlas** | Apache Atlas | Open-source governance |
|
| 1270 |
|
| 1271 |
---
|
| 1272 |
|
requirements.txt
CHANGED
|
@@ -1,5 +1,6 @@
|
|
| 1 |
gradio>=6.0.0
|
| 2 |
anthropic>=0.25.0
|
| 3 |
google-cloud-bigquery>=3.10.0
|
|
|
|
| 4 |
requests>=2.31.0
|
| 5 |
pyyaml>=6.0
|
|
|
|
| 1 |
gradio>=6.0.0
|
| 2 |
anthropic>=0.25.0
|
| 3 |
google-cloud-bigquery>=3.10.0
|
| 4 |
+
google-generativeai>=0.8.0
|
| 5 |
requests>=2.31.0
|
| 6 |
pyyaml>=6.0
|