---
title: Lineage Graph Accelerator
emoji: 🔥
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: true
license: mit
short_description: AI data lineage extraction & export to data catalogs
tags:
  - data-lineage
  - mcp
  - gradio
  - data-governance
  - dbt
  - airflow
  - etl
  - mcp-in-action-track-productivity
  - hackathon
---

# Lineage Graph Accelerator

**AI-powered data lineage extraction and visualization for modern data platforms**

[HuggingFace Space](https://huggingface.co/spaces/aamanlamba/Lineage-graph-accelerator) | [License: MIT](https://opensource.org/licenses/MIT) | [Built with Gradio](https://gradio.app)

> **Built for the Gradio Agents & MCP Hackathon - Winter 2025**
>
> Celebrating MCP's 1st Birthday! This project demonstrates the power of MCP integration for enterprise data governance.

---

## What is Lineage Graph Accelerator?

Lineage Graph Accelerator is an AI-powered tool that helps data teams:

- **Extract** data lineage from dbt, Airflow, BigQuery, Snowflake, and more
- **Visualize** complex data dependencies with interactive Mermaid diagrams
- **Export** lineage to enterprise data catalogs (Collibra, Microsoft Purview, Alation)
- **Integrate** with MCP servers for enhanced AI-powered processing

### Why Data Lineage Matters

Understanding where your data comes from and where it goes is critical for:

- **Data Quality**: Track data transformations and identify issues
- **Compliance**: Document data flows for GDPR, CCPA, and other regulations
- **Impact Analysis**: Understand downstream effects of schema changes
- **Data Discovery**: Help analysts find and trust data assets

---

## Key Features

### Multi-Source Support

| Source | Status | Description |
|--------|--------|-------------|
| dbt Manifest | Supported | Parse dbt's manifest.json for model dependencies |
| Airflow DAG | Supported | Extract task dependencies from DAG definitions |
| SQL DDL | Supported | Parse CREATE statements for table lineage |
| BigQuery | Supported | Query INFORMATION_SCHEMA for metadata |
| Custom JSON | Supported | Flexible node/edge format for any source |
| Snowflake | Planned | Coming via MCP integration |

### Export to Data Catalogs

| Catalog | Status | Format |
|---------|--------|--------|
| OpenLineage | Supported | Universal open standard |
| Collibra | Supported | Data Intelligence Platform |
| Microsoft Purview | Supported | Azure Data Governance |
| Alation | Supported | Data Catalog |
| Apache Atlas | Planned | Coming soon |

### Visualization Options

- **Mermaid Diagrams**: Interactive, client-side rendering
- **Subgraph Grouping**: Organize by data layer (raw, staging, marts)
- **Color-Coded Nodes**: Distinguish sources, tables, models, reports
- **Edge Labels**: Show transformation types

---

## Quick Start

### Try Online (HuggingFace Space)

1. Visit [Lineage Graph Accelerator on HuggingFace](https://huggingface.co/spaces/aamanlamba/Lineage-graph-accelerator)
2. Click "Load Sample" to load example data
3. Click "Extract Lineage" to see the visualization
4. Explore the Demo Gallery for more examples

### Run Locally

```bash
# Clone the repository
git clone https://github.com/aamanlamba/lineage-graph-accelerator.git
cd lineage-graph-accelerator

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py
```

Open http://127.0.0.1:7860 in your browser.

---

## Usage Guide

### 1. Text/File Metadata Tab

Paste your metadata directly:

```json
{
  "nodes": [
    {"id": "source_db", "type": "source", "name": "Source Database"},
    {"id": "staging", "type": "table", "name": "Staging Table"},
    {"id": "analytics", "type": "table", "name": "Analytics Table"}
  ],
  "edges": [
    {"from": "source_db", "to": "staging"},
    {"from": "staging", "to": "analytics"}
  ]
}
```

### 2. Sample Data

Load pre-built samples to explore different scenarios:

- **Simple JSON**: Basic node/edge lineage
- **dbt Manifest**: Full dbt project with 15+ models
- **Airflow DAG**: ETL pipeline with 15 tasks
- **Data Warehouse**: Snowflake-style multi-layer architecture
- **ETL Pipeline**: Complex multi-source pipeline
- **Complex Demo**: 50+ node e-commerce platform

### 3. Export to Data Catalogs

1. Extract lineage from your metadata
2. Expand "Export to Data Catalog"
3. Select format (OpenLineage, Collibra, Purview, Alation)
4. Click "Generate Export"
5. Copy the JSON for import into your catalog

---

## MCP Integration

Connect to MCP (Model Context Protocol) servers for enhanced processing:

```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│  Lineage Graph  │──▶│   MCP Server    │──▶│    AI Model     │
│   Accelerator   │   │  (HuggingFace)  │   │    (Claude)     │
└─────────────────┘   └─────────────────┘   └─────────────────┘
```

### Configuration

1. Expand "MCP Server Configuration" in the UI
2. Enter your MCP server URL
3. Add API key (if required)
4. Click "Test Connection"

### Run Local MCP Server

```bash
uvicorn mcp_example.server:app --reload --port 9000
```

Then use `http://localhost:9000/mcp` as your server URL.

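
To make the custom node/edge JSON format from the Usage Guide more concrete, the sketch below shows one way such a document could be turned into Mermaid flowchart text. It is an illustration under stated assumptions, not the app's actual generator: the function names `lineage_to_mermaid` and `_safe_id` are hypothetical, and only the `nodes`/`edges` keys from the example metadata above are taken from this README.

```python
import json
import re


def _safe_id(raw_id: str) -> str:
    """Hypothetical helper: keep Mermaid node IDs to letters, digits, and underscores."""
    return re.sub(r"\W", "_", raw_id)


def lineage_to_mermaid(metadata: dict) -> str:
    """Render a {"nodes": [...], "edges": [...]} document as Mermaid flowchart text."""
    lines = ["flowchart TD"]
    for node in metadata.get("nodes", []):
        # Fall back to the node ID when no display name is given.
        label = node.get("name", node["id"]).replace('"', "'")
        lines.append(f'    {_safe_id(node["id"])}["{label}"]')
    for edge in metadata.get("edges", []):
        lines.append(f'    {_safe_id(edge["from"])} --> {_safe_id(edge["to"])}')
    return "\n".join(lines)


# The example metadata from the "Text/File Metadata Tab" section.
sample = json.loads("""
{
  "nodes": [
    {"id": "source_db", "type": "source", "name": "Source Database"},
    {"id": "staging", "type": "table", "name": "Staging Table"},
    {"id": "analytics", "type": "table", "name": "Analytics Table"}
  ],
  "edges": [
    {"from": "source_db", "to": "staging"},
    {"from": "staging", "to": "analytics"}
  ]
}
""")

print(lineage_to_mermaid(sample))
```

Pasting the printed text into any Mermaid renderer should draw the three-node chain. This sketch ignores the `type` field; the color coding and subgraph grouping described under Visualization Options would need extra handling.
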

---

## Architecture

```mermaid
flowchart TD
    A[User Interface - Gradio] --> B[Input Parser]
    B --> C{Source Type}
    C -->|dbt| D[dbt Parser]
    C -->|Airflow| E[Airflow Parser]
    C -->|SQL| F[SQL Parser]
    C -->|JSON| G[JSON Parser]
    D & E & F & G --> H[LineageGraph]
    H --> I[Mermaid Generator]
    H --> J[Export Engine]
    I --> K[Visualization]
    J --> L[OpenLineage]
    J --> M[Collibra]
    J --> N[Purview]
    J --> O[Alation]
    subgraph Optional
        P[MCP Server] --> H
    end
```

### Project Structure

```
lineage-graph-accelerator/
├── app.py                          # Main Gradio application
├── exporters/                      # Data catalog exporters
│   ├── __init__.py
│   ├── base.py                     # Base classes
│   ├── openlineage.py              # OpenLineage format
│   ├── collibra.py                 # Collibra format
│   ├── purview.py                  # Microsoft Purview format
│   └── alation.py                  # Alation format
├── samples/                        # Sample data files
│   ├── sample_metadata.json
│   ├── dbt_manifest_sample.json
│   ├── airflow_dag_sample.json
│   ├── sql_ddl_sample.sql
│   ├── warehouse_lineage_sample.json
│   ├── etl_pipeline_sample.json
│   └── complex_lineage_demo.json
├── mcp_example/                    # Example MCP server
│   └── server.py
├── tests/                          # Unit tests
│   └── test_app.py
├── memories/                       # Agent configuration
├── USER_GUIDE.md                   # Comprehensive user guide
├── BUILD_PLAN.md                   # Development roadmap
└── requirements.txt
```

---

## Testing

```bash
# Activate virtual environment
source .venv/bin/activate

# Run unit tests
python -m unittest tests.test_app -v

# Run setup validation
python test_setup.py
```

---

## Requirements

- Python 3.9+
- Gradio 5.49.1+
- See `requirements.txt` for full dependencies

---

## Competition Submission

**Track**: Track 2 - MCP in Action (Productivity)

**Team Members**:

- [Aaman Lamba](https://aamanlamba.com) | [HuggingFace](https://huggingface.co/aamanlamba) | [GitHub](https://github.com/aamanlamba)

### Judging Criteria Alignment

| Criteria | Implementation |
|----------|----------------|
| **UI/UX Design** | Clean, professional interface with tabs, accordions, and color-coded visualizations |
| **Functionality** | Full MCP integration, multiple input formats, 5 export formats |
| **Creativity** | Novel approach to data lineage visualization with AI-powered parsing |
| **Documentation** | Comprehensive README, USER_GUIDE.md, inline comments |
| **Real-world Impact** | Solves critical enterprise need for data governance and compliance |

### Demo Video

- **YouTube**: [Watch the Demo](https://youtu.be/U4Dfc7txa_0)
- **Loom**: [Alternative Link](https://www.loom.com/share/3de27e88e01f4e97bfd13e4f0031f416)

**Highlights**:

- AI Assistant with Google Gemini generating lineage from natural language
- MCP Integration with Local Demo server
- Demo Gallery with 50+ node complex pipelines
- Export to Collibra, Purview, and Apache Atlas
- Interactive Mermaid visualizations with zoom and download

### Social Media Post

**LinkedIn**: [View the announcement post](https://www.linkedin.com/posts/aamanlamba_lineage-graph-accelerator-a-hugging-face-activity-7400658296166297600-n9a6)

---

## Roadmap

- [x] Gradio 6 upgrade for enhanced UI components
- [x] Agentic chatbot for natural language queries (Google Gemini)
- [x] Apache Atlas export support
- [ ] File upload functionality
- [x] Graph export as PNG/SVG
- [ ] Batch processing API
- [ ] Column-level lineage

---

## Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## License

MIT License - see [LICENSE](LICENSE) for details.


---

## Acknowledgments

- **Anthropic** - MCP Protocol and Claude
- **Gradio Team** - Amazing UI framework
- **HuggingFace** - Hosting and community
- **dbt Labs** - Inspiration for metadata standards
- **OpenLineage** - Open lineage specification

---

## Support

- **Documentation**: [USER_GUIDE.md](USER_GUIDE.md)
- **Author Website**: [aamanlamba.com](https://aamanlamba.com)
- **Issues**: [GitHub Issues](https://github.com/aamanlamba/lineage-graph-accelerator/issues)
- **Discussion**: [HuggingFace Community](https://huggingface.co/spaces/aamanlamba/Lineage-graph-accelerator/discussions)

---

Built with ❤️ by Aaman Lamba for the Gradio Agents & MCP Hackathon - Winter 2025

Celebrating MCP's 1st Birthday!