AI & ML interests

Open science and open source

Recent Activity

Parveshiiii 
posted an update 18 days ago
Another banger from XenArcAI! 🔥

We’re thrilled to unveil three powerful new releases that push the boundaries of AI research and development:

🔗 XenArcAI/SparkEmbedding-300m

- A lightning-fast embedding model built for scale.
- Optimized for semantic search, clustering, and representation learning.

🔗 XenArcAI/CodeX-7M-Non-Thinking

- A massive dataset of 7 million code samples.
- Designed for training models on raw coding patterns without reasoning layers.

🔗 XenArcAI/CodeX-2M-Thinking

- A curated dataset of 2 million code samples.
- Focused on reasoning-driven coding tasks, enabling smarter AI coding assistants.

Together, these projects represent a leap forward in building smarter, faster, and more capable AI systems.

💡 Innovation meets dedication.
🌍 Knowledge meets responsibility.


Parveshiiii 
posted an update 26 days ago
SparkEmbedding: SoTA cross-lingual retrieval

I am very happy to announce our latest embedding model, SparkEmbedding-300m, based on embeddinggemma-300m. We fine-tuned it on 1M additional examples spanning 119 languages, and the resulting model achieves exceptional cross-lingual retrieval.

Model: XenArcAI/SparkEmbedding-300m
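Cross-lingual retrieval with an embedding model comes down to ranking documents by cosine similarity to the query vector, regardless of each text's language. The sketch below assumes the vectors have already been produced by an embedding model; the tiny 3-dimensional vectors are hand-made stand-ins (hypothetical), not real SparkEmbedding-300m outputs.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings: an English query against documents in other languages.
query = [0.9, 0.1, 0.0]  # "weather forecast"
docs = {
    "es: pronóstico del tiempo": [0.85, 0.15, 0.05],
    "de: Fußballergebnisse": [0.05, 0.2, 0.95],
    "hi: मौसम का पूर्वानुमान": [0.8, 0.2, 0.1],
}

for doc_id, score in rank(query, docs):
    print(f"{score:.3f}  {doc_id}")
```

With a well-aligned multilingual model, the Spanish and Hindi weather documents land near the English query while the unrelated German document scores low.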
Parveshiiii 
posted an update about 2 months ago
AIRealNet: SoTA AI-generated image detection model

We’re proud to release AIRealNet — a binary image classifier built to detect whether an image is AI-generated or a real human photograph. Based on SwinV2 and fine-tuned on the AI-vs-Real dataset, this model is optimized for high-accuracy classification across diverse visual domains.

If you care about synthetic media detection or want to explore the frontier of AI vs human realism, we’d love your support. Please like the model and try it out. Every download helps us improve and expand future versions.

Model page: XenArcAI/AIRealNet
Sri-Vigneshwar-DJ 
posted an update 2 months ago
Do you think domain-specific embedding fine-tuners are needed?
I've been working with embeddings for marketing use cases and noticed something: most embeddings don't get marketing concepts very well. They're trained in general-purpose ways.
The Issue I'm Seeing
When I search marketing content with general embeddings:

"organic growth" returns farming articles
"conversion funnel" matches industrial equipment
"brand lift" doesn't connect to campaign effectiveness
Marketing jargon like CAC, ROAS, and CTR isn't properly understood

My Question
Do you think domain-specific embeddings are needed for marketing?
Some thoughts:

Marketing has its own vocabulary and concept relationships
General models trained on Wikipedia/web crawl miss these nuances
But is fine-tuning worth the effort vs just using more retrieval tricks?

Quick Example
I fine-tuned all-mpnet-base-v2 on ~1000 marketing concept pairs and saw 15-20% better retrieval accuracy. But I'm curious:

Has anyone else tried this for marketing or other domains?
When do you think domain-specific embeddings are actually necessary vs overkill?
Are there better approaches I'm missing?
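The post doesn't say exactly how "retrieval accuracy" was measured; one common choice for concept pairs is top-1 accuracy, where each query counts as a hit only if its paired document ranks first. The sketch below assumes that metric, and the 2-dimensional vectors are hypothetical stand-ins for all-mpnet-base-v2 embeddings before and after fine-tuning.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top1_accuracy(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """query_vecs[i] is paired with doc_vecs[i]; count how often that pair ranks first."""
    hits = 0
    for i, q in enumerate(query_vecs):
        scores = [cosine(q, d) for d in doc_vecs]
        best = max(range(len(doc_vecs)), key=scores.__getitem__)
        if best == i:
            hits += 1
    return hits / len(query_vecs)


# Hypothetical docs: 0 = "unpaid customer acquisition", 1 = "stages from visit to purchase".
docs = [[1.0, 0.0], [0.0, 1.0]]
# Queries: 0 = "organic growth", 1 = "conversion funnel".
baseline_queries = [[0.2, 0.8], [0.1, 0.9]]  # "organic growth" drifts toward the wrong doc
tuned_queries = [[0.9, 0.1], [0.1, 0.9]]     # after domain fine-tuning (toy values)

print(top1_accuracy(baseline_queries, docs), top1_accuracy(tuned_queries, docs))
```

Running the same evaluation set through both models is what makes a before/after comparison like the 15-20% figure meaningful.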

https://huggingface.co/blog/Sri-Vigneshwar-DJ/why-your-marketing-rag-system-needs-domain-specifi
Parveshiiii 
posted an update 2 months ago
Ever wanted an open‑source deep research agent? Meet Deepresearch‑Agent 🔍🤖

1. Multi‑step reasoning: Reflects between steps, fills gaps, iterates until evidence is solid.

2. Research‑augmented: Generates queries, searches, synthesizes, and cites sources.

3. Fullstack + LLM‑friendly: React/Tailwind frontend, LangGraph/FastAPI backend; works with OpenAI/Gemini.


🔗 GitHub: https://github.com/Parveshiiii/Deepresearch-Agent
Sri-Vigneshwar-DJ 
posted an update 2 months ago
🚀 Exciting News! We've released a Performance Marketing Expert Dataset from Hawky.ai (www.hawky.ai, Hugging Face: Hawky-ai).


This dataset empowers AI models with cutting-edge strategies for Meta, Google Ads, and TikTok campaigns. It includes:
1. Multi-platform strategies for e-commerce, DTC, B2B, and more
2. Creative optimization and audience targeting insights
3. ROI-driven recommendations based on 2025 best practices

Sri-Vigneshwar-DJ/Performance-Marketing-Data
Sri-Vigneshwar-DJ 
posted an update 2 months ago
🚀 Qwen3-Omni for Marketing: A Game-Changer

Just wanted to share something exciting I've been exploring—Qwen3-Omni and how it's transforming marketing workflows.

What makes it special? At Hawky.ai, we recently started experimenting with Qwen3 for analysis and optimization.

Unlike traditional tools that look at text, images, or audio separately, Qwen3-Omni analyzes everything together. It handles 119 languages, processes 40-minute audio sequences, and understands both images and videos—all at once.

The cool part? It's 2-3x faster than similar models thanks to its MoE architecture.
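The speed argument for MoE is that each token activates only a few experts rather than the whole network. The sketch below shows generic top-k gating with toy scalar "experts"; the expert count, `k=2`, and router logits are hypothetical and are not Qwen3-Omni's actual router or configuration.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


calls: list[float] = []  # records which experts actually ran


def make_expert(scale: float):
    def expert(x: float) -> float:
        calls.append(scale)
        return scale * x
    return expert


def moe_forward(x: float, router_logits: list[float], experts, k: int = 2):
    """Run only the top-k experts and mix their outputs by renormalized gate weights."""
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])  # softmax over the chosen experts only
    output = sum(g * experts[i](x) for g, i in zip(gates, top))
    return output, top


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
out, chosen = moe_forward(1.0, [0.1, 2.0, 0.5, 1.0], experts, k=2)
print(out, chosen, f"experts run: {len(calls)} of {len(experts)}")
```

Only two of the four experts execute, so compute per token scales with `k`, not with the total number of experts.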

Real applications I'm seeing:
Ad Analysis: It scores video ads by combining visual elements, audio tone, and text—giving 25% better CTR predictions than single-mode tools.
Campaign Localization: Drop in one ad, get 10 localized versions with native voiceovers in under a minute. Perfect for testing across markets.

Market Research: Feed it competitor content, podcasts, or UGC videos. It extracts actionable insights like "3-second hooks boost retention by 15%" and saves about 70% of analysis time.

Quality Checks: Automatically catches lip-sync errors and audio-visual mismatches.

Full technical breakdown: https://huggingface.co/blog/Sri-Vigneshwar-DJ/hawky-aiqwen3-omni-advanced-architecture-and-marke

Has anyone else been experimenting with multimodal models for marketing? Would love to hear what you're building!

#MultimodalAI #MarTech #OpenSource
Parveshiiii 
posted an update 2 months ago
🚀 Big news from XenArcAI!

We’ve just released our new dataset: **Bhagwat‑Gita‑Infinity** 🌸📖

✨ What’s inside:
- Verse‑aligned Sanskrit, Hindi, and English
- Clean, structured, and ready for ML/AI projects
- Perfect for research, education, and open‑source exploration

🔗 Hugging Face: XenArcAI/Bhagwat-Gita-Infinity

Let’s bring timeless wisdom into modern AI together 🙌
Parveshiiii 
posted an update 2 months ago
🚀 New Release from XenArcAI
We’re excited to introduce AIRealNet — our SwinV2‑based image classifier built to distinguish between artificial and real images.

✨ Highlights:
- Backbone: SwinV2
- Input size: 256×256
- Labels: artificial vs. real
- Performance: Accuracy 0.999 | F1 0.999 | Val Loss 0.0063
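For readers checking the reported numbers, accuracy and F1 for a binary classifier are computed from the confusion counts as below. Treating "artificial" as the positive class is an assumption, and the five labels are toy values, not AIRealNet's validation data.

```python
def binary_metrics(y_true: list[str], y_pred: list[str], positive: str = "artificial") -> dict:
    """Accuracy, precision, recall, and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical).
y_true = ["artificial", "artificial", "real", "real", "artificial"]
y_pred = ["artificial", "real", "real", "artificial", "artificial"]
print(binary_metrics(y_true, y_pred))
```

Because F1 combines precision and recall on the positive class, it exposes errors that accuracy can hide on imbalanced data; scores of 0.999 on both suggest very few errors of either kind on the validation split.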

This model is now live on Hugging Face:
👉 XenArcAI/AIRealNet

We built AIRealNet to push forward open‑source tools for authenticity detection, and we can’t wait to see how the community uses it.
freddyaboulton 
posted an update 3 months ago
Parveshiiii 
posted an update 4 months ago
🚀 Just Dropped: MathX-5M — Your Gateway to Math-Savvy GPTs

👨‍🔬 Wanna fine-tune your own GPT for math?
🧠 Building a reasoning agent that actually *thinks*?
📊 Benchmarking multi-step logic across domains?

Say hello to **MathX-5M** (XenArcAI/MathX-5M), a **5 million+ sample** dataset crafted for training and evaluating math reasoning models at scale.

Built by **XenArcAI**, it’s optimized for:
- 🔍 Step-by-step reasoning in structured formats
- 🧮 Coverage from arithmetic to advanced algebra and geometry
- 🧰 Plug-and-play with Gemma, Qwen, Mistral, and other open LLMs
- 🧵 Compatible with Harmony, Alpaca, and OpenChat-style instruction formats
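As an example of the instruction formats listed above, here is a sketch that renders one math sample in the standard Alpaca (no-input) template. The `problem` and `solution` keys are hypothetical column names; check the MathX-5M dataset card for the actual fields.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{response}"
)


def to_alpaca(sample: dict) -> str:
    """Render one sample as an Alpaca-style training string.

    'problem' and 'solution' are hypothetical column names.
    """
    return ALPACA_TEMPLATE.format(
        instruction=sample["problem"].strip(),
        response=sample["solution"].strip(),
    )


example = {"problem": "Solve 2x + 3 = 11.", "solution": "2x = 8, so x = 4."}
print(to_alpaca(example))
```

At inference time the same template is used with the response left empty, so the model learns to continue after `### Response:`.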

Whether you're prototyping a math tutor, testing agentic workflows, or just want your GPT to solve equations like a pro—**MathX-5M is your launchpad**.

🔗 Dive in: XenArcAI/MathX-5M

Let’s make open-source models *actually* smart at math.
#FineTuneYourGPT #MathX5M #OpenSourceAI #LLM #XenArcAI #Reasoning #Gemma #Qwen #Mistral

Parveshiiii 
posted an update 4 months ago
🚀 Launch Alert: Dev-Stack-Agents
Meet your 50-agent senior AI team — principal-level experts in engineering, AI, DevOps, security, product, and more — all bundled into one modular repo.

+ Code. Optimize. Scale. Secure.
- Full-stack execution, Claude-powered. No human bottlenecks.


🔧 Built for Claude Code
Seamlessly plug into Claude’s dev environment:

* 🧠 Each .md file = a fully defined expert persona
* ⚙️ Claude indexes them as agents with roles, skills & strategy
* 🤖 You chat → Claude auto-routes to the right agent(s)
* ✍️ Want precision? Just call @agent-name directly
* 👥 Complex task? Mention multiple agents for team execution

Examples:

"@security-auditor please review auth flow for risks"
"@cloud-architect + @devops-troubleshooter → design a resilient multi-region setup"
"@ai-engineer + @legal-advisor → build a privacy-safe RAG pipeline"


🔗 https://github.com/Parveshiiii/Dev-Stack-Agents
MIT License | Claude-Ready | PRs Welcome
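Per the post, Claude Code itself handles the routing. The sketch below only illustrates how explicit `@agent-name` mentions in a message could be resolved to persona files, including multi-agent mentions. The `agents/<name>.md` layout is a hypothetical assumption about the repo, and automatic routing for messages without mentions is not modeled.

```python
import re

# Hypothetical registry: agent name -> persona file.
AGENTS = {
    name: f"agents/{name}.md"
    for name in [
        "security-auditor",
        "cloud-architect",
        "devops-troubleshooter",
        "ai-engineer",
        "legal-advisor",
    ]
}

# Lowercase, hyphen-separated names such as @cloud-architect.
MENTION = re.compile(r"@([a-z0-9]+(?:-[a-z0-9]+)*)")


def route(message: str) -> list[str]:
    """Persona files for each known agent mentioned, in order, without duplicates."""
    selected: list[str] = []
    for name in MENTION.findall(message):
        path = AGENTS.get(name)
        if path is not None and path not in selected:
            selected.append(path)
    return selected


print(route("@cloud-architect + @devops-troubleshooter → design a resilient multi-region setup"))
```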

Parveshiiii 
posted an update 5 months ago
🧠 Glimpses of AGI — A Vision for All Humanity
What if AGI wasn’t just a distant dream—but a blueprint already unfolding?

I’ve just published a deep dive called Glimpses of AGI, exploring how scalable intelligence, synthetic reasoning, and alignment strategies are paving a new path forward. This isn’t your average tech commentary—it’s a bold vision for conscious AI systems that reason, align, and adapt beyond narrow tasks.

🔍 Read it, upvote it if it sparks something, and let’s ignite a collective conversation about the future of AGI.

https://huggingface.co/blog/Parveshiiii/glimpses-of-agi


Parveshiiii 
posted an update 5 months ago
🧠 MathX-5M by XenArcAI — Scalable Math Reasoning for Smarter LLMs

Introducing MathX-5M, a high-quality, instruction-tuned dataset built to supercharge mathematical reasoning in large language models. With 5 million rigorously filtered examples, it spans everything from basic arithmetic to advanced calculus—curated from public sources and enhanced with synthetic data.

🔍 Key Highlights:
- Step-by-step reasoning with verified answers
- Covers algebra, geometry, calculus, logic, and more
- RL-validated correctness and multi-stage filtering
- Ideal for fine-tuning, benchmarking, and educational AI

📂 XenArcAI/MathX-5M


freddyaboulton 
posted an update 5 months ago
freddyaboulton 
posted an update 6 months ago
freddyaboulton 
posted an update 6 months ago