AI & ML interests

AGI and ML Pipelines, Ambient IoT AI, Behavior Cognitive and Memory AI, Clinical Medical and Nursing AI, Genomics AI, GAN Gaming GAIL AR VR XR and Simulation AI, Graph Ontology KR KE AI, Languages and NLP AI, Quantum Compute GPU TPU NPU AI, Vision Image Document AI

hesamation 
posted an update 6 days ago
view post
Post
2782
this is big... 50 AI researchers from Bytedance, Alibaba, Tencent, and other labs/universities just published a 300-page paper with surprising lessons about coding models and agents (data, pre and post-training, etc).

key highlights:

> small LLMs can beat proprietary giants
RL (RLVR specifically) gives small open-source models an edge over big models in reasoning. a 14B model trained with RLVR on high-quality verified problems can match the performance of OpenAI's o3.

> models have a hard time learning Python.
mixing language models during pre-training is good, but Python behaves different from statically typed languages. languages with similar syntax (Java and C#, or JavaScript and TypeScript) creates high positive synergy. mixing Python heavily into the training of statically typed languages can actually hurt because of Python's dynamic typing.

> not all languages are equal (coding scaling laws)
the amount of data required to specialize a model on a language drastically depends on the language. paper argues like C# and Java are easier to learn (less training data required). languages like Python and Javascript are actually more tricky to learn, ironically (you see AI most used for these languages :)

> MoE vs Dense (ability vs stability)
MoE models offer higher capacity, but are much more fragile during SFT than dense models. hyperparams in training have a more drastic effect in MoE models, while dense models are more stable. MoE models also require constant learning rate schedules to avoid routing instability.

> code models are "insecure" by default (duh)
training on public repos makes models learn years of accumulated insecure coding patterns. safety fine-tuning often fails to work much on code. a model might refuse to write a hate speech email but will happily generate a SQL-injection vulnerable function because it "works."

read the full paper:
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence (2511.18538)
Sri-Vigneshwar-DJ 
posted an update 2 months ago
view post
Post
327
Do you think domain-specific embedding fine-tuners are needed?
I've been working with embeddings for marketing use cases and noticed something: most embeddings don't get marketing concepts very well. They're trained in general-purpose ways.
The Issue I'm Seeing
When I search marketing content with general embeddings:

"organic growth" returns farming articles
"conversion funnel" matches industrial equipment
"brand lift" doesn't connect to campaign effectiveness
Marketing jargon like CAC, ROAS, CTR aren't properly understood

My Question
Do you think domain-specific embeddings are needed for marketing?
Some thoughts:

Marketing has its own vocabulary and concept relationships
General models trained on Wikipedia/web crawl miss these nuances
But is fine-tuning worth the effort vs just using more retrieval tricks?

Quick Example
I fine-tuned all-mpnet-base-v2 on ~1000 marketing concept pairs and saw 15-20% better retrieval accuracy. But I'm curious:

Has anyone else tried this for marketing or other domains?
When do you think domain-specific embeddings are actually necessary vs overkill?
Are there better approaches I'm missing?

https://huggingface.co/blog/Sri-Vigneshwar-DJ/why-your-marketing-rag-system-needs-domain-specifi
  • 6 replies
·
Sri-Vigneshwar-DJ 
posted an update 2 months ago
view post
Post
4427
🚀 Exciting News! We've released a Performance Marketing Expert Dataset from Hawky.ai [www.hawky.ai] Hawky-ai


This dataset empowers AI models with cutting-edge strategies for Meta, Google Ads, and TikTok campaigns. It includes:
1. Multi-platform strategies for e-commerce, DTC, B2B, and more
2. Creative optimization and audience targeting insights
3. ROI-driven recommendations based on 2025 best practices

Sri-Vigneshwar-DJ/Performance-Marketing-Data
Sri-Vigneshwar-DJ 
posted an update 2 months ago
view post
Post
3331
🚀 Qwen3-Omni for Marketing: A Game-Changer

Just wanted to share something exciting I've been exploring—Qwen3-Omni and how it's transforming marketing workflows.

What makes it special? At Hawky.ai we are started experimenting with Qwen3 recently for Analysis and Optimization.

Unlike traditional tools that look at text, images, or audio separately, Qwen3-Omni analyzes everything together. It handles 119 languages, processes 40-minute audio sequences, and understands both images and videos—all at once.

The cool part? It's 2-3x faster than similar models thanks to its MoE architecture.

Real applications I'm seeing:
Ad Analysis: It scores video ads by combining visual elements, audio tone, and text—giving 25% better CTR predictions than single-mode tools.
Campaign Localization: Drop in one ad, get 10 localized versions with native voiceovers in under a minute. Perfect for testing across markets.

Market Research: Feed it competitor content, podcasts, or UGC videos. It extracts actionable insights like "3-second hooks boost retention by 15%" and saves about 70% of analysis time.

Quality Checks: Automatically catches lip-sync errors and audio-visual mismatches.

Full technical breakdown: https://huggingface.co/blog/Sri-Vigneshwar-DJ/hawky-aiqwen3-omni-advanced-architecture-and-marke

Has anyone else been experimenting with multimodal models for marketing? Would love to hear what you're building!

#MultimodalAI #MarTech #OpenSource
hesamation 
posted an update 3 months ago
view post
Post
10329
a senior engineer at google just dropped a 400-page free book on docs for review: agentic design patterns.

the table of contents looks like everything you need to know about agents + code:
> advanced prompt techniques
> multi-agent patterns
> tool use and MCP
> you name it

read it here: https://docs.google.com/document/d/1rsaK53T3Lg5KoGwvf8ukOUvbELRtH-V0LnOIFDxBryE/edit?tab=t.0#heading=h.pxcur8v2qagu

you can also pre-order on Amazon (published by Springer) and the royalties goes to Save the Children: https://www.amazon.com/Agentic-Design-Patterns-Hands-Intelligent/dp/3032014018/
hesamation 
posted an update 5 months ago
view post
Post
4136
longer context doesn't generate better responses. it can even hurt your llm/agent. 1M context window doesn't automatically make models smarter as it's not about the size; it's how you use it.

here are 4 types of context failure and why each one happens:

1. context poisoning: if hallucination finds its way into your context, the agent will rely on that false information to make its future moves. for example if the agent hallucinates about the "task description", all of its planning to solve the task would also be corrupt.

2. context distraction: when the context becomes too bloated, the model focuses too much on it rather than come up with novel ideas or to follow what it has learned during training. as Gemini 2.5 Pro technical report points out, as context grows significantly from 100K tokens, "the agent showed a tendency toward favoring repeating actions from its vast history rather than synthesizing novel plans".

3. context confusion: everyone lost it when MCPs became popular, it seemed like AGI was achieved. I suspected there is something wrong and there was: it's not just about providing tools, bloating the context with tool use derails the model from selecting the right one! even if you can fit all your tool metadata in the context, as their number grows, the model gets confused over which one to pick.

4. Context Clash: if you exchange conversation with a model step by step and provide information as you go along, chances are you get worse performance rather than providing all the useful information at once. one the model's context fills with wrong information, it's more difficult to guide it to embrace the right info. agents pull information from tools, documents, user queries, etc. and there is a chance that some of these information contradict each other, and it's not good new for agentic applications.

check this article by Drew Breunig for deeper read: https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html?ref=blog.langchain.com
  • 2 replies
·
hesamation 
posted an update 5 months ago
view post
Post
5367
in case you didn’t know, Claude now has a developer training course with certificates,

this is better than anything you can find on Coursera.

covers Claude Code, MCP and its advanced topics and even more:

https://www.anthropic.com/learn/build-with-claude
hesamation 
posted an update 6 months ago
hesamation 
posted an update 6 months ago
view post
Post
2796
I really like how this seven-stage pipeline was laid out in the Ultimate Guide to Fine-Tuning book.

It gives an overview, then goes into detail for each stage, even providing best practices.

It’s 115 pages on arxiv, definitely worth a read.

Check it out: https://arxiv.org/abs/2408.13296
hesamation 
posted an update 7 months ago
hesamation 
posted an update 7 months ago
view post
Post
3128
this book actually exists for free, “the little book of deep learning”. best to refresh your mind about DL basics:
> foundations of machine learning
> how models train
> common layers (dropout, pooling…)
> basic intro to LLMs
actually optimized for mobile.

Book: https://fleuret.org/public/lbdl.pdf
hesamation 
posted an update 8 months ago
view post
Post
3009
The best researchers from DeepSeek, OpenAI, Microsoft, and ByteDance explored RL and Reasoning in LLMs,

Here's some of their key findings:

1/ RL can further improve distilled models. These models are essentially SFT fine-tuned with the data generated by larger models, and the SFT+RL combo does not disappoint.

This is verified in the DeepSeek-R1 paper.

2/ both GRPO and PPO algorithms suffer from length bias; they encourage longer responses. This can be tackled by introducing explicit rewards based on the length of the answer.

3/Most reasoning research is focused on code and math. But training models on logic puzzles improves them for mathematical tasks too.

This shows the RL reasoning is generalized beyond the specific domain knowledge.

Previous research also shows RL can be a great generalizer.

4/The reasoning might not be only induced by RL; it might already be hidden in the base models due to the pre-training and CoT data they were trained on.

So while RL does wake up the reasoning beast, maybe it's not the only solution (e.g. other methods such as distillation)

5/ back to the length bias; reasoning models tend to generate longer responses for wrong answers. RL might be the culprit.

RL favours longer answers when the reward is negative, to dilute the penalty per individual token and lower the loss.

This might explain the "aha" moments!

6/ OpenAI's competitive programming paper showed an interesting finding:

o3 can learn its own test-time strategies (like writing an inefficient but correct solution to verify the answer of an optimized solution)

RL helps LLMs develop their own reasoning & verification methods.
The recent article by @rasbt helped me a lot in getting a broad view of the recent research on reasoning models.

He also lists more influential papers on this topic, It's a must-read if you're interested.

check it out 👇
https://magazine.sebastianraschka.com/p/the-state-of-llm-reasoning-model-training
hesamation 
posted an update 8 months ago
view post
Post
2328
OpenAI just released a 34-page practical guide to building agents,

Here's 10 things it teaches us:

1➜ agents are different from workflows: they are complete autonomous systems that perform tasks on your behalf. many applications use LLMs for workflows, but this is not an agent.

2➜ use them for tricky stuff: complex decision making, dynamic rules, unstructured data

3➜ core recipe: each agent has three main components: Model (the brain), Tools, Instructions on how to behave

4➜ choose the right brain: set up evals to get a baseline performance, use a smart model to see what's possible, gradually downgrade the model for cost and speed

5➜ tools are key: choose well-defined and tested tools. an agent needs tools to retrieve data and context, and take actions.

6➜ instruction matters A LOT: be super clear telling the agent its goals, steps, and rules. Vague instructions = unpredictable agent. Be explicit.

7➜ start simple, then scale: often a single agent with several tools is ok. don't jump to complex multi-agent systems immediately.

8➜ if you use multi-agents: you can have a "manager" agent directing traffic to specialist agents, or have agents hand off tasks to each other.

9➜ gaurdrails are a MUST: check user input for weird stuff, make sure the agent isn't about to do something risky, filter out private info, block harmful content. Don't let it run wild.

10➜ build and plan for humans: start small, test, improve. always have a plan for when the agent gets stuck or is about to do something high-risk.

Download: https://t.co/fJaCkgf7ph
·
hesamation 
posted an update 8 months ago
hesamation 
posted an update 8 months ago
view post
Post
10373
Google published a 69-page whitepaper on Prompt Engineering and its best practices, a must-read if you are using LLMs in production:
> zero-shot, one-shot, few-shot
> system prompting
> chain-of-thought (CoT)
> ReAct

LINK: https://www.kaggle.com/whitepaper-prompt-engineering
> code prompting
> best practices
hesamation 
posted an update 8 months ago
view post
Post
2932
The best researchers from Yale, Stanford, Google DeepMind, and Microsoft laid out all we know about Agents in a 264-page paper [book],

Here are some of their key findings:

They build a mapping of different agent components, such as perception, memory, and world modelling, to different regions of the human brain and compare them:

- brain is much more energy-efficient
- no genuine experience in agents
- brain learns continuously, agent is static

An agent is broken down to:
- Perception: the agent's input mechanism. can be improved with multi-modality, feedback mechanisms (e.g., human corrections), etc.
- Cognition: learning, reasoning, planning, memory. LLMs are key in this part.
- Action: agent's output and tool use.

Agentic memory is represented as:
- Sensory memory or short-term holding of inputs which is not emphasized much in agents.
- Short-term memory which is the LLM context window
- Long-term memory which is the external storage such as RAG or knowledge graphs.

The memory in agents can be improved and researched in terms of:
- increasing the amount of stored information
- how to retrieve the most relevant info
- combining context-window memory with external memory
- deciding what to forget or update in memory

The agent must simulate or predict the future states of the environment for planning and decision-making.

ai world models are much simpler than the humans' with their causal reasoning (cause-and-effect) or physical intuition.

LLM world models are mostly implicit and embedded.

EMOTIONS are a deep aspect of humans, helping them with social interactions, decision-making, or learning.

Agents must understand emotions to better interact with us.

But rather than encoding the feeling of emotions, they have a surface-level modelling of emotions.

Perception is the process by which an agent receives and interprets raw data from its surroundings.

READ PAPER: Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems (2504.01990)
hesamation 
posted an update 8 months ago
view post
Post
2765
What, How, Where, and How Well? This paper reviews test-time scaling methods and all you need to know about them:
> parallel, sequential, hybrid, internal scaling
> how to scale (SFT, RL, search, verification)
> metrics and evals of test-time scaling

🔗paper: What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models (2503.24235)

If you want to learn what inference-time compute scaling is @rasbt has a great blog post on that:
https://magazine.sebastianraschka.com/p/state-of-llm-reasoning-and-inference-scaling
hesamation 
posted an update 8 months ago
awacke1 
posted an update 8 months ago
view post
Post
2530
AI Vision & SFT Titans 🌟 Turns PDFs into text, snaps pics, and births AI art.

https://huggingface.co/spaces/awacke1/TorchTransformers-Diffusion-CV-SFT

1. OCR a grocery list or train a titan while sipping coffee? ☕
2. Camera Snap 📷: Capture life’s chaos—your cat’s face or that weird receipt. Proof you’re a spy!
3. OCR 🔍: PDFs beg for mercy as GPT-4o extracts text.
4. Image Gen 🎨: Prompt “neon superhero me”
5. PDF 📄: Double-page OCR Single-page sniping

Build Titans 🌱: Train tiny AI models. 💪Characters🧑‍🎨: Craft quirky heroes.
🎥

not-lain 
posted an update 9 months ago