UniXcoder ONNX for Code Search

Converted by VibeAtlas - AI Context Optimization for Developers

This is Microsoft's UniXcoder converted to ONNX format for use with Transformers.js in browser and Node.js environments.

Why UniXcoder?

UniXcoder understands code semantically, not just as text:

  • Trained on 6 programming languages (Python, Java, JavaScript, PHP, Ruby, Go)
  • Understands AST structure and data flow
  • 20-30% better code search accuracy vs generic embedding models

Quick Start

Transformers.js (Browser/Node.js)

import { pipeline } from '@huggingface/transformers';

const embedder = await pipeline(
  'feature-extraction',
  'sailesh27/unixcoder-base-onnx'
);

const code = `function authenticate(user) {
  return user.isValid && user.hasPermission;
}`;

const embedding = await embedder(code, {
  pooling: 'mean',
  normalize: true
});

console.log(embedding.dims); // [1, 768]
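The `pooling: 'mean'` and `normalize: true` options above collapse the per-token vectors into one fixed-size embedding: average across tokens, then scale to unit length. A minimal sketch of that arithmetic (the tiny 4-dimensional vectors are stand-ins for real model output):

```javascript
// Mean pooling: average the per-token embedding vectors element-wise.
function meanPool(tokenEmbeddings) {
  const dim = tokenEmbeddings[0].length;
  const out = new Array(dim).fill(0);
  for (const tok of tokenEmbeddings) {
    for (let i = 0; i < dim; i++) out[i] += tok[i];
  }
  return out.map(v => v / tokenEmbeddings.length);
}

// L2 normalization: scale the pooled vector to unit length.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((s, v) => s + v * v, 0));
  return vec.map(v => v / norm);
}

// Two fake token vectors; mean is [0.5, 0.5, 0, 0], unit-normalized below.
const pooled = l2Normalize(meanPool([[1, 0, 0, 0], [0, 1, 0, 0]]));
console.log(pooled); // first two entries ≈ 0.7071 (1/√2)
```

Normalizing here is what lets the search example below treat cosine similarity as a simple dot product.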

Semantic Code Search

import { pipeline, cos_sim } from '@huggingface/transformers';

const embedder = await pipeline('feature-extraction', 'sailesh27/unixcoder-base-onnx');

// Index your code
const codeSnippets = [
  'function login(user, pass) { ... }',
  'function formatDate(date) { ... }',
  'function validateEmail(email) { ... }'
];

const codeEmbeddings = await embedder(codeSnippets, { pooling: 'mean', normalize: true });

// Search with natural language
const query = 'user authentication';
const queryEmbedding = await embedder(query, { pooling: 'mean', normalize: true });

// Rank by similarity and take the best match
const queryVec = queryEmbedding.tolist()[0];
const similarities = codeEmbeddings.tolist()
  .map((emb, i) => ({ code: codeSnippets[i], score: cos_sim(queryVec, emb) }))
  .sort((a, b) => b.score - a.score);

console.log(similarities[0].code); // best match for 'user authentication'
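The `cos_sim` helper exported by Transformers.js computes standard cosine similarity. As an illustration (this `cosineSim` is a hypothetical reimplementation, not the library's), it is the dot product divided by the vector norms, so for the unit-length embeddings produced with `normalize: true` it reduces to a plain dot product:

```javascript
// Cosine similarity of two plain arrays; for unit-length vectors the
// norms are 1, so this equals the dot product.
function cosineSim(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

console.log(cosineSim([1, 0], [1, 0])); // 1  (identical direction)
console.log(cosineSim([1, 0], [0, 1])); // 0  (orthogonal)
```

Scores range from -1 to 1; higher means the code snippet is semantically closer to the query.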

Technical Details

  • Architecture: RoBERTa-based encoder
  • Hidden Size: 768
  • Max Sequence Length: 512 tokens
  • Output Dimensions: 768
  • ONNX Opset: 14
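Inputs beyond the 512-token limit are truncated by the tokenizer, so whole files are usually embedded as overlapping chunks, keeping the best-scoring chunk per file. A rough sketch, using line counts as a crude stand-in for token counts (the `maxLines`/`overlap` values are illustrative, not tuned):

```javascript
// Split source code into overlapping line-based chunks so each piece
// stays within the model's context window. Line counts only
// approximate token counts; a real pipeline would count tokens.
function chunkLines(source, maxLines = 40, overlap = 10) {
  const lines = source.split('\n');
  const chunks = [];
  for (let start = 0; start < lines.length; start += maxLines - overlap) {
    chunks.push(lines.slice(start, start + maxLines).join('\n'));
    if (start + maxLines >= lines.length) break;
  }
  return chunks;
}

// A 100-line fake file yields 3 overlapping chunks (step = 30 lines).
const file = Array.from({ length: 100 }, (_, i) => `line ${i}`).join('\n');
const chunks = chunkLines(file);
console.log(chunks.length); // 3
```

Each chunk can then be passed to the embedder exactly like the snippets in the search example above.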

About VibeAtlas

VibeAtlas is the reliability infrastructure for AI coding:

  • Reduce AI token costs by 40-60%
  • Improve code search accuracy with semantic understanding
  • Add governance guardrails to AI workflows

Citation

@misc{unixcoder-onnx-2025,
  title={UniXcoder ONNX: Code Embeddings for JavaScript},
  author={VibeAtlas Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/sailesh27/unixcoder-base-onnx}
}

Original UniXcoder Paper

@inproceedings{guo2022unixcoder,
  title={UniXcoder: Unified Cross-Modal Pre-training for Code Representation},
  author={Guo, Daya and Lu, Shuai and Duan, Nan and Wang, Yanlin and Zhou, Ming and Yin, Jian},
  booktitle={ACL},
  year={2022}
}

License

Apache 2.0 (same as original UniXcoder)
