---
license: apache-2.0
base_model:
- ThaiLLM/ThaiLLM-8B
- Qwen/Qwen3-8B
- Qwen/Qwen3-8B-Base
pipeline_tag: text-generation
language:
- en
- th
tags:
- finance
- mergekit
- merge
---
# THaLLE-ThaiLLM: Domain-Specialized Small LLMs for Finance and Thai

## Model Overview

This 8B language model extends ThaiLLM-8B, focusing on stronger instruction following and financial knowledge. It was constructed with [mergekit](https://github.com/arcee-ai/mergekit), which merges ThaiLLM-8B with Qwen3-8B and THaLLE, the latter of which was trained on 80 CFA examination sets.

**THaLLE-0.2-ThaiLLM-8B-fa** has the following features:
- **Supports switching between thinking and non-thinking modes**, similar to [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).
- **Offers enhanced Thai language understanding**, inherited from [ThaiLLM-8B](https://huggingface.co/ThaiLLM/ThaiLLM-8B).
- **Incorporates the financial knowledge and understanding** expected from THaLLE fine-tuning.

## Usage

### Requirements

Since `KBTG-Labs/THaLLE-0.2-ThaiLLM-8B-fa` is fine-tuned from Qwen3-8B, you will need `transformers>=4.51.0`.
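For example, with pip:

```shell
pip install "transformers>=4.51.0"
```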

### Running with Transformers

The script below generates a response for the given input messages.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID: str = "KBTG-Labs/THaLLE-0.2-ThaiLLM-8B-fa"

def inference(messages: list[dict[str, str]], model, tokenizer) -> str:
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # Set True to switch to thinking mode.
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=768,
        do_sample=False,
        temperature=None,
        top_p=None,
        top_k=None,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens, keeping only the newly generated ones.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    messages = [{"role": "user", "content": "สวัสดี!"}]  # "Hello!" in Thai
    print(inference(messages, model, tokenizer))
```
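With `enable_thinking=True`, the decoded text typically contains the model's reasoning wrapped in Qwen3-style `<think>...</think>` tags before the final answer. A minimal helper for separating the two (a sketch, not part of the model's API; it assumes the tags survive decoding as literal text):

```python
def split_thinking(decoded: str) -> tuple[str, str]:
    """Split a decoded completion into (reasoning, answer).

    Assumes Qwen3-style output, where the reasoning is wrapped in
    <think>...</think> and the final answer follows the closing tag.
    """
    marker = "</think>"
    if marker not in decoded:
        # Non-thinking mode: the whole completion is the answer.
        return "", decoded.strip()
    thinking, _, answer = decoded.partition(marker)
    return thinking.replace("<think>", "").strip(), answer.strip()
```

For example, `split_thinking("<think>Let me check.</think>Hello!")` returns `("Let me check.", "Hello!")`, while a non-thinking completion comes back with an empty reasoning string.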

## Results

For more details, our technical report will be released soon.

| Model                      | M3 Exam   | M6 Exam   | Flare CFA* | IC        |
| -------------------------- | --------- | --------- | ---------- | --------- |
| **Non-Thinking**           |           |           |            |           |
| `Qwen3-8B`                 | 0.660     | 0.545     | 0.753      | 0.640     |
| `ThaiLLM-8B-Instruct`**    | 0.707     | **0.623** | 0.762      | **0.720** |
| `THaLLE-0.2-ThaiLLM-8B-fa` | **0.725** | 0.572     | **0.771**  | **0.720** |
| **Thinking**               |           |           |            |           |
| `Qwen3-8B`                 | 0.706     | 0.590     | 0.806      | 0.600     |
| `ThaiLLM-8B-Instruct`**    | 0.720     | 0.661     | 0.820      | 0.720     |
| `THaLLE-0.2-ThaiLLM-8B-fa` | **0.779** | **0.678** | **0.852**  | **0.840** |

[*] Flare CFA refers to the `TheFinAI/flare-cfa` dataset.

[**] `ThaiLLM-8B-Instruct` is [KBTG-Labs/ThaiLLM-8B-Instruct](https://huggingface.co/KBTG-Labs/ThaiLLM-8B-Instruct).

Evaluations were run with [vLLM](https://github.com/vllm-project/vllm); results may vary with other inference backends.
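If you want to serve the model with an OpenAI-compatible API via vLLM, a minimal command might look like the following (assuming a recent vLLM release with Qwen3 architecture support):

```shell
vllm serve KBTG-Labs/THaLLE-0.2-ThaiLLM-8B-fa
```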

## Citation

If you find our work useful, please cite:

```bibtex
@misc{labs2024thalle,
      title={THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report},
      author={KBTG Labs and Danupat Khamnuansin and Atthakorn Petchsod and Anuruth Lertpiya and Pornchanan Balee and Thanawat Lodkaew and Tawunrat Chalothorn and Thadpong Pongthawornkamol and Monchai Lertsutthiwong},
      year={2024},
      eprint={2406.07505},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```