
This repository contains the code and instructions necessary to reproduce the experiments presented in the paper: "STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models", accepted to ICLR 2026.
STAR (Similarity-guided Teacher-Assisted Refinement) is a novel holistic framework designed to effectively transfer the function calling capabilities of large language models (LLMs) to super-tiny, cost-efficient models. The STAR training curriculum involves two processes:
- Constrained Knowledge Distillation (CKD): The selected teacher's knowledge is transferred to a super-tiny student model (e.g., 0.6B) using our novel Constrained Knowledge Distillation (CKD) objective, which ensures training stability and preserves exploratory capacity.
- Similarity-guided Reinforcement Learning (Sim-RL): The distilled student model is polished with Sim-RL to enhance its generalization capability and optimize its performance on complex problems.
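The exact CKD objective is defined in the paper and in the OpenRLHF patch shipped with this repo. Purely as an illustration of the idea of a *constrained* distillation loss, the sketch below computes a token-level forward KL between teacher and student distributions and clamps each token's contribution, which limits gradient spikes from hard-to-match tokens; the function name, the clamping scheme, and the `clip` value are assumptions for this example, not the repo's implementation:

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def constrained_kd_loss(student_logits, teacher_logits, clip=5.0):
    """Illustrative clipped token-level KL(teacher || student).
    Shapes: [batch, seq, vocab]. Not the repo's actual CKD objective."""
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    # Per-token forward KL, summed over the vocabulary.
    kl = (np.exp(t_logp) * (t_logp - s_logp)).sum(-1)
    # Clamp each token's loss so outlier tokens cannot dominate the batch.
    return np.minimum(kl, clip).mean()
```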
- [2026.02.04] We released the STAR codebase, including implementations for CKD and Sim-RL.
- [2026.02.04] Our paper is now available on arXiv: 2602.03022.
- [2026.01.26] Our paper has been accepted to ICLR 2026!
Our STAR models establish new state-of-the-art performance in their size classes. The STAR framework significantly closes the performance gap with much larger models.
We rely on uv for Python environment management and on OpenRLHF as our RL training framework.
- Create Python Environment

  ```bash
  # Create a virtual environment using uv
  uv venv --seed --python 3.12 ./train-env
  # Install dependencies
  uv pip sync -p ./train-env/bin/python ./requirements_uv.txt
  source ./train-env/bin/activate
  ```
- Install Patched OpenRLHF

  ```bash
  # Clone the specific commit of OpenRLHF
  git clone https://github.com/OpenRLHF/OpenRLHF.git
  cd OpenRLHF
  git checkout c1fc63a9f7e1837577a76b0c688809b3c0bdc644
  # Apply the patch for CKD functionality
  git apply ../0001-add-ckd.patch
  cd ..
  ```
Download the base models from Hugging Face. We use an 8B teacher model (star-lab/Teacher-8B) and smaller Qwen3 models as students.
```bash
# Teacher Model
huggingface-cli download star-lab/Teacher-8B --local-dir models/Teacher-8B
# Student Models (e.g., 0.6B)
huggingface-cli download Qwen/Qwen3-0.6B --local-dir models/Qwen3-0.6B
```

Both CKD and Sim-RL require datasets in JSON Lines format, where each line is a JSON object with two fields:

- `inputs`: The prompt formatted with the Qwen chat template.
- `outputs`: The response formatted with the Qwen chat template.
We recommend organizing your data into a structured format first (e.g., using the messages API format) and then converting it.
Example structured format:
```json
{
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "...", "tool_calls": [...], "reasoning_content": "..."},
    {"role": "tool", "content": "..."}
  ],
  "tools": [
    {"name": "...", "description": "...", "parameters": ...}
  ]
}
```

The example_messages.jsonl file is provided as an example, containing 1024 random samples from the xlam50k dataset. Based on this structured data, the training data for CKD can be generated with the teacher_rollout.py script, and the training data for Sim-RL with the messages_to_trainset.py script:
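The actual conversion is done by messages_to_trainset.py, which renders the messages with the Qwen tokenizer's chat template. As a rough sketch of the idea under assumptions (the function name and the plain `<|im_start|>`/`<|im_end|>` tags below stand in for the real chat template), the conversion splits each sample at the final assistant turn, putting everything before it into `inputs` and the assistant response into `outputs`:

```python
def to_trainset_record(example):
    """Illustrative split of a structured sample into inputs/outputs.
    Hypothetical helper; the repo's script uses the tokenizer's
    apply_chat_template rather than hand-written tags."""
    msgs = example["messages"]
    # The last assistant turn is the training target.
    last_asst = max(i for i, m in enumerate(msgs) if m["role"] == "assistant")
    prompt_msgs, response = msgs[:last_asst], msgs[last_asst]
    # Render the prompt turns and open the assistant turn for generation.
    inputs = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in prompt_msgs
    ) + "<|im_start|>assistant\n"
    outputs = response["content"] + "<|im_end|>"
    return {"inputs": inputs, "outputs": outputs}
```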
CKD data:

```bash
python teacher_rollout.py --input=example_messages.jsonl --output=kd_messages.jsonl --model-path ./models/Teacher-8B --rollout-n 8 --dp-size 8
python messages_to_trainset.py --input=kd_messages.jsonl --output=kd_data.jsonl --tokenizer-path=./models/Teacher-8B --add-reasoning-content
```

Sim-RL data:

```bash
python messages_to_trainset.py --input=example_messages.jsonl --output=rl_data.jsonl --tokenizer-path=./models/Teacher-8B
```

Before starting, prepare the environment for a training run:
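Before launching a run, it can be worth sanity-checking that the generated JSON Lines files have the two required fields. The helper below is a hypothetical snippet, not part of the repo:

```python
import json

def check_trainset(path, limit=5):
    """Verify the first few lines are JSON objects with the
    required `inputs`/`outputs` fields (hypothetical helper)."""
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= limit:
                break
            rec = json.loads(line)
            assert set(rec) >= {"inputs", "outputs"}, f"line {i}: missing fields"
    return True
```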
```bash
export PYTHONPATH=$PWD/OpenRLHF
ray start --head --node-ip-address 0.0.0.0 --num-gpus 8 --disable-usage-stats
```

First, distill knowledge from the teacher model to the student using CKD. This step requires training data generated by the teacher model. We provide teacher_rollout.py as a reference for generating these samples.
After preparing your models and data, edit the paths in scripts/train_ckd.sh and run it:
```bash
bash scripts/train_ckd.sh
```

The distilled student model will be saved to the path specified in the script (e.g., checkpoints/student-0.6b-ckd).
Next, refine the CKD-distilled student model using Sim-RL to further boost its capabilities. While Sim-RL can be applied to any base model, it is most effective when used on a model already trained with CKD.
Update the model and data paths in scripts/train_sim_rl.sh and run the script:
```bash
bash scripts/train_sim_rl.sh
```

The final STAR-0.6B model will be saved to the path specified in the script (e.g., checkpoints/star-0.6b).
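To give a feel for what "similarity-guided" means, here is a toy reward that compares a generated function call against a reference: exact match on the tool name, plus partial credit for argument overlap. This is only a sketch under assumptions; the paper's actual Sim-RL reward and its implementation in the training scripts may differ substantially:

```python
import json

def similarity_reward(pred_call: str, ref_call: str) -> float:
    """Toy similarity-style reward for a single tool call
    (hypothetical; not the repo's reward function)."""
    try:
        pred, ref = json.loads(pred_call), json.loads(ref_call)
    except json.JSONDecodeError:
        return 0.0  # Malformed calls earn no reward.
    if pred.get("name") != ref.get("name"):
        return 0.0  # Wrong tool: no partial credit.
    ref_args, pred_args = ref.get("arguments", {}), pred.get("arguments", {})
    if not ref_args:
        return 1.0
    # Fraction of reference arguments reproduced exactly.
    matched = sum(1 for k, v in ref_args.items() if pred_args.get(k) == v)
    return 0.5 + 0.5 * matched / len(ref_args)
```

A graded reward like this gives the policy a denser learning signal than binary exact-match, which is the intuition behind guiding RL with similarity rather than correctness alone.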
This project is built upon the OpenRLHF framework. We thank the original authors for their significant open-source contributions.
If you find this work useful, please cite our paper:
```bibtex
@misc{ni2026starsimilarityguidedteacherassistedrefinement,
  title={STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models},
  author={Jiliang Ni and Jiachen Pu and Zhongyi Yang and Jingfeng Luo and Conggang Hu},
  year={2026},
  eprint={2602.03022},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2602.03022},
}
```