GPT-4 solves math olympiad problems. Llama-7B does not. The difference? Roughly 100x the parameters and billions of dollars of training. But what if you could "steal" the reasoning ability of a large model and transfer it to a small one?
Not by copy-pasting answers, which is trivial and does not work, but by copy-pasting the way of thinking. Not the "what" but the "how".
That is exactly what reasoning distillation is about: teaching a compact model not merely to imitate outputs but to reproduce the process of logical reasoning. The results are impressive: after distillation, a 7B model can reach 50-70% of GPT-4's performance on reasoning tasks, at about 1% of the cost.
Classical vs Reasoning Distillation
The problem with classical Knowledge Distillation (Hinton et al., 2015):
```python
import torch.nn.functional as F


class ClassicKnowledgeDistillation:
    """
    The classical approach: the student imitates the teacher's soft labels.

    The problem for reasoning:
    - Teacher: "Answer: 42"
    - Student learns to predict "42"
    - Student does NOT learn WHY it is 42
    """

    def distillation_loss(self, student_logits, teacher_logits, temperature=2.0):
        """
        KL divergence between temperature-softened distributions.

        Limitation: transfers "what to answer", not "how to think".
        """
        # log_softmax is the numerically stable way to get log-probabilities
        student_log_soft = F.log_softmax(student_logits / temperature, dim=-1)
        teacher_soft = F.softmax(teacher_logits / temperature, dim=-1)
        # Scale by T^2 to keep gradient magnitudes comparable across temperatures
        return F.kl_div(
            student_log_soft,
            teacher_soft,
            reduction='batchmean'
        ) * (temperature ** 2)
```
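As a quick sanity check, the same temperature-scaled KL loss can be exercised standalone with random logits in place of real model outputs (a sketch, not from the article):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    student_log_soft = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_soft = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_soft, teacher_soft,
                    reduction="batchmean") * (temperature ** 2)

torch.manual_seed(0)
student = torch.randn(4, 10)  # batch of 4, vocabulary of 10
teacher = torch.randn(4, 10)

loss = distillation_loss(student, teacher)
print(loss.item())  # non-negative, since KL divergence >= 0
# A student that matches the teacher exactly gets (near) zero loss:
print(distillation_loss(teacher, teacher).item())  # approx. 0
```

This also makes the failure mode concrete: the loss only sees the final-token distribution, so a student can drive it to zero without learning any intermediate reasoning.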
```python
class ReasoningDistillation:
    """
    Reasoning distillation: the student learns the process.

    Teacher: "Step 1: ... Step 2: ... Step 3: ... Answer: 42"
    Student learns to reproduce the ENTIRE reasoning process.

    Advantage: the student learns to think, not just to answer.
    """

    def generate_reasoning_trace(self, teacher_model, question: str) -> dict:
        """Generate a full reasoning trace from the teacher."""
        prompt = f"""Solve this problem step by step.
Show all your reasoning clearly.
Problem: {question}
Let's solve this step by step:
"""
        response = teacher_model.generate(prompt, max_tokens=512)
        return {
            "question": question,
            "reasoning": response,
            "answer": self.extract_answer(response)
        }

    def train_student(self, student_model, reasoning_traces):
        """Fine-tune the student on reasoning traces."""
        # Training data format:
        #   input:  question
        #   output: full reasoning trace + answer
        for trace in reasoning_traces:
            training_input = f"Question: {trace['question']}\nLet's think step by step:"
            training_output = trace['reasoning']
            # Standard language modeling objective
            loss = student_model.compute_lm_loss(training_input, training_output)
            loss.backward()
```
Reasoning Distillation Methods
1. Chain-of-Thought Distillation:
```python
class ChainOfThoughtDistillation:
    """
    The baseline method: the teacher generates CoT traces,
    the student is fine-tuned on them.

    A simple idea with strong results.
    """

    def __init__(self, teacher_model, student_model):
        self.teacher = teacher_model
        self.student = student_model

    def generate_cot_dataset(self, questions: list[str],
                             num_samples_per_question: int = 3) -> list[dict]:
        """
        Generate reasoning traces for the training dataset.

        Multiple samples per question give:
        - diverse reasoning paths
        - self-consistency filtering
        """
        dataset = []
        for question in questions:
            samples = [self.generate_single_trace(question)
                       for _ in range(num_samples_per_question)]
            # Keep only consistent samples (majority answer)
            answers = [s["answer"] for s in samples]
            most_common = max(set(answers), key=answers.count)
            filtered = [s for s in samples if s["answer"] == most_common]
            dataset.extend(filtered)
        return dataset

    def generate_single_trace(self, question: str) -> dict:
        prompt = f"""Solve this problem step by step.
Be clear and precise in your reasoning.
Format each step on a new line.
Problem: {question}
Solution:"""
        response = self.teacher.generate(
            prompt,
            temperature=0.7,  # some diversity across samples
            max_tokens=512
        )
        return {
            "question": question,
            "cot": response,
            "answer": self.extract_final_answer(response)
        }

    def fine_tune_student(self, dataset: list[dict]) -> None:
        """Fine-tune the student on CoT traces."""
        training_examples = [
            {
                "input": f"Problem: {item['question']}\n\nLet's solve step by step:",
                "output": item["cot"]
            }
            for item in dataset
        ]
        # Standard supervised fine-tuning
        self.student.supervised_fine_tune(training_examples)
```
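The self-consistency filter in `generate_cot_dataset` is worth isolating: of several sampled traces, only those agreeing with the majority answer survive. A minimal standalone illustration (the sample data here is invented):

```python
from collections import Counter

# Three sampled traces for the same question; one has an arithmetic slip.
samples = [
    {"cot": "...so the total is 12", "answer": "12"},
    {"cot": "...which gives 12", "answer": "12"},
    {"cot": "...mistakenly getting 14", "answer": "14"},
]

answers = [s["answer"] for s in samples]
most_common = Counter(answers).most_common(1)[0][0]
filtered = [s for s in samples if s["answer"] == most_common]

print(most_common)    # "12"
print(len(filtered))  # 2 traces kept, the outlier is dropped
```

The assumption is that independent reasoning paths converging on the same answer are more likely to be correct, so the outlier trace is treated as noise.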
2. Step-by-Step Distillation (Hsieh et al., 2023):
```python
class StepByStepDistillation:
    """
    An improved method: distill each reasoning step separately.

    Idea: break the reasoning into atomic steps and
    teach the student every step individually.
    """

    def __init__(self, teacher_model, student_model, label_model=None):
        self.teacher = teacher_model
        self.student = student_model
        # Optional smaller model for initial labeling
        self.label_model = label_model or teacher_model

    def generate_stepwise_labels(self, question: str, answer: str) -> dict:
        """Generate step-by-step reasoning anchored to the known final answer."""
        prompt = f"""Given the question and correct answer, generate step-by-step reasoning.
Question: {question}
Correct Answer: {answer}
Generate clear reasoning steps that lead to this answer:
Step 1:"""
        response = self.teacher.generate(prompt, max_tokens=512)
        # Parse into individual steps
        steps = self.parse_steps(response)
        return {
            "question": question,
            "steps": steps,
            "final_answer": answer
        }

    def parse_steps(self, reasoning: str) -> list[dict]:
        """Parse free-form reasoning into structured steps."""
        steps = []
        current_step = {"number": 0, "text": "", "operation": None, "result": None}
        for line in reasoning.split('\n'):
            if line.startswith("Step"):
                if current_step["text"]:
                    steps.append(current_step)
                step_num = int(line.split(":")[0].replace("Step", "").strip())
                current_step = {
                    "number": step_num,
                    "text": line.split(":", 1)[1].strip() if ":" in line else "",
                    "operation": self.extract_operation(line),
                    "result": self.extract_result(line)
                }
            else:
                current_step["text"] += " " + line.strip()
        if current_step["text"]:
            steps.append(current_step)
        return steps

    def train_with_rationale_loss(self, dataset: list[dict]):
        """
        Multi-task training:
        1. Predict the next step given the previous steps.
        2. Predict the final answer.
        """
        for item in dataset:
            # Step prediction loss, accumulated over all steps
            step_loss = 0.0
            for i, step in enumerate(item["steps"][1:], 1):
                prefix = self.format_prefix(item["question"], item["steps"][:i])
                step_loss = step_loss + self.student.compute_lm_loss(prefix, step["text"])
            # Answer prediction loss
            full_reasoning = self.format_full_reasoning(item)
            answer_loss = self.student.compute_lm_loss(
                full_reasoning,
                f"The answer is: {item['final_answer']}"
            )
            # Combined loss
            total_loss = step_loss + answer_loss
            total_loss.backward()
```
3. Self-Taught Reasoner (STaR):
```python
class STaRDistillation:
    """
    Self-Taught Reasoner (Zelikman et al., 2022).

    Iterative process:
    1. The student attempts to reason.
    2. If the answer is correct, keep the reasoning.
    3. If not, rationalize (generate reasoning with the answer as a hint).
    4. Fine-tune on the successful traces.
    5. Repeat.
    """

    def __init__(self, model, max_iterations: int = 5):
        self.model = model
        self.max_iterations = max_iterations

    def star_iteration(self, training_data: list[dict]) -> list[dict]:
        """One STaR iteration."""
        successful_traces = []
        for item in training_data:
            question = item["question"]
            gold_answer = item["answer"]
            # 1. Try direct reasoning
            trace = self.model.generate_reasoning(question)
            predicted = self.extract_answer(trace)
            if predicted == gold_answer:
                # Success: keep this trace
                successful_traces.append({
                    "question": question,
                    "reasoning": trace,
                    "answer": gold_answer
                })
            else:
                # 2. Rationalization: generate with a hint
                rationalized = self.rationalize(question, gold_answer)
                successful_traces.append({
                    "question": question,
                    "reasoning": rationalized,
                    "answer": gold_answer
                })
        return successful_traces

    def rationalize(self, question: str, correct_answer: str) -> str:
        """
        Generate reasoning while already knowing the correct answer.

        This is "cheating", but it lets the model learn correct
        reasoning patterns on problems it cannot yet solve.
        """
        prompt = f"""Given the question and correct answer, generate the reasoning.
Question: {question}
Correct Answer: {correct_answer}
Step-by-step reasoning that leads to this answer:"""
        return self.model.generate(prompt)

    def train(self, initial_data: list[dict]):
        """The full STaR training loop."""
        current_data = initial_data
        for iteration in range(self.max_iterations):
            print(f"STaR Iteration {iteration + 1}")
            # Generate traces
            traces = self.star_iteration(current_data)
            # Fine-tune on successful traces
            self.model.fine_tune(traces)
            # Evaluate
            accuracy = self.evaluate(current_data)
            print(f"Accuracy: {accuracy:.2%}")
            if accuracy > 0.95:
                break
```
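To make the success/rationalize branching concrete, here is a toy walk-through of one STaR iteration with a stubbed "model" (the stub functions and questions are invented for illustration; a real run would call an actual LLM):

```python
def stub_reasoning(question):
    # Pretend the model only solves the addition question correctly.
    if "2 + 2" in question:
        return "2 + 2 = 4. The answer is 4"
    return "The answer is 7"

def stub_rationalize(question, answer):
    # Rationalization: the gold answer is given as a hint.
    return f"Working backwards from {answer}: ... The answer is {answer}"

def extract_answer(trace):
    return trace.rsplit("The answer is", 1)[-1].strip()

data = [{"question": "What is 2 + 2?", "answer": "4"},
        {"question": "What is 3 * 3?", "answer": "9"}]

traces = []
for item in data:
    trace = stub_reasoning(item["question"])
    if extract_answer(trace) == item["answer"]:
        traces.append({"question": item["question"], "reasoning": trace})
    else:  # wrong answer: rationalize with the gold answer as a hint
        traces.append({"question": item["question"],
                       "reasoning": stub_rationalize(item["question"], item["answer"])})

print(len(traces))  # 2: every question yields a usable training trace
```

The point of rationalization is visible here: even the question the model failed still contributes a correct-looking trace to the next fine-tuning round.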
4. Orca-style Progressive Learning:
```python
class OrcaDistillation:
    """
    Orca: progressive learning from complex explanation traces
    (Mukherjee et al., Microsoft, 2023).

    Key ideas:
    1. System prompts that elicit different kinds of explanations.
    2. Progressive curriculum: simple tasks first, then complex ones.
    3. Explanation tuning instead of answer tuning.
    """

    EXPLANATION_PROMPTS = {
        "step_by_step": """You are an AI assistant. Think step by step
and show all your reasoning before giving the final answer.""",
        "explain_like_teacher": """You are a patient teacher.
Explain your reasoning in a way a student would understand.
Break down complex concepts.""",
        "critical_thinking": """You are a critical thinker.
Consider multiple perspectives, identify assumptions,
and evaluate the strength of arguments.""",
        "expert_analysis": """You are a domain expert.
Provide detailed technical analysis with precise terminology."""
    }

    def generate_diverse_explanations(self, question: str,
                                      teacher_model) -> list[dict]:
        """Generate explanations in several different styles."""
        explanations = []
        for prompt_type, system_prompt in self.EXPLANATION_PROMPTS.items():
            response = teacher_model.generate(
                prompt=question,
                system_prompt=system_prompt,
                temperature=0.7
            )
            explanations.append({
                "question": question,
                "explanation_type": prompt_type,
                "response": response
            })
        return explanations

    def progressive_curriculum(self, data: list[dict]) -> list[list[dict]]:
        """
        Split the data into difficulty levels.

        Curriculum:
        1. Simple single-step reasoning
        2. Multi-step with 2-3 steps
        3. Complex multi-hop reasoning
        4. Creative/abstract reasoning
        """
        levels = {"simple": [], "medium": [], "complex": [], "advanced": []}
        for item in data:
            complexity = self.estimate_complexity(item["question"])
            levels[complexity].append(item)
        return [levels["simple"], levels["medium"],
                levels["complex"], levels["advanced"]]

    def estimate_complexity(self, question: str) -> str:
        """Estimate question difficulty (simplified length/keyword heuristic)."""
        word_count = len(question.split())
        keywords = ["compare", "analyze", "evaluate", "explain why"]
        if word_count < 20 and not any(k in question.lower() for k in keywords):
            return "simple"
        elif word_count < 50:
            return "medium"
        elif word_count < 100:
            return "complex"
        else:
            return "advanced"
```
Reasoning Verification
Checking the quality of the distilled reasoning:
```python
class ReasoningVerifier:
    """
    Verifies reasoning quality in a distilled model.

    Three aspects:
    1. Correctness  - is the answer right?
    2. Faithfulness - does the reasoning actually lead to the answer?
    3. Coherence    - is the reasoning logically connected?
    """

    def __init__(self, verifier_model=None):
        self.verifier = verifier_model

    def verify_correctness(self, prediction: str, gold: str) -> bool:
        """Simple answer comparison."""
        pred_answer = self.extract_answer(prediction)
        gold_answer = self.extract_answer(gold)
        return self.normalize(pred_answer) == self.normalize(gold_answer)

    def verify_faithfulness(self, reasoning: str, answer: str) -> float:
        """
        Does the reasoning really lead to the answer?

        Intuition: compare P(answer | reasoning) against
        P(answer | question only); approximated here by an LLM-judge score.
        """
        if self.verifier is None:
            return self.heuristic_faithfulness(reasoning, answer)
        prompt = f"""Given this reasoning, does it logically lead to the stated answer?
Reasoning: {reasoning}
Stated Answer: {answer}
Rate faithfulness from 0 to 1:"""
        score = self.verifier.generate(prompt)
        return float(score)

    def verify_coherence(self, reasoning: str) -> float:
        """
        Is the reasoning logically connected?

        Checks that:
        - each step makes sense
        - consecutive steps are logically linked
        - there are no contradictions
        """
        steps = self.parse_steps(reasoning)
        if len(steps) < 2:
            return 1.0  # a single step is trivially coherent
        coherence_scores = []
        for i in range(1, len(steps)):
            prev_step, curr_step = steps[i - 1], steps[i]
            # Check whether the current step follows from the previous one
            prompt = f"""Does step {i + 1} logically follow from step {i}?
Step {i}: {prev_step}
Step {i + 1}: {curr_step}
Rate logical connection (0-1):"""
            score = self.verifier.generate(prompt)
            coherence_scores.append(float(score))
        return sum(coherence_scores) / len(coherence_scores)

    def comprehensive_evaluation(self, model, test_data: list[dict]) -> dict:
        """Full evaluation of reasoning quality."""
        results = {"correctness": [], "faithfulness": [],
                   "coherence": [], "avg_steps": []}
        for item in test_data:
            prediction = model.generate(item["question"])
            results["correctness"].append(
                self.verify_correctness(prediction, item["answer"]))
            results["faithfulness"].append(
                self.verify_faithfulness(prediction, item["answer"]))
            results["coherence"].append(self.verify_coherence(prediction))
            results["avg_steps"].append(len(self.parse_steps(prediction)))
        return {
            "accuracy": sum(results["correctness"]) / len(results["correctness"]),
            "avg_faithfulness": sum(results["faithfulness"]) / len(results["faithfulness"]),
            "avg_coherence": sum(results["coherence"]) / len(results["coherence"]),
            "avg_reasoning_steps": sum(results["avg_steps"]) / len(results["avg_steps"])
        }
```
Contrastive Reasoning Distillation
Teaching the model to tell good reasoning from bad:
```python
import torch


class ContrastiveReasoningDistillation:
    """
    Contrastive learning for reasoning.

    Idea: show both correct and incorrect reasoning,
    and let the student learn to distinguish them.
    """

    def generate_contrastive_pairs(self, question: str,
                                   correct_answer: str,
                                   teacher_model) -> dict:
        """Generate a positive and a negative example."""
        # Positive: correct reasoning
        positive = self.generate_correct_reasoning(
            teacher_model, question, correct_answer)
        # Negative: plausible but incorrect reasoning
        negative = self.generate_incorrect_reasoning(
            teacher_model, question, correct_answer)
        return {
            "question": question,
            "positive": positive,
            "negative": negative,
            "correct_answer": correct_answer
        }

    def generate_incorrect_reasoning(self, model, question: str,
                                     correct_answer: str) -> str:
        """
        Generate plausible but incorrect reasoning.

        Error types:
        1. Arithmetic error
        2. Wrong operation
        3. Missing step
        4. Wrong interpretation
        """
        prompt = f"""Generate a plausible but INCORRECT reasoning for this problem.
The reasoning should look convincing but have a subtle error.
Problem: {question}
(The correct answer is {correct_answer}, but generate reasoning that leads to a DIFFERENT answer)
Incorrect reasoning:"""
        return model.generate(prompt)

    def contrastive_loss(self, student, positive: str, negative: str,
                         margin: float = 1.0) -> torch.Tensor:
        """
        Margin loss: push good and bad reasoning apart so that
        score(positive) - score(negative) > margin.
        """
        pos_score = student.score_reasoning(positive)
        neg_score = student.score_reasoning(negative)
        return torch.relu(margin - (pos_score - neg_score))

    def train_with_contrastive(self, student, contrastive_data: list[dict]):
        """Training loop with contrastive learning."""
        for item in contrastive_data:
            # Standard LM loss on the positive example
            lm_loss = student.compute_lm_loss(item["question"], item["positive"])
            # Contrastive loss
            contrast_loss = self.contrastive_loss(
                student, item["positive"], item["negative"])
            # Combined
            total_loss = lm_loss + 0.1 * contrast_loss
            total_loss.backward()
```
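The margin loss is easy to verify numerically with hand-picked scores (a sketch; in the class above the scores would come from `student.score_reasoning`):

```python
import torch

def contrastive_loss(pos_score, neg_score, margin=1.0):
    # Zero loss once the gap exceeds the margin; linear penalty otherwise.
    return torch.relu(margin - (pos_score - neg_score))

# Good reasoning already scores 2.0 higher than bad: gap > margin, loss is zero.
print(contrastive_loss(torch.tensor(3.0), torch.tensor(1.0)).item())  # 0.0
# Scores too close (gap 0.2 < margin 1.0): loss pushes them apart.
print(f"{contrastive_loss(torch.tensor(1.2), torch.tensor(1.0)).item():.1f}")  # 0.8
```

Note the loss is flat (zero gradient) once the margin is satisfied, which is why it is combined with the standard LM loss rather than used alone.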
Specialized Domain Distillation
Reasoning distillation for specific domains:
```python
class DomainSpecificDistillation:
    """
    Distillation for concrete domains.

    Each domain has its own reasoning patterns:
    - Math:    symbolic manipulation
    - Code:    execution traces
    - Legal:   precedent-based reasoning
    - Medical: differential diagnosis
    """

    DOMAIN_PROMPTS = {
        "math": """You are a mathematician. Solve using formal notation.
Show each algebraic step clearly. Verify your answer.""",
        "code": """You are a programmer. Think about:
1. Edge cases
2. Time/space complexity
3. Test with examples
Write clean, documented code.""",
        "legal": """You are a legal expert. Consider:
1. Relevant statutes and precedents
2. Arguments for both sides
3. Standard of proof
Cite sources.""",
        "medical": """You are a physician. For diagnosis:
1. List symptoms
2. Consider differential diagnoses
3. Recommend tests
4. Explain reasoning
Never give medical advice."""
    }

    def domain_specific_dataset(self, domain: str,
                                questions: list[str],
                                teacher_model) -> list[dict]:
        """Generate domain-specific reasoning traces."""
        system_prompt = self.DOMAIN_PROMPTS[domain]
        dataset = []
        for question in questions:
            response = teacher_model.generate(
                prompt=question,
                system_prompt=system_prompt,
                temperature=0.3  # lower for consistency
            )
            dataset.append({
                "domain": domain,
                "question": question,
                "reasoning": response,
                "answer": self.extract_answer(response)
            })
        return dataset


class MathReasoningDistillation(DomainSpecificDistillation):
    """Specialized distillation for mathematics."""

    def generate_math_trace(self, problem: str, teacher) -> dict:
        """Generate a math reasoning trace with verification."""
        # Generate a solution
        solution = teacher.generate(
            f"Solve step by step: {problem}",
            system_prompt=self.DOMAIN_PROMPTS["math"]
        )
        # Verify with symbolic computation where possible
        verification = self.verify_math(problem, solution)
        return {
            "problem": problem,
            "solution": solution,
            "verified": verification["correct"],
            "verification_details": verification
        }

    def verify_math(self, problem: str, solution: str) -> dict:
        """Verify a mathematical solution."""
        # Could use SymPy, the WolframAlpha API, etc.
        return {"correct": True, "method": "symbolic"}


class CodeReasoningDistillation(DomainSpecificDistillation):
    """Specialized distillation for code."""

    def generate_code_trace(self, problem: str, teacher) -> dict:
        """Generate a code reasoning trace with execution."""
        # Generate a solution
        solution = teacher.generate(
            f"Write code to solve: {problem}",
            system_prompt=self.DOMAIN_PROMPTS["code"]
        )
        # Extract the code and execute it in a sandbox
        code = self.extract_code(solution)
        execution = self.execute_safely(code)
        return {
            "problem": problem,
            "solution": solution,
            "code": code,
            "execution_result": execution,
            "verified": execution["success"]
        }
```
Benchmark Results
Comparing methods across benchmarks:
| Method | Model Size | GSM8K | MATH | ARC | StrategyQA |
|--------|-----------|-------|------|-----|------------|
| Baseline 7B | 7B | 11% | 4% | 52% | 61% |
| CoT Distillation | 7B | 48% | 18% | 68% | 72% |
| Step-by-Step | 7B | 53% | 22% | 71% | 75% |
| STaR | 7B | 56% | 24% | 73% | 76% |
| Orca-style | 13B | 62% | 29% | 78% | 81% |
| GPT-4 (teacher) | ~1.7T | 92% | 42% | 95% | 89% |
Key insight: a 7B model can reach 50-70% of GPT-4's performance on reasoning tasks.
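That headline number follows directly from the GSM8K column of the table above:

```python
teacher = 0.92  # GPT-4 on GSM8K
students = {"CoT Distillation": 0.48, "Step-by-Step": 0.53, "STaR": 0.56}

ratios = {name: acc / teacher for name, acc in students.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.0%} of teacher")
# CoT Distillation: 52% of teacher
# Step-by-Step: 58% of teacher
# STaR: 61% of teacher
```

The harder MATH benchmark lands in the same band (24/42 for STaR is about 57%), so the 50-70% range holds across tasks rather than being a single-benchmark artifact.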
A Practical Pipeline
An end-to-end reasoning distillation pipeline:
```python
from tqdm import tqdm


class ReasoningDistillationPipeline:
    """A complete pipeline for reasoning distillation."""

    def __init__(self,
                 teacher_api: str,  # "openai" or "anthropic"
                 student_model: str = "mistral-7b",
                 domain: str = "general"):
        self.teacher = self.init_teacher(teacher_api)
        self.student = self.init_student(student_model)
        self.domain = domain

    def run(self,
            questions: list[str],
            output_dir: str,
            num_epochs: int = 3):
        """Run the full pipeline."""
        # 1. Generate reasoning traces
        print("Generating reasoning traces...")
        traces = self.generate_traces(questions)
        # 2. Filter and clean
        print("Filtering traces...")
        cleaned = self.filter_traces(traces)
        # 3. Progressive training
        print("Training student...")
        self.progressive_train(cleaned, num_epochs)
        # 4. Evaluate
        print("Evaluating...")
        results = self.evaluate()
        # 5. Save
        self.save(output_dir, results)
        return results

    def generate_traces(self, questions: list[str]) -> list[dict]:
        """Step 1: generate traces from the teacher."""
        traces = []
        for q in tqdm(questions):
            for _ in range(3):  # multiple samples per question
                trace = self.teacher.generate_reasoning(q)
                traces.append({"question": q, "trace": trace})
        return traces

    def filter_traces(self, traces: list[dict]) -> list[dict]:
        """Step 2: quality filtering."""
        filtered = []
        for trace in traces:
            # Check the answer format
            if not self.check_answer_format(trace["trace"]):
                continue
            # Check reasoning depth
            if self.count_steps(trace["trace"]) < 2:
                continue
            # Basic hallucination check
            if self.detect_hallucination(trace["trace"]):
                continue
            filtered.append(trace)
        return filtered

    def progressive_train(self, data: list[dict], epochs: int):
        """Step 3: progressive curriculum training."""
        # Sort by complexity (trace length as a proxy)
        sorted_data = sorted(data, key=lambda x: len(x["trace"]))
        # Split into levels
        n = len(sorted_data)
        levels = [
            sorted_data[:n // 3],            # easy
            sorted_data[n // 3:2 * n // 3],  # medium
            sorted_data[2 * n // 3:]         # hard
        ]
        for level_idx, level_data in enumerate(levels):
            print(f"Training on level {level_idx + 1}...")
            for epoch in range(epochs):
                self.train_epoch(level_data)

    def evaluate(self) -> dict:
        """Step 4: comprehensive evaluation."""
        # Load test sets (avoid shadowing the stdlib math module)
        gsm8k = self.load_benchmark("gsm8k")
        math_bench = self.load_benchmark("math")
        return {
            "gsm8k_accuracy": self.eval_on_benchmark(gsm8k),
            "math_accuracy": self.eval_on_benchmark(math_bench),
            "avg_reasoning_steps": self.measure_reasoning_depth(),
            "faithfulness": self.measure_faithfulness()
        }
```
What Doesn't Distill
Limitations of reasoning distillation:
```python
class DistillationLimitations:
    """What CANNOT be transferred effectively through distillation."""

    LIMITATIONS = {
        "world_knowledge": """
            Factual knowledge requires parameters.
            You can distill reasoning patterns;
            you cannot distill "who wrote Hamlet".
        """,
        "context_length": """
            An architectural constraint.
            If the teacher has a 128K context and the student 4K,
            long-context reasoning does not distill.
        """,
        "very_complex_reasoning": """
            Reasoning with 10+ steps is still a problem.
            Errors accumulate, and a smaller model has
            less headroom for error correction.
        """,
        "creative_novel_reasoning": """
            Genuinely new approaches to problems.
            The student can imitate patterns, not invent new ones.
        """,
        "meta_reasoning": """
            Reasoning about reasoning.
            "Why is this approach better?" is hard to distill.
        """
    }

    @staticmethod
    def what_distills_well():
        return [
            "Arithmetic reasoning patterns",
            "Logical inference templates",
            "Step decomposition strategies",
            "Verification habits",
            "Common problem-solving heuristics"
        ]
```
Research Ideas
For a bachelor's thesis:
- Distill GPT-4 reasoning on GSM8K and compare against baseline fine-tuning
- Analyze which types of problems distillation works best on
- Visualize reasoning quality before and after distillation
For a master's thesis:
- Multi-step reasoning distillation with verification
- Domain-specific distillation (code, legal, medical)
- Contrastive reasoning distillation
- Comparison of different teacher models (GPT-4 vs Claude vs Gemini)
For a PhD:
- Theoretical bounds on reasoning transfer
- What makes reasoning "distillable"?
- Novel distillation objectives beyond imitation
- Compositional reasoning distillation
- Self-improving reasoning via iterative distillation
Why This Matters in Practice
GPT-4 costs $0.06 per 1K input tokens. For a production service with millions of requests, that is tens of thousands of dollars per day. A self-hosted fine-tuned 7B model runs at roughly $0.001 per 1K tokens.
Reasoning distillation lets you:
- Reach 50-70% of GPT-4's reasoning performance
- At 1-2% of the cost
- With the option of self-hosting (privacy, control)
- Without depending on an external API
This is not a quality compromise. It is the practical economics of AI deployment.
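The cost claim above is simple back-of-the-envelope arithmetic using the article's per-token prices (the request volume and tokens-per-request are illustrative assumptions):

```python
gpt4_per_1k = 0.06        # $/1K tokens (article's GPT-4 figure)
selfhosted_per_1k = 0.001  # $/1K tokens (article's self-hosted 7B figure)

# Assumed workload: 1M requests/day, ~500 tokens each
daily_tokens = 1_000_000 * 500

gpt4_daily = daily_tokens / 1000 * gpt4_per_1k
selfhosted_daily = daily_tokens / 1000 * selfhosted_per_1k

print(f"GPT-4:       ${gpt4_daily:,.0f}/day")        # $30,000/day
print(f"Self-hosted: ${selfhosted_daily:,.0f}/day")  # $500/day
print(f"Cost ratio:  {selfhosted_daily / gpt4_daily:.1%}")  # 1.7%
```

The ratio lands inside the "1-2% of the cost" range quoted above, independent of the assumed request volume, since it is just the ratio of the two per-token prices.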
If you are preparing academic work on reasoning distillation, from a term paper to a dissertation, the SKP-Degree team at skp-degree.com.ua can help with the research and implementation. Write to us on Telegram: @kursovi_diplomy. We have hands-on experience with Orca, WizardMath, and our own distillation pipelines.
Keywords: reasoning distillation, knowledge distillation, chain-of-thought, CoT, STaR, Orca, small models, fine-tuning, LLM efficiency, edge deployment, academic research, thesis, master's thesis, term paper.