Efficiently Fine-Tuning Small Language Models with Python in 2025

Sigal Shaked
Language: English
This presentation was given on September 9, 2025 at PyCon Israel 2025.

Learn how to fine-tune small language models efficiently using modern Python tools like Axolotl. A practical, GPU-conscious guide to customizing LLMs with QLoRA, chat templates, dataset chunking, and cloud-friendly workflows.
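As a taste of the chat-template piece mentioned above, here is a minimal sketch built on the Hugging Face tokenizer's apply_chat_template; the model name and messages are illustrative assumptions, not taken from the talk.

```python
# Minimal chat-template sketch: turn role-tagged messages into the exact prompt
# string the model expects. Model name and messages are illustrative assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is QLoRA?"},
    {"role": "assistant", "content": "QLoRA trains LoRA adapters on a 4-bit quantized base model."},
]

# tokenize=False returns the formatted string, which is handy for inspecting
# how your dataset prep will actually look to the model.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```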

This 20-minute light talk walks through a real-world fine-tuning pipeline built entirely in Python. You'll learn how to structure and run scalable fine-tuning jobs, even on limited hardware like Colab or cloud GPU services like RunPod. Topics include:

• Why full fine-tuning is dead: a quick look at parameter-efficient approaches like QLoRA (see the QLoRA sketch after this list)
• How Axolotl simplifies model loading, LoRA injection, and dataset prep (an example config follows)
• Managing training across large datasets using chunked fine-tuning (sketched below)
• Moving beyond Colab: when and how to scale to multi-GPU training with DeepSpeed (see the deepspeed line in the config sketch)
• Performing inference on your fine-tuned model with minimal setup (an inference example closes this section)

No prior ML experience is needed, just some Python familiarity and curiosity about LLMs.
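For the first bullet, here is a minimal QLoRA-style sketch using the transformers, peft, and bitsandbytes libraries that Axolotl builds on; the model name and LoRA hyperparameters are illustrative assumptions rather than the talk's exact settings.

```python
# Minimal QLoRA sketch: load a small model with 4-bit quantized weights and inject
# trainable LoRA adapters. Model name and hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed small base model

# 4-bit quantization config (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters: train a few million parameters instead of the full model.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights
```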
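Axolotl wraps all of the above behind a single YAML config. The sketch below shows the general shape; the model, dataset path, and hyperparameters are illustrative assumptions, not the talk's settings.

```yaml
# Sketch of an Axolotl QLoRA config; values are illustrative assumptions.
base_model: Qwen/Qwen2.5-0.5B-Instruct

load_in_4bit: true          # quantize the frozen base weights (QLoRA)
adapter: qlora              # inject trainable LoRA adapters
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true    # attach adapters to all linear layers

datasets:
  - path: data/train.jsonl  # assumed local dataset
    type: alpaca            # one of Axolotl's built-in prompt formats

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 0.0002
bf16: auto
gradient_checkpointing: true
output_dir: ./outputs/qlora-run

# For multi-GPU scaling, point at one of the DeepSpeed configs Axolotl ships:
# deepspeed: deepspeed_configs/zero2.json
```

Launched with Axolotl's CLI (e.g. `axolotl train qlora.yml` in recent releases), this one file covers model loading, quantization, LoRA injection, and dataset prep.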
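Chunked fine-tuning can be sketched generically: split a large dataset into shards and run one short training pass per shard, reusing the same model object so each pass continues from the previous one. This assumes `model` and `tokenizer` from the QLoRA sketch above, a dataset already tokenized into input_ids, and invented paths and hyperparameters.

```python
# Sketch of chunked fine-tuning: one training pass per dataset shard, carrying the
# model state forward between passes. Assumes `model` and `tokenizer` from the
# QLoRA sketch above and a pre-tokenized dataset; paths are assumptions.
from datasets import load_dataset
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

NUM_CHUNKS = 8
dataset = load_dataset("json", data_files="data/train_tokenized.jsonl", split="train")
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM labels

for i in range(NUM_CHUNKS):
    chunk = dataset.shard(num_shards=NUM_CHUNKS, index=i)  # deterministic slice i of 8
    args = TrainingArguments(
        output_dir=f"outputs/chunk-{i}",  # one checkpoint directory per chunk
        num_train_epochs=1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
    )
    Trainer(model=model, args=args, train_dataset=chunk,
            data_collator=collator).train()
    model.save_pretrained(f"outputs/chunk-{i}")  # persist the adapter after each chunk
```

Because only the small LoRA adapter is saved per chunk, a session that dies mid-run loses at most one shard of progress.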
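Finally, a minimal inference sketch: load the saved LoRA adapter on top of its base model and generate from a chat-formatted prompt. The paths follow the chunking sketch above and are assumptions.

```python
# Minimal inference sketch: load the trained LoRA adapter plus its base model and
# generate. Adapter path and model name follow the sketches above (assumptions).
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_dir = "outputs/chunk-7"  # assumed final adapter from the chunked run

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_dir, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Format the prompt with the model's chat template before generating.
messages = [{"role": "user", "content": "Explain QLoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```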