Build a Command-Line LLM with Python
Creating your own large language model (LLM) is much easier than it sounds. In this guide you’ll train a small model and wire it into a simple chat interface you can run from any terminal.
I’ve always been fascinated by how systems like ChatGPT work under the hood. After tinkering with a few open-source models I realized you don’t need a server farm or a PhD to play with this technology. We’ll take a friendly, hands-on approach: set up a Python environment, grab a tiny dataset, and watch a miniature chatbot come to life. The goal isn’t to build something that will pass the Turing test, but to understand each moving piece well enough that you can keep tinkering and extending it later.
Prerequisites
- A computer with Python 3.9+
- pip for installing packages
- At least 8 GB of RAM (training even small models is memory intensive)
Create a virtual environment so the required libraries don’t pollute other projects. A virtual environment acts like a clean room—when you leave it, your system Python remains untouched:
```bash
python3 -m venv llm-env
source llm-env/bin/activate  # On Windows use: llm-env\Scripts\activate
pip install --upgrade pip
```
Once inside the environment the prompt usually changes to show the environment name. It’s a small reminder that whatever you install now lives only inside llm-env.
Step 1 – Install Dependencies
We’ll rely on Hugging Face Transformers and the datasets library. Install them with pip:
```bash
pip install torch transformers datasets accelerate
```
Tip: If you have a GPU and the correct CUDA drivers, install torch from pytorch.org for huge speedups.
Here’s what these packages do:
- torch – the deep learning engine powering the model’s math.
- transformers – high level APIs for loading and training modern NLP models.
- datasets – convenient access to thousands of public datasets.
- accelerate – handles device placement so the same code can run on CPU, GPU, or even multiple machines.
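If you want to confirm everything installed correctly before moving on, a quick sanity check like the following (an optional snippet, not one of the tutorial files) prints the installed versions and tells you whether PyTorch can see a GPU:

```python
# check_env.py - optional sanity check for the fresh environment
import torch
import transformers
import datasets

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
print("GPU available:", torch.cuda.is_available())
```

If the last line prints False, everything will still work, just on the CPU.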
Step 2 – Prepare Training Data
For a quick demonstration we’ll use the tiny_shakespeare dataset, which contains about 40k lines of text. The following script downloads the dataset and tokenizes it so the model can understand the text. Tokenization simply means turning raw words into numbers the model can process.
```python
# prepare_data.py
from datasets import load_dataset
from transformers import AutoTokenizer

def get_dataset(tokenizer_name="gpt2"):
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    # GPT-2 has no padding token, so reuse the end-of-text token for padding
    tokenizer.pad_token = tokenizer.eos_token

    dataset = load_dataset("tiny_shakespeare", split="train")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=128)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
    return tokenized, tokenizer
```
Save the file as prepare_data.py. The helper returns a tokenized dataset and a tokenizer object that we’ll reuse in training.
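To make “turning words into numbers” concrete, here’s what the tokenizer does to a single sentence if you try it in a Python shell (the exact IDs depend on the GPT-2 vocabulary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer("To be, or not to be")["input_ids"]
print(ids)                    # a list of integers, one per token
print(tokenizer.decode(ids))  # decoding reconstructs the original text
```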
If you’d like to experiment with your own data later, replace tiny_shakespeare with another dataset or even a local text file (a sketch of that follows). The tokenizer will happily crunch whatever you feed it as long as the lines aren’t too long.
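For example, here is one way to point the same helper at a plain text file of your own. The filename my_corpus.txt is just a placeholder; the datasets library’s built-in "text" loader treats each line of the file as one example:

```python
# Sketch: swap in a local text file (my_corpus.txt is a placeholder name)
from datasets import load_dataset

dataset = load_dataset("text", data_files="my_corpus.txt", split="train")
# The rest of get_dataset() stays the same: tokenize each line to 128 tokens.
```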
Step 3 – Train a Small LLM
Create train_llm.py and add the following code. It fine‑tunes the 117M‑parameter GPT‑2 model for one epoch. The goal is to demonstrate the workflow, not to build the next ChatGPT. Each step is commented so you can follow along even if you’ve never trained a model before.
```python
# train_llm.py
from transformers import (AutoModelForCausalLM, Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)

from prepare_data import get_dataset

# Load data and tokenizer
train_dataset, tokenizer = get_dataset()

# Create model and data collator
model = AutoModelForCausalLM.from_pretrained("gpt2")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Training configuration
args = TrainingArguments(
    output_dir="./llm-model",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    save_total_limit=2,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=collator,
)

trainer.train()

# Save the trained weights and tokenizer
model.save_pretrained("./llm-model")
tokenizer.save_pretrained("./llm-model")
```
Before running it, here’s a quick rundown of the most important settings:
- output_dir – where checkpoints and the final model are stored.
- num_train_epochs – how many passes we make over the dataset.
- per_device_train_batch_size – how many samples to process at once; lower it if you run out of memory.
- logging_steps – how often progress updates appear.
Run the training script:
```bash
python train_llm.py
```
One epoch on the tiny dataset should finish in a few minutes on a modern laptop. Larger datasets and models will take significantly longer.
If you’re curious about what’s happening during training, open another terminal and watch your CPU or GPU usage spike. The training loop feeds batches of tokenized text into the model, compares the predictions to the real next words, and nudges the model’s weights in the right direction.
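The Trainer hides that loop behind a single method call, but the core of what happens on each batch looks roughly like the hand-rolled sketch below. This is for intuition only; the real loop also handles batching, attention masks, checkpointing, and more:

```python
# A rough, simplified version of one training step (for intuition only)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer("To be, or not to be", return_tensors="pt")
# Passing labels=input_ids makes the model compare each predicted next token
# against the token that actually follows it, producing a single loss value.
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()   # work out which direction to nudge every weight
optimizer.step()          # apply the nudge
optimizer.zero_grad()     # reset gradients for the next batch
```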
Step 4 – Build a Terminal Chat Interface
With the model trained, create chat.py to provide a conversation loop. It loads the weights from the previous step and uses them to generate responses. We’ll keep things simple: the script stores the conversation history in a plain string and appends each user question before asking the model to continue the story.
```python
# chat.py
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("./llm-model")
model = AutoModelForCausalLM.from_pretrained("./llm-model")

print("Type 'quit' to exit\n")
conversation = ""

while True:
    user = input("You: ")
    if user.strip().lower() == "quit":
        break
    conversation += f"User: {user}\nBot:"
    inputs = tokenizer.encode(conversation, return_tensors="pt")
    outputs = model.generate(inputs, max_length=inputs.shape[1] + 50,
                             do_sample=True, top_k=50, top_p=0.95,
                             pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens, not the whole prompt
    reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    print(f"Bot: {reply}\n")
    conversation += f" {reply}\n"
```
Start chatting with your model:
```bash
python chat.py
```
Type a message, press Enter, and the model replies. Because the training dataset is small, the responses will be quirky, but the structure mirrors how large‑scale systems like ChatGPT work.
Feel free to dress up the interface however you like. You could add colors, store past conversations, or even expose the bot through a simple web server. The core idea is the same: provide context, ask the model to predict the next words, and show them to the user.
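One practical improvement worth calling out: GPT‑2 can only attend to 1,024 tokens, so a long chat will eventually overflow the context window. A small helper like the hypothetical one below (the name trim_history and the 900‑token budget are made up for illustration) keeps only the most recent part of the conversation:

```python
# Hypothetical helper: keep the prompt within GPT-2's 1,024-token context window
def trim_history(conversation, tokenizer, max_tokens=900):
    ids = tokenizer.encode(conversation)
    if len(ids) > max_tokens:
        ids = ids[-max_tokens:]              # drop the oldest tokens
        conversation = tokenizer.decode(ids)
    return conversation

# In chat.py you could call it just before encoding:
# conversation = trim_history(conversation, tokenizer)
```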
Bonus – Troubleshooting Tips
- Model takes forever to respond? Try reducing max_length in generate or moving to a machine with a GPU.
- Out of memory errors? Decrease per_device_train_batch_size or shorten the max_length when tokenizing.
- Weird or repetitive outputs? That’s normal for such a tiny dataset. Feeding more varied training data helps the model learn better patterns; the generation settings sketched below can also take the edge off.
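If the repetition bothers you before you can gather more data, generate() also accepts a couple of knobs that discourage the model from looping. The values here are just starting points to experiment with in chat.py:

```python
# Generation settings that discourage repeated phrases (values are starting points)
outputs = model.generate(inputs, max_length=inputs.shape[1] + 50,
                         do_sample=True, top_k=50, top_p=0.95,
                         no_repeat_ngram_size=3,   # never repeat the same 3-token phrase
                         repetition_penalty=1.2,   # penalize tokens already used
                         pad_token_id=tokenizer.eos_token_id)
```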
Where to Go From Here
This walk‑through barely scratches the surface. To push further:
- Collect or download a larger dataset relevant to your interests.
- Increase num_train_epochs or use a more capable base model such as gpt2-medium.
- Explore PEFT or LoRA for efficient fine‑tuning (a rough sketch follows this list).
- Wrap the chat loop in a shell script or Textual app for a richer terminal experience.
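If you go the LoRA route, the peft library makes it a small change to train_llm.py. A rough sketch, assuming peft is installed (pip install peft) and using a common LoRA setup for GPT‑2’s attention layers:

```python
# Sketch: wrap the GPT-2 model with LoRA adapters before handing it to Trainer
# (assumes `pip install peft`; the r/alpha values are typical starting points)
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's combined attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights will train
```

Everything else in the training script stays the same; only the wrapped model and its much smaller set of trainable weights change.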
With Python and a bit of curiosity you can craft your own miniature LLM and experiment with how conversational systems work under the hood. The best way to learn is by doing, so keep iterating, break things, and share what you build. Happy hacking!