Joshua Terranova

Personal project

LLM-V1.0 — Small-Scale Language Model

2025–2026

Projects


LLM-V1.0 is a ~30M-parameter GPT-style transformer trained in PyTorch—8 layers, 8 heads, 512-dim embeddings, 8,192-token BPE vocabulary.

Checkpointing and text-generation pipelines support both Colab and local training workflows.

Journal

What I worked on during 2025–2026.

Architecture choices

At ~30M parameters, LLM-V1.0 is deliberately small enough to train on student hardware. Eight transformer layers (stacked self-attention and feed-forward blocks, the core repeating unit in GPT-style models), each with eight attention heads, and 512-dimensional embeddings balance model capacity against the risk of overfitting on modest corpora.
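As a sanity check on the ~30M figure, the parameter count can be worked out from the stated hyperparameters. This is a sketch under assumptions the write-up does not state: a nanoGPT-style block with a 4x feed-forward expansion, biases on every linear layer, learned positional embeddings, a hypothetical 256-token context window, and an output head tied to the token embedding. `GPTConfig` and `count_params` are illustrative names. The head count does not appear in the arithmetic because splitting 512 dimensions across 8 heads changes how attention is computed, not how many weights it has.

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # Values from the write-up, plus labeled assumptions
    n_layer: int = 8
    n_head: int = 8          # affects attention layout, not parameter count
    n_embd: int = 512
    vocab_size: int = 8192
    block_size: int = 256    # assumption: context length is not stated
    mlp_ratio: int = 4       # assumption: standard 4x feed-forward expansion
    tied_head: bool = True   # assumption: output head shares the token embedding


def count_params(cfg: GPTConfig) -> int:
    d, d_ff = cfg.n_embd, cfg.mlp_ratio * cfg.n_embd
    # Token embedding table plus learned positional embeddings
    embeddings = cfg.vocab_size * d + cfg.block_size * d
    # Fused Q/K/V projection and attention output projection, with biases
    attn = (d * 3 * d + 3 * d) + (d * d + d)
    # Two-layer feed-forward network d -> d_ff -> d, with biases
    mlp = (d * d_ff + d_ff) + (d_ff * d + d)
    # Two LayerNorms per block, each with a weight and a bias vector
    norms = 2 * (2 * d)
    per_block = attn + mlp + norms
    final_norm = 2 * d
    head = 0 if cfg.tied_head else cfg.vocab_size * d
    return embeddings + cfg.n_layer * per_block + final_norm + head


if __name__ == "__main__":
    print(f"{count_params(GPTConfig()):,}")  # 29,545,472
```

Under these assumptions the total is about 29.5M: roughly 25.2M in the eight blocks and about 4.2M in the 8,192 x 512 embedding table. An untied output head would add another 4.2M. For comparison, GPT-2's 50,257-token vocabulary at the same width would need about 25.7M parameters for the embedding table alone.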

An 8,192-token BPE (byte-pair encoding) vocabulary, which compresses text into subword units so that any input can still be represented, keeps the embedding table manageable while covering the technical tokens I care about.
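A toy, character-level version of BPE training shows where such a vocabulary comes from: repeatedly count adjacent symbol pairs across the corpus and merge the most frequent one. This sketches the textbook algorithm, not the tokenizer LLM-V1.0 actually uses; production tokenizers typically start from raw bytes and add pre-tokenization. The final vocabulary is the base symbols plus every learned merge, so an 8,192-token vocabulary corresponds to roughly 8,192 minus the base-alphabet size in merges. `learn_bpe`, `merge_pair`, and `encode` are hypothetical names.

```python
from collections import Counter


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    # Replace each left-to-right occurrence of `pair` with one joined symbol
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    # Start from single characters and count how often each word occurs
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    # Apply the learned merges in the order they were learned
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = learn_bpe(["low", "low", "low", "lower", "lowest"], num_merges=2)
    print(merges)                    # [('l', 'o'), ('lo', 'w')]
    print(encode("lowest", merges))  # ['low', 'e', 's', 't']
```

Frequent substrings like "low" become single tokens, while rarer endings stay split into smaller pieces; that trade-off is what lets a small vocabulary still cover arbitrary text.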

Training loop & checkpointing

PyTorch, the framework behind both the training and generation code, drives the loop: forward pass, loss, backward pass, optimizer step, and periodic evaluation. Checkpointing (saving model and optimizer state so training can resume after an interruption) is not optional when Colab sessions die mid-epoch.
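A minimal sketch of that loop with resumable checkpoints, assuming a model that maps token ids of shape (B, T) to logits of shape (B, T, vocab) and a `get_batch` callable that returns inputs with their next-token targets. `TinyLM`, `train`, `save_checkpoint`, `load_checkpoint`, and the `eval_fn` hook are hypothetical names; `torch.save`, `torch.load`, and the `state_dict` / `load_state_dict` methods are standard PyTorch. The point for Colab is that the optimizer state is saved alongside the weights, so AdamW's running moment estimates survive a restart.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyLM(nn.Module):
    # Hypothetical stand-in for the real transformer: embedding -> linear head
    def __init__(self, vocab_size: int = 16, d_model: int = 8):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        return self.head(self.emb(idx))  # (B, T, vocab_size)


def save_checkpoint(path, model, optimizer, step):
    # Write to a temp file, then rename, so a session that dies mid-save
    # leaves the previous checkpoint intact
    tmp = path + ".tmp"
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, tmp)
    os.replace(tmp, path)


def load_checkpoint(path, model, optimizer):
    # Return the step to resume from (0 when no checkpoint exists yet)
    if not os.path.exists(path):
        return 0
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]


def train(model, optimizer, get_batch, ckpt_path, max_steps,
          ckpt_every=100, eval_fn=None):
    step = load_checkpoint(ckpt_path, model, optimizer)  # resume if possible
    while step < max_steps:
        model.train()
        x, y = get_batch()  # inputs and next-token targets, both (B, T)
        logits = model(x)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        step += 1
        if step % ckpt_every == 0 or step == max_steps:
            if eval_fn is not None:
                eval_fn(model, step)  # e.g. held-out loss, sample generations
            save_checkpoint(ckpt_path, model, optimizer, step)
    return step


if __name__ == "__main__":
    torch.manual_seed(0)

    def get_batch():
        x = torch.randint(0, 16, (4, 8))
        return x, (x + 1) % 16  # toy task: next id is current id + 1

    path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
    model = TinyLM()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
    train(model, opt, get_batch, path, max_steps=200, ckpt_every=50)

    # Simulate a Colab restart: fresh objects pick up from the saved step
    model2 = TinyLM()
    opt2 = torch.optim.AdamW(model2.parameters(), lr=1e-2)
    print(load_checkpoint(path, model2, opt2))  # 200
```

The checkpoint holds only tensors and plain Python values, so it should also load under the `weights_only=True` default that newer PyTorch releases apply to `torch.load`.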

I logged loss curves and sample generations at every checkpoint to catch collapse early; spotting garbled output at step 5k saved days of blind training.
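Reading samples by eye is the real check, but one common failure, a model looping on the same phrase, can also be flagged automatically in the per-checkpoint log. The sketch below appends one JSON line per checkpoint and scores each sample with distinct-n (the fraction of unique n-grams) on whitespace-split text. `log_checkpoint`, `distinct_n`, `looks_collapsed`, and the 0.3 threshold are hypothetical choices, not what LLM-V1.0 actually logs, and this heuristic will not catch output that is garbled but varied.

```python
import json
import os
import tempfile


def distinct_n(tokens: list[str], n: int = 2) -> float:
    # Fraction of n-grams that are unique; values near 0 mean heavy repetition
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 1.0  # too short to judge, so treat as not repetitive
    return len(set(ngrams)) / len(ngrams)


def looks_collapsed(sample: str, n: int = 2, threshold: float = 0.3) -> bool:
    # Hypothetical threshold: flag samples whose distinct-n falls below it
    return distinct_n(sample.split(), n) < threshold


def log_checkpoint(path: str, step: int, val_loss: float, sample: str) -> None:
    # Append one JSON record per checkpoint so loss curves can be plotted later
    record = {"step": step, "val_loss": val_loss, "sample": sample,
              "collapsed": looks_collapsed(sample)}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "train_log.jsonl")
    # Toy values for illustration only
    log_checkpoint(log_path, step=1, val_loss=2.0, sample="the cat sat on the mat")
    log_checkpoint(log_path, step=2, val_loss=1.9, sample="the the the the the the")
    with open(log_path, encoding="utf-8") as f:
        print([json.loads(line)["collapsed"] for line in f])  # [False, True]
```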

Generation pipelines

Text-generation scripts wrap temperature, top-k sampling, and stop conditions for interactive probing. The goal was not state-of-the-art benchmark scores but an end-to-end understanding of autoregressive decoding (predicting one token at a time, conditioned on all prior tokens, which is the inference pattern behind GPT-style models) in a model I could run locally.
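A minimal PyTorch sketch of such a decoding loop, assuming a model that returns logits of shape (B, T, vocab) and a batch size of 1. `generate`, `CountingModel`, and the single `stop_id` convention are hypothetical; `torch.topk`, `torch.multinomial`, and `F.softmax` are standard PyTorch calls. Temperature rescales the logits before the softmax, top-k discards everything outside the k most likely tokens, and generation stops after `max_new_tokens` or as soon as the stop token is produced.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


@torch.no_grad()
def generate(model: nn.Module, idx: torch.Tensor, max_new_tokens: int,
             block_size: int, temperature: float = 1.0,
             top_k: Optional[int] = None,
             stop_id: Optional[int] = None) -> torch.Tensor:
    """Autoregressively extend idx, a (1, T) tensor of token ids."""
    model.eval()
    for _ in range(max_new_tokens):
        # Only the last block_size tokens fit in the model's context window
        logits = model(idx[:, -block_size:])[:, -1, :]  # next-token logits, (1, vocab)
        if temperature == 0:
            next_id = logits.argmax(dim=-1, keepdim=True)  # greedy decoding
        else:
            logits = logits / temperature  # <1 sharpens, >1 flattens
            if top_k is not None:
                # Keep only logits at least as large as the k-th largest
                kth = torch.topk(logits, min(top_k, logits.size(-1))).values[:, -1:]
                logits = logits.masked_fill(logits < kth, float("-inf"))
            next_id = torch.multinomial(F.softmax(logits, dim=-1), num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
        if stop_id is not None and next_id.item() == stop_id:
            break  # stop condition: the designated stop token was produced
    return idx


class CountingModel(nn.Module):
    # Hypothetical stand-in: always scores (last token + 1) % vocab_size highest
    def __init__(self, vocab_size: int = 10):
        super().__init__()
        self.vocab_size = vocab_size

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        return F.one_hot((idx + 1) % self.vocab_size, self.vocab_size).float() * 10.0


if __name__ == "__main__":
    prompt = torch.tensor([[0]])
    print(generate(CountingModel(), prompt, 5, block_size=4, temperature=0).tolist())
    # [[0, 1, 2, 3, 4, 5]]
    print(generate(CountingModel(), prompt, 5, block_size=4, top_k=1, stop_id=3).tolist())
    # [[0, 1, 2, 3]]
```

The stand-in model makes the output deterministic so the loop's mechanics are easy to check; with a trained model, raising the temperature or widening top-k trades coherence for variety.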

LLM-V1.0 sits on my CV next to SpaAIder and my pneumonia work as evidence that I build models, not just orchestrate them.
