Personal project

LLM-V1.0 — Small-Scale Language Model

2025–2026

LLM-V1.0 is a ~30M-parameter GPT-style transformer that I designed and trained in PyTorch: 8 layers, 8 heads, 512-dim embeddings, and an 8,192-token BPE vocabulary.

Checkpointing and text-generation pipelines support both Colab and local training workflows.

Journal

What I worked on during 2025–2026.

Architecture choices

At ~30M parameters, LLM-V1.0 is deliberately small enough to train on student hardware. Eight transformer layers with eight heads and 512-dimensional embeddings balance capacity against the risk of overfitting on modest corpora.

An 8,192-token BPE vocabulary keeps embedding tables manageable while covering technical tokens I care about.
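
A rough way to sanity-check the ~30M figure is to count parameters from the stated dimensions. The sketch below assumes a GPT-2-style block (fused QKV projection, 4x MLP expansion, biases, two LayerNorms), learned position embeddings, a 512-token context window, and an output head tied to the token embeddings. None of those details are stated above, so treat them as assumptions. The function name is hypothetical.

```python
def gpt_param_count(n_layers, d_model, vocab_size, context_len, tie_embeddings=True):
    """Rough parameter count for a GPT-2-style decoder (assumed layout)."""
    token_emb = vocab_size * d_model
    pos_emb = context_len * d_model  # learned positions; context_len is an assumption
    # Attention: fused QKV projection plus output projection, each with a bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: expand to 4 * d_model, then project back, each with a bias.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * 2 * d_model
    per_layer = attn + mlp + norms
    final_norm = 2 * d_model
    # A tied output head reuses the token embedding matrix and adds nothing.
    lm_head = 0 if tie_embeddings else vocab_size * d_model
    return token_emb + pos_emb + n_layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    total = gpt_param_count(n_layers=8, d_model=512, vocab_size=8192, context_len=512)
    print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the count comes to about 29.7M, with roughly 4.2M of that in the token embedding table. The number of heads does not appear in the formula because standard multi-head attention splits the same projection matrices across heads.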

Training loop & checkpointing

PyTorch handles the loop: forward, loss, backward, optimizer step, periodic eval. Checkpointing is not optional when Colab sessions die mid-epoch.
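
One common way to make checkpoints survive a session that dies mid-write is to save to a temporary file and rename it into place, then resume from the latest file on restart. The sketch below is framework-agnostic and uses only the standard library. The function names, the pickle format, and the toy "weights" value are illustrative assumptions, not the project's actual code. In a PyTorch loop the saved dict would typically hold model.state_dict() and optimizer.state_dict() and be written with torch.save.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic when source and destination share a filesystem,
        # so a crash leaves either the old checkpoint or the new one, never half.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(total_steps, ckpt_path, save_every=100):
    """Toy loop that resumes from ckpt_path if a checkpoint is present."""
    state = load_checkpoint(ckpt_path) or {"step": 0, "weights": 0.0}
    while state["step"] < total_steps:
        # Placeholder for forward, loss, backward, optimizer step.
        state["weights"] += 0.1
        state["step"] += 1
        if state["step"] % save_every == 0:
            save_checkpoint(state, ckpt_path)
    return state
```

Calling train again with the same path after an interruption picks up at the last saved step, so at most save_every steps of work are lost.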

I logged loss curves and sample generations at each checkpoint to catch collapse early. Spotting garbled output at step 5k saved days of blind training.
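
Reading samples by eye works, but a cheap numeric signal can flag degenerate output automatically. The sketch below computes a distinct n-gram ratio over whitespace-split tokens. It is one possible heuristic, not necessarily what was used here, and the function names and the 0.3 threshold are assumed values for illustration.

```python
def distinct_ngram_ratio(tokens, n=2):
    """Fraction of n-grams that are unique; values near 0 mean heavy repetition."""
    if len(tokens) < n:
        return 1.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams)


def looks_collapsed(text, n=2, threshold=0.3):
    """Flag a sample whose tokens repeat heavily (threshold is an assumption)."""
    return distinct_ngram_ratio(text.split(), n) < threshold
```

Logging this ratio next to the loss at each checkpoint gives a second curve to watch. It only catches repetition loops, so it complements reading the samples rather than replacing it.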

Generation pipelines

Text generation scripts wrap temperature, top-k, and stop conditions for interactive probing. The goal was not SOTA benchmarks but an end-to-end understanding of autoregressive decoding that I could run locally.
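
The three controls named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's generation script: sample_next_token, generate, and the next_logits callback are hypothetical names, and a real PyTorch version would operate on tensors rather than lists.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample one token id from raw logits with temperature and optional top-k."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Rank token ids by score and keep only the k highest when top_k is set.
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        ids = ids[:top_k]
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [logits[i] / temperature for i in ids]
    # Softmax weights, with the max subtracted for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(ids, weights=weights, k=1)[0]


def generate(next_logits, prompt_ids, max_new_tokens, stop_id=None, **sampling):
    """Autoregressive loop: sample until max_new_tokens or stop_id is reached."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = sample_next_token(next_logits(ids), **sampling)
        if token == stop_id:
            break
        ids.append(token)
    return ids
```

Here next_logits stands in for a model forward pass that returns scores for the next token given the ids so far. Setting top_k=1 reduces sampling to greedy decoding, which is handy for reproducible probing.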

LLM-V1.0 sits on my CV next to SpaAIder and pneumonia work as evidence I build models, not only orchestrate them.

Related highlights

Personal project

SpaAIder — Agentic Runtime

Local-first multi-agent runtime with persistent cross-session memory, heterogeneous clients, and Ollama-first sovereign inference.

Personal project

Pneumonia Detection from Chest X-Rays

TensorFlow pipeline with EfficientNetV2B0 transfer learning—83% validation accuracy on 368 held-out radiographs.
