
OpenLM


A from-scratch implementation of a decoder-only, transformer-based language model. You can find my articles about this project here: https://davids.bearblog.dev/mathematical-foundation-of-self-attention/ and https://davids.bearblog.dev/mathematical-and-architectural-analysis-of-decoder-only-transformers/
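
Below is a minimal sketch of the kind of causal self-attention head a decoder-only transformer is built from, assuming a PyTorch implementation; the class and parameter names (`CausalSelfAttentionHead`, `d_model`, `d_head`) are illustrative and are not taken from this repository.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionHead(nn.Module):
    def __init__(self, d_model: int, d_head: int, max_len: int = 1024):
        super().__init__()
        # Project token embeddings into query, key, and value spaces.
        self.q_proj = nn.Linear(d_model, d_head, bias=False)
        self.k_proj = nn.Linear(d_model, d_head, bias=False)
        self.v_proj = nn.Linear(d_model, d_head, bias=False)
        # Lower-triangular mask so position t can only attend to positions <= t.
        self.register_buffer("mask", torch.tril(torch.ones(max_len, max_len)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, _ = x.shape
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Scaled dot-product scores: (batch, seq_len, seq_len).
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
        # Mask out future positions before the softmax (decoder-only behaviour).
        scores = scores.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        # Each output position is a weighted sum of the value vectors.
        return weights @ v

x = torch.randn(2, 8, 64)            # batch of 2 sequences, 8 tokens, d_model = 64
head = CausalSelfAttentionHead(64, 16)
print(head(x).shape)                  # torch.Size([2, 8, 16])
```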

This project gave me insight and perspective on language models, and made me realize how much data is required and how compute-intensive language modelling is as a task: language must not only be semantically and grammatically correct, but also extremely concise and informative.

Some stats about one of the completed training runs: [image]

Visual representation of how attention mechanisms work: [image]
Attention mechanisms on images; the white blur shows where the model attends.
Image credits: https://arxiv.org/abs/1502.03044


[image]
Attention mechanisms working on next-word prediction.
Image credits: https://arxiv.org/abs/1706.03762
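
For reference, the attention weights visualised in the figure above are produced by the scaled dot-product attention defined in that paper:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension.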
