Metadata-Version: 2.1
Name: buildNanoGPT
Version: 0.0.8
Summary: A template for nbdev-based project
Home-page: https://github.com/hdocmsu/buildNanoGPT/
Author: Hung Do, PhD
Author-email: clinicalcollaborations@gmail.com
License: Apache Software License 2.0
Keywords: nbdev
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: matplotlib
Provides-Extra: dev

# buildNanoGPT


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

> `buildNanoGPT` is developed based on Andrej Karpathy’s
> [build-nanoGPT](https://github.com/karpathy/build-nanoGPT) repo and
> [Let’s reproduce GPT-2
> (124M)](https://www.youtube.com/watch?v=l8pRSuU81PU) with added notes
> and details for teaching purposes using
> [nbdev](https://nbdev.fast.ai/), which enables package development,
> testing, documentation, and dissemination all in one place - Jupyter
> Notebook or Visual Studio Code Jupyter Notebook in my case 😄.

## Literate Programming

`buildNanoGPT`

``` mermaid
flowchart LR
  A(Andrej's build-nanoGPT) --> C((Combination))
  B(Jeremy's nbdev) --> C
  C -->|Literate Programming| D(buildNanoGPT)
```

`micrograd2023`

<img src='media/literate_programming.svg' width=100% height=auto >

## Disclaimers

`buildNanoGPT` is written based on [Andrej
Karpathy’s](https://karpathy.ai/)
[build-nanoGPT](https://github.com/karpathy/makemore) and his [“Neural
Networks: Zero to
Hero”](https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ)
lecture series. Andrej is the man who needs no introduction in the field
of Deep Learning.

Andrej released a series of lectures called [Neural Network: Zero to
Hero](https://karpathy.ai/zero-to-hero.html), which I found extremely
educational and practical. I am reviewing the lectures and creating
notes for myself and for teaching purposes.

I developed `buildNanoGPT` using [nbdev](https://nbdev.fast.ai/), which
was developed by [Jeremy Howard](https://jeremy.fast.ai/), the man who
also needs no introduction in the field of Deep Learning. Jeremy also
created `fastai` Deep Learning software [library](https://docs.fast.ai/)
and [Courses](https://course.fast.ai/) that are extremely influential. I
highly recommend `fastai` if you are interested in starting your journey
and learning with ML and DL.

`nbdev` is a powerful tool that can be used to efficiently develop,
build, test, document, and distribute software packages all in one
place, Jupyter Notebook or Jupyter Notebooks in VS Code, which I am
using.

If you study lectures by Andrej and Jeremy you will probably notice that
they are both great educators and utilize both top-down and bottom-up
approaches in their teaching, but Andrej predominantly uses *bottom-up*
approach while Jeremy predominantly uses *top-down* one. I personally
fascinated by both educators and found values from both of them and hope
you are too!

## Usage

### Prepare FineWeb-Edu-10B data

``` python
from buildNanoGPT import data
import tiktoken
import numpy as np
```

``` python
enc = tiktoken.get_encoding("gpt2")
eot = enc._special_tokens['<|endoftext|>'] # end of text token
eot
```

    50256

``` python
t_ref = [eot]
t_ref.extend(enc.encode("Hello, world!"))
t_ref = np.array(t_ref).astype(np.uint16)
t_ref
```

    array([50256, 15496,    11,   995,     0], dtype=uint16)

``` python
t_ref = [eot]
t_ref.extend(enc.encode("Hello, world!"))
t_ref = np.array(t_ref).astype(np.int32)
t_ref
```

    array([50256, 15496,    11,   995,     0], dtype=int32)

``` python
doc = {"text":"Hello, world!"}
t_test = data.tokenize(doc)
t_test
```

    array([50256, 15496,    11,   995,     0], dtype=uint16)

``` python
assert np.all(t_ref == t_test)
```

``` python
# Download and Prepare the FineWeb-Edu-10B sample Data
data.edu_fineweb10B_prep(is_test=True)
```

    Resolving data files:   0%|          | 0/1630 [00:00<?, ?it/s]

    Loading dataset shards:   0%|          | 0/98 [00:00<?, ?it/s]

    'Hello from `prepare_edu_fineweb10B()`! if you want to download the dataset, set is_test=False and run again.'

### Prepare HellaSwag Evaluation data

``` python
data.hellaswag_val_prep(is_test=True)
```

    'Hello from `hellaswag_val_prep()`! if you want to download the dataset, set is_test=False and run again.'

### Training

``` python
# either running 03_train.ipynb or short-cut by running train script from the buildNanoGPT package
from buildNanoGPT import train
```

    using device: cuda
    total desired batch size: 524288
    => calculated gradient accumulation steps: 32
    found 99 shards for split train
    found 1 shards for split val
    num decayed parameter tensors: 50, with 124,354,560 parameters
    num non-decayed parameter tensors: 98, with 121,344 parameters
    using fused AdamW: True
    validation loss: 10.9834
    HellaSwag accuracy: 2534/10042=0.2523
    step     0 | loss: 10.981724 | lr 6.0000e-06 | norm: 15.4339 | dt: 82809.98ms | tok/sec: 6331.22
    step     1 | loss: 10.655205 | lr 1.2000e-05 | norm: 12.4931 | dt: 10492.83ms | tok/sec: 49966.29
    step     2 | loss: 10.274603 | lr 1.8000e-05 | norm: 7.7501 | dt: 10522.88ms | tok/sec: 49823.61
    step     3 | loss: 10.004156 | lr 2.4000e-05 | norm: 5.2698 | dt: 10481.91ms | tok/sec: 50018.35
    step     4 | loss: 9.833108 | lr 3.0000e-05 | norm: 3.6179 | dt: 10495.18ms | tok/sec: 49955.14
    step     5 | loss: 9.711222 | lr 3.6000e-05 | norm: 2.7871 | dt: 10484.25ms | tok/sec: 50007.21
    step     6 | loss: 9.642426 | lr 4.2000e-05 | norm: 2.4048 | dt: 10679.06ms | tok/sec: 49094.97
    step     7 | loss: 9.612312 | lr 4.8000e-05 | norm: 2.3183 | dt: 10555.78ms | tok/sec: 49668.32
    step     8 | loss: 9.558184 | lr 5.4000e-05 | norm: 2.2464 | dt: 10685.39ms | tok/sec: 49065.86
    step     9 | loss: 9.526472 | lr 6.0000e-05 | norm: 2.2171 | dt: 10548.39ms | tok/sec: 49703.14
    step    10 | loss: 9.463450 | lr 6.6000e-05 | norm: 2.1546 | dt: 10559.73ms | tok/sec: 49649.78
    step    11 | loss: 9.413282 | lr 7.2000e-05 | norm: 2.1401 | dt: 10495.94ms | tok/sec: 49951.49
    step    12 | loss: 9.340552 | lr 7.8000e-05 | norm: 2.0149 | dt: 10668.78ms | tok/sec: 49142.26
    step    13 | loss: 9.278631 | lr 8.4000e-05 | norm: 1.9368 | dt: 10605.16ms | tok/sec: 49437.05
    step    14 | loss: 9.159446 | lr 9.0000e-05 | norm: 1.9737 | dt: 10701.77ms | tok/sec: 48990.76
    step    15 | loss: 9.111786 | lr 9.6000e-05 | norm: 3.0525 | dt: 10732.83ms | tok/sec: 48849.00
    step    16 | loss: 9.029915 | lr 1.0200e-04 | norm: 1.9619 | dt: 10790.65ms | tok/sec: 48587.23
    step    17 | loss: 8.937255 | lr 1.0800e-04 | norm: 1.8786 | dt: 10621.46ms | tok/sec: 49361.22
    step    18 | loss: 8.955976 | lr 1.1400e-04 | norm: 2.0179 | dt: 10545.33ms | tok/sec: 49717.53
    step    19 | loss: 8.888343 | lr 1.2000e-04 | norm: 1.9142 | dt: 10598.08ms | tok/sec: 49470.11
    step    20 | loss: 8.672051 | lr 1.2600e-04 | norm: 1.7543 | dt: 10730.04ms | tok/sec: 48861.68
    step    21 | loss: 8.556496 | lr 1.3200e-04 | norm: 1.6246 | dt: 10822.08ms | tok/sec: 48446.13
    step    22 | loss: 8.463942 | lr 1.3800e-04 | norm: 1.4898 | dt: 10733.11ms | tok/sec: 48847.72
    step    23 | loss: 8.389053 | lr 1.4400e-04 | norm: 1.9412 | dt: 10555.51ms | tok/sec: 49669.61
    step    24 | loss: 8.257857 | lr 1.5000e-04 | norm: 2.0539 | dt: 10732.67ms | tok/sec: 48849.75
    step    25 | loss: 8.128786 | lr 1.5600e-04 | norm: 1.4269 | dt: 10609.93ms | tok/sec: 49414.84
    step    26 | loss: 8.098352 | lr 1.6200e-04 | norm: 2.0206 | dt: 10487.59ms | tok/sec: 49991.30
    step    27 | loss: 7.961097 | lr 1.6800e-04 | norm: 1.2978 | dt: 10578.22ms | tok/sec: 49562.95
    step    28 | loss: 7.884172 | lr 1.7400e-04 | norm: 1.2289 | dt: 10497.51ms | tok/sec: 49944.04
    step    29 | loss: 7.765845 | lr 1.8000e-04 | norm: 1.1969 | dt: 10724.78ms | tok/sec: 48885.65
    step    30 | loss: 7.821087 | lr 1.8600e-04 | norm: 1.0228 | dt: 10792.80ms | tok/sec: 48577.58
    step    31 | loss: 7.689835 | lr 1.9200e-04 | norm: 0.9216 | dt: 10752.80ms | tok/sec: 48758.30
    step    32 | loss: 7.641486 | lr 1.9800e-04 | norm: 0.8666 | dt: 10985.01ms | tok/sec: 47727.58
    step    33 | loss: 7.572504 | lr 2.0400e-04 | norm: 0.7996 | dt: 10684.39ms | tok/sec: 49070.46
    step    34 | loss: 7.429519 | lr 2.1000e-04 | norm: 0.7874 | dt: 10696.01ms | tok/sec: 49017.15
    step    35 | loss: 7.414855 | lr 2.1600e-04 | norm: 0.7272 | dt: 10580.76ms | tok/sec: 49551.08
    step    36 | loss: 7.393157 | lr 2.2200e-04 | norm: 0.8536 | dt: 10748.95ms | tok/sec: 48775.74
    step    37 | loss: 7.287198 | lr 2.2800e-04 | norm: 0.5487 | dt: 10921.08ms | tok/sec: 48006.98
    step    38 | loss: 7.252760 | lr 2.3400e-04 | norm: 0.4738 | dt: 10716.44ms | tok/sec: 48923.69
    step    39 | loss: 7.292991 | lr 2.4000e-04 | norm: 0.5769 | dt: 10659.42ms | tok/sec: 49185.43
    step    40 | loss: 7.251584 | lr 2.4600e-04 | norm: 0.9509 | dt: 10570.06ms | tok/sec: 49601.22
    step    41 | loss: 7.209351 | lr 2.5200e-04 | norm: 1.7773 | dt: 10611.45ms | tok/sec: 49407.78
    step    42 | loss: 7.140303 | lr 2.5800e-04 | norm: 0.9441 | dt: 10753.44ms | tok/sec: 48755.36
    step    43 | loss: 7.216593 | lr 2.6400e-04 | norm: 2.1513 | dt: 10632.68ms | tok/sec: 49309.09
    step    44 | loss: 7.155683 | lr 2.7000e-04 | norm: 1.3599 | dt: 10780.88ms | tok/sec: 48631.27
    step    45 | loss: 7.159153 | lr 2.7600e-04 | norm: 1.1990 | dt: 10722.27ms | tok/sec: 48897.11
    step    46 | loss: 7.126624 | lr 2.8200e-04 | norm: 0.8272 | dt: 10791.48ms | tok/sec: 48583.50
    step    47 | loss: 7.190242 | lr 2.8800e-04 | norm: 0.9578 | dt: 10718.49ms | tok/sec: 48914.35
    step    48 | loss: 7.194102 | lr 2.9400e-04 | norm: 0.7273 | dt: 10651.67ms | tok/sec: 49221.22
    step    49 | loss: 7.113352 | lr 3.0000e-04 | norm: 1.1239 | dt: 10732.94ms | tok/sec: 48848.51
    step    50 | loss: 7.169769 | lr 3.0600e-04 | norm: 1.0528 | dt: 10706.81ms | tok/sec: 48967.72
    step    51 | loss: 7.103631 | lr 3.1200e-04 | norm: 1.0537 | dt: 10826.62ms | tok/sec: 48425.82
    step    52 | loss: 7.092214 | lr 3.1800e-04 | norm: 0.7355 | dt: 10777.80ms | tok/sec: 48645.18
    step    53 | loss: 7.021073 | lr 3.2400e-04 | norm: 0.8493 | dt: 10907.12ms | tok/sec: 48068.41
    step    54 | loss: 7.030515 | lr 3.3000e-04 | norm: 0.7924 | dt: 10822.94ms | tok/sec: 48442.27
    step    55 | loss: 7.027347 | lr 3.3600e-04 | norm: 0.8563 | dt: 10661.62ms | tok/sec: 49175.26
    step    56 | loss: 7.007086 | lr 3.4200e-04 | norm: 1.2067 | dt: 10764.39ms | tok/sec: 48705.77
    step    57 | loss: 6.978011 | lr 3.4800e-04 | norm: 0.5606 | dt: 10967.17ms | tok/sec: 47805.22
    step    58 | loss: 6.919628 | lr 3.5400e-04 | norm: 1.3408 | dt: 10802.21ms | tok/sec: 48535.23
    step    59 | loss: 6.887385 | lr 3.6000e-04 | norm: 1.3971 | dt: 10907.45ms | tok/sec: 48066.97
    step    60 | loss: 6.879627 | lr 3.6600e-04 | norm: 0.7581 | dt: 10768.36ms | tok/sec: 48687.80
    step    61 | loss: 6.906055 | lr 3.7200e-04 | norm: 0.9657 | dt: 10613.11ms | tok/sec: 49400.03
    step    62 | loss: 6.795964 | lr 3.7800e-04 | norm: 0.6819 | dt: 10593.62ms | tok/sec: 49490.92
    step    63 | loss: 6.780255 | lr 3.8400e-04 | norm: 0.7485 | dt: 10719.51ms | tok/sec: 48909.68
    step    64 | loss: 6.767306 | lr 3.9000e-04 | norm: 0.7399 | dt: 10806.62ms | tok/sec: 48515.44
    step    65 | loss: 6.801779 | lr 3.9600e-04 | norm: 0.7439 | dt: 10609.56ms | tok/sec: 49416.58
    step    66 | loss: 6.721136 | lr 4.0200e-04 | norm: 0.5727 | dt: 10749.83ms | tok/sec: 48771.73
    step    67 | loss: 6.750595 | lr 4.0800e-04 | norm: 0.7310 | dt: 10711.53ms | tok/sec: 48946.13
    step    68 | loss: 6.730660 | lr 4.1400e-04 | norm: 0.5052 | dt: 10772.71ms | tok/sec: 48668.16
    step    69 | loss: 6.631037 | lr 4.2000e-04 | norm: 0.6577 | dt: 10736.56ms | tok/sec: 48832.04
    step    70 | loss: 6.612390 | lr 4.2600e-04 | norm: 0.6208 | dt: 10598.25ms | tok/sec: 49469.31
    step    71 | loss: 6.643014 | lr 4.3200e-04 | norm: 0.6751 | dt: 10712.97ms | tok/sec: 48939.57
    step    72 | loss: 6.602534 | lr 4.3800e-04 | norm: 0.8274 | dt: 10685.25ms | tok/sec: 49066.50
    step    73 | loss: 6.606695 | lr 4.4400e-04 | norm: 1.0497 | dt: 10784.33ms | tok/sec: 48615.72
    step    74 | loss: 6.532132 | lr 4.5000e-04 | norm: 0.9483 | dt: 11051.53ms | tok/sec: 47440.31
    step    75 | loss: 6.571723 | lr 4.5600e-04 | norm: 0.5493 | dt: 10943.98ms | tok/sec: 47906.50
    step    76 | loss: 6.519442 | lr 4.6200e-04 | norm: 0.6364 | dt: 11138.90ms | tok/sec: 47068.20
    step    77 | loss: 6.553431 | lr 4.6800e-04 | norm: 0.6423 | dt: 10943.91ms | tok/sec: 47906.81
    step    78 | loss: 6.525961 | lr 4.7400e-04 | norm: 0.4541 | dt: 10733.66ms | tok/sec: 48845.21
    step    79 | loss: 6.474160 | lr 4.8000e-04 | norm: 0.6690 | dt: 10748.03ms | tok/sec: 48779.93
    step    80 | loss: 6.481711 | lr 4.8600e-04 | norm: 0.5859 | dt: 10679.49ms | tok/sec: 49093.00
    step    81 | loss: 6.486966 | lr 4.9200e-04 | norm: 0.6897 | dt: 10656.78ms | tok/sec: 49197.58
    step    82 | loss: 6.430150 | lr 4.9800e-04 | norm: 0.6284 | dt: 10426.83ms | tok/sec: 50282.59
    step    83 | loss: 6.387268 | lr 5.0400e-04 | norm: 0.5746 | dt: 10644.15ms | tok/sec: 49255.97
    step    84 | loss: 6.405340 | lr 5.1000e-04 | norm: 0.5523 | dt: 10856.28ms | tok/sec: 48293.53
    step    85 | loss: 6.371199 | lr 5.1600e-04 | norm: 0.6764 | dt: 10573.15ms | tok/sec: 49586.76
    step    86 | loss: 6.367082 | lr 5.2200e-04 | norm: 0.7355 | dt: 10731.52ms | tok/sec: 48854.94
    step    87 | loss: 6.404164 | lr 5.2800e-04 | norm: 0.7907 | dt: 10878.82ms | tok/sec: 48193.45
    step    88 | loss: 6.383866 | lr 5.3400e-04 | norm: 0.7472 | dt: 10855.23ms | tok/sec: 48298.20

## How to install

The [buildNanoGPT](https://pypi.org/project/buildNanoGPT/) package was
uploaded to [PyPI](https://pypi.org/) and can be easily installed using
the below command.

`pip install buildNanoGPT`

### Developer install

If you want to develop `buildNanoGPT` yourself, please use an editable
installation.

`git clone https://github.com/hdocmsu/buildNanoGPT.git`

`pip install -e "buildNanoGPT[dev]"`

You also need to use an editable installation of
[nbdev](https://github.com/fastai/nbdev),
[fastcore](https://github.com/fastai/fastcore), and
[execnb](https://github.com/fastai/execnb).

Happy Coding!!!

<div class="alert alert-info">

<b>Note:</b> `buildNanoGPT` is currently Work in Progress (WIP).

</div>
