How Small Can Language Models Be and Still Speak Coherent English?

Jul 28, 2023

This morning I came across a paper titled "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?". It was published in May 2023, and it is pretty interesting.

TinyStories is a synthetic dataset composed of short, easy-to-understand stories generated by GPT-3.5 and GPT-4. These stories contain only words familiar to 3- to 4-year-old children, which gives a different angle for testing the capabilities of Small Language Models (SLMs). Although SLMs are much more compact than the latest models, the study shows that when trained on TinyStories, they can produce fluent, coherent, and diverse multi-paragraph stories that have near-perfect grammar and demonstrate reasoning skills.

The study emphasizes the potential of TinyStories as an instrument to probe the emergence of coherent text generation, reasoning, and instruction following in LMs at a much smaller scale, in terms of both model size and dataset size.

In addition to showing behaviors similar to those of Large Language Models (LLMs), such as scaling laws and trade-offs between width and depth, SLMs trained on TinyStories turned out to be more interpretable. Researchers could visualize and analyze their attention and activation patterns, furthering our understanding of story generation and comprehension processes.

The study also delves into the "creativity" of these models, raising questions about whether they truly understand the narratives they produce. The researchers hope to use TinyStories in future work to gain more insight into the degree of creativity in language models.

Another significant contribution of this study is the introduction of a new evaluation paradigm. This approach uses GPT-4 to grade content produced by these models, simulating a student-teacher dynamic and providing a multidimensional evaluation of different capabilities. It overcomes a limitation of standard benchmarks, which often require overly structured output, and it could be useful far beyond TinyStories.
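To make the grading idea a bit more concrete, here is a minimal Python sketch of what such a "teacher" loop might look like. It is not the paper's code. The rubric names (grammar, creativity, consistency), the "N/10" reply format, and the prompt wording are all my own assumptions for illustration, and no real model API is called: the grader's reply is passed in as a plain string.

```python
import re
from statistics import mean

# Assumed rubric for illustration; the exact dimensions are not taken from the post.
DIMENSIONS = ("grammar", "creativity", "consistency")


def build_grading_prompt(story_beginning: str, completion: str) -> str:
    """Build a hypothetical teacher-style prompt asking a grader model for scores."""
    rubric = "\n".join(f"{dim.capitalize()}: <score>/10" for dim in DIMENSIONS)
    return (
        "A student was given the beginning of a story and wrote a completion.\n"
        f"Beginning: {story_beginning}\n"
        f"Completion: {completion}\n"
        "Grade the completion using exactly this format:\n"
        f"{rubric}"
    )


def parse_grades(reply: str) -> dict:
    """Extract 'Dimension: N/10' scores from the grader's free-text reply."""
    grades = {}
    for dim in DIMENSIONS:
        match = re.search(rf"{dim}\s*:\s*(\d+)\s*/\s*10", reply, re.IGNORECASE)
        if match:
            grades[dim] = int(match.group(1))
    return grades


def average_grades(all_grades: list) -> dict:
    """Average each dimension over many graded completions, skipping missing scores."""
    averages = {}
    for dim in DIMENSIONS:
        scores = [g[dim] for g in all_grades if dim in g]
        if scores:
            averages[dim] = mean(scores)
    return averages
```

In practice you would send the output of build_grading_prompt to whatever grader model you have access to, feed each reply through parse_grades, and then compare average_grades across the small models you are evaluating.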

Lastly, the researchers share initial findings on the roles of width and depth in a model's capabilities, suggesting that width matters more for acquiring factual knowledge and depth matters more for tracking context. They also hint at a sequence in the emergence of language capabilities in generative models, with grammatical and syntactic abilities surfacing before the ability to produce consistent text or creative content.
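To get a feel for the width versus depth trade-off, a back-of-the-envelope parameter count helps. The sketch below uses the common rule of thumb that one transformer block has roughly 12 * d_model^2 weights (about 4 * d_model^2 for attention and 8 * d_model^2 for an MLP with 4x expansion), ignoring embeddings, biases, and layer norms. The two configurations are hypothetical ones I picked so the numbers match; they are not taken from the paper.

```python
def approx_params(d_model: int, n_layers: int) -> int:
    """Rough non-embedding parameter count for a decoder-only transformer.

    Assumes ~4*d^2 attention weights and ~8*d^2 MLP weights per block
    (4x MLP expansion); biases, layer norms and embeddings are ignored.
    """
    per_block = 12 * d_model * d_model
    return per_block * n_layers


# Hypothetical configurations chosen to have the same rough budget.
wide_shallow = approx_params(d_model=1024, n_layers=1)
narrow_deep = approx_params(d_model=512, n_layers=4)

print(f"wide and shallow: {wide_shallow:,} parameters")
print(f"narrow and deep:  {narrow_deep:,} parameters")
```

Both configurations come out to about 12.6 million parameters, so under a fixed budget, halving the width buys four times as many layers. That is the kind of trade the width and depth comparison is about.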

The TinyStories dataset shows promise for spurring further research and development in LMs, particularly for low-resource or specialized domains. The success of this project encourages further exploration into synthesizing refined datasets for practical applications, such as training a customer service chatbot on a large dataset of hypothetical calls.

I found this research through another quite exciting project: "Inference Llama 2 in one file of pure C".

Serge’s Substack
