Recipe Generation with GPT-2

Audrey Leduc
Jan 8, 2020

This article summarizes a portion of a project that was developed in collaboration with Carlos-Andrés Madrid as part of Laurent Charlin’s course on Machine Learning for Large-Scale Data Analysis and Decision Making at HEC Montréal.

Last September Carlos and I started searching for a fun machine learning project to tackle. We stumbled upon impressive novel Natural Language Generating (NLG) models: pre-trained neural networks that users can leverage to make text predictions without any additional training required (zero-shot learning). Their function is to predict the next word(s), given an input text. I became increasingly excited by their potential. I did not know it then, but a lot of people felt the same way: 2019 turned out to be the year of language models.

As we brainstormed ideas for NLG use cases we started to imagine a world where machines could create recipes based on a given set of ingredients (for example leftovers in your fridge). And so we started on our quest to train a model to generate cooking instructions. In this article, I walk you through the creation of a recipe generator and share my thoughts on the process.

You can find my code for the dataset preprocessing here and for the model fine-tuning here.

Model Choice

Initially I tried to play around with BERT, Google’s language model that was open sourced in November 2018, but the learning curve for its tools was quite steep. Then I found that OpenAI’s language model GPT-2, released in February 2019, also performed well, thanks to its pre-training on 8 million web pages. Its biggest publicly-released version contains 345M parameters, whereas the full version (released only to select scientific communities) contains 1.5B parameters. The model is capable of completing scientific articles, short stories, (fake) news, assignments, and much more. At the time of writing this article, several models (i.e. [1], [2] & [3]) had been built to generate recipes, but none had used the GPT-2 architecture yet. The choice was made: GPT-2 would be the backbone of my recipe generator.

Data Preprocessing

This project makes use of Ryan Lee’s Recipe Box dataset, which contains 125,000 recipes scraped from three cooking websites.

The recipes are structured, but they contain pieces that are unwanted for our experiment, such as advertisements, odd web-scraping punctuation, and ingredient quantities. So, to prepare the data for training, I preprocessed it using, among other code, some of Ryan Lee’s Recipe Summarization tools. Stop words and quantities were removed, and ingredients were tokenized. A separator token was also inserted between each recipe.
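The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project’s actual pipeline: the stop-word and unit lists, the function names, and the `<|recipe|>` separator token are all placeholders (the real preprocessing relied on Ryan Lee’s Recipe Summarization tools).

```python
import re

# Illustrative lists only; the real project used a fuller stop-word set.
STOP_WORDS = {"a", "an", "the", "of", "and"}
UNITS = {"cup", "cups", "tablespoon", "tablespoons",
         "teaspoon", "teaspoons", "ounce", "ounces"}

def clean_ingredient(raw):
    """Strip quantities, measurement units, and stop words from one ingredient line."""
    text = raw.lower()
    text = re.sub(r"[\d/.]+", " ", text)  # drop quantities like "1 1/2"
    tokens = [t for t in re.findall(r"[a-z]+", text)
              if t not in STOP_WORDS and t not in UNITS]
    return " ".join(tokens)

def build_training_text(recipes, separator="<|recipe|>"):
    """Join cleaned recipes into one training corpus, a separator between each."""
    chunks = []
    for ingredients, instructions in recipes:
        cleaned = " , ".join(clean_ingredient(i) for i in ingredients)
        chunks.append(cleaned + "\n" + instructions)
    return ("\n" + separator + "\n").join(chunks)

print(clean_ingredient("1 1/2 cups of all-purpose flour"))  # → "all purpose flour"
```

Dropping quantities keeps the model focused on *which* ingredients appear rather than *how much* of each, which matches the “leftovers in your fridge” use case.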


There are 2 ways to use GPT-2: out of the box (base model) or with fine-tuning. We tried both.

1. Base Model (Out of the Box)

The job of NLG models is to write texts that make sense, in comprehensible language. When they are used out of the box (zero-shot learning), users assume that the machine will ‘know’ how to generate text that follows the desired structure (a recipe, in our case).

Among other, more worrying things it can do, Talk to Transformer is a useful resource for testing this idea by generating text without fine-tuning. Here the model is tested with the following input: “chocolate chips, eggs, sugar, butter”.


“chocolate chips, eggs, sugar, butter, oil and chocolate wafer crumbs.

Cream together the dry ingredients until smooth and fluffy, add the wet ingredients and stir until just combined. I usually let it stand for 30 minutes before I actually use it. The longer it is on the stove, the creamier and more dense the icing will be. Spread on a thin layer of chocolate chips.

Place on a plate and immediately drizzle with a little bit of melted chocolate. Sprinkle with powdered sugar if desired. Drink immediately or freeze”

We saw that some aspects of the task are already executed correctly, such as using mostly correct grammar and staying on topic. However, the recipe sequence is not logical; for example, it is not possible to cream together dry ingredients.

Now let’s see how this result compares with a recipe generated by a fine-tuned model.

2. Fine-Tuning Model on Dataset

NLG models might speak English, but they might not know how to cook yet. By using fine-tuning, we can show recipe examples to the model and train it to follow the structure of a recipe.

Since the GPT-2 release, the community has built great tools to make fine-tuning more accessible. First, N. Shepperd published a code repository to fine-tune the model. Then Max Woolf used Shepperd’s code as the base for a Python package that runs a TensorFlow session in the background, along with a companion notebook; both are open sourced. Using these tools, I was able to train the 124M-parameter version of GPT-2 on my dataset.
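With Max Woolf’s gpt-2-simple package, the fine-tuning step boils down to a few calls. The sketch below assumes the preprocessed corpus was saved to a file named `recipes.txt` (a placeholder), and the step counts are illustrative, not the values used in the project:

```python
import gpt_2_simple as gpt2

# Download the 124M-parameter checkpoint once, then fine-tune it
# on the preprocessed recipe corpus.
gpt2.download_gpt2(model_name="124M")

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="recipes.txt",  # placeholder path to the training file
              model_name="124M",
              steps=1000,             # number of training steps (illustrative)
              sample_every=200,       # print a sample recipe during training
              save_every=500)         # checkpoint frequency
```

The package checkpoints the fine-tuned weights to disk, so generation can be done later in a fresh session without retraining.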

Text Generation

Keeping the same input as for the base model (“chocolate chips, eggs, sugar, butter”) the following recipe was generated:

“, ‘ all purpose flour ‘, ‘ ground cinnamon ‘, ‘ baking powder ‘, ‘ baking soda ‘, ‘ salt ‘, ‘ sugar ‘, ‘ milk ‘, ‘ vanilla extract ‘, ‘ ‘, ‘Preheat oven to 350 degrees F ( 175 degrees C ) . Grease and flour 2 9x5-inch loaf pans . Beat eggs and sugar together in a mixing bowl until blended . Add butter and beat until incorporated ‘]” “ add both sugars and beat until well mixed . Beat milk and vanilla extract together until smooth . Stir dry ingredients into beaten dry ingredients , alternating with milk mixture , until batter is smooth . Spoon batter into prepared loaf pans . Bake in preheated oven until a toothpick inserted in the center comes out clean , about 70 minutes . Cool in pan for 10 minutes before removing to cool completely on rack .”

In my opinion, the generated recipe is closer to a real recipe, with only a few short sequences that make less sense, such as “add both sugars” and “Stir dry ingredients into beaten dry ingredients”. It is also interesting to note that the model learned the punctuation that was introduced during the tokenization process.
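Generating a recipe like the one above from the fine-tuned checkpoint is equally short with gpt-2-simple. The sampling parameters below are illustrative defaults, not the exact settings used in the project:

```python
import gpt_2_simple as gpt2

# Reload the fine-tuned checkpoint and generate from an ingredient prompt.
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)

gpt2.generate(sess,
              prefix="chocolate chips, eggs, sugar, butter",
              length=200,        # maximum number of tokens to generate
              temperature=0.7,   # lower values give more conservative text
              nsamples=1)
```

Because the model was fine-tuned on tokenized recipes, the output inherits the corpus’s formatting, including the spaced-out punctuation visible in the sample above.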

As a second example here’s what the machine made with rolled oats, walnuts, scallions, black beans and buns (ingredients originally from The Kitchn’s veggie burger recipe):

Preheat the oven to 350 degrees F. In an 8-inch nonstick skillet over medium-high heat , cook the oats , walnuts , and scallions until well toasted , about 2 minutes . Set aside . In a separate small bowl , combine the beans with the black beans . Pour 1/4 cup of the black beans onto a baking sheet . Bake until the beans are just soft , about 5 minutes . Remove from the oven and let cool until easy to handle , about 15 minutes . Slice the toasted oats and walnuts and arrange on a large platter . Serve hot .

Obviously this recipe generator still requires improvements, both in the recipe logic and in the output formatting. One of the bigger challenges in this project is that the model does not always use all the suggested ingredients, and it sometimes adds extra ingredients (that you might not have readily available in your fridge). In a future experiment, we could add separator tokens between the ingredient list and the instructions to teach the model to use the given ingredient list as is.
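The proposed training format could look like the sketch below. The token names (`<|instructions|>`, `<|endofrecipe|>`) are hypothetical; the point is only that an explicit boundary between ingredients and instructions gives the model a consistent cue for where the given ingredient list ends:

```python
# Hypothetical separator tokens; any distinctive strings that never occur
# in real recipes would work.
ING_SEP = "<|instructions|>"
RECIPE_SEP = "<|endofrecipe|>"

def format_example(ingredients, instructions):
    """One training example: ingredient list, boundary token, then instructions."""
    return " , ".join(ingredients) + "\n" + ING_SEP + "\n" + instructions

def build_dataset(recipes):
    """Concatenate examples, marking where each recipe ends."""
    return ("\n" + RECIPE_SEP + "\n").join(
        format_example(ing, instr) for ing, instr in recipes)

dataset = build_dataset([
    (["rolled oats", "walnuts", "scallions"], "Toast the oats and walnuts."),
    (["black beans", "buns"], "Mash the beans and fill the buns."),
])
print(dataset)
```

At generation time, the prompt would be the ingredient list followed by the boundary token, nudging the model to treat everything before it as a fixed shopping list rather than a suggestion.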

This experiment suggests that fine-tuning helped improve the result. With additional training data (for example, recipes scraped from more cooking websites), I believe we could generate highly plausible recipes. One fun idea that my colleague Abutaleb Haidary had would be to include recipes from different cultures, which could generate innovative fusion recipes!