What is GPT-3?
- GPT-3, Generative Pre-trained Transformer 3, is the 3rd-generation transformer model (after GPT-1 and GPT-2) with 175 billion parameters, making it more than 100 times larger than its predecessor, GPT-2 (1.5 billion parameters).
- GPT-3 was trained on a combination of five large datasets: Common Crawl, WebText2, Books1, Books2, and an English Wikipedia corpus. Together they contain hundreds of billions of words (large portions of web pages from the internet, a huge collection of books, and all of English Wikipedia), used to train GPT-3 to generate text.
- So basically, GPT-3 has seen more text than any human will ever see in their lifetime.
Why should you know about it, and where can you use it?
- Being a generative model, it is designed for text-generation tasks such as question answering, text summarization, language translation, named-entity recognition, and text classification.
- Apart from these tasks, GPT-3 can be used to create anything that has a language structure to it. That means it can write essays, create code, generate blog topics, create humor, draft ad copy and emails, and much more, if you are creative enough.
- In fact, start-ups are already using GPT-3 for copywriting, code generation, and similar tasks.
Why is it better than previous state-of-the-art models?
Let's look at some history:
- Before GPT-1, most state-of-the-art NLP models were trained using supervised learning on specific tasks like sentiment analysis, language translation, etc. Supervised models have two major limitations:
- they need a large amount of labeled data, which is not always available.
- they perform poorly on tasks they were not trained for.
- GPT-1 proposed pre-training the language model on an unlabeled dataset and then fine-tuning it on specific tasks (supervised training). This fine-tuning required just a few epochs, which showed that the model had already learned a lot during pre-training.
- GPT-1 performed better than specifically trained supervised state-of-the-art models in 9 out of 12 tasks the models were compared on.
- GPT-1 proved that with pre-training, the model was able to generalize well.
- The major development with GPT-2 was using a larger dataset and adding more parameters to the model to get a stronger language model.
- Also, GPT-2 aimed at learning multiple tasks with the same unsupervised model. For that, the learning objective was changed from modeling P(output | input), the probability of the output given only the input, to P(output | input, task), the probability of the output given both the input and the task. So the model was expected to give different outputs for the same input depending on the task.
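In practice, conditioning on the task just means the task description becomes part of the input text the model reads. A minimal sketch of that idea in plain Python (no model calls; the prompt templates below are illustrative, not taken from the GPT-2 paper):

```python
def build_prompt(task: str, text: str) -> str:
    """Encode P(output | input, task): the task is prepended to the input,
    so the same text yields a different prompt (and output) per task."""
    templates = {
        "translate_en_fr": "Translate English to French:\n{t}\n=>",
        "summarize": "{t}\nTL;DR:",
        "sentiment": "Review: {t}\nSentiment:",
    }
    return templates[task].format(t=text)

text = "The movie was surprisingly good."
for task in ("translate_en_fr", "summarize", "sentiment"):
    # Same input, three different conditioning contexts
    print(build_prompt(task, text))
    print("---")
```

The only thing that changes between tasks is the surrounding text, which is exactly why one unsupervised model can cover many tasks.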
- GPT-2 showed that more parameters and a larger training dataset improved the model's capability and advanced the state of the art on many tasks in zero-shot settings.
- With GPT-3, OpenAI built a very powerful language model that needs no fine-tuning and only a few examples to understand and perform a task. Thanks to these capabilities, GPT-3 has been shown to write articles that are hard to distinguish from ones written by humans.
- GPT-3 beat the state of the art on a few language-modeling datasets; on others, it improved the zero-shot state of the art.
- GPT-3 also performed reasonably well on NLP tasks like closed-book question answering, translation, etc., often beating state-of-the-art models. On most tasks it performed better in the few-shot setting than in the one-shot and zero-shot settings. This is why GPT-3 is being used for such a variety of language-generation tasks.
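The zero-, one-, and few-shot settings differ only in how many worked examples are placed in the prompt before the new input; no model weights are updated. A small sketch of assembling such prompts (the "Input:/Output:" format is an illustrative convention, not a fixed API):

```python
def make_prompt(instruction: str, examples: list, query: str) -> str:
    """Build a k-shot prompt: k worked examples, then the new query.
    k = 0 -> zero-shot, k = 1 -> one-shot, k >= 2 -> few-shot."""
    parts = [instruction]
    for inp, out in examples:  # in-context examples, no gradient updates
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")  # model completes from here
    return "\n\n".join(parts)

shots = [("cheese", "fromage"), ("house", "maison")]
print(make_prompt("Translate English to French.", shots, "cat"))  # few-shot
print(make_prompt("Translate English to French.", [], "cat"))     # zero-shot
```

The model's job in every case is the same next-word prediction; the examples in the prompt simply make the intended task easier to infer, which is why few-shot usually outperforms zero-shot.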
Some examples
- Text to website: All you need to do is provide a sample URL and a short description, and it will create a website. Check out the demo.
- Text to regex: Describe the regex you want, along with an example string it should match, and GPT-3 generates the regex.
- Text to SQL query: Just describe your query in English and it will provide you with the SQL code.
- Text generated by GPT-3: On providing just the title, author name, and the first word "It", the model generated the following text.