ChatGPT, Bard, Minerva, and other tools in a class called Large Language Models (LLMs) are poised to revolutionize how we search the internet. In the simplest terms possible, LLMs are software programs trained on vast (think Carl Sagan’s billions and billions) amounts of text-based information with the goal of responding in human-like language. Those datasets include fiction and non-fiction books, website content, court filings, blogs, social media posts, chat logs, and other documents that can be compiled and accessed repeatedly. To hold conversations, LLMs are trained to respond in natural language – that is, to write as a human might write. The chat experience itself isn’t new – the first recognized chatbot, ELIZA, was introduced in 1966, and it’s now common to encounter a chatbot on an e-commerce site.
What, then, is the amazing leap that LLMs represent? Capability, speed, and agility are the three attributes most often cited. Each release is a dramatic improvement on the version before it. GPT-4, released less than six months after GPT-3.5, can now process images as well as text and, without getting too technical, can handle roughly 22,000 more words of input at a time than its predecessor (https://www.moveworks.com/insights/what-is-gpt-4-and-why-is-it-better-than-gpt-3). There have also been improvements in complex problem-solving and creativity, leading one analyst to describe the updated ChatGPT as “ten times more advanced” than its predecessor (https://www.searchenginejournal.com/gpt-4-vs-gpt-3-5/482463/#close).
Speed is another advantage: LLMs respond to queries faster than humanly possible. When a user asks ChatGPT, as an example, to create an abstract on eDiscovery written as if the author were Dr. Seuss, it returns a complete result in less than two minutes.
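For readers who want to try the same kind of request programmatically rather than through the chat interface, a minimal sketch is shown below using OpenAI’s Python client. The model name and prompt are illustrative assumptions, not a description of any particular workflow, and an API key must be available in the OPENAI_API_KEY environment variable.

```python
# A minimal, illustrative sketch of sending a request like the one described
# above through OpenAI's Python client (the "openai" package, v1.x).
# Assumes OPENAI_API_KEY is set in the environment; the model name is an
# assumption chosen for illustration.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="gpt-4",  # assumed model name for illustration
    messages=[
        {
            "role": "user",
            "content": "Write an abstract on eDiscovery as if the author were Dr. Seuss.",
        }
    ],
)

# The model's reply arrives as text in the first choice of the response
print(response.choices[0].message.content)
```

The round trip typically takes seconds, which is the point of the anecdote: the same drafting task that might occupy a person for an afternoon comes back almost immediately.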
Despite all the impressive results, there are concerns about LLMs. The most notable is that LLMs can “hallucinate” responses. While ChatGPT reports that it cannot deliberately produce a false answer, it can make errors or provide incomplete answers because of gaps in its training data or the complexity of the question. This flaw can show up as attributing the wrong degree or university affiliation to a specific person, or providing a link to a website that no longer exists. OpenAI’s own testing demonstrated that the more complex the task, or the longer the conversation in which ChatGPT was engaged, the more likely it was to produce erroneous responses.
Perhaps more worrisome, technology tools are only as immune to the nefarious motivations of their users as their developers anticipate. Incompletely trained LLMs can be prompted to produce harmful, dangerous, and offensive responses. Those failures have garnered media attention and are a reminder that, for all the innovation language-based models represent, they are still built by humans, at least for now.