If you’re feeling overwhelmed by the advances of Artificial Intelligence (AI) in eDiscovery, we want you to know that we understand. In fact, you can consider us your eDiscovery whisperer: Digital Mountain has been providing white-glove eDiscovery services for over twenty years. Whether you’re feeling pressured to adopt new AI-enhanced technologies before your organization falls behind, or you’ve already embraced the future with a new platform, we encourage you to read on – because there’s always more to learn in this rapidly evolving technology environment.
AI has been part of eDiscovery for more than a decade now. Information conglomerate Thomson Reuters credits Judge Andrew Peck with helping usher in the Technology Assisted Review (TAR) age in 2012 with his decision in Da Silva Moore v. Publicis Groupe (Southern District of New York, 2012), in which Judge Peck encouraged attorneys to use the technology to increase the effectiveness of discovery while decreasing its cost. At its heart, any application that supports TAR includes AI elements, because TAR relies on machine learning – a branch of AI – to classify documents (https://legal.thomsonreuters.com/en/insights/articles/myths-and-facts-about-technology-assisted-review). So, if you’re being urged to adopt AI in eDiscovery, pat yourself on the back if you’ve been using TAR or working with service providers who are TAR professionals, because you’ve already been there, done that.
So how does the rise of the Large Language Model (LLM) change eDiscovery and its relationship to AI? To be honest, at this juncture, it hasn’t significantly changed eDiscovery – yet. We’re keeping an eye on the new-to-market offerings that come with the promise of generative AI (gAI), and to date, we’re still cautiously optimistic that gAI will advance TAR-enabled eDiscovery. However, there are aspects of gAI and LLM-based applications that raise concerns for us:
1. Privacy and Data Protection. An LLM-based application needs data to work from before it can begin any eDiscovery workflow. This means that firms will need to upload documents, spreadsheets, logs, and other materials into the LLM – and that means uploading your client’s data. Without getting too technical: unless your firm, or the company that built the eDiscovery application, is also building the LLM – an expensive undertaking – the LLM is owned by another entity, and everyone else is accessing it through a chatbot interface. The caveat here is that the terms of the contract must be very clear about where the data is going, who is storing it, how long it is retained, how it may be shared with other organizations for training or development of future technologies, what the process is for removing PII/PHI from the training data set, and how to ensure that purged data is truly gone and forgotten.
2. Accuracy of the Output. By now you’ve not only heard of the attorneys who infamously used an LLM to prepare a court filing that included hallucinated citations, but you’ve also seen courts issuing orders requiring attorneys to certify their use of gAI in preparing court filings. These orders underscore the concern that LLMs are not designed to fact-check their outputs, but rather to produce a plausible-sounding answer with a sense of authority. While application developers can build the required certification into the application as a feature, the certification itself will not verify the accuracy of the output. Verification remains the responsibility of the user.
3. Contingency and Cost Considerations. One of the most impressive features of LLMs is the speed with which they can disaggregate and reaggregate large amounts of information. However, producing an answer rapidly presupposes a body of data to access, which requires data storage. Most LLM products currently on the market include a level of cloud storage for the data (documents, etc.), but what happens when the volume of data exceeds the contracted storage level? Depending on the contract, there may be a pricey upgrade, the upload may be delayed until more storage is obtained, or the upload may simply stop. Is a potential storage shortage a fatal flaw for an LLM-based application? Most likely not, but we still caution early adopters to look carefully and do the math based on known data sizes for recent cases.
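For readers who want to literally "do the math," here is a minimal back-of-the-envelope sketch of the kind of estimate we have in mind. All figures below – case sizes and the contracted storage tier – are purely hypothetical illustrations, not real vendor pricing or limits.

```python
# Hypothetical check: will the data from recent matters fit within the
# storage tier included in an LLM-based platform's contract?

def storage_shortfall_gb(case_sizes_gb, included_storage_gb):
    """Return how many GB exceed the contracted tier (0.0 if everything fits)."""
    total = sum(case_sizes_gb)
    return max(0.0, total - included_storage_gb)

# Illustrative recent-case sizes (GB) and an assumed 500 GB contracted tier.
recent_cases = [120.0, 310.0, 95.0]
shortfall = storage_shortfall_gb(recent_cases, included_storage_gb=500.0)

if shortfall > 0:
    print(f"Projected overage: {shortfall:.0f} GB -- budget for an upgrade.")
else:
    print("Projected data fits within the contracted tier.")
```

Swapping in your own recent matter sizes and the tier quoted in a vendor's contract will quickly show whether an overage clause deserves close scrutiny before signing.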
4. Practicality. The key to getting the most from an LLM-based application is crafting effective prompts. Operators, or prompt engineers, will need to know eDiscovery principles as well as how to construct prompts that lead the application to produce the required outputs. That’s certainly a skill that eDiscovery professionals will develop over time, but the immediate question is, “Are LLM-based applications the most accurate and cost-effective technology to employ now?” Again, we’re cautiously optimistic about the future of LLM-based eDiscovery applications, but at this juncture, we’re not sold that this nascent technology is ready to replace experienced eDiscovery professionals. Additionally, seasoned professionals develop an intuition through years of practice that we foresee taking significant time to translate to LLM-based eDiscovery work.
There’s a lot of competition for your attention where LLM-based legal applications are concerned. Right now, document summary, drafting, and comparison seem to be well within the skill set of LLM-based AI. As far as eDiscovery applications and workflows are concerned, we’re excited for what may be coming next, but we’re not trading in our team of experienced, knowledgeable professionals. Even as technology advances, our white-glove service will always be our strength.