With generative AI top of mind for both developers and business stakeholders, it’s important to explore how products like Workflows, Google Cloud’s serverless execution engine, can automate and orchestrate large language model (LLM) use cases. We recently covered how to orchestrate Vertex AI’s PaLM and Gemini APIs with Workflows. In this blog, we illustrate how Workflows can perform long-document summarization, a concrete use case with wide applicability.
Open-source LLM orchestration frameworks like LangChain for Python and TypeScript developers, or LangChain4j for Java developers, integrate components such as LLMs, document loaders, and vector databases to implement complex tasks like document summarization. But you can also use Workflows for this task, without investing significant time in learning an LLM orchestration framework.
Summarization techniques
It’s easy enough to summarize a short document by passing its entire content to an LLM as a single prompt. However, prompts are limited to a maximum number of tokens, so for longer documents a different approach is required. Two common approaches are:
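Before either approach can run, the document must be split into sections that fit the context window. A minimal sketch, approximating token counts with whitespace-separated words (a real implementation would use the model’s tokenizer):

```python
# Sketch: split a long document into chunks small enough for a model's
# context window. Word count is a rough stand-in for token count.

def split_into_chunks(text: str, max_words: int = 500) -> list[str]:
    """Split text into word-bounded chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]
```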
- **Map/reduce**: A long document is split into smaller sections that fit the context window. A summary is created for each section, and a summary of all the summaries is created as a final step.
- **Iterative refinement**: Similar to the map/reduce approach, the document is evaluated in a piecemeal fashion. A summary is created for the first section, then the LLM refines that summary with the details from the next section, and so on iteratively through to the end of the document.
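The iterative refinement loop can be sketched as follows. The `summarize()` helper here is a placeholder standing in for a real LLM call (for example, a Vertex AI model invocation), not an actual API:

```python
# Sketch of iterative refinement. summarize() is a stub standing in
# for an LLM call; it simply truncates its input for illustration.

def summarize(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return prompt[:100]

def refine_summary(sections: list[str]) -> str:
    """Summarize the first section, then refine with each later section."""
    summary = summarize(sections[0])
    for section in sections[1:]:
        summary = summarize(
            f"Refine this summary:\n{summary}\n"
            f"with details from this section:\n{section}"
        )
    return summary
```

Note that each iteration depends on the previous summary, so the calls cannot be parallelized.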
Both methods yield good results, but the map/reduce approach has one advantage over the refinement method: refinement is inherently sequential, because each section of the document is summarized using the previously refined summary.
With map/reduce, as illustrated in the diagram below, you can create a summary for each section in parallel (the “map” operation), with a final summarization step (the “reduce” operation). This makes it faster than the sequential approach.
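The map/reduce pattern can be sketched in the same style, again with a stubbed `summarize()` in place of a real LLM call. The “map” step runs the per-section summaries concurrently; the “reduce” step combines them:

```python
# Sketch of map/reduce summarization. summarize() is a stub standing
# in for an LLM call; section summaries run in parallel threads.
from concurrent.futures import ThreadPoolExecutor

def summarize(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return f"summary({prompt[:40]})"

def map_reduce_summary(sections: list[str]) -> str:
    # Map: summarize every section concurrently.
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(summarize, sections))
    # Reduce: summarize the concatenated partial summaries.
    return summarize("\n".join(partial))
```

In a Workflows implementation, the same concurrency would come from a parallel iteration step rather than threads.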