
Long Document Summarization Techniques in Java with Langchain4J and Gemini Models

Suppose your organization has a large number of documents, in various formats, and you, a Java developer, are tasked with efficiently summarizing the content of each document.

While summarizing any document with just a few paragraphs is a simple task, there are several challenges to overcome when summarizing large documents with many pages.

Generative AI is top of mind for both developers and business stakeholders, and you want to explore how Large Language Models (LLMs) can help you with large document summarization, a complex use-case with universal applicability in the enterprise.

As a Java developer, you're adept at building robust, scalable, high-performance applications. While Python dominates the ML and NLP landscape, Java has long been the backbone of enterprise software. Its strength in enterprise systems makes it an ideal platform for integrating advanced NLP techniques. With LLMs, you can now add powerful, AI-driven insights to your Java applications, bridging the gap between traditional enterprise software and cutting-edge machine learning capabilities.

This blog post explores various summarization techniques using LLMs, leaving you with practical information and a codebase with ready-to-test Java examples. The objective is to enable you with both theoretical knowledge and hands-on skills for effective document summarization.

We'll be leveraging Vertex AI with the latest Gemini models and the open-source Langchain4J LLM orchestration framework.

Why consider LLMs for text summarization

LLMs offer a number of advantages over traditional extractive summarization methods:

- They produce fluent, abstractive summaries instead of stitching together extracted sentences.
- They can follow instructions about summary length, tone, and format.
- They adapt to new domains and document types without task-specific training.

Text Summarization Techniques

We'll explore the following three summarization techniques in detail in this blog post:

- Prompt stuffing
- Map-reduce
- Refine

Before you start

The summarization code uses Java 21. If not already installed, use the following instructions to set it up.

The documentation provides details for cloning the repository, setting up the required environment variables, and authenticating to Vertex AI.

Loading and splitting the document

Before summarization can start, you need to load the document, then, depending on your summarization approach, split the content into smaller chunks that fit into the context window of your chosen LLM.

The latest multimodal Gemini models in Vertex AI have very large context windows, up to 2M tokens; however, you will have to adapt to the context window of the LLM of your choice.

Langchain4J offers a number of out-of-the-box DocumentLoaders, DocumentParsers and DocumentSplitters. It is important to explore which ones yield the best results for your use-case.

The codebase for this blog loads the documents from the test folder using a FileSystemDocumentLoader and the TextDocumentParser. Sample documents are provided in text format in this repo.

For text splitting, the DocumentByParagraphSplitter is used. It splits a Document into paragraphs and attempts to fit as many paragraphs as possible into a single TextSegment, within the limit set for the chunk size. The splitter also lets you specify an overlap window between chunks, with benefits discussed later in the post.

Choosing the right chunk size depends on the length of the context window of the LLM of your choice.
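
As a rough illustration (assuming the common heuristic of roughly 4 characters per token, which varies by tokenizer and language), you could derive the splitter settings used below from a per-call token budget:

// Illustrative sizing only: derive character-based chunk sizes from a token budget,
// assuming ~4 characters per token (tokenizer-dependent)
int maxInputTokens = 8_192;                       // hypothetical token budget per LLM call
int charsPerToken = 4;                            // rough heuristic, not exact
int CHUNK_SIZE = maxInputTokens * charsPerToken;  // max characters per TextSegment
int OVERLAP_SIZE = CHUNK_SIZE / 10;               // e.g., 10% overlap between chunks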

// load and parse the document  
Document document = loadDocument(resource, new TextDocumentParser());  
  
// Overlap window size between chunks set to OVERLAP_SIZE - can be configured  
// from 0 to text.length()  
DocumentSplitter splitter = new DocumentByParagraphSplitter(CHUNK_SIZE, OVERLAP_SIZE);  
List<TextSegment> chunks = splitter.split(document);

LLM conversation inputs

@SystemMessage and @UserMessage are commonly used when prompting and interacting with Large Language Models (LLMs).

@SystemMessage is used to set the context or role of the AI model, and is usually not visible to the user. We will use the same @SystemMessage for system instructions whenever the AI Service is invoked.

@UserMessage represents the actual input from the human user interacting with the AI. It's the question, prompt, or statement that the user wants the AI to respond to.

@SystemMessage and @UserMessage can be provided directly as Strings or loaded from a prompt template in the resources folder: @SystemMessage(fromResource = "my-system-prompt-template.txt") or @UserMessage(fromResource = "my-user-template.txt")
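
For example, a minimal AI Service interface that loads both prompts from classpath resources could look like the following sketch (the interface name is hypothetical; the imports are the Langchain4J service annotations):

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;

public interface TemplatedSummarizationAssistant {
    @SystemMessage(fromResource = "my-system-prompt-template.txt")
    @UserMessage(fromResource = "my-user-template.txt")
    String summarize(@V("content") String content);
}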

#1: Prompt Stuffing

Stuffing is the simplest summarization technique: you pass the content of the entire document as a prompt into the LLM's context window. However, as prompts for LLMs are token-count-limited, different techniques need to be used for large documents, depending on the size of the context window.

Google's Gemini models have very large context windows, making them an easy choice for summarizing large documents (see limits here).

public interface StuffingSummarizationAssistant {  
    @SystemMessage("""  
        You are a helpful AI assistant.  
        You are an AI assistant that helps people summarize information.  
        Your name is Gemini.  
        You should reply to the user's request with your name and also in the style  
        of a literary critic.  
        Strictly ignore Project Gutenberg & ignore copyright notice in summary  
        output.  
        """)  
    @UserMessage("""  
        Please provide a concise summary in strictly no more  
        than 10 one sentence bullet points,  
        starting with an introduction and ending with a conclusion,  
        of the following text  
        TEXT: {{content}}  
        """)  
    String summarize(@V("content") String content);  
}

...
// summarize the document with the help of the StuffingSummarizationAssistant 
StuffingSummarizationAssistant assistant = AiServices.create(StuffingSummarizationAssistant.class, chatModel);  
String response = assistant.summarize(document.text());
...
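
The chatModel passed to AiServices is the Langchain4J chat model backed by Vertex AI. A minimal sketch of its construction, assuming the langchain4j-vertex-ai-gemini dependency (the environment variable names and the model name are placeholders to adapt):

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;

ChatLanguageModel chatModel = VertexAiGeminiChatModel.builder()
        .project(System.getenv("GCP_PROJECT_ID"))   // your Google Cloud project id
        .location(System.getenv("GCP_LOCATION"))    // e.g. "us-central1"
        .modelName("gemini-1.5-pro")                // a model with a large context window
        .build();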

Pros:

- A single LLM call over the full document: the simplest technique to implement, with the entire context preserved.
- No chunking or orchestration logic required.

Cons:

- Limited by the model's context window; documents that exceed it cannot be summarized this way.
- Cost and latency grow with input size.

#2: Map-Reduce

Map-reduce is more intricate than prompt stuffing and implements multi-stage summarization: you split the document into smaller (optionally overlapping) chunks, summarize each chunk in parallel, then summarize the individual summaries in a second, final step.

In this method, you need to prepare two user prompt templates, one for the initial chunk summarization step and another for the final combine step. The system instructions remain the same across all LLM calls.

Splitting the text and summarizing individual chunks (the "map" step)

You'll be using the following @UserMessage:

public interface ChunkSummarizationAssistant {
    @SystemMessage(fromResource = "my-system-prompt-template.txt")
    @UserMessage("""  
        Taking the following context delimited by triple backquotes into consideration  
        '''{{context}}'''  
        Write a concise summary of the following text delimited by triple backquotes.  
        '''{{content}}'''  
        Output starts with CONCISE SUB-SUMMARY:  
        """)
    String summarize(@V("context") String context, @V("content") String content);
}

...
ChunkSummarizationAssistant assistant = AiServices.create(ChunkSummarizationAssistant.class, chatModel);  
String response = assistant.summarize(context.toString(), segment.text());

Map-reduce allows you to parallelize the individual chunk summarization steps, as they are independent of each other:

List<CompletableFuture<Map<Integer, String>>> futures = new ArrayList<>();  
ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();  
Map<Integer, String> resultMap = new TreeMap<>(); // TreeMap to automatically sort by key  
  
for(int i = 0; i < segments.size(); i++) {  
    int index = i;  
    CompletableFuture<Map<Integer, String>> future = CompletableFuture  
        .supplyAsync(() -> summarizeChunk(index, segments.get(index).text()), executor);  
    futures.add(future);  
}  
  
// Wait for all futures to complete and collect the results in resultMap  
CompletableFuture<Void> allDone = CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))  
        .thenAccept(v -> futures.forEach(f -> f.thenAccept(resultMap::putAll)));  
  
allDone.get(); // Wait for all processing to complete

Important: please be aware that parallel summarization of individual chunks does not guarantee the ordering of the chunk summaries at the end of the process. It is up to the developer to preserve summary ordering before proceeding to the second and final step. In the sample codebase, an index value is attached to each chunk, guaranteeing an ordered list of individual summaries!
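
A minimal sketch of such an indexed chunk summarization (summarizeChunk is illustrative; the method in the codebase may differ):

// Summarize a single chunk and pair the result with the chunk's original index,
// so the TreeMap above can reassemble the summaries in document order
private Map<Integer, String> summarizeChunk(int index, String chunkText) {
    // no cross-chunk context is available in the map step, so pass an empty context
    String summary = assistant.summarize("", chunkText);
    return Map.of(index, summary);
}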

A key factor for improving summarization results is the concept of overlapping chunks.

Splitting a document by a specific chunk size is arbitrary from a summarization perspective, even when done with utility classes that split the text cleanly into paragraphs, then attempt to fit entire sentences into the remaining chunk space.

Allowing chunks to overlap up to a specified overlap size can yield better summarization results by preserving more context between the individual chunks.

Please consider experimenting with different chunk and overlap sizes for your respective summarization use-cases.
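
For instance, you might compare a splitter without overlap against one with a modest overlap (the sizes below are illustrative, not recommendations):

// Two splitter configurations to compare summarization quality
DocumentSplitter noOverlap   = new DocumentByParagraphSplitter(4_000, 0);
DocumentSplitter withOverlap = new DocumentByParagraphSplitter(4_000, 400); // ~10% overlap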

Note: the degree to which you can parallelize LLM calls depends on the rate limit of API calls per minute imposed by the LLM provider!
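
One simple way to stay under a per-minute quota is to cap the number of in-flight LLM calls, for example with a java.util.concurrent.Semaphore (a sketch; the permit count is an assumption to tune against your actual quota):

Semaphore permits = new Semaphore(10); // allow at most 10 concurrent LLM calls

CompletableFuture<Map<Integer, String>> future = CompletableFuture.supplyAsync(() -> {
    try {
        permits.acquire();           // wait for a free permit before calling the LLM
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
    }
    try {
        return summarizeChunk(index, segments.get(index).text());
    } finally {
        permits.release();           // free the permit for the next task
    }
}, executor);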

Summary of summaries (the "reduce" part)

With all individual summaries on hand, you can move on to the second and final step, the summarization of the individual summaries.

You would be using a different @UserMessage in this step:

public interface FinalSummarizationAssistant {  
    @SystemMessage(fromResource = "my-system-prompt-template.txt")  
    @UserMessage("""  
        Please provide a concise summary in strictly no more than 10 one sentence bullet points,  
        starting with an introduction and ending with a conclusion,  
        of the following text delimited by triple backquotes.  
        '''Text: {{content}}'''  
        Output starts with SUMMARY:  
        """)  
    String summarize(@V("content") String content);  
}
...
FinalSummarizationAssistant assistant = AiServices.create(FinalSummarizationAssistant.class, chatModel);  
String response = assistant.summarize(content);

Pros:

- Scales to documents of arbitrary length, as each chunk only needs to fit into the context window.
- Chunk summarization steps are independent and can run in parallel, reducing overall latency.

Cons:

- Requires more LLM calls than stuffing, increasing cost.
- Context can be lost between chunks, and the final summary is only as good as the intermediate summaries.

#3: Refine

The refine method is an alternative to map-reduce for large document summarization. You split the document as in map-reduce, summarize the first chunk, then ask the LLM to iteratively refine the summary with the text of each subsequent chunk, until the end of the document.

This approach helps the summary be both comprehensive and accurate, as each step takes the context of the previous chunk(s) into consideration.

You would be using the same @UserMessages illustrated in the two steps of the "Map-reduce" approach: ChunkSummarizationAssistant and FinalSummarizationAssistant.

// process each individual chunk in order  
// summary refined in each step by adding the summary of the current chunk  
long start = System.currentTimeMillis();  
StringBuilder context = new StringBuilder();  
chunks.forEach(segment -> summarizeChunk(context, segment.text()));  
  
// process the final summary of the text  
String output = buildFinalSummary(context.toString());
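
A minimal sketch of the refine loop's summarizeChunk (illustrative; the actual method in the codebase may differ): the running context is sent to the LLM together with the current chunk, and the chunk summary is appended to the context for the next iteration.

// Refine step: summarize the current chunk in light of the running context,
// then add the chunk summary to the context used by the next iteration
private void summarizeChunk(StringBuilder context, String chunkText) {
    String chunkSummary = assistant.summarize(context.toString(), chunkText);
    context.append(chunkSummary).append('\n');
}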

Pros:

- Preserves context across chunks, as each refinement builds on the previous summary.
- Well suited to documents where later sections depend on earlier ones.

Cons:

- Chunks must be processed sequentially, so the calls cannot be parallelized and latency grows with document length.
- Each step depends on the quality of the previous summary; early errors can propagate.

Summary

In this blog post, we have explored different programmatic summarization techniques for large documents using Google's Gemini models, as an advanced use-case for generative AI in enterprise software.

LLM orchestration frameworks do not provide out-of-the-box summarization functionality; therefore, a full codebase is available here, with practical examples demonstrating how to implement these techniques efficiently in Java.

As an enterprise Java developer, you now have powerful options to leverage LLMs and add AI-driven insights to your applications, potentially transforming how you handle document analysis and summarization.

The field of AI-powered document summarization is rapidly evolving, with new models and techniques emerging regularly. Stay tuned for future developments that could further enhance these capabilities.

Don't hesitate to reach out at @ddobrin for feedback, questions or to discuss new summarization techniques.