Retrieval-Augmented Generation with LangChain, Amazon SageMaker JumpStart, and MongoDB Atlas semantic search

Generative AI models have the potential to revolutionize enterprise operations, but businesses must carefully consider how to harness their power while overcoming challenges such as safeguarding data and ensuring the quality of AI-generated content.

The Retrieval-Augmented Generation (RAG) framework augments prompts with external data from multiple sources, such as document repositories, databases, or APIs, to make foundation models effective for domain-specific tasks. This post presents the capabilities of the RAG model and highlights the transformative potential of MongoDB Atlas with its Vector Search feature.

MongoDB Atlas is an integrated suite of data services that accelerate and simplify the development of data-driven applications. Its vector data store seamlessly integrates with operational data storage, eliminating the need for a separate database. This integration enables powerful semantic search capabilities through Vector Search, a fast way to build semantic search and AI-powered applications.

Amazon SageMaker enables enterprises to build, train, and deploy machine learning (ML) models. Amazon SageMaker JumpStart provides pre-trained models and data to help you get started with ML. You can access, customize, and deploy pre-trained models and data through the SageMaker JumpStart landing page in Amazon SageMaker Studio with just a few clicks.

Amazon Lex is a conversational interface that helps businesses create chatbots and voice bots that engage in natural, lifelike interactions. By integrating Amazon Lex with generative AI, businesses can create a holistic ecosystem where user input seamlessly transitions into coherent and contextually relevant responses.

Solution overview

The following diagram illustrates the solution architecture.

In the following sections, we walk through the steps to implement this solution and its components.

Set up a MongoDB cluster

To create a free tier MongoDB Atlas cluster, follow the instructions in Create a Cluster. Set up the database access and network access.

Deploy the SageMaker embedding model

You can choose the embedding model (ALL MiniLM L6 v2) on the SageMaker JumpStart Models, notebooks, solutions page.

Choose Deploy to deploy the model.

Verify that the model is deployed successfully and that the endpoint is created.
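The verification step can also be scripted. The following is a minimal sketch, assuming you pass in a boto3 SageMaker client; the function name and the endpoint name in the usage comment are placeholders chosen for illustration, not values from this post.

```python
def endpoint_is_in_service(sagemaker_client, endpoint_name):
    """Return True when the named SageMaker endpoint reports the InService status."""
    description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return description["EndpointStatus"] == "InService"


# Example usage (requires AWS credentials; the endpoint name is a placeholder):
# import boto3
# print(endpoint_is_in_service(boto3.client("sagemaker"), "my-embedding-endpoint"))
```

An endpoint that is still being created reports a different status, so the function returns False until the deployment finishes.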

Vector embedding

Vector embedding is a process of converting a text or image into a vector representation. With the following code, we can generate vector embeddings with SageMaker JumpStart and update the collection with the created vector for every document:

import json

# Build the request for the SageMaker JumpStart embedding endpoint.
payload = {"text_inputs": [document[field_name_to_be_vectorized]]}
query_response = query_endpoint_with_json_payload(json.dumps(payload).encode("utf-8"))
embeddings = parse_response_multiple_texts(query_response)

# Store the first (and only) embedding on the document matched by `query`.
update = {"$set": {vector_field_name: embeddings[0]}}
collection.update_one(query, update)

The preceding code shows how to update a single document in a collection. To update all documents, follow the instructions.
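One way to apply the same update to every document is sketched below. It is an illustration rather than the code used in this post: embed_text and embed_all_documents are hypothetical helpers, and the request and response shapes (a JSON body with "text_inputs" and a response with an "embedding" list) are assumptions about the deployed embedding endpoint that you should check against your model.

```python
import json


def embed_text(runtime_client, endpoint_name, text):
    """Send one text to the embedding endpoint and return its vector.

    Assumption: the endpoint accepts {"text_inputs": [...]} as JSON and
    returns a JSON body with an "embedding" list, one vector per input.
    """
    payload = json.dumps({"text_inputs": [text]}).encode("utf-8")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload,
    )
    return json.loads(response["Body"].read())["embedding"][0]


def embed_all_documents(collection, embed, text_field, vector_field):
    """Add an embedding to every document that has text but no vector yet.

    `collection` is a PyMongo-style collection and `embed` is any callable
    that maps a string to a list of numbers. Returns the number of updates.
    """
    query = {text_field: {"$exists": True}, vector_field: {"$exists": False}}
    updated = 0
    for document in collection.find(query):
        vector = embed(document[text_field])
        collection.update_one(
            {"_id": document["_id"]},
            {"$set": {vector_field: vector}},
        )
        updated += 1
    return updated
```

With a boto3 "sagemaker-runtime" client, the embed argument could be a small wrapper such as lambda text: embed_text(runtime, endpoint_name, text). Skipping documents that already have a vector lets you rerun the loop after an interruption.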

MongoDB vector data store

MongoDB Atlas Vector Search is a new feature that allows you to store and search vector data in MongoDB. Vector data represents a point in a high-dimensional space and is often used in ML and artificial intelligence applications. MongoDB Atlas Vector Search uses a technique called k-nearest neighbors (k-NN) to search for similar vectors: given a query vector, it finds the k most similar vectors. Similarity is measured by the function configured on the index; with the euclidean setting used in this post, the most similar vectors are the ones closest to the query vector in terms of Euclidean distance.
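To make the k-NN idea concrete, the following sketch ranks a handful of toy vectors by Euclidean distance to a query vector. It is a brute-force illustration of the concept only: Atlas relies on a search index rather than scanning every vector, and the labels and three-dimensional vectors are invented for the example.

```python
import math


def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query_vector, vectors, k):
    """Return the k (label, distance) pairs closest to query_vector."""
    scored = [
        (label, euclidean_distance(query_vector, vector))
        for label, vector in vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy three-dimensional "embeddings"; real embeddings have hundreds of dimensions.
toy_vectors = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.0],
}
nearest = k_nearest([1.0, 0.05, 0.0], toy_vectors, k=2)
print([label for label, _ in nearest])
```

With these values, "cat" and "kitten" are returned ahead of "car" because they lie closer to the query vector.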

Storing vector data next to operational data can improve performance by reducing the need to move data between different storage systems. This is especially beneficial for applications that require real-time access to vector data.

Create a Vector Search index

The next step is to create a MongoDB Vector Search index on the vector field you created in the previous step. MongoDB uses the knnVector type to index vector embeddings. The vector field should be represented as an array of numbers (BSON int32, int64, or double data types only).

Refer to Review knnVector Type Limitations for more information about the limitations of the knnVector type.

The following code is a sample index definition:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "egVector": {
        "dimensions": 384,
        "similarity": "euclidean",
        "type": "knnVector"
      }
    }
  }
}

Note that the dimensions value must match the output dimension of your embedding model (384 for the ALL MiniLM L6 v2 model used in this post).
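If you generate index definitions for several fields or models, a small helper can keep the dimension and similarity settings consistent. The sketch below is illustrative: build_knn_index_definition and embedding_matches_index are hypothetical names, and the output is JSON of the same shape as the sample definition, which you would still submit to Atlas yourself.

```python
import json

# Similarity functions accepted by the knnVector type.
ALLOWED_SIMILARITIES = {"euclidean", "cosine", "dotProduct"}


def build_knn_index_definition(vector_field, dimensions, similarity="euclidean"):
    """Build an index definition dict for one knnVector field."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return {
        "mappings": {
            "dynamic": True,
            "fields": {
                vector_field: {
                    "dimensions": dimensions,
                    "similarity": similarity,
                    "type": "knnVector",
                }
            },
        }
    }


def embedding_matches_index(embedding, definition, vector_field):
    """Check that an embedding has the number of dimensions the index expects."""
    expected = definition["mappings"]["fields"][vector_field]["dimensions"]
    return len(embedding) == expected


definition = build_knn_index_definition("egVector", 384)
print(json.dumps(definition, indent=2))
```

Running embedding_matches_index on one embedding before you build the index is a quick way to catch a mismatch between the model and the index.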

Query the vector data store

You can query the vector data store using the Vector Search aggregation pipeline. It uses the Vector Search index and performs a semantic search on the vector data store.

The following code is a sample search definition:

{
  $search: {
    "index": "<index name>", // optional, defaults to "default"
    "knnBeta": {
      "vector": [<array-of-numbers>],
      "path": "<field-to-search>",
      "filter": {<filter-specification>},
      "k": <number>,
      "score": {<options>}
    }
  }
}
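From Python, the same search definition can be assembled as an aggregation pipeline and passed to a PyMongo-style collection. This is a sketch under assumptions: the helper names are hypothetical, the optional filter and score settings are omitted, and the $project stage that returns the search score is an addition for illustration.

```python
def build_knn_pipeline(query_vector, path, k, return_fields, index_name="default"):
    """Build an aggregation pipeline with a $search knnBeta stage."""
    projection = {field: 1 for field in return_fields}
    # Expose the relevance score that $search computes for each match.
    projection["score"] = {"$meta": "searchScore"}
    return [
        {
            "$search": {
                "index": index_name,
                "knnBeta": {"vector": query_vector, "path": path, "k": k},
            }
        },
        {"$project": projection},
    ]


def semantic_search(collection, query_vector, path, k, return_fields, index_name="default"):
    """Run the k-NN search and return the matching documents as a list."""
    pipeline = build_knn_pipeline(query_vector, path, k, return_fields, index_name)
    return list(collection.aggregate(pipeline))
```

The query vector must come from the same embedding model that produced the stored vectors, so that both have the same number of dimensions.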

Deploy the SageMaker large language model

SageMaker JumpStart foundation models are pre-trained large language models (LLMs) that are used to solve a variety of natural language processing (NLP) tasks, such as text summarization, question answering, and natural language inference. They are available in a variety of sizes and configurations. In this solution, we use the Hugging Face FLAN-T5-XL model.

Search for the FLAN-T5-XL model in SageMaker JumpStart.

Choose Deploy to set up the FLAN-T5-XL model.

Verify the model is deployed successfully and the endpoint is active.
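Once the endpoint is active, the retrieved passages can be combined with the user question and sent to the model. The following is a minimal sketch: build_prompt and generate_answer are hypothetical helpers, the prompt wording is only an example, and the payload and response keys ("text_inputs", "max_length", "generated_texts") are assumptions about the deployed FLAN-T5-XL endpoint that you should confirm for your model version.

```python
import json


def build_prompt(question, context_passages):
    """Combine retrieved passages and the user question into one prompt."""
    context = "\n".join(context_passages)
    return (
        "Answer the question based on the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate_answer(runtime_client, endpoint_name, prompt, max_length=200):
    """Invoke the LLM endpoint and return the first generated text.

    Assumption: the endpoint accepts a JSON payload with "text_inputs" and
    "max_length", and returns a JSON body with a "generated_texts" list.
    """
    payload = json.dumps({"text_inputs": prompt, "max_length": max_length})
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload.encode("utf-8"),
    )
    return json.loads(response["Body"].read())["generated_texts"][0]
```

Here runtime_client would be a boto3 "sagemaker-runtime" client, and context_passages would hold the text of the documents returned by the vector search.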

Create an Amazon Lex bot

To create an Amazon Lex bot, complete the following steps:

On the Amazon Lex console, choose Create bot.
For Bot name, enter a name.
For Runtime role, select Create a role with basic Amazon Lex permissions.
Specify your language settings, then choose Done.
Add a sample utterance in the NewIntent UI and choose Save intent.
Navigate to the FallbackIntent that was created for you by default and toggle Active in the Fulfillment section.
Choose Build and after the build is successful, choose Test.
Before testing, choose the gear icon.
Specify the AWS Lambda function that will interact with MongoDB Atlas and the LLM to provide responses. To create the Lambda function, follow these steps.
You can now interact with the LLM.
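The Lambda function that fulfills the FallbackIntent might look like the following sketch. It assumes the Amazon Lex V2 event and response format; answer_question is a hypothetical placeholder for the embedding, vector search, and LLM calls, and is not the function used in this post.

```python
def answer_question(question):
    """Hypothetical placeholder for the retrieval and generation steps."""
    return f"(answer for: {question})"


def lambda_handler(event, context):
    """Handle a Lex V2 fulfillment event and close the intent with an answer."""
    # The raw user utterance and the intent that triggered this invocation.
    question = event.get("inputTranscript", "")
    intent_name = event["sessionState"]["intent"]["name"]

    answer = answer_question(question)

    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent_name, "state": "Fulfilled"},
        },
        "messages": [{"contentType": "PlainText", "content": answer}],
    }
```

In a real deployment, answer_question would embed the question, query the vector data store, and pass the retrieved context to the LLM endpoint.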

Clean up

To clean up your resources, complete the following steps:

Delete the Amazon Lex bot.
Delete the Lambda function.
Delete the LLM SageMaker endpoint.
Delete the embeddings model SageMaker endpoint.
Delete the MongoDB Atlas cluster.
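The two SageMaker endpoints can also be deleted programmatically. The following sketch assumes a boto3 SageMaker client; the helper name is hypothetical, and deleting the endpoint configuration along with the endpoint is an optional extra step, not one listed above.

```python
def delete_endpoint_and_config(sagemaker_client, endpoint_name):
    """Delete a SageMaker endpoint and the endpoint configuration it uses.

    Returns the name of the deleted endpoint configuration.
    """
    # Look up the configuration name before the endpoint is removed.
    description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]

    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=config_name)
    return config_name


# Example usage (requires AWS credentials; endpoint names are placeholders):
# import boto3
# client = boto3.client("sagemaker")
# for name in ["my-llm-endpoint", "my-embedding-endpoint"]:
#     delete_endpoint_and_config(client, name)
```

The Amazon Lex bot, the Lambda function, and the MongoDB Atlas cluster still need to be deleted separately.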

Conclusion

In this post, we showed how to create a simple bot that uses MongoDB Atlas semantic search and integrates with a model from SageMaker JumpStart. This bot allows you to quickly prototype user interaction with different LLMs in SageMaker JumpStart while pairing them with the context originating in MongoDB Atlas.

As always, AWS welcomes feedback. Please leave your feedback and questions in the comments section.

About the authors

Igor Alekseev is a Senior Partner Solution Architect at AWS in the Data and Analytics domain. In his role, Igor works with strategic partners, helping them build complex, AWS-optimized architectures. Prior to joining AWS, as a Data/Solution Architect, he implemented many projects in the big data domain, including several data lakes in the Hadoop ecosystem. As a Data Engineer, he was involved in applying AI/ML to fraud detection and office automation.

Babu Srinivasan is a Senior Partner Solutions Architect at MongoDB. In his current role, he works with AWS to build the technical integrations and reference architectures for the AWS and MongoDB solutions. He has more than two decades of experience in database and cloud technologies. He is passionate about providing technical solutions to customers working with multiple Global System Integrators (GSIs) across multiple geographies.