
Enhance Amazon Lex with LLMs and improve the FAQ experience using URL ingestion


In today's digital world, most consumers would rather find answers to their customer service questions on their own rather than taking the time to reach out to businesses and/or service providers. This blog post explores an innovative solution to build a question and answer chatbot in Amazon Lex that uses the existing FAQs from your website. This AI-powered tool can provide quick, accurate responses to real-world inquiries, allowing customers to quickly and easily resolve common problems independently.

Single URL ingestion

Many enterprises have a published set of answers for FAQs available to their customers on their website. In this case, we want to offer customers a chatbot that can answer their questions from our published FAQs. In the blog post titled Enhance Amazon Lex with conversational FAQ features using LLMs, we demonstrated how you can use a combination of Amazon Lex and LlamaIndex to build a chatbot powered by your existing knowledge sources, such as PDF or Word documents. To support a simple FAQ, based on a website of FAQs, we need to create an ingestion process that can crawl the website and create embeddings that can be used by LlamaIndex to answer customer questions. In this case, we will build on the bot created in the previous blog post, which queries those embeddings with a user's utterance and returns the answer from the website FAQs.

The following diagram shows how the ingestion process and the Amazon Lex bot work together for our solution.

In the solution workflow, the website with FAQs is ingested via AWS Lambda. This Lambda function crawls the website and stores the resulting text in an Amazon Simple Storage Service (Amazon S3) bucket. The S3 bucket then triggers a Lambda function that uses LlamaIndex to create embeddings that are stored in Amazon S3. When a question from an end user arrives, such as "What is your return policy?", the Amazon Lex bot uses its Lambda function to query the embeddings using a RAG-based approach with LlamaIndex. For more information about this approach and the prerequisites, refer to the blog post, Enhance Amazon Lex with conversational FAQ features using LLMs.
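For reference, the following is a minimal sketch of what the embedding-creation Lambda function in this workflow could look like, assuming the crawler writes plain text files to one bucket and the bot's fulfillment Lambda reads the persisted index from another. The bucket names, object keys, and handler wiring shown here are illustrative assumptions, not the exact implementation from the referenced blog post.

import os
import boto3
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# Illustrative bucket names: one for the crawler's raw text, one for the persisted index
RAW_TEXT_BUCKET = "faq-bot-raw-text-001"
INDEX_BUCKET = "faq-bot-storage-001"

s3 = boto3.client("s3")


def lambda_handler(event, context):
    # Triggered by the S3 put event from the crawler; download the crawled text to /tmp
    key = event["Records"][0]["s3"]["object"]["key"]
    local_path = "/tmp/" + key.split("/")[-1]
    s3.download_file(RAW_TEXT_BUCKET, key, local_path)

    # Build embeddings with LlamaIndex and persist the index files locally
    documents = SimpleDirectoryReader(input_files=[local_path]).load_data()
    index = GPTVectorStoreIndex.from_documents(documents)
    os.makedirs("/tmp/index", exist_ok=True)
    index.storage_context.persist(persist_dir="/tmp/index")

    # Upload the persisted index so the bot's fulfillment Lambda can load it
    for file_name in os.listdir("/tmp/index"):
        s3.upload_file("/tmp/index/" + file_name, INDEX_BUCKET, file_name)

    return {"status": "index updated"}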

After the prerequisites from the aforementioned blog post are complete, the first step is to ingest the FAQs into a document repository that can be vectorized and indexed by LlamaIndex. The following code shows how to accomplish this:

import logging
import sys
import requests
import html2text
from llama_index.readers.schema.base import Document
from llama_index import GPTVectorStoreIndex
from typing import List

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


class EZWebLoader:

    def __init__(self, default_header: str = None):
        self._html_to_text_parser = html2text
        if default_header is None:
            self._default_header = {"User-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
        else:
            self._default_header = default_header

    def load_data(self, urls: List[str], headers: str = None) -> List[Document]:
        if headers is None:
            headers = self._default_header

        documents = []
        for url in urls:
            # fetch the page and reduce the HTML to plain text before indexing
            response = requests.get(url, headers=headers).text
            response = self._html_to_text_parser.html2text(response)
            documents.append(Document(response))
        return documents


url = "http://www.zappos.com/general-questions"
loader = EZWebLoader()
documents = loader.load_data([url])
index = GPTVectorStoreIndex.from_documents(documents)

In the preceding example, we take a predefined FAQ website URL from Zappos and ingest it using the EZWebLoader class. With this class, we have navigated to the URL and loaded all the questions on the page into an index. We can now ask a question like "Does Zappos have gift cards?" and get the answers directly from our FAQs on the website. The following screenshot shows the Amazon Lex bot test console answering that question from the FAQs.
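Before testing through Lex, the index can also be sanity-checked locally with LlamaIndex's query engine; the question string below is just an example:

# Query the freshly built index directly to verify the ingestion worked
query_engine = index.as_query_engine()
response = query_engine.query("Does Zappos offer gift cards?")
print(response.response)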

We were able to achieve this because we had crawled the URL in the first step and created embeddings that LlamaIndex could use to search for the answer to our question. Our bot's Lambda function shows how this search is run whenever the fallback intent is returned:

import time
import json
import os
import logging
import boto3
from llama_index import StorageContext, load_index_from_storage


logger = logging.getLogger()
logger.setLevel(logging.DEBUG)


def download_docstore():
    # Create an S3 client
    s3 = boto3.client('s3')

    # List all objects in the S3 bucket and download each one
    try:
        bucket_name = "faq-bot-storage-001"
        s3_response = s3.list_objects_v2(Bucket=bucket_name)

        if 'Contents' in s3_response:
            for item in s3_response['Contents']:
                file_name = item['Key']
                logger.debug("Downloading to /tmp/" + file_name)
                s3.download_file(bucket_name, file_name, '/tmp/' + file_name)

        logger.debug('All files downloaded from S3 and written to the local filesystem.')

    except Exception as e:
        logger.error(e)
        raise e


# download the document store locally
download_docstore()

storage_context = StorageContext.from_defaults(persist_dir="/tmp/")
# load the index
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()


def lambda_handler(event, context):
    """
    Route the incoming request based on intent.
    The JSON body of the request is provided in the event slot.
    """
    # By default, treat the user request as coming from the America/New_York time zone.
    os.environ['TZ'] = 'America/New_York'
    time.tzset()
    logger.debug("===== START LEX FULFILLMENT ====")
    logger.debug(event)
    slots = {}
    if "currentIntent" in event and "slots" in event["currentIntent"]:
        slots = event["currentIntent"]["slots"]
    intent = event["sessionState"]["intent"]

    dialogaction = {"type": "Delegate"}
    message = []
    if str.lower(intent["name"]) == "fallbackintent":
        # run the query against the index using the user's input
        response = str.strip(query_engine.query(event["inputTranscript"]).response)
        dialogaction["type"] = "Close"
        message.append({'content': f'{response}', 'contentType': 'PlainText'})

    final_response = {
        "sessionState": {
            "dialogAction": dialogaction,
            "intent": intent
        },
        "messages": message
    }

    logger.debug(json.dumps(final_response, indent=1))
    logger.debug("===== END LEX FULFILLMENT ====")

    return final_response
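To exercise this handler outside of Lex, you can hand-build an event containing the two fields the code reads: the intent name under sessionState and the inputTranscript. The event below is a made-up minimal example; a real Lex V2 fulfillment event carries many more fields, and running this locally still requires access to the S3 bucket that holds the index.

# Minimal hand-built event for local testing of lambda_handler
test_event = {
    "inputTranscript": "What is your return policy?",
    "sessionState": {
        "intent": {
            "name": "FallbackIntent",
            "state": "InProgress"
        }
    }
}

print(lambda_handler(test_event, None))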

This solution works well when a single webpage has all the answers. However, most FAQ sites are not built on a single page. For instance, in our Zappos example, if we ask the question "Do you have a price matching policy?", then we get a less-than-satisfactory answer, as shown in the following screenshot.

In the preceding interaction, the price-matching policy answer isn't helpful for our user. This answer is short because the FAQ referenced is a link to a specific page about the price matching policy and our web crawl covered only the single page. Achieving better answers will mean crawling those links as well. The following section shows how to get answers to questions that require two or more levels of page depth.

N-level crawling

When we crawl a web page for FAQ information, the information we want can be contained in linked pages. For example, in our Zappos example, we ask the question "Do you have a price matching policy?" and the answer is "Yes please visit <link> to learn more." If someone asks "What is your price matching policy?" then we want to give a complete answer with the policy. Achieving this means we need to traverse links to get the actual information for our end user. During the ingestion process, we can use our web loader to find the anchor links to other HTML pages and then traverse them. The following code change to our web crawler allows us to find links in the pages we crawl. It also includes some additional logic to avoid circular crawling and allow filtering by a prefix.

import logging
import requests
import html2text
from llama_index.readers.schema.base import Document
from llama_index import GPTVectorStoreIndex
from typing import List
import re


def find_http_urls_in_parentheses(s: str, prefix: str = None):
    # html2text renders anchors as markdown links, so the URLs appear inside parentheses
    pattern = r'\((https?://[^)]+)\)'
    urls = re.findall(pattern, s)

    matched = []
    if prefix is not None:
        for url in urls:
            if str(url).startswith(prefix):
                matched.append(url)
    else:
        matched = urls

    return list(set(matched))  # remove duplicates by converting to a set, then convert back to a list


class EZWebLoader:

    def __init__(self, default_header: str = None):
        self._html_to_text_parser = html2text
        if default_header is None:
            self._default_header = {"User-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
        else:
            self._default_header = default_header

    def load_data(self,
                  urls: List[str],
                  num_levels: int = 0,
                  level_prefix: str = None,
                  headers: str = None) -> List[Document]:

        logging.info(f"Number of urls: {len(urls)}.")

        if headers is None:
            headers = self._default_header

        documents = []
        visited = {}
        for url in urls:
            q = [url]
            depth = num_levels
            for page in q:
                if page not in visited:  # prevent cycles by checking whether we already crawled a link
                    logging.info(f"Crawling {page}")
                    visited[page] = True  # add an entry to visited to prevent re-crawling pages
                    response = requests.get(page, headers=headers).text
                    response = self._html_to_text_parser.html2text(response)  # reduce the HTML to text
                    documents.append(Document(response))
                    if depth > 0:
                        # crawl the linked pages
                        ingest_urls = find_http_urls_in_parentheses(response, level_prefix)
                        logging.info(f"Found {len(ingest_urls)} pages to crawl.")
                        q.extend(ingest_urls)
                        depth -= 1  # reduce the depth counter so we go only num_levels deep in our crawl
                else:
                    logging.info(f"Skipping {page} as it has already been crawled")
        logging.info(f"Number of documents: {len(documents)}.")
        return documents


url = "http://www.zappos.com/general-questions"
loader = EZWebLoader()
# crawl the site with 1 level of depth and a prefix of "/c/" for the customer service root
documents = loader.load_data([url],
                             num_levels=1, level_prefix="https://www.zappos.com/c/")
index = GPTVectorStoreIndex.from_documents(documents)

In the preceding code, we introduce the ability to crawl N levels deep, and we give a prefix that allows us to restrict crawling to only pages that begin with a certain URL pattern. In our Zappos example, the customer service pages are all rooted at zappos.com/c, so we include that as a prefix to limit our crawls to a smaller and more relevant subset. The code shows how we can ingest up to two levels deep. Our bot's Lambda logic stays the same because nothing has changed except that the crawler ingests more documents.
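To see the prefix restriction in isolation, here is a small check of find_http_urls_in_parentheses against the markdown-style links that html2text emits; the sample text and URLs are made up for illustration.

# The regex picks up the (url) part of markdown links such as [text](url)
sample_text = (
    "See our [price matching policy](https://www.zappos.com/c/price-matching) "
    "or follow us on [Twitter](https://twitter.com/zappos)."
)
links = find_http_urls_in_parentheses(sample_text, prefix="https://www.zappos.com/c/")
print(links)  # only the link under /c/ survives the prefix filter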

We now have all the documents indexed and can ask a more detailed question. In the following screenshot, our bot provides the correct answer to the question "Do you have a price matching policy?"

We now have a complete answer to our question about price matching. Instead of simply being told "Yes see our policy," the bot gives us the details from the second-level crawl.

Clean up

To avoid incurring future expenses, delete all the resources that were deployed as part of this exercise. We have provided a script to shut down the SageMaker endpoint gracefully; usage details are in the README. Additionally, to remove all the other resources, you can run cdk destroy in the same directory as the other cdk commands to deprovision all the resources in your stack.

Conclusion

The ability to ingest a set of FAQs into a chatbot enables your customers to find the answers to their questions with simple, natural language queries. By combining the built-in support in Amazon Lex for fallback handling with a RAG solution such as LlamaIndex, we can provide a quick path for our customers to get satisfying, curated, and approved answers to FAQs. By applying N-level crawling in our solution, we can allow for answers that could span multiple FAQ links and provide deeper answers to our customers' queries. By following these steps, you can seamlessly incorporate powerful LLM-based Q&A capabilities and efficient URL ingestion into your Amazon Lex chatbot. This results in more accurate, comprehensive, and contextually aware interactions with users.


About the authors

Max Henkel-Wallace is a Software Development Engineer at AWS Lex. He enjoys leveraging technology to maximize customer success. Outside of work he is passionate about cooking, spending time with friends, and backpacking.

Song Feng is a Senior Applied Scientist at AWS AI Labs, specializing in Natural Language Processing and Artificial Intelligence. Her research explores various aspects of these fields including document-grounded dialogue modeling, reasoning for task-oriented dialogues, and interactive text generation using multimodal data.

John Baker is a Principal SDE at AWS where he works on Natural Language Processing, Large Language Models and other ML/AI related projects. He has been with Amazon for 9+ years and has worked across AWS, Alexa and Amazon.com. In his spare time, John enjoys snowboarding and other outdoor activities throughout the Pacific Northwest.

