
LlamaIndex: the ultimate LLM framework for indexing and retrieval | by Sophia Yang, Ph.D. | Jun, 2023


An Introduction to LlamaIndex

LlamaIndex, previously known as GPT Index, is a remarkable data framework aimed at helping you build applications with LLMs by providing essential tools that facilitate data ingestion, structuring, retrieval, and integration with various application frameworks. The capabilities offered by LlamaIndex are numerous and highly valuable:

✅ Ingest from different data sources and data formats using data connectors (Llama Hub).
✅ Enable document operations such as inserting, deleting, updating, and refreshing the document index.
✅ Support synthesis over heterogeneous data and multiple documents.
✅ Use the “Router” to pick between different query engines.
✅ Allow for hypothetical document embeddings to enhance output quality.
✅ Offer a wide range of integrations with various vector stores, ChatGPT plugins, tracing tools, and LangChain, among others.
✅ Support the brand new OpenAI function calling API.

These are just a few examples of the extensive capabilities provided by LlamaIndex. In this blog post, we will explore some of the functionalities that I find exceptionally useful with LlamaIndex.

When developing an LLM application, it is essential to enable the LLM to interact with external data sources effectively. How to ingest data is the key here. Llama Hub offers over 100 data sources and formats, allowing LlamaIndex or LangChain to ingest data in a consistent way.

LlamaHub. Source: https://llama-hub-ui.vercel.app/.

By default, you can pip install llama-hub and use it as a standalone package. You may also choose to use the download_loader method to individually download a data loader for use with LlamaIndex.

Here is an example where we load in the Wikipedia data loader from the llama-hub package. The consistent syntax is very nice:

from llama_hub.wikipedia.base import WikipediaReader

loader = WikipediaReader()
documents = loader.load_data(pages=['Berlin', 'Rome', 'Tokyo', 'Canberra', 'Santiago'])
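Alternatively, here is a minimal sketch of the download_loader route mentioned above, which fetches the loader code from Llama Hub at runtime ("WikipediaReader" is the loader's registered name on Llama Hub):

from llama_index import download_loader

# download the loader class from Llama Hub instead of pip-installing llama-hub
WikipediaReader = download_loader("WikipediaReader")

loader = WikipediaReader()
documents = loader.load_data(pages=['Berlin', 'Rome'])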

Check out the output:

Llama Hub also supports multimodal documents. For example, the ImageReader loader uses pytesseract or the Donut transformer model to extract text from an image.
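As a rough sketch of how that might look (the file path here is hypothetical, and the loader's arguments may vary between llama-hub versions):

from pathlib import Path
from llama_index import download_loader

ImageReader = download_loader("ImageReader")

# parse_text=True runs OCR / the Donut model to pull text out of the image
loader = ImageReader(parse_text=True)
documents = loader.load_data(file=Path("./receipt.png"))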

Index, retriever, and question engine

Index, retriever, and query engine are three basic components for asking questions over your data or documents:

  • Index is a data structure that allows us to quickly retrieve relevant information for a user query from external documents. The index works by parsing documents into text chunks, called “Node” objects, and then building an index from those chunks.
  • Retriever is used for fetching and retrieving relevant information given a user query.
  • Query engine is built on top of the index and retriever, providing a generic interface to ask questions about your data.

Here is the simplest way to ask questions about your document: you create an index from the document first, and then use a query engine as the interface for your question:

from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
response = query_engine.query("Who is Paul Graham.")
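Under the hood, as_query_engine wires these three components together. Here is a minimal sketch of the same flow with the retriever made explicit, assuming default settings:

from llama_index import VectorStoreIndex
from llama_index.query_engine import RetrieverQueryEngine

index = VectorStoreIndex.from_documents(docs)

# the retriever fetches the most relevant nodes for a query
retriever = index.as_retriever()

# the query engine synthesizes an answer from the retrieved nodes
query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("Who is Paul Graham.")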

There are many types of indexes, retriever methods, and query engines that you can read about further in the LlamaIndex docs. In the rest of this article, I’d like to cover some of the cool features I find useful.

Oftentimes, once we create an index for our document, there might be a need to periodically update the document. This process can be costly if we were to recreate the embeddings for the entire document again. The LlamaIndex index structure offers a solution by enabling efficient insertion, deletion, update, and refresh operations. For example, a new document can be inserted as additional nodes (text chunks) without the need to recreate nodes from previous documents:

# Source: https://gpt-index.readthedocs.io/en/latest/how_to/index/document_management.html
from llama_index import ListIndex, Document

index = ListIndex([])
text_chunks = ['text_chunk_1', 'text_chunk_2', 'text_chunk_3']

doc_chunks = []
for i, text in enumerate(text_chunks):
    doc = Document(text, doc_id=f"doc_id_{i}")
    doc_chunks.append(doc)

# insert
for doc_chunk in doc_chunks:
    index.insert(doc_chunk)
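The companion operations follow the same pattern. A sketch based on the document management docs linked above (method names may differ slightly across versions):

# delete a document by its doc_id
index.delete("doc_id_0")

# update a document in place (matched by its doc_id)
index.update(doc_chunks[0])

# refresh: insert new documents and update changed ones in a single pass
refreshed = index.refresh(doc_chunks)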

With LlamaIndex, it’s easy to query multiple documents. This functionality is enabled through the `SubQuestionQueryEngine` class. When given a query, the query engine generates a “query plan” consisting of sub-queries against sub-documents, which are then synthesized to provide the final answer.

# Source: https://gpt-index.readthedocs.io/en/latest/examples/usecases/10q_sub_question.html
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine

# Load data
march_2022 = SimpleDirectoryReader(input_files=["../data/10q/uber_10q_march_2022.pdf"]).load_data()
june_2022 = SimpleDirectoryReader(input_files=["../data/10q/uber_10q_june_2022.pdf"]).load_data()
sept_2022 = SimpleDirectoryReader(input_files=["../data/10q/uber_10q_sept_2022.pdf"]).load_data()

# Build indices
march_index = VectorStoreIndex.from_documents(march_2022)
june_index = VectorStoreIndex.from_documents(june_2022)
sept_index = VectorStoreIndex.from_documents(sept_2022)

# Build query engines
march_engine = march_index.as_query_engine(similarity_top_k=3)
june_engine = june_index.as_query_engine(similarity_top_k=3)
sept_engine = sept_index.as_query_engine(similarity_top_k=3)

query_engine_tools = [
    QueryEngineTool(
        query_engine=sept_engine,
        metadata=ToolMetadata(name='sept_22', description='Provides information about Uber quarterly financials ending September 2022')
    ),
    QueryEngineTool(
        query_engine=june_engine,
        metadata=ToolMetadata(name='june_22', description='Provides information about Uber quarterly financials ending June 2022')
    ),
    QueryEngineTool(
        query_engine=march_engine,
        metadata=ToolMetadata(name='march_22', description='Provides information about Uber quarterly financials ending March 2022')
    ),
]

# Run queries
s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
response = s_engine.query('Analyze Uber revenue growth over the latest two quarter filings')

As you can see below, LlamaIndex decomposed a complex query into 2 sub-queries and was able to compare the information from multiple documents to arrive at the final answer.

Imagine you are building a bot to retrieve information from both Notion and Slack. How does the language model know which tool to use to search for information? LlamaIndex is like a clever helper that can find things for you, even when they are in different places. Specifically, LlamaIndex’s “Router” is a very simple abstraction that allows “picking” between different query engines.

In this example, we have two document indexes, one from Notion and one from Slack, and we create a query engine for each of them. After that, we put all the tools together and create a super tool called RouterQueryEngine, which picks which tool to use based on the descriptions we gave to the individual tools. This way, when we ask a question about Notion, the router will automatically look for information in the Notion documents.

# Source: https://gpt-index.readthedocs.io/en/latest/use_cases/queries.html#routing-over-heterogeneous-data
from llama_index import TreeIndex, VectorStoreIndex
from llama_index.tools import QueryEngineTool
from llama_index.query_engine import RouterQueryEngine

# define sub-indices (notion_docs and slack_docs are loaded beforehand, e.g. via Llama Hub loaders)
index1 = VectorStoreIndex.from_documents(notion_docs)
index2 = VectorStoreIndex.from_documents(slack_docs)

# define query engines and tools
tool1 = QueryEngineTool.from_defaults(
    query_engine=index1.as_query_engine(),
    description="Use this query engine to do…",
)
tool2 = QueryEngineTool.from_defaults(
    query_engine=index2.as_query_engine(),
    description="Use this query engine for something else…",
)

query_engine = RouterQueryEngine.from_defaults(
    query_engine_tools=[tool1, tool2]
)
response = query_engine.query(
    "In Notion, give me a summary of the product roadmap."
)

There are many exciting use cases for this. Here is a full example that uses the router to pick between SQL and a vector db: https://gpt-index.readthedocs.io/en/latest/examples/query_engine/SQLRouterQueryEngine.html.

Usually, when we ask a question about an external document, we use text embeddings to create vector representations for both the question and the document, and then use semantic search to find the text chunks that are most relevant to the question. However, the answer to the question may differ significantly from the question itself. What if we could generate hypothetical answers to our question first, and then find the text chunks that are most relevant to the hypothetical answer? That is where hypothetical document embeddings (HyDE) come into play and can potentially improve output quality.

# Source: https://gpt-index.readthedocs.io/en/latest/examples/query_transformations/HyDEQueryTransformDemo.html
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.indices.query.query_transform import HyDEQueryTransform
from llama_index.query_engine.transform_query_engine import TransformQueryEngine
from IPython.display import Markdown, display

# load documents
documents = SimpleDirectoryReader('llama_index/examples/paul_graham_essay/data').load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
query_str = "what did paul graham do after going to RISD"

# Now, we use HyDEQueryTransform to generate a hypothetical document and use it for embedding lookup.
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, hyde)
response = hyde_query_engine.query(query_str)
display(Markdown(f"<b>{response}</b>"))

# In this example, HyDE improves output quality significantly, by hallucinating accurately what Paul Graham did after RISD (see below), thus improving the embedding quality and the final output.
query_bundle = hyde(query_str)
hyde_doc = query_bundle.embedding_strs[0]

OpenAI recently released its function calling capabilities to more reliably connect GPT’s capabilities with external tools and APIs. Check out my previous video to see exactly how it works.

LlamaIndex has quickly integrated this functionality and added a brand new OpenAIAgent. Check out this notebook to learn more.
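Here is a minimal sketch of what that looks like, based on the OpenAIAgent notebook (the multiply function is just an illustrative toy tool):

from llama_index.agent import OpenAIAgent
from llama_index.tools import FunctionTool

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

# wrap the Python function as a tool the agent can call via OpenAI function calling
multiply_tool = FunctionTool.from_defaults(fn=multiply)

agent = OpenAIAgent.from_tools([multiply_tool], verbose=True)
response = agent.chat("What is 121 * 3?")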

What if there are way too many functions? Use the RetrieverOpenAIAgent! Check out this notebook.

LlamaIndex offers a wide range of integrations with various vector stores, ChatGPT plugins, tracing tools, and LangChain.

Source: https://imgflip.com/memegenerator.

How is LlamaIndex different from LangChain?

If you have used LangChain, you may wonder how LlamaIndex is different from LangChain. If you are not familiar with LangChain, check out my previous blog post and video. You will find striking similarities between LlamaIndex and LangChain in their functionalities, including indexing, semantic search, retrieval, and vector databases. They both excel in tasks like question answering, document summarization, and building chatbots.

However, each of them has its unique areas of focus. LangChain, with its extensive list of features, casts a wider net, concentrating on the use of chains and agents to connect with external APIs. LlamaIndex, on the other hand, has a narrower focus, shining in the area of data indexing and document retrieval.

How to use LlamaIndex with LangChain?

Interestingly, LlamaIndex and LangChain are not mutually exclusive. In fact, you can use both in your LLM applications: you can use both LlamaIndex’s data loaders and query engines and LangChain’s agents. I know a lot of people actually use both of these tools in their projects.

Here is an example where we use LlamaIndex to keep the chat history when using a LangChain agent. When we ask “what’s my name?” in the second round of conversation, the language model knows “I am Bob” from the first round of conversation:

# Source: https://github.com/jerryjliu/llama_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb
# Using LlamaIndex as a memory module
from langchain import OpenAI
from langchain.llms import OpenAIChat
from langchain.agents import initialize_agent
from llama_index import ListIndex
from llama_index.langchain_helpers.memory_wrapper import GPTIndexChatMemory

index = ListIndex([])
memory = GPTIndexChatMemory(
    index=index,
    memory_key="chat_history",
    query_kwargs={"response_mode": "compact"},
    # return_source returns source nodes instead of querying index
    return_source=True,
    # return_messages returns context in message format
    return_messages=True
)
llm = OpenAIChat(temperature=0)
# llm = OpenAI(temperature=0)
agent_executor = initialize_agent([], llm, agent="conversational-react-description", memory=memory)
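The two conversation rounds described above would then look something like this:

agent_executor.run(input="hi, i am bob")
agent_executor.run(input="what's my name?")  # the agent recalls "Bob" from the indexed chat history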

In summary, LlamaIndex is an incredibly powerful tool for enhancing the capabilities of Large Language Models with your own data. Its array of data connectors, advanced query interfaces, and flexible integrations make it an essential component in the development of applications with LLMs.

Thanks Jerry Liu for the advice and feedback!

Photo by Danielle Barnes on Unsplash

. . .

By Sophia Yang on June 19, 2023

Sophia Yang is a Senior Data Scientist. Connect with me on LinkedIn, Twitter, and YouTube and join the DS/ML Book Club ❤️



