Glean LangChain Integration

Glean’s official LangChain integration lets you build AI agents in Python that search and reason over your organization’s knowledge using the LangChain framework.

langchain-glean

Official LangChain integration for Glean’s search and chat capabilities

Installation

pip install -U langchain-glean

Configuration

Configure your Glean credentials by setting the following environment variables. The authentication documentation explains how to obtain them:

export GLEAN_SUBDOMAIN="your-glean-subdomain"
export GLEAN_API_TOKEN="your-glean-api-token"
export GLEAN_ACT_AS="user@example.com"  # Optional: Email to act as when making requests
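
Before constructing a retriever, it can help to fail fast when a required variable is missing. The following is a minimal sketch, assuming GLEAN_SUBDOMAIN and GLEAN_API_TOKEN are required and GLEAN_ACT_AS is optional, as the listing above suggests. The helper name missing_glean_env is hypothetical and is not part of langchain-glean:

```python
import os

# Variable names come from the configuration above. Treating the first two
# as required is an assumption based on GLEAN_ACT_AS being marked optional.
REQUIRED_VARS = ("GLEAN_SUBDOMAIN", "GLEAN_API_TOKEN")


def missing_glean_env(env=None):
    """Return the required Glean variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_glean_env()
    if missing:
        print("Missing Glean settings: " + ", ".join(missing))
    else:
        print("Glean credentials found in the environment.")
```

Passing a plain dictionary instead of relying on os.environ keeps the check easy to exercise in unit tests.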

Usage Examples

Using the Retriever

The GleanSearchRetriever allows you to search and retrieve documents from Glean:

from langchain_glean.retrievers import GleanSearchRetriever

# Initialize the retriever (will use environment variables)
retriever = GleanSearchRetriever()

# Search for documents
documents = retriever.invoke("quarterly sales report")

# Process the results
for doc in documents:
    print(f"Title: {doc.metadata.get('title')}")
    print(f"URL: {doc.metadata.get('url')}")
    print(f"Content: {doc.page_content}")
    print("---")
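
Because the loop above reads metadata with .get(), a missing title or URL prints as None. The sketch below shows one way to format results with fallbacks and a bounded snippet. It assumes only that each result exposes metadata and page_content as in the example above. The helper name summarize_result and the stand-in object are illustrative and not part of langchain-glean:

```python
from types import SimpleNamespace


def summarize_result(doc, max_chars=80):
    """Format one retrieved document as a single line with safe fallbacks."""
    title = doc.metadata.get("title") or "(untitled)"
    url = doc.metadata.get("url") or "(no url)"
    snippet = " ".join(doc.page_content.split())  # collapse runs of whitespace
    if len(snippet) > max_chars:
        # Reserve three characters for the ellipsis marker.
        snippet = snippet[: max_chars - 3] + "..."
    return f"{title} <{url}>: {snippet}"


# Stand-in object with the same attributes as a LangChain Document,
# used here so the sketch runs without calling Glean.
example = SimpleNamespace(
    metadata={"title": "Q2 Sales Report"},
    page_content="Revenue grew\nquarter over quarter.",
)
print(summarize_result(example))
```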

Building an Agent with Tools

The GleanSearchTool can be used in LangChain agents to search Glean:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_glean.retrievers import GleanSearchRetriever
from langchain_glean.tools import GleanSearchTool

# Initialize the retriever
retriever = GleanSearchRetriever()

# Create the tool
glean_tool = GleanSearchTool(
    retriever=retriever,
    name="glean_search",
    description="Search for information in your organization's content using Glean."
)

# Create an agent with the tool
llm = ChatOpenAI(model="gpt-4o")
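prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant with access to Glean search."),
    ("user", "{input}"),
    # Required by create_openai_tools_agent: holds intermediate tool calls and results
    ("placeholder", "{agent_scratchpad}"),
])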

agent = create_openai_tools_agent(llm, [glean_tool], prompt)
agent_executor = AgentExecutor(agent=agent, tools=[glean_tool])

# Run the agent
response = agent_executor.invoke({"input": "Find the latest quarterly report"})
print(response["output"])

RAG with LangChain Chains

You can plug the retriever into a LangChain chain to build a retrieval-augmented generation (RAG) pipeline that answers questions from retrieved Glean content:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_glean.retrievers import GleanSearchRetriever

# Initialize the retriever
retriever = GleanSearchRetriever()

# Create a prompt template
prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)

# Initialize the language model
llm = ChatOpenAI(model="gpt-4o")

# Format documents function
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Create the chain
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Run the chain
result = chain.invoke("What were our Q2 sales results?")
print(result)
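
The format_docs function above concatenates page content only, so the model cannot tell which document a passage came from, and the context can grow without limit. A possible variant is sketched below. It assumes the title and url metadata keys shown in the retriever example, and the name format_docs_with_sources and the max_chars budget are hypothetical choices rather than library features:

```python
from types import SimpleNamespace


def format_docs_with_sources(docs, max_chars=4000):
    """Join documents into one context string, labeling each with its source.

    The budget counts the labeled blocks only, not the blank-line separators.
    """
    parts = []
    used = 0
    for index, doc in enumerate(docs, start=1):
        source = doc.metadata.get("title") or doc.metadata.get("url") or "unknown source"
        block = f"[{index}] {source}\n{doc.page_content}"
        # Stop at the first block that would exceed the budget.
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)


# Stand-in documents so the sketch runs without calling Glean.
sample_docs = [
    SimpleNamespace(metadata={"title": "A"}, page_content="alpha"),
    SimpleNamespace(metadata={"url": "https://example.com/b"}, page_content="beta"),
]
print(format_docs_with_sources(sample_docs))
```

To try it in the chain above, replace format_docs with format_docs_with_sources in the "context" entry.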

Advanced Usage

Search Parameters

You can customize your search by passing additional parameters:

# Search with additional parameters
documents = retriever.invoke(
    "quarterly sales report",
    page_size=5,  # Number of results to return
    disable_spellcheck=True,  # Disable spellcheck
    max_snippet_size=200  # Maximum snippet size
)
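
When search options come from user input or configuration, it can be convenient to collect only the ones that were actually set and pass them through in one place. The sketch below uses the three parameter names from the example above. The helper build_search_kwargs and its validation rule are assumptions for illustration, not part of langchain-glean:

```python
def build_search_kwargs(page_size=None, disable_spellcheck=None, max_snippet_size=None):
    """Collect only the search options that were explicitly set."""
    # Assumed sanity check: a non-positive page size is treated as a caller error.
    if page_size is not None and page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    options = {
        "page_size": page_size,
        "disable_spellcheck": disable_spellcheck,
        "max_snippet_size": max_snippet_size,
    }
    # Keep False values (for example disable_spellcheck=False); drop only None.
    return {key: value for key, value in options.items() if value is not None}


search_kwargs = build_search_kwargs(page_size=5, disable_spellcheck=True)
print(search_kwargs)
# Intended use (not executed here):
# documents = retriever.invoke("quarterly sales report", **search_kwargs)
```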

Custom Retriever Configuration

Configure the retriever with custom settings:

# Initialize with custom settings
retriever = GleanSearchRetriever(
    subdomain="your-subdomain",  # Override environment variable
    api_token="your-api-token",  # Override environment variable
    act_as="user@example.com",   # Override environment variable
    page_size=10,               # Default number of results
    max_snippet_size=300        # Default snippet size
)

For the complete API documentation and implementation details, visit the langchain-glean GitHub repository.
