Building Actionable LLM Agents — end-to-end food app example

Vic Genin
10 min read · Jan 27, 2025


The field of LLM agents has seen exciting advances in recent years. By combining powerful natural language processing capabilities with specialized models and techniques, we can now create highly capable agents that understand user intent and take appropriate actions to assist and delight users.

This article will dive into the architecture and key components of an intelligent food ordering assistant powered by LLM agents. We’ll explore how it leverages advanced language models and vector embeddings to understand user intent, search for personalized recommendations, and handle the entire ordering process through engaging conversation.

Introduction

Imagine effortlessly ordering your favorite dishes simply by chatting with an AI assistant — no complex menus to navigate, no rigid interfaces to wrestle with. Just natural conversation, personalized recommendations, and seamless ordering. This is the promise of intelligent food ordering applications powered by actionable LLM agents.

Behind the scenes, the agent employs a suite of language models and techniques to understand the user’s intent, search through restaurant menus to find the most relevant dishes, and guide the user through the ordering process.

Architecture

The application follows a client-server architecture. The backend is built on the Falcon web framework, exposing REST endpoints for the frontend to communicate with the AI agent (a minimal wiring sketch follows the list below). Key components include:

  • The OrderingAgent class, which orchestrates the conversation flow and delegates to specialized models for different tasks
  • Focused LLMs for intent analysis, personalized search, budget extraction, and natural language generation
  • Vector databases for semantic search over restaurant menus
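To make the wiring concrete, here is a minimal sketch of how a Falcon route might be attached to the agent. The OrderingAgent class, the handle_input method, and the /query route come from this article; the constructor, the payload shape, and the suffix-based routing are illustrative assumptions, not the repository's exact code:

import falcon

class OrderResource:
    def __init__(self, agent):
        # `agent` is an OrderingAgent instance (see below)
        self.agent = agent

    def on_post_query(self, req: falcon.Request, resp: falcon.Response):
        # Forward the raw user message to the agent, which runs the
        # intent-recognition and recommendation flow described below.
        payload = req.media
        agent_response = self.agent.handle_input(payload["user_id"], payload["message"])
        resp.text = agent_response.model_dump_json()

app = falcon.App()
resource = OrderResource(OrderingAgent())
app.add_route("/query", resource, suffix="query")  # maps POST /query to on_post_query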

Stepped Intent Recognition Process

At the heart of the ordering assistant is a multi-step intent recognition flow. When a user message arrives, the agent considers it in the full context of the conversation, dynamically determining the most likely intent. This allows the agent to understand the user’s needs in a nuanced, contextual way.

  1. Context Analysis: First, the agent reviews the conversation context and key information it has gathered so far, such as the user’s address, food preferences, and budget. This background guides how it will interpret the user’s current message.
context = "\n"
if self.user_store.has_preferences(user_id):
    context += "User preference is provided.\n"
else:
    context += "User hasn't provided any food preference yet, so I don't know what the user wants to order.\n"
if self.user_store.has_address(user_id):
    context += "User address is provided.\n"
else:
    context += "User hasn't provided an address yet, so I don't know where to deliver the food.\n"

By dynamically constructing this context based on the conversation state, the agent can interpret user messages more accurately. For example, if the user hasn’t provided an address yet, the agent will be more likely to interpret location-related messages as an attempt to specify a delivery address.
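For example, for a user who has shared food preferences but no address yet, the constructed context reads:

User preference is provided.
User hasn't provided an address yet, so I don't know where to deliver the food.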

2. Intent Extraction: Next, a QA chain powered by an instruction-tuned LLM predicts the intent of the user’s message within this conversational context. The QA model uses a prompt engineered to align with the ordering flow:

def create_qa_chain(self, user_id: str) -> Chain:
    model_path = self._get_qa_model()
    llm = LlamaCpp(model_path=model_path, temperature=0.7, max_tokens=50, verbose=False, model_kwargs={"loglevel": logging.ERROR})

    prompt_template = """
You are an AI assistant for a food ordering app. Your purpose is to interpret what the user wants and the user's intent.
Use the following guide to respond directly to the user without explaining different scenarios.
- If the user's input looks like an address (e.g., contains a street number, street name, city, state, and/or zip code), assume they are providing their delivery address.
- If the user's input mentions food ingredients or a cuisine name, assume they are providing information to help select a suitable restaurant and dish.
- If the user's input mentions a monetary amount or budget (e.g., "50 bucks", "$20", "under 30$"), assume they are providing their budget for the order.
- If the user's input is not related to ordering food, clarify that your role is to assist with placing food delivery orders.
- If the user just provides information, simply explain what they are trying to do - don't suggest anything as the next step.
Use the following pieces of context to determine the user's intent:
{context}
Reply in fewer than 50 words with the user's main intent, based on this input: '{question}'
Answer:
"""

    prompt = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )

    qa_chain = prompt | llm
    return qa_chain

In this code, we're using LangChain's prompts and chains to construct a question-answering system for intent extraction. We define a prompt template that will be used to format the inputs passed to the LLM.

A prompt template in LangChain is a way to dynamically construct prompts by inserting variables. The keys surrounded by curly braces like {context} and {question} are placeholders for values that will be provided at runtime.

This particular prompt is designed to instruct the LLM to act as an intent extraction module for a food ordering assistant. It provides clear guidelines on how to map different types of user messages to intents relevant for an ordering flow. Some key points about this prompt:

  • It starts by explaining the high-level role of the assistant to frame the task
  • It gives specific instructions for how to categorize different types of messages (addresses, food preferences, budgets, etc.)
  • It explicitly lists out the pieces of information that will be provided as input (context and question)
  • It constrains the output format (reply in <50 words describing the main intent)

By thoroughly instructing the LLM on how to behave as an intent extractor, this prompt essentially “programs” the model to perform this specific task vs. acting as a generic open-ended assistant.
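At runtime, invoking the chain is a single call. A minimal sketch, assuming self.chains mirrors the math-chain factory shown later in the article and user_message holds the raw input:

context = self._build_context(user_id)  # hypothetical helper wrapping the context code above
qa_chain = self.chains.create_qa_chain(user_id)
intent_description = qa_chain.invoke({"context": context, "question": user_message})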

3. Intent Classification: The intent description from the QA chain captures the semantic meaning of the user’s intent, but to cleanly handle it in downstream code, the agent maps it to a structured enum. This is done by comparing the intent description embedding to embeddings of the possible intent categories using cosine similarity:

class IntentEnum(Enum):
    GENERAL_QUESTION = "The user is asking a general question not related to ordering food."
    PROVIDE_ADDRESS = "The user is providing their delivery address or location for their food order."
    PROVIDE_PREFERENCES = "The user is providing their preferred cuisine, type of restaurants or ingredients for their order."
    PROVIDE_BUDGET = "The user is specifying an exact or approximate budget limit or numerical amount they are willing to pay or spend on their food order."

self.embeddings = embeddings.get_embeddings()
self.enum_embeddings = self.embeddings.embed_documents([e.value for e in IntentEnum])

intent_embeddings = self.embeddings.embed_documents([intent_description])
similarities = cosine_similarity(intent_embeddings, self.enum_embeddings)
intent_index = similarities.argmax()
intent = list(IntentEnum)[intent_index]

if intent == IntentEnum.GENERAL_QUESTION:
    ...
elif intent == IntentEnum.PROVIDE_BUDGET:
    ...

This multi-step process allows the agent to deeply understand intents in a nuanced, context-aware way vs rigidly classifying based solely on the current message.

Searching Personalized Recommendations

Once the agent has captured the user’s key preferences — cuisine, ingredients, budget, etc. — it’s time to search the restaurant menus to find the most relevant personalized dishes to recommend.

To power this semantic search, menu items are dynamically indexed into a vector database, with both the structured metadata (price, venue, etc.) and the embeddings of the dish descriptions:

menu_items = []
for venue in venues:
    menu = self.eats_api.get_menu(venue["store_id"])
    for item in menu:
        item["venue_id"] = venue["store_id"]
        menu_items.append(Document(page_content=json.dumps(item), metadata=item))

menu_store = FAISS.from_documents(menu_items, self.embeddings)

The user’s combined food preferences are then embedded into the same vector space, allowing a nearest neighbor search to find the most relevant dishes, filtered by the user’s budget:

combined_preferences = " ".join(preferences)
preference_embedding = self.embeddings.embed_query(combined_preferences)
docs_and_scores = menu_store.similarity_search_with_score_by_vector(
    preference_embedding,
    k=5,
    filter=lambda metadata: metadata.get('price', float('inf')) <= budget
)

This flexible semantic search approach can surface surprisingly relevant recommendations, even when the user describes their preferences in novel or roundabout ways. It’s not confined to searching for exact ingredient or cuisine keywords.
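To make the results tangible, here is a hedged sketch of inspecting the matches. similarity_search_with_score_by_vector returns (document, score) pairs; the title field below is an assumed part of the menu-item schema (only price and venue_id appear in the indexing code above):

for doc, score in docs_and_scores:
    item = doc.metadata
    # `price` and `venue_id` were stored during indexing; `title` is assumed.
    print(f"{item.get('title', '<unnamed dish>')} at venue {item['venue_id']}: "
          f"${item['price']} (score={score:.3f})")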

Focused Models for Different Scenarios

While the conversational agent can engage in freeform dialog, it ultimately needs to extract structured details like the delivery address and payment information to complete an order. To avoid rigid, robotic prompts, it allows the user to provide these details naturally in conversation but uses targeted models to extract the key data.

You’ll notice the agent reaches for different specialized models for different scenarios throughout the flow, vs using a single monolithic model for everything:

  • The intent analysis QA chain uses a model tuned for in-context QA with an ordering-specific prompt
  • Dish recommendations are powered by embeddings from a model focused on semantic similarity of descriptions
  • Budget extraction uses a model fine-tuned on math word problems
  • Rephrasing responses to be more natural uses yet another model tuned for open-ended language generation

Using focused models allows each component to be optimized for its specific task. The intent classifier can be aligned to analyze requests in the context of an ordering flow, while the math extractor can precisely parse numerical values. Critically, using separate models prevents capabilities from interfering with each other — the budget extractor won’t confuse the agent by hallucinating intents.

This modular architecture, with a core agent coordinating specialized models, enables focused optimization while still combining generalist capabilities to handle complex end-to-end tasks. It exemplifies the exciting potential of composing LLMs into rich agent systems: grounding powerful language understanding in context-aware, specialized actions to create magical user experiences.

For example, when the agent predicts the user is trying to provide a budget amount, it pipes the message through a specialized math extraction model:

def _parse_budget_amount(self, budget_result: str) -> Optional[float]:
    # Capture an integer or decimal amount from the model's reply
    match = re.search(r'\d+(?:\.\d+)?', budget_result)
    if match:
        return float(match.group())
    return None

def create_math_chain(self) -> Chain:
    model_path = self._get_math_model()
    llm = LlamaCpp(model_path=model_path, temperature=0, max_tokens=20, verbose=False, model_kwargs={"loglevel": logging.ERROR})

    prompt_template = """
Solve the problem of finding the maximum amount from the following input text.
If the amount is approximate or not clear enough, round up to the next number divisible by 10.
Reply shortly and to the point, without explaining the reasoning, in 5 words.
Input Text:
{input_text}
Answer:
"""

    prompt = PromptTemplate(
        input_variables=["input_text"],
        template=prompt_template,
    )

    math_chain = prompt | llm
    return math_chain

math_chain = self.chains.create_math_chain()
budget_result = math_chain.invoke({'input_text': budget_input})
budget_amount = self._parse_budget_amount(budget_result)
if budget_amount is not None:
    self.user_store.set_budget(user_id, budget_amount)

This focused model is fine-tuned to understand monetary amounts expressed in natural language (“around 20 bucks”, “no more than $15”, etc) and round to a clear numerical value, allowing the user to communicate their budget informally but the agent to still capture a structured amount.
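Putting the two stages together, here is roughly how an informal phrase flows through the chain. The model output shown is illustrative, not an actual transcript:

budget_result = math_chain.invoke({"input_text": "somewhere around 22 bucks, give or take"})
# e.g. budget_result == "The maximum amount is 30."  (22 is approximate,
# so the prompt asks the model to round up to the next multiple of 10)
budget_amount = self._parse_budget_amount(budget_result)  # -> 30.0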


Securing Payment Details

When building an AI agent that handles sensitive data like payment information, it’s critical to keep that data isolated from the language model. Even if you’re only performing inference and not fine-tuning the model, payment details can inadvertently get logged in the LLM’s cache or output and potentially leak.

In this food ordering agent, we ensure payment security by:

  • Keeping the LLM focused solely on intent analysis and recommendation, never passing it any payment info
  • Returning a structured response from the agent with the classified intent
  • Having the client UI interpret the structured response and collect payment details using secure methods
  • Providing a separate API endpoint for the client to submit payment details directly, bypassing the LLM

Let’s walk through how this secure payment handling is implemented in the code.

Structured Agent Response

The agent’s handle_input method, after processing the user’s message, returns a structured Response object:

class ResponseStatus(Enum):
    ANSWER = "answer"
    ERROR = "error"
    REQUEST_ADDRESS = "request_address"
    REQUEST_PREFERENCE = "request_preference"
    REQUEST_BUDGET = "request_budget"
    REQUEST_PAYMENT_DETAILS = "request_payment_details"
    ORDER_CREATED = "order_created"

class Response(BaseModel):
    status: ResponseStatus
    response: str
    order: Optional[OrderDetails] = None

The Response object contains:

  • status: an enum indicating the type of response (answer, request for info, error, etc.)
  • response: the actual text response to display to the user
  • order: an optional OrderDetails object with structured order info (dishes, totals, etc)

The key point is the status enum. When the agent has gathered all the necessary info and is ready for payment, it returns a Response with the REQUEST_PAYMENT_DETAILS status:

return Response(
    status=ResponseStatus.REQUEST_PAYMENT_DETAILS,
    response=self._reply("I'll order the preferred dish for you. Please provide payment details - they will be sent directly to the payment provider and will not be stored."),
    order=order
)

Notice the response text instructs the user to provide payment details, but does not actually attempt to collect them itself. The text response is kept intentionally generic and further enhanced by the reply model.

Separate Payment Endpoint

On the client side, the UI checks the status field of the Response. If it sees the REQUEST_PAYMENT_DETAILS status, instead of sending the next user message back to the /query endpoint for LLM processing, it switches to a secure payment flow.
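Here is a minimal sketch of that client-side routing in Python. The /query and /order endpoints and the status values come from the article; the base URL, the payload shapes, and the collect_card_details_securely helper are illustrative assumptions:

import requests

BASE_URL = "http://localhost:8000"  # assumed backend address

def send_message(user_id: str, message: str) -> dict:
    resp = requests.post(f"{BASE_URL}/query", json={"user_id": user_id, "message": message})
    return resp.json()

agent_response = send_message("user-1", "I'd like sushi for under $30")
if agent_response["status"] == "request_payment_details":
    # Card details never touch the /query endpoint (and thus never reach
    # the LLM); they go straight to the non-LLM /order endpoint instead.
    payload = {
        "order": agent_response["order"],
        "cc_details": collect_card_details_securely(),  # hypothetical secure UI hook
    }
    requests.post(f"{BASE_URL}/order", json=payload)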

The backend handles this with a separate /order endpoint that completely bypasses the LLM:

class OrderResource:
    def on_post_query(self, req: falcon.Request, resp: falcon.Response):
        # This handles passing user input to the LLM agent for intent analysis,
        # recommendation, and regular conversation
        ...

    def on_post_order(self, req: falcon.Request, resp: falcon.Response):
        # This handles the actual order placement, including secure
        # processing of payment details - no LLM involved!
        order_data = req.media

        # Extract payment details from the request payload
        cc_data = order_data["cc_details"]
        payment_details = CCDetails.model_validate(cc_data)

        order = order_data["order"]
        order_details = OrderDetails.model_validate(order)

        order_to_add = Order(
            id=None,
            order_details=order_details,
            payment_details=payment_details
        )

        order_id = self.eats_api.book_order(order_to_add)

        success_response = Response(
            status=ResponseStatus.ORDER_CREATED,
            response=f"Order successfully created with ID: {order_id}",
            order=order_details
        )
        resp.status = falcon.HTTP_200
        resp.text = success_response.model_dump_json()

The /order endpoint expects the client to POST a payload with two parts:

  • order: the OrderDetails object describing the dishes, totals, delivery info, etc. This should be the same OrderDetails that the LLM agent returned earlier.
  • cc_details: an object with the payment card details like card number, expiration, CVC, etc.

Critically, the /order handler does not pass any of this data to the LLM. It simply validates the POSTed data using Pydantic, combines it into a complete Order object, and passes it to the eats_api to finalize the order with the payment provider.
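For completeness, here is a sketch of what those Pydantic schemas might look like. The class names CCDetails, OrderDetails, and Order come from the handler above; the specific fields are illustrative assumptions based on the article's description (dishes, totals, delivery info):

from typing import List, Optional

from pydantic import BaseModel

class CCDetails(BaseModel):
    # Assumed card fields; the real schema may differ.
    card_number: str
    expiration: str
    cvc: str

class OrderDetails(BaseModel):
    # Assumed order fields (dishes, totals, delivery info).
    venue_id: str
    items: List[dict]
    total: float
    delivery_address: str

class Order(BaseModel):
    id: Optional[str]
    order_details: OrderDetails
    payment_details: CCDetails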

By cleanly separating the payment handling from the LLM logic, we keep sensitive payment data secure while still allowing the LLM agent to drive the overall ordering flow.

This architecture of using structured responses to signal intent, having the client handle payment flows based on those intents, and providing a separate non-LLM API endpoint for secure payment processing is a robust pattern for building LLM-powered applications that handle sensitive data.

The key takeaways are:

  • Never pass raw payment data (or any sensitive info) to an LLM, even for inference
  • Have your LLM agent return structured responses indicating when payment is needed
  • On the client side, route payment flows to secure non-LLM methods
  • Provide a separate API endpoint isolated from LLM logic to securely process payments
  • Use typed schemas like Pydantic models to validate and structure sensitive data

By following this approach of keeping LLM and payment logic separate, you can build powerful AI agents that engage users in natural language while still ensuring the security and privacy of sensitive information. The food ordering assistant shows this pattern in action, demonstrating you can have the best of both worlds — delightful, AI-driven user experiences with rock-solid data security.

Conclusion

This practical implementation shows how LLM-based agents can already handle complex real-world tasks like food ordering through a clever combination of specialized models and thoughtful architecture. While current limitations in scalability and testing highlight room for growth, they also set the stage for exciting advances. Stay tuned for our next article, where we'll explore how microservices and advanced preference learning could transform this foundation into a powerful platform for building the next generation of AI agents that seamlessly understand and serve user needs.

Full source code for the application is available at https://github.com/deeprnd/llama_eats.
