Infrastructure for the Deployment of Large Language Models: Challenges and Solutions

Authors

  • Tomasz Walkowiak, Wrocław University of Science and Technology
  • Bartosz Walkowiak, Wrocław University of Science and Technology

Abstract

Large Language Models (LLMs) are increasingly prevalent, and their capabilities are advancing rapidly thanks to extensive research in this field. A growing number of models are being developed, with sizes significantly exceeding 70 billion parameters. As a result, efficient and scalable inference on these models is becoming crucial for making full use of valuable resources such as GPUs and CPUs. This work outlines a process for selecting the most effective tools for efficient inference, supported by experimental results. It also provides a comprehensive description of an end-to-end inference system, covering all components from model inference and communication to user management and a user-friendly web interface. Finally, we detail the development of an LLM chatbot that leverages the function-calling capabilities of LLMs and integrates various external tools, including weather prediction, Wikipedia information, symbolic math, and image generation.

Published

2025-07-09

Section

Applied Informatics