
Exploring the Future of AI and Virtualization: Our Journey with LLM Fine-Tuning, Hugging Face Models, Proxmox, and Wake-on-LAN

The project I’m working on has truly been a fun weekend endeavor, blending AI technologies with virtualization and GPU-powered computing. It’s been a steep learning curve, but the rewards have been substantial as I’ve pushed through various technical challenges. This project combines AI, machine learning, virtualization, and home-lab networking, and I’m excited to share the process, the roadblocks I’ve encountered, and the progress I’ve made.

The AI Core: Fine-Tuning LLMs and Hugging Face Models

At the core of this project are large language models (LLMs) and their ability to generate meaningful output based on vast datasets. I started with Hugging Face's pre-trained models, specifically focusing on transformers. These models have revolutionized natural language processing (NLP) by enabling machines to understand and generate human-like text. However, the initial goal was to fine-tune these models to suit specific use cases: primarily code generation, writing documentation, and visualizing analytics, for applications ranging from functional programming in Haskell to scalable microservices architectures.
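To make this concrete, here is a minimal sketch of loading one of those pre-trained checkpoints from the Hugging Face Hub; the model name is an illustrative choice, not necessarily the one used in this project:

```python
# Minimal sketch: loading a pre-trained causal language model.
# The checkpoint name below is a placeholder code-oriented choice.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigcode/starcoderbase-1b"  # illustrative, not my exact model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```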


Fine-Tuning and Challenges

Fine-tuning transformer models on a large dataset of code was no simple task. I had to carefully select the right training data to ensure the models could handle a wide range of code-generation requests. I worked extensively with Hugging Face's Transformers library to adapt the models for software-specific tasks, such as generating Python functions or parsing logic based on user input.
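As a hedged sketch, the fine-tuning setup with the Transformers Trainer API looks roughly like this; the base checkpoint, dataset file, and hyperparameters are placeholders, not my exact values:

```python
# Hedged sketch of causal-LM fine-tuning on a code corpus.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stand-in for the actual base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical corpus: one code sample per line in a plain-text file.
raw = load_dataset("text", data_files={"train": "code_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_ds = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=train_ds,
    # mlm=False gives standard next-token (causal) language modeling.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```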

A major challenge was the computational cost of fine-tuning such large models. Even with a high-end GPU, the process was time-consuming and resource-intensive. During the early stages of fine-tuning, I encountered frequent memory bottlenecks, and GPU memory overflow was a regular issue, particularly when working with larger datasets. To address this, I leveraged gradient checkpointing, which trades extra compute for much lower activation memory, and mixed-precision training, which both cuts memory use and speeds up training.
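In the Transformers Trainer, both optimizations are flags on the training arguments; the values below are illustrative:

```python
# Memory optimizations as TrainingArguments flags; values are illustrative.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="ft-out",
    per_device_train_batch_size=1,   # small micro-batches to fit in VRAM
    gradient_accumulation_steps=16,  # recover an effective batch size of 16
    gradient_checkpointing=True,     # recompute activations, saving memory
    fp16=True,                       # mixed-precision training
)
```

Gradient accumulation pairs naturally with these: tiny micro-batches fit within GPU memory, while the accumulated gradients preserve a reasonable effective batch size.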

Progress and Breakthroughs

After several iterations of fine-tuning and experimentation, the model started producing more coherent and structured code outputs. We now have a system that can generate code from natural language prompts with increasing accuracy. The goal of an AI worker that helps automate software development tasks, like code generation, writing documentation, and visualizing analytics, is taking shape.
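An illustrative generation call against a fine-tuned checkpoint might look like this; the directory and prompt are placeholders, and it assumes the tokenizer was saved alongside the model:

```python
# Illustrative prompt-to-code generation from the fine-tuned checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "ft-out"  # output directory from the fine-tuning sketch above
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

prompt = "# Write a Python function that parses an ISO-8601 date string\n"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```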

Machine Learning Frameworks: Grafana and Prometheus for Analytics

To monitor and analyze our system's performance, I integrated Grafana and Prometheus, two powerful tools that provide real-time analytics and visualization capabilities. These frameworks enabled us to track key metrics during training, like GPU usage, memory consumption, and processing time. This real-time data has been invaluable in diagnosing issues and optimizing the system for better performance.

Challenges with Monitoring

Initially, configuring Prometheus to collect custom metrics from our machine learning model was tricky. I had to write custom exporters and configure the Prometheus server to track the performance of the GPUs during model training. Once everything was in place, Grafana proved to be a powerful tool for visualizing training progress. With live graphs showing GPU utilization and throughput, I could immediately identify if the model was running inefficiently or if there were resource constraints.
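For reference, a minimal exporter in the spirit of what's described above could be built with prometheus_client and NVIDIA's NVML bindings (nvidia-ml-py); the metric names and port are illustrative, not the exact ones used here:

```python
# Minimal custom Prometheus exporter for GPU utilization and memory.
import time

import pynvml
from prometheus_client import Gauge, start_http_server

GPU_UTIL = Gauge("gpu_utilization_percent", "GPU utilization in percent")
GPU_MEM_USED = Gauge("gpu_memory_used_bytes", "GPU memory currently in use")

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

start_http_server(9101)  # Prometheus scrapes http://host:9101/metrics
while True:
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    GPU_UTIL.set(util.gpu)
    GPU_MEM_USED.set(mem.used)
    time.sleep(5)
```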

Proxmox Virtualization and Headless Debian: A Game-Changer for Experimentation

Another key part of this project has been the virtualization environment. We decided to use Proxmox to create virtual machines (VMs) running Debian, set up in a headless configuration. The idea behind this setup was to allow us to run experiments and tests in isolated environments without the overhead of a GUI.


The VLAN Setup: Exploring Networking Concepts

To dive deeper into networking and virtualization, I configured VLANs (Virtual Local Area Networks) within Proxmox. This was crucial for setting up a multi-networked environment that mimicked real-world production conditions. Setting up VLANs and ensuring that each VM had proper network isolation was tricky. I spent several hours troubleshooting network interfaces, but once the bridged network interfaces were correctly configured and the VLAN tags were properly assigned, everything began functioning as expected.
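For anyone reproducing this, a VLAN-aware bridge on a Proxmox host looks roughly like the excerpt below; the addresses and interface name are examples, not my exact values:

```
# /etc/network/interfaces on the Proxmox host -- illustrative values
auto vmbr0
iface vmbr0 inet static
    address 192.168.10.2/24
    gateway 192.168.10.1
    bridge-ports enp3s0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
```

Each VM's virtual NIC then carries its own VLAN tag (e.g. tag=20 on its net0 device), which is what gives each network its isolation.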

GPU Passthrough: Getting the Power of High-End GPUs

I also aimed to take full advantage of the high-end NVIDIA RTX 4070 Ti GPU by configuring GPU passthrough to the Debian VMs. The goal was to allow our VMs to access the GPU directly, bypassing the host system to enable high-performance computing tasks like deep learning model training.

In the early stages, the GPU wasn’t recognized in the VMs. After some research and configuration adjustments, including disabling Secure Boot and adjusting kernel parameters, the GPU passthrough was successfully configured. However, GPU lockups persisted, requiring me to fine-tune the IOMMU settings and ensure proper resource allocation.
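The relevant pieces ended up spread across a few files; the excerpt below is a hedged summary with an illustrative PCI address, not a verbatim copy of my configuration:

```
# /etc/default/grub -- enable the IOMMU (intel_iommu=on on Intel hosts),
# then run update-grub and reboot
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# /etc/modules -- load the VFIO modules at boot
vfio
vfio_iommu_type1
vfio_pci

# /etc/pve/qemu-server/<vmid>.conf -- hand the GPU to the VM
# (01:00.0 is an illustrative PCI address; pcie=1 needs the q35 machine type)
machine: q35
hostpci0: 01:00.0,pcie=1,x-vga=1
```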

Wake-on-LAN Setup: A Handy Remote Access Tool

One useful configuration was enabling Wake-on-LAN (WOL) on the Proxmox node that hosts the Debian VMs. This allowed me to remotely wake machines in our home lab from my laptop, saving energy and adding a layer of convenience. With Wake-on-LAN, I could send a magic packet to power up the host even when it was shut down, and then start whichever VMs I needed. This became crucial for managing multiple VMs and having everything ready for testing or training without physically touching each server.
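The magic packet itself is simple: 6 bytes of 0xFF followed by the target MAC address repeated 16 times, broadcast over UDP. A short Python script can send it; the MAC below is a placeholder, and the host's NIC needs WOL enabled first (e.g. with ethtool -s enp3s0 wol g):

```python
# Send a Wake-on-LAN magic packet to a powered-off host.
import socket

def send_magic_packet(mac: str, broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    packet = b"\xff" * 6 + mac_bytes * 16  # 6x 0xFF, then MAC x16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

send_magic_packet("aa:bb:cc:dd:ee:ff")  # placeholder MAC of the Proxmox host
```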

Progress and Key Milestones

After overcoming these technical hurdles, I successfully set up a headless Debian VM with GPU passthrough, enabling us to run machine learning tasks directly on the virtualized environment. This setup provides flexibility to experiment with different configurations and test the limits of our AI models without relying on a dedicated physical machine.
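A quick sanity check from inside the VM confirms the passed-through GPU is visible to the ML stack (assuming PyTorch with CUDA support is installed):

```python
# Verify the passthrough GPU is usable from the Debian VM.
import torch

if torch.cuda.is_available():
    print("GPU visible:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; revisit the passthrough configuration.")
```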

Looking Forward: The Road Ahead

As we move forward, there are several areas we are excited to explore and refine: