Posts for: #LLMs

Interactive CV Bot: Automating a Less Than Fun Task

It all started with a colleague asking “Could you send me a copy of a recent CV?” In the world of consulting it’s a fairly regular ask, albeit a rarer one for me, being non-client-facing. The cogs started turning and I thought, well, this will take me an hour to collate and another few to format. Then an idea started to form: what if I could invest a bit more time now and reduce the time spent fulfilling similar requests in the future?
[Read more]

Meta-Writing: Creating a Hugo Content Assistant with VSCodium and Roo Code

This post was written almost entirely by an LLM: a reflection on building LLM-powered writing assistance while maintaining an authentic voice. After well over a year of sporadic posting, I found myself facing a familiar challenge: maintaining consistency in voice and technical depth across content while leveraging the productivity benefits of LLM assistance. The solution emerged through an interesting meta-exercise: using Roo Code (the agentic coding plugin for VSCode/VSCodium) to create its own content creation persona.
[Read more]

Deploying an OpenAI Compatible Endpoint on Runpod with vLLM and K6 Load Testing

This post explores renting a cloud GPU from RunPod, using the vLLM inference engine to serve a Large Language Model via an OpenAI-compatible endpoint, and then load testing that endpoint with K6. What is RunPod? RunPod is a paid cloud GPU provider. Among its offerings are pods, one of which we will use in this example. A pod is a container with one or more GPUs attached; we specify the Docker image and the configuration.
[Read more]
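To give a flavour of what an OpenAI-compatible endpoint expects, here is a minimal sketch that builds a `/v1/chat/completions` request body. The endpoint URL in the comment and the model name are assumptions; substitute the pod's proxy URL and whichever model vLLM was launched with.

```python
import json


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> str:
    """Build the JSON body for an OpenAI-compatible chat completions call.

    The model name is whatever vLLM was started with; the structure of the
    body follows the OpenAI Chat Completions schema.
    """
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })


# Assumed target URL shape for a RunPod pod exposing port 8000:
#   http://<pod-id>-8000.proxy.runpod.net/v1/chat/completions
body = build_chat_request("my-served-model", "Say hello.")
```

A K6 script then POSTs this same body in a loop from many virtual users to measure throughput and latency under load.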

Converting a PyTorch Model to Safetensors Format and Quantising to Exl2

A set of notes on converting a transformers model from PyTorch format to Safetensors format and then quantising to ExLlamaV2 (Exl2) using a code-based calibration dataset. This was inspired by posts reporting that coding LLMs quantised to Exl2 with the default wikitext calibration dataset produced relatively lower-quality outputs. ExLlamaV2 is the excellent work of Turboderp. It is an inference library for running LLMs on consumer GPUs. It is fast and supports multi-GPU hosts.
[Read more]
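The core idea of a code-based calibration dataset is simply to feed the quantiser source code instead of wikitext. A minimal sketch, assuming a local directory of source files (the helper name and defaults here are illustrative, not part of ExLlamaV2's API):

```python
from pathlib import Path


def build_code_calibration_rows(root: str,
                                exts: tuple = (".py", ".js", ".ts"),
                                max_rows: int = 100) -> list:
    """Collect the text of source files under `root` as calibration rows.

    Hypothetical helper: the resulting rows would be written out (e.g. to a
    parquet or text file) and passed to the quantiser as its calibration data
    in place of the wikitext default.
    """
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            rows.append(path.read_text(encoding="utf-8", errors="ignore"))
            if len(rows) >= max_rows:
                break
    return rows
```

Calibrating on text that resembles the model's intended workload (code, in this case) is the motivation for swapping out the default dataset.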

A Conversation with Q on the Nature of Time

Recently I have been playing with open-source LLMs (Large Language Models), LLMs being the technology behind ChatGPT. While I have mainly been exploring how they can help with software development and other language-based tasks, I took a moment to have a chat with Q. I should note that the intellectual property for Q belongs to Paramount; no profit was made from this post and no breach of copyright is intended.
[Read more]