Introduction
In the world of large language models (LLMs), the cost of computation is often a significant barrier, especially for large-scale tasks. I recently embarked on a project that required running 4,000,000 prompts with an average input length of 1,000 tokens and an average output length of 200 tokens. That's nearly 5 billion tokens! The traditional approach of paying per token, as is common with models like GPT-3.5 and GPT-4, would have resulted in a hefty bill. However, I discovered that by leveraging open-source LLMs, I could shift the pricing model to paying per hour of compute time, leading to substantial savings. This article will detail the approaches I took and compare and contrast each of them. Please note that while I share my experience with pricing, prices are subject to change and may vary depending on your region and specific circumstances. The key takeaway here is the potential cost savings of leveraging open-source LLMs and renting a GPU per hour, rather than the specific prices quoted. If you plan on using my recommended solutions for your project, I've left a couple of affiliate links at the end of this article.
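To make the scale concrete, here is a quick sanity check of the token volume in Python, using only the figures quoted above:

```python
num_prompts = 4_000_000
avg_input_tokens = 1_000   # average prompt length
avg_output_tokens = 200    # average completion length

total_input = num_prompts * avg_input_tokens    # 4.0 billion input tokens
total_output = num_prompts * avg_output_tokens  # 0.8 billion output tokens

print(f"Total tokens: {(total_input + total_output) / 1e9:.1f}B")  # 4.8B — nearly 5 billion
```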
ChatGPT API
I carried out an initial test using GPT-3.5 and GPT-4 on a small subset of my prompt input data. Both models demonstrated commendable performance, but GPT-4 consistently outperformed GPT-3.5 in the majority of cases. To give you a sense of the cost, running all 4 million prompts through the OpenAI API would look something like this:
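A rough back-of-the-envelope sketch is below. The per-token rates are assumptions based on OpenAI's mid-2023 list prices ($0.0015/$0.002 per 1K input/output tokens for GPT-3.5 Turbo, $0.03/$0.06 per 1K for 8K-context GPT-4); they have since changed, so treat them as illustrative only:

```python
def api_cost(prompts, in_tokens, out_tokens, in_price_per_1k, out_price_per_1k):
    """Estimate total API cost in dollars for a batch of prompts."""
    return prompts * (in_tokens / 1000 * in_price_per_1k
                      + out_tokens / 1000 * out_price_per_1k)

# Mid-2023 list prices (assumed; subject to change)
gpt35 = api_cost(4_000_000, 1000, 200, 0.0015, 0.002)  # -> $7,600
gpt4  = api_cost(4_000_000, 1000, 200, 0.03, 0.06)     # -> $168,000

print(f"GPT-3.5 Turbo: ${gpt35:,.0f}")
print(f"GPT-4 (8K):    ${gpt4:,.0f}")
```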
While GPT-4 did offer some performance benefits, the cost was disproportionately high compared to the incremental quality it added to my outputs. Conversely, GPT-3.5 Turbo, although more affordable, fell short in terms of performance, making noticeable errors on 2–3% of my prompt inputs. Given these factors, I wasn't prepared to invest $7,600 in a project that was…