LLM deployment options other than traditional hyperscalers
Once we have created our LLM application (say, a YouTube video summariser, a multilingual text summariser, or anything else), we can deploy it on hyperscalers like AWS SageMaker, Google Cloud Platform, Azure Cloud, etc.
However, nowadays there are other options as well that are easier and more cost-effective. Let us learn about them in this article.
LLM application deployment options
1. Deploy for testing/educational purposes
2. Using already-deployed open-source models
3. Using models already deployed on AI/LLM-specific cloud
4. GPUs on rent
1. Deploy for testing/educational purposes
a) Streamlit
Streamlit helps you build a basic but good-looking UI for your LLM applications.
If you have created an AI/LLM app and quickly want to share it with your friends or colleagues, Streamlit is a good option.
It lets them try out the application and give you feedback.
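For instance, here is a minimal Streamlit sketch; the `summarise` function below is a hypothetical stand-in for your own LLM call:

```python
# app.py - a minimal Streamlit UI sketch for an LLM app.
import streamlit as st

def summarise(text: str) -> str:
    # Hypothetical stand-in: replace with your actual LLM call
    # (OpenAI, Bedrock, a local model, etc.).
    return text[:200] + "..."

st.title("Text Summariser")
user_text = st.text_area("Paste the text to summarise")

if st.button("Summarise"):
    with st.spinner("Summarising..."):
        st.write(summarise(user_text))
```

Run it locally with `streamlit run app.py`, or push it to Streamlit Community Cloud to share a public link with your friends and colleagues.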
b) Gradio
Gradio is another deployment option for your LLM application. Gradio was acquired by Hugging Face, and Gradio apps can be hosted on Hugging Face Spaces.
It comes with a more or less standard UI, but there is also an option to customize it. It takes just a few lines of code and a few minutes of your time to build your LLM application's UI.
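A minimal Gradio sketch looks similar; again, `summarise` is a hypothetical stand-in for your own LLM logic:

```python
# A minimal Gradio UI sketch for an LLM app.
import gradio as gr

def summarise(text: str) -> str:
    # Hypothetical stand-in: replace with your actual LLM call.
    return text[:200] + "..."

demo = gr.Interface(
    fn=summarise,
    inputs=gr.Textbox(lines=10, label="Text to summarise"),
    outputs=gr.Textbox(label="Summary"),
    title="Text Summariser",
)

demo.launch()  # share=True gives you a temporary public URL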
2. Using already-deployed open-source models
c) Amazon Bedrock
There are multiple open-source LLMs out there. At the time of writing this article, 3,000+ open-source LLMs are available.
How easy would it be to use any of them without deploying them ourselves?
To solve this problem, we have multiple options. One is Amazon Bedrock.
Amazon Bedrock provides LLMs like Titan from Amazon, Llama 3 from Meta, Claude from Anthropic, Mistral from Mistral AI, and many more.
We can simply call them through an API; Bedrock also provides sample code to get started.
As per their website:
“With Bedrock’s serverless experience, you can get started quickly, privately customize FMs with your own data, and easily integrate and deploy them into your applications using the AWS tools without having to manage any infrastructure.”
They also provide a three-hour hands-on lab to learn how to use these open-source LLMs.
Please note that the pricing is not covered by credits. So, even if you have AWS credits, using models from Amazon Bedrock will still cost you money.
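As a sketch, here is what a Bedrock call looks like with boto3's Converse API; it assumes your AWS credentials are configured and that you have requested access to the model in the Bedrock console, and the model ID and region below are illustrative:

```python
# Minimal Amazon Bedrock sketch using boto3's Converse API.
import boto3

# Assumes AWS credentials are already configured (env vars, ~/.aws, etc.)
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="meta.llama3-8b-instruct-v1:0",  # illustrative model ID
    messages=[
        {"role": "user",
         "content": [{"text": "Summarise LLM deployment options in one line."}]}
    ],
)

# The generated text sits inside the returned message structure.
print(response["output"]["message"]["content"][0]["text"])
```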
d) NVIDIA NIM
NVIDIA NIM is another platform that offers a broad catalogue of pre-deployed LLMs.
With NVIDIA NIM, we can access not only the latest models like Llama 3.1 but also models for visual design, speech, vision, etc.
We can also bring digital avatars to life using their models and integrate them into our applications.
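NIM endpoints are OpenAI-API-compatible, so a minimal sketch can reuse the `openai` client; the model name below is illustrative and the API key is a placeholder:

```python
# Minimal NVIDIA NIM sketch via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="nvapi-...",  # placeholder: your NVIDIA API key
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # illustrative model name
    messages=[{"role": "user", "content": "Hello from NIM!"}],
)
print(response.choices[0].message.content)
```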
e) Azure AI Services
Azure AI Services also has a catalogue of multiple LLMs. We can choose to deploy any model we want.
Like other cloud service providers, it has a playground where we can test a model. Once satisfied, we can create an endpoint and use the model in our application.
A good part of Azure AI Services is that models from OpenAI are also available there.
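As a sketch, once an endpoint exists it can be called with the `azure-ai-inference` package; the endpoint URL and key below are placeholders:

```python
# Minimal Azure AI model-endpoint sketch with azure-ai-inference.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.inference.ai.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                # placeholder
)

response = client.complete(messages=[UserMessage(content="Hello!")])
print(response.choices[0].message.content)
```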
f) Azure OpenAI Service
As the name suggests, the Azure OpenAI Service is a dedicated service for accessing OpenAI's models.
Why use the Azure OpenAI Service instead of calling the models directly through OpenAI's API, you ask?
Well, the Azure OpenAI Service adds an extra content-safety layer that helps you determine whether the response generated by the model is toxic, biased, violent, self-harm-related, etc.
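Here is a minimal sketch with the official `openai` package's Azure client; the endpoint, API version, and deployment name are placeholders from your own Azure OpenAI resource:

```python
# Minimal Azure OpenAI Service sketch using the openai package.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-key>",                                       # placeholder
    api_version="2024-02-01",                                   # illustrative
)

response = client.chat.completions.create(
    model="<your-deployment-name>",  # your deployment, not the raw model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```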
3. Using models already deployed on AI/LLM-specific cloud
g) Replicate
Using models deployed on a hyperscaler cloud platform is always an option. However, it can be expensive.
Performance can also suffer, as these cloud platforms were not designed specifically for LLMs.
So, AI/LLM-specific cloud platforms have emerged that claim to give better performance than traditional cloud providers.
Replicate is one of them. If you are not yet in AWS's (or any other hyperscaler's) ecosystem, Replicate is a good option to start with.
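As a sketch, a hosted model can be called with the `replicate` package; it assumes the REPLICATE_API_TOKEN environment variable is set, and the model slug below is illustrative:

```python
# Minimal Replicate sketch; assumes REPLICATE_API_TOKEN is set.
import replicate

output = replicate.run(
    "meta/meta-llama-3-8b-instruct",  # illustrative model slug
    input={"prompt": "Summarise LLM deployment options in one line."},
)

# Language models on Replicate stream their output as chunks of text.
print("".join(output))
```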
h) Deep Infra
Deep Infra is another option specifically designed for accessing pre-deployed Large Language Models.
You can access pre-deployed open-source models or deploy your customised models using their serverless GPUs.
In terms of pricing, Deep Infra offers both per-token and inference-execution-time pricing.
Deep Infra also offers LangChain integration for supported LLMs.
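Deep Infra also exposes an OpenAI-compatible endpoint, so a minimal sketch can again reuse the `openai` client; the model name is illustrative and the key is a placeholder:

```python
# Minimal Deep Infra sketch via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="<your-deepinfra-key>",  # placeholder
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```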
4. GPUs on rent
i) RunPod
We know that the computation power required to pre-train or fine-tune an LLM is huge. We would need to buy multiple GPUs to pre-train or fine-tune a Large Language Model.
So, what if we could rent GPUs instead? It would save us a lot of money.
Multiple platforms have now started providing GPUs on rent. RunPod is one of them.
RunPod can also be used to deploy your LLM. It is cheaper than the options above but requires more work to get going.
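Once you have a rented GPU pod, one common way to serve an open-source model is vLLM; here is a minimal sketch, assuming a CUDA GPU on the pod, vLLM installed, and an illustrative model name:

```python
# Minimal vLLM sketch for serving a model on a rented GPU.
from vllm import LLM, SamplingParams

# Downloads the model weights and loads them onto the GPU.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # illustrative
params = SamplingParams(max_tokens=128, temperature=0.7)

outputs = llm.generate(
    ["Summarise LLM deployment options in one line."], params
)
print(outputs[0].outputs[0].text)
```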
j) Vast AI
Vast AI is another platform that provides GPUs on rent. The company claims that you can reduce your computing cost by 3-5x with their cloud GPU rentals.
As per their website:
“Vast.ai is a market based cloud computing platform focused on reducing the costs and friction of compute-intensive workloads, enabling anyone to easily leverage large-scale GPU liquidity. Our software allows all compute providers large and small to easily monetize their spare capacity. Our search engine allows users to quickly find the best options and deals for compute services according to their specific requirements.”
k) Lambda Labs
Lambda Labs is another GPU rental provider. Lambda Labs is one of the first cloud providers to make NVIDIA H100 Tensor Core GPUs available on-demand in a public cloud.
Lambda Reserved Cloud is now available with the NVIDIA H200 Tensor Core GPU. The H200 is packed with 141GB of HBM3e running at 4.8TB/s; that's nearly double the GPU memory of the H100, with 1.4x faster bandwidth.
l) Paperspace
Another alternative for renting GPUs is Paperspace. You could explore all the options above and choose the one best suited to your needs.
Conclusion
LLMs offer one or more use cases for almost every industry and organization. Depending on our requirements and budget, we have multiple options available other than the cloud hyperscalers.
These options are easier, cheaper, and in some cases faster.
So, we are no longer dependent on the cloud hyperscalers to deploy our AI/LLM applications.
AI + LLM Course for Senior IT Professionals
In case you are looking to learn AI + LLM in very simple language, in a live online class with an instructor, check out the details here
Pricing for AI courses for senior IT professionals – https://www.aimletc.com/ai-ml-etc-course-offerings-pricing/
Check out these free videos to start your AI learning journey – https://www.aimletc.com/free-course-1-introduction-to-ml-and-ai/