Post History


Q&A What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute?


1 answer · posted 26d ago by Franck Dernoncourt‭ · last activity 23d ago by Franck Dernoncourt‭

Question google-cloud-platform

#2: Post edited by

Franck Dernoncourt‭ · 2025-05-20T18:55:22Z (24 days ago)


What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute? (Example of maximum hit rate: 1M input tokens/minute)
I don't use provisioned throughput.

---
I call Gemini as follows:
```python
from google import genai

YOUR_PROJECT_ID = 'redacted'
YOUR_LOCATION = 'us-central1'

# Use the Vertex AI backend rather than the Gemini Developer API.
client = genai.Client(
    vertexai=True, project=YOUR_PROJECT_ID, location=YOUR_LOCATION,
)
model = "gemini-2.5-pro-exp-03-25"
response = client.models.generate_content(
    model=model,
    contents=[
        "Tell me a joke about alligators"
    ],
)
print(response.text, end="")
```
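
Whatever the actual limit turns out to be, a client-side budget is one way to stay under it. Below is a minimal sketch, assuming a per-minute input-token limit (the 1M tokens/minute figure is only the example from the question, not a documented quota). `TokenBudget` and its parameters are hypothetical names for illustration and are not part of the `google-genai` library.

```python
import time
from collections import deque


class TokenBudget:
    """Sliding-window budget: at most `limit` tokens per `window` seconds.

    Hypothetical helper; the limit value is an assumption supplied by the caller.
    """

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock          # injectable so the class can be tested without sleeping
        self.events = deque()       # (timestamp, tokens) pairs still inside the window
        self.used = 0               # tokens spent inside the current window

    def _expire(self, now):
        # Drop entries older than the window and return their tokens to the budget.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_spend(self, tokens):
        """Record `tokens` and return True if they fit in the window, else False."""
        now = self.clock()
        self._expire(now)
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True


# Assumed limit, taken from the example in the question.
budget = TokenBudget(limit=1_000_000)
```

Before each `generate_content` call, the caller could check `budget.try_spend(estimated_input_tokens)` and wait briefly when it returns `False`. How the input tokens are estimated is left to the caller.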

#1: Initial revision by

Franck Dernoncourt‭ · 2025-05-17T23:54:04Z (26 days ago)


What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute?

What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute? (Example of maximum hit rate: 1M input tokens/minute)

I don't use provisioned throughput.

google-cloud-platform
