Comments on What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute?
Post
What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute?
What's the maximum hit rate (i.e. the rate limit), if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute? An example of such a limit would be 1M input tokens per minute.
I don't use provisioned throughput.
I call Gemini as follows:
from google import genai

YOUR_PROJECT_ID = 'redacted'
YOUR_LOCATION = 'us-central1'

client = genai.Client(
    vertexai=True, project=YOUR_PROJECT_ID, location=YOUR_LOCATION,
)
model = "gemini-2.5-pro-exp-03-25"
response = client.models.generate_content(
    model=model,
    contents=[
        "Tell me a joke about alligators"
    ],
)
print(response.text, end="")
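To make concrete the kind of limit I'm asking about, here is a minimal sketch of a client-side sliding-window budget for input tokens per minute. The class name `TokensPerMinuteBudget`, the 1M figure and the 60-second window are illustrative assumptions on my part, not documented quotas for any of these models, and the sketch does not call the Gemini API at all.

```python
import time
from collections import deque


class TokensPerMinuteBudget:
    """Sliding-window counter for a hypothetical tokens-per-minute limit."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit                  # assumed limit, e.g. 1_000_000
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so it can be tested
        self.events = deque()               # (timestamp, tokens) in the window
        self.used = 0                       # tokens currently counted

    def _expire(self, now):
        # Drop entries that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_spend(self, tokens):
        """Record the tokens and return True if they fit in the window."""
        now = self.clock()
        self._expire(now)
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True
```

With a budget like this, a caller would check `try_spend(n)` before sending a request with roughly `n` input tokens and wait when it returns `False`. What I'd like to know is the real value of `limit` that the service enforces for each model.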