Welcome to the Power Users community on Codidact!
Power Users is a Q&A site for questions about the usage of computer software and hardware.
Post History
#2: Post edited
- What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute? (Example of a maximum hit rate: 1M input tokens/minute.)
- I don't use provisioned throughput.
- ---
- I call Gemini as follows:
- ```python
- from google import genai
- YOUR_PROJECT_ID = 'redacted'
- YOUR_LOCATION = 'us-central1'
- # vertexai=True routes requests through Vertex AI for the given project and region.
- client = genai.Client(
-     vertexai=True, project=YOUR_PROJECT_ID, location=YOUR_LOCATION,
- )
- model = "gemini-2.5-pro-exp-03-25"
- response = client.models.generate_content(
-     model=model,
-     contents=[
-         "Tell me a joke about alligators"
-     ],
- )
- print(response.text, end="")
- ```
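To make the "1M input tokens/minute" example concrete, here is a minimal sketch of a client-side sliding-window budget that stays under a given tokens-per-minute ceiling. The `TokenRateLimiter` class and its methods are hypothetical helpers, not part of any Google SDK, and the 1M figure is only the example from the question, not a documented quota; the real limits depend on the project, model and region.

```python
import time
from collections import deque


class TokenRateLimiter:
    """Hypothetical client-side sliding-window budget for input tokens per minute."""

    def __init__(self, tokens_per_minute, window_seconds=60.0, clock=time.monotonic):
        self.budget = tokens_per_minute
        self.window = window_seconds
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.events = deque()       # (timestamp, tokens) of requests still inside the window
        self.used = 0               # tokens spent inside the current window

    def _expire(self, now):
        # Drop requests that have left the sliding window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def wait_time(self, tokens):
        """Seconds to wait before a request of `tokens` input tokens fits the budget."""
        if tokens > self.budget:
            raise ValueError("request is larger than the whole per-window budget")
        now = self.clock()
        self._expire(now)
        used = self.used
        if used + tokens <= self.budget:
            return 0.0
        # Walk the oldest requests until enough budget would be freed.
        for timestamp, spent in self.events:
            used -= spent
            if used + tokens <= self.budget:
                return max(0.0, timestamp + self.window - now)
        return 0.0  # not reached: expiring every request always frees enough budget

    def record(self, tokens):
        """Register a request that was just sent."""
        now = self.clock()
        self._expire(now)
        self.events.append((now, tokens))
        self.used += tokens


if __name__ == "__main__":
    limiter = TokenRateLimiter(tokens_per_minute=1_000_000)  # assumed example ceiling
    estimated_tokens = 50_000                                # assumed size of one request
    time.sleep(limiter.wait_time(estimated_tokens))
    # ... send the request here, e.g. client.models.generate_content(...) ...
    limiter.record(estimated_tokens)
```

A throttle like this only keeps the caller below a ceiling that is already known; it does not discover the limit. Without provisioned throughput the effective ceiling may also vary with shared capacity, so handling rate-limit errors from the service is still needed.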
#1: Initial revision
What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute?
What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute? (Example of a maximum hit rate: 1M input tokens/minute.) I don't use provisioned throughput.