
What is the maximum request rate (rate limit), if any, when using Claude, Gemini, Llama, and Mistral via Google Cloud Vertex AI?

+0
−5

What is the maximum request rate (rate limit), if any, when using Claude, Gemini, Llama, and Mistral via Google Cloud Vertex AI? (Example of a maximum rate: 1M input tokens/minute.)

I don't use provisioned throughput.


I call Gemini as follows:

from google import genai

YOUR_PROJECT_ID = "redacted"
YOUR_LOCATION = "us-central1"

# Create a client that routes requests through Vertex AI
# rather than the Gemini Developer API.
client = genai.Client(
    vertexai=True,
    project=YOUR_PROJECT_ID,
    location=YOUR_LOCATION,
)

model = "gemini-2.5-pro-exp-03-25"
response = client.models.generate_content(
    model=model,
    contents=["Tell me a joke about alligators"],
)
print(response.text, end="")
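
For context on how a limit would surface: when a per-minute quota is exceeded, Vertex AI rejects the request with an HTTP 429 (RESOURCE_EXHAUSTED) error. Below is a minimal retry sketch, assuming the google-genai SDK raises google.genai.errors.APIError with the HTTP status exposed as e.code; check the SDK docs for the exact error surface.

import time

from google import genai
from google.genai import errors

client = genai.Client(vertexai=True, project="redacted", location="us-central1")

def generate_with_backoff(prompt, max_retries=5):
    # Retry with exponential backoff when the per-minute quota is exhausted.
    for attempt in range(max_retries):
        try:
            return client.models.generate_content(
                model="gemini-2.5-pro-exp-03-25",
                contents=[prompt],
            )
        except errors.APIError as e:
            # Assumption: a quota rejection arrives as HTTP 429.
            if e.code == 429 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # back off: 1 s, 2 s, 4 s, ...
            else:
                raise

print(generate_with_backoff("Tell me a joke about alligators").text)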

3 comment threads

Not a good fit for a Q&A site (2 comments)
If you cross-post the same question on multiple sites, you should include links to all other versions... (10 comments)
What have you tried? (2 comments)

1 answer

+0
−0

Quotas from https://cloud.google.com/vertex-ai/docs/quotas:

Request type                                              Quota per minute
---------------------------------------------------------------------------
Resource management (CRUD) requests                       600
Job or long-running operation (LRO) submission requests   60
Online prediction requests                                30,000
Online prediction request throughput                      1.5 GB
Online explain requests                                   600
Vertex AI TensorBoard Time Series read requests           60,000
ML Metadata (CRUD) requests                               12,000
Generative AI Caching (CRUD) requests                     200
Vertex AI Vizier (CRUD) requests                          6,000
Vertex AI Feature Store online serving requests           300,000
Vertex ML Metadata requests                               12,000
Number of count tokens or compute tokens requests         3,000

Thanks to Googler m1nherz for pointing me to it.
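
Since these are per-minute quotas, a client that issues many calls needs to pace itself to stay under the limit. Here is a minimal client-side pacing sketch; MinuteRateLimiter is a hypothetical helper written for illustration, not part of any Google SDK.

import time
from collections import deque

class MinuteRateLimiter:
    # Hypothetical helper: blocks before a call that would exceed
    # max_per_minute requests within any sliding 60-second window.
    def __init__(self, max_per_minute):
        self.max_per_minute = max_per_minute
        self.calls = deque()

    def wait(self):
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) >= self.max_per_minute:
            # Sleep until the oldest call leaves the window.
            time.sleep(60 - (now - self.calls[0]))
        self.calls.append(time.monotonic())

# Example: stay safely below the 30,000 online prediction requests/minute quota.
limiter = MinuteRateLimiter(max_per_minute=25_000)
# Call limiter.wait() before each generate_content() request.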


0 comment threads
