Welcome to the Power Users community on Codidact!
Power Users is a Q&A site for questions about the usage of computer software and hardware. We are still a small site and would like to grow, so please consider joining our community. We are looking forward to your questions and answers; they are the building blocks of a repository of knowledge we are building together.
What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute?
What's the maximum hit rate, if any, when using Claude, Gemini, Llama and Mistral via Google Cloud Compute? (Example of a maximum hit rate: 1M input tokens/minute)
I don't use provisioned throughput.
I call Gemini as follows:
```python
from google import genai

YOUR_PROJECT_ID = "redacted"
YOUR_LOCATION = "us-central1"

client = genai.Client(
    vertexai=True,
    project=YOUR_PROJECT_ID,
    location=YOUR_LOCATION,
)

model = "gemini-2.5-pro-exp-03-25"
response = client.models.generate_content(
    model=model,
    contents=["Tell me a joke about alligators"],
)
print(response.text, end="")
```
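When a per-minute quota is exceeded, calls like the one above typically fail with a rate-limit error (an HTTP 429 / RESOURCE_EXHAUSTED). A common mitigation is exponential backoff; here is a minimal sketch, where `RateLimitError` and the delay schedule are my own placeholders, not the SDK's actual exception or retry API:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's quota-exceeded exception (often HTTP 429)."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
```

You would wrap the `generate_content` call in a small lambda or function and pass it to `call_with_backoff`; real SDKs often ship their own retry helpers, which are preferable when available.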
1 answer
Quota from https://cloud.google.com/vertex-ai/docs/quotas:
| Request type | Requests per minute |
|---|---|
| Resource management (CRUD) requests | 600 |
| Job or long-running operation (LRO) submission requests | 60 |
| Online prediction requests | 30,000 |
| Online prediction request throughput | 1.5 GB |
| Online explain requests | 600 |
| Vertex AI TensorBoard Time Series read requests | 60,000 |
| ML Metadata (CRUD) requests | 12,000 |
| Generative AI Caching (CRUD) requests | 200 |
| Vertex AI Vizier (CRUD) requests | 6,000 |
| Vertex AI Feature Store online serving requests | 300,000 |
| Vertex ML Metadata requests | 12,000 |
| Number of count tokens or compute tokens requests | 3,000 |
Thanks Googler m1nherz for pointing me to it.
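Since these are per-minute quotas, a client that bursts above a cap will see errors even if its average rate is acceptable. One way to stay under a limit such as the 30,000 online prediction requests per minute is a client-side sliding-window throttle; this class is an illustrative sketch of mine, not part of the `google-genai` SDK:

```python
import time

class MinuteRateLimiter:
    """Sliding-window throttle to stay under a requests-per-minute quota."""

    def __init__(self, max_per_minute):
        self.max_per_minute = max_per_minute
        self.timestamps = []  # monotonic times of recent requests

    def acquire(self):
        now = time.monotonic()
        # keep only the requests issued in the last 60 seconds
        self.timestamps = [t for t in self.timestamps if now - t < 60]
        if len(self.timestamps) >= self.max_per_minute:
            # sleep until the oldest request ages out of the window
            time.sleep(60 - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())

# Example: throttle to the published online-prediction quota.
limiter = MinuteRateLimiter(30000)
```

Calling `limiter.acquire()` before each `generate_content` request keeps the client within the window; note the actual model-specific token-per-minute limits may be lower than these project-level request quotas.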