I have deployed Llama 2 in the Vertex AI Model Garden and am able to use the API endpoint without issues.
But when I started engineering my prompts, something came up: I am providing a lot of context to the model, and its answer always starts with a repetition of the input. Since that repetition is so long, the majority of the actual answer is cut off.
So I wanted to increase the maximum number of tokens that the model generates.
Here is my code:
The rough idea is that the user asks a question. First I retrieve a piece of text from a vector store that should contain the answer, and then I want the model to summarize that text so that it answers the question.
from langchain.chains import LLMChain
from langchain.llms.vertexai import VertexAIModelGarden
from langchain.prompts import PromptTemplate

llm = VertexAIModelGarden(
    project=...,
    endpoint_id=...,
    location=...,
)
prompt_template = """<s>[INST] <<SYS>>You are a helpful, respectful and honest assistant. If you don't know the answer
to question don't share false information.
Use the context below to answer the provided questions:
Context: {context}
<</SYS>>
Question: {question}[/INST]
"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
chain = LLMChain(llm=llm, prompt=PROMPT)
# results is a list of Documents pulled from the vector store (see the sketch below)
inputs = [{"context": result.page_content, "question": question} for result in results]
final_result = chain.apply(inputs)
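For reference, here is a minimal sketch of where results could come from. This is purely illustrative: FAISS, VertexAIEmbeddings, my_texts, and k=3 are assumptions and not part of my actual setup; any vector store whose similarity_search returns a list of Documents would work the same way.

from langchain.embeddings import VertexAIEmbeddings
from langchain.vectorstores import FAISS

# my_texts is a hypothetical list of strings to index; replace with your own corpus.
vector_store = FAISS.from_texts(my_texts, VertexAIEmbeddings())

# Returns a list of Documents whose page_content should contain the answer.
results = vector_store.similarity_search(question, k=3)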
So fundamentally it boils down to whether I can pass VertexAIModelGarden a parameter like max_token_length or max_new_token.
Nothing in the official documentation stood out to me in that regard: https://api.python.langchain.com/en/latest/llms/langchain.llms.vertexai.VertexAIModelGarden.html?highlight=vertexai#langchain.llms.vertexai.VertexAIModelGarden
Thanks!
Not sure if I'm too late to answer, but I've run into this issue too and, after spending hours going through the source code myself, I hope this helps others. Basically, you need to use the allowed_model_args parameter to specify which arguments the endpoint accepts, and then pass them when you call the model. For example, something like this:
prompt = 'your prompt here'

llm = VertexAIModelGarden(
    project=...,
    endpoint_id=...,
    allowed_model_args=["temperature", "max_tokens"],  # only these kwargs are forwarded to the endpoint
)
llm(prompt, max_tokens=4000, temperature=0.0)
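To tie this back to the chain in the question, one option is to skip chain.apply, format the prompt yourself, and call the LLM directly so the extra kwargs actually reach the endpoint. This is only a sketch under the assumption that the deployed Llama 2 container accepts max_tokens; depending on the serving image it may instead expect a name like max_new_tokens or max_length, in which case you would list that name in allowed_model_args and pass it at call time.

llm = VertexAIModelGarden(
    project=...,
    endpoint_id=...,
    location=...,
    allowed_model_args=["temperature", "max_tokens"],  # whitelist of kwargs passed through
)

# PROMPT, results, and question are taken from the question above.
answers = []
for result in results:
    text = PROMPT.format(context=result.page_content, question=question)
    # Pass the generation limits per call; they are filtered by allowed_model_args.
    answers.append(llm(text, max_tokens=4000, temperature=0.0))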
Hope this helps!