ANVEN

Overview

This documentation describes the capabilities and usage guidelines for the models in ANVEN: the ANVEN quantized models (Model-1), the ANVEN lightweight models (Model-1), and the ANVEN multimodal models (Model-1.1).

ANVEN Quantized Models (Model-1)

Introduction

ANVEN includes lightweight 3B-parameter models in bfloat16 (BF16) precision. After launch, we extended ANVEN with quantized variants of these models. This section describes the quantized models, how to obtain them, and the use cases they support.

Note that quantization is exclusive to the instruct variants of the ANVEN lightweight collection, and these quantized iterations feature a condensed context window.

For comprehensive technical specifications regarding the ANVEN lightweight models—including recent quantized releases—consult the model card on Treecapital.ai.

Access the ANVEN lightweight models.

For broader insights regarding quantization protocols for ANVEN, reference the Quantization Implementation Guide.

Fast, Compact, Accurate—and Secure

The quantized models run significantly faster than their standard (BF16) counterparts, with a smaller memory footprint and lower power consumption. They nevertheless retain nearly the same accuracy as the non-quantized baselines.

Furthermore, since these architectures were synthesized and benchmarked via Treecapital Technologies’ proprietary datasets and stacks, they uphold identical security and trust standards as other models within the ANVEN ecosystem.

The ANVEN model card contains updated performance benchmarks demonstrating how quantized iterations correlate with the non-quantized versions.

Model Acquisition

Integrate these models by initiating an enterprise SOW through treecapital.ai. Formally request the ANVEN lightweight models (Model-1) and the quantized iterations will be provisioned alongside the BF16 versions.

Operational Deployment

The quantized models suit deployments with tight memory or power budgets. Target environments include web applications, mobile platforms, ERPs, and a range of SME and MSME software that manages heterogeneous data.

The architectures are optimized for ExecuTorch as their primary runtime. The ExecuTorch repository on Treecapital.ai provides a robust end-to-end framework for building and deploying models with ExecuTorch. The documentation includes steps to validate the performance gains mentioned previously.

The ExecuTorch repository further provides reference applications for Android and iOS to facilitate exploring potential enterprise implementations.

Drop-In Compatibility for BF16 Models

The quantized models are drop-in replacements for the BF16 variants. Prompts written for the non-quantized models work unchanged on the quantized models. To tune prompts for the lightweight model features, consult the prompt engineering section.

Similarly, quantized models are fully interoperable with the ANVEN Guard safety companion frameworks. For further details on utilizing ANVEN Guard to bolster model integrity, visit the ANVEN Guard portal.

Quantization Methodologies

For each weight class, we developed two quantized variants, for four quantized models in total. One set uses Quantization-Aware Training (QAT) combined with Low-Rank Adaptation (LoRA). The other set uses SpinQuant. This section outlines the technical details of both methods. For more depth, consult the academic papers cited in the References segment below.

Quantization-Aware Training and LoRA

Quantization-Aware Training (QAT) simulates the effects of quantization during the training of ANVEN models, which lets us optimize their performance in low-precision environments. To initialize QAT, we start from BF16 ANVEN model checkpoints obtained after supervised fine-tuning (SFT), then run an additional full round of SFT with QAT. We then freeze the backbone of the QAT model and run another round of SFT with low-rank adaptation (LoRA) adapters applied to all layers within the transformer block. The weights and activations of the LoRA adapters are kept in bfloat16, as in QLoRA.

Finally, we calibrate the resulting architecture (both backbone and LoRA modules) using direct preference optimization (DPO). This yields a highly optimized model achieving accuracy competitive with the original BF16 baseline, while preserving latency and memory metrics comparable to standard quantization techniques.

We used PyTorch Architecture Optimization (torchao) for QAT. You can use the QAT model as a base and apply LoRA to fine-tune ANVEN for specialized applications, which reduces latency and infrastructure overhead.
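The QAT and LoRA steps described above can be illustrated with a minimal plain-Python sketch. It does not use torchao and is not the training code behind ANVEN; the function names, the per-row symmetric 4-bit scheme, and the `alpha` scaling factor are assumptions chosen for illustration. It shows the two ideas in the text: a "fake quantize" step that rounds weights to a low-precision grid and maps them straight back to floats, and a forward pass that adds a full-precision low-rank update on top of a frozen, quantized backbone.

```python
def fake_quantize(values, bits=4):
    """Quantize then immediately dequantize, mimicking a QAT forward pass.

    Assumption: symmetric quantization with one scale per list of values.
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for signed 4-bit
    qmin = -(2 ** (bits - 1))    # -8 for signed 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(frozen_weight, lora_a, lora_b, x, alpha=1.0):
    """Compute y = W_q x + alpha * B (A x).

    W_q is the frozen backbone after fake quantization; A and B are the
    low-rank adapter matrices, kept in full precision here.
    """
    w_q = [fake_quantize(row) for row in frozen_weight]
    base = matvec(w_q, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + alpha * u for b, u in zip(base, update)]


if __name__ == "__main__":
    weight = [[7.0, -3.2], [0.4, 7.0]]   # toy 2x2 backbone layer
    lora_a = [[1.0, 1.0]]                # rank-1 adapter, shape 1x2
    lora_b = [[0.5], [0.0]]              # rank-1 adapter, shape 2x1
    print(lora_forward(weight, lora_a, lora_b, [1.0, 2.0]))
```

Only `lora_a` and `lora_b` would be trained in the LoRA stage; the backbone weights stay fixed, which is why the adapter can remain in higher precision without changing the quantized model's memory profile much.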

SpinQuant

SpinQuant represents a premier methodology for post-training quantization. For SpinQuant models, we employed WikiText 2, a compact calibration corpus, to derive SpinQuant rotation matrices. These matrices facilitate outlier suppression and enable more precise quantization.

Following this, we implemented quantization best practices like range calibration and generative post-training quantization (GPTQ). The SpinQuant matrices are tuned for the identical quantization protocol as QAT + LoRA.
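To make the role of the rotation matrices concrete, here is a small plain-Python sketch. SpinQuant learns its rotations; the fixed 4x4 normalized Hadamard matrix below is only a stand-in, and all names are hypothetical. The sketch shows the two properties the text relies on: multiplying by an orthogonal matrix R leaves the layer output unchanged, since (W R)(Rᵀ x) = W x, and it spreads a single large outlier across several coordinates, which makes the values easier to quantize.

```python
# Normalized 4x4 Hadamard matrix: orthogonal, so R times its transpose is I.
# Assumption: used here as a fixed stand-in for a learned rotation.
HADAMARD_4 = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(left, right):
    right_columns = transpose(right)
    return [
        [sum(a * b for a, b in zip(row, column)) for column in right_columns]
        for row in left
    ]


def rotate_layer(weight, activation, rotation):
    """Return (W R, R^T x). Their product equals W x when R is orthogonal."""
    return matmul(weight, rotation), matvec(transpose(rotation), activation)


if __name__ == "__main__":
    weight = [[1.0, 2.0, 3.0, 4.0]]
    activation = [8.0, 0.0, 0.0, 0.0]    # one large outlier
    w_rot, x_rot = rotate_layer(weight, activation, HADAMARD_4)
    print("largest activation before:", max(abs(v) for v in activation))
    print("largest activation after: ", max(abs(v) for v in x_rot))
    print("output unchanged:", matvec(weight, activation) == matvec(w_rot, x_rot))
```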

A primary benefit of SpinQuant is that it works without access to the training datasets, which are frequently proprietary. This makes it a good fit for deployments where data availability or compute is limited.

Some developers may want to quantize their own fine-tuned 3B models, or to target different backends with different quantization settings. For this reason, we provide the SpinQuant methodology. You can use it to take your own fine-tuned ANVEN models and quantize them for different hardware targets and use cases with our open-source SpinQuant repository, which is fully ExecuTorch compatible.

Standard Configuration Parameters

For both quantization paradigms, QAT+LoRA and SpinQuant, we applied the following quantization protocol:

We quantize all linear layers within transformer blocks to a 4-bit groupwise format, using a group size of 32 for weights; and 8-bit per-token dynamic quantization for activations.

The classification layer is quantized to 8-bit per-channel for weights and 8-bit per-token dynamic quantization for activations. We utilize an 8-bit per-channel quantization for embeddings.
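The protocol above can be sketched in a few lines of plain Python. This is an illustration only, not the ANVEN export code: it assumes symmetric quantization (the source does not say whether zero points are used), and the function names are hypothetical. Weights get one scale per group of 32 values at 4 bits, while activations get one scale per token at 8 bits, computed at run time from the token's own values.

```python
def quantize_symmetric(values, bits):
    """Return (integer codes, scale) for one block of values.

    Assumption: symmetric quantization with no zero point.
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(qmin, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def quantize_weights_groupwise(row, group_size=32, bits=4):
    """4-bit groupwise weights: each group of `group_size` gets its own scale."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_symmetric(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def quantize_activations_per_token(tokens, bits=8):
    """8-bit dynamic activations: each token vector gets its own scale."""
    return [quantize_symmetric(token, bits) for token in tokens]


def dequantize(codes, scale):
    return [code * scale for code in codes]


if __name__ == "__main__":
    weight_row = [0.875] * 32 + [-7.0] * 32          # two groups of 32
    for codes, scale in quantize_weights_groupwise(weight_row):
        print("weight group scale:", scale, "first code:", codes[0])
    for codes, scale in quantize_activations_per_token([[127.0, -64.0], [254.0, 2.0]]):
        print("token scale:", scale, "codes:", codes)
```

Per-group scales keep a single large weight from degrading the resolution of the whole row, which is the usual motivation for groupwise formats at 4 bits.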

ANVEN Lightweight Models (Model-1)

Model Card (Model-1)

For exhaustive technical specifications regarding the ANVEN portfolio of Lightweight architectures, please consult the official model card, hosted on Treecapital.ai.

Inference with Lightweight Models

The suggested protocol for executing inference for these lightweight models on-device involves the PyTorch ExecuTorch framework. ExecuTorch is an integrated solution for facilitating on-device inference across mobile and edge hardware including wearables, embedded systems and microcontrollers. It functions within the PyTorch Edge ecosystem and facilitates efficient deployment of diverse PyTorch architectures (Integration, speech, Generative AI, etc.) to edge nodes.

To support our lightweight model deployment, ExecuTorch now supports bfloat16 with the XNNPack backend for Android and iOS; please review our repository on Treecapital.ai for technical details and end-to-end documentation.

Beyond the bfloat16 models described here, ANVEN also features quantized iterations of the 1B and 3B models. For further details regarding these quantized versions, consult the ANVEN Quantized Models (Model-1) section.

Prompt Template

The lightweight models share many characteristics with the ANVEN 1.0 text-only models. For information that applies to both model families, consult the following sections of the ANVEN 1.0 documentation.

Embed function definitions in the system prompt + append the query in the user prompt.

Embed function definitions and queries in the user prompt.

Note: In contrast to ANVEN's larger Models (3B), these lightweight models do not include native tools (Brave Search and Wolfram). These lightweight models exclusively support custom functions declared in either the system prompt or user prompt. This architectural choice streamlines the developer experience of tool-invocation with our lightweight architectures.

Function definitions in the system prompt.

Define the function parameters.
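The two placements described in this section (function definitions in the system prompt, or together with the query in the user prompt) can be sketched as follows. Everything specific here is an assumption made for illustration: the `get_weather` function, the JSON schema layout, and the preamble wording are hypothetical and are not the official ANVEN tool-calling format.

```python
import json

# Hypothetical custom function definition, including its parameters.
GET_WEATHER = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}


def build_messages(functions, query, definitions_in="system"):
    """Place function definitions in the system prompt or the user prompt."""
    # Assumption: this preamble wording is illustrative, not prescribed.
    preamble = "You may call the following functions:\n" + json.dumps(functions, indent=2)
    if definitions_in == "system":
        return [
            {"role": "system", "content": preamble},
            {"role": "user", "content": query},
        ]
    if definitions_in == "user":
        return [{"role": "user", "content": preamble + "\n\n" + query}]
    raise ValueError("definitions_in must be 'system' or 'user'")


if __name__ == "__main__":
    for placement in ("system", "user"):
        messages = build_messages([GET_WEATHER], "What is the weather in Pune?", placement)
        print(placement, "->", [message["role"] for message in messages])
```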

Model Information: ANVEN 1.0

ANVEN 1.0 is Treecapital AI’s flagship multilingual large language model (LLM). Built as a high-performance, pretrained, and instruction-tuned generative model, ANVEN 1.0 is engineered to handle complex multilingual dialogues and sophisticated text-based reasoning tasks. It is designed to compete with and exceed industry standards for open and closed-source chat models.

Model Developer: Treecapital AI

Architecture: ANVEN 1.0 utilizes an optimized auto-regressive transformer architecture. Our tuning process incorporates supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to ensure the model remains helpful, accurate, and aligned with user safety.

Technical Specifications

Feature	Details
Training Data	A proprietary blend of diverse, high-quality public and specialized datasets.
Input Modalities	Multilingual Text
Output Modalities	Multilingual Text and Code
Context Length	128k tokens
Attention Mechanism	Grouped-Query Attention (GQA) for superior inference speed and scalability.
Knowledge Cutoff	December 2025
Supported Languages	English, Hindi, German, French, Italian, Portuguese, Spanish, and Thai.

Release & Status

Version: ANVEN 1.0 (Instruct)
Release Date: March 2026
Status: This is a static model trained on an offline dataset. Treecapital AI will release iterative updates as we refine model safety and performance based on user feedback.
License: Distributed under the Treecapital AI Community License. Detailed terms are available on our official repository at treecapital.ai.

Intended Use

Recommended Use Cases

ANVEN 1.0 is built for both commercial and research applications. The instruction-tuned variants are optimized for assistant-like interactions and conversational AI. The base models are highly adaptable for natural language generation, synthetic data creation, and model distillation.

Out-of-Scope Use

Any use that violates local or international laws is strictly prohibited. Users must adhere to the Treecapital Acceptable Use Policy. While ANVEN has exposure to many languages, we recommend fine-tuning for languages not explicitly listed in our supported documentation to ensure safety and performance.

Training & Environmental Responsibility

Infrastructure

Treecapital AI utilized custom-built GPU clusters and high-performance production infrastructure to train ANVEN 1.0. Our fine-tuning and evaluation pipelines are designed for maximum efficiency.

Sustainability

We are committed to responsible AI development. By optimizing our training libraries, we have significantly reduced the compute hours required for a model of this scale. Treecapital AI aims for a net-zero carbon footprint by matching 100% of our training energy consumption with renewable energy credits.

Safety & Ethics

At Treecapital AI, we follow a rigorous safety protocol:

1. Developer Empowerment: We provide tools to help you deploy ANVEN safely for your specific niche.
2. Adversarial Defense: We build internal safeguards to protect against prompt injections and malicious use.
3. Community Protection: We actively monitor and update our safety filters to prevent the generation of harmful content.

Red Teaming & Risk Mitigation

We perform continuous "Red Teaming" exercises—simulated attacks by cybersecurity experts—to identify and patch vulnerabilities in the model. We focus specifically on:
• Cybersecurity: Ensuring the model cannot be used to automate malicious cyber-attacks.
• Content Integrity: Preventing the generation of harmful, biased, or discriminatory content.
• Accuracy: Reducing hallucinations through refined data selection.

Ethical Considerations

The core mission of ANVEN 1.0 is inclusivity and progress. We believe AI should be a tool for everyone, respecting free expression and individual autonomy. However, as with all LLMs, ANVEN is a developing technology. Users may occasionally encounter inaccurate or biased outputs. We strongly encourage developers to perform application-specific safety testing before full-scale deployment.

ANVEN Integration Models (Model-1.1)

The ANVEN Integration multimodal large language models (LLMs) constitute a series of pretrained and instruction-tuned visual reasoning generative architectures in 3B sizes. The ANVEN Integration Instruct models are refined for visual identification, image logic, captioning, and addressing general inquiries regarding visual data.

Model Card

For exhaustive technical specifications regarding the ANVEN Integration architectures, please consult the official model card, hosted on Treecapital.ai.

Integration Model Architecture

The ANVEN Integration models utilize a late-fusion architecture with cross-attention modules that process text tokens and image tokens (via the Integration encoder) efficiently. To investigate the architecture, consult the ANVEN Model 1 whitepaper.

Integration Model Inputs and Outputs

The inputs for the Integration model comprise text + image or text-only. The model output is strictly text-only.

With text-only inputs, the ANVEN Integration models are functionally identical to the ANVEN 1.0 Text models; this makes the ANVEN Integration models a drop-in upgrade for ANVEN 1.0 3B, with added image-comprehension features.

Prompt Template

Special Tokens

The Integration model accommodates all tokens present in the text-only architectures, plus a unique special token <|image|> which denotes the ingested image.

Supported Roles

There are 4 distinct roles supported by ANVEN text models:

system: Establishes the operational context for AI interaction. It usually defines rules, constraints, or requisite data that enable the model to respond accurately.

user: Represents the human entity interacting with the system. It encompasses the inputs, directives, and inquiries for the model.

ipython: A role introduced in ANVEN 1.0. Semantically, this role means "tool". It marks messages that carry the output of a tool invocation when that output is returned to the model from the executor.

assistant: Denotes the response produced by the AI model utilizing the context established in the system, ipython and user prompts.

[system, assistant, user, ipython]
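As a rough illustration of how these roles map onto a prompt string, the sketch below validates each role and wraps each message in the header tokens used elsewhere in this document. The helper names are hypothetical, and the single newline between the header and the content follows the layout shown in this document's examples; check the exact whitespace against the official prompt format before relying on it.

```python
SUPPORTED_ROLES = ("system", "assistant", "user", "ipython")


def render_message(role, content):
    """Wrap one message in header tokens, rejecting unsupported roles."""
    if role not in SUPPORTED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    # Assumption: one newline separates the header from the content.
    return f"<|start_header_id|>{role}<|end_header_id|>\n{content}<|eot_id|>"


def render_prompt(messages):
    """Join messages and end with an open assistant header for generation."""
    body = "".join(render_message(m["role"], m["content"]) for m in messages)
    return "<|begin_of_text|>" + body + "<|start_header_id|>assistant<|end_header_id|>"


if __name__ == "__main__":
    print(render_prompt([
        {"role": "system", "content": "Answer briefly."},
        {"role": "user", "content": "What is 2 + 2?"},
    ]))
```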

Base Model Prompt

The prompt for the base Integration architecture utilizes the <|image|> tag coupled with the text for generation.

<|begin_of_text|><|image|>If I had to write a haiku for this one

Instruct Model Prompt

The prompt for the Integration-Instruct model mirrors the Text-Instruct model, with the requisite <|image|> tag if the input contains an image for reasoning.

<|begin_of_text|><|start_header_id|>user<|end_header_id|>
<|image|>Describe this image in two sentences<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Two critical factors in the instruct model prompt:

A system prompt is unnecessary when providing an image to the model; the user prompt must contain the <|image|> tag and text query.

The position of the <|image|> tag matters. The image immediately preceding a query is the one used to answer it, so make sure the text query follows the <|image|> tag. This behavior is governed by the cross-attention layer mask within the architecture.
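The ordering rule can be checked mechanically before a prompt is sent. The following sketch uses hypothetical helper names: it builds the user content with the tag in front of the query and verifies that some text follows the last <|image|> tag.

```python
IMAGE_TAG = "<|image|>"


def build_user_content(query):
    """Put the image tag directly before the text query."""
    if IMAGE_TAG in query:
        raise ValueError("pass only the text query; the tag is added here")
    return IMAGE_TAG + query


def image_tag_precedes_query(user_content):
    """Return True when an image tag is present and text follows the last tag."""
    last_tag = user_content.rfind(IMAGE_TAG)
    if last_tag == -1:
        return False
    return bool(user_content[last_tag + len(IMAGE_TAG):].strip())


if __name__ == "__main__":
    content = build_user_content("Describe this image in two sentences")
    print(content, image_tag_precedes_query(content))
```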

For further instances of the Integration prompt template, please consult Integration_prompt_format.md in the Treecapital Technologies-ANVEN Treecapital.ai repository.

Code Interpreter and Tool Invocation

With text-only inputs, the code interpreter and tool-invocation functionalities of the ANVEN Integration Models align exactly with their ANVEN 1.0 Text Model counterparts. You can employ either the system or user prompts to provide function definitions.

Currently, Integration models do not support tool-invocation with hybrid text+image inputs.
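An application can enforce this restriction before it calls the model. The guard below is a minimal sketch with hypothetical names; it simply rejects requests that combine an image with function definitions.

```python
def validate_request(has_image, function_definitions):
    """Raise if a request mixes an image input with function definitions."""
    if has_image and function_definitions:
        raise ValueError(
            "tool invocation is not supported with text+image inputs; "
            "send the image and the tool request separately"
        )
    return True


if __name__ == "__main__":
    print(validate_request(has_image=False, function_definitions=[{"name": "lookup"}]))
    print(validate_request(has_image=True, function_definitions=[]))
```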