7 changes: 7 additions & 0 deletions docs/en/infernex-bridge/index.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
weight: 96
---

# Alauda Build of InferNex Bridge

<Overview />
120 changes: 120 additions & 0 deletions docs/en/infernex-bridge/install.mdx
@@ -0,0 +1,120 @@
---
weight: 20
---

# Install InferNex Bridge

## Prerequisites

Before installing **Alauda Build of InferNex Bridge**, ensure the target cluster has the required platform and inference dependencies.

### Required Dependencies

| Dependency | Type | Description |
| ------------------------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Kubernetes cluster | Platform | A running cluster with administrator access. |
| KServe                          | Operator        | Required when using the KServe `LLMInferenceService` entry point. InferNex Bridge declares support for upstream KServe v0.17.0; Alauda Build of KServe v0.16 and later is also supported. |
| Envoy Gateway and Gateway API | Operator / CRDs | Required when exposing inference services through Gateway API resources. |
| Gateway API Inference Extension | CRDs | Required for `InferencePool` based intelligent routing. |
| Alauda Build of LeaderWorkerSet | Operator | Required by inference workloads that use LeaderWorkerSet. Install it separately before deploying those workloads. |
| Inference runtime prerequisites | Runtime | Prepare NPU nodes, model storage, runtime templates, runtime images, and network access required by the selected inference engine. |

:::info
`InferNexService` mode does not require users to install the InferNex main chart first. The operator installs the InferNex Bridge control plane; service templates, inference runtime images, model files, and feature-specific prerequisite CRDs must be prepared separately before deploying inference services.
:::

### CRDs Installed by This Operator

The Alauda Build of InferNex Bridge OLM bundle installs only the InferNex Bridge CRDs:

| CRD | Installed by this operator |
| --------------------------------------------- | -------------------------- |
| `infernexservices.infernex.infernex.io` | Yes |
| `infernexserviceconfigs.infernex.infernex.io` | Yes |

The following CRDs are not installed by this OLM bundle. Install them separately before enabling the corresponding features:

| CRD | When Required | How to Install |
| ------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------------------------------------------------------ |
| `leaderworkersets.leaderworkerset.x-k8s.io` | Workloads that use LeaderWorkerSet | Install [Alauda Build of LeaderWorkerSet](../../lws/install.mdx) separately. |
| `resourcescalinggroups.autoscaling.openfuyao.com` | PD-Orchestrator ResourceScalingGroup | Install the CRD from the matching openFuyao InferNex Bridge release or an equivalent platform package. |
| `elasticscalers.elasticscaler.io` | PD-Orchestrator Elastic-Scaler | Install the CRD from the matching openFuyao InferNex Bridge release or an equivalent platform package. |
| `tidals.tidal.io` | PD-Orchestrator Tidal | Install the CRD from the matching openFuyao InferNex Bridge release or an equivalent platform package. |
| `rolebasedgroups.workloads.x-k8s.io` | Workload grouping features that require RoleBasedGroup | Install the corresponding workload controller or platform package before enabling this feature. |
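
Before enabling a feature, it can help to confirm that its prerequisite CRDs are already present. The following is a minimal sketch, assuming `kubectl` is configured for the target cluster. The helper name `check_crds` is hypothetical and is not part of InferNex Bridge.

```bash
# Hypothetical helper: report which of the given CRDs exist in the cluster.
# Pass only the CRDs for the features you plan to enable, for example:
#   check_crds leaderworkersets.leaderworkerset.x-k8s.io tidals.tidal.io
check_crds() {
  local missing=0 crd
  for crd in "$@"; do
    if kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "present: $crd"
    else
      echo "missing: $crd"
      missing=1
    fi
  done
  # Non-zero exit status when at least one CRD is missing.
  return "$missing"
}
```

The function returns a non-zero status when any listed CRD is missing, so it can gate a deployment script.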

### Runtime Templates and Images

:::warning
The operator package does not install model-serving runtime images into the cluster registry. In the tested release, the InferNex Bridge runtime templates reference `hub.oepkgs.net/openfuyao/ascend/vllm-ascend:v0.18.0`, but this image is not bundled with the operator package and is not installed automatically.

Before deploying inference services, upload, import, or mirror the required runtime images, including `vllm-ascend:v0.18.0`, to the cluster registry or another registry accessible from the target cluster. If the registry address changes, update the runtime templates to use the image address accessible from the cluster.
:::
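
One possible way to mirror a runtime image is to pull, retag, and push it with the `docker` CLI. This is a sketch under stated assumptions: the helper names `target_image_ref` and `mirror_image` and the registry `registry.example.internal` are hypothetical, and the workstation needs pull access to the source registry and push access to the target registry. Any equivalent image copy tool can be used instead.

```bash
# Hypothetical helper: swap the registry host (everything before the
# first "/") for the target registry, keeping the repository path and tag.
target_image_ref() {
  local src="$1" target_registry="$2"
  echo "${target_registry}/${src#*/}"
}

# Hypothetical helper: pull, retag, and push a single image.
# Example (registry.example.internal is a placeholder):
#   mirror_image hub.oepkgs.net/openfuyao/ascend/vllm-ascend:v0.18.0 registry.example.internal
mirror_image() {
  local src="$1" dst
  dst="$(target_image_ref "$src" "$2")"
  docker pull "$src" && docker tag "$src" "$dst" && docker push "$dst"
}
```

Note that `docker pull` fetches only the image variant for the platform of the machine that runs it, so run the copy from a host whose architecture matches the NPU nodes, or select the platform explicitly.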

:::info
The Alauda OLM bundle registers the InferNex Bridge admission webhook for the KServe `LLMInferenceService` API versions used by the release examples, including `serving.kserve.io/v1alpha2`. The webhook is used for admission-time compatibility patches when `infernex.io/runtime: "true"` is set on a KServe `LLMInferenceService`; it does not create or reconcile the `LLMInferenceService` resource itself.
:::

### Optional Dependencies

| Dependency | Required For | Description |
| --------------------- | ------------ | ------------------------------------------------------------------ |
| NATS | Eagle-Eye | Required when enabling Eagle-Eye hardware monitoring or diagnosis. |
| kube-prometheus-stack | Eagle-Eye | Required when enabling Eagle-Eye hardware monitoring or diagnosis. |

## Upload Operator \{#upload-operator}

Download the Alauda Build of InferNex Bridge Operator installation file, for example `infernex-bridge.alpha.ALL.xxxx.tgz`.

Use the `violet` command to publish it to the platform repository:

```bash
violet push \
  --platform-address=<platform-access-address> \
  --platform-username=<platform-admin> \
  --platform-password=<platform-admin-password> \
  infernex-bridge.alpha.ALL.xxxx.tgz
```

## Install Operator

In **Administrator** view:

1. Click **Marketplace / OperatorHub**.
2. At the top of the console, from the **Cluster** dropdown list, select the destination cluster where you want to install the InferNex Bridge Operator.
3. Search for and select **Alauda Build of InferNex Bridge**, then click **Install**.
4. Leave **Channel** unchanged.
5. Check whether the **Version** matches the InferNex Bridge version you want to install.
6. Leave **Installation Location** unchanged; the default is `infernex-system`.
7. Select **Manual** for **Upgrade Strategy**.
8. Click **Install**.

### Verification

Confirm that the **Alauda Build of InferNex Bridge** tile shows one of the following states:

- `Installing`: installation is in progress; wait for this to change to `Installed`.
- `Installed`: installation is complete.

Verify that the operator controller and webhooks are running:

```bash
kubectl get pods -n infernex-system
kubectl get mutatingwebhookconfiguration,validatingwebhookconfiguration | grep infernex
kubectl get crd infernexservices.infernex.infernex.io infernexserviceconfigs.infernex.infernex.io
```

The controller pod should be `Running`, and both `InferNexService` and `InferNexServiceConfig` CRDs should exist.
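
For scripted installs, the same checks can block until the operator is ready instead of being polled by hand. The sketch below assumes the default `infernex-system` namespace and that every pod in it is a long-running pod. A completed Job pod in the namespace would never report `Ready`, so the wait would time out. The function name `wait_for_infernex_bridge` is hypothetical.

```bash
# Hypothetical helper: wait for the operator pods and both CRDs to become
# ready. Arguments: namespace (default infernex-system), timeout (default 300s).
wait_for_infernex_bridge() {
  local ns="${1:-infernex-system}" timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout="$timeout" &&
  kubectl wait --for=condition=Established --timeout="$timeout" \
    crd/infernexservices.infernex.infernex.io \
    crd/infernexserviceconfigs.infernex.infernex.io
}
```

Calling `wait_for_infernex_bridge` with no arguments uses the defaults; a non-zero exit status means a pod or CRD did not become ready within the timeout.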

## Community Examples

For community-maintained examples, see [InferNex Bridge examples](https://gitcode.com/openFuyao/InferNex/tree/release-26.6.0-rc.2/component/InferNex-Bridge/config/examples).

## Upgrading Alauda Build of InferNex Bridge

1. Upload the new version of the **Alauda Build of InferNex Bridge** operator package using the `violet` tool.
2. In **Administrator** view, go to **Marketplace / OperatorHub**, find **Alauda Build of InferNex Bridge**, and click **Confirm** to apply the new version.

### Verification

After upgrading, confirm that the **Alauda Build of InferNex Bridge** tile shows `Installed` and verify the controller and CRD status:

```bash
kubectl get pods -n infernex-system
kubectl get crd infernexservices.infernex.infernex.io infernexserviceconfigs.infernex.infernex.io
```
55 changes: 55 additions & 0 deletions docs/en/infernex-bridge/intro.mdx
@@ -0,0 +1,55 @@
---
weight: 10
---

# Introduction

## InferNex Bridge

**Alauda Build of InferNex Bridge** is based on the [openFuyao InferNex](https://gitcode.com/openFuyao/InferNex) project.
InferNex Bridge connects KServe `LLMInferenceService` workloads with the InferNex inference acceleration stack, and also provides native `InferNexService` APIs for environments that do not use KServe.

The operator installs the InferNex Bridge controller, admission webhooks, RBAC, and the following custom resources:

- **InferNexService**: A managed LLM inference service that can deploy inference engines, Hermes Router, Mooncake KV cache, cache-indexer, PD-Orchestrator, Eagle-Eye, and related resources.
- **InferNexServiceConfig**: A reusable configuration template referenced by `InferNexService` through `spec.baseRefs`.

## Deployment Modes

InferNex Bridge supports two deployment entry points. Choose one entry point for each inference service and do not deploy the same service through both paths.

InferNex Bridge currently supports NPU inference workloads only.

### KServe LLMInferenceService

Use this mode when KServe is already installed and you want to keep the KServe `LLMInferenceService` workflow.

Add the `infernex.io/runtime: "true"` label to an `LLMInferenceService`. KServe continues to reconcile the inference engine, Hermes Router, Gateway, `HTTPRoute`, and `InferencePool`. InferNex Bridge reconciles the InferNex enhancement components, such as Mooncake KV cache, cache-indexer, PD-Orchestrator, and Eagle-Eye, and applies the KServe runtime compatibility patches.
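
As an illustration, the label can be added to an existing resource with `kubectl label`. This is a sketch, assuming the KServe CRD is served under the plural name `llminferenceservices.serving.kserve.io`. The function name and the `my-llm` and `my-ns` values are placeholders. Because the compatibility patches are applied at admission time, setting the label in the manifest before the resource is first created may be the more predictable route.

```bash
# Hypothetical helper: opt an existing LLMInferenceService in to InferNex.
# Example (placeholder names): enable_infernex_runtime my-llm my-ns
enable_infernex_runtime() {
  local name="$1" ns="$2"
  kubectl label llminferenceservices.serving.kserve.io "$name" -n "$ns" \
    infernex.io/runtime=true --overwrite
}
```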

### InferNexService

Use this mode when you want InferNex Bridge to manage the full inference service without using KServe as the entry point.

Create an `InferNexService` that references one or more `InferNexServiceConfig` templates. InferNex Bridge reconciles the inference engine, Hermes Router, enhancement components, and, when intelligent gateway routing is enabled, Gateway API resources.
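
The exact shape of `spec.baseRefs` depends on the installed CRD version, so this page does not reproduce it. One way to inspect it is to ask the cluster directly. The sketch below assumes the operator is already installed; the function name `inspect_base_refs` is hypothetical.

```bash
# Hypothetical helper: print the schema the installed CRD publishes for
# spec.baseRefs, then list the InferNexServiceConfig templates that an
# InferNexService in the given namespace could reference.
inspect_base_refs() {
  local ns="$1"
  kubectl explain infernexservices.spec.baseRefs &&
  kubectl get infernexserviceconfigs.infernex.infernex.io -n "$ns"
}
```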

## Capabilities

- **KServe compatibility**: Use the existing KServe `LLMInferenceService` workflow and opt in to InferNex acceleration with the `infernex.io/runtime: "true"` label.
- **Native InferNex APIs**: Deploy inference services directly with `InferNexService` and reusable `InferNexServiceConfig` templates.
- **Prefill-decode disaggregation**: Run P/D inference patterns with proxy-server coordination for prefill and decode workloads.
- **Mooncake KV cache**: Deploy Mooncake KV cache and cache-indexer components for KV cache reuse and coordination.
- **Intelligent gateway routing**: Integrate Hermes Router and Gateway API resources for model-aware request routing.
- **Elastic orchestration**: Use PD-Orchestrator components such as Elastic-Scaler, Tidal, and ResourceScalingGroup when the inference engine replica fields are left for the scaler to manage.
- **Hardware observability**: Integrate Eagle-Eye hardware monitor and diagnosis components when the required observability dependencies are installed.

For installation on the platform, see [Install InferNex Bridge](./install).

## Documentation

InferNex Bridge upstream documentation and key dependencies:

- **InferNex Bridge User Guide**: [https://gitcode.com/openFuyao/sig-ai-inference/blob/main/docs/zh/ai_inference_infernex/user_guide/ai_inference_infernex_bridge.md](https://gitcode.com/openFuyao/sig-ai-inference/blob/main/docs/zh/ai_inference_infernex/user_guide/ai_inference_infernex_bridge.md) — Upstream user guide covering deployment modes, prerequisites, and usage examples.
- **InferNex Source**: [https://gitcode.com/openFuyao/InferNex](https://gitcode.com/openFuyao/InferNex) — Source code, charts, examples, and release tags.
- **InferNex Bridge Technical Specification**: [https://gitcode.com/openFuyao/InferNex/blob/master/component/InferNex-Bridge/docs/InferNex-Bridge-Technical-Specification.md](https://gitcode.com/openFuyao/InferNex/blob/master/component/InferNex-Bridge/docs/InferNex-Bridge-Technical-Specification.md) — Architecture, ownership boundaries, webhook behavior, and routing contracts.
- **KServe Documentation**: [https://kserve.github.io/website/](https://kserve.github.io/website/) — KServe concepts and `LLMInferenceService` documentation.
- **Gateway API Inference Extension**: [https://gateway-api-inference-extension.sigs.k8s.io/](https://gateway-api-inference-extension.sigs.k8s.io/) — Inference-aware Gateway API resources used by model routing.