diff --git a/docs/en/infernex-bridge/index.mdx b/docs/en/infernex-bridge/index.mdx
new file mode 100644
index 0000000..c4d145e
--- /dev/null
+++ b/docs/en/infernex-bridge/index.mdx
@@ -0,0 +1,7 @@
+---
+weight: 96
+---
+
+# Alauda Build of InferNex Bridge
+
+
diff --git a/docs/en/infernex-bridge/install.mdx b/docs/en/infernex-bridge/install.mdx
new file mode 100644
index 0000000..0cc40d2
--- /dev/null
+++ b/docs/en/infernex-bridge/install.mdx
@@ -0,0 +1,120 @@
+---
+weight: 20
+---
+
+# Install InferNex Bridge
+
+## Prerequisites
+
+Before installing **Alauda Build of InferNex Bridge**, ensure the target cluster has the required platform and inference dependencies.
+
+### Required Dependencies
+
+| Dependency | Type | Description |
+| ------------------------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| Kubernetes cluster | Platform | A running cluster with administrator access. |
+| KServe | Operator | Required when using the KServe `LLMInferenceService` entry point. InferNex Bridge declares support for upstream KServe v0.17.0. Alauda Build of KServe v0.16 and later are also supported. |
+| Envoy Gateway and Gateway API | Operator / CRDs | Required when exposing inference services through Gateway API resources. |
+| Gateway API Inference Extension | CRDs | Required for `InferencePool` based intelligent routing. |
+| Alauda Build of LeaderWorkerSet | Operator | Required by inference workloads that use LeaderWorkerSet. Install it separately before deploying those workloads. |
+| Inference runtime prerequisites | Runtime | Prepare NPU nodes, model storage, runtime templates, runtime images, and network access required by the selected inference engine. |
+
+:::info
+`InferNexService` mode does not require users to install the InferNex main chart first. The operator installs the InferNex Bridge control plane; service templates, inference runtime images, model files, and feature-specific prerequisite CRDs must be prepared separately before deploying inference services.
+:::
+
+### CRDs Installed by This Operator
+
+The Alauda Build of InferNex Bridge OLM bundle installs only the InferNex Bridge CRDs:
+
+| CRD | Installed by this operator |
+| --------------------------------------------- | -------------------------- |
+| `infernexservices.infernex.infernex.io` | Yes |
+| `infernexserviceconfigs.infernex.infernex.io` | Yes |
+
+The following CRDs are not installed by this OLM bundle. Install them separately before enabling the corresponding features:
+
+| CRD | When Required | How to Install |
+| ------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------------------------------------------------------ |
+| `leaderworkersets.leaderworkerset.x-k8s.io` | Workloads that use LeaderWorkerSet | Install [Alauda Build of LeaderWorkerSet](../../lws/install.mdx) separately. |
+| `resourcescalinggroups.autoscaling.openfuyao.com` | PD-Orchestrator ResourceScalingGroup | Install the CRD from the matching openFuyao InferNex Bridge release or an equivalent platform package. |
+| `elasticscalers.elasticscaler.io` | PD-Orchestrator Elastic-Scaler | Install the CRD from the matching openFuyao InferNex Bridge release or an equivalent platform package. |
+| `tidals.tidal.io` | PD-Orchestrator Tidal | Install the CRD from the matching openFuyao InferNex Bridge release or an equivalent platform package. |
+| `rolebasedgroups.workloads.x-k8s.io` | Workload grouping features that require RoleBasedGroup | Install the corresponding workload controller or platform package before enabling this feature. |
+
+### Runtime Templates and Images
+
+:::warning
+The operator package does not install model-serving runtime images into the cluster registry. In the tested release, the InferNex Bridge runtime templates reference `hub.oepkgs.net/openfuyao/ascend/vllm-ascend:v0.18.0`, but this image is not bundled with the operator package and is not installed automatically.
+
+Before deploying inference services, upload, import, or mirror the required runtime images, including `vllm-ascend:v0.18.0`, to the cluster registry or another registry accessible from the target cluster. If the registry address changes, update the runtime templates to use the image address accessible from the cluster.
+:::
+
+:::info
+The Alauda OLM bundle registers the InferNex Bridge admission webhook for the KServe `LLMInferenceService` API versions used by the release examples, including `serving.kserve.io/v1alpha2`. The webhook is used for admission-time compatibility patches when `infernex.io/runtime: "true"` is set on a KServe `LLMInferenceService`; it does not create or reconcile the `LLMInferenceService` resource itself.
+:::
+
+### Optional Dependencies
+
+| Dependency | Required For | Description |
+| --------------------- | ------------ | ------------------------------------------------------------------ |
+| NATS | Eagle-Eye | Required when enabling Eagle-Eye hardware monitoring or diagnosis. |
+| kube-prometheus-stack | Eagle-Eye | Required when enabling Eagle-Eye hardware monitoring or diagnosis. |
+
+## Upload Operator \{#upload-operator}
+
+Download the Alauda Build of InferNex Bridge Operator installation file, for example `infernex-bridge.alpha.ALL.xxxx.tgz`.
+
+Use the `violet` command to publish it to the platform repository:
+
+```bash
+violet push --platform-address= --platform-username= --platform-password= infernex-bridge.alpha.ALL.xxxx.tgz
+```
+
+## Install Operator
+
+In **Administrator** view:
+
+1. Click **Marketplace / OperatorHub**.
+2. At the top of the console, from the **Cluster** dropdown list, select the destination cluster where you want to install the InferNex Bridge Operator.
+3. Search for and select **Alauda Build of InferNex Bridge**, then click **Install**.
+4. Leave **Channel** unchanged.
+5. Check whether the **Version** matches the InferNex Bridge version you want to install.
+6. Leave **Installation Location** unchanged; it should be `infernex-system` by default.
+7. Select **Manual** for **Upgrade Strategy**.
+8. Click **Install**.
+
+### Verification
+
+Confirm that the **Alauda Build of InferNex Bridge** tile shows one of the following states:
+
+- `Installing`: installation is in progress; wait for this to change to `Installed`.
+- `Installed`: installation is complete.
+
+Verify that the operator controller and webhooks are running:
+
+```bash
+kubectl get pods -n infernex-system
+kubectl get mutatingwebhookconfiguration,validatingwebhookconfiguration | grep infernex
+kubectl get crd infernexservices.infernex.infernex.io infernexserviceconfigs.infernex.infernex.io
+```
+
+The controller pod should be `Running`, and both `InferNexService` and `InferNexServiceConfig` CRDs should exist.
+
+## Community Examples
+
+For community-maintained examples, see [InferNex Bridge examples](https://gitcode.com/openFuyao/InferNex/tree/release-26.6.0-rc.2/component/InferNex-Bridge/config/examples).
+
+## Upgrading Alauda Build of InferNex Bridge
+
+1. Upload the new version of the **Alauda Build of InferNex Bridge** operator package using the `violet` tool.
+2. Go to the `Administrator` -> `Marketplace` -> `OperatorHub` page, find **Alauda Build of InferNex Bridge**, and click **Confirm** to apply the new version.
+
+### Verification
+
+After upgrading, confirm that the **Alauda Build of InferNex Bridge** tile shows `Installed` and verify the controller and CRD status:
+
+```bash
+kubectl get pods -n infernex-system
+kubectl get crd infernexservices.infernex.infernex.io infernexserviceconfigs.infernex.infernex.io
+```
diff --git a/docs/en/infernex-bridge/intro.mdx b/docs/en/infernex-bridge/intro.mdx
new file mode 100644
index 0000000..814980a
--- /dev/null
+++ b/docs/en/infernex-bridge/intro.mdx
@@ -0,0 +1,55 @@
+---
+weight: 10
+---
+
+# Introduction
+
+## InferNex Bridge
+
+**Alauda Build of InferNex Bridge** is based on the [openFuyao InferNex](https://gitcode.com/openFuyao/InferNex) project.
+InferNex Bridge connects KServe `LLMInferenceService` workloads with the InferNex inference acceleration stack, and also provides native `InferNexService` APIs for environments that do not use KServe.
+
+The operator installs the InferNex Bridge controller, admission webhooks, RBAC, and the following custom resources:
+
+- **InferNexService**: A managed LLM inference service that can deploy inference engines, Hermes Router, Mooncake KV cache, cache-indexer, PD-Orchestrator, Eagle-Eye, and related resources.
+- **InferNexServiceConfig**: A reusable configuration template referenced by `InferNexService` through `spec.baseRefs`.
+
+## Deployment Modes
+
+InferNex Bridge supports two deployment entry points. Choose one entry point for each inference service and do not deploy the same service through both paths.
+
+InferNex Bridge currently supports NPU inference workloads only.
+
+### KServe LLMInferenceService
+
+Use this mode when KServe is already installed and you want to keep the KServe `LLMInferenceService` workflow.
+
+Add the `infernex.io/runtime: "true"` label to an `LLMInferenceService`. KServe continues to reconcile the inference engine, Hermes Router, Gateway, `HTTPRoute`, and `InferencePool`; InferNex Bridge reconciles the InferNex enhancement components such as Mooncake KV cache, cache-indexer, PD-Orchestrator, Eagle-Eye, and KServe runtime compatibility patches.
+
+### InferNexService
+
+Use this mode when you want InferNex Bridge to manage the full inference service without using KServe as the entry point.
+
+Create an `InferNexService` that references one or more `InferNexServiceConfig` templates. InferNex Bridge reconciles the inference engine, Hermes Router, enhancement components, and, when intelligent gateway routing is enabled, Gateway API resources.
+
+## Capabilities
+
+- **KServe compatibility**: Use the existing KServe `LLMInferenceService` workflow and opt in to InferNex acceleration with the `infernex.io/runtime: "true"` label.
+- **Native InferNex APIs**: Deploy inference services directly with `InferNexService` and reusable `InferNexServiceConfig` templates.
+- **Prefill-decode disaggregation**: Run P/D inference patterns with proxy-server coordination for prefill and decode workloads.
+- **Mooncake KV cache**: Deploy Mooncake KV cache and cache-indexer components for KV cache reuse and coordination.
+- **Intelligent gateway routing**: Integrate Hermes Router and Gateway API resources for model-aware request routing.
+- **Elastic orchestration**: Use PD-Orchestrator components such as Elastic-Scaler, Tidal, and ResourceScalingGroup when the inference engine replica fields are left for the scaler to manage.
+- **Hardware observability**: Integrate Eagle-Eye hardware monitor and diagnosis components when the required observability dependencies are installed.
+
+For installation on the platform, see [Install InferNex Bridge](./install).
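+
+The label-based opt-in for the KServe entry point can be sketched as a metadata fragment. This is a minimal illustration, not a complete manifest: the resource name and namespace are hypothetical, the `spec` is deliberately left out, and the `apiVersion` is the one named in the install guide, so adjust it to the version served by your KServe release.
+
+```yaml
+# Sketch only: metadata for a KServe LLMInferenceService that opts in to InferNex Bridge.
+apiVersion: serving.kserve.io/v1alpha2   # API version mentioned in the install guide; may differ in your cluster
+kind: LLMInferenceService
+metadata:
+  name: example-llm                      # hypothetical name
+  namespace: example-namespace           # hypothetical namespace
+  labels:
+    infernex.io/runtime: "true"          # opt-in label; quoted so YAML treats it as a string
+# spec: keep the spec of your existing LLMInferenceService here (omitted in this sketch)
+```
+
+Only the `infernex.io/runtime: "true"` label is specific to InferNex Bridge. The install guide notes that the admission webhook applies its compatibility patches when this label is set and does not create or reconcile the `LLMInferenceService` itself.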
+
+## Documentation
+
+InferNex Bridge upstream documentation and key dependencies:
+
+- **InferNex Bridge User Guide**: [https://gitcode.com/openFuyao/sig-ai-inference/blob/main/docs/zh/ai_inference_infernex/user_guide/ai_inference_infernex_bridge.md](https://gitcode.com/openFuyao/sig-ai-inference/blob/main/docs/zh/ai_inference_infernex/user_guide/ai_inference_infernex_bridge.md) — Upstream user guide covering deployment modes, prerequisites, and usage examples.
+- **InferNex Source**: [https://gitcode.com/openFuyao/InferNex](https://gitcode.com/openFuyao/InferNex) — Source code, charts, examples, and release tags.
+- **InferNex Bridge Technical Specification**: [https://gitcode.com/openFuyao/InferNex/blob/master/component/InferNex-Bridge/docs/InferNex-Bridge-Technical-Specification.md](https://gitcode.com/openFuyao/InferNex/blob/master/component/InferNex-Bridge/docs/InferNex-Bridge-Technical-Specification.md) — Architecture, ownership boundaries, webhook behavior, and routing contracts.
+- **KServe Documentation**: [https://kserve.github.io/website/](https://kserve.github.io/website/) — KServe concepts and `LLMInferenceService` documentation.
+- **Gateway API Inference Extension**: [https://gateway-api-inference-extension.sigs.k8s.io/](https://gateway-api-inference-extension.sigs.k8s.io/) — Inference-aware Gateway API resources used by model routing.