Add support for Cluster, Custom Terms and Models in Bulk API#1969
Open
khansaad wants to merge 3 commits into
Open
Add support for Cluster, Custom Terms and Models in Bulk API#1969khansaad wants to merge 3 commits into
khansaad wants to merge 3 commits into
Conversation
Contributor
Reviewer's GuideAdds optional cluster_name, model_settings, and term_settings to the Bulk API and Bulk job pipeline, including validation, passthrough into experiment creation, datasource cluster association, and new recommendation notifications, while keeping backward compatibility and extending docs and tests. Sequence diagram for Bulk API cluster and recommendation settings passthroughsequenceDiagram
actor Optimizer
participant BulkAPI as BulkJobManager
participant Validation as BulkServiceValidation
participant DSc as DataSourceConfig
participant ExpReq as CreateExperimentAPIObject
Optimizer->>BulkAPI: submitBulkJob(BulkInput)
BulkAPI->>Validation: validate(payload, jobID)
Validation-->>BulkAPI: ValidationOutputData
alt [validation passes]
BulkAPI->>BulkAPI: prepareCreateExperimentJSONInput(dsc, namespace)
BulkAPI->>DSc: getDataSourceClusterName()
DSc-->>BulkAPI: clusterNameFromMetadata
alt [bulkInput.getCluster_name() provided and non-empty]
BulkAPI->>ExpReq: setClusterName(cluster_name from BulkInput)
else [no cluster_name in BulkInput]
BulkAPI->>ExpReq: setClusterName(clusterNameFromMetadata)
end
BulkAPI->>ExpReq: setRecommendationSettings(RecommendationSettings)
alt [bulkInput.getModel_settings() not null]
BulkAPI->>ExpReq: RecommendationSettings.setModelSettings(model_settings)
end
alt [bulkInput.getTerm_settings() not null]
BulkAPI->>ExpReq: RecommendationSettings.setTermSettings(term_settings)
end
else [validation fails]
BulkAPI-->>Optimizer: error response
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've found 5 issues, and left some high level feedback:
- When parsing and storing comma-separated cluster names in
KruizeDataSourceEntry.getClusterList()and related code paths, consider trimming each token aftersplit(",")so that configurations like"default, stage"don’t leave leading spaces in cluster identifiers. - There is a potential naming mismatch between the design (
datasources[].cluster) and the implementation (DataSourceMetadataInfoJSONKeys.CLUSTERSandclustersfield onDataSourceInfo/DB); please ensure the JSON key used in datasource configs is consistent and that the code reads the same key to avoid silent misconfiguration. - In
DataSourceInfoAdapter, you serialize the authentication field asauthenticationConfig, while the public design docs referenceauthentication; consider renaming this JSON property (or adding a backwards-compatible alias) so that the ListDatasources API aligns with the documented schema.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- When parsing and storing comma-separated cluster names in `KruizeDataSourceEntry.getClusterList()` and related code paths, consider trimming each token after `split(",")` so that configurations like `"default, stage"` don’t leave leading spaces in cluster identifiers.
- There is a potential naming mismatch between the design (`datasources[].cluster`) and the implementation (`DataSourceMetadataInfoJSONKeys.CLUSTERS` and `clusters` field on `DataSourceInfo`/DB); please ensure the JSON key used in datasource configs is consistent and that the code reads the same key to avoid silent misconfiguration.
- In `DataSourceInfoAdapter`, you serialize the authentication field as `authenticationConfig`, while the public design docs reference `authentication`; consider renaming this JSON property (or adding a backwards-compatible alias) so that the ListDatasources API aligns with the documented schema.
## Individual Comments
### Comment 1
<location path="src/main/java/com/autotune/common/bulk/BulkServiceValidation.java" line_range="51-60" />
<code_context>
+
+ // Kubernetes DNS-1123 subdomain validation constants
+ private static final int MAX_CLUSTER_NAME_LENGTH = 253;
+ private static final String CLUSTER_NAME_REGEX = "^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$";
+ private static final String CLUSTER_NAME_FORMAT_ERROR =
+ "Invalid cluster_name format. Must be a valid Kubernetes DNS-1123 subdomain: " +
</code_context>
<issue_to_address>
**issue (bug_risk):** Cluster name validation does not enforce the DNS-1123 63-char per-label limit despite the comment/spec reference.
Current validation only checks a 253-char max and the general DNS-1123 subdomain pattern, but not the 63-character per-label limit that Kubernetes enforces. Because the error message and constant claim compliance with the Kubernetes DNS-1123 subdomain spec, this may permit cluster names that Kubernetes will later reject. Please either enforce the 63-char per-label constraint (e.g., via a stricter regex or an additional length check) or relax the wording of the error message to match the actual behavior.
</issue_to_address>
### Comment 2
<location path="src/main/java/com/autotune/database/table/KruizeDataSourceEntry.java" line_range="126" />
<code_context>
+ if (clusters == null || clusters.trim().isEmpty()) {
+ return new ArrayList<>();
+ }
+ return new ArrayList<>(Arrays.asList(clusters.split(",")));
+ }
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Split cluster list entries are not trimmed, which may cause subtle matching issues if config uses spaces after commas.
For inputs like "cluster-a, cluster-b", this will return "cluster-a" and " cluster-b" (with a leading space), which can break exact string comparisons. Consider trimming each split element and filtering out any empty strings produced by patterns like ",," before returning the list.
Suggested implementation:
```java
/**
* Get cluster names as a list
* @return List of cluster names, empty list if no clusters defined
*/
public List<String> getClusterList() {
if (clusters == null || clusters.trim().isEmpty()) {
return new ArrayList<>();
}
return Arrays.stream(clusters.split(","))
.map(String::trim)
.filter(s -> !s.isEmpty())
.collect(Collectors.toList());
}
import jakarta.persistence.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
```
1. Ensure the `import` section is at the top of the file (above the class declaration) and not inside any method body. In the provided snippet, the `import` lines appear directly after the method, which is not valid Java syntax. Move all `import` statements to the correct place if they are not already there in the full file.
2. If there are existing static imports or other stream-related imports, keep the style consistent (e.g., if `Collectors.toList()` is statically imported elsewhere, you might want to follow that convention).
</issue_to_address>
### Comment 3
<location path="src/main/java/com/autotune/common/datasource/DataSourceCollection.java" line_range="204-214" />
<code_context>
+ if (dataSourceObject.has(KruizeConstants.DataSourceConstants.DataSourceMetadataInfoJSONKeys.CLUSTERS)) {
+ JSONArray clustersArray = dataSourceObject.optJSONArray(KruizeConstants.DataSourceConstants.DataSourceMetadataInfoJSONKeys.CLUSTERS);
+ if (clustersArray != null) {
+ for (int i = 0; i < clustersArray.length(); i++) {
+ clusters.add(clustersArray.getString(i));
+ }
+ LOGGER.info("Clusters for datasource {}: {}", name, clusters);
</code_context>
<issue_to_address>
**suggestion:** Cluster names from config are accepted verbatim without trimming or basic validation.
Since the JSON values are added directly, entries like " cluster-a" or "" can end up in `clusters`. To avoid dirty data and downstream issues, trim each value before adding it and skip empty results (and optionally log or reject clearly invalid names).
```suggestion
// Extract clusters array if present
List<String> clusters = new ArrayList<>();
if (dataSourceObject.has(KruizeConstants.DataSourceConstants.DataSourceMetadataInfoJSONKeys.CLUSTERS)) {
JSONArray clustersArray = dataSourceObject.optJSONArray(KruizeConstants.DataSourceConstants.DataSourceMetadataInfoJSONKeys.CLUSTERS);
if (clustersArray != null) {
for (int i = 0; i < clustersArray.length(); i++) {
String clusterName = clustersArray.optString(i, null);
if (clusterName == null) {
LOGGER.warn("Null cluster name encountered for datasource '{}', index {}. Skipping entry.", name, i);
continue;
}
String trimmedClusterName = clusterName.trim();
if (trimmedClusterName.isEmpty()) {
LOGGER.warn("Blank or whitespace-only cluster name encountered for datasource '{}', index {}. Original value: '{}'. Skipping entry.", name, i, clusterName);
continue;
}
clusters.add(trimmedClusterName);
}
LOGGER.info("Clusters for datasource {}: {}", name, clusters);
}
}
```
</issue_to_address>
### Comment 4
<location path="src/main/java/com/autotune/analyzer/adapters/DataSourceInfoAdapter.java" line_range="39-41" />
<code_context>
+
+ jsonObject.addProperty("name", src.getName());
+ jsonObject.addProperty("provider", src.getProvider());
+ jsonObject.addProperty("serviceName", src.getServiceName());
+ jsonObject.addProperty("namespace", src.getNamespace());
+ jsonObject.addProperty("url", src.getUrl() != null ? src.getUrl().toString() : null);
+
+ // Serialize authenticationConfig
</code_context>
<issue_to_address>
**issue (bug_risk):** The custom serializer will emit explicit nulls for fields that were previously omitted, which may be a breaking change for consumers.
`addProperty` with null produces explicit `"serviceName": null`, `"namespace": null`, and `"url": null` instead of omitting them, unlike the default Gson behavior without `serializeNulls`. If clients depend on these fields being absent rather than null, this changes the wire format. To keep the previous behavior, only add these properties when they are non-null, as you already do for `authenticationConfig` and `clusters`.
</issue_to_address>
### Comment 5
<location path="design/KruizeDatasource.md" line_range="34" />
<code_context>
+| `datasources[].provider` | string | Yes | Provider type (e.g., "prometheus") |
+| `datasources[].url` OR `serviceName`+`namespace` | string | Yes | Connection endpoint |
+| `datasources[].authentication` | object | No | Authentication configuration (defaults to "none") |
+| `datasources[].cluster` | array | No | Array of cluster names associated with this datasource |
### URL vs ServiceName
</code_context>
<issue_to_address>
**issue:** Clarify singular `cluster` vs plural `clusters` field name in examples
The table documents `datasources[].cluster` while the later API response example uses `"clusters": ["default"]`. Please standardize on a single field name (`cluster` vs `clusters`) across the schema, examples, and response to avoid ambiguity about which is correct.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Signed-off-by: Saad Khan <saakhan@ibm.com>
Signed-off-by: Saad Khan <saakhan@ibm.com>
9a9b385 to
12982bf
Compare
Signed-off-by: Saad Khan <saakhan@ibm.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Fixes # (issue)
Type of change
How has this been tested?
Please describe the tests that were run to verify your changes and steps to reproduce. Please specify any test configuration required.
Test Configuration
Checklist 🎯
Additional information
Include any additional information such as links, test results, screenshots here
Summary by Sourcery
Extend Bulk API and recommendation engine to support cluster-aware configuration and customizable recommendation models/terms, while enhancing datasource metadata and notifications.
New Features:
Enhancements:
Build:
Documentation:
Tests: