subset_data does not converge #8

Hi, I have a large dataset (>100k samples) that contains many duplicate rows.
On this data, MSPHATE never finishes the `Calculating partitions...` step: `subset_data` does not converge.

I can't share the dataset in question, but I believe I have reproduced the behavior with randomly generated data. See the following code and output:

```
import numpy as np
from multiscale_phate import compress, diffuse, condense

np.random.seed(42)

# synthetic data: 10001 random rows, each repeated 10 times (100010 rows total)
data = np.random.uniform(size=(10001, 200))
data = np.vstack([data] * 10)  # highly redundant

# mimic the MSPHATE compress step
N, features = data.shape
n_pca = 200
partitions = None

# Computing compression features
n_pca, partitions = compress.get_compression_features(
    N, features, n_pca, partitions, landmarks=2000
)

# subset_data was modified locally to print np.max(cluster_counts) and
# np.ceil(N / desired_num_clusters) on every iteration
_ = compress.subset_data(data, desired_num_clusters=partitions, n_jobs=8, num_cluster=100, random_state=None)
```

Output:

```
Calculating partitions...
np.max(cluster_counts):  3930
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  1120
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  70
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  10
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  10
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  10
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  10
```

The output stays the same over many further iterations: `np.max(cluster_counts)` remains at 10 and never drops to the threshold of 6.
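My guess at the cause: every row appears 10 times, and identical rows always end up in the same k-means cluster, so no cluster can shrink below 10 points, while the printed threshold is 6. The sketch below is not part of MSPHATE; it only shows how one could check the largest duplicate count against such a threshold and collapse duplicates before partitioning. The helper `deduplicate_rows` and the value of `desired_num_clusters` are hypothetical, and whether deduplicating is appropriate for a given analysis is an assumption I have not verified.

```python
import numpy as np


def deduplicate_rows(data):
    """Return the unique rows, an index mapping each original row to its
    unique row, and the number of copies of each unique row.

    Hypothetical helper for illustration; not part of multiscale_phate.
    """
    unique_rows, inverse, counts = np.unique(
        data, axis=0, return_inverse=True, return_counts=True
    )
    # ravel() keeps the inverse index 1-D across NumPy versions
    return unique_rows, inverse.ravel(), counts


# Small stand-in for the redundant data above: 50 rows, each repeated 10 times
rng = np.random.default_rng(42)
base = rng.uniform(size=(50, 4))
data = np.vstack([base] * 10)

unique_rows, inverse, counts = deduplicate_rows(data)

N = data.shape[0]
desired_num_clusters = 100  # assumed value, chosen only for this example
threshold = np.ceil(N / desired_num_clusters)

# If the largest group of identical rows exceeds the threshold, a stopping
# rule of the form max(cluster_counts) <= threshold cannot be satisfied.
print("largest duplicate group:", counts.max())
print("threshold:", threshold)
print("unique rows:", unique_rows.shape[0], "of", N)
```

Any labels computed on `unique_rows` can be mapped back to the original rows with `labels[inverse]`, since `unique_rows[inverse]` reproduces `data`.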

Note: I am using Python 3.8 and installed the package with `pip install git+https://github.com/KrishnaswamyLab/Multiscale_PHATE`.

