Struct faiss::Clustering1D

struct Clustering1D : public faiss::Clustering

Exact 1D clustering algorithm.

Because it does not use an index, it does not overload the train() function.
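To illustrate why an exact solution is feasible in one dimension: the clusters of an optimal 1D k-means solution are contiguous runs of the sorted points, so the best split can be found by dynamic programming. The sketch below is a plain O(k * n^2) version written for illustration. `kmeans1d_exact` is a hypothetical name, and this is not necessarily the routine that train_exact() runs, which may use a faster algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of Faiss): exact 1D k-means by dynamic
// programming over the sorted points. Returns k centroids in increasing order.
std::vector<double> kmeans1d_exact(std::vector<double> x, std::size_t k) {
    const std::size_t n = x.size();
    assert(k >= 1 && k <= n);
    std::sort(x.begin(), x.end());

    // Prefix sums give the cost of any contiguous cluster in O(1).
    std::vector<double> s1(n + 1, 0.0), s2(n + 1, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        s1[i + 1] = s1[i] + x[i];
        s2[i + 1] = s2[i] + x[i] * x[i];
    }
    // Sum of squared distances to the mean for sorted points [i, j).
    auto cost = [&](std::size_t i, std::size_t j) {
        const double sum = s1[j] - s1[i];
        return (s2[j] - s2[i]) - sum * sum / static_cast<double>(j - i);
    };

    const double inf = std::numeric_limits<double>::infinity();
    // best[c][j]: minimal cost of splitting the first j points into c clusters.
    std::vector<std::vector<double>> best(
            k + 1, std::vector<double>(n + 1, inf));
    // cut[c][j]: start index of the last cluster in that optimal split.
    std::vector<std::vector<std::size_t>> cut(
            k + 1, std::vector<std::size_t>(n + 1, 0));
    best[0][0] = 0.0;
    for (std::size_t c = 1; c <= k; ++c) {
        for (std::size_t j = c; j <= n; ++j) {
            for (std::size_t i = c - 1; i < j; ++i) {
                const double v = best[c - 1][i] + cost(i, j);
                if (v < best[c][j]) {
                    best[c][j] = v;
                    cut[c][j] = i;
                }
            }
        }
    }

    // Walk the cut table backwards to recover each cluster's mean.
    std::vector<double> centroids(k);
    std::size_t j = n;
    for (std::size_t c = k; c >= 1; --c) {
        const std::size_t i = cut[c][j];
        centroids[c - 1] = (s1[j] - s1[i]) / static_cast<double>(j - i);
        j = i;
    }
    return centroids;
}
```

For the points {1, 2, 10, 11, 20} and k = 3, the sketch groups them as {1, 2}, {10, 11}, {20}. Unlike iterative k-means, the result does not depend on initialization or on a random seed.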

Public Functions

explicit Clustering1D(int k)

Clustering1D(int k, const ClusteringParameters &cp)

void train_exact(idx_t n, const float *x)

inline virtual ~Clustering1D()

virtual void train(idx_t n, const float *x, faiss::Index &index, const float *x_weights = nullptr)

Run k-means training.

Parameters:

x – training vectors, size n * d
index – index used for assignment
x_weights – weight associated with each vector: NULL or size n

void train_encoded(idx_t nx, const uint8_t *x_in, const Index *codec, Index &index, const float *weights = nullptr)

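Run with encoded vectors.

In addition to the arguments of train(), this takes a codec as an argument to decode the input vectors.

Parameters:

codec – codec used to decode the vectors (nullptr = the vectors are in fact floats)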

void post_process_centroids(): Post-process the centroids after each centroid update. Includes optional L2 normalization and rounding to the nearest integer.
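The two post-processing steps can be sketched in isolation as follows. This is a minimal stand-alone illustration, not the Faiss implementation: `post_process` is a hypothetical free function, and its `spherical` and `int_centroids` arguments are assumed to play the same role as the flags of the same name listed under the public members.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Faiss). `centroids` holds k vectors of
// dimension d, stored contiguously (k * d floats).
void post_process(
        std::vector<float>& centroids,
        std::size_t d,
        bool spherical,
        bool int_centroids) {
    assert(d > 0 && centroids.size() % d == 0);
    const std::size_t k = centroids.size() / d;

    if (spherical) {
        // Scale each centroid to unit L2 norm; all-zero vectors are left as-is.
        for (std::size_t c = 0; c < k; ++c) {
            float* v = centroids.data() + c * d;
            float norm2 = 0.0f;
            for (std::size_t j = 0; j < d; ++j) {
                norm2 += v[j] * v[j];
            }
            if (norm2 > 0.0f) {
                const float inv = 1.0f / std::sqrt(norm2);
                for (std::size_t j = 0; j < d; ++j) {
                    v[j] *= inv;
                }
            }
        }
    }

    if (int_centroids) {
        // Round every coordinate to the nearest integer value.
        for (float& v : centroids) {
            v = std::round(v);
        }
    }
}
```

With d = 2 and spherical enabled, the centroid (3, 4) becomes approximately (0.6, 0.8). With int_centroids enabled, the coordinates 1.4 and -2.6 become 1 and -3.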

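Public Members

size_t d: Dimension of the vectors.

size_t k: Number of centroids.

std::vector<float> centroids: Centroids (k * d). If centroids are set on input to train, they are used as the initialization.

std::vector<ClusteringIterationStats> iteration_stats: Statistics for each clustering iteration.

int niter = 25: Number of clustering iterations.

int nredo = 1: Redo the clustering this many times and keep the clustering with the best objective.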

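bool verbose = false: Whether to print verbose output.

bool spherical = false: Whether to normalize the centroids after each iteration (useful for inner product clustering).

bool int_centroids = false: Whether to round the centroid coordinates to integers after each iteration.

bool update_index = false: Whether to re-train the index after each iteration.

bool frozen_centroids = false: Use the subset of centroids provided as input and do not change them during iterations.

int min_points_per_centroid = 39: If fewer than this number of training vectors per centroid are provided, a warning is written. Note that fewer than 1 point per centroid raises an exception.

int max_points_per_centroid = 256: Limits the size of the training set: if more than this number of training vectors per centroid are provided, the training set is subsampled.

int seed = 1234: Seed for the random number generator. Negative values cause the internal RNG to be seeded from std::high_resolution_clock.

size_t decode_block_size = 32768: When the training set is encoded, the batch size of the codec decoder.

bool check_input_data_for_NaNs = true: Whether to check the input data for NaNs.

bool use_faster_subsampling = false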
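The interplay of min_points_per_centroid and max_points_per_centroid with the training set size n can be sketched as below. This is a stand-alone illustration under stated assumptions, not the Faiss code: `select_training_points` and `too_few_points` are hypothetical helpers, and the shuffle-based sampling with std::mt19937 is a stand-in for whatever sampling the library actually performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of Faiss): true when a warning would be due
// because fewer than min_points_per_centroid points per centroid are given.
bool too_few_points(
        std::size_t n,
        std::size_t k,
        std::size_t min_points_per_centroid) {
    return n < k * min_points_per_centroid;
}

// Hypothetical helper (not part of Faiss): returns the indices of the
// training points to keep, at most k * max_points_per_centroid of them.
std::vector<std::size_t> select_training_points(
        std::size_t n,
        std::size_t k,
        std::size_t max_points_per_centroid,
        unsigned seed) {
    // Fewer than one point per centroid is treated as an error.
    assert(k >= 1 && n >= k);

    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    const std::size_t cap = k * max_points_per_centroid;
    if (n <= cap) {
        return idx; // small enough: keep every point
    }
    // Too many points: keep a random subset of size cap (assumed strategy).
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(cap);
    return idx;
}
```

With the default values, k = 2 centroids keep at most 2 * 256 = 512 training points, and a warning is due when fewer than 2 * 39 = 78 points are provided.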