Item title:
Alternatives for Greedy Discrete Subsampling: Various Approaches Including Cluster Subsampling of COVID-19 Data With No Response Variable
An exhaustive selection of all possible combinations of n = 400 out of N = 698 observations of the COVID-19 dataset served as a benchmark. Building a random set of subsamples and choosing the one that minimized the averaged sum of squares of each variable's category frequencies returned results similar to those of a "forward" subselection, which reduces the dataset one observation at a time so that the same metric is lowered at every step. This works similarly to k-means clustering (with a random number of clusters) of the original dataset's observations, followed by drawing a subsample from each cluster in proportion to its size. However, the approaches differ significantly in asymptotic time complexity.
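The two non-clustering strategies described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the metric is read literally as the sum of squared relative category frequencies per variable, averaged over variables; the function names (`balance_score`, `best_random_subsample`, `greedy_reduce`) and the toy data are hypothetical; and the one-by-one reduction simply drops whichever observation yields the lowest metric at each step. The k-means variant is not covered here.

```python
import random
from collections import Counter


def balance_score(rows):
    """Assumed metric: for each variable, sum the squared relative
    frequencies of its categories, then average over all variables."""
    n = len(rows)
    n_vars = len(rows[0])
    total = 0.0
    for j in range(n_vars):
        counts = Counter(row[j] for row in rows)
        total += sum((c / n) ** 2 for c in counts.values())
    return total / n_vars


def best_random_subsample(data, n, n_trials=200, seed=0):
    """Draw n_trials random subsamples of size n and keep the one
    with the lowest balance_score."""
    rng = random.Random(seed)
    best_idx, best_score = None, float("inf")
    for _ in range(n_trials):
        idx = rng.sample(range(len(data)), n)
        score = balance_score([data[i] for i in idx])
        if score < best_score:
            best_idx, best_score = idx, score
    return sorted(best_idx), best_score


def greedy_reduce(data, n):
    """Shrink the dataset one observation at a time, each time dropping
    the observation whose removal gives the lowest balance_score."""
    kept = list(range(len(data)))
    while len(kept) > n:
        best_drop, best_score = None, float("inf")
        for i in kept:
            score = balance_score([data[k] for k in kept if k != i])
            if score < best_score:
                best_drop, best_score = i, score
        kept.remove(best_drop)
    return kept, balance_score([data[k] for k in kept])


# Made-up, deliberately unbalanced toy data with two categorical variables.
toy = ([("m", "young")] * 12 + [("f", "young")] * 4
       + [("m", "old")] * 3 + [("f", "old")] * 1)

random_idx, random_score = best_random_subsample(toy, n=8)
greedy_idx, greedy_score = greedy_reduce(toy, n=8)
print("random search:", random_score)
print("greedy reduction:", greedy_score)
```

In this sketch the random search costs on the order of the number of trials times the subsample size, while the greedy reduction re-evaluates the metric for every remaining observation at every step, which is one way the difference in time complexity mentioned above can arise.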
1. This paper is supported by the grant OP VVV IGA/A, CZ.02.2.69/0.0/0.0/19_073/0016936 with no. 18/2021, which has been provided by the Internal Grant Agency of the Prague University of Economics and Business.
2. Preface
3. Session: 14th International Workshop on Computational Optimization
4. Communication Papers