Alternatives for Greedy Discrete Subsampling: Various Approaches Including Cluster Subsampling of COVID-19 Data With No Response Variable


Title: Alternatives for Greedy Discrete Subsampling: Various Approaches Including Cluster Subsampling of COVID-19 Data With No Response Variable
Authors: Štěpánek, Lubomír
Habarta, Filip
Malá, Ivana
Marek, Luboš
Publication date: 2021
Keywords: COVID-19
selection
combinations
exhaustive method
verification datasets
optimization methods
Language: English
Content provider: BazTech
Type: Article


An exhaustive selection of all possible combinations of n = 400 from N = 698 observations of the COVID-19 dataset was used as a benchmark. Building a random set of subsamples and choosing the one that minimized an averaged sum of squares of each variable's category frequencies returned results similar to those of a "forward" subselection, which reduces the dataset one observation at a time while steadily lowering the same metric. This works similarly to k-means clustering (with a random number of clusters) of the original dataset's observations, followed by choosing a subsample from each cluster proportionally to its size. However, the approaches differ significantly in asymptotic time complexity.
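
The two non-clustering strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The metric below is one assumed reading of "averaged sum of squares of each variable's category frequency" (per variable, the sum of squared relative category frequencies, averaged over variables), and the function names, the toy data, and the sizes are hypothetical. The k-means variant is not shown.

```python
import random
from collections import Counter


def balance_metric(rows):
    """Assumed metric: mean over variables of the sum of squared
    relative category frequencies. Lower means more balanced categories."""
    n_rows = len(rows)
    n_vars = len(rows[0])
    total = 0.0
    for j in range(n_vars):
        counts = Counter(row[j] for row in rows)
        total += sum((c / n_rows) ** 2 for c in counts.values())
    return total / n_vars


def random_search(rows, n, n_draws, rng):
    """Draw n_draws random subsamples of size n and keep the one
    with the lowest metric."""
    best, best_score = None, float("inf")
    for _ in range(n_draws):
        candidate = rng.sample(rows, n)
        score = balance_metric(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


def one_by_one_reduction(rows, n):
    """Shrink the dataset one observation at a time, each time dropping
    the observation whose removal gives the lowest metric."""
    current = list(rows)
    while len(current) > n:
        best_idx = min(
            range(len(current)),
            key=lambda i: balance_metric(current[:i] + current[i + 1:]),
        )
        current.pop(best_idx)
    return current, balance_metric(current)


# Hypothetical toy data: two categorical variables, unbalanced 40 vs 20.
rows = [("A", "X")] * 40 + [("B", "Y")] * 20

rng = random.Random(0)  # fixed seed so the random search is reproducible
random_subsample, random_score = random_search(rows, n=40, n_draws=200, rng=rng)
greedy_subsample, greedy_score = one_by_one_reduction(rows, n=40)

print("full data:", balance_metric(rows))
print("random search:", random_score)
print("one-by-one reduction:", greedy_score)
```

In this sketch the random search costs one metric evaluation per draw, while the one-by-one reduction evaluates the metric for every remaining observation at every step, which hints at why the running times of such approaches can differ widely.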

1. This paper is supported by the grant OP VVV IGA/A, CZ.02.2.69/0.0/0.0/19_073/0016936 with no. 18/2021, which has been provided by the Internal Grant Agency of the Prague University of Economics and Business.

2. Preface

3. Session: 14th International Workshop on Computational Optimization

4. Communication Papers
