REGIONAL TYPOLOGY OF E-COMMERCE BUSINESS CONSTRAINTS IN INDONESIA: A MACHINE LEARNING APPROACH
Keywords:
E-commerce, machine learning, clustering analysis, regional typology
Abstract
This study analyzes regional disparities in e-commerce business constraints in Indonesia using an unsupervised machine learning approach. The analysis uses province-level data from Statistik E-Commerce 2024, published by Statistics Indonesia, and covers 38 provinces. It examines seven major constraints: funding limitations, skilled labor shortages, limited internet access, fraud, marketing challenges, delivery constraints, and other operational barriers. K-Means clustering is applied to z-score standardized data to identify regional typologies of these constraints. The optimal number of clusters is determined using the elbow method, the silhouette score, the Davies-Bouldin index, and the Calinski-Harabasz index. The results reveal five distinct regional clusters, each with a different combination of constraints. Provinces in Java and Bali are constrained mainly by capital and marketing pressures despite relatively advanced digital infrastructure. Several regions outside Java face balanced structural constraints involving multiple interrelated obstacles, while others are dominated by capital-heavy constraints. In contrast, Papua Pegunungan and Papua Tengah exhibit severe digital infrastructure constraints, indicating a persistent digital divide. The study contributes a province-level typology of e-commerce business constraints built from official statistics and machine learning, offering a data-driven basis for designing region-specific strategies to support inclusive e-commerce development in Indonesia.
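The workflow summarized in the abstract (z-score standardization, K-Means, then a validity index to choose the number of clusters) can be illustrated with a minimal sketch. It is not the authors' code. The data are synthetic stand-ins (38 rows by 7 columns, drawn around three hypothetical constraint profiles), not the Statistik E-Commerce 2024 figures. The helper names `zscore`, `farthest_first`, `kmeans`, and `silhouette` are invented for illustration. A deterministic farthest-first initialization is swapped in so the run is reproducible, and only the elbow quantity (inertia) and the silhouette score are computed; the Davies-Bouldin and Calinski-Harabasz indices are omitted for brevity.

```python
import math
import random


def zscore(rows):
    """Standardize every column to mean 0 and population std 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    inertia = sum(dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return labels, centroids, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1 or len(clusters) < 2:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in data: 38 "provinces" x 7 "constraints", generated around
# three hypothetical profiles. These are NOT the published survey values.
rng = random.Random(0)
profiles = [(10.0, 13), (40.0, 13), (70.0, 12)]  # (centre, number of rows)
data = [[rng.gauss(centre, 2.0) for _ in range(7)]
        for centre, n in profiles for _ in range(n)]

scaled = zscore(data)
scores = {}
for k in range(2, 7):
    labels, _, inertia = kmeans(scaled, k)
    scores[k] = (inertia, silhouette(scaled, labels))
    print(f"k={k}  inertia={inertia:.3f}  silhouette={scores[k][1]:.3f}")

best_k = max(scores, key=lambda k: scores[k][1])
print("k with the highest silhouette:", best_k)
```

In practice, a library such as scikit-learn offers `KMeans` together with `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score`, which would cover all four criteria named in the abstract; the pure-Python version here only keeps the example dependency-free.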