CONTECSI - International Conference on Information Systems and Technology Management - ISSN 2448-1041, 20th CONTECSI - INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGY MANAGEMENT VIRTUAL

Deploying Deep Learning models on cloud: an exploratory investigation in low resource environments
Elayne Rute Lessa Lemos, Rodrigo Ribeiro Oliveira, Jairson Barbosa Rodrigues, Rosalvo Ferreira de Oliveira Neto

Last modified: 2024-01-05

Abstract


Tech companies increasingly deploy Machine Learning models in the cloud. Hardware requirements rise when these models involve Deep Learning (DL) techniques, and cloud providers' prices can be prohibitive for users in developing countries. Our goal was to verify whether a startup company in a developing country can feasibly carry out a Proof of Concept (POC) with DL models in the cloud. We performed a cloud scalability test using the GECToR model, a DL solution for Grammatical Error Correction. Ten experiments were run, each with a different number of simultaneous submissions. The benchmark metrics analyzed were real-time latency, hardware resource consumption, and infrastructure cost. GPU solutions achieved the best results, but their average cost was 300% higher than that of low-cost solutions without a GPU, and the low-cost alternatives proved usable for POC purposes. More specifically, the results indicate that processor cache memory size is the essential configuration parameter for a solution without a GPU. Selecting a configuration on that basis reduced cost by more than 50% relative to the best solution found in the study. The main conclusion is that carrying out a POC in the cloud with a DL model and no GPU is feasible when cache size is taken into account, and that doing so lowers the cost of the POC.
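
The scalability test described above can be illustrated with a minimal sketch, which is not the authors' benchmark harness. It submits a given number of requests simultaneously and records per-request latency. The function `correct_text` is a hypothetical stand-in for a call to a deployed Grammatical Error Correction service, its 10 ms delay is a simulated value rather than a measurement, and the load levels in the loop are illustrative only.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def correct_text(text: str) -> str:
    """Hypothetical stand-in for a request to a deployed GEC model."""
    time.sleep(0.01)  # simulated inference time, not a measured value
    return text


def measure_latencies(n_simultaneous: int, text: str = "He go to school.") -> list[float]:
    """Submit n_simultaneous requests at once; return per-request latency in seconds."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        correct_text(text)
        return time.perf_counter() - start

    # One worker per request so that all submissions are in flight together.
    with ThreadPoolExecutor(max_workers=n_simultaneous) as pool:
        return list(pool.map(timed_call, range(n_simultaneous)))


def summarize(latencies: list[float]) -> dict[str, float]:
    """Reduce a list of latencies to mean and worst-case values."""
    return {"mean_s": statistics.mean(latencies), "max_s": max(latencies)}


if __name__ == "__main__":
    for n in (1, 5, 10):  # illustrative load levels, not those of the study
        print(n, summarize(measure_latencies(n)))
```

In a real test, `correct_text` would be replaced by a request to the deployed endpoint, and hardware consumption and cost would be collected separately on the server side.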


Keywords


Cloud Computing; Deep Learning; Proof of Concept; Low Resources; Cost Analysis

