A new dataset for coffee rust detection in Colombian crops based on classifiers
DOI: https://doi.org/10.18046/syt.v12i29.1802
Keywords: Coffee Rust, Classifier, SVR, BPNN, M5
Abstract
Coffee production is the main agricultural activity in Colombia, and more than 350,000 Colombian families depend on the coffee harvest. Since coffee rust disease was first reported in the country in 1983, these families have faced severe consequences. Recently, machine learning studies have built a dataset for monitoring coffee rust incidence based on weather conditions and physical crop properties. This background encouraged us to build a dataset for coffee rust detection in Colombian crops following a data mining process, the Cross Industry Standard Process for Data Mining (CRISP-DM). In this paper we define suitable data for generating accurate models; once the dataset is built, it is tested using the following classifiers: Support Vector Regression, Backpropagation Neural Networks and Regression Trees.
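The abstract evaluates the dataset with Support Vector Regression, a backpropagation neural network and regression trees. As a rough illustration of that evaluation step (not the authors' pipeline), the sketch below uses only the Python standard library to score a CART-style regression tree against a predict-the-mean baseline with k-fold cross-validation and mean absolute error. Note that this tree has constant-valued leaves; the M5 model tree named in the keywords instead fits linear models at its leaves. The three weather features, the incidence formula, the synthetic data and every function name (`make_synthetic_rows`, `build_tree`, `k_fold_mae`, and so on) are hypothetical assumptions chosen for illustration.

```python
import random
import statistics
from typing import Callable, List, Tuple, Union

Row = Tuple[List[float], float]      # (feature vector, target)
Tree = Union[float, tuple]           # leaf value or (feature, threshold, left, right)
Predictor = Callable[[List[float]], float]


def make_synthetic_rows(n: int, seed: int = 0) -> List[Row]:
    """HYPOTHETICAL data: temperature (C), humidity (%), rainfall (mm) -> incidence (%)."""
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        temp = rng.uniform(16.0, 26.0)
        humidity = rng.uniform(60.0, 100.0)
        rain = rng.uniform(0.0, 300.0)
        # Made-up relationship plus Gaussian noise; not taken from the paper.
        incidence = (0.3 * humidity - 0.5 * abs(temp - 22.0)
                     + 0.02 * rain - 15.0 + rng.gauss(0.0, 2.0))
        rows.append(([temp, humidity, rain], max(0.0, incidence)))  # clip at 0 %
    return rows


def _sse(rows: List[Row]) -> float:
    """Sum of squared errors around the mean target of a node."""
    ys = [y for _, y in rows]
    mean = statistics.fmean(ys)
    return sum((y - mean) ** 2 for y in ys)


def build_tree(rows: List[Row], depth: int = 0,
               max_depth: int = 3, min_leaf: int = 5) -> Tree:
    """Greedy CART-style regression tree: each split minimises the children's SSE,
    and each leaf predicts the mean target of its rows."""
    leaf = statistics.fmean([y for _, y in rows])
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return leaf
    best = None  # (sse, feature, threshold, left_rows, right_rows)
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [r for r in rows if r[0][f] <= threshold]
            right = [r for r in rows if r[0][f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left, right)
    if best is None:
        return leaf
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree(left, depth + 1, max_depth, min_leaf),
            build_tree(right, depth + 1, max_depth, min_leaf))


def predict(tree: Tree, x: List[float]) -> float:
    """Walk from the root to a leaf and return that leaf's mean value."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if x[f] <= threshold else right
    return tree


def fit_mean(train: List[Row]) -> Predictor:
    """Baseline learner: always predict the training mean."""
    mean = statistics.fmean([y for _, y in train])
    return lambda x: mean


def fit_tree(train: List[Row]) -> Predictor:
    tree = build_tree(train)
    return lambda x: predict(tree, x)


def k_fold_mae(rows: List[Row], fit: Callable[[List[Row]], Predictor],
               k: int = 5, seed: int = 1) -> float:
    """Mean absolute error over k shuffled folds; every row is tested exactly once."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    errors: List[float] = []
    for fold in folds:
        held_out = set(fold)
        model = fit([rows[i] for i in order if i not in held_out])
        errors.extend(abs(model(rows[i][0]) - rows[i][1]) for i in fold)
    return statistics.fmean(errors)


if __name__ == "__main__":
    data = make_synthetic_rows(200)
    print("mean baseline MAE:", round(k_fold_mae(data, fit_mean), 2))
    print("regression tree MAE:", round(k_fold_mae(data, fit_tree), 2))
```

Because `k_fold_mae` only needs a `fit` function that returns a predictor, other learners, such as scikit-learn's `SVR` or `MLPRegressor` wrapped in a small adapter, could be compared on the same folds. The printed errors depend entirely on the synthetic data and say nothing about the results reported in the paper.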
Published: 2014-06-30
Section: Original Research
License: This journal is licensed under the terms of the CC BY 4.0 licence (https://creativecommons.org/licenses/by/4.0/legalcode).