Components and Development in Big Data System: A Survey
Jing-Huan Yu, Zi-Meng Zhou
Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China