TSL Blog

Performance Bottleneck on the Web

By TS Le

Introduction

In the early 1960s, the Internet was developed to link various computer networks that were able to function as a connected system that would not be disrupted by local disasters (Brookshear, 2012). In 1989, Berners-Lee introduced the World Wide Web (the Web) that allowed users to access resources in the form of interlinked hypertext language from any place at any time through the media Internet (Jacobs, & Walsh, 2004). Resources are conventional on the hypertext Web to describe Web pages, images, advertisements, product catalogs. The Web has evolved into five generations: (1) Web 1.0 for information connection, (2) Web 2.0 for social connection or verbalization, (3) Web 3.0 for knowledge content or semantic affiliation, (4) Web 4.0 for symbiotic integration, and (5) Web 5.0 for decentralized smart communication (Patel, 2013).

With the rapid Web growth, the data on the Web also increases exponentially and become more diverse, complex, ubiquitous, and sophisticated (Vermesan, & Friess, 2013). Accessing large-scale Web data sets for various applications pose challenges over the unique social-technical Web in all generations. This document will focus on the potential performance bottlenecks and report on the main sources of Web data, possible locations of bottleneck, analyzing root causes, and propose the solutions.

Data Sources

The Web is a unique socio-technical system that employs the inaction between humans, e.g., users’ behaviors and technological networks, e.g., complex infrastructures.

It is about joint optimization such as interrelatedness of social and technical aspects of an organization or the society as a whole (Trist, & Bamforth, 1951). The Web invented by Berners-Lee (1989) has evolved rapidly over five generations:

- Web 1.0 is the first implementation of the read-only web for information connection by 1996. Mostly data is the documents such as HTML documents, web pages owned by companies and organizations. The sources of these documents come from the Web-oriented organizations. People consume these documents.

- Web 2.0 is the second generation of the read/write Web to connect billions of users in 2006. It is also categorized as the Web of verbalization that managed and assembled large global crowd for common interests in social interaction. Data are Web pages, social messages, emails, texts, etc. whose sources of data are shared by communities or large groups of users. People publish the content and others consume it. Companies such as YouTube, Blogger, Wikipedia, etc. build and provide a framework for users who are data sources of providing the content.

- Web 3.0 is the third generation of the Semantic Web for data integration established in 2016. It is read/write and shared data used by trillions of users around the world. It connects knowledge for meaningful information that can be located, evaluated and delivered by software agents for portable or smart applications. The data sources are people-oriented applications for people’s interaction, e.g., human-to-human communication (H2H), human-to-machine communication (H2M), and machine-to-machine communication (M2M). The companies such as FaceBook, Google Maps, Amazon, Linkedin, Twitter, etc. create the platforms to let people provide content and build their applications.

- Web 4.0 is a symbiotic and ubiquitous Web or the Web of integration currently (2017). The social-technical Web with intelligent applications is powerful as the human brain in development of telecommunications on nanotechnology. The machines can read contents of the Web. The Web 4.0 is a read/write concurrency Web. Data sources are people, companies, IoT devices and sensors, intelligent machines such as robots without human control, intelligent applications, artificial intelligence products, migration of online functionality into the physical world.

- Web 5.0 is called as the web of decentralized smart communicator under development. It is the very early nascent stage of the Symbionet Web. People attempt to get interconnected through Smart Communicator such as Smart phones, Tables and Personal Robots. Data sources come from humans, intelligent devices, robots, personal servers, smart communicators, tablets and personal robots, etc. that can autonomously take action by themselves without human involvement or interference.

Where do performance bottlenecks occur?

Accessing the large-scale Web data flow poses a big challenge in performance such as bottlenecks. The bottleneck is a phenomenal problem where a small number of resources limit an entire system’s capacity or performance. Some of the typical performance bottlenecks in Web applications are listed below (Thomas, 2012):

- Extended response time of user

- Extended response time of server

- High CPU usage

- Invalid data returned

- HTTP errors (4xx, 5xx)

- Lots of open connections

- Lengthy queues of requests

- Memory leaks

- Extensive table scans of database

- Database deadlocks

- Pages unavailable

For Web 1.0 and partially 2.0, a bottleneck may occur in the network, storage fabric, within servers on the Web. Consequently, data flow is reduced significantly, and affects application performance, particularly for databases and transaction applications. It can freeze the entire system or even cause the applications to crash. For Web 2.0 and 3.0,

the performance bottleneck becomes a big problem on the network when Web users attempt to access large-scale Web data such as Big Data, IoT data for data analytics in the competitive Web-based market and data-driven economy (Rouse, 2007). The performance bottleneck often occurs at the interface between the gateway and wireless devices or sensors. Dynamic scheduler using particle optimization and the algorithm of better resources allocation also alleviate bottlenecks in the networks (Gubbi, Buyya, Marusic, & Palaniswami, 2013).

Root Causes

With data explosion at 44 zettabytes by 2020 (Vizard, 2014), some research companies find that 85% Web data has new data types; data increases ten times every five years; and data likely contains meaningful information (Juniper Research, 2016). The performance bottlenecks become a great threat to communication loss. The root causes of the performance bottleneck relate to the tremendous amount of data that are generated by humans and machines. The large-scale Web data includes traditional data, user-generated data, and commodity data created and consumed by self-empowered knowledge workers. Technology trend over computing hardware doubles every 18 months by Moore’s Law (Intel, 2015). Mobile computing requires accessing and producing data pervasively. Social networking produced user-generated data such as streaming data, texts massively. Cloud computing reduces the limitation storage. The Internet of Things (IoT) devices, sensors generate and collect data in trillion numbers. Therefore, one of the root causes is the massive volume of messages that causes load imbalance among network entities. Consequently, the load imbalance creates the bottleneck on the Web. Another root cause is the huge data generated by humans and machines, sensors, etc. in daily communication activities may flood the networks and create a bottleneck. A deadlock occurs when two programming tasks want to access the same Web data sets stored in one node in hierarchical groups. A coordinator of a hierarchical group may become a SPOFs (single point of failures) imminently and also potential performance bottleneck in the network (Sakr, & Gaber, 2014).

Proposed solutions

In the new trend regarding “performance bottleneck,” academic and research community has studied Web applications performance symptoms and bottlenecks identification. Researchers have raced to resolve the performance bottlenecks. Some typical solutions include reconfiguring, upgrading or replacing the offending device. For servers, upgrading CPU or memory, and replacing the aging single-CPU servers with dual or quad CPU servers may help. For network level, upgrading switches would work. Proactively monitoring traffic load trends over time and continuing to implement improvements can avoid the performance bottlenecks (Rouse, 2007).

A further study shows that the networked computers establish distributed systems for communication by passing messages from a beginning node to other destination nodes for three main reasons, i.e., high performance, low cost, and high quality of service (QoS). The proposed solutions for performance bottlenecks in distributed systems over the Web include: (1) the strategy of distributing or partitioning the workload, and (2) effective mapping of partitions.

1. Strategy of distributing and partitioning the workload

This strategy distributes and partitions the workload across computer nodes by co-locating communicating nodes together to reduce the pressure on the communicating network such as cloud or mobile networks. Consequently, the attempt also improves performance. It minimizes communication overheads while circumventing computation skew among machines. This technique makes great efforts to obtain effective partitioning of the work across machines that are co-located together.

2. Effective mapping of partitions.

This strategy of partitions machines should be done in a way that is totally aware of the underlying network topology. That means a message will hit the large number of switches before it reaches its destination. The strategy tends to reduce the number of hop nodes in rack topology of the cluster to decrease network latency and improve overall performance. The technique enables the relative locality of nodes to arrive at a reasonable inference regarding the rack topology of the cluster of nodes.

Conclusion

In summary, the document provided a brief overview of the Web evolution in five generations. It identified the main sources of the large-scale Web data in each Web version and discussed some typical places such as load imbalance or interface between wireless sensors and gateway where performance bottlenecks potentially occurred. The root causes of the performance bottlenecks were analyzed and two high-level strategies to eliminate these performance bottlenecks.

REFERENCES

Brookshear, G. (2012). Computer science: an overview. Addison-Wesley Longman Publishing Co., Inc.

Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M. (2013). Internet of Things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7), 1645-1660.

Intel (2015). 50 years of moore’s law. Retrieved January 15, 2017 from http://www.intel.com/content/www/us/en/silicon-innovations/moores-law-technology.html

Juniper Research (2016). IoT – internet of transformation 2017. Retrieved January 11, 2017 from https://www.juniperresearch.com/document-library/white-papers/iot-~-internet-of-transformation-2017

Patel, K. (2013). Incremental journey for world wide web: introduced with web 1.0 to recent web 5.0 – a survey paper. International Journal of Advanced Research in Computer Science and Software Engineering, 3(10), 410–417.

Rouse, M. ( 2007). Bottleneck. Retrieved January 4th, 2017 from http://searchenterprisewan.techtarget.com/definition/bottleneck

Sakr, S., & Gaber, M. (Eds.). (2014). Large scale and big data: processing and management. CRC Press.

Thomas, T. (2012). Web applications performance symptoms and bottlenecks identification. Retrieved January 04, 2017 from http://www.agileload.com/agileload/blog/2012/11/27/web-applications-performance-symptoms-and-bottlenecks-identification

Trist, E. L., & Bamforth, K. W. (1951). Some social and psychological consequences of the Longwall method. Human relations, 4(3), 3-38.

Vermesan, O., & Friess, P. (2013). Internet of things: converging technologies for smart environments and integrated ecosystems. Retrieved September 04, 2016 from

http://www.internet-of-things research.eu/pdf/Converging_Technologies_for_Smart_Environments_and_Integrated_Ecosystems_IERC_Book_Open_Access_2013.pdf

Vizard, M. (2014). Digital universe expands at an alarming rate. Retrieved August 30, 2016 from http://www.cioinsight.com/it-strategy/storage/slideshows/digital-universe-expands-at-an-alarming-rate.html

TSL Blog

Thursday, February 9, 2017

No comments:

Post a Comment