Foundations of Computer Science/The Internet and the Web
The Internet and the Web
The Internet and the Web give us the ability to connect to countless resources, and they are shaping the way our society uses technology for online storage and services. We will use principles learned previously to examine communication on the Internet and the Web. The principles we will apply are:
- information can be encoded into messages
- a coordination system is a set of agents interacting with each other toward a common objective
- messages can hide information
A computer network is a communication subsystem that connects a group of computers so that they can communicate with each other. Two parts make a computer network possible: hardware and software. The hardware includes:
- network interface card (NIC) - required to connect to a local area network
- cabling or antennas - required to carry signals for transmission
- network switches - used to relay signals
The software includes:
- programs - used to process information (bits) using algorithms
Just as encoding bits requires an agreed-upon standard, networks require standards too. Communication can only occur when there are standards for devices, message formats, and the procedures of interaction. These standards give communication an orderly process.
Once the standards are in place, we can examine what actually makes a network tick. As stated previously, a computer network is made of two parts: hardware and software. The physical hardware provides the path along which communication travels, but it does not by itself make the network work. The software (programs) is what makes a computer network function by enabling software-to-software communication.
This chapter focuses on the following three software concepts:
- the Internet protocol suite
- layers of software
- abstraction used for simplification
Knowing the definitions of these terms will give you a foundation for the material in this chapter.
Stack of Protocols
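When we analyze the protocols needed for communication over a network, we see that different protocols are layered on top of one another to create levels of abstraction: each layer relies on the layer below it without needing to know how that layer does its work. Abstraction is used this way throughout the stack, in both the upper and the lower layers (see image below).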
Let's say that Computer A wants to send a message to Computer B. Trace through the steps below to see how a message is sent via the two stacks of agents.
- Only A4 and B4 can access the physical mailboxes to send and receive packages
- A1 puts the message into packages
- A2 adds sequence numbers and tracking numbers to packages
- A3 adds address labels
- A4 puts the packages in the outbox
- packages arrive at B4’s inbox
- B3 accepts packages addressed to B
- B2 uses the sequence numbers to put the packages in order and acknowledges each package to A2 using its tracking number
- A2 re-sends any package that is not acknowledged
- B1 opens the packages to reconstruct the original message
Network protocols work in the same way, with A1 to A4 and B1 to B4 being software. The delivery mechanism between A4 and B4 usually consists of metal wires, fiber optic cables, or radio waves in the air.
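The A1 to A4 and B1 to B4 trace above can be sketched as a small simulation. This is a minimal illustration, not a real protocol implementation: the function names (`a_stack`, `channel`, `transfer`), the package size, the random loss rate, and the assumption that acknowledgments are never lost are all hypothetical choices made for the example. The lossy `channel` stands in for the wires or radio waves between A4 and B4.

```python
import random

def a_stack(message: str, dest: str, size: int = 4) -> list[dict]:
    """Agents A1-A3 prepare the packages that A4 puts in the outbox."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]  # A1: put the message into packages
    numbered = list(enumerate(chunks))                                     # A2: add sequence/tracking numbers
    return [{"to": dest, "seq": n, "data": c} for n, c in numbered]        # A3: add address labels

def channel(outbox: list[dict], rng: random.Random, loss: float) -> list[dict]:
    """Between A4 and B4 (hypothetical model): packages may be lost or arrive out of order."""
    arrived = [p for p in outbox if rng.random() >= loss]
    rng.shuffle(arrived)
    return arrived

def transfer(message: str, loss: float = 0.3, seed: int = 1) -> str:
    """Send a message from A to B; loss must be below 1 or the loop never finishes."""
    rng = random.Random(seed)
    outbox = a_stack(message, dest="B")
    received: dict[int, str] = {}       # B2's buffer, keyed by sequence number
    pending = outbox                     # packages A2 has not yet seen acknowledged
    while pending:
        for pkg in channel(pending, rng, loss):   # packages arrive at B4's inbox
            if pkg["to"] != "B":                  # B3: accept only packages addressed to B
                continue
            received[pkg["seq"]] = pkg["data"]    # B2: store by sequence number
        acked = set(received)                     # B2 acknowledges to A2 (assumed never lost)
        pending = [p for p in outbox if p["seq"] not in acked]  # A2: re-send unacknowledged packages
    # B1: open the packages in sequence order to rebuild the original message
    return "".join(received[n] for n in sorted(received))

print(transfer("Hello from Computer A!"))  # Hello from Computer A!
```

Even though some packages are dropped and the rest arrive shuffled, the sequence numbers and re-sending let B1 rebuild the original message.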
We have seen how a message travels from Computer A to Computer B. Two delivery mechanisms used today for communication between networks are circuit switching and packet switching. A traditional telephone network uses circuit switching, which requires that a connection be established before communication can occur. For example, when you call someone, the phone rings until the other person picks up or voicemail answers; this type of communication is known as synchronous communication.
Computer networks work the opposite way: they use packet switching. With packet switching, each packet (a small package of information) is individually addressed and delivered separately. The process mimics how mail is delivered over shared media, e.g. trucks, trains, ships, and airplanes. For instance, when you send a letter, you do not wait until the recipient is ready. This type of communication is called asynchronous communication.
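A key consequence of packet switching is that one link can be shared by many conversations at once, because every packet carries its own address. The sketch below illustrates this under simplifying assumptions: the helper names (`packetize`, `shared_link`, `deliver`), the 3-character packet size, and the round-robin interleaving are hypothetical, chosen only to make the idea visible. In circuit switching, by contrast, the link would be reserved for a single conversation until it ended.

```python
from itertools import zip_longest

def packetize(sender: str, receiver: str, message: str, size: int = 3) -> list[dict]:
    """Split a message into small, individually addressed packets."""
    return [{"src": sender, "dst": receiver, "seq": i, "data": message[j:j + size]}
            for i, j in enumerate(range(0, len(message), size))]

def shared_link(*streams: list[dict]) -> list[dict]:
    """Interleave packets from several conversations on one link (round robin, an assumption)."""
    link = []
    for group in zip_longest(*streams):
        link.extend(p for p in group if p is not None)
    return link

def deliver(link: list[dict], receiver: str) -> str:
    """A receiver keeps only packets addressed to it and orders them by sequence number."""
    mine = sorted((p for p in link if p["dst"] == receiver), key=lambda p: p["seq"])
    return "".join(p["data"] for p in mine)

a_to_b = packetize("A", "B", "Hello, B!")
c_to_d = packetize("C", "D", "Hi there, D")
link = shared_link(a_to_b, c_to_d)
print([p["src"] for p in link])                    # ['A', 'C', 'A', 'C', 'A', 'C', 'C']
print(deliver(link, "B"), "|", deliver(link, "D"))  # Hello, B! | Hi there, D
```

Neither sender waits for the other or for its receiver to be ready; each packet simply joins the shared link, which is the asynchronous behavior described above.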
We have now seen several standards and protocols of the Internet. The following characteristics of the Internet will be important when distinguishing the Internet from the Web:
- An infrastructure for communication (information highway)
- A global connection of computer networks using the Internet Protocol (IP)
- Uses layers of communication protocols: IP, TCP, HTTP/FTP/SSH
- Built on open standards: anyone can create a new internet device
- Lack of centralized control (mostly)
- Everyone can use it with simple, commonly available software
The World Wide Web
The World Wide Web is often confused with the Internet because the two are used together. However, the Web is only one of the services provided over the Internet. The Web has the following characteristics:
- A collection of distributed web pages or documents that can be fetched using the web protocol (HTTP, the Hypertext Transfer Protocol)
- A service (application) that uses the Internet as a delivery mechanism
- Only one of the services that run on the Internet, alongside others such as email, file transfer, and remote login
Two roles work together to make up the Web: Web servers and Web clients (browsers).
Web servers
- Software that listens for web page requests and has access to stored web pages
- Examples: Apache, Microsoft Internet Information Services (IIS)
Web clients (browsers)
- Software that fetches documents from web servers and displays them
- Examples: Firefox, Internet Explorer, Safari, Chrome
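The two roles can be sketched with Python's standard library: `http.server` plays the Web server that listens for requests and returns stored pages, and `http.client` plays a very bare client that fetches a page over HTTP (a real browser would also render it). This is a minimal local demonstration, not how Apache or IIS are built; the `PAGES` dictionary, the handler and function names, and the use of the local address `127.0.0.1` are assumptions made for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "stored web pages" that the server has access to
PAGES = {"/index.html": b"<html><body><h1>Hello, Web!</h1></body></html>"}

class PageHandler(BaseHTTPRequestHandler):
    """Web server role: listen for GET requests and return stored pages."""
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404, "Page not found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

def fetch(host: str, port: int, path: str) -> tuple[int, bytes]:
    """Web client role: request a page over HTTP and return (status code, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()

server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0 lets the OS pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

status, body = fetch("127.0.0.1", port, "/index.html")
missing_status, _ = fetch("127.0.0.1", port, "/missing.html")
print(status, body.decode())  # 200 <html><body><h1>Hello, Web!</h1></body></html>
print(missing_status)         # 404
server.shutdown()
server.server_close()
```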
Uniform Resource Locator (URL)
The Uniform Resource Locator (URL) is an identifier for the location of a page on the web. The system of URLs is hierarchical (see image below).
- edu: the top-level domain used by schools and other educational institutions (rather than .com or .org)
- www.sbuniv.edu: the domain name of the Southwest Baptist University (SBU) website
- www.sbuniv.edu/COBACS/CIS/index.html: a URL for a page on SBU’s website, found under the path /COBACS/CIS/
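The hierarchy in the example URL can be made explicit with Python's `urllib.parse.urlsplit`: the domain name is read from right to left (edu, then sbuniv, then www), and the path is read from left to right. The sketch assumes an `https://` scheme, which the URL above omits, and the function name `url_hierarchy` is hypothetical.

```python
from urllib.parse import urlsplit

def url_hierarchy(url: str) -> list[str]:
    """Return the parts of a URL ordered from most general to most specific."""
    parts = urlsplit(url)
    # Domain labels are read right to left: edu -> sbuniv -> www
    domain_levels = list(reversed(parts.hostname.split(".")))
    # Path segments are read left to right: COBACS -> CIS -> index.html
    path_levels = [segment for segment in parts.path.split("/") if segment]
    return domain_levels + path_levels

url = "https://www.sbuniv.edu/COBACS/CIS/index.html"  # scheme assumed
print(url_hierarchy(url))
# ['edu', 'sbuniv', 'www', 'COBACS', 'CIS', 'index.html']
```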
Hyper-Text Markup Language (HTML)
The language used to define web pages is HTML. To see an example, open another tab and navigate to the Southwest Baptist University CIS Department website. Once the page is open, right-click on it and select "View Source" (called "View Page Source" in some browsers); this shows the HTML code that was used to create the web page. A web page may contain hypertext (clickable text that serves as links). A link is simply a URL that points to another web page. Web pages and the links between them combine to form the Web.
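Because a link is just a URL stored in the page's HTML (in the `href` attribute of an `<a>` tag), a program can find a page's links by parsing its source. The sketch below uses Python's standard `html.parser.HTMLParser`; the sample page and the `LinkExtractor` class name are hypothetical, invented for illustration.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href URL of every <a> (anchor) tag in an HTML document."""
    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("href", "courses.html")]
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page source, similar to what "View Source" might show
page = """<html><body>
<h1>CIS Department</h1>
<p>Read about our <a href="courses.html">courses</a> or visit
<a href="https://www.sbuniv.edu/">the university home page</a>.</p>
</body></html>"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['courses.html', 'https://www.sbuniv.edu/']
```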
Finding Information on the Web
There are several ways to find information on the Web:
- Use a hierarchical system (directory) to find the URLs of pages that may have the information
- Use our knowledge to guess, e.g. start from apple.com and navigate to the page for the iPhone 5s
- Use a search engine: we look for information (wherever it is located) rather than for pages, and we may find information we did not know existed
How a Search Engine Works
A search engine is one of the main tools for locating resources on the Web. But have you ever thought about how one actually works? The following series of steps describes what happens, from collecting pages to presenting results for a query:
- Gather information: crawl the web
- Keep copies: cache web pages
- Build an index
- Understand the query
- Determine the relevance of each possible result to the query
- Determine the ranking of the relevant results
- Present the results
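The steps above can be sketched on a toy scale. This is a simplified illustration, not how any real search engine is implemented: the three-page `WEB` dictionary is hypothetical, "understanding the query" is reduced to splitting it into lowercase words, a page counts as relevant only if it contains every query word, and ranking simply counts how often the query words occur. Real engines use far more sophisticated relevance and ranking measures.

```python
import re
from collections import defaultdict

# A tiny, hypothetical web: URL -> (page text, URLs that the page links to)
WEB = {
    "a.html": ("Packet switching sends packets", ["b.html", "c.html"]),
    "b.html": ("Circuit switching needs a connection", ["a.html"]),
    "c.html": ("Packets, packets everywhere: switching", ["a.html"]),
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(start: str) -> dict[str, str]:
    """Steps 1-2: follow links from a start page and keep a copy (cache) of each page."""
    cache, frontier = {}, [start]
    while frontier:
        url = frontier.pop()
        if url in cache or url not in WEB:
            continue  # already visited, or a link to a page that does not exist
        text, links = WEB[url]
        cache[url] = text
        frontier.extend(links)
    return cache

def build_index(cache: dict[str, str]):
    """Step 3: an inverted index mapping each word to {url: number of occurrences}."""
    index = defaultdict(lambda: defaultdict(int))
    for url, text in cache.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index

def search(index, query: str) -> list[str]:
    terms = tokenize(query)  # Step 4: "understand" the query (here: just split it)
    if not terms:
        return []
    # Step 5: treat a page as relevant only if it contains every query term
    relevant = set.intersection(*(set(index.get(t, {})) for t in terms))
    # Step 6: rank relevant pages by total query-term occurrences (ties broken by URL)
    scores = {url: sum(index[t][url] for t in terms) for url in relevant}
    return sorted(relevant, key=lambda url: (-scores[url], url))

cache = crawl("a.html")
index = build_index(cache)
print(search(index, "packets switching"))  # Step 7: present the results -> ['c.html', 'a.html']
```

Crawling, caching, and indexing happen ahead of time; only steps 4 to 7 run when a user submits a query, which is why the index is built once and then reused.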
Measure of Important Pages
Once a search is performed, relevant pages are returned. However, not every relevant page is equally important. A page gains importance when other credible (important) pages link to it. One of Google’s innovations is PageRank: a measure of the “importance” of a page that takes into account the external references to it. The more important pages that link to a page, the more important that page is considered. For example, an online article from the New York Times would have a higher importance, or PageRank, than a personal blog, because more important pages link to that article.
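The idea that a page is important when important pages link to it can be computed iteratively: every page starts with equal importance, and on each round every page passes a share of its current importance to the pages it links to. The sketch below is a simplified power-iteration version of PageRank; the damping factor of 0.85 is the value commonly cited for PageRank but is an assumption here, and the link graph is hypothetical, built so that many pages link to "news" while only one links to "blog".

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Estimate PageRank by power iteration.

    links maps each page to the pages it links to; every linked page must also be a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        # every page gets a small base share (the "random jump" part of PageRank)
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            targets = outgoing or pages  # a page with no links shares with every page
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share  # a link passes on part of the linker's importance
        rank = new_rank
    return rank

# Hypothetical link graph
links = {
    "news":   ["portal"],
    "portal": ["news"],
    "uni":    ["news", "portal"],
    "blog":   ["news"],
    "reader": ["blog", "news"],
}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(f"{page:7} {rank[page]:.3f}")
```

In this graph "news" ends up with the highest score and "blog" with a much lower one, mirroring the New York Times versus personal blog comparison; the scores always add up to 1.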