Knowledge Science

From Wikibooks, the open-content textbooks collection

Knowledge Science is the discipline of understanding the mechanics through which humans and software-based machines "know," "learn," "change," and "adapt" their own behaviors. Throughout recorded history, knowledge has been made explicit through symbols, text and graphics on media such as clay, stone, papyrus, paper and, most recently, digitally stored representations. The digital effort began in the early 1970s, when knowledge science was recognized as a vigorous field of study, beginning with the development of natural language learning programs funded by the National Science Foundation (NSF). Today, knowledge science experts are engaged in a debate between two views: that meaning is represented by language-based propositions that adhere to universal truth-conditions, and the quantum relativist view that meaning exists as a condition under which it can be verified and certified as acceptable without regard to universal truth-conditions. Knowledge science and knowledge representations encompass philosophical, epistemological, and ontological considerations. This article presents an overview of this new field of study, relying on historical data to provide insight and understanding, while addressing the two schools of knowledge science.

Early Beginnings

The challenge of digitally communicating knowledge (as opposed to simple who, what, when, where and how much data and information) was first taken up in 1969/70 under the sponsorship of the National Science Foundation (NSF). Three initial projects were founded for the study of "Natural Language Processing": the Physics Computer Development Project at the University of California, Irvine, headed by Alfred Bork with Research Assistant Richard L. Ballard; the MITRE TICCIT Project, conducted at the University of Texas and later moved to the University of Utah; and the PLATO Project at the University of Illinois, Champaign. Over 140 natural language dialogue programs were created between 1970 and 1978. UCI's Physics Computer Development Project developed approximately 55 educational programs and spearheaded development throughout the UC system. Initial projects were conducted on the Teletype Model 33, a paper-tape punch machine that operated at 110 baud. In 1976, NSF cited Richard L. Ballard, then co-director of the Physics Computer Development Project at Irvine, for the "first application of artificial intelligence to conceptual learning" (Expert Systems).

In 1984, Doug Lenat started the Cyc Project with the objective of codifying, in machine-usable form, the millions of pieces of knowledge that comprise human common sense. CycL, its representation language, presented a proprietary knowledge representation schema that utilized first-order relationships.

In 1989, this effort permitted CERN (Conseil Européen pour la Recherche Nucléaire), the high-energy particle physics laboratory in Switzerland, to develop an instance of the Cyc project for its needs. This set into motion the work of Tim Berners-Lee, then at CERN, who conceived and developed anchor tags, also called hyperlinks, to link text and documents.

Thereafter, in August 1991, Tim Berners-Lee released the World Wide Web project to the world in the Usenet forum alt.hypertext. This development ultimately enabled the decentralized publishing and exchange of information through the use of the HyperText Markup Language (HTML) and its accompanying transmission protocol, the Hypertext Transfer Protocol (HTTP).

In the same year, Richard L. Ballard developed a new technology called Mark 2, a system that used a single relational database table with four fields: value-model, object, attribute and value. The fields stored unique identifier codes that were pre-defined, stored, and later looked up to populate the table. Ballard called his value-based knowledge system "theory-based semantics," signaling a divide between language-based and theory-based knowledge representation systems. Mark 2 was deployed through "statements of work" on projects for the DoD, the U.S. Navy and the U.S. Air Force.
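
The four-field layout described above can be sketched roughly as follows. This is an illustration only, not Ballard's actual Mark 2 schema: the table names (`codes`, `facts`), the column names, the helper functions and the sample equipment data are all assumptions introduced here. The sketch uses Python's built-in `sqlite3` module, with a lookup table of pre-defined identifier codes and a single four-column table whose fields hold only those codes.

```python
import sqlite3


def build_store():
    """Create an in-memory store: a code lookup table plus one four-field table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE codes (id INTEGER PRIMARY KEY, label TEXT UNIQUE NOT NULL);
        CREATE TABLE facts (
            value_model INTEGER NOT NULL REFERENCES codes(id),
            object      INTEGER NOT NULL REFERENCES codes(id),
            attribute   INTEGER NOT NULL REFERENCES codes(id),
            value       INTEGER NOT NULL REFERENCES codes(id)
        );
        """
    )
    return conn


def code_for(conn, label):
    """Return the identifier code for a label, defining the code first if needed."""
    conn.execute("INSERT OR IGNORE INTO codes (label) VALUES (?)", (label,))
    row = conn.execute("SELECT id FROM codes WHERE label = ?", (label,)).fetchone()
    return row[0]


def add_fact(conn, value_model, obj, attribute, value):
    """Store one row; every field holds an identifier code, never the text itself."""
    ids = [code_for(conn, label) for label in (value_model, obj, attribute, value)]
    conn.execute("INSERT INTO facts VALUES (?, ?, ?, ?)", ids)


def read_facts(conn):
    """Look the codes back up so each row is returned as readable labels."""
    query = """
        SELECT m.label, o.label, a.label, v.label
        FROM facts f
        JOIN codes m ON m.id = f.value_model
        JOIN codes o ON o.id = f.object
        JOIN codes a ON a.id = f.attribute
        JOIN codes v ON v.id = f.value
        ORDER BY f.rowid
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data, used only to exercise the sketch.
conn = build_store()
add_fact(conn, "equipment", "pump-7", "status", "running")
add_fact(conn, "equipment", "pump-7", "location", "bay-2")
facts = read_facts(conn)
```

Because repeated labels such as "equipment" and "pump-7" share one code each, the two rows above need only six entries in the lookup table.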

An Emerging Knowledge Science

In October 1994, the World Wide Web Consortium (W3C) was founded at the Massachusetts Institute of Technology, Laboratory for Computer Science (MIT/LCS) in collaboration with CERN, where the World Wide Web originated. This effort received support from the Defense Advanced Research Projects Agency (DARPA) and the European Commission. The organization would serve as a focal point for standardizing mechanisms of information exchange over the Internet. Thereafter, in 1995, Richard L. Ballard released Mark 2 v2.0, which used the earlier theory-based semantic representation technology but added an ad hoc, second-order "if this, then that" knowledge representation. Mark 2 v2.0 was deployed through "statements of work" with the DoD, U.S. Navy, NASA, DEA, FAA, Office of the White House and private enterprise.

Then, in 1995, the National Center for Supercomputing Applications (NCSA) and the Online Computer Library Center (OCLC) held a joint workshop in Dublin, Ohio, to discuss metadata semantics. At this event, called simply the "OCLC/NCSA Metadata Workshop", more than 50 people discussed how a core set of semantics for Web-based resources would be extremely useful for categorizing the Web for easier search and retrieval. The participants dubbed the result "Dublin Core metadata" after the location of the workshop. Shortly thereafter, also in 1995, HTML, inspired by SGML and incorporating hyperlinks, was first standardized. This language allowed the transmission and viewing of web pages and led to an explosion in the popularity of the Internet.

Advancing previous efforts, in 1998 the Extensible Markup Language (XML) was introduced, authored predominantly by Tim Bray and C.M. Sperberg-McQueen. Its goal was to combine the approachable simplicity of HTML with the extensibility of SGML while avoiding the shortcomings of each. Its popularity sparked widespread interest in tagging text in an effort to advance the goal of achieving machine-based knowledge representation.

At this same time, while conducting parallel research and advanced engineering on the Mark 2 product, Richard L. Ballard completed the last of 50 projects, called the Avionics Prototype Tool. That project demonstrated that theory-based knowledge representation would support virtual integration of disparate database systems by modeling theory, the conditional reasoning power of the human brain, to understand and make sense of the data content stored within database systems. The project, in turn, demonstrated that pre-defined schemas, language-based conventions, and a reliance on self-consistent logic were the cause of the complexity and poor interoperability of conventional systems. In response to these findings, Ballard redirected his work to the development of a new theory-based semantics technology called Mark 3. Mark 3 would demonstrate that knowledge could be explicitly represented within the machine environment, and that software-based machines could reason with this content to answer value-based questions the way people do.

Language-based Knowledge Representation

Language-based knowledge representation was advanced in 1999 when Resource Description Framework (RDF), an XML-based extension of an earlier 1996 PICS technology, was deployed to enhance content description. RDF drew upon submissions by Microsoft, Netscape and the Dublin Core/Warwick Framework. RDF is used primarily to organize and express document properties. The specific needs of different resource types, such as authorization structures or versioning, necessitated a schema language similar to XML DTD, called the RDF Schema (RDF-S) language.
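
As a rough illustration of how RDF expresses document properties, each statement can be viewed as a subject, predicate and object. The sketch below is a simplification under stated assumptions: it uses plain Python tuples rather than the RDF/XML syntax of the period or any RDF library, the document URI and property values are hypothetical, and the serializer does no escaping. The property names come from the Dublin Core element set.

```python
# Dublin Core element namespace.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical document URI, used only for illustration.
DOC = "http://example.org/reports/annual-2003"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (DOC, DC + "title", "Annual Report"),
    (DOC, DC + "creator", "J. Smith"),
    (DOC, DC + "date", "2003-09-08"),
]


def properties_of(subject, graph):
    """Collect the predicate -> object pairs stated about one subject."""
    return {p: o for s, p, o in graph if s == subject}


def to_ntriples(graph):
    """Write each triple as a simplified N-Triples line with a literal object."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in graph)


props = properties_of(DOC, triples)
print(to_ntriples(triples))
```

Keeping the statements as uniform triples is what lets generic tools query document properties without knowing anything about the documents themselves.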

On January 12, 1999, Executive Order 13111 was signed, tasking the United States Department of Defense to take the lead in working with other federal agencies and the private sector to develop common specifications and standards for technology-based learning. SCORM (Sharable Content Object Reference Model) was developed as a way to integrate and connect the work of these organizations in support of the DoD's Advanced Distributed Learning (ADL) Initiative.

In 2001, a new markup language called DAML+OIL was specified. It combined the DARPA Agent Markup Language (DAML) with the Ontology Inference Layer, also called the Ontology Interchange Language (OIL). The specification was based on RDF, XML and SGML, and provided a means for description logic to be integrated into the language for the purpose of extracting meaning from the ontological framework.

Within two years, Web Ontology Language (OWL) emerged for defining and instantiating Web ontologies. OWL was designed for use by applications that need to process the content of information instead of just presenting information to humans. It facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL has since evolved into OWL-S and other formats.

Currently, language-based researchers are working on a numerical representation that uses integers to define a language-independent structured knowledge exchange.

Theory-based Knowledge Representation

Building on previous research and development, Richard L. Ballard continued the development and articulation of a Physical Theory of Knowledge and Computation, delivered as a keynote presentation entitled "The Future of Publishing Technology - Part 1" at the Seybold conference held September 8-12, 2003. Within this presentation, Ballard defined the bit capacity of human thought, how theory and information produce situation awareness, and how software-based machines can faithfully represent every form of knowledge and reason with that knowledge the way people do.

Theory-based semantics is the phrase used by Richard L. Ballard to describe knowledge representations based on the premise that the binding element of human thought is "theory," and that theory constrains the meaning of concepts, ideas and thought patterns according to their associative relationships. For this reason, knowledge stores of theory-based semantic representations do not just represent meaning; they precisely embody the very knowledge they are intended to represent: they KNOW. Ballard's Knowledge Science says that knowledge (Knowledge = Theory + Information) is any input of theory or facts that reduces question uncertainty. From this perspective, theory represents 85% or more of knowledge, with information (data, the facts of situations and circumstances) representing 15% or less. The built-in, a priori intelligence of theory also defines concept values and purpose, which in turn determine each concept's influence on every other concept, idea or thought pattern with which it is associated. Theory-based semantics holds that this state of intelligence is valid whether a concept is held in the mind or represented within the machine environment. Once learned by people or machines, theory endures with great tenacity, changing only when new paradigms of thought subsume or replace the well-justified theories that we use to understand our world. The endurance of data and facts, however, is quite different. They are in a constant state of flux as situations and circumstances change dynamically from one moment to the next. From the standpoint of theory-based semantics, most appearances of change are not new; they are only new to us. The facts may be different, but most often the theory that defines situations and circumstances remains the same.

Foundations in Theory-based Representations

Biological science and philosophical disciplines such as epistemology offer unique viewpoints on how we "know," but the definition of knowledge, and exactly how we know, is an ongoing debate among many in the academic and scientific communities. In the case of theory-based semantics, the most influential foundations of understanding come from Carl Sagan and his landmark production Cosmos: A Personal Voyage. In Episode 11, "The Persistence of Memory", Sagan presents the argument that once the requirements for evolution exceeded the capacity of DNA, nature emerged with the miracle of the brain. Sagan argues that what makes the brain unique is its capacity to store knowledge and to remember, with a capacity many orders of magnitude greater than that of DNA. Sagan presented neurological explanations of how the brain works, connecting the brain's memory function to the conclusion that memory and repeated patterns of thought, such as those produced by theory (the conditional reasoning power that we learn from enculturation, education, life experience and deep analytical thought), work hand in hand. Theory-based semantics was born out of these teachings and was originally formulated by Ballard between 1987 and 1993, first while at UC Irvine working on natural language systems and later while developing complex decision support systems for the U.S. Government. Ballard reasoned that the brain, and its methodology for storing and remembering content, was the ideal model for software systems that KNOW.

KNOWLEDGE = THEORY + INFORMATION

Ballard's descriptive formula, "Knowledge = Theory + Information," is a core principle underlying theory-based semantic technologies.

The knowledge formula asserts that knowledge is an expression of knowingness that results from the interaction between theory and information. Each depends on the other: both must be present before an expression of knowledge is achieved. For this reason, theory and information are essential to theory-based semantic representations and to a machine's capacity to KNOW and reason like people. Knowledge can be faithfully represented from documents, drawings, illustrations, forms, spreadsheets, books, contracts, policies and procedures, reference sources and from the very minds of employees, consultants and subject experts. This is possible because every metaphysical, physical and dataform representation can be faithfully modeled using theory-based semantics. The following breakdown helps to illustrate the point.

THEORY represents 85% or more of knowledge. It is the thought element that gives meaning to concepts, ideas and thought patterns by justifying the relationships and meanings of the facts of situations and circumstances. Theory answers our "how," "why" and "what-if" questions (whether they are conscious or unconscious). Circumstances and situations may change rapidly, but the underlying theory that gives them meaning does not. Theory is a priori (known before the fact). It is learned through enculturation, education, life experience and deep analytical thought. It is the theories in our brains that shape our behavior and the way we assess our world. Well-justified theories, such as those most successfully validated by science, engineering, business, law and so forth, are the most valuable. Theory considers all possibilities regardless of a given situation or circumstance. For this reason, theory is universal. Learn once, use forever.

INFORMATION represents 15% or less of knowledge. Information consists of the instances of fact that exist in time and space and that can be processed by the senses, measured and counted. It is the "who," "what," "when," "where" and "how much" facts of situations and circumstances. Information is a posteriori (known after the fact). The data and objects commonly captured and transported through conventional data systems are information, not knowledge, though they may contain knowledge. People are required to use the theory in their brains to make sense of the data in conventional data systems.
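
The division of labor between theory and information can be loosely illustrated in code. The sketch below is an analogy only and does not depict how any theory-based semantic system is implemented: the theory is a reusable rule (here the familiar simple-interest formula, chosen purely as an example), the information is a set of hypothetical facts, and an answer appears only when the two are combined. All function names and figures are assumptions introduced for illustration.

```python
def simple_interest(principal, annual_rate, years):
    """Theory: a reusable rule relating principal, rate and time to interest owed."""
    return principal * annual_rate * years


def answer(theory, facts):
    """Knowledge: an answer that exists only once theory is applied to facts."""
    return theory(**facts)


# Information: hypothetical "how much" and "when" facts of two situations.
loan_facts = {"principal": 1000.0, "annual_rate": 0.05, "years": 3}
new_facts = {"principal": 2500.0, "annual_rate": 0.04, "years": 2}

# The facts change from one situation to the next; the theory is reused unchanged.
first_interest = answer(simple_interest, loan_facts)
second_interest = answer(simple_interest, new_facts)
```

Neither the rule alone nor the figures alone answer the question "how much interest is owed?"; in this analogy that is the sense in which both theory and information are required.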

Enduring Value of Knowledge

Theories endure for decades, centuries and even thousands of years. Most of the theories in our brains that shape our understanding of basic social relationships were conceived 20,000 to 40,000 years ago and passed down through the generations. Most of our core financial theories, such as "buy low, sell high," or the principles of simple interest and compound interest, were conceived and put into use by our ancestors millennia ago. New theories, such as social trends and fads, are constantly being conceived, but only those that are most practical to society endure.