Sunday, February 24, 2008

Things to know about the Semantic Web

For future reference, Bernard Lun on the ReadWriteWeb blog has a post entitled "11 Things To Know About Semantic Web" that makes a number of key points about the current state of the Semantic Web. The one with the most potential for me personally was:

3. If you have a firm grasp of the theoretical underpinnings of the semantic web, things like RDF, tuples, Sparql and OWL that make my brain hurt, you will be able to charge a fat premium in consulting fees for a while, as not many people really understand this stuff. But make hay while the sun shines, as some entrepreneur will surely figure out how to abstract this stuff and make it accessible for the masses.

Yes, that is what I aim to do: "make it accessible for the masses." I want to be that entrepreneur, or at least one of them.

There is a lot of great research work going on with the Semantic Web, and some initial industrial and commercial uses (e.g., RDF for blog web feeds), but most of the true power of the Semantic Web is still very far from being ready for general consumption by "the masses."

Consumers are the ultimate audience that I am really after, with software agents mediating the interface between consumers and Semantic Web data.
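For anyone who, like the ReadWriteWeb author, finds RDF and SPARQL brain-hurting, the core idea is smaller than it sounds: data is a set of subject-predicate-object triples, and a query is a set of triple patterns with variables that must all match consistently. The toy below is plain Python, not rdflib and not a real SPARQL engine; the example triples and the `?`-prefixed variable convention are illustrative assumptions, and prefixed names like `dc:creator` are just strings here.

```python
# A toy triple store and pattern matcher, in the spirit of RDF + SPARQL.
# Not a real RDF library; all data below is hypothetical.

def is_var(term):
    """Terms starting with '?' are query variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")

def match_pattern(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` matches `triple`, or return None."""
    result = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None  # variable already bound to something else
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result

def query(triples, patterns):
    """Return every variable binding that satisfies all patterns together."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

triples = [
    ("post:42", "dc:creator", "Jack Krupansky"),
    ("post:42", "dc:title", "Things to know about the Semantic Web"),
    ("post:43", "dc:creator", "Jack Krupansky"),
    ("post:43", "dc:title", "Space, the final frontier?"),
]

# Roughly: SELECT ?title WHERE { ?post dc:creator "Jack Krupansky" . ?post dc:title ?title }
for row in query(triples, [("?post", "dc:creator", "Jack Krupansky"),
                           ("?post", "dc:title", "?title")]):
    print(row["?title"])
```

A consumer-facing agent would hide exactly this kind of machinery behind a simpler question such as "what has this author written?"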

I will have more to say about this once I start my new blog dedicated to Semantic Web technology.

-- Jack Krupansky

Monday, December 24, 2007

Space, the final frontier?

We are all familiar with the intro to the old Star Trek show, "Space, the final frontier", but in the virtual universe of the online "world" what is the nature of "space" and is it really a frontier?

Most people would agree that linear distance is completely irrelevant in the online world, where computer systems thousands of miles apart might as well be in the next room and a click could take you to data less than an inch away or a world away. An exception is that once we start communicating outside of the physical earth (e.g., Mars or deep space probes), latency becomes a very real issue.
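The contrast between terrestrial and interplanetary latency is easy to put in numbers. This is a back-of-the-envelope sketch: it computes only the vacuum light-travel time (real networks are slower, since light in fiber travels at roughly two-thirds of c and routing adds more), and the Earth-Mars distances are approximate closest- and farthest-approach figures.

```python
# One-way light-travel time: a hard lower bound on signal latency.
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition of the metre

def light_delay_seconds(distance_km):
    """Minimum one-way signal delay over `distance_km`, in seconds."""
    return distance_km / SPEED_OF_LIGHT_KM_S

# ~20,000 km is about half of Earth's circumference: the farthest apart
# two points on the surface can be.
print(f"Earth, antipodes: {light_delay_seconds(20_000) * 1000:.0f} ms")
# Approximate Earth-Mars distances at closest and farthest approach.
print(f"Mars, closest:    {light_delay_seconds(54.6e6) / 60:.1f} min")
print(f"Mars, farthest:   {light_delay_seconds(401e6) / 60:.1f} min")
```

On Earth the bound is tens of milliseconds; to Mars it is roughly 3 to 22 minutes each way, which is why distance stops being irrelevant once we leave the planet.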

Density of "space" (in terms of computing nodes or locations of files) is similarly completely irrelevant in the online world.

Space in terms of quantity of bits and bytes and data fields and database records is also completely irrelevant in the online world, with the exceptions of 1) occasional lack of local storage space due to artificial "quotas", and 2) latency and access time.

The next form of space is page layout. In print, writers have very hard and fixed boundaries for the amount of text and graphics that can be included in their stories. Getting an extra inch or page requires mighty effort. The Web page has no such limits. As such, space on web pages is effectively infinite and not a frontier at all.

But, there is another form of space online, screen size. The client device, typically a PC, does in fact have a relatively limited amount of space available. Sure, you can scroll and page through your large web pages, but there is a usability factor at work as well. Most "readers" do not read sequentially at all, but scan and bounce around. Their attention span for "viewing" a web page is limited, so asking them to scroll and page and click to get to the rest of the content is frequently too much to ask. The average reader has an unlimited number of content sources and will migrate to wherever screen size limitations are most respected.

Blogs and RSS readers introduce another layer of space constraint. Sure, you can still page and link to get to unlimited amounts of space, but there is a clear premium value given to terse and concise blog posts that convey the essential meaning of a post in a single "view" in a small subset of the total screen space without demanding extra effort on the part of the user.

Finally, there is an even more intense constraint, or frontier if you will, imposed by accessing online content on a handheld mobile device such as a smartphone. Sure, you can certainly zoom and scroll and page and link to access an infinite amount of content, but there is a clear premium value given to content providers who can format and express essential meaning in small-screen chunks.
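The purely mechanical half of "small-screen chunks" can be sketched with Python's standard `textwrap` module: wrap text to a screen width and group the lines into screen-sized views. The `screen_chunks` name and its default width and line count are hypothetical choices, not any real device's dimensions.

```python
import textwrap

def screen_chunks(text, width=40, lines_per_screen=8):
    """Split text into screen-sized chunks of wrapped lines.

    Each chunk holds at most `lines_per_screen` lines of at most `width`
    characters. The defaults are hypothetical, not any real device's size.
    """
    lines = textwrap.wrap(text, width=width)
    return [lines[i:i + lines_per_screen]
            for i in range(0, len(lines), lines_per_screen)]

post = ("Blogs and RSS readers introduce another layer of space constraint. "
        "There is a clear premium value given to terse and concise blog posts "
        "that convey the essential meaning of a post in a single view.")

for n, chunk in enumerate(screen_chunks(post, width=32, lines_per_screen=4), start=1):
    print(f"--- screen {n} ---")
    print("\n".join(chunk))
```

The sketch only splits text; deciding what essential meaning belongs in that first screen, which is the real frontier, remains an editorial problem rather than a wrapping one.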

So, in some sense the online world frees us of the limits and frontiers of three-dimensional and print space, but our access devices and human perceptual limitations give us new frontiers to tackle. We can look forward to a wealth of innovation in how to express, chunk, format, view, and navigate within online content in the years to come. Even the vaunted iPhone only scratches the surface. Even Google has not yet mastered the small screen.

Given the ease with which we can construct large computer networks with vast amounts of data storage and the vast, unlimited expanse of the Web, it certainly does feel as if the small screen of handheld devices is in fact a true frontier where opportunity is unlimited and existing solutions are quite limited.

-- Jack Krupansky

Sunday, October 07, 2007

How safe is your personal data in the hands of Web-based vendors?

An article in The New York Times by Denise Caruso entitled "Securing Very Important Data: Your Own" illustrates the benefits and downsides of sharing personal data on the Web:

This type of sensitive, sometimes proprietary information was once locked up on hard drives or in file cabinets far away from anything resembling a global or even a local distribution network. Yet none of the users flocking to these services seem perturbed that they have relinquished personal control over this data to companies that, even with the best of intentions, may not be able to keep it safe.

The incidence of data theft -- from wallets to data breaches, computer viruses or Dumpster diving -- is soaring. This year alone, the security of nearly 77 million Americans' records has been breached, according to the Identity Theft Resource Center in San Diego, nearly a fourfold increase over 2006.

Governments around the world are passing and enforcing laws that increasingly hold businesses financially accountable for avoidable data losses. Just last month, the TJX Companies, which owns T.J. Maxx, Marshalls and other retail stores, made a settlement offer, subject to court approval, to victims of a huge data breach, in which 45.7 million customers' credit- and debit-card data was exposed to identity thieves.

As a result, some security experts are starting to ask whether the "identity data-for-services" business model, which is the engine for virtually all e-commerce companies, is a fair trade -- not just for consumers, but for business as well.

In response, they are coming up with new protocols and frameworks for collecting, using and governing identity data. Given that virtually all businesses today collect and use these kinds of data, they aim to shift the status quo in ways that could help companies both improve their reputations with customers and avoid the mounting legal liabilities that now face companies that lose control of customer data.

"The myth is that companies have to know all this information about you in order to do business with you," said Drummond Reed, vice president for infrastructure at Parity Communications, an identity technology company in Needham, Mass. "But from a liability perspective, the less I know about my customers the better."

Parity is sponsoring a number of open software projects to shift more control to the users whose identity data is at risk. One of the most intriguing is called the CloudTripper Project, which is developing a way for individuals to "take their data with them" as they traverse the Web, just as they keep their wallets and checkbooks with them as they move around in the real world.

My own solution is to propose a research effort for something I refer to as The Consumer-Centric Knowledge Web. Cobbling together an ad-hoc approach in a piecemeal fashion is likely to cause more harm than good. OTOH, the more ad-hoc efforts that go forward and highlight the inherent problems in this area, the quicker people will warm up to the need for a hard-core research effort such as I have proposed.
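One way to picture the article's idea of users who "take their data with them," combined with Reed's point that less is better from a liability perspective: the user holds the full profile and releases only the fields a given vendor has been granted and actually asks for. This is a hypothetical sketch; `PersonalDataWallet`, its methods, and the field names are invented for illustration and do not describe the CloudTripper Project or any Parity Communications software.

```python
from dataclasses import dataclass, field

@dataclass
class PersonalDataWallet:
    """Hypothetical user-held store of personal data (not a real product or API)."""
    profile: dict
    # Which fields the user has agreed to share with each vendor.
    grants: dict = field(default_factory=dict)
    # Log of what was released to whom, so the user can audit it later.
    disclosures: list = field(default_factory=list)

    def grant(self, vendor, fields):
        """Record the user's consent to share `fields` with `vendor`."""
        self.grants.setdefault(vendor, set()).update(fields)

    def disclose(self, vendor, requested):
        """Release only requested fields that were granted to this vendor."""
        allowed = self.grants.get(vendor, set())
        released = {f: self.profile[f] for f in requested
                    if f in allowed and f in self.profile}
        self.disclosures.append((vendor, sorted(released)))
        return released

wallet = PersonalDataWallet(profile={
    "name": "Pat Example",
    "shipping_address": "1 Main St",
    "birth_date": "1980-01-01",
})
wallet.grant("bookstore", {"name", "shipping_address"})

# The vendor asks for more than it needs; only the granted fields come back.
print(wallet.disclose("bookstore", ["name", "shipping_address", "birth_date"]))
```

The design choice worth noting is that the default is to disclose nothing: a vendor with no grant gets an empty result, so the vendor never holds data it does not need.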

-- Jack Krupansky

Sunday, May 27, 2007

One-way trip to Mars

I had an idea a couple of years ago and neglected to blog about it, and now I read that somebody else has proposed the same idea: one-way trips for human colonists to Mars. Writing in response to John Brockman's 2007 Edge question "What are you optimistic about?", physicist Paul Davies of Arizona State University, author of The Cosmic Jackpot, describes in his essay "A One-Way Ticket To Mars" the significant logistical benefits of sending supplies and people to Mars without the burden of returning them to Earth.

There are a zillion interesting issues that crop up, but I find the concept quite appealing and would consider it myself. It's an opportunity to be a true pioneer, a real colonist.

In truth, most of the early colonists and immigrants to America came here knowing that going back was not an option.

It would be interesting to contemplate a Mars Colony simulator. It wouldn't be practical to directly simulate the lower gravity, but it should be quite practical to simulate the isolation, the atmosphere, the sights and views, the sounds, the terrain, the delayed communication with Earth.

As a starting point, do we have the necessary resolution to create a dome of hi-res display screens that would simulate the view from inside a dome on the surface of Mars, as well as a "vehicle" (simulator) with displays for windows showing the landscape as the vehicle "moves"? One issue is that current display technologies don't have the raw brightness to simulate the Sun. We can also draw on the experience of Biosphere 2 in constructing a simulated environment.
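The delayed-communication part of such a simulator is the easiest piece to prototype in software. A minimal sketch, assuming a fixed one-way delay (the real light-time delay varies continuously, roughly 3 to 22 minutes depending on where the planets are in their orbits) and a simple priority queue keyed on arrival time; `DelayedLink` and the 12-minute figure are hypothetical.

```python
import heapq

class DelayedLink:
    """Simulated Earth-Mars link: messages arrive only after a fixed one-way delay.

    The constant delay is a simplifying assumption; the real delay changes
    with the planets' positions.
    """

    def __init__(self, delay_minutes):
        self.delay = delay_minutes
        self._queue = []  # heap of (arrival_time, seq, sender, text)
        self._seq = 0     # tie-breaker: equal arrival times keep send order

    def send(self, now, sender, text):
        heapq.heappush(self._queue, (now + self.delay, self._seq, sender, text))
        self._seq += 1

    def receive(self, now):
        """Return all messages whose arrival time has come, in arrival order."""
        arrived = []
        while self._queue and self._queue[0][0] <= now:
            arrival, _, sender, text = heapq.heappop(self._queue)
            arrived.append((arrival, sender, text))
        return arrived

link = DelayedLink(delay_minutes=12)
link.send(0, "Mars", "Dust storm building to the west.")
print(link.receive(10))  # nothing yet: the message is still in transit
print(link.receive(12))  # delivered at minute 12
# Even an immediate reply from Earth lands back on Mars at minute 24.
link.send(12, "Earth", "Copy. Stay inside the dome.")
print(link.receive(24))
```

The round trip, not the one-way delay, is what colonists would feel: every question to Earth costs twice the delay before an answer can possibly arrive.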

To be honest, merely being an astronaut isn't that exciting to me, but being a Mars Colonist is an entirely different matter.

-- Jack Krupansky

Sunday, May 20, 2007

Quantum information technology

One far-out field that may have dramatic implications for computing and software design in the coming decades is the emerging research in quantum information technology.

For example, from the Multidisciplinary University Research Initiative (MURI) program of Massachusetts Institute of Technology (MIT) and Northwestern University (NU), we read that:

Quantum superposition and quantum entanglement are the bedrock on which new theoretical paradigms for information transmission, storage, and processing are being built. The preeminent obstacle to the development of quantum information technology is the difficulty of transmitting quantum information over noisy and lossy quantum communication channels, recovering and refreshing the quantum information that is received, and then storing it in a reliable quantum memory.

With support from the Multidisciplinary Research Program of the University Research Initiative (MURI), we have assembled a truly interdisciplinary team from researchers at MIT and Northwestern University to overcome this obstacle. The focus of our program is an architecture we have established for long-distance, high-fidelity qubit teleportation. Its key elements are:

  • ultrabright, narrowband sources of polarization-entangled photon pairs;
  • long-distance transmission of entangled photons over standard telecom fiber;
  • qubit storage and processing in trapped atom quantum memories.

Although some of these concepts may make more obvious sense down at the bit, byte, chip, and machine language levels, I suspect that the concepts may have even greater potential if they can be transplanted to the level of software, software components, and software agents.
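To make "qubit teleportation" a little more concrete at the bit level, here is a small state-vector simulation of the textbook teleportation protocol in plain Python. It is only a classical simulation under simplifying assumptions: it enumerates all four of Alice's possible measurement outcomes instead of sampling one at random, and it says nothing about the MIT/Northwestern hardware (photon sources, telecom fiber, trapped-atom memories). The function names are illustrative.

```python
import math

N_QUBITS = 3  # qubit 0: state to send; qubits 1 and 2: shared entangled pair
S = 1 / math.sqrt(2)

def bit(index, qubit):
    """Value of `qubit` (0 = leftmost) in the basis state numbered `index`."""
    return (index >> (N_QUBITS - 1 - qubit)) & 1

def flip(index, qubit):
    """Basis state `index` with `qubit` flipped."""
    return index ^ (1 << (N_QUBITS - 1 - qubit))

def apply_cnot(state, control, target):
    """Flip `target` in every basis state where `control` is 1."""
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        new[flip(i, target) if bit(i, control) else i] += amp
    return new

def apply_hadamard(state, qubit):
    """|0> -> (|0> + |1>)/sqrt(2) and |1> -> (|0> - |1>)/sqrt(2) on one qubit."""
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        j = flip(i, qubit)
        if bit(i, qubit) == 0:
            new[i] += S * amp
            new[j] += S * amp
        else:
            new[j] += S * amp
            new[i] -= S * amp
    return new

def teleport(alpha, beta):
    """Teleport alpha|0> + beta|1> from qubit 0 to qubit 2.

    Returns {(m0, m1): (probability, (b0, b1))} for all four of Alice's
    measurement outcomes, with Bob's corrections already applied.
    """
    state = [0j] * 8
    # (alpha|0> + beta|1>) tensor (|00> + |11>)/sqrt(2), indexed as |q0 q1 q2>
    state[0b000] = state[0b011] = alpha * S
    state[0b100] = state[0b111] = beta * S
    # Alice entangles her qubit with her half of the pair, then applies H.
    state = apply_cnot(state, 0, 1)
    state = apply_hadamard(state, 0)
    results = {}
    for m0 in (0, 1):
        for m1 in (0, 1):
            # Project onto Alice's outcome; what remains is Bob's qubit.
            b0 = state[(m0 << 2) | (m1 << 1)]
            b1 = state[(m0 << 2) | (m1 << 1) | 1]
            norm = math.sqrt(abs(b0) ** 2 + abs(b1) ** 2)
            b0, b1 = b0 / norm, b1 / norm
            if m1:  # Alice's two classical bits tell Bob to apply X ...
                b0, b1 = b1, b0
            if m0:  # ... and then Z
                b1 = -b1
            results[(m0, m1)] = (norm ** 2, (b0, b1))
    return results

alpha, beta = 0.6, 0.8j  # any pair with |alpha|^2 + |beta|^2 = 1
for outcome, (prob, (b0, b1)) in teleport(alpha, beta).items():
    recovered = abs(b0 - alpha) < 1e-9 and abs(b1 - beta) < 1e-9
    print(outcome, round(prob, 3), recovered)
```

Each outcome occurs with probability 0.25 and, after Bob's corrections, his qubit matches the original state. Note that two ordinary classical bits must still travel from Alice to Bob, so teleportation does not beat the speed of light.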

Try to imagine what quantum information technology might mean at the level of the Semantic Web and Web Services.

Try to imagine a large number of swarms of software agents interacting via the exchange and sharing of quantum information and built upon the concept of quantum entanglement.

-- Jack Krupansky

Sunday, May 13, 2007

What are the kids up to?

I am in the middle of reading John Brockman's Edge question for 2007: What are you optimistic about?, and although it is all very interesting, it strikes me that almost all of these "visions" are rather dated and even somewhat stale, probably because these are the ideas that people of the "boomer" generation grew up with in the 1960's, 1970's, 1980's, and even most of the 1990's. Enough of the stuff already. What I really want to know is: What are the kids up to? Not in the sense of what toys do they play with and what tools do they work with, but what ground are they beginning to break and what visions of the future do they have that are their own creation and not something that was spoon-fed to them or rammed down their throats by a well-meaning but misguided elite.

By "kids", specifically I mean young people who:

  • Grew up with the Internet and the Web as their earliest significant computing experience, or at least since they were juniors in high school
  • Experienced 9/11 while in high school or as college freshmen, at a time when it had a chance to dramatically shape the way they started to view the geopolitical world
  • Just assume that global warming and climate change are "real" since the concepts were not "new" to them even when they were juniors or seniors in high school
  • Have been exposed to open source software in college
  • Have had a cell phone since high school and most of their classmates in high school had cell phones
  • Are no older than 25 (or maybe 26 or 27) and consider people who are 28 or 29 or 30 as already "too old" to "understand"
  • Are not deeply attracted to and attached to traditional politics and political parties such as the Republicans and the Democrats, and have their own politics and world view
  • Have been blogging since high school
  • Since high school have had teachers and professors who are challenging traditional views of economics, politics, and social structures

What I am interested in is:

  • What fields of intellectual study are they most attracted to?
  • What aspects of computing excite them the most?
  • Are they breaking any new ground, or simply "refashioning the wheel"?
  • What are examples of computing breakthroughs by the 20- to 25-year-olds?
  • What are some hard-core examples of great leaps that kids have made compared to Ray Kurzweil, Dan Bricklin, Bill Gates, Steve Jobs, Steve Wozniak, Larry Ellison, Bill Joy, et al when they were of this same age (20-25)?

Is it really true that "change is accelerating"? If so, we should see a much larger list of breakthroughs than for those "old-timers."

I'd also like to see two lists: one for applications, but primarily one for underlying technological fundamentals. Applications like YouTube, Digg, and Facebook go on that first list, but what I am primarily interested in is what fundamental technology ground is being broken by "the kids"?

-- Jack Krupansky

Sunday, May 06, 2007

Four levels of language for semantics and knowledge

The question of language level comes up when considering the still-open question of how to represent, access, manipulate, and otherwise use knowledge and meaning in the form of a distributed knowledge web or semantic web. I certainly do not have any immediate answer, but I was thinking that rather than envisioning one unified "knowledge" language, possibly we need a multi-level language model:

  1. Low-level "assembly" language - work with meaning and knowledge at a very atomic level
  2. High-level language - a very expressive language that focuses on higher-level structured meaning, probably leveraging contextual meaning
  3. Scripting language - a concise, terse, convenient method for working with knowledge that emphasizes broad expressive power rather than specific detail and nuanced meaning
  4. Natural language - heavily dependent on context and very ambiguous, but very easy and natural to use. Appropriate for "display" of knowledge structures.
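As a purely illustrative sketch of how the levels might relate (every name and format here is an assumption, not an existing language), the same small piece of knowledge can be written as a terse one-liner at the scripting level, parsed into a structured frame at the high level, compiled down to atomic triples at the "assembly" level, and rendered back out as natural language for display:

```python
# Level 3 (scripting): a terse line such as "Mars: planet; orbits=Sun; has_color=red".
# Level 2 (high-level): a structured frame describing one entity.
# Level 1 ("assembly"): atomic (subject, relation, value) triples.
# Level 4 (natural language): a generated sentence, for display only.
# All formats here are hypothetical.

def script_to_frame(line):
    """Parse a terse scripting-level statement into a high-level frame."""
    subject, rest = (part.strip() for part in line.split(":", 1))
    kind, *assignments = (part.strip() for part in rest.split(";"))
    frame = {"id": subject, "is_a": kind}
    for assignment in assignments:
        key, value = (part.strip() for part in assignment.split("=", 1))
        frame[key] = value
    return frame

def frame_to_triples(frame):
    """Compile a high-level frame down to assembly-level triples."""
    subject = frame["id"]
    return [(subject, relation, value)
            for relation, value in frame.items() if relation != "id"]

def triples_to_sentence(triples):
    """Render triples as plain English (lossy; meant for display, not reasoning)."""
    return " ".join(f"{s} {r.replace('_', ' ')} {v}." for s, r, v in triples)

frame = script_to_frame("Mars: planet; orbits=Sun; has_color=red")
triples = frame_to_triples(frame)
print(triples)
print(triples_to_sentence(triples))
```

Even in this toy, the meaning of `orbits` or `has_color` lives nowhere in the data; it is whatever the code and its readers agree it is.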

One issue is whether natural language is really a separate level or can be used at all three of the other levels.

Another issue is that we need to be able to work with meta knowledge, to treat packages of knowledge as black boxes and networks of interconnected black boxes to be manipulated in an abstract sense, separated from the actual, true meaning of the contents of those black boxes. Possibly this is a requirement at each of the four language levels.

Another issue is multiplicity, the number of distinct languages at each level. Obviously we have many natural languages. Multiple scripting and higher-level languages make sense. Having a single, common, foundational "knowledge assembly language" has a lot of appeal, but is it really viable, and is it clearly advantageous relative to having a multiplicity of low-level knowledge and meaning paradigms? I don't see any clear answer at this point in time.

Is the current Semantic Web at the assembly level, the high level, or the scripting level? I suspect the honest answer is that it is not clear. Clearly there is some amount of scripting being done in XML. Clearly there is some amount of high-level semantics being done (e.g., RSS and web feeds). And clearly there is a fair amount of very low-level semantic use. But is XML more of a lexical and syntactic language than a true semantic language? It seems to me that its only current power at the meaning level comes when people and programs agree a priori on shared conventions for representing meaning, and that is precisely what we would like to see inherently embodied within a true knowledge language at any of the levels that I have proposed. In short, very little meaning appears to be represented within the Semantic Web itself; the only real meaning is hard-coded into the user agents that communicate via the Semantic Web. In other words, the Semantic Web is, at best, everything but semantics: it simply facilitates the exchange of information that user agents can interpret as meaning using hard-wired or agreed-upon semantics.
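The point about XML can be made concrete with Python's standard `xml.etree.ElementTree`. The parser recovers structure, but the "meaning" of `<title>` or `<date>` in this RSS-style item exists only in agent code that agrees a priori to treat them a certain way; a second agent with a different hard-wired convention reads the same bytes differently. The feed snippet, its URL, and both agent functions are hypothetical.

```python
import xml.etree.ElementTree as ET

FEED_ITEM = """<item>
  <title>Four levels of language for semantics and knowledge</title>
  <link>http://example.com/four-levels</link>
  <date>2007-05-06</date>
</item>"""

def syntax_only(xml_text):
    """What the parser alone knows: element names and text, no meaning."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]

def feed_reader_agent(xml_text):
    """By prior agreement, treats <title> as a headline and <link> as a URL."""
    root = ET.fromstring(xml_text)
    return {"headline": root.findtext("title"), "url": root.findtext("link")}

def calendar_agent(xml_text):
    """A different hard-wired convention: <title> names an event held on <date>."""
    root = ET.fromstring(xml_text)
    return {"event": root.findtext("title"), "when": root.findtext("date")}

print(syntax_only(FEED_ITEM))
print(feed_reader_agent(FEED_ITEM))
print(calendar_agent(FEED_ITEM))
```

Nothing in the XML tells either agent it is wrong; the semantics are entirely in the two functions.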

-- Jack Krupansky