Thursday, March 31, 2005

Unicode: now, later, or beyond?

I've known about Unicode and internationalization issues for many years, but still haven't settled my mind on it. After reading this interesting post on the topic, the question returns to haunt me: Is it finally time to bite the bullet and adopt full-bore 16-bit Unicode, or can we continue to defer the issue? Now, the question that occurs to me is deeper: Is there something significantly beyond Unicode that we should be considering?

Personally, I think the idea of fixed-width character strings is an archaic artifact of our computational "early years" and it's time to move on. Take a look at HTML or XML or SGML. They have the concept of a "character entity", where an extended character code or even a name for a character can be encoded. I don't want to suggest that we use XML as our new character representation format, but it's at least worth considering, and it's already there as a high-level external representation format.
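For flavor, named and numeric character entities are already easy to decode with off-the-shelf tools; a quick Python illustration using the standard library (the entity names follow the standard HTML list):

```python
import html

# Decode named and numeric character entities, as an HTML/XML
# processor would. Both forms name the same abstract character.
s = "caf&eacute; &amp; cr&egrave;me, pi is &#960;"
print(html.unescape(s))  # café & crème, pi is π
```

The point is that the external representation names characters, rather than committing to any fixed-width in-memory encoding.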

Oddly, people are still concerned about the storage space and performance of 16-bit characters. Geez, get over it already.

Think about it: 8-bit character codes, they fit in a "byte"... how quaint and useless for computing in the 21st Century.

-- Jack Krupansky

Multiple levels for network personal identity

I remain convinced of the need for a whole new regime of Network Personal Identity (NPI). Not only does an NPI exist as distinct from your real-world identity, but there are different categories of need, different levels of disclosure, and different roles that users play when engaging in network-based activities. In other words, each individual user may have a significant number of network identities.

I'll summarize some of the more interesting levels of network personal identity, without intending to be comprehensive at this time:
  1. Full network identity. This would include the maximal amount of personal identification information that you would ever disclose to a network-based application. You may in fact have several of these identities, since you may reluctantly disclose some information to one application because it is required, but not want to disclose it to another application for which it is not absolutely required. You may opt to have a master identity profile and some specialized sub-profiles for specific uses. Some of the information could be entered manually, but some would be required to be entered by a trusted "identity service provider". You can fake a lot of info, but not your full legal name, address of residence, birth date, SSN, and a few other pieces of info. But you would be able to select your nickname, preferred mailing address, and other preferences. None of the info in the full network identity would be available to any network app unless you explicitly offer it. The info would be kept at one or more network identity servers that the user selects. No disclosure would be possible to any network app except to the degree the user authorizes. Techniques such as email confirmation would be used to assure that disclosure is authorized.
  2. Credit identity. This would have the minimal information needed to complete a financial transaction which requires a credit card number. May also include address and contact information needed for a transaction, but only to the degree that the user opts in that information. You may have a distinct credit identity for each account or even each application to tailor the disclosure.
  3. Real name identity. For applications which require your real name. Would typically also include your city and state of residence. This is essentially what you would provide for a letter to the editor of a newspaper. Might be typically used for an initial employment application. In general, this would have very little if any additional personal information.
  4. Selected disclosure. Based on specific application requirements, the user may opt to specify as much detail as desired. As a typical use, an application might offer a list of information items that it requires or optionally requests, and the user can decide whether to abort or opt in items from their full network identity.
  5. Unique identity. May be a concocted user "id" or other pseudonym. This is the level of identity needed for most net applications that do not involve a business transaction.
  6. One-time identity. Typical use might be to make an inquiry for which you wish a return answer, but also want the security of knowing that your identity won't be "kept on file". The application would merely query a common identity server for validation of the id, and any additional info that the user has opted to disclose.
I envision that disclosure of personal information is a multi-step process:
  1. User initiates contact with application.
  2. Application discloses a contact id.
  3. User contacts their chosen identity server with the app contact id.
  4. User selects level of disclosure for this contact.
  5. User passes a contact id from the identity server to the application.
  6. The application passes the contact id back to the identity server along with its own contact id.
  7. The identity server verifies that the contact was authorized, possibly with an email confirmation for some transactions, and then passes the authorized information (securely encrypted) back to the application.
None of those steps would require more than a single click by the user.
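To make the handshake above concrete, here's a minimal Python sketch of one way it could work (the class, method, and field names are purely illustrative, not a proposed protocol, and a real system would encrypt the released information in transit):

```python
import secrets

class IdentityServer:
    """Toy identity server: tracks which disclosures a user authorized."""
    def __init__(self):
        self.authorized = {}  # server contact id -> (app contact id, fields)

    def authorize(self, app_contact_id, fields):
        # Steps 3-4: the user tells the server what to disclose for this contact.
        sid = secrets.token_hex(8)
        self.authorized[sid] = (app_contact_id, dict(fields))
        return sid  # step 5: the user passes this id to the application

    def release(self, server_contact_id, app_contact_id):
        # Steps 6-7: the application presents both ids; the server verifies
        # the pairing before releasing anything, and only releases it once.
        entry = self.authorized.pop(server_contact_id, None)
        if entry is None or entry[0] != app_contact_id:
            raise PermissionError("contact not authorized")
        return entry[1]

# Simulated flow:
server = IdentityServer()
app_contact = secrets.token_hex(8)                         # step 2
sid = server.authorize(app_contact, {"nickname": "jack"})  # steps 3-5
info = server.release(sid, app_contact)                    # steps 6-7
print(info)  # {'nickname': 'jack'}
```

Note that the application never learns anything the user didn't explicitly place in the authorization, and a replayed release attempt fails.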

Again, this is not yet a full architecture for network personal identity, but simply another increment of detail along a path that will evolve over time.

-- Jack Krupansky

Wednesday, March 30, 2005

Network Personal Identity

All of the discussion of identity theft and spam and spoofing, coupled with my interest in what I've called a Distributed Virtual Personal Computer (DVPC), leads me to believe that there is a very real need for a Network Personal Identity (NPI). Some people will of course still insist on anonymity, but for people who really do want to store data and access personalized services in a networked environment, a Network Personal Identity is an absolute requirement.

There are lots of ideas and techniques and gimmicks floating around, but maybe it's simply time to step back and try to take a big picture view of the requirements.

Requirement number zero is that we absolutely do not wish to be laying the groundwork for "big brother" or "big business" to further exploit "the common man". The goal is to lower barriers to pursuing freedom and access, not to enable control by larger organizational entities. There probably does need to be some form of authorized law enforcement access, but the mechanism should be sufficiently opaque and cumbersome that it cannot be easily abused.

For starters, I'd say that there is no reason why a Network Personal Identity (NPI) would have to be directly linked to a person's real-world identity. Each of us should be able to exercise our own choice of how much to selectively disclose about our identity. In fact, we each may want to have any number of distinct NPIs to meet different needs and to further enhance our sense of security.

I'm not sure if biometrics are "the answer" or just one ingredient, or even of any value, but it is an area to be considered.

Certainly an NPI should not be stored on a centralized server, and certainly not on any service vendor's server. Rather, there should be a new class of service, the identity service, of which there can be many providers, and which users and vendors of services could mutually validate. In many cases, it shouldn't even be necessary for a vendor or service provider to know anything about your personal details, other than to verify aspects that are relevant to the service being rendered. A user, at their discretion, might opt to disclose details to provide more personalized service, but disclosure should never be a requirement.

I'm not thinking that an NPI would be a fixed identifier (with password), but more of a dynamic id, possibly using an electronic equivalent of a "one-time pad": each id would be good for only one transaction. Software at the user's location would request the one-time id from a selected identity server, the target service would then do a one-time validation through the same cluster of identity servers, and that id would be forever invalid thereafter. This is just one idea. I'm sure there are others that may be far superior. My goal is simply to set a minimum threshold for quality and security.
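Here's a minimal sketch of the one-time-id idea in Python (names are illustrative; a real system would distribute this state across a cluster of identity servers rather than keep it in one process):

```python
import secrets

class OneTimeIds:
    """Toy one-time-id issuer: each id validates exactly once."""
    def __init__(self):
        self.live = set()

    def issue(self):
        # The user's software requests a fresh id for one transaction.
        token = secrets.token_urlsafe(16)
        self.live.add(token)
        return token

    def validate(self, token):
        # The target service validates once; the id is then dead forever,
        # so a captured or replayed id is worthless.
        if token in self.live:
            self.live.remove(token)
            return True
        return False

ids = OneTimeIds()
t = ids.issue()
print(ids.validate(t))  # True  (first use succeeds)
print(ids.validate(t))  # False (replay is rejected)
```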

Plenty more thinking is required. This is just a starting point, but having the right jumping off point can make or break any project.

My DVPC concept does in fact need a robust network identity for authentication and validation of access. I suspect that what's good enough for protecting my data on multiple remote DVPC data servers might be good enough for a lot of people and services.

-- Jack Krupansky

Sunday, March 27, 2005

Automated transparent scalability

It's amazing how bad many software system designs are. It's as if nobody had even imagined that scalability would be important. The problem facing us is that people do recognize that scalability is more important now, but the approach is almost always the same: throw more bodies at the problem, rather than pursue more modern approaches to the design of the software itself.

Sure, companies like Google and Amazon have the big bucks to throw at the problem and burn through prodigious numbers of talented software professionals, all in a struggle to compensate for the fact that we're using really bad software system designs.

The key technology needed for future software system design is that scalability needs to be done transparently and automatically, at the system software level. If a developer messes up, we need to have software in place that will detect such mistakes.

The solution to scalability is not more hardware, more money, or more people, but simply better software tools and more enlightened software design. The measure of a good developer is not how many lines of code they write or how complex their code is, but how simply it goes about solving the problem at hand. A better software infrastructure architecture will eliminate much of the burden that is currently misguidedly placed on the shoulders of application developers.

We need to shift more of the current application burden back where it belongs, on better system software.

One of the keys to enhancing application performance is to exploit the parallel and concurrent processing that is inherent in networks. Applications can sort of do so via brute force techniques, but that makes the applications more unwieldy, fragile, and harder to update. Only by shifting the burden back onto system software, where it belongs, can we ever hope to dramatically reduce the cost to design, develop, deploy, and maintain sophisticated distributed computing applications.

In short, scalability is of paramount importance, but it needs to be done transparently and automatically, and not dumped in the laps of applications developers and systems administrators.

-- Jack Krupansky

Thursday, March 24, 2005

Structured blogging and real-time blogging

Consistent with some of my interests (including software agents), I now see that people are starting to talk about "structured blogging" as well as real-time alerts. All of this is only going to stress our polling-based internet architecture much further.

The real point here is that all sorts of real-world applications become much more interesting when you start getting serious about structure, hard-core information flow, and time-sensitive applications.

-- Jack Krupansky

New blog on blogging

Since blogging is now such a big part of what's going on right now, I've decided to focus most of my blogging-specific comments in their own blog, "Jack Krupansky on Blogging", including my experiences with blogging software and services, current problems, etc.

That said, hard-core technology-related aspects of blogging will continue to appear in this blog.

-- Jack Krupansky

More on how to move beyond polling for information distribution

In response to some reader comments, I'd like to elaborate a little on my thoughts for how to move beyond polling. I'm still not prepared to offer my full model for an information distribution architecture, but a few comments are in order. What follows is my response to the reader...

I actually didn't mention any specific interrupt method, and in fact I clearly noted that the internet has a host of problems that would need their own solutions. I neglected to mention spam, but it certainly is in the "etc." that I mentioned.

To be clear, interrupts are NOT analogous to email. After all, email reception is a POLLED transport mechanism. I would also note that email itself is a good candidate for my approach to information distribution.

Spam wouldn't be a problem with the kind of approaches that I can envision. Of course, spam is always in the eye of the beholder.

The main reason spam wouldn't be an issue is that any TRUE information distribution architecture would have a wide range of controls so that random idiots couldn't simply spoof real people. There are lots of techniques to address these issues that real people are already using.

It's kind of silly to say that "Polling an RSS feed allows us to 'turn off' the channel ourselves", since a true information distribution architecture would certainly allow you to do exactly that, and to do it in a way that doesn't have the negative consequences of polling. Polling is a particularly bad choice for turning off an information channel. It's far better to directly inform your channel connection of your intentions. "Off" is only one choice. A wide range of filtering options, including delivery at specified times of your choice, would be far better.

You said that "we don't even get the chance to 'opt in' for unsolicited email", but I think there is software that does do precisely that. I've sent unsolicited email to a few people and then had to go through a verification negotiation mechanism before my mail would be let through. It wasn't a big burden on me and it was a big benefit for them. The point should be that we all should have those options without having to acquire, install, configure, and manage our own mail server software. In any case, this isn't an argument against a non-polled information distribution mechanism.

You wrote "polling isn't efficient, but it's a great alternative to being interrupted by anyone with a computer", but that isn't even true. As you've noted, spam and other email abuses are a real problem, but email itself is based on polling (timed polling unless you turn the timing feature off).

Although I haven't elaborated my proposed mechanism yet (it has a number of layers and distributed controls), it does have some features in common with a hardware interrupt, but many differences as well due to the special nature of the internet.

I'm not using the term "interrupt" as in "being interrupted by anyone", but more in the hardware (and system software) sense of being able to enable an "interrupt" for *CHOSEN* information sources and events. I would note that hardware interrupts (and system software event triggers as well) have various "masking" features to give the "user" the control they need to accomplish their tasks to meet their goals.

The hardware concept of "interrupt level" and the system software concept of "queuing" are also applicable and mesh nicely with the way real people switch hats, modes, gears, etc. to vary the flow of information that they want to see presented to them.

You might assign a "noise level" to each feed you subscribe to, and then globally set your current "noise threshold" so that various feeds would automatically be suppressed as being too noisy for your current mode, but then automatically re-enabled when you decide to open the spigot for how much information you're interested in receiving at the moment. This concept could be implemented via polling as well, but as the number of "channels" I have a potential interest in rises exponentially, polling becomes even more problematic.
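A crude Python sketch of the noise-threshold idea (the feed names and the numeric scale are made up for illustration):

```python
# Each feed gets a "noise level"; a single global threshold gates
# delivery, so raising or lowering one number re-tunes everything.
feeds = {"breaking-news": 9, "friends-blog": 3, "firehose": 10}

def active_feeds(feeds, noise_threshold):
    """Feeds noisier than the current threshold are suppressed."""
    return sorted(name for name, noise in feeds.items()
                  if noise <= noise_threshold)

print(active_feeds(feeds, 5))   # ['friends-blog']
print(active_feeds(feeds, 10))  # everything; the spigot is wide open
```

The same one-line filter would sit naturally in the delivery path of an event-driven channel, where it costs nothing when no items arrive.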

Also, with the concept of Google Desktop Search, you'd actually LIKE to have fairly significant flows of information from chosen sources that gets cataloged by Google without you even having to see it rush by. Or you might want to catalog it all, but set up a filter for which items actually should pop up in email. All of that information would currently have to be polled.

Just to be clear... I'm not advocating an *identical* mechanism to hardware interrupts, but simply using hardware as an analogy.

There's no reason why real people should have to suffer as a result of really bad information architecture decisions, but today we're all forced to suffer. I find the lame excuses of lazy, misguided, and arrogant infrastructure architects to be quite offensive. The crazy thing is that we're currently at their mercy because so much of the internet infrastructure is fairly centralized (in a relatively small number of individuals). Someday, the internet software architectures really will be sufficiently decentralized and users really will have interesting choices that they can make.

In any case, thanks for the comments and for giving me a reason to elaborate at least a little more of my thinking.

-- Jack Krupansky

Refresh considered harmful

Along the same lines as my recent post entitled "Polling and pinging considered harmful", I'd like to add the ubiquitous "Refresh" function to the list of "bad system design" features. The only real excuse for having a user-visible "Refresh" (or even timed-refresh) feature is the lack of a decent underlying information distribution architecture.

There's no need to clog the network with excessive "refresh" traffic. And there is certainly no excuse for having the poor dumb user manually poll information sources. There's no good reason for having unchanged data flow all the way across the net on each refresh request. Many of those refresh requests occur simply because the dumb user can't be sure whether the visible information is up to date.

A solution to the "polling and pinging" problem for feed files (such as RSS) would enable the elimination of the user-visible refresh function as well.

-- Jack Krupansky

Polling and pinging considered harmful

As I've gotten deeper into the whole "blog thing", it bothers me how primitive the web architecture and infrastructure really are. After all, hardware designers discovered eons ago that polling was a truly lousy way of doing I/O, especially in a high traffic environment. So, why are users of the web and blogs, especially bloggers and people hoping to read blogs in a timely manner, reduced to tedious and inefficient stone-age manual "pinging" and "polling" to announce and detect the availability of new information? It makes no sense to me. Sure, I know why mediocre software designers would resort to such a least-common-denominator solution and why they feel that it's "okay" to push work off onto the dumb users, but it hardly seems fit for networked life in the 21st century. We really do deserve a 21st century approach to information propagation.

Hardware designers came up with the concept of an "interrupt" and high-performance computing exploits it nicely. The web needs a similar capability. Sure, there are special problems that need to be addressed as well, such as the nature of distributed computing, hackers, denial-of-service attacks, rogue publishers, peak demand, load balancing, etc., but there is no end of ideas and talent for attacking such issues.
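For the flavor of what an interrupt-style alternative looks like in software, here's a toy publish-subscribe sketch in Python (the Channel class is purely illustrative; it stands in for whatever distributed mechanism the real architecture would use):

```python
# A subscriber registers interest once; the publisher pushes only when
# something actually changes, instead of the subscriber polling a timer.
class Channel:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        # Analogous to enabling an interrupt for a chosen source.
        self.subscribers.append(callback)

    def publish(self, item):
        # Delivery is event-driven: zero traffic between events.
        for cb in self.subscribers:
            cb(item)

inbox = []
ch = Channel()
ch.subscribe(inbox.append)
ch.publish("new post")
print(inbox)  # ['new post']
```

The contrast with polling is that the cost here is proportional to events that actually happen, not to the number of subscribers times their polling frequency.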

There is no excuse for the propagation of information on the internet to be far less efficient than for old-fashioned radio, television, telegraph, and telephone.

I do have some specific architecture ideas in mind, but I want to polish them some more.

-- Jack Krupansky

Tuesday, March 22, 2005

How Is Open Source Special?

Mitch Kapor of the Open Source Applications Foundation has written an article entitled "How Is Open Source Special?" in which he espouses his rationale for believing that Open Source is a fundamentally different approach to developing computer software and that the academic world needs to give it special attention. He concludes:
I believe the leverage of open source, being fundamentally a more efficient as well as democratic way of developing software, can offer great advantages. But the academic world needs to get more involved in open source, to get more familiar with its mechanisms—how it works and how it doesn’t work. I think this kind of research will benefit not only the academic world but open source in general.
I (Jack Krupansky) personally believe that Open Source has a very real role in the business of developing computer software, but I also believe that it's a means, not an end. It bothers me a little when someone uses language such as "will benefit not only the academic world but open source in general" since I see no reason to build artificial fences that try to separate "Open Source" from both commercial software and other forms of "free" software (e.g., public domain). I see no need for such a special "brand" of software.

-- Jack Krupansky

Wednesday, March 16, 2005

Ecologies of innovation

I just saw a reference by Mitch Kapor to "ecologies of innovation", whatever that really is.

Maybe it's a reference to an "ecosystem of innovation" or an "ecosystem for innovation".

Whatever it is, I suspect that we all need to get more of it.

In any case, from my perspective the question is what technologies and tools are needed to support such ecologies or ecosystems. My belief is that intelligent software agent technology will fit at least part of the bill.

-- Jack Krupansky

Tuesday, March 15, 2005

Mitch Kapor on whether Groove should have gone open source

Mitch Kapor, of Lotus fame and now the Open Source Applications Foundation, has some interesting comments on the question "Should Groove have gone Open Source?". His current schtick is promoting open source applications, but here he is saying as his bottom line (literally) that "Going the open source route ought to be considered but it is not always really viable given the resources at hand." He notes that it is much easier to go open source if you start out with that model in mind rather than if you start with a proprietary model and then have to jump through hoops to switch to the open source community model. It's an interesting piece.

Mitch's discussion of the acquisition of Groove Networks by Microsoft is also quite illuminating.

Sunday, March 13, 2005

Position paper on Open Source

I've posted a mini position paper on "open source" software on our main web site.

It's not designed to be a definitive white paper, but I felt I should add my voice to the public debate.

I'm not opposed to so-called "open source" software and I believe that it does have some value, but I find the rhetoric and intentions and motives of some proponents to be quite dubious.

Is it a metaphor or a theory?

I was thinking of something and wanted to label it as a "metaphor", but something didn't seem quite right. So, I checked the dictionary entry for "metaphor", and didn't find what I had expected. I had thought that a metaphor was synonymous with a model or a paradigm, but in fact it's simply a "figure of speech", a form of analogy.

A "window" is a metaphor on a computer not because it's a model or approach, but simply because it is analogous to the function of a window in the real-world. Ditto for folders and files. Strictly speaking, a "database" is not a metaphor (on a computer) since it does not have any obvious "figure of speech" analog in the real world.

There is a second meaning in the dictionary, which is that a metaphor can be a symbol that represents an object, activity, or idea. But that's still quite a distance from model or paradigm. In fact, I'm not interested in a mere symbol or name at all, but wanting an umbrella term that technically and descriptively covers the totality of the concept that I'm thinking of. Maybe I simply need to bite the bullet (which is a metaphor) and call my conceptual idea a "theory".

On the other hand, maybe the dictionary is still a few years behind and the general parlance of a metaphor as a model or paradigm will eventually get swept up by the dragnet of the dictionary editors. It would seem "safe" to use metaphor as a synonym for model or pattern or paradigm, but I'm no fan of "re-purposing" normal English words that already have a rather clear meaning.

Elegance versus ugly theories

I'm intrigued by the concept of elegance as it applies to technology and science, in addition to the arts, and life in general. At least in computer science and computer software, approaches that can be labeled "elegant" are frequently dramatically superior to "ugly" solutions. Not always, but quite frequently. The same appears to be true in science. Truly robust theories seem not only to explain a phenomenon, but to do so in an extremely elegant manner, with a sense of beauty.

So, the question is whether this is all simply an illusion and an example of selective thinking, or is there something going on here.

Simple question: does anyone know of any well-settled and extremely robust theories that are truly "ugly" to contemplate?

I suspect that it's also possible that some theories are still too incomplete to have streamlined themselves to the point of elegance. Theorists keep adding warts to explain away anomalies until they finally get to the point where they "see the light" and can then replace all the warts with an elegant theoretical model.

I have a handout from Sir Roger Penrose's recent talk here at the Boulder Book Store in Boulder, Colorado that has some diagrams from his new book, The Road to Reality: A Complete Guide to the Laws of the Universe. The diagram for a string theory D-brane is truly ugly. Penrose insists that you don't need any higher-order dimensions. In contrast, his diagram of a twistor (two nested toruses) is extremely elegant.

As an aside, here's a link to a conference on "Twistor String Theory", where the goal is to do what string theory is trying to do, but using twistors rather than higher-order dimensions.

And take a look at the PDF for his hand-drawn Twistor Theory lecture notes, which includes a variation of the nested torus diagram from his book.

Just to be clear, I'm not suggesting that appearance of elegance makes a theory or conjecture correct or that ugliness necessarily makes it incorrect, but simply that there does seem to be a somewhat positive correlation between elegance and truth, even if there is no demonstrable causality.

File all of this under the aesthetics category of philosophy.

Saturday, March 12, 2005

Concurrent programming with Occam

I've been thinking about concurrent programming for some time, especially as it applies to software agents, but it has just been brought to my attention that the Occam programming language is in fact still around.

Occam is a concurrent (parallel processing) language designed by a team at INMOS in conjunction with the design of the Transputer parallel computing microprocessor, and based on C. A. R. (Tony) Hoare's ideas of Communicating Sequential Processes (CSP).

The Transterpreter project started as an effort to get the Occam runtime (which includes a Transputer bytecode interpreter) running on the LEGO Mindstorms, but has now taken on a life of its own and is now available (or at least could become available) on a number of other computing platforms, including robots.

The whole point of a concurrent programming language is to be able to have a single computer program in which multiple tasks can be simultaneously running and interacting to pursue common goals.
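As a rough flavor of the CSP channel style that Occam is built on, here's a Python approximation using threads and a queue (this only mimics Occam's `!` send and `?` receive channel operations; it has none of Occam's compile-time guarantees):

```python
import threading
import queue

# Occam's channels (from Hoare's CSP) let independent processes
# interact only by communicating, never by sharing state.
def producer(ch):
    for i in range(3):
        ch.put(i)        # roughly occam's  ch ! i
    ch.put(None)         # sentinel: end of stream

def consumer(ch, results):
    while True:
        item = ch.get()  # roughly occam's  ch ? item
        if item is None:
            break
        results.append(item * 2)

ch = queue.Queue(maxsize=1)  # a tiny buffer approximates CSP's rendezvous
results = []
t1 = threading.Thread(target=producer, args=(ch,))
t2 = threading.Thread(target=consumer, args=(ch, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 2, 4]
```

The two tasks run simultaneously and cooperate toward a common result purely through the channel, which is exactly the discipline the post describes.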

Wednesday, March 02, 2005

Display strips

One of the things I'd like to see is small, very cheap "display strips", which would be simple, battery-operated, wireless displays that you could place around your work (or home) environment and treat as adjunct displays for your main PC desktop.

In other words, you might start some simple apps and drag each of them to a particular "display strip" where they would display information of value to that particular location.

For example, you might put one near your door which would display reminders relevant to when you are walking out the door. Or, have a couple out on your desk that always display specialized information regardless of what you happen to be working on at the time on your main display.

You might even place a few of these around the edge of your monitor, much like the ubiquitous PostIt stickers.

You might even have a "tree" of these displays, each mounted on a flexible branch so that you can arrange them in three dimensions to prioritize them.

I supposed you could also call these "PostIt Displays", but somebody has a trademark there.