Tuesday, April 26, 2005

Symposium: Roles, an interdisciplinary perspective

Related to my interest in network personal identity, there is an interesting symposium coming up at the 2005 AAAI Fall Symposium Series entitled "Roles, an interdisciplinary perspective":

The notion of role is ubiquitous not only in many areas of artificial intelligence, but also in many other areas of computer science, like programming languages, software engineering, coordination and databases, multiagent systems, computational linguistics and conceptual modelling, and also in other scientific fields, like formal ontology, sociology, cognitive science, organizational science and linguistics.

In sociology, on the one hand roles are often described as expected behavior of entities or agents, on the other hand roles are seen also as presentations of selves. In organizational science roles encompass more formal aspects such as rights and duties. Three different main viewpoints characterize research on roles:

  • roles as named places in relationships (especially in linguistics, databases and conceptual modelling)
  • roles as dynamic classification of entities (especially in programming languages and databases)
  • roles as instances to be adjoined to the entities which play the role (especially in ontologies, multiagent systems and programming languages).

Undisputed distinguishing features of roles seem to be their dependence on some other entities and their dynamic character (Sowa 1984). These properties contrast roles with the notion of natural types. Natural type seems to be essential to an entity: if an entity changes its natural type, it loses its identity; in Guarino (1992)'s terms, roles lack the rigidity which natural types possess. Masolo et al. (2004) elaborate the relational nature of roles, highlighting their definitional dependence on other concepts.

Discussions on roles are important not only to have a better understanding of theories using this notion, but also from the applicative point of view. E.g., integration of ontologies, programming languages, databases, simulation can benefit from the introduction of a well founded notion of role. 

The concept of roles is absolutely essential to the discussion of identity.  This symposium demonstrates the extent to which we still lack an adequate conceptual and theoretical foundation upon which to build a rock-solid computational infrastructure for identity mechanisms.

Monday, April 25, 2005

The Five Roles of Identity

I ran across an interesting blog post entitled "The Five Roles of Identity":

At its heart the problem of network identity is how to manage the model of the user available to web sites. Users dream of a design that's explicit, practical, and respects their privacy. Web sites covet different aspects of the user model. The fashion web site may desire to know the user's hair color. The travel web site may desire to know when your employer is planning a summer shutdown. The bank site may desire to know a statement of account of your current mortgage.

The demand for better models of visitors is what drives the market for solutions in the identity market. For example, it's what keeps DoubleClick in business. DoubleClick aggregates a statistical model of users from their browsing habits and then sells that to web sites. Web sites then use that to target their marketing. For all I know, if you tell one of their clients your hair color, then DoubleClick may well add that to their model.

I haven't read the essay with a fine-toothed comb yet, but it does seem relevant to my interests in Network Personal Identity. The five roles are:
  1. Users
  2. Sites
  3. Intermediaries
  4. Solution vendors
  5. Rule setters (law, government, standards bodies)
-- Jack Krupansky

Sunday, April 24, 2005

Inverting computing protection

Traditionally, the lower levels of a computing system have been considered to be the most trusted and the higher application layers the least trusted.  That's fine for traditional computing environments, but is actually wrong for more modern distributed computing environments.
In particular, applications now handle a lot of sensitive data which should not be compromised by problems at lower layers of software, whether those problems be bugs, viruses, human error, sabotage, or criminal behavior by those administering the systems.
And in the case of the internet, the Web, web services, et al., a user or process on one system wishes to communicate with an application process on another computer and would like to be assured that its sensitive data will not be compromised by problems at lower levels of software.
Traditionally, we've had various software and hardware protection mechanisms or security rings, etc., but those mechanisms were primarily based on the assumption that problems come from above, with no recognition that the lower levels were being granted access to data beyond their "need to know".
So, what we need now is an inverted protection mechanism that guarantees the security of data within a set of levels and permits the lower levels to merely "handle" data packets without actually being granted detailed access.
This is not an easy problem, but we're never going to see appropriate solutions until we can attack the core issues.
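One way to picture such an inversion is end-to-end sealing at the application layer. The sketch below is a toy (the function names are invented, and the keystream construction is NOT real cryptography): the application seals its data, and the lower "transport" layer can only handle opaque bytes, with any tampering detected above it.

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, length: int) -> bytes:
    # Derive a keystream by hashing key + counter (toy CTR-style cipher,
    # for illustration only).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, plaintext: bytes) -> bytes:
    # Encrypt-then-MAC: lower layers see only ciphertext plus a tag.
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, len(plaintext))))
    tag = hmac.new(key, ct, hashlib.sha256).digest()
    return ct + tag

def open_sealed(key: bytes, blob: bytes) -> bytes:
    ct, tag = blob[:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, ct, hashlib.sha256).digest()):
        raise ValueError("data was tampered with at a lower layer")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, len(ct))))

def transport_layer(blob: bytes) -> bytes:
    # The "untrusted" lower level: it merely handles the packet and is
    # never granted access to the plaintext.
    return blob

key = os.urandom(32)
sealed = seal(key, b"sensitive data")
assert open_sealed(key, transport_layer(sealed)) == b"sensitive data"
```

The point is the shape of the trust relationship, not the cipher: the lower level can move, store, or drop the bytes, but it has no "need to know" access to their content.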

Taxonomy versus Ontology

As people begin to get deeper into knowledge management and semantic webs, they start to talk about taxonomies and ontologies, sometimes as if the two were synonyms.
Put simply, an ontology is a specification of the characteristics of a domain.  In other words, precisely what it means for something to be in a particular domain.
A taxonomy is simply a hierarchical categorization or classification of entities within a domain.
For example, when people talk about clustering of search results, they are actually talking about arranging search results in a taxonomy, where that taxonomy is determined by the ontological characteristics of each search result.
Another way of looking at it is that an ontology is the set of all possible characteristics of the entities in a domain, and a taxonomy is simply a grouping of subsets of the domain based on common characteristics that have been chosen for the particular taxonomy.
Note that there isn't a strict one-to-one relationship between the ontology of a domain and a taxonomy.  There may be any number of taxonomies for a domain (or ontology), based on any number of chosen subsets of ontological characteristics.
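The distinction can be made concrete with a tiny sketch (the domain and its characteristics are invented examples): the ontology is the set of characteristics, and each choice of characteristic induces a different taxonomy over the same entities.

```python
from itertools import groupby

ontology = {"kind", "color", "habitat"}  # all characteristics in the domain

entities = [
    {"name": "robin",  "kind": "bird", "color": "red",   "habitat": "forest"},
    {"name": "salmon", "kind": "fish", "color": "red",   "habitat": "river"},
    {"name": "crow",   "kind": "bird", "color": "black", "habitat": "forest"},
]

def taxonomy(entities, characteristic):
    # Build one taxonomy: group entities by a chosen ontological characteristic.
    assert characteristic in ontology
    key = lambda e: e[characteristic]
    return {k: [e["name"] for e in group]
            for k, group in groupby(sorted(entities, key=key), key=key)}

# Different choices yield different taxonomies of the same domain --
# there is no one-to-one mapping from ontology to taxonomy.
by_kind = taxonomy(entities, "kind")    # groups by kind
by_color = taxonomy(entities, "color")  # groups by color
```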

Saturday, April 23, 2005

Need for a simple form of network personal identity (NPI)

Just the other day somebody (who I don't personally know) wanted to post a comment on one of my blogs.  All of my blogs are hosted on Blogger and require merely that you have a Blogger id, but there are people who don't have Blogger ids and may not really want yet another id to keep track of.  In this case, the person did have one but had forgotten the password since he doesn't use Blogger very often.  I personally also have a TypeKey id so I can comment on TypePad blogs.  Yes, it's necessary to have some sort of identity authentication to verify that the commenter is a real person (sort of), but the fragmentation of the identity mechanisms is rather disappointing.  It would be much better to have a "federation" of identity mechanisms that can exchange identity authentication.
My real point is that if we're having so much trouble even with simple blog comment identity mechanisms, how are we ever going to succeed with more complex issues such as email, ecommerce, privacy, law enforcement, and national security?
We have a very long way to go on the path.

Tuesday, April 19, 2005

The Institute for Backup Trauma starring John Cleese

One of the major points about my Distributed Virtual Personal Computer (DVPC) concept is to completely eliminate the tedium and anxiety over backing up your data.  But given that most companies have to depend on traditional "backup" technologies, a company named LiveVault has an interesting online, remote, service-oriented approach to backup, including real-time backup.  They have an entertaining video starring Monty Python's John Cleese, hypothetical director of The Institute for Backup Trauma.  I'd tell you that it's hilarious, but too many of us have experienced data losses to feel cheerful about the topic.  In any case, check out (or should I say check yourself into) The Institute for Backup Trauma.
I heard about this video from Alex Barnett's blog on MSDN.

What does it mean to be scalable?

I just saw a reference to Linux being "scalable", but the term has been so abused in recent years that it has lost all its true meaning. To my mind, the issue should be whether the operating system can automatically re-balance system load between processors and host computers, and even dynamically re-partition individual applications (at least at the thread level), rather than the system designer or administrator having to tediously set up scripts and parameter files (or even be in the loop with a GUI) to do the balancing. In theory, you should be able to just add boxes (or "blades") to a LAN (or WAN) and the network operating system infrastructure does the balancing. Does Linux (or any other OS) actually do this today?
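At the application level, at least, the flavor of balancing I have in mind can be sketched: a shared pool hands units of work to whichever thread is free, with no per-worker scripts or parameter files (the worker count and task sizes below are arbitrary assumptions).

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # Stand-in for a unit of work of varying size.
    return sum(range(n))

# The pool balances dynamically: idle workers pull the next task,
# so no one has to partition the load by hand.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, [10, 100, 1000, 10, 100]))
```

The hard part, of course, is doing this transparently across host computers on a network rather than across threads in one process, which is exactly what no off-the-shelf OS does automatically today.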

I believe that is an essential requirement for future robust networked applications, but I strenuously object to claims that it is here today, since that pulls the rug out from under any efforts to initiate the type of research projects that are needed to underpin this important computing concept.

I recognize that a common usage of the term is that an OS is "scalable" if the designers and developers of the OS are able to repackage the OS for hardware computing platforms of different sizes, ranging from embedded systems to mainframe-class servers. Unfortunately, that does not make software applications themselves truly "scalable" to deal with loading issues.

-- Jack Krupansky

Notes for Distributed Virtual Personal Computer (DVPC) and PC Magazine review of BeInSync

I see a review of BeInSync on the PC Magazine web site.  It does have some relevance to my concept of a Distributed Virtual Personal Computer (DVPC), but it really is a rather distinct product concept, since DVPC is designed so that your entire personal content is maintained "in perpetuity", with all of its history, and with your PC hard drive being only a cache, rather than simply "syncing" a set of machines with the current values of shared data.  Still, I do have to note their product and its relevance since people unfamiliar with my DVPC concept can easily confuse the two (e.g., "But isn't DVPC simply syncing of data?").  The PC Magazine product review begins:
When the inaugural version of BeInSync made its debut last September, it proved to be a wonderfully convenient alternative to remote-control apps like Citrix Online's GoToMyPC. It not only provided Web-based remote access, it actually synchronized files, e-mail, contacts, and bookmarks across multiple machines, making sure your important data was always where you needed it—right in front of you.
See the BeInSync web site, which proclaims:
BeInSync™ seamlessly and securely keeps your files and emails in sync between your PCs, making them available where and when you need them. The end to sending yourself emails or using complicated remote access products!
It is not my intention to slight BeInSync in any way, but simply to highlight the distinctions between other products and the DVPC concept.

Intel dual-core support for hyper-threading

In a recent post I referred to Intel's hyper-threading technology and mentioned that I wasn't sure whether the new dual-core processor chip supported hyper-threading for each core.  I just read a post on an Intel web feed ("Intel® 955X Express Chipset") that in fact refers to the new 955X Express Chipset as having the "Ability to manage four software threads simultaneously".  In other words, HTT supports two simultaneous threads per core, so doubling that for two cores gives four.
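That arithmetic (2 cores × 2 HT threads = 4 logical processors) is what the operating system reports as its logical CPU count, which is easy to check on any given machine; the value will of course vary with the hardware.

```python
import os

# os.cpu_count() reports logical processors, i.e. cores multiplied by
# hardware threads per core (4 on a dual-core chip with HT enabled).
logical = os.cpu_count() or 1
print(f"{logical} logical processors visible to the OS")
assert logical >= 1
```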
Now how to design system software and optimize your application software around that architecture is another story.
I saw a reference on some Microsoft blog that implied they were seeing a 20% benefit from HTT with a specific application.

Sunday, April 17, 2005

Jack's One Law For Everything

I was reading and commenting on a post about yet another instance of identity "theft", when I decided to crystallize a principle that seems rather obvious to me, but deserves to be treated as a "law".  I call it Jack's One Law For Everything:  Without a rock-solid problem statement, there can be no joy.
Almost every time I see a problem, it usually has arisen because somebody had a misconception about the nature of some other problem that they thought they were solving.  Problems tend to cascade, unless you have a sufficient handle on the problem to truly stop it in its tracks.
Get the problem statement "right", and you stand a much better chance of achieving a durable success with far fewer unintended consequences.
Rock-solid problem statements are essential for technology ventures, especially those intended to be platforms or bases upon which other technologies are to be built.

Saturday, April 16, 2005

Comments on identity reform

[The rest of this post is a comment that I posted on a post on Chris Ceppi's blog entitled "Identity Reform".]

My best guess is that if there is such intense anxiety and mistrust about something, then there is an excellent chance that the problem statement simply hasn't been done "right". In this case, maybe the discord isn't really about identity per se, but it just happens that identity is the "button" that triggers all the emotions.

Let me ask a simpler question... Why is "digital identity" so important? The superficial answer is that we wish to deliver services that are personalized for the individual. More technically, I would say that the service "depends" on the individual. Is this really "your" data? Have you paid for the service? Are you old enough for the service? Does the service "group" recognize you as being a member? Do law enforcement authorities wish to track access or preclude your access to the service? Where should the service be delivered to? How is billing and payment handled? What service customization options do you want the service to record to facilitate future interactions? etc.

In terms of anxiety, the big question is always "Why do you need to know?" Until we can establish a track record of a few millennia (or at least a few generations) of non-abuse (by government, by business, by criminals, by busybodies, etc.), this abuse-related question will be the keystone of identity-related discourse. Of course the answer is almost always one of a handful of possibilities: 1) to deliver the service you've requested, 2) to deliver better service, 3) because the government requires the information. "Better service" can be a euphemism for exploiting data for purposes not directly related to delivering the contracted service (e.g., selling the data or bartering it for cross-marketing purposes).

If you want to talk about reform in a way that resonates with consumers, how about focusing on virtually eliminating the data that vendors can possess about their customers. Call it "privacy abuse [by businesses] reform".

I've sketched out one idea for trying to balance privacy with legitimate business needs in a concept I call a "Data Union" where businesses can get access to information they need to deliver service without needing their own privacy-violating databases. I'm sure there are other approaches, but if we can't address even "merely pragmatic" issues related to businesses, how are we going to tackle the thornier issues related to government intrusion abuse and national security issues?

See: http://basetechnology.com/data_union.htm

Introduction to Hyper-Threading

Intel has a web-based tutorial on their new Hyper-Threading Technology (HT Technology) entitled "Introduction to Hyper-Threading", which tells you how the technology works and how you can exploit it in your software.
Hyper-threading is an advanced processor design technique that allows a single processor to run more than one code thread simultaneously.  It takes advantage of the fact that not all "units" of the processor are simultaneously in use when executing a single thread.
Please note that HT is rather different from the new "dual-core" chips that are coming out.  The latter actually have two completely distinct processors ("cores").  It's not clear to me whether each of Intel's dual-core cores also supports HT.

Simplicity: zero training

I ran across an interesting post by Tim Bray that formalizes a characterization brought up by Sam Ruby: "Operational definition of simplicity: Zero Training."  That's my sentiment exactly.
This in fact is my definition for "intuitive": no training required. It should be the standard that we apply as we seek to bring new technology into the world.

What's its format?

If you're a software developer in need of specifications or even code to cope with some oddball data file format, check out Paul Oliver's Wotsit's Format, which is dedicated to being "the complete programmer's resource on the net."

Friday, April 15, 2005

New sentencing guidelines for aggravated identity theft

This isn't a technology issue per se, but it does highlight the intense interest in addressing identity-related concerns. The U.S. Sentencing Commission has just voted to set new sentencing guidelines for "aggravated identity theft":
At its public meeting April 13, 2005, the United States Sentencing Commission voted unanimously to adopt sentencing guideline amendments that will increase penalties for antitrust offenses. The Commission also voted unanimously to create a new guideline for aggravated identity theft. The Sentencing Commission promulgated the amendments after receiving written public comment and testimony at a public hearing from a variety of sources that included prosecutors, defense attorneys, and probation officers.
In response to the Identity Theft Penalty Enhancement Act of July 15, 2004, the Commission has promulgated a new guideline for aggravated identity theft. The guideline is consistent with the new statutory provisions enacted by Congress that provide for consecutive mandatory minimum sentences of two and five years, depending on the underlying associated offense involving the misuse of stolen identification. The Commission also increased penalties for defendants who exceed or abuse the authority of their position in order to obtain unlawfully or misuse means of identification. The new amendments to the sentencing guidelines will be submitted to Congress by May 1, 2005, and will take effect November 1, 2005, unless Congress disapproves them during a six-month review period.
-- Jack Krupansky

Wednesday, April 13, 2005

Bob Frankston: It's not "Identity Theft"!

Bob Frankston has a nice post (an essay, actually) related to identity issues entitled "It’s not “Identity Theft”!" which discusses some of the nuances and issues that intersect identity and so-called identity theft. As he says, it's not your "identity" that's being stolen, but some of the "tokens" that are used to engage in transactions. As he suggests:
Imagine a simple alternative — instead of giving your “identity” information over the phone or typing it into a website you have a third party who can vouch for you and the merchant. Instead of recording personal information about you the merchant would simply get a token (an unforgeable code number). Note that this doesn't require that the third party know anything about you — you can choose to have a “bearer” relationship which means you simply pay cash.

I've read his essay, but I need to read it again with a finer-toothed comb and then let him know about some of my own thinking about identity, including my thoughts on Network Personal Identity (NPI) and my concept of a Data Union for enabling access to identity-related information without giving away the store.
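Frankston's token idea can be sketched in a few lines (the token format, field names, and functions below are invented for illustration, not any real protocol): the third party issues an unforgeable token bound to the transaction, and the merchant verifies it without ever learning anything personal.

```python
import hashlib
import hmac
import os

ISSUER_KEY = os.urandom(32)  # secret held only by the vouching third party

def issue_token(transaction_id: str) -> str:
    # The token binds the transaction, not the person behind it.
    tag = hmac.new(ISSUER_KEY, transaction_id.encode(), hashlib.sha256).hexdigest()
    return f"{transaction_id}:{tag}"

def merchant_verify(token: str) -> bool:
    # In practice the merchant would ask the issuer to verify; the
    # issuer's check is inlined here for brevity.
    transaction_id, tag = token.split(":")
    expected = hmac.new(ISSUER_KEY, transaction_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)

token = issue_token("txn-0001")
assert merchant_verify(token)                       # genuine token passes
assert not merchant_verify("txn-0001:" + "0" * 64)  # forged token fails
```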

For those of you who are not one of us old geezers, Bob was the programmer working with Dan Bricklin on the original VisiCalc spreadsheet software at Software Arts. Bob and Dan (and their cohort David Reed) were also involved with Multics, upon which Unix was loosely based.

-- Jack Krupansky

URI versus URN versus URL

This post is mostly a bookmark so that I can easily find this reference to a discussion of using URIs versus URNs. See Sean McGrath's post entitled "HTTP URIs rather than URNs for identifiers." A URI (Uniform Resource Identifier) is the umbrella term: every URL and every URN is a kind of URI. A URL (Uniform Resource Locator) specifies the location of a resource. A URN (Uniform Resource Name) names a resource within a namespace, independent of where it is located.
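The standard library's parser makes the distinction visible: an HTTP URI carries location components (host, path), while a URN is still a valid URI but carries only a namespaced name.

```python
from urllib.parse import urlparse

# An HTTP URI: usable both as an identifier and as a locator.
parts = urlparse("http://example.com/docs/spec?rev=2#section-3")
assert parts.scheme == "http"
assert parts.netloc == "example.com"  # the "where" half
assert parts.path == "/docs/spec"

# A URN: also a URI, but it names a resource within a namespace
# rather than locating it.
urn = urlparse("urn:isbn:0451450523")
assert urn.scheme == "urn"
assert urn.netloc == ""               # no location at all
assert urn.path == "isbn:0451450523"
```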


-- Jack Krupansky

Monday, April 11, 2005

Phil Windley: Distributed Back-up Systems

Phil Windley has an interesting post entitled "Distributed Back-up Systems" that intersects with my interests in a Distributed Virtual Personal Computer (DVPC).  He mentions two solutions that he is aware of:
  • PStore is a secure P2P storage solution from some researchers at MIT. Overall, the feature set seems quite nice, but the code is not available and it doesn’t incorporate erasure codes as far as I know.
  • DIBS is a similar idea written in Python that does use erasure codes. The UI is something only a geek could love.
He also says that he's "enamored with erasure codes for reliability."
My original DVPC proposal was not based on P2P file-sharing, but I'm now thinking that there must be a middle ground between true servers locked away in expensive data centers and amateurish file-sharing on local computers.  More thinking is required here.  The big issue is that I'm looking for 100.0% reliability, whereas most people would settle for a 90% solution.
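For readers unfamiliar with erasure codes, the simplest member of the family can be sketched in a few lines: single XOR parity, as in RAID-5, where k data blocks plus one parity block survive the loss of any one block. Real systems (and the tools above) use stronger codes that tolerate multiple simultaneous losses.

```python
from functools import reduce

def xor_blocks(blocks):
    # XOR equal-length blocks together, byte by byte.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]  # k = 3 equal-size data blocks
parity = xor_blocks(data)           # one extra parity block

# Lose any single data block; the XOR of all survivors rebuilds it.
lost_index = 1
survivors = [b for i, b in enumerate(data) if i != lost_index] + [parity]
assert xor_blocks(survivors) == data[lost_index]
```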

Sunday, April 10, 2005

Draft white paper: The Nature of Identity

I've started writing a white paper on identity issues entitled "The Nature of Identity". It's still a very rough draft, but at least it sketches out a few ideas. Your feedback is welcome. You can also comment on this post as well.

I've also started a link page for identity issues.

I'm pursuing identity issues since my Distributed Virtual Personal Computer (DVPC) concept critically depends on the concept of identity authentication, as does my interest in software agent technology.

-- Jack Krupansky

Sunday, April 03, 2005

Optimization considered harmful

There is nothing wrong with optimization per se. Incremental improvements are almost always welcome. But there is something wrong when we spend an excessive amount of our time and resources on the incremental improvements that come from optimization while missing out on the kinds of radical improvements that can spring from true innovation. And, the real problem with optimization is that it has a tendency to reduce the flexibility of a given solution, hence making it more difficult to adapt the solution in the future as the environment evolves.

Sure, there are times when a focus on optimization is "better" than innovation, such as when it would be too disruptive to interrupt the operation of an existing process, where people are very dependent on not rocking the boat. Incremental and evolutionary improvements can frequently be made with minimal disruption.

But, there are frequently times where the modest benefits from optimization simply aren't good enough. Sometimes band-aids are insufficient to achieve the necessary level of improvement and radical surgery may be needed. Sometimes people are so disappointed by the status quo that the kind of revolutionary improvements that can come from true innovation are preferable, no matter how disruptive they may be in the near term.

My suggestion: We need to focus more of our resources and talent on radical, revolutionary, earthshaking, disruptive innovations and defer more modest innovations that frequently consume more resources than they save. Let the chips fall where they may. In the long term, we'll be more satisfied with the results of taking giant steps and quantum leaps than focusing on incremental enhancements.

[This post was based on something I originally wrote on a web site of mine.]

-- Jack Krupansky

DARPA cutting funding for basic computer science research

There was an article in the New York Times on Saturday entitled "Pentagon Redirects Its Research Dollars" which describes how the Department of Defense has shifted its priorities somewhat away from the funding of academic research in favor of "narrowly defined projects that promise a more immediate payoff." I have mixed feelings about this. Yes, I'd like to see more support for basic research (including work that I do on software agent technology). On the other hand, the research programs at a lot of universities have drifted over the years and gotten rather out of control, or even too focused on strictly commercial interests such as e-commerce and the whole "dot-com thing". And finally, I was never very happy with so much research "beholdin'" to the Department of Defense.

The good news is that the overall DARPA computer science research budget has risen, even as the share targeted to academic research has fallen.

If there is a silver lining to this dark cloud, it's that academic researchers should be getting a message loud and clear: shift back to hard-core research that has some hard-core fundamental value, rather than continuing to chase after the lingering vestiges of "the dot-com thing". And finally, it's about time that we start ramping up research that isn't so dependent on defense spending. Who knows how many research projects were not even considered because of the implicit defense bias of DARPA.

-- Jack Krupansky

Saturday, April 02, 2005

Data Union for protection and use of personal information

My latest thoughts about network personal identity (NPI) reminded me of an idea I had many years ago, something I called a Data Union. The idea is that a "data union" (the term is derived from "data bank" and "credit union") is a place where consumers can safely and securely store personal information, in such a way that it is only disclosed to third parties to the extent authorized by the consumer. The intention is to give consumers a safe way to earn some "interest" from the aspects of their personal information which are of significant value to businesses that seek to market to consumers. The flip side is that the high quality of the data is very attractive to marketers.

So, a Data Union is simply yet another example of the pressing need for a Network Personal Identity (NPI).
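The core mechanics of the idea can be sketched in a few lines (the class, its methods, and the record fields below are all invented for illustration): the union holds the member's data and releases to each business only the fields that member has authorized.

```python
class DataUnion:
    """Toy sketch: store personal data, disclose only authorized fields."""

    def __init__(self):
        self._records = {}  # member -> personal data
        self._grants = {}   # (member, business) -> set of allowed fields

    def deposit(self, member, record):
        self._records[member] = dict(record)

    def authorize(self, member, business, fields):
        self._grants[(member, business)] = set(fields)

    def disclose(self, member, business):
        # Only fields the member has explicitly authorized ever leave
        # the union; everything else stays private.
        allowed = self._grants.get((member, business), set())
        return {k: v for k, v in self._records[member].items() if k in allowed}

union = DataUnion()
union.deposit("alice", {"hair_color": "brown", "income": 90000, "zip": "10001"})
union.authorize("alice", "fashion-site", ["hair_color"])

assert union.disclose("alice", "fashion-site") == {"hair_color": "brown"}
assert union.disclose("alice", "unknown-marketer") == {}
```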

-- Jack Krupansky

Friday, April 01, 2005

Identity heating up

Out of my own interest I happened to write two recent posts on the topic of identity on the net ("Network Personal Identity (NPI)" and "Multiple levels for network personal identity") and now I'm seeing posts by others suggesting that interest in this area is really heating up. For example, Robert Scoble posted "It's time to put identity on the top of tech world's agenda" last night that references yet some other posts.

Network identity is nothing new per se, with Passport and Liberty and other efforts, but somehow there was never quite a critical mass. My perspective is that we never had a rock-solid problem statement, let alone a rock-solid requirements specification. Much effort has been wasted due to posturing and seeking to gain competitive advantage. Still, even besides all of that, there is this really tough problem of wanting to have control by users and a computational trust-building mechanism, while at the same time avoiding even the slightest tinge of "Big Brother".

Personally, I'm opposed to identity solutions that are being promulgated by large organizations, whether it be Microsoft or Sun or IBM or the government or whoever. I'm also opposed to any lame, brain-damaged "open source" children's crusade approach. I think the ultimate solution(s) will involve public domain source code, but not done in some fanatical way that is merely a ruse to disrupt intellectual property (IP) rights.

Ultimately, I think what will happen is that lots and lots of ideas will be floated around and shot at and eventually a few ideas will survive and interest will coalesce around them. It's not necessary to have a single identity concept, but rather we need a close but relatively loose umbrella that will support any number of viable candidates. This would be analogous to "feeds", with multiple competing "standards" (RSS, Atom, etc.) that have just barely enough in common to form a critical mass that everyone can rally around.

-- Jack Krupansky