Jeff Duntemann's Contrapositive Diary

Revisiting The All-Volunteer Virtual Encyclopedia of Absolutely Everything

24 years and some months ago, I published an article in PC Techniques, on the END. page, which was where I put humor, crazy ideas, and none-of-the-above. The article was “The All-Volunteer Virtual Encyclopedia of Absolutely Everything!” and as I recall it generated a lot of mail. The idea was this: We should create a way to capture knowledge, even highly eccentric knowledge, in a browsable online encyclopedia. Remember that I had this idea in 1993, when the Web was not so much in its infancy as still in utero, and broadband outside of an office or university was practically unheard of. That’s why I imagined the Encyclopedia as a central index with pointers to encyclopedia articles hosted on machines owned by the authors of the articles, with caching for popular items. You browse the index, you click on an article link, and then retrieve the article text back to your machine as a file via FTP, where it would be rendered in a window in a standard layout. (The now-defunct DMOZ Web directory worked a little like this.) HTTP would work even better, but in 1993 I’d barely heard of it.

I chewed on the idea for several years, and then went on to other things. In 2001, Wikipedia happened, and I felt vindicated: even though the vision had an utterly different shape, it was still an all-volunteer virtual encyclopedia.

Of absolutely everything, well, not so much.

As good as it is, Wikipedia is still trying to be a paper encyclopedia. You won’t find articles on pickled quail eggs in a paper encyclopedia, because paper costs money, and takes up space. These days, with terabyte disk drives going for fifty bucks new, there’s no reason for an online encyclopedia not to cover everything. Yet Wikipedia still cleaves to its “notability” fetish like superglue; in fact, in reading the discussion pages, I get the impression that they will give up almost anything else but that. My heuristic on the topic is simple and emphatic:

Everything is notable to somebody, and nobody can judge what will be notable to whom.

In other words, if I look for something on Wikipedia and it’s not there, that’s a flaw in Wikipedia. It’s a fixable flaw, too, but I don’t expect them to fix it.

Several people have suggested that my Virtual Encyclopedia concept is in fact the Web + Google. Fair point, but I had envisioned something maybe a little less…chaotic. Others have suggested that I had at least predicted the MediaWiki software, and if Wikipedia won’t cover everything, that’s their choice and not a shortcoming of the machinery behind it.

Bingo.

Some years back I had the notion that somebody should build a special-purpose wiki to hold all the articles that Wikipedia tosses out for lack of notability. I thought about some sort of browser script that would first search Wikipedia for a topic, and if Wikipedia didn’t have it, would then look it up in WikiDebrisdia. I never wrote this up, which is a shame, because something similar to that appeared last year, when Theodore Beale (AKA Vox Day) launched Infogalactic.

It’s a brilliant and audacious hack, fersure: When a user searches Infogalactic (which, like Wikipedia, is MediaWiki-based) for a topic, Infogalactic first searches its own articles, and if the topic isn’t found, then searches Wikipedia. If the topic is available on Wikipedia, Infogalactic brings the article back and serves it to the user, and retains it in a cache for future searches. This is legal and fully in keeping with Wikipedia’s rules, which explicitly allow re-use of its material, though I’m guessing they weren’t imagining it would be used in fleshing out the holes in a competing encyclopedia.
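
The lookup order described above (own articles first, then the upstream encyclopedia, with a cache for anything fetched) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Infogalactic's actual code: the class name `FallbackEncyclopedia`, the in-memory dictionaries, and the `upstream_fetch` callable are all hypothetical stand-ins for a real article store and a real request to another wiki.

```python
from typing import Callable, Dict, Optional

# Hypothetical type: given a title, return the article text or None if absent.
Fetcher = Callable[[str], Optional[str]]


class FallbackEncyclopedia:
    """Local-first lookup with an upstream fallback and a cache (illustrative only)."""

    def __init__(self, local_articles: Dict[str, str], upstream_fetch: Fetcher):
        self.local = dict(local_articles)     # articles this wiki hosts itself
        self.upstream_fetch = upstream_fetch  # assumed stand-in for a call to another wiki
        self.cache: Dict[str, str] = {}       # upstream articles retained after first fetch

    def lookup(self, title: str) -> Optional[str]:
        # 1. Prefer the wiki's own article.
        if title in self.local:
            return self.local[title]
        # 2. Reuse an upstream article fetched earlier.
        if title in self.cache:
            return self.cache[title]
        # 3. Otherwise ask upstream, and keep a copy if it had the article.
        text = self.upstream_fetch(title)
        if text is not None:
            self.cache[title] = text
        return text
```

A real deployment would also need cache expiry and attribution of the reused text, neither of which is shown here.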

There’s considerably more to Infogalactic than this, but it’s still very new and under active development, and its other features will have to wait for a future entry. (Note that Infogalactic is not concerned with Wikipedia’s deleted articles; that was my concept.) One of the things I find distinctive about it is that it has no notability fetish. Infogalactic states that it is less concerned with a topic’s notability than it is about whether the article is true. That’s pretty much how I feel about the issue: Notability is a holdover from the Age of Paper. It has no value anymore. What matters is whether an article is true in all its assertions, not how important some anonymous busybody thinks it might be.

I’m wondering if the future of the All-Volunteer Virtual Encyclopedia of Absolutely Everything is in fact a network of wikis. There are a number of substantial vertical-market wikis, like WikiVoyage (a travel guide) and WikiSpecies, which is a collection of half a million articles on living things. I haven’t studied the MediaWiki software in depth, so I don’t know how difficult this would be, but…how about a module that sends queries to one or more other wikis, Infogalactic-style? I doubt that Wikipedia has articles on all half a million species of living creature in WikiSpecies, but if a user wanted to know about some obscure gnat that wasn’t notable enough for Wikipedia, Wikipedia could send for the article from WikiSpecies. Infogalactic already does this, but only to Wikipedia. How about a constantly updated list of wikis? You broadcast a query and post a list of all the search hits from all the wikis on the list that received the query.
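
As a rough sketch of the broadcast idea above, assuming each wiki on the list can be represented by a function that takes a query and returns matching article titles, the fan-out and merge might look like this in Python. The names `broadcast_search` and `SearchFn` are hypothetical, and the per-wiki search functions are placeholders; on a real MediaWiki site they might wrap the Action API's `list=search` query, but that wiring is not shown and is an assumption here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical type: given a query, return the titles of matching articles.
SearchFn = Callable[[str], List[str]]


def broadcast_search(query: str, wikis: Dict[str, SearchFn]) -> List[Tuple[str, str]]:
    """Send one query to every wiki on the list and merge the hits.

    Returns (wiki_name, article_title) pairs, grouped in the order the
    wikis appear in the list. A wiki that fails contributes no hits.
    """

    def ask(item: Tuple[str, SearchFn]) -> List[Tuple[str, str]]:
        name, search = item
        try:
            return [(name, title) for title in search(query)]
        except Exception:
            # One unreachable wiki should not sink the whole query.
            return []

    # Query the wikis concurrently; pool.map keeps results in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(wikis))) as pool:
        per_wiki = list(pool.map(ask, wikis.items()))

    return [hit for hits in per_wiki for hit in hits]
```

The "constantly updated list of wikis" is just the `wikis` dictionary here; how that list would be maintained and shared among sites is the harder problem and is left open.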

This is the obvious way to go, and it’s how I envisioned the system working even back in 1993. Once again, as I’ve said throughout my career in technical publishing, the action is at the edges. It’s all about how things talk to one another, and how data moves around among them. There’s a distributed Twitter clone called Mastodon with a protocol for communication between servers. That’s the sort of thing I’m talking about.

Bottom line: I admit that “absolutely everything” is a lot. It may be more than any one single encyclopedia can contain. So let a thousand wiki encyclopedias bloom! Let Wikipedia be as much or as little of an encyclopedia as it wants to be. The rest of us can fill in the gaps.


Note well: Theodore Beale has controversial opinions, and those are off-topic and irrelevant to this entry. I mentioned one of his projects, but the man and his beliefs are a separate issue. Don’t bring them up. I will delete your comments if you do.

9 Comments

  1. Bob says:

    Web encyclopedias remind me of the relevance of the traditional public libraries. My neighborhood had an event and the city government parked a bookmobile there. I cannot believe these things still exist. The truck/bus probably cost say $100K but the employee who drives it around probably costs the taxpayers over $150K per year with the usual government benefits. (The Frisco sidewalk poop cleaners each cost about $175K/year) But the library carries very few ebooks. Why don’t they shut this thing down and use the money to buy ebooks??

    Which brings up the question of why there are no indie writer ebooks in public libraries. I have never seen one, even from big sellers on Amazon. Jeff, do you know why this is the case and what could be done to get them to carry some of your books?

    1. Traditional publishers aren’t the only ones with business models. Libraries buy books from publishers, and thus inherit certain elements of traditional publishing’s business model. I’m not a librarian and am not certain of this, but I suspect that a book (print or digital) must have an ISBN to “exist” and be purchased by a public library. Small press does sell books into public libraries. The Colorado Springs Public Library bought two copies of The Cunning Blood in hardcover from the small press that published it. The Colorado Springs Public Library also has ebooks, and a fairly sophisticated system for loaning them. The breeder from whom we bought our dogs used to take out ebooks from the CS library all the time, and basically picked them clean. (I think she now uses KU.)

      So the Indie problem may be nothing more than lacking an ISBN, or perhaps there are now so many indie books that libraries have selection criteria. It’s a solid and interesting question.

    2. Carrington Dixon says:

      At least part of the difficulty with ebooks in public libraries is DRM. When I buy an ebook, I expect to be able to do anything I want with it. When I check one out from a public library, not so much. I know the McKinney library (not far from Frisco) has some scheme to allow ebooks to be checked out and returned; I have not investigated it.

  2. Christian R. Conrad says:

    From https://en.wikipedia.org/wiki/Quail_eggs : “Quail eggs are considered a delicacy in many parts of the world, including Asia, Europe, and North America. In Japanese cuisine, they are sometimes used raw or cooked as tamago in sushi and often found in bento lunches. …” 🙂

    1. Well, I’m always happy to be proven wrong. (Emphasis on “proven,” I should add, for the benefit of Certain People.) I suspect that Wikipedia’s notability requirements have loosened as servers have gotten faster computationally and storage cheaper. I also think that there is now a strong political vector in deciding which people are notable and which are not.

      Phenoms like Infogalactic and Everipedia (which I just heard about and am researching) will definitely help fill in Wikipedia’s self-inflicted gaps.

  3. TRX says:

    I thought a lot about your original article when you printed it, and I thought about it again when I saw the first mention of Wikipedia. And each time Wikipedia makes the news for its various internal problems, I’ve thought of it again…

    Not bad for something you wrote, what, 25 years ago?

    The “multiple wiki search” shouldn’t be that hard; as you noted, IG does part of that already, and that’s how “aggregator” search engines like Dogpile work.

    Back in ancient times when disk space was dear and bandwidth was ridiculously narrow (my “56K” modem often dropped down to 2400 baud due to poor phone lines) Wikipedia’s notability requirement made a bit of sense.

    Frankly, there’s room for a lot of competition in the online encyclopedias. It would be a logical thing for, say, the US Library of Congress to set one up. It could be argued that their charter requires it…

    Every organization is going to slant articles and results their own way, but having more than one source would be nice.

    1. Infogalactic has an unimplemented feature that supposedly neutralizes a certain amount of bias, though I’m not sure I understand how it will work. It would be interesting in spades to do a diff on articles that exist on both IG and WP and have been edited by both.

      I’ve also heard that some IG-original articles are being absorbed by Wikipedia, though I have no proof of that. If I were twenty years younger I would try to at least design an automated article-sharing extension to MediaWiki, so that encyclopedias could be queried for their newest articles (and perhaps edits) and allow the editors to select which articles to add to their own databases. For all I know somebody’s already done this (like I said, IG does it, at least for Wikipedia) but I’ve never heard of a generalized system.

    2. I wrote the article in 1993, but didn’t actually put it in the magazine until the middle of 1994. So next summer will be its 25th anniversary of publication. I’ll probably do another revisit of the idea here. Stay tuned.

      1. TRX says:

        I didn’t see it until 1996. I’d seen a couple of copies of Visual Developer and was impressed; after getting a new job at a much higher pay scale than previously, I splurged by buying a complete set of back issues as well as a subscription.

        I remember the Jiminy too. Considering how smartphones work, we’re close to that…
