[geeks] Any opinion on "Exist": XML Database

Mon Jun 10 11:16:16 CDT 2002

On Mon, Jun 10, 2002 at 05:17:44AM -0500, Jonathan C. Patschke wrote:

> #ifdef SKEPTICAL_PROGRAMMER
> 
> To me, it looks like a solution in search of a problem.  It looks very
> elegant and fairly simple to use, but I can't imagine that it would come
> close to the speed of a well-indexed real database.  Just remember that if
> you can express what you want in set-theory, you can express it in SQL.
> 
> For example, the example they give could be expressed in SQL as a table of
> plays, a table of speakers (where each row relates to a play), a table of
> acts (where each row relates to a play), a table of scenes (where
> each row relates to an act), a table of lines (where each row relates to a
> speaker and a scene, and has a line number).  If you wanted speeches set
> apart from the rest of the play, you would create a table of speeches
> where each speech related to a scene.
> 
> The real win for XML is that it's easy to create, import, and edit
> data--you don't have to worry about the relations in specific, as they're
> implied; you also don't have to worry about the care and feeding of a
> DBMS.  The real win for SQL is (typically) speedy access and easy
> description of what data you want to retrieve; you also can draw relations
> that might not've been conceived earlier, if you plan your database
> correctly.  The biggest win for SQL is being able to tag a datum with an
> ID so that you know when pieces of identical data are references to
> the same thing--without that, you just have to guess.
> 
> Combining the two sounds like a real lose for general-purpose work,
> especially if you get a -lot- of queries.  It would make much more sense
> to craft a database, import the XML, and export it, as needed.  XML, if
> I'm remembering correctly, was initially created as an interchange
> mechanism, not a storage/retrieval mechanism.
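
To make the "craft a database, import the XML" route concrete, here is
a minimal sketch in Python using the standard sqlite3 and ElementTree
modules.  The element names (PLAY, ACT, SCENE, SPEECH, SPEAKER, LINE),
the table layout, and the helper names import_play and
lines_by_speaker are my own assumptions for illustration; they are not
taken from the Exist example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema following the tables described above.
SCHEMA = """
CREATE TABLE plays    (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE speakers (id INTEGER PRIMARY KEY,
                       play_id INTEGER REFERENCES plays(id),
                       name TEXT, UNIQUE (play_id, name));
CREATE TABLE acts     (id INTEGER PRIMARY KEY,
                       play_id INTEGER REFERENCES plays(id), title TEXT);
CREATE TABLE scenes   (id INTEGER PRIMARY KEY,
                       act_id INTEGER REFERENCES acts(id), title TEXT);
CREATE TABLE lines    (id INTEGER PRIMARY KEY,
                       scene_id INTEGER REFERENCES scenes(id),
                       speaker_id INTEGER REFERENCES speakers(id),
                       line_no INTEGER, text TEXT);
"""


def import_play(conn, xml_text):
    """Walk an assumed PLAY/ACT/SCENE/SPEECH/LINE document into the tables."""
    root = ET.fromstring(xml_text)
    cur = conn.cursor()
    cur.execute("INSERT INTO plays (title) VALUES (?)",
                (root.findtext("TITLE"),))
    play_id = cur.lastrowid
    line_no = 0
    for act in root.findall("ACT"):
        cur.execute("INSERT INTO acts (play_id, title) VALUES (?, ?)",
                    (play_id, act.findtext("TITLE")))
        act_id = cur.lastrowid
        for scene in act.findall("SCENE"):
            cur.execute("INSERT INTO scenes (act_id, title) VALUES (?, ?)",
                        (act_id, scene.findtext("TITLE")))
            scene_id = cur.lastrowid
            for speech in scene.findall("SPEECH"):
                name = speech.findtext("SPEAKER")
                # Each speaker gets one row (one ID) per play.
                cur.execute("INSERT OR IGNORE INTO speakers (play_id, name) "
                            "VALUES (?, ?)", (play_id, name))
                cur.execute("SELECT id FROM speakers "
                            "WHERE play_id = ? AND name = ?", (play_id, name))
                speaker_id = cur.fetchone()[0]
                for line in speech.findall("LINE"):
                    line_no += 1
                    cur.execute("INSERT INTO lines "
                                "(scene_id, speaker_id, line_no, text) "
                                "VALUES (?, ?, ?, ?)",
                                (scene_id, speaker_id, line_no, line.text))
    conn.commit()
    return play_id


def lines_by_speaker(conn, name):
    """Return (line_no, text) pairs for one speaker, in play order."""
    rows = conn.execute(
        "SELECT l.line_no, l.text FROM lines l "
        "JOIN speakers s ON s.id = l.speaker_id "
        "WHERE s.name = ? ORDER BY l.line_no", (name,))
    return rows.fetchall()


if __name__ == "__main__":
    sample = ("<PLAY><TITLE>Hamlet</TITLE><ACT><TITLE>ACT I</TITLE>"
              "<SCENE><TITLE>SCENE I</TITLE>"
              "<SPEECH><SPEAKER>BERNARDO</SPEAKER>"
              "<LINE>Who's there?</LINE></SPEECH>"
              "</SCENE></ACT></PLAY>")
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    import_play(conn, sample)
    print(lines_by_speaker(conn, "BERNARDO"))
```

The speakers table is where the "tag a datum with an ID" point shows
up: two speeches by BERNARDO point at the same row instead of merely
sharing a string.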

I seem to remember reading something in Dr. Dobb's a while ago about a
database made for XML.  But it stored the data in its own indexed
binary format and just took XML in and spit it out.

This is as opposed to other databases that can speak XML, but they do
so by just allowing XML in their predefined schema to be written, and
they spit out XML in the same schema.  This database took whatever XML
you wanted to give it.

I think Lore might have been what I was reading about
(http://www-db.stanford.edu/lore/), but I'm not positive.
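
I don't know how Lore (or Exist) actually lays out its storage, so
take the following only as a guess at the general shape of "accept any
XML, keep it in an indexed internal format, hand XML back."  It is a
Python sketch using sqlite3 as the internal store; the nodes table and
the function names are hypothetical, and it ignores attributes and
tail text to stay short.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One generic table holds every element, whatever the document's schema.
NODE_SCHEMA = """
CREATE TABLE nodes (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES nodes(id),
    position  INTEGER,
    tag       TEXT,
    text      TEXT
);
CREATE INDEX nodes_by_tag    ON nodes(tag);
CREATE INDEX nodes_by_parent ON nodes(parent_id, position);
"""


def store_element(conn, elem, parent_id=None, position=0):
    """Insert one element and, recursively, its children; return its id."""
    cur = conn.execute(
        "INSERT INTO nodes (parent_id, position, tag, text) "
        "VALUES (?, ?, ?, ?)",
        (parent_id, position, elem.tag, (elem.text or "").strip()))
    node_id = cur.lastrowid
    for i, child in enumerate(elem):
        store_element(conn, child, node_id, i)
    return node_id


def store_xml(conn, xml_text):
    """Parse arbitrary XML text and store it; return the root node id."""
    return store_element(conn, ET.fromstring(xml_text))


def load_element(conn, node_id):
    """Rebuild an ElementTree element from the stored rows."""
    tag, text = conn.execute(
        "SELECT tag, text FROM nodes WHERE id = ?", (node_id,)).fetchone()
    elem = ET.Element(tag)
    elem.text = text or None
    children = conn.execute(
        "SELECT id FROM nodes WHERE parent_id = ? ORDER BY position",
        (node_id,)).fetchall()
    for (child_id,) in children:
        elem.append(load_element(conn, child_id))
    return elem


def texts_for_tag(conn, tag):
    """Indexed lookup: the text of every element with the given tag."""
    rows = conn.execute(
        "SELECT text FROM nodes WHERE tag = ? ORDER BY id", (tag,))
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(NODE_SCHEMA)
    root_id = store_xml(conn, "<recipe><title>Bread</title>"
                              "<ingredient>flour</ingredient></recipe>")
    print(ET.tostring(load_element(conn, root_id), encoding="unicode"))
    print(texts_for_tag(conn, "ingredient"))
```

The point is only that nothing about the table depends on the
document's schema, which is the difference from databases that speak
XML in one predefined schema.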

> There are very valid reasons for wanting to query SGML or XML, but I think
> it'd be a bad idea to get in the habit of relying upon it as a permanent form
> of storage, unless you absolutely cannot get anything else working
> (example: an app that runs on both Unix (which typically only has BDB) and
> Windows (which typically only has Jet or ODBC), but doesn't use Java) 
> predictably.

If you are writing your cross-platform code in C, just take your
preferred Unix lib over with you.  Most of them have probably been
ported by now.

> My personal opinion is that XML is like the "Object-Oriented" of the early
> 90s, the "Web-enabled" of 1996, and the "Java-powered" of 1997--it's new
> and everyone wants to use it because it's new.  It has very real potential
> in a few areas, but it's not a panacea, and it's actually worse for some
> applications than more traditional technologies are.

My feeling on the XML issue is that you are partially correct.  I also
think that a lot of people are overwhelmed when they want
small/simple solutions and are faced with databases where the
monstrously large MySQL is the smallest.

Further, people want to be able to store more complex data
structures.  Yes, you can store them in SQL, but often XML is closer
to the way they are used in memory.

And further still, XML lends itself to rapid prototyping schemas.

So, these are three things that XML doesn't do well, but nothing else
really seems to do them well either, and XML has such a preponderance
of development tools that using it is pretty simple, unless you hit
the scalability problems.

> The things I would use XML for would be news stories on a CNN-like site,
> recipes, or any other project that meets the following criteria:
> 
>   1) High amount of implied metadata (flour is a dry ingredient, for
>      example).
>   2) Content should be auto-formatted, rather than having formatting
>      embedded, and there is a lot of formatting to be done (what if CNN
>      wants to alter the positioning of the ads between paragraphs?).
>   3) Searches are basically limited to full-text and a -tiny- bit of
>      metadata (You'll probably want to search by author, text, and title
>      at CNN, but you really don't care about most of the other metadata).
>   4) You -don't- need many-many relationships.  Doing this in SQL is ugly
>      enough.  I can't see how you'd do it in XML without emulating the
>      way you'd do it in SQL (thereby killing your usability bonus).

I don't really know what you mean in point 1.  Is flour supposed to be
a tag or something?

I'm not really sure that any of those are a good reason to use XML.
Point 2 might be a good one, but why not do the formatting in an XSLT
sheet applied to XML being spit out by a SQL or object database?  And
even then it is an ugly solution.  A prettier one would be to adapt
XSLT to work directly on the data tree returned by the DB without
having to do a pair of intermediate XML conversions.
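
As a rough illustration of formatting straight from the data the DB
returns, with no XML in between, here is a Python sketch.  Python's
standard library has no XSLT processor, so plain functions stand in
for the stylesheet templates; that is my substitution, not real XSLT.
The stories/paragraphs tables and the names fetch_story and
render_story are made up for the example.

```python
import sqlite3
from html import escape

# Hypothetical tables for a news story and its ordered paragraphs.
STORY_SCHEMA = """
CREATE TABLE stories    (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE paragraphs (id INTEGER PRIMARY KEY,
                         story_id INTEGER REFERENCES stories(id),
                         position INTEGER, body TEXT);
"""


def fetch_story(conn, story_id):
    """Return the story as a small in-memory tree (dict plus list)."""
    title, author = conn.execute(
        "SELECT title, author FROM stories WHERE id = ?",
        (story_id,)).fetchone()
    rows = conn.execute(
        "SELECT body FROM paragraphs WHERE story_id = ? ORDER BY position",
        (story_id,))
    return {"title": title, "author": author,
            "paragraphs": [r[0] for r in rows]}


def render_story(story, ad_every=2):
    """Format the tree as HTML, placing an ad slot between paragraphs.

    Ad placement lives only here, so moving the ads means changing this
    function (the stand-in for a stylesheet), not the stored content.
    """
    paragraphs = story["paragraphs"]
    parts = [f"<h1>{escape(story['title'])}</h1>",
             f'<p class="byline">{escape(story["author"])}</p>']
    for i, para in enumerate(paragraphs, start=1):
        parts.append(f"<p>{escape(para)}</p>")
        # Never put an ad after the final paragraph.
        if i % ad_every == 0 and i < len(paragraphs):
            parts.append('<div class="ad"></div>')
    return "\n".join(parts)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(STORY_SCHEMA)
    conn.execute("INSERT INTO stories VALUES (1, 'Headline', 'A. Writer')")
    conn.executemany(
        "INSERT INTO paragraphs (story_id, position, body) VALUES (1, ?, ?)",
        [(1, "First."), (2, "Second."), (3, "Third.")])
    print(render_story(fetch_story(conn, 1)))
```

Changing ad_every repositions the ads without touching a single
stored row, which is all the auto-formatting criterion really asks
for.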

Of course, at this point we are starting to dance around the fact that
XSLT, etc., is just functional programming, and the work that has gone
into transformational compilation and transformational editors in the
functional programming world is relevant.

As to point 3, simple metadata searches are easy with SQL.  Full-text
searches aren't easy with generic SQL, but many DBs have proprietary
modules that make them easy and fast.  Oracle's is supposed to be
especially good.
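
To show the difference, here is a small Python/sqlite3 sketch.  The
stories table and the function names are hypothetical.  The metadata
search is an ordinary indexed equality lookup; the text search falls
back on LIKE with a leading wildcard, which is about all generic SQL
offers and which an ordinary index can't help with.  It does not use
any vendor's full-text module.

```python
import sqlite3

# Hypothetical table: a little metadata plus the full story text.
SEARCH_SCHEMA = """
CREATE TABLE stories (id INTEGER PRIMARY KEY,
                      title TEXT, author TEXT, body TEXT);
CREATE INDEX stories_by_author ON stories(author);
"""


def search_by_author(conn, author):
    """Metadata search: a plain equality test the index can serve."""
    rows = conn.execute(
        "SELECT title FROM stories WHERE author = ? ORDER BY title",
        (author,))
    return [r[0] for r in rows]


def search_text(conn, phrase):
    """Naive full-text search: substring match over every body."""
    # Escape LIKE's own wildcards so the phrase is matched literally.
    escaped = (phrase.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))
    rows = conn.execute(
        "SELECT title FROM stories WHERE body LIKE ? ESCAPE '\\' "
        "ORDER BY title", (f"%{escaped}%",))
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SEARCH_SCHEMA)
    conn.executemany(
        "INSERT INTO stories (title, author, body) VALUES (?, ?, ?)",
        [("Bread", "Ann", "Mix flour and water."),
         ("Soup", "Bob", "Boil water, add salt.")])
    print(search_by_author(conn, "Ann"))
    print(search_text(conn, "water"))
```

The substring match has no notion of words, stemming, or ranking,
which is the gap those proprietary modules fill.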

I guess all of your points amount to saying that if these criteria
aren't met, then don't use XML for storage.  But what needs to be met
to say that you should use XML for storage?

> But I've been told that people should take what I say with a grain of
> salt.  I still primarily code in C.  I still use LaTeX for word
> processing.  I'm still not sold on top-down design.  I still use HTML 3.2
> for web-markup.  I don't -think- I'm afraid of new technology, but I
> certainly don't seem to use a lot of it.

I didn't know that top-down design was the universal standard.  And
what's wrong with using an old markup system like LaTeX if it is
superior to anything made before or since?

As to C, that is probably a habit that is best to get out of.  Not
that I've managed to myself (I keep running into pain w.r.t. lack of
universal tools for other languages).

-- 
Joshua D. Boyd