
Because I need to explain it to someone to find an answer that fits

There are changes afoot in the graphics arena, so I'm working on
other parts of the project. Or at least I'm trying to work on other
areas. One of our testing servers is constantly running out of
space. The other testing server doesn't serve: it eats files which sit
in its belly waiting for something to happen to them. So I've decided
to work on the links database, and on that front I'm having a bit of
the nerd equivalent of writer's block.

Warning: highly technical mumbo-jumbo in the rest of this entry

Update: I was right. It worked. I found the answer. Normalize the data first, then parse it. Writing this entry jogged my memory about normal forms.

I've got a tab-delimited text file. There's one link on each line,
along with meta-data about which section the link is in, the
description of the link, and so on. The single file is really the
denormalized join of about three tables: one table each for the links
and the sections, and a table for an implicit hierarchical tree
structure on the sections.
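
A minimal sketch of reading such a file, in Python rather than Perl purely for illustration. The column names and their order (four section levels, then the URL, then the description) are assumptions, as is the example URL; the real file's layout isn't spelled out in this entry.

```python
import csv
import io

# Hypothetical column layout; the real file's columns are not listed here.
COLUMNS = ["level1", "level2", "level3", "level4", "url", "description"]

def read_rows(handle):
    """Yield one dict per tab-delimited line, keyed by the assumed column names."""
    reader = csv.reader(handle, delimiter="\t")
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Pad short lines so every row has a value for every column.
        fields = fields + [""] * (len(COLUMNS) - len(fields))
        yield dict(zip(COLUMNS, fields))

# One made-up line standing in for the real file.
sample = io.StringIO(
    "Main\tCourses and Study Materials\tCourses\t\t"
    "http://example.com/marx\tSteven Marx website\n"
)
rows = list(read_rows(sample))
```

Each row still repeats the full section path, which is exactly the redundancy that makes the file awkward to load directly.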

For example: A link to the Steven Marx website is in the courses
section of the "Courses and Study Materials" section in the main
index section. There is just so much implicit information in that one
sentence that I'm having trouble working it out. Firstly, there is a
main section, which contains a "Courses and Study Materials"
subsection, and in that subsection there is a link to Marx's website.

I'm trying to write a script (in Perl, of course) to parse each line
and add the link to a database. Of course, the section may or may not be
in the database. There is no way to tell, just by looking at the
single line. And, if the section needs to be added to the database,
so does some information about the tree structure (that is, which
section the section is in).
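
The "may or may not be in the database" part can be sketched as a look-up-then-insert helper. This is only an illustration in Python with an in-memory SQLite database; the table and column names (`sections`, `links`, `parent_id`) are hypothetical, not the actual schema of the links database.

```python
import sqlite3

def make_db():
    """Create an in-memory database with an assumed two-table schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE sections (
            id        INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            parent_id INTEGER REFERENCES sections(id)  -- NULL for the top section
        );
        CREATE TABLE links (
            id          INTEGER PRIMARY KEY,
            url         TEXT NOT NULL,
            description TEXT,
            section_id  INTEGER NOT NULL REFERENCES sections(id)
        );
    """)
    return db

def get_or_create_section(db, name, parent_id):
    """Return the id of the section with this name and parent, inserting it if missing."""
    # SQLite's IS compares like = but also matches NULL against NULL,
    # so the same query works for top-level sections (parent_id is None).
    row = db.execute(
        "SELECT id FROM sections WHERE name = ? AND parent_id IS ?",
        (name, parent_id),
    ).fetchone()
    if row:
        return row[0]
    cur = db.execute(
        "INSERT INTO sections (name, parent_id) VALUES (?, ?)",
        (name, parent_id),
    )
    return cur.lastrowid
```

Calling the helper once per level of a line's section path, passing each returned id as the next `parent_id`, records the tree structure as a side effect.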

The problem is even more complicated: the sections can be nested up to
four levels deep. There are links in the main section as well as in its
subsections, each of the subsections has subsubsections and links,
and...

I think what I really need is some sort of automated tool to convert the
data into a normalized relational schema, in BCNF or at least 2NF. As
far as I know such a tool doesn't exist, and some of the related
problems (testing whether a schema is in 3NF, for one) are NP-complete.
I do have notes from my CSC370 class (I got an A!). In fact, they
explain the process for normalizing the data, and getting a normal form
from it!
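
To make "normalize first, then parse" concrete, here is a rough Python sketch that splits the flat rows into the three tables described earlier. It is not the textbook decomposition procedure from the class notes, just an illustration of the result; the input shape (a list of section levels plus a URL and description) and the sample URLs are assumptions.

```python
def normalize(rows):
    """Split flat rows into sections, tree edges, and links.

    rows: iterable of (levels, url, description), where levels is a list of
    up to four section names, outermost first, with unused levels empty.
    """
    sections = {}  # full section path (tuple of names) -> section id
    tree = []      # (section id, parent id or None)
    links = []     # (url, description, section id)
    for levels, url, description in rows:
        parent_id = None
        path = ()
        for name in levels:
            if not name:
                break  # assumed: deeper levels are left empty when unused
            path += (name,)
            if path not in sections:
                # First time this section is seen: record it and its parent.
                sections[path] = len(sections) + 1
                tree.append((sections[path], parent_id))
            parent_id = sections[path]
        # The link belongs to the deepest section named on its line.
        links.append((url, description, parent_id))
    return sections, tree, links

# Two made-up rows: one nested three levels deep, one in the main section.
flat = [
    (["Main", "Courses and Study Materials", "Courses", ""],
     "http://example.com/marx", "Steven Marx website"),
    (["Main", "", "", ""], "http://example.com/top", "A top-level link"),
]
sections, tree, links = normalize(flat)
```

Keying sections by their full path rather than by name alone keeps two same-named subsections under different parents from being merged.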

Thanks, Blog!

About

This page contains a single entry from the blog posted on April 29, 2005 4:18 PM.
