Managing Multimedia and Unstructured Data in the Oracle Database

Structures

A traditional data warehouse will usually not contain structures within it. The data will be stored in tables and joined together and queried as required. Summary and dimensional tables are also built to improve performance and give dimensional views of the relational data.

With a multimedia warehouse, the focus is different. Each digital image is viewed as an object with its associated metadata describing that object. The objects are still queried in an ad hoc fashion, and dimensional and summary tables are still built, but the objects are put into structures to help manage and control them. For the user querying the warehouse, these structures might be hidden, or they might be used to add intelligence or control to the queries performed.

The following describes some of the structures that can be deployed into a multimedia warehouse. Whether these structures are actually used is dependent on the type of objects being stored and the purpose of the multimedia warehouse.

Collections

A collection is a group of digital objects. An object typically belongs to one collection but can live in multiple collections. Attributes can be assigned to a collection, including security, metadata, and categorization structure.

A museum would have multiple collections. Each collection could equate to a physical section in the building (objects in the east wing or the Handel building), a time period (16th century art), or objects similar in type (pottery, paintings, tapestry).

A government department might equate each collection to a department.

A photo laboratory might equate each collection to a photo shoot (the Jones wedding, the university student photo shoot of 2012, the motocross race).

In most cases, a collection has an owner who is the manager of the set of objects. Grouping the digital objects together enables actions to be done en masse to the whole collection. For example, every digital object in the collection might have its security set or its metadata updated in one operation.

A collection can be assigned a name, enabling it to be easily referred to.
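
As a minimal sketch of the idea, a collection can be modeled as a named, owned set of objects with bulk operations. The class names, field names, and security labels below are hypothetical, chosen only for illustration; they are not part of any Oracle API.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    # Hypothetical minimal digital object: an id, metadata, and a security label.
    object_id: int
    metadata: dict = field(default_factory=dict)
    security: str = "private"


@dataclass
class Collection:
    # A named group of digital objects managed by one owner.
    name: str
    owner: str
    members: dict = field(default_factory=dict)  # object_id -> DigitalObject

    def add(self, obj: DigitalObject) -> None:
        # The same object instance may be added to more than one collection.
        self.members[obj.object_id] = obj

    def set_security(self, level: str) -> None:
        # Bulk action: apply one security label to every member.
        for obj in self.members.values():
            obj.security = level

    def update_metadata(self, **changes) -> None:
        # Bulk action: merge the same metadata values into every member.
        for obj in self.members.values():
            obj.metadata.update(changes)


wedding = Collection(name="Jones wedding", owner="photographer_a")
wedding.add(DigitalObject(1, {"caption": "Arrival"}))
wedding.add(DigitalObject(2, {"caption": "Speeches"}))
wedding.set_security("client_only")
wedding.update_metadata(event="Jones wedding")
```

In a real warehouse the members would be rows in a table rather than in-memory objects, but the bulk operations would follow the same shape.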

Groups

A group is a set of collections. Groups can be nested and contain other groups.

If a government organization sets up each section to have its own collection, then it might group these sections into a branch and each branch into a department.

A photo laboratory might group multiple collections (where each one is a photo shoot) into a photographer, where that photographer owns all the digital objects.

A museum might create a group for public digital objects, where all the other groups, which are marked as private, contribute their public images to the group.

Like collections, groups make it easier to classify digital objects and work on them en masse. Security attributes can be applied to the whole group. A group can be taken offline.
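
The nesting described above can be sketched as a simple recursive structure. This is an illustration under assumed names (`Group`, `walk_collections`, the `online` flag); a production design would also need to guard against a group containing itself.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    # Hypothetical stand-in for a collection; only the fields needed here.
    name: str
    security: str = "private"


@dataclass
class Group:
    # A group holds collections and, optionally, other groups.
    name: str
    collections: list = field(default_factory=list)
    subgroups: list = field(default_factory=list)
    online: bool = True

    def walk_collections(self):
        # Yield this group's collections, then those of every nested group.
        yield from self.collections
        for sub in self.subgroups:
            yield from sub.walk_collections()

    def set_security(self, level: str) -> None:
        # Apply a security label to every collection under this group.
        for coll in self.walk_collections():
            coll.security = level


section_a = Collection("Section A")
section_b = Collection("Section B")
branch = Group("Branch 1", collections=[section_a, section_b])
department = Group("Department X", subgroups=[branch])

department.set_security("staff_only")
department.online = False  # take the whole group offline
```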

Categories

Within a collection, digital objects can be stored in a hierarchical structure called a category. The aim of the category is to enable these digital objects to be classified and to provide an alternative method for finding and viewing them.

A digital object can belong to multiple categories. A category can be nested. Though a category structure is typically hierarchical, there is no requirement for this to be adhered to.

Category structures can have security attributes and rules associated with them to make them easier to manage. A digital object can inherit the security roles of a category when it is assigned to that category.

Even though a digital object can belong to multiple categories, for management, it's best if it belongs to a primary category.

A category can be compared to a file system structure. A category structure can map exactly to a file system structure but not necessarily the reverse. Categories transcend the limitations imposed by a file system and enable more creative and flexible methods of handling digital objects.

Categories can be virtual or dynamic. They can be based on attributes of the image. A good example is the date on which the image was created. The dynamic category structure enables a hierarchy to be built using year, month, day, and hour.
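
A date-based dynamic category can be derived on the fly rather than stored. The following sketch assumes each image has a known creation timestamp; the function names and image identifiers are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def date_category(created: datetime) -> tuple:
    # Derive the virtual category path (year, month, day, hour) for an image.
    return (
        f"{created.year:04d}",
        f"{created.month:02d}",
        f"{created.day:02d}",
        f"{created.hour:02d}",
    )


def build_category_index(images: dict) -> dict:
    # Map every prefix of the path to the images beneath it, so that
    # browsing "2012" or "2012/03/14" both return results.
    index = defaultdict(list)
    for image_id, created in images.items():
        path = date_category(created)
        for depth in range(1, len(path) + 1):
            index[path[:depth]].append(image_id)
    return dict(index)


images = {
    "img_001": datetime(2012, 3, 14, 9, 30),
    "img_002": datetime(2012, 3, 14, 17, 5),
    "img_003": datetime(2012, 7, 1, 9, 0),
}
index = build_category_index(images)
```

Because the hierarchy is computed from an attribute, no one has to file images into it manually, and it stays correct as new images arrive.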

A category can also be based on the metadata in the image. If the metadata within a digital object includes an address, this can be linked to Google Maps (11), and the address can be converted (geocoded) into spatial coordinates. These coordinates can then be used to enable a category structure based on location, including country, city, suburb, and street.

There is no limit to the type of categories that can be created virtually using metadata or physical attributes of the digital object.

Lightbox

A lightbox can be described as a play area or holding area for images. Lightboxes can be private or shared with others. A lightbox is nearly identical in structure to a category (and could even be called a type of virtual category), but differs in that it is created by the user and images are put into it manually. It's also similar in concept to a shopping basket, except that a shopping basket is primarily private and session specific, whereas a lightbox can be just for a session or kept permanently. Some other unique characteristics of a lightbox include:

  • A lightbox's contents can be manually ordered. Depending on the interface, a lightbox's contents can be sorted in three or more dimensions (one additional dimension being time).
  • A lightbox can be shared with others, even though other users don't have permission to access the images. Permission is inherited via the lightbox. This, of course, is a feature that might not be suited for some secure multimedia warehouses.
  • Actions can be performed on a lightbox. Its contents could be printed or e-mailed to a person. A request might be put in to transform, convert, or fix the contents of the lightbox. Additionally, mass editing of metadata can be done against all the images in the lightbox.
  • Lightboxes can be merged or have set operations performed on them. For example, the intersection of two lightboxes finds the images common to both, and subtracting one lightbox from another finds the images in the first lightbox that do not exist in the second one.
  • Lightbox contents can be checked out or in. The check-out process puts a lock on the digital object, saying it's been exclusively locked for modification by a user. Check in releases that lock. The lock should not be confused with a database lock, which is part of a transaction. A check out lock is independent of the status of the database and immune to database restarts. Check out locks can have expiry dates and override locks on them to make it easier to manage them.
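
The set operations and check-out locks in the list above can be sketched as follows. All names here (`intersect`, `subtract`, `CheckoutRegistry`, the seven-day default expiry) are assumptions made for illustration. The in-memory dictionary stands in for what would be a persistent table in a real system, since a check-out lock has to survive database restarts.

```python
from datetime import datetime, timedelta


def intersect(first: list, second: list) -> list:
    # Images common to both lightboxes, keeping the manual order of the first.
    other = set(second)
    return [image for image in first if image in other]


def subtract(first: list, second: list) -> list:
    # Images in the first lightbox that do not exist in the second.
    other = set(second)
    return [image for image in first if image not in other]


class CheckoutRegistry:
    # Application-level locks, independent of database transaction locks.
    def __init__(self):
        self._locks = {}  # image_id -> (user, expires_at)

    def check_out(self, image_id, user, now, ttl=timedelta(days=7)):
        holder = self._locks.get(image_id)
        # Refuse if another user holds a lock that has not yet expired.
        if holder and holder[0] != user and holder[1] > now:
            return False
        self._locks[image_id] = (user, now + ttl)
        return True

    def check_in(self, image_id, user):
        holder = self._locks.get(image_id)
        # Only the current holder can release the lock.
        if holder and holder[0] == user:
            del self._locks[image_id]
            return True
        return False


shortlist = ["img_3", "img_1", "img_7"]
client_picks = ["img_7", "img_3", "img_9"]
common = intersect(shortlist, client_picks)
rejected = subtract(shortlist, client_picks)

registry = CheckoutRegistry()
t0 = datetime(2012, 1, 1, 9, 0)
registry.check_out("img_3", "alice", now=t0)
```

Lists are used instead of sets for the lightbox contents so that the user's manual ordering is preserved by both operations.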

The visual metaphor for a lightbox is a person taking a set of photographs, putting them on a table, sorting through them, and keeping the ones they want. Historically, a lightbox was a plastic box with a back light that photo laboratories used for sorting out images for a photographer and determining which ones were suited for printing.

Relationships

A relationship is a many-to-many link between two digital objects. The type of relationship can be used to describe characteristics. Information can be stored in the relationship and that can adapt over time, resulting in network intelligence.

Standard relationships include:

  • Master: This is the official or best-quality image in relation to all the other images. This relationship links multiple images together and specifies that one is the master to be used for viewing or printing. It's assumed the other images are similar to the master.
  • Duplicate: This is the opposite of the master. If one image is the master, the other can be referred to as a duplicate. It can also be thought of as a backup.
  • Parent/Part: One image in the relationship is marked as the parent. This might be a complete view of the digital object. The part images are subsets of the image. There might be different views of that image. A part can also be a master with its own duplicates. A part can also be referred to as a child.
  • Related: In this, two images look similar but are not the same. This is like a see also. Two images might be related because they were taken by the same photographer, or there might be pictures of objects made by the same artist.
  • Dynamic: Relationships can be derived based on analysis or ad hoc pieces of information. In a criminal investigative multimedia warehouse, digital images of different people might be associated with each other based on the fact that they were in the same place at a particular time.

Relationships can be time-based, meaning that they are valid for a set period of time or can change over time. It should be possible to perform queries based on time.

Using basic neural network algorithms, relationship information can change over time based on usage. A simple counter might be used when a relationship is created. As this relationship is reused, the counter increases, conveying information about its importance.

Google uses a similar concept in its search algorithm to establish the importance of a web page based on how many other pages link to it. In this case, the relationship is between two web pages, and the counter increases for every page referencing it. Pages with relationships with large counter values are deemed to be important.

In the case of a criminal investigative multimedia warehouse, the counter can be used to note every time two people either met or were in the same vicinity (if surveillance is used). In such a scenario, patterns of behavior can be ascertained based on the strength of the relationship. The strength is subjective and based on the counter value.

In a museum warehouse, relationship information can be stored based on how often an image is clicked on and linked to a search, or on how often an image is accessed alongside other images. In the latter case, a relationship is established between the images accessed together. If other people click on the same image combination, the strength of the relationship is increased.

The way the counter value increases can be linear or geometric. It can also be time-based, and relationship strength values can decrease over time if not used.
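
A usage counter with linear or geometric growth and time-based decay might look like the sketch below. The doubling factor, the 30-day half-life, and the use of integer day numbers are all assumptions picked for illustration; the text does not prescribe particular values.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    # A link between two digital objects with a usage-based strength.
    source: str
    target: str
    kind: str = "related"
    strength: float = 1.0   # counter starts at 1 when the link is created
    last_used_day: int = 0  # day number of the most recent use

    def reinforce(self, day: int, mode: str = "linear") -> None:
        # Strengthen the link each time it is reused.
        if mode == "linear":
            self.strength += 1.0
        elif mode == "geometric":
            self.strength *= 2.0
        else:
            raise ValueError(f"unknown mode: {mode}")
        self.last_used_day = day

    def current_strength(self, day: int, half_life_days: float = 30.0) -> float:
        # Exponential decay: strength halves for every half-life left unused.
        idle = max(0, day - self.last_used_day)
        return self.strength * 0.5 ** (idle / half_life_days)


link = Relationship("img_a", "img_b")
link.reinforce(day=1)
link.reinforce(day=2)
decayed = link.current_strength(day=32)  # 30 idle days, one half-life
```

Storing the raw counter and computing decay at query time avoids having to rewrite every relationship row as time passes.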

Though not a true neural network, a large amount of information can be captured between digital objects based on usage and access by users. Intelligence can be added to the multimedia warehouse, which might not be possible using conventional means. This concept adds value to the warehouse.

Thesaurus

A thesaurus can be described as a set of terms linked together based on similarity. The terms belong to a controlled vocabulary. This is important, as new thesaurus terms cannot be added without approval by an authority. A thesaurus can be hierarchical but does not have to be. A thesaurus conforms to a defined standard. There are numerous standards, with a popular one being ANSI/NISO Z39.19-1993, which covers monolingual thesauri.

The terms in a thesaurus are linked together using relationship constructs. The most common two are broader term and narrower term. The following are examples:

  • Geography Thesaurus: Broader term is Australia. Narrower terms are New South Wales, Queensland, Australian Capital Territory, Victoria, South Australia, Tasmania, Western Australia, and Northern Territory.
  • Furniture Thesaurus: Broader term is Bedroom. Narrower terms are bed, clock, radio, mirror, chair, and wardrobe.

Relationships are one way, but common usage indicates bi-directional support. Terms can be self-referencing, and it's possible to have circular references, but this is discouraged.

A digital object can be mapped to one or more thesaurus terms. A user can navigate through the thesaurus, then perform a search for all digital objects that match the term. Searches can be hierarchical, and do not have to match exactly to a digital object. For example, a digital object can be mapped to Victoria, but should still be returned if a search on Australia is done.
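
The hierarchical search described above can be sketched by expanding a term into itself plus all of its narrower terms before matching. The function names and the sample mappings are hypothetical. The `seen` set guards against circular references, which are possible in a thesaurus even though they are discouraged.

```python
def expand_term(term, narrower, seen=None):
    # Return the term together with every narrower term beneath it.
    seen = set() if seen is None else seen
    if term in seen:
        return seen  # already visited: stops circular references looping
    seen.add(term)
    for child in narrower.get(term, []):
        expand_term(child, narrower, seen)
    return seen


def search(term, narrower, object_terms):
    # Find objects mapped to the term or to any of its narrower terms.
    wanted = expand_term(term, narrower)
    return sorted(
        obj for obj, terms in object_terms.items() if wanted & set(terms)
    )


# Broader term -> narrower terms (a subset of the geography example).
narrower = {
    "Australia": ["New South Wales", "Queensland", "Victoria", "Tasmania"],
}

# Digital object -> thesaurus terms it is mapped to.
object_terms = {
    "img_1": ["Victoria"],
    "img_2": ["Bedroom"],
    "img_3": ["Australia"],
}

hits = search("Australia", narrower, object_terms)
```

Here `img_1` is mapped only to Victoria but is still returned by a search on Australia, which is the behavior the paragraph describes.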

A digital object can be manually mapped to a thesaurus term or mapped dynamically using its metadata. A manual mapping is required if the metadata is insufficient or too inaccurate to determine which thesaurus term or terms the digital object belongs to.

Additional thesaurus concepts include Used For, Related Term, and Use Reference.

Taxonomy

A taxonomy is similar to a thesaurus, with the addition that it contains preferred terms and is used mainly by science. It is a classification whereas a thesaurus is a store of related terms. The terms are contained within a hierarchy and the terms conform to a well-defined vocabulary. A taxonomic hierarchy is also well-controlled. In the life science taxonomy, different levels in the hierarchy are fixed and equate to values such as genus, species, and subspecies.

Taxonomic examples include taxonomies for fossils, plants, psychology, and even business. Taxonomic structures can vary in their meaning, use, and strictness of adherence. Most major taxonomies conform to an internationally agreed standard to ensure that the structure remains consistent and accurate. As there is meaning in the structure, knowledge can be associated with the results that are returned. It is therefore important to ensure that the taxonomic structures are correct.

Due to the well-structured nature of a taxonomy, ad hoc queries performed against digital objects can be returned in a taxonomic structure.
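
As a rough sketch of returning query results in a taxonomic structure, the flat results of an ad hoc query can be folded into a nested tree keyed by the fixed ranks. The rank list, the `_images` key, and the sample classifications are assumptions for illustration only; a real taxonomy would have more levels.

```python
# Fixed levels of the hierarchy, from broadest to narrowest (illustrative subset).
RANKS = ("genus", "species", "subspecies")


def group_by_taxonomy(results):
    # Fold flat (image_id, classification) pairs into a nested rank tree.
    tree = {}
    for image_id, classification in results:
        node = tree
        for rank in RANKS:
            name = classification.get(rank)
            if name is None:
                break  # classified only down to the previous rank
            node = node.setdefault(name, {})
        # "_images" is a hypothetical reserved key holding images at this node.
        node.setdefault("_images", []).append(image_id)
    return tree


results = [
    ("img_1", {"genus": "Panthera", "species": "leo"}),
    ("img_2", {"genus": "Panthera", "species": "tigris"}),
    ("img_3", {"genus": "Panthera", "species": "tigris", "subspecies": "altaica"}),
]
tree = group_by_taxonomy(results)
```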