
Thomas F. Saffell Library
For library information or services, phone 620-276-9511, fax 620-276-9630, or email library@gcccks.edu


Library Research – Concepts and Techniques for Searching Databases

Selecting the Right Databases

Databases may be classified by scope, coverage, and function. Whether or not the researcher adopts these terms, it is important to be alert to the kinds of things that can and cannot be found in each database that is used, or that should be used. To illustrate the division of databases by scope, the H.W. Wilson Company publishes print and electronic indexes with titles such as General Science Index, Index to Legal Periodicals, and Humanities Index, as well as the Readers' Guide. The scope of subjects included is indicated by the titles.

A database may function only as a bare index or may include a collection of the full text of articles, or even books. The function of an index is to provide a list of words such that each word is connected to a pointer that describes the location of information described by that word. A bibliographic index locates and points to books and/or periodical articles. A full text database contains the reading material that is the ultimate object of the search as well as indexing functions to find the relevant reading material -- so it functions as a library of materials as well as being an index.
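The idea of an index as a list of words, each connected to pointers, can be sketched in a few lines of Python. This is only an illustration, not how any particular database is built: the record numbers stand in for the pointers, and the third title is a hypothetical record added so that some words point to more than one item.

```python
from collections import defaultdict

# Hypothetical mini-database: record number (the "pointer") -> article title.
records = {
    1: "Famous Four-Footed Favorites",
    2: "Corgi and Bess",
    3: "Famous Corgi Owners",  # invented title, for illustration only
}

def build_index(recs):
    """Map each lowercase word to the set of record numbers whose title contains it."""
    index = defaultdict(set)
    for rec_id, title in recs.items():
        for word in title.lower().split():
            index[word].add(rec_id)
    return index

index = build_index(records)
print(sorted(index["corgi"]))   # [2, 3]
print(sorted(index["famous"]))  # [1, 3]
```

Looking up a word returns only locations. A bare index stops there, while a full text database would also hold the articles that the record numbers point to.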

The coverage of a database refers to the dates of materials indexed or included in full text, as well as other limitations such as geography and language. Some databases may cover only materials intended for a popular or general readership and some only materials for academics or professionals. Some may cover a large list of titles, others a small list.

A list of available databases by subject matter is a useful start for searching. First Search, Lexis-Nexis, and INFOTRAC all organize databases roughly by subject matter. Periodical Abstracts in First Search and Expanded Academic ASAP in INFOTRAC are general interest databases that try to provide indexing and full text for a cross section of magazines and journals in most subject areas. Some general databases are better than others for specific subjects. Each database (general and specialized) indexes a different list of magazines and journals, although you will also find many titles are common to competing databases. This is a much bigger issue when searching for full text. All databases that have some (or all) full text have many unique or exclusive titles that others do not have -- and, of course, do not have the titles that are exclusive or unique to a competing database.

Research that goes beyond finding a couple of articles should use several databases for magazine and journal articles. If the local library is not a very large one (or if the subject is very specialized) library catalogs beyond the local library may be needed. Even on the Web, more than one search engine should be used.

When no database specific to the subject is accessible, it can be useful to learn which available index may have the most relevant material. For example, the ERIC database, which is discipline specific to education, contains indexing to many periodicals in counseling and psychology as well as education.

A bibliographic index is a locator which identifies the books or periodicals containing the information sought -- it does not contain the information itself, i.e., the full text of the books or articles. A library online catalog is a bibliographic database which identifies books (and sometimes periodicals) owned by a specific library or, in some cases, a group of libraries. Recently, many libraries have added e-books to their catalogs so that even the catalog may now have some full text. Periodical indexes may simply index articles by words in the title, subject words tagged to articles, names of authors, and often abstracts describing the contents of articles (also indexed by the words in the abstract). Many now have full text of articles with full indexes of all the words in the article. World Wide Web indexes such as Google are analogous to bibliographic indexes, giving the Uniform Resource Locator (URL) of a Web page.

Selecting Search Words Effectively

The usefulness of a search in any database (the search uses one or more indexes whether the user is aware of them or not) depends upon how well the word used to search matches the words under which the target is indexed.

Keyword indexes include every word that is contained in a defined set of fields in the database. A typical keyword search will look in indexes that include all the significant words in the title, authors, abstract, and subjects fields. Sometimes there is an option to select also searching words in the full text of databases that include the full text of articles and/or books.

Subject indexes only include words that are added to the database record for the purpose of  describing the contents of the article, book, etc. Usually the subject words are taken from a dictionary (thesaurus) with a deliberately limited list of words to describe all subjects within the scope of the database. By using only one word for one specific subject, this type of index gathers all the articles on that subject together if the searcher uses the same word the indexers are required to use. To take advantage of this feature, the user must use a term that is known to be in the subject dictionary for the specific database. Reference librarians can help with this. Using the subject words displayed by the database in the record for an article or book that is right on the subject is another way to identify useful subject words.
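The difference between a keyword index and a subject index can be seen in a small sketch. The two records below are loosely based on the Periodical Abstracts example used later on this page, with the abstracts shortened. The function names and the record layout are assumptions made for illustration, not the design of any real database.

```python
import re

# Two simplified records; abstracts are shortened for the example.
records = [
    {"title": "Famous Four-Footed Favorites", "author": "Blaikie, Thomas",
     "subjects": ["Nonfiction", "Pets", "Royalty", "History"],
     "abstract": "Blaikie reviews Reigning Cats and Dogs"},
    {"title": "Corgi and Bess", "author": "Turner, E S",
     "subjects": ["Nonfiction", "History", "Royalty", "Pets", "Cats", "Dogs"],
     "abstract": "Turner reviews"},
]

def words(text):
    """Split text into a set of lowercase words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_search(term, recs):
    """Match the term in any of the title, author, abstract, or subject fields."""
    hits = []
    for rec in recs:
        text = " ".join([rec["title"], rec["author"], rec["abstract"], *rec["subjects"]])
        if term.lower() in words(text):
            hits.append(rec["title"])
    return hits

def subject_search(term, recs):
    """Match the term only against the controlled subject descriptors."""
    return [rec["title"] for rec in recs
            if term.lower() in {s.lower() for s in rec["subjects"]}]

print(keyword_search("cats", records))  # both titles: one via abstract, one via subject
print(subject_search("cats", records))  # ['Corgi and Bess']
print(subject_search("pets", records))  # both titles share the descriptor "Pets"
```

The keyword search finds "cats" wherever it happens to appear, while the subject search finds only records an indexer deliberately tagged. Searching the shared descriptor "pets" gathers both articles together, which is the advantage of using the indexers' own term.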

For help learning how to select words to search see Selecting Words Defining a Search

The kinds of information recorded in a library catalog are based upon national and international standards which specify the structure of the database fields and define the content for those fields. Subject fields in U.S. libraries generally use one of two standards for the controlled vocabulary dictionary of subject headings used to describe the intellectual content of the book (Library of Congress or Sears). In the past contents notes for books giving chapter titles or article titles in a collection, short story titles in a collection, etc. have been optional. Such contents information appears in some but not all of the records in most library book catalogs -- it is becoming more common. When periodicals are identified in a library catalog, it is the bibliographic information about the periodical itself, not the articles published in issues of the periodical. Thus, an entry for Time Magazine would give the correct title, previous and subsequent titles if there had been a name change, publisher, dates of publication, dates of issues held by the library, etc. Library catalog records for books rarely contain abstracts of the book's contents.

Periodical databases do not follow a single standard for either the structure of fields or the content of the fields. Several periodical databases have become dominant in their fields, so that their proprietary (not industry standards based) dictionaries of controlled vocabulary and data structures are widely used. Many of these have available instructional materials for effective use of the subject vocabulary for the database. In the field of education, the ERIC system has a very elaborate thesaurus of descriptors (subject terms) and of descriptive codes classifying articles as research, literature review, etc. To do a thorough search in education, the Wilson Education Index must also be used, because many articles are found in it which are not found in ERIC (and vice versa). Medline is another database which is its own standard, using an elaborate and technical thesaurus. Both ERIC and Medline are government funded and are available free on the World Wide Web. They are also available in the database list of First Search, a paid subscription only service. Many other subject areas have specialized indexes and full text databases.

Some Details of Database Design

The greater the amount of information in the record describing a book or article, the greater the number of access points by which the item may be found using a search engine. Search engines are programs which search for words or phrases (made up of a few words in a specific order) in the database. How the searches may be combined to retrieve relevant items is discussed below. The most basic information in a bibliographic database is title (of a book or article), author, publisher (or title of periodical), and date. If the database is structured, these elements are in predefined parts of the record that describes an item. So, there is a title field that always has only the title, an author field that only has the names of the author(s) (usually last name first), etc. Many databases have a field for controlled vocabulary of subject words or phrases. Some databases also have an abstract added. The abstract adds access points by adding uncontrolled terms in the abstract field to the controlled vocabulary in the subject descriptor field. It also adds information which helps the searcher decide whether to spend time looking at the item itself or pass on to the next item.

There are two basic types of abstracts. Some databases have summary abstracts which give part of the ultimate content of the item. Others have only descriptive abstracts which list the topics covered, the type of treatment of the subject etc. but do not summarize ultimate content. For example an abstract of a Consumer Reports article on cars  which said the article compares Ford Taurus and Honda Accord on handling, fuel economy, etc. would be descriptive whereas a summary which said Taurus was found to have the best fuel economy, Honda the best handling, etc. would summarize the content.

A free text search looks through an unstructured block of text for strings of characters (words for example). This is usually contrasted with a search through controlled vocabulary words in a subject descriptor field. Titles can be thought of as free text. The term full text is usually reserved for databases which include the complete text content of a book, article, etc. In the current state of technology, many databases which have full text do not have the full content of an article. They may omit photographs, drawings, and graphs. Tables are sometimes provided and sometimes not in these text-only databases.

Some of the power of a database and search engine combination derives from the number of access points and how the access points can be combined. A database such as ERIC has many access points added to the basic title, author, subject. For example, ERIC articles and documents are classified as research, literature review, etc. so that only articles reporting original research may be found, leaving out all other kinds of articles. This is an example of a value added piece of information. Human beings doing abstracting and indexing work create a record such as a library book catalog record or an ERIC record. WWW systems of index and search engine may create added value index information using automated procedures. For example, http://www.alltheweb.com/ allows users to search only for pictures, only for audio, etc.

When searching databases, the success of the search can be expressed in terms of precision and recall. High precision is the retrieval of the highest possible percentage of relevant items and the smallest percentage of false drops (irrelevant items). High recall is the retrieval of the greatest possible number of relevant items. In practice, the techniques which give the highest precision compromise recall -- they fail to retrieve some very relevant items. Conversely, going for highest recall usually results in also retrieving lots of irrelevant items. If you need a few good articles fast, go for high precision. If you need to know everything written on your subject, go for recall and allow lots of time to sift through big results lists.
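Precision and recall are usually computed as simple ratios: precision is the share of retrieved items that are relevant, and recall is the share of all relevant items that were retrieved. The sketch below assumes we somehow know the full set of relevant items, which is rarely true in a real search. The item numbers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical database in which items 1-8 are relevant to the topic.
relevant = {1, 2, 3, 4, 5, 6, 7, 8}

narrow_search = {1, 2, 3}  # few results, all relevant
broad_search = {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15}  # many false drops

print(precision_recall(narrow_search, relevant))  # (1.0, 0.375)
print(precision_recall(broad_search, relevant))   # (0.5, 0.875)
```

The narrow search wastes no time on irrelevant items but misses five of the eight relevant ones. The broad search finds seven of the eight at the cost of sifting through as many false drops as good results.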

World Wide Web search engines such as Google and AllTheWeb are keyword indexes to publicly accessible web pages. They typically use web crawler programs to crawl through web addresses, retrieve words from the accessible pages, and build an index tying the words to the URL (Uniform Resource Locator) addresses of the pages which contain those words. Because there are only minimal (and not obligatory) standards for fields and organization of content, searching such a web index is likely to be less precise than a well planned search in an elaborately structured database such as ERIC.

Some Details of Database Organization and Behavior

In order to do computer based searching effectively, you should have some basic information about the mechanics of databases and searching as well as the substance of the ideas you are working with.

The simplest database is a table of information, organized by records and fields, such as the following:

Periodical Abstracts

Article Title | Magazine Title | Author | Date | Controlled Subject Descriptors | Text
Famous Four-Footed Favorites | Spectator | Blaikie, Thomas | Dec 4, 1999 | Nonfiction ; Pets ; Royalty ; History | (Abstr.) Blaikie reviews “Reigning Cats and Dogs …
Corgi and Bess | TLS | Turner, E S | Oct 22, 1999 | Nonfiction ; History ; Royalty ; Pets ; Cats ; Dogs | (Abstr.) Turner reviews ....

The table above is a simplified example of the Periodical Abstracts database. The rows are records and the columns are fields. So, the record Famous Four-Footed Favorites is made up of six fields that hold predefined types of information about a single magazine article.

When you search this database in the default simple search mode, you are performing a keyword search with Boolean logic. You type a word or words in a single blank and click on a radio button (or press enter) to transmit the search request to the search engine. The default search form blank is labeled “Search for.” In the Periodical Abstracts database, this search actually looks at several different fields, including Article Title, Controlled Subject Descriptors, and Abstract. There are many kinds of search procedures a database may use. One of the oldest, most common, and most useful is the logical or Boolean AND. That is what the Periodical Abstracts simple search does if words separated by spaces are typed in the blank. In order to speed searches, common words that usually carry little meaning are considered stop words and are ignored in the search. Articles, conjunctions, and prepositions are usually stop words, e.g., a, the, an, and, if. However, the words AND, OR, and NOT can be Boolean operators and may be interpreted as commands to change the type of search procedure the database uses.
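A rough sketch of how a simple search blank might tidy up a query follows. It is an assumption about one reasonable design, not a description of Periodical Abstracts itself: operator words are recognized first, remaining stop words are dropped, and a default AND is inserted between neighboring terms. The stop word list is a short hypothetical one.

```python
# Hypothetical stop word list; real databases publish their own.
STOP_WORDS = {"a", "an", "the", "if", "of", "in"}
OPERATORS = {"and", "or", "not"}

def normalize_query(query):
    """Turn typed words into a list of terms and explicit Boolean operators."""
    tokens = []
    for word in query.lower().split():
        if word in OPERATORS:
            tokens.append(word.upper())  # operators are shown in capitals
        elif word not in STOP_WORDS:
            # Two search terms in a row get the default AND between them.
            if tokens and tokens[-1] not in {"AND", "OR", "NOT"}:
                tokens.append("AND")
            tokens.append(word)
    return tokens

print(normalize_query("the dog cat"))           # ['dog', 'AND', 'cat']
print(normalize_query("dog or cat"))            # ['dog', 'OR', 'cat']
print(normalize_query("history of the corgi"))  # ['history', 'AND', 'corgi']
```

Because "and" is checked as an operator before the stop word list is consulted, it changes the search procedure instead of being silently ignored.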

Boolean Operators

An operator is a symbol that indicates a mathematical or logical procedure to be followed -- in arithmetic, the addition operator "plus" or "+" between two numbers would tell you to add them -- in logic, the Boolean operators tell you to perform certain procedures on sets or lists of items.

 Boolean operators are commonly used by computer based library book catalogs, magazine article databases, and WWW search engines. Some of these also use more complex rules to make searching more effective. In the case of a service such as Google, the exact procedures used are trade secrets and the user cannot tell exactly how the results were selected by the search engine.

If you wish to make the computer procedure for a search more specific (narrower), you will probably want to use the AND operator. If you wish to make the computer procedure for a search more general or broader, you will probably want to use the OR operator. Naturally, the words you use in the search will usually be the most important factor, but the logic you use can also be very important. In First Search databases such as Periodical Abstracts, the AND operator is the default operator if words separated by spaces are typed in the simple search blank. Other services, however, have different default assumptions. For example, the AltaVista WWW search engine assumes a logical OR. Logical OR is explained in a following paragraph. INFOTRAC assumes the input is a phrase -- that all of the words typed in the blank must be found in the exact order they are typed in.

The AND Operator

The logical AND operation is usually performed by typing a search with the word "and" or a symbol representing "and" between two or more terms typed in the search blank. Picture a Venn diagram of two overlapping circles, in which the set of all items containing term A is represented by one circle and all items containing term B by the other circle. Assume you are searching a database of magazine articles. If term A were "dogs" and term B were "cats," then the set of articles represented by circle A would be all articles containing the word "dogs" and circle B would be the set of articles containing the word "cats." When we say the article contains the words, we mean the word searched appears at least once in one of the fields searched.

If you were searching for articles which mention both dogs and cats, you could enter the search "dogs AND cats." The resulting list would be represented on the diagram by the shaded intersection of circles A and B, labeled C. In the example search, the area C would be the set of articles which contain both the word "dogs" and the word "cats." The articles which only mention one or the other are excluded from the results. A computer might perform the operation by taking the first word in your search blank and making a list of all articles containing "dogs." Then it might select out of the "dogs" list only those articles which also mention "cats." The result C cannot be larger than the smaller of A and B -- in the real world it will generally be much smaller.

Use this operator to make searches more specific. For the first search word, use the broadest term that describes your topic. Then use the AND operation to connect a word that makes the term more specific. You may connect several search words with logical AND operations. If the database automatically assumes the AND operator, you would simply type the words with a space between them. If you don't get enough results, experiment with different combinations of words and use fewer and more general words in your search blank.
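In set terms, the AND operation is an intersection. The sketch below uses four invented article titles and shows both the direct intersection and the two-step procedure described above (list the "dogs" articles, then keep those that also mention "cats"). It is an illustration only. Real search engines work from prebuilt indexes and do not scan the text this way.

```python
# Hypothetical article texts keyed by article number.
articles = {
    1: "dogs and their owners",
    2: "cats of ancient egypt",
    3: "why dogs chase cats",
    4: "training dogs",
}

def containing(word, docs):
    """Return the set of article numbers whose text contains the word."""
    return {doc_id for doc_id, text in docs.items() if word in text.split()}

a = containing("dogs", articles)  # circle A
b = containing("cats", articles)  # circle B
c = a & b                         # area C: articles mentioning both

# The two-step procedure: start from the "dogs" list, keep those mentioning "cats".
two_step = {doc_id for doc_id in a if "cats" in articles[doc_id].split()}

print(sorted(a), sorted(b), sorted(c))  # [1, 3, 4] [2, 3] [3]
print(two_step == c)                    # True
```

Here C holds one article while A and B hold three and two, consistent with the rule that the result can never be larger than the smaller of the two sets.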

The OR Operator

The logical OR operation is usually performed by typing a search with the word "or" or a symbol representing "or" between two or more terms typed in the search blank. In a Venn diagram of two overlapping circles, the set of all items containing term A is represented by one circle and all items containing term B by the other circle. Assume you are searching a database of magazine articles. If term A were "dogs" and term B were "cats," then the set of articles represented by circle A would be all articles containing the word "dogs" and circle B would be the set of articles containing the word "cats." The result of A OR B would include all of A plus all of B. Articles containing "dogs", articles containing "cats" and articles containing both would all be included. When we say the article contains the words, we mean the word searched appears at least once in one of the fields searched. In the diagram, both circles and their intersection are shaded.

Use this operator to retrieve a larger set of results than with AND or with a single search word. You can make a list of words connected by OR to find the kinds of articles that would deal with any of them. If your topic required information about the most popular pets, the "dogs OR cats" search could be a reasonable one.
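In set terms, OR is a union. A minimal sketch with invented article numbers shows why the OR result is at least as large as either set, and why articles mentioning both words are counted only once.

```python
# Hypothetical sets of article numbers containing each word.
dogs = {1, 3, 4}
cats = {2, 3}

either = dogs | cats  # "dogs OR cats": everything in either circle
both = dogs & cats    # the overlap, counted only once in the union

print(sorted(either))  # [1, 2, 3, 4]
print(len(either) == len(dogs) + len(cats) - len(both))  # True
```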

The NOT Operator

The NOT operator excludes members of a set from the set of final results. In a Venn diagram, the result of the A NOT B operation on the sets A and B is the shaded part of circle A that lies outside circle B. The NOT operator can exclude records you did not mean to exclude if you do not anticipate all possible contexts in which the term after NOT may appear. Use this operator with caution. If you were looking for articles on the Roman god Mars you might search "Mars NOT planet" to filter out all of the astronomy items. One problem would be that mythology articles which just mentioned the connection between the planet and the Roman god would also be excluded.
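The "Mars NOT planet" problem can be reproduced with a set difference. The three article titles are invented for the example. The third is the kind of mythology article that the NOT operator throws out by accident.

```python
# Hypothetical article titles keyed by article number.
articles = {
    1: "Mars the Roman god of war",
    2: "Rovers explore the planet Mars",
    3: "How the planet Mars was named for the Roman god Mars",
}

def containing(word, docs):
    """Return the set of article numbers whose title contains the word (any case)."""
    return {doc_id for doc_id, text in docs.items()
            if word.lower() in text.lower().split()}

mars = containing("Mars", articles)
planet = containing("planet", articles)
result = mars - planet  # "Mars NOT planet"

print(sorted(result))  # [1]
print(3 in result)     # False: the relevant mythology article 3 is lost
```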

Operators That Connect and Separate Words and Parts of Words

In addition to Boolean Logic, many search engines accept commands to search for groups of words and for parts of words.

You may wish to bind words together in a phrase to search for "easy street" rather than searching for easy AND street. The phrase search would not retrieve an item that was worded "it is easy to find this street," although the AND search would. For the purpose of Boolean Logic it is useful to be able to use phrases (and other ways of tying bits of text together) in the same way individual words can be used. Several databases assume that any words typed in the search blank are a phrase. The database help button will almost always tell you if that is or is not the case.
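The difference between "easy AND street" and the phrase "easy street" can be sketched as two small tests on a piece of text. The function names are made up for this illustration. The second sample sentence is invented.

```python
import re

def tokens(text):
    """Split text into a list of lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_all_words(text, query):
    """Boolean AND: every query word appears somewhere in the text."""
    return set(tokens(query)) <= set(tokens(text))

def has_phrase(text, phrase):
    """Phrase search: the query words appear side by side, in order."""
    t, p = tokens(text), tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))

sentence = "it is easy to find this street"
print(has_all_words(sentence, "easy street"))            # True
print(has_phrase(sentence, "easy street"))               # False
print(has_phrase("Life on Easy Street", "easy street"))  # True
```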

You may also wish to allow for variations in word order, such as "Bush, George" in addition to "George Bush." You may wish to allow words to intervene between target words so that your search for "George Bush" will find "George Herbert Walker Bush". This is especially useful when you are searching through the full text of an article or other long body of text. Databases such as Lexis-Nexis have proximity operators that let the searcher specify that an article will be retrieved if the first search word is within, say, 10 words of the second search word in any order. The format would be "George w/10 Bush". Searching "George pre/10 Bush" would retrieve only items in which George comes before Bush.
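The two proximity operators can be approximated as follows. This is a sketch under an assumed counting rule (two words are "within n" when their positions differ by at most n). The way a given service counts words may differ, so check its help pages. The function names within and pre are hypothetical stand-ins for w/n and pre/n.

```python
import re

def tokens(text):
    """Split text into a list of lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(toks, word):
    """Return every position at which the word occurs."""
    return [i for i, tok in enumerate(toks) if tok == word.lower()]

def within(text, first, second, n):
    """Like 'first w/n second': the two words occur within n words, in any order."""
    toks = tokens(text)
    return any(abs(i - j) <= n
               for i in positions(toks, first) for j in positions(toks, second))

def pre(text, first, second, n):
    """Like 'first pre/n second': first precedes second by at most n words."""
    toks = tokens(text)
    return any(0 < j - i <= n
               for i in positions(toks, first) for j in positions(toks, second))

print(within("George Herbert Walker Bush", "George", "Bush", 10))  # True
print(within("Bush, George", "George", "Bush", 10))                # True
print(pre("Bush, George", "George", "Bush", 10))                   # False
```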

You may wish to search for either dog or dogs or both with one word. Truncation is one way to do that. In some databases, the * may stand for any number of characters following in the same word. So, dog* would retrieve both dog and dogs. It would also retrieve Dogbert. Sometimes a ? will substitute as a wild card for any one character, but not more than one. Some systems automatically search for plurals without any special input.
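Python's standard fnmatch module happens to use the same two symbols in the same way (* for any number of characters, ? for exactly one), so it makes a convenient stand-in for a database's truncation feature. The word list is a hypothetical vocabulary, and real databases may assign these symbols differently.

```python
from fnmatch import fnmatchcase

# Hypothetical list of words found in a database index.
vocabulary = ["dog", "dogs", "dogma", "Dogbert", "cat"]

def expand(pattern, words):
    """Return the words matched by a truncation pattern, ignoring case."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

print(expand("dog*", vocabulary))  # ['dog', 'dogs', 'dogma', 'Dogbert']
print(expand("dog?", vocabulary))  # ['dogs']
```

The first result shows the price of open truncation: along with dog and dogs, the search also pulls in unrelated words such as dogma.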

Syntax, Grammar, and Punctuation for Database Queries


There is no single convention used by all databases to interpret words typed into one or more input blanks. Following is a table of some examples of input and its interpretation by the database search engine.

Examples of the ways databases interpret queries:

Database | Mode | Search Words | Procedure | Where Searched
Periodical Abstracts | Simple -- subject | dog cat | Logical AND | Descriptors, Title, Abstract
Periodical Abstracts | Simple -- subject | dog or cat | Logical OR | Descriptors, Title, Abstract
Expanded Academic ASAP | Keyword | dog cat | Proximity -- dog near cat, within 1 or 2 words, in any order | Title, Citation, Abstract -- option to include full text
Expanded Academic ASAP | Keyword | dog AND cat | Logical AND | Title, Citation, Abstract -- option to include full text
Google | Default | dog cat | Logical AND plus proprietary operations | Not given
Google | Default | +where +cat | + includes "noise" words that would be discarded by default without this operator; then Logical AND and other proprietary processing are done | Not given
Periodical Abstracts | Simple -- subject | dog+ | Searches plural forms but not other truncated words | Descriptors, Title, Abstract
Expanded Academic ASAP | Keyword | dog* | Truncation -- any number of characters; would retrieve dogs, and would also retrieve dogma | Title, Citation, Abstract -- option to include full text
Expanded Academic ASAP | Keyword | dog! | Truncation -- 1 or 0 characters; would retrieve dogs but not dogma | Title, Citation, Abstract -- option to include full text
Google | Default | “dog cat” | Proximity -- dog followed immediately by cat, an exact phrase | Not given
Periodical Abstracts | Simple -- subject | Dog w1 cat | Proximity -- dog followed immediately by cat, an exact phrase | Descriptors, Title, Abstract


Limiters


Many databases have search engines with specific switches or controls which may be set to further limit the results retrieved. If the searcher could apply Boolean commands to all fields in a database, such limits could be included in a properly constructed Boolean search. Instead, limiters provide a simpler way to make the search still more specific. Common limiters are date (usually publication date), format (only articles having full text), article type (only research articles), language, sound file, etc. Limiters also often specify certain parts of the database, i.e., search only a specific magazine title, only government domains on the Web, etc.
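Limiters can be thought of as extra filters applied to a results list. In the sketch below the titles and dates come from the Periodical Abstracts example on this page, while the language and full text values are invented, and the function and parameter names are assumptions chosen for illustration.

```python
from datetime import date

# Titles and dates from the example table; language and full_text are hypothetical.
results = [
    {"title": "Famous Four-Footed Favorites", "date": date(1999, 12, 4),
     "language": "English", "full_text": True},
    {"title": "Corgi and Bess", "date": date(1999, 10, 22),
     "language": "English", "full_text": False},
]

def apply_limiters(recs, published_after=None, language=None, full_text_only=False):
    """Keep only the records that pass every limiter that has been set."""
    kept = []
    for rec in recs:
        if published_after is not None and rec["date"] <= published_after:
            continue
        if language is not None and rec["language"] != language:
            continue
        if full_text_only and not rec["full_text"]:
            continue
        kept.append(rec["title"])
    return kept

print(apply_limiters(results, published_after=date(1999, 11, 1)))
print(apply_limiters(results, language="English"))
print(apply_limiters(results, full_text_only=True))
```

Each limiter that is set behaves like one more AND condition, which is why limiters can only shrink a results list and never enlarge it.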

Updated 9/18/03 by Library Services