CS245A  Summer 1994                                              Handout #9

Assignment 4
due in class on Thursday July 21st.

PROBLEM 1

Do Problem 8.10 in the textbook.
Assume that the number of hash bits, "b", is 3.

PROBLEM 2

For the same keys, hash function and assumptions
as PROBLEM 8.10 in the textbook, show the *linear*
hash structure for this file.
Assume that "b" is 3 and that "m" is 5.

PROBLEM 3

Show equivalent relational algebra expressions to the ones given
below, where the selects are done as early as possible.

  (a)  SELECT(P and Q and M)[ R NATJOIN S ]

  (b)  SELECT(P or Q)[ R NATJOIN S ]

where P is a predicate that only involves R attributes,
Q is a predicate that only involves S attributes,
and M is a predicate that involves both R and S attributes.

NOTE: To keep things in ASCII, I am not using special symbols.
SELECT(P) is the same as sigma subscript P.
R NATJOIN S is the natural join of relations R and S.

PROBLEM 4

Suppose you have 2 relations, R(A, B, C) and S(B, C, D, E).

You have a clustered unique (no duplicate keys) B+-tree index on
attribute A for relation R.
Assume this index is kept entirely in memory (i.e., you do not need
to read it from disk).

For relation S, you have two indices:
* a non-clustered non-unique B+-tree index for attribute B
* a clustered non-unique B+-tree index for attribute C

Furthermore, assume that these two indices are kept in memory (i.e. you
do not need to read them from disk). Also, assume that all of the tuples
of S that agree on attribute C are stored in sequentially adjacent blocks
on disk (that is, if more than one block is needed to store all of the tuples
with some value of C, then these blocks will be sequentially located on
the disk).

Other relevant data:
* 1000 tuples of R are stored per block on disk.
* n_R= 1,000,000 (number of tuples of R)

* 100 tuples of S are stored per block on disk.
* n_S= 200,000 (number of tuples of S)
* V(B,S)=40,000 (image size of attribute B in S)
* V(C,S)=400 (image size of attribute C in S)

You want to execute the following query:

SELECT *
FROM R, S
WHERE (R.B=S.B) AND (R.C=S.C)

We present you with two query plans:

Plan 1:

For every block B of R, retrieved using the clustered index on A for R
    For every tuple r of B
         Use the index on B for S to retrieve all
             of the tuples s of S such that s.B=r.B
             For each of these tuples s, if s.C=r.C, output r.A, r.B, r.C,
                                                            s.B, s.C, s.D, s.E


Plan 2:

For every block B of R, retrieved using the clustered index on A for R
    For every tuple r of B
         Use the index on C for S to retrieve all
             of the tuples s of S such that s.C=r.C
             For each of these tuples s, if s.B=r.B, output r.A, r.B, r.C,
                                                            s.B, s.C, s.D, s.E


You should analyze each of these plans carefully in terms of
their behavior regarding accesses to disk, and explain
which of the plans is therefore better. Be sure to include in your
analysis what accesses to disk are sequential accesses and which ones 
are random accesses.

END OF HOMEWORK4