Wednesday, February 11, 2009

Object Oriented Database Management Systems- Object Oriented Programming

January-2004 [6]
7. Write short notes on the following:
a) Virtual functions [6]

Virtual functions are functions declared in the base class and overridden in a derived class. They provide another form of polymorphism in C++, in which a tree structure of parent classes and their subclasses is defined; each subclass within this tree can receive one or more messages with the same name. The selection of which member function to invoke among a group of overridden virtual functions is dynamic, that is, it is made at run time.
In object-oriented programming, a virtual function or virtual method is one whose behavior can be overridden within an inheriting class by a function with the same signature. This concept is a very important part of the polymorphism portion of object-oriented programming (OOP).
The concept of the virtual function solves the following problem:
In OOP when a derived class inherits from a base class, an object of the derived class may be referred to (or cast) as either being the base class type or the derived class type. If there are base class functions overridden by the derived class, a problem then arises when a derived object has been cast as the base class type. When a derived object is referred to as being of the base's type, the desired function call behavior is ambiguous.
The distinction between virtual and not virtual resolves this ambiguity. If the function in question is designated "virtual" in the base class then the derived class's function would be called (if it exists). If it is not virtual, the base class's function would be called.
Virtual functions overcome the problems with the type-field solution by allowing the programmer to declare functions in a base class that can be redefined in each derived class.
For example, a base class Animal could have a virtual function eat. Subclass Fish would implement eat() differently than subclass Wolf, but you can invoke eat() on any class instance referred to as Animal, and get the eat() behavior of the specific subclass.
This allows a programmer to process a list of objects of class Animal, telling each in turn to eat (by calling eat()), with no knowledge of what kind of animal may be in the list. You also do not need to have knowledge of how each animal eats, or what the complete set of possible animal types might be.
The following is an example in C++:
#include <iostream>
using namespace std;

class Animal
{
public:
virtual ~Animal() {} // virtual destructor: makes delete through an Animal* well defined
virtual void eat() { cout << "I eat like a generic Animal." << endl; }
};

class Wolf : public Animal
{
public:
void eat() { cout << "I eat like a wolf!" << endl; }
};

class Fish : public Animal
{
public:
void eat() { cout << "I eat like a fish!" << endl; }
};

class GoldFish : public Fish
{
public:
void eat() { cout << "I eat like a goldfish!" << endl; }
};


class OtherAnimal : public Animal
{
};

int main()
{
Animal* anAnimal[5];

anAnimal[0] = new Animal();
anAnimal[1] = new Wolf();
anAnimal[2] = new Fish();
anAnimal[3] = new GoldFish();
anAnimal[4] = new OtherAnimal();

for (int i = 0; i < 5; i++) {
anAnimal[i]->eat();
delete anAnimal[i];
}

return 0;
}
Output with the virtual method Animal::eat():
I eat like a generic Animal.
I eat like a wolf!
I eat like a fish!
I eat like a goldfish!
I eat like a generic Animal.
Output if Animal::eat() were not declared as virtual:
I eat like a generic Animal.
I eat like a generic Animal.
I eat like a generic Animal.
I eat like a generic Animal.
I eat like a generic Animal.
Java
In Java, every non-static method is a "virtual function" by default; only methods declared final or private cannot be overridden. The following is an example in Java:
public class Animal {
public void eat() { System.out.println("I eat like a generic Animal."); }

public static void main(String[] args) {
Animal[] anAnimal = new Animal[4];

anAnimal[0] = new Animal();
anAnimal[1] = new Wolf();
anAnimal[2] = new Fish();
anAnimal[3] = new OtherAnimal();

for (int i = 0; i < 4; i++) {
anAnimal[i].eat();
}
}
}

public class Wolf extends Animal {
public void eat() { System.out.println("I eat like a wolf!"); }
}

public class Fish extends Animal {
public void eat() { System.out.println("I eat like a fish!"); }
}

public class OtherAnimal extends Animal {}
Output:
I eat like a generic Animal.
I eat like a wolf!
I eat like a fish!
I eat like a generic Animal.


January-2005 [20]
2.
c) Implement any four methods declared above in C++ or Java language. [6]
d) Identify object-oriented features used in the above code. [4]
6.
c) If an object is created without any reference to it, how can it be deleted? [4]
7.
c) What is JDBC? How does it work? Explain in brief. [6]

The JDBC API defines a set of Java interfaces that encapsulate major database functionality, such as running queries, processing results, and determining configuration information. Because JDBC applications are written in Java, applications work on any platform.
How Does JDBC Work?
Simply, JDBC makes it possible to do the following things within a Java application:
• Establish a connection with a data source
• Send queries and update statements to the data source
• Process the results
The following figure shows the components of the JDBC model.

The Java application calls JDBC classes and interfaces to submit SQL statements and retrieve results.
The JDBC API is implemented through the JDBC driver. The JDBC Driver is a set of classes that implement the JDBC interfaces to process JDBC calls and return result sets to a Java application. The database (or data store) stores the data retrieved by the application using the JDBC Driver.
The main objects of the JDBC API include:
• A DataSource object is used to establish connections. Although the DriverManager class can also be used to establish a connection, connecting through a DataSource object is the preferred method.
• A Connection object controls the connection to the database. An application can alter the behavior of a connection by invoking the methods associated with this object. An application uses the connection object to create statements.
• Statement, PreparedStatement, and CallableStatement objects are used for executing SQL statements. A PreparedStatement object is used when an application plans to reuse a statement multiple times. The application prepares the SQL it plans to use. Once prepared, the application can specify values for parameters in the prepared SQL statement. The statement can be executed multiple times with different parameter values specified for each execution. A CallableStatement is used to call stored procedures that return values. The CallableStatement has methods for retrieving the return values of the stored procedure.
• A ResultSet object contains the results of a query. A ResultSet is returned to an application when a SQL query is executed by a statement object. The ResultSet object provides methods for iterating through the results of the query.
Why Do We Need JDBC?
Why can't application developers use ODBC (Open Database Connectivity) on the Java platform? After all, it's an established standard API for database access. You can use ODBC; however, ODBC isn't appropriate for direct use from the Java programming language because it uses a C interface. The JDBC API was modeled after ODBC, but, because JDBC is a Java API, it offers a natural Java interface for working with SQL. JDBC is needed to provide a "pure Java" solution for application development.
Types of JDBC Drivers
Today, there are four types of JDBC drivers in use:
• Type 1: JDBC-ODBC bridge
• Type 2: partial Java driver
• Type 3: pure Java driver for database middleware
• Type 4: pure Java driver for direct-to-database
For most applications, the best choice is a pure Java driver, either Type 3 or Type 4. Type 4 drivers (such as DataDirect Connect for JDBC drivers) are the most common and are designed for a particular vendor's database. In contrast, a Type 3 driver is a single JDBC driver used to access a middleware server, which, in turn, makes the relevant calls to the database. A good example of Type 3 JDBC driver is the DataDirect SequeLink for JDBC driver. Type 1 drivers are used for testing JDBC applications against an ODBC data source. Type 2 drivers require a native database API to be used. Both Type 1 and Type 2 mix a Java-based API with another API.
The following figure shows a side-by-side comparison of the implementation of each JDBC driver type. All four implementations show a Java application or applet using the JDBC API to communicate through the JDBC Driver Manager with a specific JDBC driver.


July-2005 [4]
1.
c) Under what conditions two objects of the same type are called deep equal? Also describe how is this equality different from shallow equal? [4]

In Eiffel, objects can be copied using the operation `copy' from class ANY. This form of copy copies all fields from one object onto another. Both the source and target objects must be non-void. The copy operation can be redefined in subclasses to conform to local behavior. There is also a `frozen' version that cannot be redefined, called `standard_copy'.
A copy can occur in two forms: shallow or deep copy. A shallow copy will copy the object's fields including the current contents of those fields. If the content is a reference to another object, no attempt is made to recursively copy the objects attached to the reference.
A deep copy, on the other hand, will recursively copy the entire object structure beginning with the source object. This is performed using the operation `deep_copy'.
Objects can also be `cloned', i.e., a clone will return a new object that is field-by-field equivalent to another. For example, the operation:
x := clone (y)
will attach a new object to x that is a clone of y. In this case, x does not need to be attached to an object before the clone, as it would in a copy operation. There is also a deep_clone operation that will recursively clone the entire object structure of y.
Clone is generally defined in terms of copy, and so its definition is frozen and cannot be redefined.
The field-by-field equality of two objects attached to entities x and y can be determined by the expression:
equal (x, y)
The standard version of this expression (as defined in ANY) has a number of rules that determine the result [1]:
1. If both x and y are Void then the result is true.
If one is Void and the other is not then the result is false.
2. If both x and y are bit sequences then equality is
determined by bit-by-bit equality. Cases 3 to 6 assume x and
y are not bit sequences.
3. If y is not the same type as x or a descendant type then
the result is false. Cases 4 to 6 assume y conforms to x.
4. If y is a simple type (character, integer, boolean, real,
double or pointer) then the result is true if both objects
have the same value, after possible coercion of the heavier type.
5. If x and y are special (strings or arrays) then the value
is true if the sequences of values have the same length and each
field is (recursively) identical.
6. Otherwise the result is true if x and y have the same fields
and each field is identical, i.e., every corresponding reference
field of x and y points to the same object and every object field
of y is recursively equal to the corresponding field of x.
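There is also a deep_equal that will recursively test the equality of each reference field of x and y in case 6. Shallow equality (equal) therefore requires corresponding reference fields to point to the same objects, whereas deep equality (deep_equal) only requires the structures they refer to to be recursively equal.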
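The difference between the two tests can be tried out in C++ as well. The following is only a minimal sketch of the idea, not the Eiffel library routines: the types Person and Address and the functions shallow_equal and deep_equal are hypothetical names chosen for illustration, and a std::shared_ptr stands in for a reference field.

```cpp
#include <memory>
#include <string>

// Hypothetical types: a Person holds a reference (pointer) to an Address.
struct Address {
    std::string city;
};

struct Person {
    std::string name;
    std::shared_ptr<Address> home;
};

// Shallow: reference fields must point to the very same object.
bool shallow_equal(const Person& x, const Person& y) {
    return x.name == y.name && x.home == y.home;
}

// Deep: reference fields are followed and their targets compared by value.
bool deep_equal(const Person& x, const Person& y) {
    if (x.name != y.name) return false;
    if (!x.home || !y.home) return !x.home && !y.home;  // both void, or unequal
    return x.home->city == y.home->city;
}
```

Two Person objects that share one Address satisfy both tests; two that refer to separate Address objects holding the same city satisfy only deep_equal.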
January-2006 [18]
2.
b) Explain single and multiple inheritances and how Java supports them. Illustrate with suitable examples. [6]
1. Inheritance is an OOP feature by which a derived class can reuse the fields and methods of a base class.
2. Single inheritance is the type of inheritance in which a derived class has exactly one direct base class. Java supports it with the extends keyword.
example:
public class Derived extends BaseClass
3. Multiple inheritance is the type of inheritance in which one derived class has two or more base classes. Java does not allow a class to extend more than one class; a similar effect is achieved by extending one class (which may be abstract) and implementing one or more interfaces.
example:
public class Derived extends AbstractBase implements Inter
3.c) Explain abstraction and encapsulation concepts in object-oriented technology with a suitable example. Can abstraction and encapsulation be achieved in C programming language? If yes; then illustrate with an example in C otherwise explain. [8]
Encapsulation
Encapsulation is the binding of data and the member functions that operate on it into a single unit,
which is exactly what a class is.
For example:
class A
{
int a; // private by default
public:
void display()
{
a=10;
cout << a << endl;
}
};
Inheritance can be implemented by making use of the following syntax:
class derived_classname : access_specifier base_classname
{
// members of the derived class
};
Leaving an object's members open to public access can cause serious data integrity problems, such as changing the stored size of an array without adjusting the array itself. One way to protect data and methods is to declare them as private. Private members can only be accessed by their class or subclasses (see Inheritance). So it is wise to declare private all members that should not be accessible to the object's users.
Private specification can be applied to the object data and methods as well as to class data and methods. Sometimes, for simplicity or efficiency, it is also useful to have direct access to object members like for the complex number object:
#define OBJECT complex
BASEOBJECT_INTERFACE
double public(real);
double public(imag);
BASEOBJECT_METHODS
/* no virtual function */
ENDOF_INTERFACE
Public and private members and methods can be intermixed without any problem. To reach a public member of an object, you do exactly as for structures:
t_complex *const j = complex.alloc();
j->m.imag = -1;


Encapsulation
Encapsulation is the ability to package data with functions into classes. This concept should actually come as very familiar to any C programmer because it is quite often used even in traditional C. For example, in the Standard C runtime library, the family of functions that includes fopen(), fclose(), fread(), and fwrite() operates on objects of type FILE. The FILE structure is thus encapsulated because client programmers have no need to access the internal attributes of the FILE struct and instead the whole interface to files consists only of the aforementioned functions. You can think of the FILE structure and the associated C functions that operate on it as the FILE class. The following bullet items summarize how the C runtime library implements the FILE "class":
• Attributes of the class are defined with a C struct (the FILE struct).
• Methods of the class are defined as C functions. Each function takes a pointer to the attribute structure (FILE *) as an argument. Class methods typically follow a common naming convention (e.g., all FILE class methods start with prefix f).
• Special methods initialize and clean up the attribute structure (fopen() and fclose()). These methods play the roles of class constructor and destructor, respectively.

This is exactly how QF/C implements classes. For instance, the following snippet of QF/C code declares the QActive (active object) "class". Please note that all class methods start with the class prefix ("QActive" in this case) and all take a pointer to the attribute structure as the first argument "me":
typedef struct QActiveTag QActive; /* Active Object base class */
struct QActiveTag {
QHsm super_; /* protected member super (inheritance from class QHsm) */
. . .
uint8_t prio__; /* private priority of the active object */
};
/* public methods */
int QActive_start(QActive *me, uint8_t prio,
QEvent *qSto[], uint16_t qLen,
void *stkSto, uint32_t stkSize,
QEvent const *ie);
void QActive_postFIFO(QActive *me, QEvent const *e);
void QActive_postLIFO(QActive *me, QEvent const *e);
/* protected methods ...*/
void QActive_ctor_(QActive *me, QPseudoState initial);
void QActive_xtor_(QActive *me);
void QActive_stop_(QActive *me); /* stops thread; nothing happens after */
void QActive_subscribe_(QActive const *me, QSignal sig);
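
The same struct-plus-functions pattern can be tried on a much smaller case. The sketch below is written in the C subset of the language so that it also compiles as C++; the Counter "class" and all of its function names are hypothetical and are not part of QF/C or the C runtime library.

```cpp
/* Attributes of the hypothetical Counter "class". */
typedef struct CounterTag {
    int count_;   /* treated as private by convention (trailing underscore) */
    int step_;
} Counter;

/* "Constructor": initialises the attribute structure. */
void Counter_ctor(Counter *me, int step) {
    me->count_ = 0;
    me->step_ = step;
}

/* Public methods: each takes the attribute structure as first argument 'me'. */
void Counter_increment(Counter *me) { me->count_ += me->step_; }
int Counter_value(Counter const *me) { return me->count_; }
```

Client code is expected to go through Counter_ctor(), Counter_increment() and Counter_value() only; nothing in the language enforces this, which is the main difference from a C++ class with private members.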

7. b) If an object is created without any reference to it how can it be deleted? [4]

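How an unreferenced object is reclaimed depends on the programming environment.

In Java, an object with no remaining references cannot be deleted explicitly; it becomes eligible for garbage collection and is reclaimed automatically. You can ask for a collection by invoking System.gc(), although the call is only a suggestion to the JVM. As the Java documentation puts it: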

Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse. When control returns from the method call, the Java Virtual Machine has made a best effort to reclaim space from all discarded objects.

The call System.gc() is effectively equivalent to the call:
Runtime.getRuntime().gc()

Class System
public static void gc()


July-2006 [4]
1.
d) What is the virtual member function? How much does it cost to call a virtual function compared to calling a normal function? Can destructor be virtual? What is the purpose of a virtual destructor? [4]

A virtual function allows derived classes to replace the implementation provided by the base class. The compiler makes sure the replacement is always called whenever the object in question is actually of the derived class, even if the object is accessed by a base pointer rather than a derived pointer. This allows algorithms in the base class to be replaced in the derived class, even if users don't know about the derived class.
The derived class can either fully replace ("override") the base class member function, or the derived class can partially replace ("augment") the base class member function. The latter is accomplished by having the derived class member function call the base class member function, if desired.
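The two cases can be sketched as follows. This is only an illustration under assumed names: Logger, PlainLogger, LoudLogger and render() are hypothetical and do not come from the text above.

```cpp
#include <string>

// Hypothetical base class with one virtual member function.
class Logger {
public:
    virtual ~Logger() = default;
    virtual std::string format(const std::string& msg) const {
        return "[log] " + msg;
    }
};

// Fully replaces ("overrides") the base class behaviour.
class PlainLogger : public Logger {
public:
    std::string format(const std::string& msg) const override {
        return msg;
    }
};

// Partially replaces ("augments") it by calling the base version first.
class LoudLogger : public Logger {
public:
    std::string format(const std::string& msg) const override {
        return Logger::format(msg) + "!";
    }
};

// Works on any Logger through a base reference; the call is dispatched
// according to the actual type of the object.
std::string render(const Logger& logger, const std::string& msg) {
    return logger.format(msg);
}
```

Here render() never mentions the derived classes, yet passing a LoudLogger yields the base formatting followed by the extra "!".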
What's the difference between how virtual and non-virtual member functions are called?
Non-virtual member functions are resolved statically. That is, the member function is selected statically (at compile-time) based on the type of the pointer (or reference) to the object.
In contrast, virtual member functions are resolved dynamically (at run-time). That is, the member function is selected dynamically (at run-time) based on the type of the object, not the type of the pointer/reference to that object. This is called "dynamic binding." Most compilers use some variant of the following technique: if the object has one or more virtual functions, the compiler puts a hidden pointer in the object called a "virtual-pointer" or "v-pointer." This v-pointer points to a global table called the "virtual-table" or "v-table."
The compiler creates a v-table for each class that has at least one virtual function. For example, if class Circle has virtual functions for draw() and move() and resize(), there would be exactly one v-table associated with class Circle, even if there were a gazillion Circle objects, and the v-pointer of each of those Circle objects would point to the Circle v-table. The v-table itself has pointers to each of the virtual functions in the class. For example, the Circle v-table would have three pointers: a pointer to Circle::draw(), a pointer to Circle::move(), and a pointer to Circle::resize().
During a dispatch of a virtual function, the run-time system follows the object's v-pointer to the class's v-table, then follows the appropriate slot in the v-table to the method code.
The space-cost overhead of the above technique is nominal: an extra pointer per object (but only for objects that will need to do dynamic binding), plus an extra pointer per method (but only for virtual methods). The time-cost overhead is also fairly nominal: compared to a normal function call, a virtual function call requires two extra fetches (one to get the value of the v-pointer, a second to get the address of the method). None of this runtime activity happens with non-virtual functions, since the compiler resolves non-virtual functions exclusively at compile-time based on the type of the pointer.
Note: the above discussion is simplified considerably, since it doesn't account for extra structural things like multiple inheritance, virtual inheritance, RTTI, etc., nor does it account for space/speed issues such as page faults, calling a function via a pointer-to-function, etc. If you want to know about those other things, please ask comp.lang.c++; PLEASE DO NOT SEND E-MAIL TO ME!

[20.4] What happens in the hardware when I call a virtual function? How many layers of indirection are there? How much overhead is there?
This is a drill-down of the previous FAQ. The answer is entirely compiler-dependent, so your mileage may vary, but most C++ compilers use a scheme similar to the one presented here.
Let's work an example. Suppose class Base has 5 virtual functions: virt0() through virt4().
// Your original C++ source code

class Base {
public:
virtual arbitrary_return_type virt0(...arbitrary params...);
virtual arbitrary_return_type virt1(...arbitrary params...);
virtual arbitrary_return_type virt2(...arbitrary params...);
virtual arbitrary_return_type virt3(...arbitrary params...);
virtual arbitrary_return_type virt4(...arbitrary params...);
...
};
Step #1: the compiler builds a static table containing 5 function-pointers, burying that table into static memory somewhere. Many (not all) compilers define this table while compiling the .cpp that defines Base's first non-inline virtual function. We call that table the v-table; let's pretend its technical name is Base::__vtable. If a function pointer fits into one machine word on the target hardware platform, Base::__vtable will end up consuming 5 hidden words of memory. Not 5 per instance, not 5 per function; just 5. It might look something like the following pseudo-code:
// Pseudo-code (not C++, not C) for a static table defined within file Base.cpp

// Pretend FunctionPtr is a generic pointer to a generic member function
// (Remember: this is pseudo-code, not C++ code)
FunctionPtr Base::__vtable[5] = {
&Base::virt0, &Base::virt1, &Base::virt2, &Base::virt3, &Base::virt4
};
Step #2: the compiler adds a hidden pointer (typically also a machine-word) to each object of class Base. This is called the v-pointer. Think of this hidden pointer as a hidden data member, as if the compiler rewrites your class to something like this:
// Your original C++ source code

class Base {
public:
...
FunctionPtr* __vptr; // supplied by the compiler, hidden from the programmer
...
};
Step #3: the compiler initializes this->__vptr within each constructor. The idea is to cause each object's v-pointer to point at its class's v-table, as if it adds the following instruction in each constructor's init-list:
Base::Base(...arbitrary params...)
: __vptr(&Base::__vtable[0]) // supplied by the compiler, hidden from the programmer
...
{
...
}
Now let's work out a derived class. Suppose your C++ code defines class Der that inherits from class Base. The compiler repeats steps #1 and #3 (but not #2). In step #1, the compiler creates a hidden v-table, keeping the same function-pointers as in Base::__vtable but replacing those slots that correspond to overrides. For instance, if Der overrides virt0() through virt2() and inherits the others as-is, Der's v-table might look something like this (pretend Der doesn't add any new virtuals):
// Pseudo-code (not C++, not C) for a static table defined within file Der.cpp

// Pretend FunctionPtr is a generic pointer to a generic member function
// (Remember: this is pseudo-code, not C++ code)
FunctionPtr Der::__vtable[5] = {
&Der::virt0, &Der::virt1, &Der::virt2, &Base::virt3, &Base::virt4
}; // the last two slots (Base::virt3, Base::virt4) are inherited as-is
In step #3, the compiler adds a similar pointer-assignment at the beginning of each of Der's constructors. The idea is to change each Der object's v-pointer so it points at its class's v-table. (This is not a second v-pointer; it's the same v-pointer that was defined in the base class, Base; remember, the compiler does not repeat step #2 in class Der.)
Finally, let's see how the compiler implements a call to a virtual function. Your code might look like this:
// Your original C++ code

void mycode(Base* p)
{
p->virt3();
}
The compiler has no idea whether this is going to call Base::virt3() or Der::virt3() or perhaps the virt3() method of another derived class that doesn't even exist yet. It only knows for sure that you are calling virt3() which happens to be the function in slot #3 of the v-table. It rewrites that call into something like this:
// Pseudo-code that the compiler generates from your C++

void mycode(Base* p)
{
p->__vptr[3](p);
}
On typical hardware, the machine-code is two 'load's plus a call:
1. The first load gets the v-pointer, storing it into a register, say r1.
2. The second load gets the word at r1 + 3*4 (pretend function-pointers are 4-bytes long, so r1+12 is the pointer to the right class's virt3() function). Pretend it puts that word into register r2 (or r1 for that matter).
3. The third instruction calls the code at location r2.
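The load, load, call sequence can be imitated in ordinary C++ with a hand-built table of function pointers. This is a rough sketch of the idea only: every name in it (Obj, Fn, base_vtable, der_vtable, call_slot) is made up for illustration, and real compilers generate the equivalent invisibly with layouts that differ between implementations.

```cpp
#include <cstddef>

// Hand-rolled imitation of the v-pointer / v-table scheme.
struct Obj;
using Fn = int (*)(const Obj*);

struct Obj {
    const Fn* vptr;   // plays the role of the hidden v-pointer
    int value;
};

int base_virt0(const Obj* self) { return self->value; }
int base_virt1(const Obj* self) { return self->value + 1; }
int der_virt0(const Obj* self)  { return self->value * 10; }

// One static table per "class"; the derived table reuses slot 1 as-is.
const Fn base_vtable[2] = { base_virt0, base_virt1 };
const Fn der_vtable[2]  = { der_virt0,  base_virt1 };

// "Constructors" set the v-pointer to the table of the object's class.
Obj make_base(int v) { return Obj{ base_vtable, v }; }
Obj make_der(int v)  { return Obj{ der_vtable, v }; }

// Dispatch: load the v-pointer, load the slot, call through it.
int call_slot(const Obj* p, std::size_t slot) {
    const Fn* table = p->vptr;   // first load
    Fn target = table[slot];     // second load
    return target(p);            // indirect call
}
```

Calling call_slot() with slot 0 reaches a different function for a "base" and a "derived" object, while slot 1 reaches the same inherited function for both.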
Conclusions:
• Objects of classes with virtual functions have only a small space-overhead compared to those that don't have virtual functions.
• Calling a virtual function is fast — almost as fast as calling a non-virtual function.
• You don't get any additional per-call overhead no matter how deep the inheritance gets. You could have 10 levels of inheritance, but there is no "chaining" — it's always the same — fetch, fetch, call.
Caveat: I've intentionally ignored multiple inheritance, virtual inheritance and RTTI. Depending on the compiler, these can make things a little more complicated. If you want to know about these things, DO NOT EMAIL ME, but instead ask comp.lang.c++.
Caveat: Everything in this FAQ is compiler-dependent. Your mileage may vary.
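The question also asks whether a destructor can be virtual, which the FAQ text quoted above does not cover. It can, and the purpose is that deleting a derived object through a base-class pointer runs the derived destructor first and then the base destructor; without a virtual destructor in the base class such a delete has undefined behavior. A minimal sketch, with hypothetical class names Shape and Circle and a log vector added purely to record the order of destruction:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class; 'events' records which destructors run.
class Shape {
public:
    explicit Shape(std::vector<std::string>& log) : events(log) {}
    virtual ~Shape() { events.push_back("~Shape"); }
protected:
    std::vector<std::string>& events;
};

class Circle : public Shape {
public:
    explicit Circle(std::vector<std::string>& log) : Shape(log) {}
    ~Circle() override { events.push_back("~Circle"); }
};

// Destroys a Circle through a base-class pointer and returns the log.
std::vector<std::string> destroy_through_base() {
    std::vector<std::string> log;
    {
        std::unique_ptr<Shape> p(new Circle(log));
    }   // p goes out of scope here: delete is applied to a Shape*
    return log;
}
```

Because ~Shape() is virtual, the returned log contains "~Circle" followed by "~Shape".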
What's the difference between how virtual and non-virtual member functions are called?
Non-virtual member functions are resolved statically. That is, the member function is selected statically (at compile-time) based on the type of the pointer (or reference) to the object.
In contrast, virtual member functions are resolved dynamically (at run-time). That is, the member function is selected dynamically (at run-time) based on the type of the object, not the type of the pointer/reference to that object. This is called "dynamic binding." Most compilers use some variant of the following technique: if the object has one or more virtual functions, the compiler puts a hidden pointer in the object called a "virtual-pointer" or "v-pointer." This v-pointer points to a global table called the "virtual-table" or "v-table."
The compiler creates a v-table for each class that has at least one virtual function. For example, if class Circle has virtual functions for draw() and move() and resize(), there would be exactly one v-table associated with class Circle, even if there were a gazillion Circle objects, and the v-pointer of each of those Circle objects would point to the Circle v-table. The v-table itself has pointers to each of the virtual functions in the class. For example, the Circle v-table would have three pointers: a pointer to Circle::draw(), a pointer to Circle::move(), and a pointer to Circle::resize().
During a dispatch of a virtual function, the run-time system follows the object's v-pointer to the class's v-table, then follows the appropriate slot in the v-table to the method code.
The space-cost overhead of the above technique is nominal: an extra pointer per object (but only for objects that will need to do dynamic binding), plus an extra pointer per method (but only for virtual methods). The time-cost overhead is also fairly nominal: compared to a normal function call, a virtual function call requires two extra fetches (one to get the value of the v-pointer, a second to get the address of the method). None of this runtime activity happens with non-virtual functions, since the compiler resolves non-virtual functions exclusively at compile-time based on the type of the pointer.


[20.4] What happens in the hardware when I call a virtual function? How many layers of indirection are there? How much overhead is there?

This is a drill-down of the previous FAQ. The answer is entirely compiler-dependent, so your mileage may vary, but most C++ compilers use a scheme similar to the one presented here.
Let's work an example. Suppose class Base has 5 virtual functions: virt0() through virt4().
// Your original C++ source code

class Base {
public:
virtual arbitrary_return_type virt0(...arbitrary params...);
virtual arbitrary_return_type virt1(...arbitrary params...);
virtual arbitrary_return_type virt2(...arbitrary params...);
virtual arbitrary_return_type virt3(...arbitrary params...);
virtual arbitrary_return_type virt4(...arbitrary params...);
...
};
Step #1: the compiler builds a static table containing 5 function-pointers, burying that table into static memory somewhere. Many (not all) compilers define this table while compiling the .cpp that defines Base's first non-inline virtual function. We call that table the v-table; let's pretend its technical name is Base::__vtable. If a function pointer fits into one machine word on the target hardware platform, Base::__vtable will end up consuming 5 hidden words of memory. Not 5 per instance, not 5 per function; just 5. It might look something like the following pseudo-code:
// Pseudo-code (not C++, not C) for a static table defined within file Base.cpp

// Pretend FunctionPtr is a generic pointer to a generic member function
// (Remember: this is pseudo-code, not C++ code)
FunctionPtr Base::__vtable[5] = {
&Base::virt0, &Base::virt1, &Base::virt2, &Base::virt3, &Base::virt4
};
Step #2: the compiler adds a hidden pointer (typically also a machine-word) to each object of class Base. This is called the v-pointer. Think of this hidden pointer as a hidden data member, as if the compiler rewrites your class to something like this:
// Your original C++ source code

class Base {
public:
...
FunctionPtr* __vptr; ← supplied by the compiler, hidden from the programmer
...
};
Step #3: the compiler initializes this->__vptr within each constructor. The idea is to cause each object's v-pointer to point at its class's v-table, as if it adds the following instruction in each constructor's init-list:
Base::Base(...arbitrary params...)
: __vptr(&Base::__vtable[0]) ← supplied by the compiler, hidden from the programmer
...
{
...
}
Now let's work out a derived class. Suppose your C++ code defines class Der that inherits from class Base. The compiler repeats steps #1 and #3 (but not #2). In step #1, the compiler creates a hidden v-table, keeping the same function-pointers as in Base::__vtable but replacing those slots that correspond to overrides. For instance, if Der overrides virt0() through virt2() and inherits the others as-is, Der's v-table might look something like this (pretend Der doesn't add any new virtuals):
// Pseudo-code (not C++, not C) for a static table defined within file Der.cpp

// Pretend FunctionPtr is a generic pointer to a generic member function
// (Remember: this is pseudo-code, not C++ code)
FunctionPtr Der::__vtable[5] = {
&Der::virt0, &Der::virt1, &Der::virt2, &Base::virt3, &Base::virt4
}; ^^^^----------^^^^---inherited as-is
In step #3, the compiler adds a similar pointer-assignment at the beginning of each of Der's constructors. The idea is to change each Der object's v-pointer so it points at its class's v-table. (This is not a second v-pointer; it's the same v-pointer that was defined in the base class, Base; remember, the compiler does not repeat step #2 in class Der.)
Finally, let's see how the compiler implements a call to a virtual function. Your code might look like this:
// Your original C++ code

void mycode(Base* p)
{
p->virt3();
}
The compiler has no idea whether this is going to call Base::virt3() or Der::virt3() or perhaps the virt3() method of another derived class that doesn't even exist yet. It only knows for sure that you are calling virt3() which happens to be the function in slot #3 of the v-table. It rewrites that call into something like this:
// Pseudo-code that the compiler generates from your C++

void mycode(Base* p)
{
p->__vptr[3](p);
}
On typical hardware, the machine-code is two 'load's plus a call:
1. The first load gets the v-pointer, storing it into a register, say r1.
2. The second load gets the word at r1 + 3*4 (pretend function-pointers are 4-bytes long, so r1+12 is the pointer to the right class's virt3() function). Pretend it puts that word into register r2 (or r1 for that matter).
3. The third instruction calls the code at location r2.
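The load, load, call sequence can be imitated by hand in ordinary C++. The following is only a sketch of the idea, not what any particular compiler emits; the names Obj, FunctionPtr, kBaseVtable, kDerVtable and call_slot are invented for illustration, and the "virtual" functions are plain free functions that take the object explicitly.

```cpp
#include <cassert>

// Hand-rolled imitation of compiler-generated dispatch (illustrative names only).
struct Obj;
using FunctionPtr = int (*)(Obj*);

struct Obj {
    const FunctionPtr* vptr;  // plays the role of the hidden __vptr
};

// "Virtual" functions written as free functions that receive the object.
int base_virt0(Obj*) { return 0; }
int base_virt3(Obj*) { return 3; }
int der_virt0(Obj*)  { return 100; }

// One static table per class; the derived table overrides slot 0
// and keeps the base entry in slot 3. Unused slots are left empty here.
const FunctionPtr kBaseVtable[4] = { base_virt0, nullptr, nullptr, base_virt3 };
const FunctionPtr kDerVtable[4]  = { der_virt0,  nullptr, nullptr, base_virt3 };

// What each "constructor" would do: point the object at its class's table.
Obj make_base() { return Obj{kBaseVtable}; }
Obj make_der()  { return Obj{kDerVtable}; }

// The rewritten call site: fetch the v-pointer, fetch the slot, call through it.
int call_slot(Obj* p, int slot) {
    const FunctionPtr* table = p->vptr;  // first load
    FunctionPtr fn = table[slot];        // second load
    return fn(p);                        // indirect call
}

void demo_manual_vtable() {
    Obj b = make_base();
    Obj d = make_der();
    assert(call_slot(&b, 0) == 0);    // base version
    assert(call_slot(&d, 0) == 100);  // overridden in the derived table
    assert(call_slot(&d, 3) == 3);    // inherited as-is
}
```

Calling demo_manual_vtable() from main exercises the three cases. The call site never inspects the object's class; it only indexes whatever table the object points at.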
Conclusions:
• Objects of classes with virtual functions have only a small space-overhead compared to those that don't have virtual functions.
• Calling a virtual function is fast — almost as fast as calling a non-virtual function.
• You don't get any additional per-call overhead no matter how deep the inheritance gets. You could have 10 levels of inheritance, but there is no "chaining" — it's always the same — fetch, fetch, call.
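The space claim can be checked with sizeof. This is a sketch that assumes a mainstream implementation in which the v-pointer is stored once per object; the C++ standard does not promise these exact relationships, and the struct names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

struct Plain    { int x; void f() {} };
struct OneVirt  { int x; virtual void f() {} };
struct ManyVirt { int x; virtual void f() {} virtual void g() {} virtual void h() {} };

// Per-object overhead of having virtual functions, in bytes
// (typically one pointer plus any alignment padding).
constexpr std::size_t overhead_bytes() { return sizeof(OneVirt) - sizeof(Plain); }

void demo_overhead() {
    // The hidden v-pointer makes the object larger...
    assert(sizeof(OneVirt) > sizeof(Plain));
    // ...but extra virtual functions enlarge the per-class table, not each object.
    assert(sizeof(ManyVirt) == sizeof(OneVirt));
    assert(overhead_bytes() > 0);
}
```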
Caveat: I've intentionally ignored multiple inheritance, virtual inheritance and RTTI. Depending on the compiler, these can make things a little more complicated. If you want to know about these things, DO NOT EMAIL ME, but instead ask comp.lang.c++.
Caveat: Everything in this FAQ is compiler-dependent. Your mileage may vary.

January-2007 [4]
1.
b) Under what conditions two objects of the same type are called deep equal? Describe how this equality is different from shallow equal. [4]

July-2007 [19]
1.
b) How would you delete an object that is created without any references to it? [4]
2.
a) What do you understand by pointer swizzling? Describe the various approaches to pointer swizzling. [9]

Pointer swizzling is the conversion of references based on name or position to direct pointer references. It is typically performed during the deserialization (loading) of a relocatable object from disk, such as an executable file or pointer-based data structure. The reverse operation, replacing pointers with position-independent symbols or positions, is sometimes referred to as unswizzling, and is performed during serialization (saving).
For example, suppose we have the following linked list data structure:
record node {
int data
node *next
}
We can easily create a linked list data structure in memory using such an object, but when we attempt to save it to disk we run into trouble. Directly saving the pointer values won't work on most architectures, because the next time we load it the memory positions the nodes now use may be in use by other data. One way of dealing with this is to assign a unique id number to each node and then unswizzle the pointers by turning them into a field indicating the id number of the next node:
record node_saved {
int data
int id_number
int id_number_of_next_node
}
We can save these records to disk in any order, and no information will be lost. Alternatives include saving the file offset of the next node or a number indicating its position in the sequence of saved records.
When we go to load these nodes, however, we quickly discover that attempting to find a node based on its number is cumbersome and inefficient. We'd like our original data structure back so we can simply follow next pointers to traverse the list. To do this, we perform pointer swizzling, finding the address of each node and turning the id_number_of_next_node fields back into direct pointers to the right node.
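The round trip can be sketched in C++ as follows. This is a minimal illustration rather than a production serializer: it assumes an acyclic singly linked list, uses -1 as a made-up marker for "no next node", and adopts the convention that the head always receives id 0. The function names unswizzle and swizzle are chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Node { int data; Node* next; };

// On-disk form: the pointer is replaced by the id of the next node (-1 = none).
struct NodeSaved { int data; int id_number; int id_number_of_next_node; };

// Unswizzle: number the nodes, then turn each pointer into the id it refers to.
std::vector<NodeSaved> unswizzle(const Node* head) {
    std::unordered_map<const Node*, int> ids;
    int next_id = 0;
    for (const Node* n = head; n; n = n->next) ids[n] = next_id++;
    std::vector<NodeSaved> out;
    for (const Node* n = head; n; n = n->next)
        out.push_back({n->data, ids[n], n->next ? ids[n->next] : -1});
    return out;
}

// Swizzle: place the nodes in memory, then turn each id back into an address.
// The nodes live in 'storage', which must outlive the returned head pointer.
Node* swizzle(const std::vector<NodeSaved>& saved, std::vector<Node>& storage) {
    storage.assign(saved.size(), Node{0, nullptr});
    std::unordered_map<int, Node*> by_id;
    for (std::size_t i = 0; i < saved.size(); ++i) {
        storage[i].data = saved[i].data;
        by_id[saved[i].id_number] = &storage[i];
    }
    for (std::size_t i = 0; i < saved.size(); ++i) {
        int next = saved[i].id_number_of_next_node;
        storage[i].next = (next < 0) ? nullptr : by_id.at(next);
    }
    return saved.empty() ? nullptr : by_id.at(0);  // id 0 is the head by convention
}

void demo_swizzle() {
    Node c{3, nullptr}, b{2, &c}, a{1, &b};
    std::vector<NodeSaved> saved = unswizzle(&a);
    std::vector<Node> storage;
    Node* head = swizzle(saved, storage);
    assert(head != &a);  // the list was rebuilt at new addresses
    assert(head->data == 1 && head->next->data == 2 && head->next->next->data == 3);
    assert(head->next->next->next == nullptr);
}
```

Because every saved record carries its own id, the records could be shuffled before loading and swizzle would still rebuild the same list.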

3.a) Explain as to how a persistent pointer is implemented. Contrast this implementation with that of pointers as they exist in general-purpose languages such as C or Pascal. [6]
O++ [4, 5, 9] is a database programming language based on C++ [11, 18]. Amongst other things, O++
provides facilities for making C++ objects persistent. Objects of any class can be allocated on the stack, on
the heap, or in persistent store. Objects allocated in persistent store are called persistent objects, and
pointers to such objects are called persistent pointers. From the O++ user’s point of view, the semantics of
persistent pointers are identical to those of volatile pointers. In particular, inheritance related mechanisms,
including virtual base classes and virtual functions, should behave the same way for both kinds of pointers.
C++ objects of types that have ‘‘virtual’’ functions and ‘‘virtual’’ base classes contain volatile
(‘‘memory’’) pointers. We call such pointers ‘‘hidden pointers’’ because they were not specified by the
user. In the case of virtual functions, the hidden pointer points to a virtual function table that is used to
determine which function is to be called. In the case of virtual base classes, the hidden pointers are used for
sharing base classes.
O++ objects are C++ objects. The O++ compiler ofront generates C++ as its output, and relies on the
C++ compiler for implementing C++ semantics for objects. In attempting to preserve C++ semantics, we
encountered problems because the hidden pointers inside an object are only valid for the duration of the
program that created the object. These pointers become invalid across transactions (or program
invocations). C++ implementations were not designed to work with persistent objects. As a result, the
hidden pointers must be fixed to have correct values so that C++ semantics are maintained for persistent
objects.
In this paper, we describe the hidden pointers problem in detail and show how it can be solved. Our
solution is novel and elegant in that it does not require modifying the C++ compiler or the semantics of
C++. In addition to this solution, we outline two alternatives that we considered, and explain why they
were not adopted for the O++ compiler.
We also discuss another problem related to making C++ objects persistent. C++ allows base class pointers
to point to derived class objects. Similarly, in O++ a persistent pointer to a base class may actually point
to a persistent object of a derived class. When reading the referenced object into memory, the persistent
base class pointer must be adjusted to point to the correct offset within the derived class object. Therefore,
when an O++ object is written to disk, some type information must be stored inside the object indicating
the object type. This information may be used at a later time to read the object properly and to set the base
class pointer to an appropriate location in the object. In addition, when the object is read from disk, the
hidden pointers must be initialized to their appropriate values.

C++ has emerged as the de facto standard language for software development, and database systems based
on C++ have attracted much attention [2, 3, 5, 10, 13, 16]. We hope that the details and techniques
presented will be useful to database researchers and to implementors of object-oriented database systems
based on C++. We expect the reader to be familiar with C++ [18].
The paper is organized as follows. Section 2 illustrates the main memory layout of objects used by C++. A
reader familiar with the C++ implementation may skip this section. Section 3 describes the problems
associated with making C++ objects persistent, and sections 4 and 5 present solutions to these problems.
Section 6 surveys and compares related work, and section 7 presents our conclusions.
2. VIRTUAL FUNCTIONS AND VIRTUAL BASE CLASSES
To illustrate some key points in the representation of C++ objects, we use four classes whose inheritance
relationships are shown pictorially in Figure 1. Class studEmp is derived from both student and
employee, each of which is in turn derived from person:
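Figure 1. A class hierarchy. [Diagram not reproduced: person at the top, student and employee derived from person, studEmp derived from both student and employee.]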
Class person is said to be the base class of the classes student and employee, which are called
derived classes. Classes student and employee, as well as person, are the base classes of class
studEmp.
These types are defined via the C++ type definition facility called the class. Class declarations consist of
two parts: a specification and a body. The class specification includes the data members (attributes) and
member functions (methods) of the class. The class body consists of bodies of functions declared in the
class specification but whose bodies were not given there.
2.1 Virtual Functions
Virtual functions are the mechanism used by C++ to support a special kind of polymorphism using ‘‘late’’
binding. C++ allows a base class pointer (or reference) to refer to a derived class object. For example, a
person pointer (reference) can refer to a student object. When a virtual function is invoked using this
pointer (reference), the specific virtual function that is called depends on the type of the referenced object.
It is the task of the C++ compiler to generate code that will invoke the appropriate function.
Assume that classes person and student are defined as shown below.

class person {
public:
char first[MAX];
char last[MAX];
int age;
virtual void print();
};
class student: virtual public person {
public:
char university[MAX];
virtual void print();
};
Class person defines print as a virtual function. Class student is derived from (based on) class
person. The derived class student may define its own version of print, with a different body
(implementation). For example, the body of each print function may have been defined as follows.
void person::print()
{
cout << first << " " << last << ", age = " << age << endl;
}
void student::print()
{
person::print();
cout << "student at " << university << endl;
}
When invoking print through a pointer (or reference) to person, the actual virtual function to be
invoked is determined at run time according to the actual type of the referenced object.
As an example, consider the following code:
1 main()
2 {
3 person *pp = new person;
4 student *ps = new student;
5 ...
6 pp->print();
7 ps->print();
8 ...
9 pp = ps;
10 pp->print();
11 ...
12 }

The first pp->print function call (line 6) invokes the function person::print because pp points to a
person object. Similarly, the ps->print function call (line 7) invokes the function
student::print. But the second pp->print function call (line 10) invokes the function
student::print even though the type of pp is a pointer to person. This is because pp was assigned
a pointer to a student object (line 9).
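The same behavior can be reproduced in a compact form. The sketch below is an adaptation, not the paper's code: it uses std::string members, returns the text from a function named describe instead of printing it, and adds a virtual destructor, all of which are choices made for the example.

```cpp
#include <cassert>
#include <string>

struct person {
    std::string first, last;
    int age = 0;
    virtual std::string describe() const {
        return first + " " + last + ", age = " + std::to_string(age);
    }
    virtual ~person() = default;
};

struct student : virtual public person {
    std::string university;
    std::string describe() const override {
        return person::describe() + "; student at " + university;
    }
};

void demo_dispatch() {
    person p;
    p.first = "Ann"; p.last = "Lee"; p.age = 40;
    student s;
    s.first = "Bob"; s.last = "Ray"; s.age = 20; s.university = "MIT";

    person*  pp = &p;
    student* ps = &s;
    assert(pp->describe() == "Ann Lee, age = 40");                  // person version
    assert(ps->describe() == "Bob Ray, age = 20; student at MIT");  // student version

    pp = ps;  // a person pointer now refers to a student object
    assert(pp->describe() == "Bob Ray, age = 20; student at MIT");  // still the student version
}
```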
Class student declares person to be its virtual base class. The keyword virtual is used to ensure that
only one copy of the virtual base class appears in an instance of the derived class. The virtual base class is
shared by all the components of the inheritance hierarchy that specify this class as a virtual base class.
Declaring a base class as virtual has no effect with single inheritance but it makes a difference in case of
multiple inheritance as we shall see later.

Figures 2 and 3 illustrate the memory representation of objects of type person and student.
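Figure 2. Memory layout of a person object. [Diagram not reproduced: the object holds a vtbl pointer and the members first, last and age; person's vtbl holds &person::print with an offset of 0.]
Figure 3. Memory layout of a student object. [Diagram not reproduced: the student part holds a vtbl pointer, a vbase pointer and university; the person part holds a vtbl pointer, first, last and age. ps points to the student object and pp to the person sub-object. Both vtbls hold &student::print; the entry in person's vtbl carries delta(student, person) and the entry in student's vtbl carries 0.]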
Each object of a class that has virtual functions contains a hidden pointer that points to a virtual function
table, called the vtbl. The vtbl contains addresses of virtual functions. It also contains offsets (deltas), that
are used to find the address of a derived class object given the address of a base class sub-object. Returning
to our example, after the assignment
pp = ps;
in line 9, ps points to a student object, while pp points to the person sub-object within the student
object. This is illustrated in Figure 3. Consider the call
pp->print();
in line 10. The call requires an indirection via the vtbl pointer of the person sub-object, resulting in the
application of the function student::print. However, student::print expects to get the address
of a student object as its argument. This address is calculated by subtracting from pp the value of
delta(student, person), stored in the vtbl.
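The relationship between the two addresses can be observed from within C++. This sketch relies on the standard guarantee that dynamic_cast to a void pointer yields the address of the most-derived object; the measured byte offset stands in for delta(student, person), its actual value is implementation-dependent, and the helper names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

struct person {
    int age = 0;
    virtual int kind() const { return 0; }
    virtual ~person() = default;
};

struct student : virtual public person {
    int year = 0;
    int kind() const override { return 1; }
};

// Recover the address of the complete object from a base-class pointer.
const void* most_derived_address(const person* pp) {
    return dynamic_cast<const void*>(pp);
}

// Byte distance from the start of the student object to its person sub-object.
std::ptrdiff_t measured_delta(const student* ps) {
    const person* pp = ps;  // the implicit conversion applies the adjustment
    return reinterpret_cast<const char*>(pp) - reinterpret_cast<const char*>(ps);
}

void demo_delta() {
    student s;
    student* ps = &s;
    person*  pp = ps;
    // Going back from the sub-object reaches the same student object.
    assert(most_derived_address(pp) == static_cast<const void*>(ps));
    // The virtual call through pp still reaches student::kind.
    assert(pp->kind() == 1);
    // Applying the measured offset to ps lands on the person sub-object.
    assert(reinterpret_cast<const char*>(ps) + measured_delta(ps) ==
           reinterpret_cast<const char*>(pp));
}
```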
Note that had print not been declared as a virtual function in class person, then C++ would not have
generated the hidden vtbl pointer, and calls to the function print would not have required any indirection
in the translated code.
Because person is declared as a virtual base class, references to the person component of a student
object require an indirection through a pointer, called the vbase pointer. In this example the indirection
may seem unnecessary, but we shall shortly see that it is required to implement sharing of the virtual base
class in objects of types specified using multiple inheritance.

2.2 Virtual Base Classes and Multiple Inheritance
When using multiple inheritance, a base class can occur multiple times in the derivation hierarchy. By
default, C++ generates multiple copies of such a base class. If only one copy of the base class is to be
generated, that is, the base class is to be shared as in other object-oriented languages (e.g., [8, 12, 14] ), then
the base class must be declared to be a virtual base class.
In the following specification class employee is derived from class person, and class studEmp is
derived from both employee and student.
class employee: virtual public person {
public:
char company[MAX];
int sal;
virtual void print();
};
class studEmp: public employee, public student {
public:
int maxhours;
virtual void print();
};
Because class person is a virtual base class of both the employee and student classes, every
studEmp object must contain one instance of class person instead of two. Both the employee and
student sub-objects share this instance of person.
Consider the following code:
main()
{
studEmp *se;
int a, b;
...
se->student::age = a;
...
se->employee::age = b;
...
}
Because se->student and se->employee share the same person object, se->student::age
and se->employee::age both refer to the same component, i.e., se->person::age.
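A stripped-down version of the hierarchy makes the sharing visible. The classes below keep only the age member, and the plainStudent, plainEmployee and plainStudEmp types are hypothetical non-virtual counterparts added purely for contrast.

```cpp
#include <cassert>

struct person { int age = 0; };

// Virtual inheritance: one shared person sub-object.
struct student  : virtual public person {};
struct employee : virtual public person {};
struct studEmp  : public employee, public student {};

// Ordinary inheritance: each path carries its own person sub-object.
struct plainStudent  : public person {};
struct plainEmployee : public person {};
struct plainStudEmp  : public plainEmployee, public plainStudent {};

void demo_shared_base() {
    studEmp se;
    se.student::age = 21;
    se.employee::age = 35;
    assert(se.student::age == 35);                  // the second write is seen through both paths
    assert(&se.student::age == &se.employee::age);  // same member, same address

    plainStudEmp dup;
    dup.plainStudent::age = 21;
    dup.plainEmployee::age = 35;
    assert(dup.plainStudent::age == 21);            // two independent copies
    assert(&dup.plainStudent::age != &dup.plainEmployee::age);
}
```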
As before, classes employee and studEmp may define their own implementation of the print
function, as shown below.
void employee::print()
{
person::print();
cout << "employed at " << company << endl;
}
void studEmp::print()
{
person::print();
cout << "student at " << university << endl;
cout << "employed at " << company << endl;
}

Figures 4 and 5 illustrate the memory representation of objects of type employee and studEmp.
Figure 4. Memory layout of an employee object. [Diagram not reproduced: the employee part holds a vtbl pointer, a vbase pointer, company and sal; the person part holds a vtbl pointer, first, last and age. Both vtbls hold &employee::print; the entry in person's vtbl carries delta(employee, person) and the entry in employee's vtbl carries 0.]
Figure 5. Memory layout of a studEmp object. [Diagram not reproduced: the object consists of employee, student, studEmp and person parts, with the members company, sal, university, maxhours, first, last and age plus the vtbl and vbase pointers. There are three vtbls (employee and studEmp's shared vtbl, student's vtbl and person's vtbl), each holding &studEmp::print, with offsets 0, delta(studEmp, student) and delta(studEmp, person) respectively.]
An optimization utilized in most C++ implementations is to share the virtual table of a derived class object
with its first non-virtual base class sub-object, since both objects have the same address. This is why in
Figure 5, there are 3 virtual tables instead of 4 — studEmp and employee share the same virtual table.
3. PERSISTENCE AND THE HIDDEN POINTERS PROBLEM
The database programming language O++ [5], which is an upward compatible extension of C++, models its
persistent store on the heap. An object allocated on the persistent store becomes persistent. Each persistent
object is uniquely identified by its object id (oid). A pointer to a persistent object is called a persistent
pointer, for short. Similarly, a pointer to a volatile object is called a volatile pointer and it contains the
memory address of the referenced object. Persistent objects are allocated in O++ by using operator pnew
as opposed to using the operator new that allocates volatile objects on the heap. Here is some code
showing the allocation of a persistent employee object:
persistent employee *pe;
...
pe = pnew employee;
The type qualifier persistent designates pointers to persistent objects. Persistent pointers are used and
manipulated much like ordinary pointers.
O++ extends the language constructs provided by C++ so that associative queries over collections of
objects can be expressed. For example, here is a code fragment that retrieves high salaried employees
(making more than 100K) from the databases and invokes the print function on each such employee:
for (pe in employee) suchthat(pe->sal > 100000) {
pe->print();
}
3.1 Implementation of Persistence in O++
O++ programs are translated into C++, compiled and linked together with the Ode object manager.
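Figure 6. Compilation of an O++ program. [Diagram not reproduced: O++ source is translated by the O++ compiler ofront into C++, the C++ compiler produces object code, and the linker combines it with the Ode Object Manager Library into executable code.]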
The Ode object manager is a software layer built on top of the EOS storage system [6]. EOS manipulates
objects as uninterpreted sequences of bytes stored on disk pages, the unit of I/O. The format of each such
object consists of an object header (a tag, followed by the actual length of the object) and then the
object itself.¹ The Ode object manager extends the object header to include a pointer to the object’s type
descriptor; besides that, the on-disk representation of the object is identical to the one used in-memory.
Type descriptors — objects that describe types of objects in the Ode database — are held in the Ode
catalog. Catalog information is important to various internal modules of the database system. For
example, the query optimizer would access the catalog to check what indexes exist for some collection of
objects in order to decide how to execute a selection on it. Here, however, we are only concerned with the
components of the catalog that are needed so that an object can correctly be fetched from or placed back on
disk. Each entry in the catalog describes a single type. Since every persistent object (including type
descriptor objects) has a pointer to its type attached to the object, the Ode object manager can access
information about the object’s type. In particular, the solutions described in subsequent sections require the
object manager to invoke functions specific to each type. These functions are generated by the O++
compiler, which also loads the function addresses into the catalog entry before the main program is
executed.
__________________
1. If the object is large, i.e., it cannot fit in a single page, then the object is stored in as many pages as necessary to hold the entire
object, and a directory to these pages is stored right after the object header [7].


When a persistent pointer is dereferenced, the entire page the referenced object resides on is brought from
disk to memory. Once the object is in memory, its starting address is computed and used to reference the
object. Thus, the result of the dereference is that we have a C++ object in memory.
3.2 The Hidden Pointers Problem
Virtual functions and virtual base classes have an impact on persistence because of the ‘‘hidden’’ vtbl and
vbase pointers (indicated by the shaded areas in the figures shown earlier) generated by C++ compilers to
implement these facilities. Virtual function invocations involve an indirection that uses the vtbl pointer to
access the entries in the virtual function table. And references to the components of virtual base classes
must follow the vbase pointer. We call the vtbl and vbase pointers ‘‘hidden’’ pointers because they
represent implementation related information, and are invisible to the user. Most C++ programmers are not
even aware of their existence.
Unfortunately, hidden pointers are volatile pointers, i.e., they are not valid beyond the lifetime of the
program that created them. Saving objects containing hidden pointers on disk and then reading these
objects back from disk in another program means that the hidden pointer values in the objects read from
disk are invalid. The same observation holds for the values of data members that are volatile pointers. In
the case of pointer members, it is the programmer’s responsibility to ensure that the pointers are not used
with invalid values. However, in case of hidden pointers it is the responsibility of the system providing
persistence, the database programming language O++ in our case, to ensure that the objects read from disk
do not contain invalid values prior to their use in the program. Otherwise, a reference to a virtual function
or a component of a virtual base class will lead to an illegal memory reference.
4. THE O++ SOLUTION
We now discuss how the O++ implementation handles the hidden pointer problem. We then describe
pointer adjustment to allow pointers to persistent objects to also point to persistent objects of derived
classes (in accordance with C++ semantics).
As mentioned, the O++ compiler ofront generates C++ as its output. The O++ compiler does not have
direct access to the hidden pointers. We did not want to modify the C++ compiler to fix the hidden
pointers, because this modification would make O++ non-portable. It would require modification of the
local C++ compiler, which could affect other C++ programs. We decided to use C++ facilities to place
valid values in the hidden pointers.
Our solution is based on the fact that each class constructor, as translated by the C++ compiler, contains
code to properly initialize the hidden pointers. This code is executed prior to executing the constructor
body, written by the user.
4.1 Using Constructors
The basic scheme is as follows.
1. Read the object from disk. As a result of this request, the page the object resides on is fetched from
disk into the buffer pool. Thus, the requested object is now in main memory but it contains bad
hidden pointers.
2. Apply a constructor to the object read from disk, to fix the hidden pointers. The constructor must not
change the data members of the object.
This solution uses the fact that for every constructor, the C++ compiler adds code to properly initialize the
hidden pointers in an object of that type. There are two obstacles to implementing the above scheme. First,
C++ does not allow a constructor to be invoked in conjunction with an existing object (as are member
functions). However, we can call the constructor indirectly by defining an overloaded version of the global
operator new function. When an object of class C is created by calling new C, C++ does two things:
(a) calls the function operator new to allocate storage for the object;² (b) applies an appropriate
constructor (as determined from the arguments to the constructor supplied with the invocation of new) to
initialize the hidden pointers and components of the object (the latter is as specified by the user).
We do not want to allocate storage for the object. We simply want to make new perform the constructor
application. Consequently, we overload the new operator by defining a new version of operator new.
We pass to this function the address of the location where we have stored the object read from disk. The
function simply returns this address as its result (no storage is allocated).³ Here is the definition of the
overloaded operator new:⁴
class _ode { };
void* operator new(size_t, _ode *p)
{
return (void *) p;
}

Class _ode is a unique type defined to ensure that the overloaded definition of new is invoked. Suppose
for example that p points to an employee object that has been read from disk. Then the overloaded
definition of operator new is invoked as
new ((_ode *) p) employee;
This invocation of operator new invokes the argumentless constructor for class employee.
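The mechanism can be tried out in isolation. The sketch below is an assumption-laden adaptation: OdeTag stands in for the paper's _ode class (names with a leading underscore are reserved at global scope in current C++), a matching operator delete is added because current compilers expect one, and the Employee class is reduced to a single member. It only shows that the overloaded operator new allocates nothing and that the constructor runs at the supplied address.

```cpp
#include <cassert>
#include <cstddef>

struct OdeTag {};  // stand-in for the paper's _ode class

// Overloaded operator new: returns the supplied address, allocates nothing.
void* operator new(std::size_t, OdeTag* where) noexcept { return where; }
// Matching form; it would only be called if a constructor threw.
void operator delete(void*, OdeTag*) noexcept {}

struct Employee {
    int sal;
    Employee() : sal(30000) {}
    virtual int salary() const { return sal; }
};

void demo_construct_in_place() {
    // Raw, suitably aligned storage playing the role of the buffer pool.
    alignas(Employee) unsigned char buffer[sizeof(Employee)];

    // Runs the argumentless constructor on the storage we already own.
    Employee* e = new (reinterpret_cast<OdeTag*>(buffer)) Employee;

    assert(static_cast<void*>(e) == static_cast<void*>(buffer));  // no allocation happened
    assert(e->salary() == 30000);  // hidden pointer and members were initialized
    e->~Employee();
}
```

The standard library's placement form, new (address) T from the header new, does the same job without a tag type; the tag simply guarantees that this particular overload is selected.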
The second obstacle in using this scheme is that we cannot simply invoke a constructor defined by the user
to correctly initialize the hidden pointers in the object read from disk because the constructor may modify
the values of data members of the object (and even update other objects as well). We need to invoke a
constructor that will not modify any data items. That is, it should have a null body.
We first thought of generating for every class a constructor with a null body and a single parameter of type
_ode *. For example, for class employee, this empty constructor would be
employee::employee(_ode *) {}
This constructor would be invoked if we called operator new as illustrated below:
new ((_ode *) p) employee((_ode *) 0);
Unless otherwise specified, a constructor for a class D will invoke the argumentless constructor for each of
its base class sub-objects and for every data member of D that is a class object. However, the special
constructor must
a. invoke the special constructor for each base class sub-object and for every data member of D that is a
class object;
b. initialize all constant and reference members of D in its initializer list.
We abandoned the special constructor solution when we realized that we could not extend it to handle the
case when D had an array of class objects as a data member. In such a case, C++ requires use of the
argumentless constructor.
Next we came up with the idea of modifying each user specified constructor so that it would do nothing
(execute no statements) when it is called to initialize the hidden pointers. The value of an integer global
variable _fix_hidden
short _fix_hidden;
is used to determine whether or not the constructor was being invoked to fix hidden pointers.
__________________
2. If C has an overloaded C::operator new, then it is called, otherwise the global ::operator new is used.
3. The idea of using an overloaded operator new to invoke a constructor on an existing object was suggested in a different
context in [17].
4. C++ requires the first parameter of an overloaded definition of function operator new to be of type size_t and that new
return a value of type void *.
Assume that class D defines a constructor of the form
D::D(parameter-declarations_opt)
{
...
}
The subscript opt indicates an optional item.
This constructor is transformed as follows:
D::D(parameter-declarations_opt)
{
if (!_fix_hidden) {
...
}
}
This transformation has to be refined to ensure that any initializers present in a constructor definition do not
modify any data members. Initializers are given just before the constructor body:
D::D(parameter-declarations_opt) initializer-list
{
...
}
Initializers are used to initialize the base class components and the data members of the object. In some
cases, initializers are required. For example, if the base class component or a data member can only be
initialized by invoking a constructor with arguments, or if the data member is a constant or reference
member, then appropriate initializers must be specified.
Initializers that are constructor calls do not have to be modified, because the constructors will have been
modified to execute conditionally based on the value of the global variable _fix_hidden.
Other initializers, those that specify an initial value for a data member, are modified to change the value of
the data member only if the constructor is being called to initialize a newly created object. They have no
effect if the constructor is invoked to fix the hidden pointers for an object that has been read from disk. For
example, an initializer of the form
m(initial-value)
where m is a data member, is transformed to the initializer
m(_fix_hidden ? m : initial-value)
When _fix_hidden is one, the initializer effectively assigns the member to itself; thus such an initializer
does not change the value of the data member.

As an example, a constructor for class employee may be defined as follows:
employee::employee() : sal(30000)
{
strcpy(company, "None");
}

This constructor is transformed into
employee::employee() : sal(_fix_hidden ? sal : 30000)
{
if (!_fix_hidden) {
strcpy(company, "None");
}
}

This initialization of hidden pointers is encapsulated in a member function, reinit, that is generated for
each class. For example, here is the body of the reinit function for class D:⁵
extern short _fix_hidden;
// reinit is declared static inside class D; C++ does not allow the keyword to be repeated here
void D::reinit(void* p)
{
_fix_hidden = 1;
new ((_ode *)p) D;
_fix_hidden = 0;
}
Function reinit sets the global variable _fix_hidden to 1 before invoking the overloaded version of
the new operator (that does not allocate any storage). Any constructors that are invoked as a result will find
_fix_hidden to be one, and will not execute any user specified code in the constructor body. The effect
of this invocation is simply that the hidden pointers are assigned the right values. Function reinit sets
the global variable _fix_hidden to 0 before returning.
As an example, we give below the code generated by the O++ compiler for class employee:
class employee: virtual public person {
public:
char company[MAX];
int sal;
virtual void print();
static void reinit(void *);
};
extern short _fix_hidden;
void employee::reinit(void* p)
{
_fix_hidden = 1;
new ((_ode *)p) employee;
_fix_hidden = 0;
}
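The flag-guarded constructor can be exercised with a small test. This is a sketch under several assumptions: OdeTag and fix_hidden stand in for _ode and _fix_hidden, body_runs is a counter invented to observe whether user code ran, and the class deliberately has no data members, because standard C++ does not guarantee that re-running a constructor leaves previously stored member bytes untouched (the paper's scheme depends on the compiler doing so).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct OdeTag {};  // stand-in for the paper's _ode class
void* operator new(std::size_t, OdeTag* where) noexcept { return where; }
void operator delete(void*, OdeTag*) noexcept {}

static bool fix_hidden = false;  // stand-in for _fix_hidden
static int  body_runs  = 0;      // counts executions of user constructor code

struct Employee {
    Employee() {
        if (!fix_hidden) {  // transformed constructor body
            ++body_runs;    // user-written statements would go here
        }
    }
    virtual const char* kind() const { return "employee"; }

    // Re-run the constructor on existing storage with user code disabled.
    static void reinit(void* p) {
        fix_hidden = true;
        new (static_cast<OdeTag*>(p)) Employee;
        fix_hidden = false;
    }
};

void demo_reinit() {
    alignas(Employee) unsigned char buffer[sizeof(Employee)];
    int before = body_runs;

    Employee* e = new (reinterpret_cast<OdeTag*>(buffer)) Employee;  // normal construction
    assert(body_runs == before + 1);  // user code ran once

    Employee::reinit(buffer);         // "fix the hidden pointers"
    assert(body_runs == before + 1);  // user code was skipped
    assert(!fix_hidden);              // flag was reset
    assert(std::string(e->kind()) == "employee");  // virtual dispatch works
}
```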
5. reinit is declared a static member function since it is not invoked in association with a particular object.

4.2 Allowing Base Class Pointers to Point to Derived Class Persistent Objects
In C++, a pointer to an object of class B can point to an object of a class D that is derived from B.
Similarly, in O++ a pointer to a persistent object of class B can point to a persistent object of type D. When
such a pointer is dereferenced, the object manager brings the object into memory and fixes its hidden
pointers, as described above. It then returns a pointer to the D object in memory. To conform to C++
semantics, the memory pointer returned must be adjusted properly. In our example, the pointer should be
adjusted to point to the address of the B base class object within the D object.
This adjustment is performed as follows. The object manager consults the catalog entry for the object’s
type. In our example, this type is D. The entry contains a list of base classes for this class, and the correct
adjustment for each one. These values are filled in by the O++ compiler when it analyzes the definition of
this class. The code generated by the O++ compiler for the dereference informs the object manager of the
(declared) type of the persistent pointer being dereferenced, in our example B. The object manager thus
finds the required offset from a D object to its B sub-object, delta(D, B), and adjusts the returned pointer
accordingly.
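A catalog-driven adjustment of this kind might look as follows. The sketch is hypothetical: CatalogEntry, build_entry_for_D and adjust are invented names, the offsets are measured at run time from a probe object instead of being filled in by a compiler, and only non-virtual bases are used, since the offset of a virtual base can depend on the most-derived type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct A { int a = 1; };
struct B { int b = 2; };
struct D : A, B { int d = 3; };

// Hypothetical catalog entry for type D: base-class name -> delta(D, base) in bytes.
using CatalogEntry = std::map<std::string, std::ptrdiff_t>;

// Measure the offsets once from a probe object.
CatalogEntry build_entry_for_D() {
    D probe;
    const char* start = reinterpret_cast<const char*>(&probe);
    CatalogEntry entry;
    entry["A"] = reinterpret_cast<const char*>(static_cast<const A*>(&probe)) - start;
    entry["B"] = reinterpret_cast<const char*>(static_cast<const B*>(&probe)) - start;
    entry["D"] = 0;
    return entry;
}

// What the object manager would do: shift the object's start address by the
// offset recorded for the declared type of the pointer being dereferenced.
void* adjust(void* object_start, const CatalogEntry& entry, const std::string& declared_type) {
    return static_cast<char*>(object_start) + entry.at(declared_type);
}

void demo_adjust() {
    CatalogEntry entry = build_entry_for_D();
    D obj;
    B* expected_b = &obj;  // the compiler's own adjustment
    A* expected_a = &obj;
    assert(adjust(&obj, entry, "B") == static_cast<void*>(expected_b));
    assert(adjust(&obj, entry, "A") == static_cast<void*>(expected_a));
    assert(adjust(&obj, entry, "D") == static_cast<void*>(&obj));
}
```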
5. ALTERNATIVE TECHNIQUES
We have also considered two other alternative solutions to the problems addressed in the paper. We outline
these solutions and explain why we did not adopt them for the O++ compiler ofront.
The following definitions are exported by the object manager:
typedef unsigned int uint;
class OID {
...
};
5.1 Using Special Member Functions
For each class, the O++ compiler synthesizes two functions, readObj and writeObj, used to read and
write objects of that class. For example, the prototype of these functions for class employee is:
void employee::readObj(OID& oid, uint& doff);
void employee::writeObj(OID& oid, uint& doff);
Function readObj is used to read an object from disk. The object might be contained in a larger object.
The object id locates the containing object on disk, and doff indicates the offset of the object from the
beginning of the containing object. Similarly, function writeObj is used for writing a value to disk.
In addition, global functions ::readObj and ::writeObj are used to read and write values of built-in
types, such as integers or character strings:
void ::readObj(OID& oid, uint& doff, void *memp, uint cnt);
void ::writeObj(OID& oid, uint& doff, void *memp, uint cnt);
memp specifies the location in memory where the value is to be stored, and cnt specifies the size of the
value.
To read an object from disk the following steps are performed:
1. Allocate an object by calling operator new. The hidden pointers are thus set correctly.
2. Read the value of the object from disk by invoking the readObj function defined for that object’s
type. This function does not perform a byte copy of the data from disk. Instead, it reads each data
member of the object using the readObj function defined for the data member’s type (and the
global ::readObj function for simple types).
The object manager does not know about object types; it views objects conceptually as uninterpreted
bytes. Therefore, it cannot allocate an object by calling operator new directly. We allow the object
manager to invoke operator new indirectly by encapsulating object allocation in the member function
newObj which is generated by O++ and whose address is stored in the catalog. The address of newObj is
loaded for every class in an application program as part of the program initialization process. Given a
persistent object, the object manager can find the address of newObj for this object’s class by following
the pointer from the object to the catalog entry describing its type.
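The indirection through newObj can be sketched as follows. The catalog is modelled here as an in-memory map from a type name to a function address; that representation, and the names catalog, registerClasses and allocateByTypeName, are assumptions for illustration only, since the text does not show the real catalog layout.

```cpp
#include <cassert>   // for the assert-based usage shown below
#include <map>
#include <string>

// Hypothetical catalog entry: the address of a class's newObj function.
using NewObjFn = void* (*)();

std::map<std::string, NewObjFn>& catalog() {
    static std::map<std::string, NewObjFn> entries;
    return entries;
}

struct person {
    virtual ~person() = default;
    virtual const char* kind() const { return "person"; }
    static void* newObj() { return new person; }
};

struct employee : person {
    const char* kind() const override { return "employee"; }
    static void* newObj() { return new employee; }
};

// Done once, as part of program initialization.
void registerClasses() {
    catalog()["person"] = &person::newObj;
    catalog()["employee"] = &employee::newObj;
}

// The "object manager" side: it allocates an object through the catalog
// without knowing anything about the class itself.
void* allocateByTypeName(const std::string& typeName) {
    auto it = catalog().find(typeName);
    return it == catalog().end() ? nullptr : it->second();
}
```

After registerClasses(), the pointer returned by allocateByTypeName("employee") can be cast back to employee* by type-aware code; because the object was created by new, its virtual functions dispatch correctly.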
We illustrate this mechanism by showing the translated version of class employee, which includes the
generated functions readObj, writeObj, and newObj as its members:
class employee: virtual public person {
public:
char company[MAX];
int sal;
virtual void print();
void readObj(OID& oid, uint& doff);
void writeObj(OID& oid, uint& doff);
static void *newObj();
};
The bodies of the readObj, writeObj and newObj functions of class employee are as follows (see footnote 6):
...
void employee::readObj(OID& oid, uint& doff)
{
    // read base classes
    person::readObj(oid, doff);
    // read members (memp is a void *, so sal is passed by address)
    ::readObj(oid, doff, company, MAX * sizeof(char));
    ::readObj(oid, doff, &sal, sizeof(int));
}
void employee::writeObj(OID& oid, uint& doff)
{
    // write base classes
    person::writeObj(oid, doff);
    // write members
    ::writeObj(oid, doff, company, MAX * sizeof(char));
    ::writeObj(oid, doff, &sal, sizeof(int));
}
void *employee::newObj()
{
    // new invokes the constructor, which sets the hidden pointers
    return (void *)new employee;
}
__________________
6. It might appear that function newObj could be declared inline, but then its address could not be taken and stored in the catalog.
Similarly, it would not suffice to store the address of operator new in the catalog, since applying this operator indirectly
through a pointer does not result in the invocation of a constructor.

This solution assumes that employee has an argumentless constructor. C++ generates an argumentless
constructor automatically if no constructor has been specified for the class, but it does not generate one
if any constructor has been specified. ofront therefore generates an argumentless constructor for the class
if the user has explicitly specified one or more constructors but has not specified an argumentless
constructor.
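The constructor rule can be checked directly in standard C++. The three structs below are hypothetical; in particular, the body of the added constructor in WithAdded is an assumption, since the text does not show what ofront actually generates.

```cpp
#include <type_traits>

// No constructor declared: the compiler supplies an argumentless one.
struct NoCtor { int x; };

// A user-declared constructor suppresses the implicit argumentless one.
struct OnlyArgCtor {
    int x;
    OnlyArgCtor(int v) : x(v) {}
};

// Conceptually what ofront's fix amounts to: an argumentless constructor
// is added alongside the user's constructor (its body here is assumed).
struct WithAdded {
    int x;
    WithAdded(int v) : x(v) {}
    WithAdded() : x(0) {}
};

static_assert(std::is_default_constructible<NoCtor>::value, "implicit");
static_assert(!std::is_default_constructible<OnlyArgCtor>::value, "suppressed");
static_assert(std::is_default_constructible<WithAdded>::value, "added");
```

Only classes for which the last kind of check holds can be allocated by an expression such as new employee with no arguments.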
This solution has the following disadvantages:
1. Three functions must be synthesized for every class.
2. An object is read component-wise from the disk (and written in the same way). Each component
requires another function call.
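The component-wise scheme of this section can be sketched end to end with an in-memory stand-in for the disk. The OID layout (a byte vector), the value of MAX, the age member of person, and the helper roundTrip are assumptions introduced for this illustration; only the function names and signatures follow the text.

```cpp
#include <cassert>   // for the assert-based usage shown below
#include <cstring>
#include <vector>

typedef unsigned int uint;
const int MAX = 16;  // assumed value

// Hypothetical stand-in for an on-disk object: just a byte vector.
struct OID { std::vector<unsigned char> bytes; };

// Global functions for built-in types: copy cnt bytes, advance the offset.
void readObj(OID& oid, uint& doff, void* memp, uint cnt) {
    std::memcpy(memp, oid.bytes.data() + doff, cnt);
    doff += cnt;
}
void writeObj(OID& oid, uint& doff, void* memp, uint cnt) {
    if (oid.bytes.size() < doff + cnt) oid.bytes.resize(doff + cnt);
    std::memcpy(oid.bytes.data() + doff, memp, cnt);
    doff += cnt;
}

class person {
public:
    int age = 0;  // assumed member, for illustration
    virtual ~person() = default;
    virtual const char* kind() const { return "person"; }
    void readObj(OID& oid, uint& doff) { ::readObj(oid, doff, &age, sizeof(int)); }
    void writeObj(OID& oid, uint& doff) { ::writeObj(oid, doff, &age, sizeof(int)); }
};

class employee : virtual public person {
public:
    char company[MAX] = {};
    int sal = 0;
    const char* kind() const override { return "employee"; }
    void readObj(OID& oid, uint& doff) {
        person::readObj(oid, doff);                       // base classes
        ::readObj(oid, doff, company, MAX * sizeof(char));  // members
        ::readObj(oid, doff, &sal, sizeof(int));
    }
    void writeObj(OID& oid, uint& doff) {
        person::writeObj(oid, doff);
        ::writeObj(oid, doff, company, MAX * sizeof(char));
        ::writeObj(oid, doff, &sal, sizeof(int));
    }
    static void* newObj() { return new employee; }
};

// Write an employee, then read it back using the two steps from the text.
bool roundTrip() {
    employee out;
    out.age = 41;
    out.sal = 5000;
    std::strcpy(out.company, "Acme");
    OID oid;
    uint doff = 0;
    out.writeObj(oid, doff);

    // Only data members reach the "disk"; no hidden pointers are stored.
    bool sizeOk = oid.bytes.size() == 2 * sizeof(int) + MAX;

    employee* in = static_cast<employee*>(employee::newObj());  // step 1
    doff = 0;
    in->readObj(oid, doff);                                     // step 2
    bool ok = sizeOk && in->age == 41 && in->sal == 5000 &&
              std::strcmp(in->company, "Acme") == 0 &&
              std::strcmp(in->kind(), "employee") == 0;
    delete in;
    return ok;
}
```

A call such as assert(roundTrip()); exercises the sketch. It also makes the second disadvantage visible: every data member costs one function call on each read and each write.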
5.2 Using The Assignment Operator
The basic scheme is as follows:
1. Allocate space for the object and read the object from disk. The object contains bad hidden pointers.
2. Allocate another object. This object contains correct hidden pointers.
3. Assign the object read from disk to the new object.
The default assignment performs member-wise assignment of the components of the source object to the
destination object. In particular, the hidden pointers are not copied from the source object to the destination
object.
We do not discuss this solution in detail. We rejected this solution for the following reasons:
1. Storage has to be allocated twice for every object (optimization can be used to reduce this number).
2. An assignment is required to fix the hidden pointers.
3. This solution assumes that the assignment operator performs member-wise assignment. These are
the semantics of the default assignment operator generated by C++. However, users are allowed to
define their own version of the assignment operator. This may invalidate our solution, if the
explicitly defined assignment operator does not perform member-wise assignment, or has side-
effects.
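The third objection can be made concrete with a small sketch. The two structs are hypothetical, and the source objects here are ordinary objects standing in for ones read from disk; the sketch shows only how a user-defined assignment operator can break the assumption of member-wise copying.

```cpp
#include <cassert>   // for the assert-based usage shown below

// Default assignment: every member is copied.
struct Memberwise {
    int id;
    int balance;
};

// User-defined assignment that deliberately does not copy id.
struct Custom {
    int id;
    int balance;
    Custom& operator=(const Custom& other) {
        balance = other.balance;  // id is left unchanged
        return *this;
    }
};

bool defaultAssignmentCopiesEverything() {
    Memberwise fromDisk{7, 100};
    Memberwise fresh{0, 0};
    fresh = fromDisk;
    return fresh.id == 7 && fresh.balance == 100;
}

bool customAssignmentLosesState() {
    Custom fromDisk{7, 100};
    Custom fresh{0, 0};
    fresh = fromDisk;  // step 3 of the scheme
    return fresh.id == 0 && fresh.balance == 100;
}
```

Both functions return true: with Custom, the freshly allocated object ends up with only part of the stored state, which is the failure mode described in the third reason.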
6. RELATED WORK
The hidden pointers problem was also identified in Vbase [1] and E [16]. The approach taken in Vbase was
to make the vtbls persistent objects. The E compiler efront replaces the virtual base class pointer by an
offset, an implementation that will probably be used in future C++ compilers as well. It also generates a
unique type tag for every class that can have persistent instances (a “dbclass”) having virtual functions.
Every instance of such a class contains this tag. The E implementation performs run-time virtual function
dispatch by hashing on the type tag. In contrast, when an object is brought into memory O++ uses the
pointer to the type descriptor of the object to convert invalid C++ hidden pointers to valid ones.
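The general idea of dispatching on a type tag by hashing can be sketched as below. This is not the E implementation; the record layout, tag values and function names are all assumptions chosen for illustration.

```cpp
#include <cassert>   // for the assert-based usage shown below
#include <string>
#include <unordered_map>

// Hypothetical persistent record: it carries a type tag, which stays valid
// across program runs, instead of a memory pointer to a vtbl.
struct Record {
    unsigned tag;
    int value;
};

using PrintFn = std::string (*)(const Record&);

std::string printPerson(const Record& r) {
    return "person:" + std::to_string(r.value);
}
std::string printEmployee(const Record& r) {
    return "employee:" + std::to_string(r.value);
}

const unsigned PERSON_TAG = 1;    // assumed tag values
const unsigned EMPLOYEE_TAG = 2;

// Hash table from type tag to the function implementing print for that type.
const std::unordered_map<unsigned, PrintFn>& dispatchTable() {
    static const std::unordered_map<unsigned, PrintFn> table = {
        {PERSON_TAG, &printPerson},
        {EMPLOYEE_TAG, &printEmployee}};
    return table;
}

// "Virtual" call: one hash lookup per invocation, then an indirect call.
std::string print(const Record& r) {
    return dispatchTable().at(r.tag)(r);
}
```

For example, print(Record{EMPLOYEE_TAG, 7}) returns "employee:7". Each call pays for a hash lookup, whereas an ordinary C++ virtual call needs only a pointer dereference.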
The solutions presented in this paper have the advantage that they do not require modification of the C++
implementation. In addition, virtual function invocation is as fast as in C++ — it requires a single pointer
dereference. The Vbase scheme requires a persistent pointer dereference, while the E scheme involves a
hashing operation. The disk representation of O++ objects is the same as the memory layout of
corresponding C++ objects, except that at the beginning of each persistent O++ object there is a pointer to
the object’s type descriptor (another persistent object). The identical on-disk and in-memory object layout
allows for code compatibility of O++ and C++.
Systems that implement pointer swizzling [13, 15, 19] encounter problems similar to those that arise from
hidden pointers: different formats must be used to refer to objects in memory and on disk. To provide fast
access to the memory version of the object, the object id contained in a persistent object is replaced by the
memory pointer (the object id is “swizzled”). We must similarly replace the hidden pointers by valid
values. However, a key difference between pointer swizzling and fixing the hidden pointers is that the O++
compiler does not know the locations of the hidden pointers. It is the C++ compiler that generates the
hidden pointers, and their locations within an object depend solely upon the code generated by the C++
compiler.

In ObjectStore [13], the object manager knows the location of the hidden pointers in an object of a given
type (see footnote 7). In addition, the object manager has a table that maps type names into vtbl addresses; the table is
created during schema generation time. As a special case of the pointer swizzling mechanism, vtbl pointers
in persistent objects that are brought into memory are assigned the address of the vtbl of their respective
types, in a recursive fashion. vbase pointers are handled by the usual pointer swizzling mechanism, since a
vbase pointer contained in a persistent object points to another persistent object, the virtual base class sub-
object. In comparison, we note that the O++ solution does not require knowledge of the location of the
hidden pointer or the vtbls, and is thus more portable. In addition, the complexity of recursively fixing the
hidden pointers in base class and member sub-objects is simply transferred to the C++ compiler, by calling
a constructor.
The problem of adjusting a base class pointer to a derived class object also arose in the implementation of
collections in E [16]. Collections are implemented using EXODUS files. A collection of objects of
some type can contain derived type objects. Therefore, an EXODUS iterator over a collection must return
an adjusted pointer. This special case was handled by storing the appropriate offset in the object header.
7. CONCLUSION
Hidden pointers are memory pointers that are generated by C++ compilers in objects whose classes contain
virtual functions or virtual base classes. These pointer values are not valid across database applications
(transactions). Consequently, the hidden pointers in any object retrieved from the database must be
“fixed” before the object can be accessed. We have presented solutions to correctly reinitialize the hidden
pointers. Our solutions are elegant in that they do not require modifying the C++ compiler or the semantics
of C++.
Object-oriented database systems based on C++ are attracting attention because of the emergence of C++ as
the language of choice for software development. We hope that the details and techniques presented by
us will be useful to database researchers and implementors.