
Pedagogical Framework Documentation:
How to Document Object-Oriented Frameworks
Summary of an Empirical Study

Ian Chai
University of Illinois at Urbana-Champaign, Universiti Telekom (see footnote 1)

Submitted to World Engineering Congress '99

Abstract

Frameworks are a key to reuse in the Object-Oriented programming world, but they are hard to learn and document. Different people have proposed different ways to document frameworks, but it is unclear which ones actually are better. I did an empirical study to see which documentation philosophy is best for documentation for new users of a framework. I found that different documentation styles are better for different situations, depending on what your goals are, and discovered guidelines that can help someone who is trying to document a framework.

Keywords
Patterns, Minimalist, Documentation, Object-Oriented, Frameworks, Empirical

Introduction & Motivation

Although object-oriented programming is often used to increase code reuse, reusable object-oriented software is usually based on or is part of a framework. Frameworks are reusable designs expressed in code. This allows the reuse of both design and code. The right framework can decrease the time to develop a piece of software by an order of magnitude. However, frameworks have a number of problems. Both designing a framework and learning to use one are hard. The real benefits of a framework only come after you know how to use it.

This project is about how to make frameworks easier to learn. Even a simple framework is hard to understand if the documentation is bad. One reason documentation is often not good is that there are no well-established rules for documenting frameworks. Instead, there are various competing ideas about the best way to document frameworks. This project examines several of these ideas to determine which method of documenting a framework is better.

Many of the ideas about how to document frameworks sound reasonable. It is hard to figure out which idea is best just by thinking about them. So, I ran an experiment to compare their relative advantages and disadvantages. The results showed that different forms of documentation are good for different purposes.

What exactly are frameworks? You can think of frameworks as reusable parts for a particular domain. For example, in the physical world, you can build houses from reusable parts much faster than if you built them from scratch. Similarly, you can build application programs from the objects in a framework. The parts embody decisions the designer made. For example, house kits put limitations on what kind of houses you can build. If the designer embodied a good design in the kit, you can rely on that design in your house. In the same way, if the designer embodied a good design in the framework, you can rely on that design in your program.

Just as in the physical world, having the pieces is not enough. A framework is a reusable design, not just a set of parts. Learning a framework means learning a set of parts, learning the kinds of things you might build with them, learning how they interact, and learning the properties that are most useful for them. If you are familiar with one framework, that does not mean it is trivial for you to learn how to use another.

Three Philosophies of Documentation

The part of the larger general problem of documenting frameworks I selected to tackle is new user documentation or tutorials. I think this is a very important part because once past the new user stage, one often has the familiarity to figure the details out.

I have selected three philosophies of documentation to study: traditional step-by-step instructions, minimalist documentation, and patterns-style documentation. They are directly competing philosophies in that some of their guidelines contradict each other.

Of other suggested documentation methods, CRC cards are not suitable for permanent documentation. Use Case Maps and UML are useful for documenting the structure of a framework, so these diagrams could be used together with the three styles I selected -- hence they form an independent axis from the three styles. Formal descriptions are more suitable for detail work by experts.

(Traditional) Step-by-Step Instructions

The philosophy behind Step-by-step Instructions is that if we give clear steps of what needs to be done, the student can follow them and accomplish the task. Interspersing explanations along the way will let the student learn things that are not obvious from the steps. The documentation should be in one clear, flowing narrative.

This is the traditional form that most documentation takes.

Minimalist Documentation

Minimalist documentation, as researched by Dr. John Carroll [Carroll 90], is based on two ideas: (1) people do not want information irrelevant to the task at hand, and (2) people are not good at following step-by-step instructions. It attempts to give the reader the minimal amount of information needed to get the task done, and arranges it in short pages or index cards so that users can read them in whatever order suits them.

As each minimalist page or card contains little information, pages and cards often refer to one another. Hence, they lend themselves well to hypertext presentations like the Web.

Carroll [Carroll 90] gives these guidelines for minimalist documentation:

One important thing that Carroll does not list in these guidelines is the overriding sense throughout his book that minimalist documentation is about giving the minimal amount of information necessary for the reader to be able to get the task done. Perhaps he does not list it because it is so implicit in everything minimalist documentation is about.

Patterns

Patterns are a documentation technique that has been used to document good object-oriented design practices (e.g. [GoF 95]), software packages and frameworks (e.g. [Johnson 92]) as well as how to do many other things. Some of the diverse things people have documented using patterns include designing web pages [Orenstein 95], how to teach in a classroom [Anthony 95], how to run an organization [Coplien 97], and how to write patterns themselves [Meszaros-Doble 96].

A definition commonly used at Pattern Languages of Programming (PLoP) conferences for patterns is:

A pattern is the solution to a problem in a context.

One of the objectives of a pattern is to get readers to understand some of the rationale for the solution, so that they can decide when to apply the pattern. In contrast, a traditional, step-by-step tutorial usually gives readers the steps to take without explaining why they should take them.

Patterns often refer to other patterns when a problem or its solution is too big to discuss in one sitting. Hence, they lend themselves well to hypertext presentations such as those found on the Web.

Gerard Meszaros and Jim Doble [Meszaros-Doble 96] say that patterns should have these elements:

For examples of patterns and minimalist documentation, see [PLoP 94], [Carroll 90], and [Chai 99].

Comparing the Three Philosophies

All three philosophies call for good writing style, such as that proposed by [Williams 90]. Each is compatible with the idea of testing your documentation on end-users and making changes based on their reactions. Each is open to the idea of mixing text, examples, and diagrams.

However, where step-by-step instructions recommend that all the instructions be given in clear chronological steps, minimalist documentation disagrees and says that the documentation should be random-access. It seems reasonable that if the steps are laid out clearly, one should be able to follow them to completion. But the minimalist philosophy disagrees, saying that people are bad at following steps and will tend to skip and miss things.

Step-by-step instructions also recommend that you put all the information readers need to understand in the documentation, perhaps adjacent to the corresponding steps. Minimalist documentation says that you need to leave out all materials not relevant to the task at hand -- that the user will figure out the background as they go along and get used to the system -- this is the idea behind "reasoning and improvising."

Patterns agrees with minimalist documentation that random access is good, and hence disagrees with step-by-step instruction. It is neutral towards most of the other recommendations of minimalist documentation, except that its emphasis on giving the context of a problem contradicts the minimalist idea of leaving out materials irrelevant to the task at hand.

In other words, minimalist documentation tries to avoid giving background information, on the theory that the user will figure out relevant background information themselves as they use the system, while patterns says that you need to give the context of the problem so that the reader can tell where this solution fits in.

Each of these philosophies sounds reasonable from its description, so it is not obvious which is best. Therefore:

We wanted to discover which documentation philosophy would work better than the others for documenting frameworks for new users. If we know the right way to structure the documentation, we could have more effective documentation.

I did some preliminary research which was too small for statistical significance, but which showed that the research has promise. For a full report of this research, see [Chai 97.1], [Chai 97.2].

The Habanero Project

Habanero is a project of the National Center for Supercomputing Applications (NCSA). It is a framework for collaborative computing written in Java. [NCSA 97] [CGJPS 98] NCSA is probably most famous for having developed the first graphical web browser, Mosaic. [NCSA 93]

The Habanero environment allows users to run hablets, which are collaborative programs. All users logged in to the same session can see what the other users are doing. The Habanero framework allows programmers to turn Java applets into hablets.

In order to convert applets into hablets, programmers need to include the Habanero code and change their code to call it. We assume the programmers have already learned how to write interactive applets or applications in Java.

A typical interactive Java applet has several aspects: the GUI, which is event-driven, and the data, which is passed from place to place and manipulated. So, in order to convert an applet into a hablet, the programmer needs to understand how the events work in that particular applet. Since in Habanero things that happen locally need to be reflected to other users, Habanero needs to capture the events and pass them along. If those events contain data from objects, those objects need to also be serialized and sent on to the other sites.
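The capture-serialize-replay idea described above can be sketched in plain Java. This is only an illustration under stated assumptions: `DrawEvent`, `Whiteboard`, and `EventRelaySketch` are hypothetical names invented for this example and are not part of the Habanero API, and an in-memory byte array stands in for the network between sites.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical event payload (not a Habanero class). It must be
// Serializable so that the data it carries can be sent to other sites.
final class DrawEvent implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    DrawEvent(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// Stands in for one participant's copy of the application state.
final class Whiteboard {
    final List<String> points = new ArrayList<>();

    void apply(DrawEvent e) {
        points.add(e.x + "," + e.y);
    }
}

final class EventRelaySketch {
    // Turn the event, including its data, into bytes for the wire.
    static byte[] serialize(DrawEvent e) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(e);
        } catch (IOException ex) {
            throw new IllegalStateException("could not serialize event", ex);
        }
        return buffer.toByteArray();
    }

    // Rebuild the event from the received bytes.
    static DrawEvent deserialize(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (DrawEvent) in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException("could not deserialize event", ex);
        }
    }

    // Apply the event locally, then replay a copy at every other site.
    static void share(DrawEvent e, Whiteboard local, List<Whiteboard> remotes) {
        local.apply(e);
        byte[] wire = serialize(e);
        for (Whiteboard remote : remotes) {
            remote.apply(deserialize(wire));
        }
    }

    public static void main(String[] args) {
        Whiteboard local = new Whiteboard();
        Whiteboard remote = new Whiteboard();
        share(new DrawEvent(3, 4), local, Collections.singletonList(remote));
        System.out.println("Sites agree: " + local.points.equals(remote.points));
    }
}
```

The point of the sketch is that every site ends up applying the same event, which is why any object an event carries has to be serializable as well.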

In a typical basic hablet, the application programmer needs to subclass 2 classes in addition to whatever classes the application itself needs for non-framework related tasks. By comparison, in the popular HotDraw framework, the typical application needs to subclass 2 classes. The VisualWorks GUI framework has a lot more classes because it is so mature, but even here, the typical application only needs to subclass one class unless it needs a special display that none of the existing widgets provides.

Habanero can be difficult for a new programmer to use because an application must do many things to be a hablet, and these may not be obvious. Habanero exhibits the same kind of up-and-down-the-hierarchy flow control typical of frameworks. Also typical of frameworks, hablet programmers need to know what classes to subclass and/or interfaces to implement, and what methods to write.
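A minimal sketch of this up-and-down flow of control, assuming a made-up framework base class: `SharedTool`, `EchoTool`, and `HookSketch` are hypothetical names chosen for illustration and do not come from Habanero. The framework's `run` method calls down into a hook the application programmer writes, and that hook calls back up into a service the framework provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical framework base class; the names are illustrative only.
abstract class SharedTool {
    private final List<String> log = new ArrayList<>();

    // The framework owns the control flow and calls DOWN into the subclass.
    final List<String> run(String input) {
        log.add("framework: start");
        handle(input);
        log.add("framework: done");
        return log;
    }

    // Hook that the application programmer must write.
    protected abstract void handle(String input);

    // Service that the subclass calls back UP into.
    protected final void broadcast(String message) {
        log.add("broadcast: " + message);
    }
}

// Application code: one subclass, one overridden method.
final class EchoTool extends SharedTool {
    @Override
    protected void handle(String input) {
        broadcast(input.toUpperCase(Locale.ROOT));
    }
}

final class HookSketch {
    public static void main(String[] args) {
        for (String line : new EchoTool().run("hello")) {
            System.out.println(line);
        }
    }
}
```

Nothing in `EchoTool` reveals when `handle` is called or what `broadcast` does with the message, which is the kind of knowledge a new user cannot easily get from the application code alone.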

All this cannot be easily surmised by looking at the source code. Thus documentation can help new programmers figure out what they need to do.

Developing the Materials

I learned Habanero by sitting down with a few of the framework developers one at a time and having them lead me through the process while making notes. They would also give me code they had written to work from.

One of the problems I ran into was that after I started working on the documentation, Java and Habanero both went through a major revision of how events were handled. This is an example of the difficulties one will encounter in real-life documentation writing. There were many other problems [Chai 99].

I developed three sets of documentation for the experiments. They all had the same purpose: to teach the reader how to convert a Java applet into a Habanero hablet. They all had the same audience: programmers who had done some Java programming but no Habanero programming. All three kinds of documentation shared the same English text as far as possible. This was to keep the quality of the writing from differing among the sets of documentation. The goal was that the only differences should be those that follow from the philosophy, and not the quality of the prose.

Before gathering the actual data, and following the findings from the preliminary experiment, I refined the documentation iteratively by trying it out on a dozen testers. One of my major findings is that you cannot write good documentation on the first try. The only way to find out if your documentation works well is to have people use it. These testers were drawn from the same pool of people as the experimental subjects -- students and staff who knew the Java programming language.

Iteration primarily improved the prose, and in most cases, those improvements applied equally to all three versions of the documentation. The result of this iteration was that each of the three sets of documentation became very effective -- only two out of the 45 subjects failed to complete the exercise.

As I did the iterative refinement by testing the documentation on preliminary subjects, I discovered a number of common mistakes people made. I put the solutions to these mistakes in a section of their own, which proved to be very useful.

Even after the 12 people that I used to refine the documents, I found things that could be improved. However, to prevent contaminating the experimental data, I did not change the documents after the experiment started.

The philosophies behind the three documentation styles were discussed earlier. Here is how they worked out in practice:

The full documentation consists of the Basic Topics and the Advanced Topics, and the experiment only used the Basic Topics because we had to limit the scope of the task in the experiment to something that the subjects could complete in one session. The Basic Topics alone was about half the size of the entire documentation, and described what most hablets would need.

The Experiments

For details of how the experiment was conducted, see [Chai 99]. We analysed the data to see if one of the sets of documentation let the subjects compile (SEMICOMPLETION) and finish the fastest (COMPLETION), as well as understand the most (COMPREHENSION). We also checked for the relevance of the programming experience, as well as the correlation to the self-reported scores of how well they thought the documentation taught them how to habanerize an applet (HABANERIZATION) and how well they thought the documentation taught them about the inner workings of Habanero (WORKINGS).

 

                          Trad.    Min.     Pat.
Semicompletion   Mean     36.27    28.47    34.93
                 StdDev   14.69    11.87    11.07
Completion       Mean     68.28    43.57    57.73
                 StdDev   16.93    13.84    16.93
Comprehension    Mean     3.20     2.27     4.20
                 StdDev   1.57     1.62     1.42
Habanerization   Mean     4.33     4.53     4.13
                 StdDev   0.617    0.516    0.834
Workings         Mean     3.80     2.67     3.60
                 StdDev   0.676    1.18     0.828

The means and standard deviations

 

              Semi     Compl    Compr    Hab      Work
Pat vs Min    0.506    0.236    0.005    0.369    0.019
Pat vs Trd    1.000    0.464    0.049    1.000    1.000
Min vs Trd    0.202    0.011    1.000    1.000    0.008

The statistically significant differences

Using the standard significance level of 0.05 (i.e. 95% confidence), we see that none of the treatments makes a significant difference to the semi-completion time. Therefore, we conclude that there is no significant difference among the three styles in how long it takes the subjects to do the first compile.

The subjects using the minimalist documentation finished faster than the ones using the other two documentation styles. At the standard significance level of 0.05, subjects in the minimalist group had a significantly lower completion time than those in the traditional group. In our sample, the minimalist group was also on average faster than the patterns group, and the patterns group was on average faster than the traditional group. However, these two differences are not statistically significant.

Subjects who used patterns exhibited significantly higher comprehension scores than those using minimalist or traditional documentation.

There was no significant difference between how well the subjects thought the three kinds of documentation taught them how to habanerize an applet. This might be because the students only used one kind of documentation and the documentation was reasonably effective. Perhaps if they had the opportunity to try out more than one of the documentation styles, they might have rated them differently.

Subjects who used Patterns or Traditional documentation rated it significantly higher than subjects who used Minimalist documentation rated theirs, in terms of how effectively it taught them how Habanero works internally. In this particular sample, the subjects who used Traditional documentation also on average gave higher ratings than those who used Patterns. However, that difference was not significant.
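The comparisons discussed above can be re-derived mechanically from the p-value table by applying the 0.05 threshold to each cell. The following is a small sketch of that check only; the class name `SignificanceSketch` is made up for this example, the numbers are copied from the table, and the code does not reproduce whatever statistical test produced them.

```java
import java.util.ArrayList;
import java.util.List;

final class SignificanceSketch {
    // Standard significance level used in the text.
    static final double ALPHA = 0.05;

    static final String[] MEASURES = {"Semi", "Compl", "Compr", "Hab", "Work"};
    static final String[] PAIRS = {"Pat vs Min", "Pat vs Trd", "Min vs Trd"};

    // p-values copied from the table; rows follow PAIRS, columns follow MEASURES.
    static final double[][] P = {
        {0.506, 0.236, 0.005, 0.369, 0.019},
        {1.000, 0.464, 0.049, 1.000, 1.000},
        {0.202, 0.011, 1.000, 1.000, 0.008},
    };

    // Collect every (pair, measure) cell whose p-value is below ALPHA.
    static List<String> significant() {
        List<String> hits = new ArrayList<>();
        for (int row = 0; row < PAIRS.length; row++) {
            for (int col = 0; col < MEASURES.length; col++) {
                if (P[row][col] < ALPHA) {
                    hits.add(PAIRS[row] + " / " + MEASURES[col]);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String hit : significant()) {
            System.out.println(hit);
        }
    }
}
```

Five cells fall below the threshold: comprehension for patterns against each of the other two styles, completion time for minimalist against traditional, and the "workings" rating for minimalist against each of the other two styles.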

Minimalist Advantage During Debugging

There was no significant difference in the semi-completion time between the three documentation styles. However, minimalist documentation was significantly faster than traditional step-by-step documentation in completion time. This suggests that the advantage that minimalist had over step-by-step documentation came into play mainly during the debugging phase, because semi-completion time is the time at which the subject did their first compile, before debugging.

Qualitative Results & Discussion

Just as you cannot write software and expect it to work without testing, you cannot write documentation and expect it to be effective without testing. The iterative refinement all three versions of the documentation went through meant that all three versions were highly effective. Early versions of the documentation had high failure rates, but in the final trials, only 2 out of the 45 subjects failed to complete the exercise.

The "common errors" page was very useful for people during the debugging stage -- if they found it. Some people could not find the section at first, and some assumed it would not help. Most people eventually found the section, though a few figured out how to solve their problems on their own.

Therefore, it may be useful to develop a "common errors" page during the iterative refinement stage -- note the errors as you do iterative refinement. If these errors can be addressed in the main documentation itself, it is better to address them near the place where readers make the error.

Everyone liked the running example in the cyan-coloured boxes. Sometimes, however, this would lead people to skip over the explanatory text and only read the examples.

People do not like to read a lot of material. This problem is most pronounced in the Traditional step-by-step documentation, where everything is in one flowing narrative. Readers tended to skip around and so sometimes missed important things. Some people even quit reading the Traditional version when they came to the first long explanatory section, thinking that they had finished the "work" section of the documentation.

Tell people things in the same order as they will type them in the resulting source code example, even if it is arbitrary. This avoids needlessly complicating matters.

Make it easy for people to find the "before" and "after" complete source code of the running example. Unfortunately, some people will ignore the documentation once they find the source code example and work directly from that, hence missing any important explanations there might be.

Tell people what they need to do in general, then show them the concrete example. Make sure that what they need to do is clearly visible in "generic" code as well as in the prose before the example. The one case where I failed to do this in all 3 versions of the documentation was where I told them to put a package declaration at the top of each source file. I showed the example, but nonetheless a majority of people failed to do this task.

There were two other factors that probably contributed to this common error. First, this was the first time they encountered the running example, and hence they did not yet realise what the cyan box was. Second, most novice Java programmers never use package declarations.

Interestingly, several subjects, when they went back to this part of the documentation to fix their error of not including a package declaration, wondered how they managed to miss it, as it was so obvious. Many of them assumed it was their own fault and carelessness, and not the fault of the documentation. This would have been a reasonable conclusion, if one did not know that a lot of other people had the same trouble.

Conclusions

We began this work with the idea of discovering which of three philosophies worked best for new user framework documentation.

Step-by-step documentation is the one traditionally used for software documentation. The philosophy behind it is that if you give people clear, sequential steps, they can follow them and learn how to do the task. Any extra information can be interspersed among the steps at relevant places.

Minimalist documentation says that people are bad at following step-by-step instructions, and instead learn by doing in a self-directed manner. Hence documentation should be minimal. Also, it should be random-access so that the reader can proceed in a self-directed manner, so minimalist documentation promotes using small cards or pages that can be read in the order the reader desires.

Patterns centres upon the idea of giving the solution to a problem in its context. It also has the random-access idea of minimalist documentation, but disagrees with it in that contextual information is not minimal -- it usually contains information other than what needs to be done.

At the end of the experiments, we discovered that minimalist documentation helps people get the task done faster, while patterns help people to better understand the internal workings.

Hence if you have a situation where speed is more important than understanding of the internal structure, minimalist documentation would be better. Components may be such an application. If on the other hand understanding how the framework works is more important, then patterns would be better. Most frameworks are not completely black-box, and tend to have areas where some understanding is needed, so this would appear to be the most widely-applicable solution in the world of frameworks.

As for step-by-step documentation, we found that people do not like to follow step-by-step instructions, nor to be forced to read a lot of information. Even though the step-by-step documentation contained as much explanation as the patterns documentation, the subjects using it scored lower in the comprehension test. This suggests that they were skimming too much in their attempt to avoid following and reading everything.

Skippable sections are helpful -- users can read them if they are interested, but can safely skip them and still get the job done if they are not interested in deeper understanding. Patterns is organised that way. Try not to surprise people. Surprise is good for a novel, but can mislead people in documentation.

Aside from the documentation's organisation, a major part of producing good documentation is to have good text. Just as good software requires testing, we found that good documentation requires testing. A good piece of documentation goes through many iterations of testing and revision.

These experiments answered some questions, but also raised many more. Hence there is a plethora of additional work that can be done in this area. This work is a first investigation into using empirical techniques to discover ways to improve framework documentation. This is a rich field for future research.

For a fuller discussion of these findings, please see [Chai 99].

References

[Anthony 95] Dana Anthony, Patterns for Classroom Education, [PLoP 95] http://st-www.cs.uiuc.edu/~chai/writing/classroom-ed.html

[Carroll 90] John M. Carroll, The Nurnberg Funnel: Designing Minimalist Instruction for Practical Computer Skill, MIT Press, 1990

[CGJPS 98] Anne Chabert, Ed Grossman, Larry Jackson, Stephen Pietrowicz, & Chris Seguin, Java Object Sharing in Habanero, Communications of the ACM, June 1998/Vol. 41, No. 6

[Chai 97.1] Ian Chai, Visualessons: A comparison between traditional paper manual tutorials and web-based minimalist documentation written in pattern style http://st-www.cs.uiuc.edu/~chai/research/esp/

[Chai 97.2] Pedagogical Framework Documentation, Empirical Studies of Student Workshop, Alexandria, Virginia, 24-26 October, 1997. http://st-www.cs.uiuc.edu/users/chai/writing/esp-student.html

[Chai 99] Ian Chai, Pedagogical Framework Documentation: How to document Object-Oriented Frameworks: an Empirical Study, University of Illinois at Urbana-Champaign doctoral dissertation. http://st-www.cs.uiuc.edu/~chai/research/dissertation

[Coplien 97] Jim Coplien, Organizational Patterns http://www.bell-labs.com/cgi-user/OrgPatterns/OrgPatterns?OrganizationalPatterns

[GoF 95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1994. (Commonly called the "Gang of Four" or "GoF" book.)

[Johnson 92] Ralph Johnson, Documenting Frameworks Using Patterns, Object-Oriented Programming, Systems, Languages and Applications (OOPSLA) '92 Proceedings, ACM Press, 1992. ftp://st.cs.uiuc.edu/pub/papers/HotDraw/documenting-frameworks.ps

[Meszaros-Doble 96] Gerard Meszaros and Jim Doble, MetaPatterns: A Pattern Language for Pattern Writing, [PLoP 96] http://www.cs.wustl.edu/~schmidt/PLoP-96/meszaros.ps.gz

[NCSA 93] National Center for Supercomputing Applications, Mosaic Internet information browser and World Wide Web client, http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/

[NCSA 97] National Center for Supercomputing Applications, Habanero collaborative Framework, http://www.ncsa.uiuc.edu/SDG/Software/Habanero/

[Orenstein 95] Robert Orenstein, A HTML 2.0 Pattern Language, [PLoP 95] http://www.anamorph.com/docs/patterns/default.html

[PLoP 94] Jim Coplien & Doug Schmidt, editors, Pattern Languages of Program Design, (Patterns from Pattern Languages of Programming '94) Addison-Wesley Press, 1995.

[PLoP 95] John Vlissides, Jim Coplien, Norm Kerth, editors, Pattern Languages of Program Design 2, (Patterns from Pattern Languages of Programming '95) Addison-Wesley Press, 1995.

[PLoP 96] The 3rd Pattern Languages of Programming Conference, Washington University Technical Report #wucs-97-07, http://www.cs.wustl.edu/~schmidt/PLoP-96

[Williams 90] Joseph M. Williams, Style: towards clarity and grace, University of Chicago Press, 1990


Footnotes

1. This research was done at the University of Illinois at Urbana-Champaign under the direction of Dr. Ralph Johnson of the Department of Computer Science. I am now a lecturer at the Universiti Multimedia Telekom.
