Hodson, R. (1999). Analyzing documentary accounts. Sage Publications.

Selecting and Coding Documents

This section describes strategies for compiling a comprehensive list of documentary accounts on a topic. Computer-aided searches are a central part of this strategy. We also discuss sampling and the conditions under which sampling is appropriate. The central focus of the chapter is on the development of coding schemes and protocols for coding the data. We describe procedures for field-testing coding instruments and for training coders and supervisors. Inference and missing data are important concerns in the coding of documentary data. Strategies for avoiding or minimizing these problems are outlined. The use of inference in assigning codes needs to be minimized. Missing data need to be coded as such, thus retaining maximum flexibility for the treatment of missing data at the analysis stage.

Selecting Cases

Having identified a research area in which there are available documentary accounts to analyze, how is one to identify and select the specific cases to analyze? The answer to this question is crucial to the project. The researcher will spend a lot of time with the accounts selected. It is important to make certain that their analysis will meet the research goals. The theoretical goals of the study play a leading role in determining the criteria for selecting cases. The substantive knowledge of the researcher contributes to case selection through an awareness of categories of documents that do and do not fit the criteria for selection (Stryker, 1996). The researcher must develop explicit selection rules to distinguish between cases that are within the population of interest and those that are not.

An important early choice is whether to include books, articles, or both. Book-length documentary accounts contain a wealth of information. These accounts easily allow the coding of up to a hundred or more variables, thus allowing a wide range of subsequent analyses. Coding a reasonably large set of book-length accounts, however, is no small project.
Each can take 40 or more hours of work. An initial estimate of the number of books will thus provide an estimate of the number of weeks of full-time effort required to code the data. Additional time is required for developing the instrument, keypunching the data, analysis, and so on. Articles take less time to code but provide more limited information. If the researcher's interests in the topic are highly focused, articles may be a good choice, but the resulting data set will have a narrower range of utility. Combining books and articles is also possible. Initially, it is probably better to code one or the other and then consider whether to extend the project to include both.

Explicit criteria should be developed for accepting or rejecting a document. For instance, length of time spent in field observation is an important criterion. Professional ethnographers usually consider 6 months to be the minimum time needed to get sufficiently “behind the scenes” to record the true underlying nature of a setting. It is also important to clearly specify the substantive domain of the study. If a researcher is studying deviance, are both adult and adolescent deviance of interest? Should studies of courts and detention facilities be included as well as studies of primary deviance and deviant careers? It is important early in the process to examine a sufficient number of cases to make informed decisions about selection criteria. Make these decisions explicit and stick with them. These decisions are crucial to defining the nature, purpose, and outcome of your study.

Population Size and Sampling

How many documentary accounts are necessary to allow quantitative analysis of the resulting data? No precise answer to this question is possible. Several guidelines are possible, however. For multivariate analysis in the social sciences, at least 100 cases are generally required. Another rule of thumb is 15 cases per explanatory variable.
Thus, if the intended analysis specifies 6 explanatory variables (6 × 15 = 90), a sample size of 100 should be sufficient. Additional controls require additional cases. Univariate and bivariate analyses can be done with as few as 40 to 50 cases. Generating the required number of cases may require that the population definition be expanded to include unpublished articles or dissertations or that the time frame covered be extended backward.

If the researcher is in the fortunate position of having too many cases to code given available resources, sampling techniques are appropriate. It is essential that the selection be done according to random sampling principles so that the resulting sample is statistically representative of the larger population (Kalton, 1983). Random sampling is essential so that the results from the sample can be generalized to the larger population of documents. For a detailed example of the application of sampling procedures to the selection of documentary accounts, see Gamson (1975).

Search Strategies

Major sources of information useful for locating documentary accounts include the following:

Computerized library and journal archives
Bound journal volumes
Library shelves in the area of volumes already located
Bibliographies of the accounts already located
Compilations and archives that have already been developed on the topic of interest

Thorough knowledge of a substantive area will generally include awareness of existing sources and archives. If you do not feel deeply knowledgeable in an area, you should consult with senior scholars and library archivists in the area. This process may uncover sources that you would have never considered. Even well-informed researchers should consider checking with other knowledgeable sources as a first step in the research process.

Electronic databases of books and articles have become widely available in recent years. They also have improved in quality, comprehensiveness, and accessibility.
In addition, they are rapidly being extended backwards in time to include sources from earlier decades. These databases include the Educational Resources Information Center (ERIC), Dissertation Abstracts International, and the Social Science Citation Index (SSCI) as well as online files of books and documents held by libraries. The SSCI is one of the best, perhaps because it is one of the most recent to become available online and therefore uses the most up-to-date software and data archives. Reference librarians often have valuable inside knowledge about additional specialized citation source files.

You should cross-check the key citation sources used to make sure that they are not leaving out important citations. It is essential to use multiple search strategies to ensure full coverage. A partial list of documentary accounts based on an incomplete electronic database that leaves out many accounts is not a good starting point for a lengthy research project. The target is to generate a complete list of all the accounts that meet the stated criteria. Cross-checking multiple sources is an important safeguard toward achieving this goal.

Another useful strategy for identifying documentary accounts that are published as journal articles may seem somewhat dated, but it is still highly effective. Go to the main journals that publish the accounts and look through the bound volumes. This will uncover a wealth of accounts that might otherwise be invisible because of vague titles or simply because they are outside the population included in an online archive. Social science journals that routinely publish documentary accounts include the Journal of Contemporary Ethnography, Comparative Studies in Society and History, Social Science History, Past and Present, Human Organization, Journal of Management Studies, Social Problems, Studies in Symbolic Interaction, and Qualitative Sociology.
In addition, each topic area also may have one or more journals that regularly publish documentary accounts. Academic journals are increasingly becoming available online. Electronic searches of the main journals producing documentary accounts are thus increasingly possible. Edited books on special topics are another rich source of published accounts.

Searching the library shelves in the immediate vicinity of volumes already located is also an effective strategy. Many accounts can be uncovered in this way that you might otherwise never find. The reason that this strategy is an important supplement to electronic searches is that key word searching is a very inexact process. Searching the shelves is also an absolute necessity if the target population includes books that are published prior to the most recent decades. (Electronic databases typically include only the most recent decades.) Even the process of searching the nearby library shelves, however, can be electronically assisted. A first pass at this project generally can be done at the computer: most library search routines have an option for viewing the titles of adjacent volumes on the shelf through scrolling backward and forward. At some point in the process, however, there is no substitute for hands-on visual scanning of a shelf of books or an edited volume of articles.

As the search proceeds, additional tools become available. The bibliographies of the works already identified provide an important source of additional citations to examine. Examining bibliographies helps the search reach backward in time. To search for current releases, publishers' lists can be an important tool. The publication of studies on particular topics may be a special focus for a publisher. Thus, it is often possible to identify the main publishing house (or houses) for accounts on the topic of interest.
Examining publishers' lists of current and new releases can uncover accounts that are too new to be electronically listed or that are not yet released.

The process of developing the initial list should thus be an iterative one based on using several different methods. As the list develops, the researcher engages in an initial selection process based on the fit of the document, book, or article to the selection criteria. Accounts that pass this initial selection stage are then available for a more thorough screening. It is important to separate this final screening from the initial selection process (Gamson, 1975). Having a final screening as a separate stage allows the researcher to apply the selection criteria in a more consistent and rigorous fashion. This final screening is important for producing a list of cases that fits the population of interest as closely as possible. Application of the final selection criteria in a happenstance, changing, or variable fashion during the search and identification stage can result in the inclusion of cases with a poor fit or in the exclusion of cases with a good fit.

It also may be worthwhile to include unpublished accounts. Unpublished dissertations provide a large body of unanalyzed book-length accounts. Dissertation Abstracts International provides a relatively complete listing of dissertations. Conference papers provide a large body of unanalyzed article-length accounts. These unpublished papers can be accessed by looking through the proceedings of the annual meetings of the relevant professional associations for recent years. In addition, each topic area may have a special repository of published and unpublished holdings. For instance, the Harvard University Business School library has a large repository of documentary accounts of business organizations (Harvard University, 1998).
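
The cross-checking of search sources described in this section can be illustrated with a short sketch. It is only an illustration under stated assumptions: the source names, the placeholder titles, and the helper functions `normalize`, `merge_candidates`, and `missed_by` are all hypothetical, and matching titles by lowercased text is a simplification of what a real bibliographic comparison would need.

```python
from collections import defaultdict


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so the same title matches across sources."""
    return " ".join(title.lower().split())


def merge_candidates(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized title to the set of search sources that listed it."""
    found_in: dict[str, set[str]] = defaultdict(set)
    for source_name, titles in sources.items():
        for title in titles:
            found_in[normalize(title)].add(source_name)
    return dict(found_in)


def missed_by(found_in: dict[str, set[str]], source_name: str) -> list[str]:
    """Titles in the merged list that the given source did not turn up."""
    return sorted(t for t, srcs in found_in.items() if source_name not in srcs)


# Hypothetical search results; the titles are placeholders, not real accounts.
sources = {
    "electronic_database": ["Example Title A", "Example  Title B"],
    "bibliographies": ["example title a", "Example Title C"],
    "shelf_search": ["Example Title B", "Example Title D"],
}

found_in = merge_candidates(sources)
print(len(found_in), "distinct candidate titles")
print("Missed by the electronic database:", missed_by(found_in, "electronic_database"))
```

A source that misses many titles found by the others is a signal that relying on it alone would leave the candidate list incomplete.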
An Example

Strategies for identifying and selecting appropriate documentary accounts can be illustrated by considering the search procedures employed by Hodson, Welsh, Rieble, Jamison, and Creighton (1993) for generating a pool of workplace ethnographies. Only book-length ethnographies were considered because the topics of interest included worker citizenship and resistance, management behavior, and organizational characteristics, topics not consistently covered in depth by shorter article-length accounts of workplaces.

Many thousands of case studies were examined in a two-phase procedure to locate appropriate book-length ethnographies. First, likely titles were generated by computer-assisted searches of archives, by perusal of the bibliographies of ethnographies already located, and by a search of the library shelves in the immediate area of previously identified ethnographies. We screened titles using online computer archives, book reviews, or direct examination of the books selected from the shelves. Repeated application of these procedures constitutes what we believe was an exhaustive search: eventually our pursuit of new leads produced only titles already considered. We excluded cases that used primarily archival or survey data for their analysis rather than ethnographic observation. This selection process yielded a pool of 365 books as potential candidates for inclusion.

During the second phase of selection, we examined each book directly. The criteria for inclusion were (a) the book had to be based on direct ethnographic methods of observation over a period of at least 6 months, (b) the observations had to be in a single organization, and (c) the book had to focus on at least one clearly identified group of workers (an assembly line, a typing pool, a task group, or some other identifiable work group).
The requirements of an ethnographic method and a focus on a specific work group were necessary to limit the pool to cases with the depth of observation needed to reliably ascertain the various facets of workplace relations of interest. The focus on a single organization was necessary to produce measures of the organizational characteristics that we hypothesized to be key determinants of workplace relations.

Of the 365 books, 86 were retained as appropriate for analysis and 279 were rejected. Of those rejected, more than 200 were excluded because they reported on an occupation as a whole rather than on a particular group of workers in a specific organization. These studies generally failed to provide reliable measures of either work group relations or their organizational correlates. About 25 books were excluded because they studied industries rather than specific organizations. These studies also generally lacked good firsthand information on worker relations. Fifteen books met the three criteria for inclusion but were either so short or so loosely written that accurate or complete information could not be ascertained. Thirteen books were excluded because they focused primarily on a specific job redesign program. Again, these books did not provide adequate information to code many of the variables in which we were interested. Eleven books were community studies, often of a factory town. These studies were typically based on observing and interviewing people and families outside of work and not inside the workplace. As a result, these books generally failed to provide adequate organizational or labor process information. Eight books were excluded because they focused on a particular strike or collective action and included little material on the nature of work or the labor process. Six books were excluded because they concerned plant closings and the resulting stresses and dislocations.
These books also provided little material on the nature of work or on workplace relations prior to the shutdown. Seven books were rejected because they were company histories or executive biographies and contained little information on the actual work taking place in the organization.

In all cases, we examined each book carefully to see if it met the three criteria for inclusion. In some cases, a book was relatively weak on one criterion, but the depth of its material in other areas allowed its inclusion. Thus, we sometimes included a book with a fairly broad occupational focus if it had excellent ethnographic material on the organization and the labor process in several occupations. For example, a book might contain information about both assembly workers and machinists. When coding material from such a book, we determined which occupational category was the major focus and coded only material about that occupation.

In some books, the data allowed the coding of two cases. For example, we coded two cases from a book reporting on a cocktail lounge, one for waitresses and one for managers (see Spradley & Mann, 1975). Gouldner's (1964) Patterns of Industrial Bureaucracy also generated two cases, one for underground miners and one for workers in the gypsum board factory. We coded multiple cases from 10 books. We included books based on observations of several organizations if descriptions of the labor process were particularly strong. Such books were included only if the organizations were similar and were discussed in detail. We coded organizational characteristics for these cases from a composite.

Application of the above criteria generated 108 cases from the 86 published ethnographies. We believe these cases constitute the population of book-length English-language ethnographies that provide relatively complete information on a single workplace and on an identifiable work group within that workplace.
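
The second-phase screening in this example applies three explicit criteria to every candidate book. A minimal sketch of such a screening step follows. The `Candidate` record, its field names, and the sample books are hypothetical and are not drawn from the original project; the sketch also applies the criteria strictly, whereas the project described above sometimes retained a book that was weak on one criterion when its other material was strong.

```python
from dataclasses import dataclass

MIN_MONTHS_IN_FIELD = 6  # criterion (a): at least 6 months of direct observation


@dataclass
class Candidate:
    """One candidate book, described by hypothetical screening fields."""
    title: str
    months_in_field: float
    single_organization: bool
    identified_work_group: bool


def failed_criteria(book: Candidate) -> list[str]:
    """Return the reasons a book fails the inclusion criteria (empty if it passes)."""
    reasons = []
    if book.months_in_field < MIN_MONTHS_IN_FIELD:
        reasons.append("fewer than 6 months of observation")
    if not book.single_organization:
        reasons.append("not a single organization")
    if not book.identified_work_group:
        reasons.append("no clearly identified work group")
    return reasons


def screen(pool: list[Candidate]) -> tuple[list[Candidate], dict[str, list[str]]]:
    """Split the pool into retained books and rejected titles with their reasons."""
    retained: list[Candidate] = []
    rejected: dict[str, list[str]] = {}
    for book in pool:
        reasons = failed_criteria(book)
        if reasons:
            rejected[book.title] = reasons
        else:
            retained.append(book)
    return retained, rejected


# Placeholder pool; these are invented records, not books from the study.
pool = [
    Candidate("Placeholder ethnography 1", 12, True, True),
    Candidate("Placeholder occupational study", 18, False, False),
    Candidate("Placeholder short visit", 2, True, True),
]
retained, rejected = screen(pool)
print([book.title for book in retained])
print(rejected)
```

Recording the reason for every rejection, rather than only the decision, makes it possible to report a breakdown of exclusions like the one given in this example.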

Coding the Data

Collecting original data is generally a highly rewarding experience for researchers. It is also a lot of work, so it is important that this work be done right. The researcher may be analyzing the data and writing papers from it for years to come. Decisions made at the point of data collection have lasting consequences. A systematic approach to developing a data collection instrument is the key to success.

The Coding Instrument

The first step in preparing a good coding instrument is to survey thoroughly the existing literature on the topic being studied. List the major concepts appearing in that literature. These are the concepts that you need to include in the data collection instrument. If a concept is complex and has several facets, make sure that you include all facets. In addition, the accounts may allow the coding of topics and issues that are not routinely analyzed in the published research record. A good familiarity with the published literature may suggest some of these areas. For instance, do the accounts allow the coding of more in-depth information on the behavioral settings than is typically available in survey-based investigations?

If book-length accounts are being coded, the coding instrument can be quite long, perhaps containing as many as 100 to 200 variables. In surveys, the need to maintain respondent cooperation limits the effective length of the survey instrument. Coding documentary accounts avoids this limiting factor. The limiting factor for documentary accounts is the ability of coders to keep in mind the various variables they need to code as they read the account. If the coding instrument is too lengthy, coders may not recognize a passage as containing the answer to a question on the instrument because they have forgotten it. Periodic review of the instrument at the end of each chapter will help limit this problem. Documentary accounts are rich sources of data.
The best advice is to generate a relatively complete instrument and then develop procedures to ensure that it is filled out as completely as possible. It is also important for the coder to record the page numbers on which the information leading to each code is to be found. The coding instrument for the project using workplace ethnographies described in this monograph is provided in the appendix as an example (see also Hodson, 1996).

After the initial instrument is developed, it must be field-tested on a set of accounts. This process should not be skipped over or abbreviated. It is important to test and refine the questions and the answer options in an interactive process with the data to be coded. This process may entail the reading of a half dozen or more complete books or even more articles. During this time, the researcher also will be making decisions about which questions to code as open-ended responses and which to code as fixed-option responses. Open-ended responses will entail additional work at the data analysis stage but may be important for preserving complete information where a given question has a wide range of answers (see Tilly, 1981, pp. 76–79).

The code sheet contains the questions and the answer options. Additional decision rules also may be developed about how to interpret certain types of passages. These rules are important for establishing consistent criteria for coding the accounts. These rules need to be put in writing and compiled into a supplemental coding protocol that is reviewed regularly. Coding protocols should be developed as fully as possible before the start of the major data collection stage. Later revisions are also typically necessary, especially in the early stages of the coding process. These protocols will be essential during the analysis stage as aids in remembering how specific variables were coded.
Detailed protocols are invaluable if the data set is shared with others who were not involved in the initial data collection process.

Methodological checks are a final category of variable to include in the instrument. These checks include information both about the documentary accounts and about the coding process. Information about the documentary accounts is easy to record and can and should be recorded at the data collection stage. Methodological variables about ethnographic accounts might include the following:

How long the ethnographer was in the field
Page length of the ethnography
Year of the observations
Training of the ethnographer
Observational role taken by the ethnographer
Types of informants used
Theoretical orientation of the ethnographer
The ethnographer's organizing question

These questions will potentially be very useful in later analysis for establishing the presence or absence of bias in the accounts. Analysis of these methodological features can provide valuable insights about the accounts as a data source. Such variables also can be invaluable for studies of the social construction of the information embodied in the accounts. It is thus crucial to include the appropriate methodological variables in the coding instrument.

Information about the coding process also should be recorded. What are the demographic characteristics of the coders, such as age, sex, and ethnicity? At what stage or date in the
coding process was the account coded? The analysis of methods checks on the documentary accounts and on the coding process will be the focus in Chapter 5, which is on reliability and validity.

Avoiding Inference

The consistent coding of a phenomenon by different coders is essential for reliability. Coding documentary accounts requires intellectual engagement with the written text, and the coding of many variables requires some interpretation of the text (Stryker, 1996). If substantial inference is required, however, different coders may arrive at different codings depending on their possibly divergent assumptions. Excessive inference erodes reliability.

The main way to limit the need for inference is to develop and code concepts that are not highly abstract. For instance, Tilly (1981, pp. 74–75) contrasts the reliability of two questions, one abstract and the other more concrete. The concrete question is the occupation of the social group engaged in a strike. Answers to this question were coded with high reliability (94% consistency between pairs of coders). The other question concerned the degree of coordination of the strike as a whole. This question requires considerable judgment and interpretation. What exactly constitutes coordination? How much preplanning does the concept of coordination imply? For a variable meant to distinguish between coordinated and spontaneous disturbances, Tilly found only 22 agreements out of 37 possible pairs of independent codings (59% consistency). Tilly notes that the underlying concept of coordination requires the synthesis of diverse bits of information and is quite abstract, making it difficult to code reliably.

The researcher may be very interested in a somewhat abstract concept, such as the level of coordination of strike activities. However, in the coding phase, such concepts need to be disaggregated into simpler components that can be reliably coded.
In the current example, these components may include such items as the presence of leadership, evidence of prior meetings to establish a plan of action, use of social control within the group, and so on. During the analysis stage, these components can be reassembled to measure the concept of coordination in its various facets. In this way, required inferences can be made explicit and are under the control of the researcher.

Missing data are a particular challenge when coding documentary accounts. Some accounts may not address issues that are included in the coding instrument. Missing data will be especially problematic when coding articles, which are more limited in scope than books. Again, inference should be avoided. If a phenomenon is not discussed, it should not be coded as absent. For example, the absence of a discussion of accidents and injuries at a workplace should not be taken as evidence of a safe workplace. The ethnographer may simply have
considered this a peripheral topic and not discussed it. Accidents and injuries should thus be coded as missing in this case. If, in the analysis stage, researchers want to infer the absence of the phenomenon, they can do so. Such inference might be based on the values of other variables that were coded and that provide some basis for the inference. By avoiding inference at the data collection stage, researchers do not lose the ability to infer based on reasonable assumptions. Rather, they delay the process and make the standards of inference consistent and explicit. Where there is substantial missing data on an important variable, especially if it is the dependent variable, the only appropriate solution may be to analyze only those cases with data present.

Coder Training

Coder training is an important part of any research project. Few faculty researchers will have the time to code all their own data. Grant support should be sought if at all possible. Even small amounts of internal support can greatly facilitate a research project. In the absence of support, sometimes a graduate seminar can be arranged so that part of the course work is to code one or more books. If the primary researcher is a graduate student, hiring additional coders may not be within the project budget. Even in this case, however, it is important to have at least some cases duplicate coded by a student colleague to check reliability. Such help might be arranged as part of an exchange of labor among peers in a graduate cohort. Using multiple coders not only lessens the work of the principal investigator but also allows important reliability and bias checks on the data-coding process.

Coding documentary accounts makes high demands on coders' comprehension.
“Text coding beyond simple words requires text understanding; text understanding involves complex linguistic operations affected by the reader's knowledge, both general and specific, and by the reader's capacity to memorize and recall information” (Franzosi, 1990, p. 451). These demands are increased by the lengthy coding instruments used for documentary accounts, especially for book-length accounts. The good news, of course, is that longer accounts allow the coding of more information than shorter accounts. Shorter accounts are easier to code but may lead to more missing data. They may also encourage a greater use of inference in coding variables because of the more limited information available (Weber, 1990, p. 40). To ensure data quality, it is important to select, train, and supervise coders carefully. For the workplace ethnography project by Hodson et al. (1993), the book-length ethnographies were read and coded by a team of four researchers (the principal investigator, a project director, and
two senior graduate students) and by eight members of a graduate research practicum. All coders were trained on a common ethnography and met twice weekly as a group to discuss problems and questions. Coders recorded up to three page numbers identifying the passages used for coding each variable. We instructed coders to look for behavioral indicators or specific descriptions for each variable coded and not to rely on the ethnographers' summary statements or evaluations (Weber, 1990).

The coders were allowed to select which books they wanted to read and code to optimize their motivation for reading the books carefully. Greater interest may increase the care with which coders read the accounts and may result in better-quality data being recorded (Franzosi, 1990, p. 453). As a result of the decision to allow coders to select books from the pool based on their own preferences, however, any “coder effects” in the data are open to multiple interpretations. Coder effects could result either from coder bias or from the unique characteristics of the subsample of books that each coder selected.

We used missing value codes where there was insufficient information to code a variable, and no attempt was made to use proxy indicators at the data collection stage. We also avoided making any assumption that the absence of discussion about a given aspect of work constituted evidence of the absence of the phenomenon itself. If information on a certain feature of the workplace was not provided, the corresponding variable was simply coded as missing. Most organizational ethnographies discuss a core set of topics, but each ethnography has areas of greater or lesser coverage.

It is also important to debrief the coders and review their codings in detail after every case is coded. This requires a substantial time commitment on the part of the principal investigator or project director.
Such debriefings, however, are essential for catching coding errors and for ongoing training and quality control. Debriefing is essential for maintaining consistent coding procedures between coders and across time. For the workplace ethnography project, coders were debriefed by a member of the research staff after completing each book to check the accuracy of their codings. At this time, all codings were reviewed in detail.

Reliability Checks

Duplicate coding should be built into the coding process to allow the reliability of the data-coding process to be evaluated. If these checks are evaluated on an ongoing basis, they can also help identify questions in need of refinement or coders in need of greater training or closer supervision.
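
One simple way to evaluate duplicate codings is percent agreement per variable, the kind of figure quoted earlier from Tilly (22 agreements out of 37 pairs, or 59%). The sketch below is an illustration under stated assumptions: the `MISSING` value, the function names, and the sample codes are hypothetical. A missing-value code is compared like any other code, and percent agreement does not correct for agreement that would occur by chance.

```python
MISSING = -9  # hypothetical missing-value code; any reserved value would do


def percent_agreement(codes_a: list[int], codes_b: list[int]) -> float:
    """Share of cases on which two coders assigned the same code to one variable."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must have coded the same cases")
    if not codes_a:
        raise ValueError("no duplicate-coded cases to compare")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return agreements / len(codes_a)


def agreement_by_variable(
    first: dict[str, list[int]], second: dict[str, list[int]]
) -> dict[str, float]:
    """Percent agreement for every variable, given codes for the same cases in the same order."""
    return {name: percent_agreement(first[name], second[name]) for name in first}


# Invented codes for five duplicate-coded cases on two hypothetical variables.
coder_1 = {"occupation": [1, 2, 2, 3, 1], "coordination": [1, 0, MISSING, 1, 0]}
coder_2 = {"occupation": [1, 2, 2, 3, 1], "coordination": [0, 0, MISSING, 1, 1]}

for name, share in agreement_by_variable(coder_1, coder_2).items():
    print(f"{name}: {share:.0%} agreement")

# Reproducing the arithmetic of the Tilly figure: 22 agreements in 37 pairs.
tilly_like = percent_agreement([1] * 37, [1] * 22 + [0] * 15)
print(f"{tilly_like:.0%}")
```

Variables with low agreement are candidates for refinement or for splitting into more concrete components, and a coder whose agreement is consistently low may need further training.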

A good rule of thumb is to have at least a 10% sample of the cases coded by a second reviewer (Elder, Pavalko, & Clipp, 1993). Once a coding operation is in place and functioning, a 10% increase in the number of cases to be coded translates into a relatively small allocation of resources, many of which have already been expended in the process of initiating the project, developing the coding instrument, and selecting and training coders. The additional resources allocated to duplicate coding will be well spent. Without reliability checks, the researcher has little information about the quality of the data. If the researcher has built reliability checks into the data collection process, then an informed discussion of the quality of the data becomes possible. The quality of the final cleaned data set also can be improved by averaging the two sets of codings or by reconciling discrepancies by returning to the primary data.

Errors can also creep into data through keypunching mistakes and clerical errors. Errors of this sort further erode the reliability of the data. Fortunately, such errors are relatively easy to identify and correct. Data can be double punched and the files compared to ensure an exact match. Variable frequencies can be examined to locate codes outside allowable ranges. Variables can be cross-tabulated and inconsistencies investigated. These checks are easy to implement using standard data analysis programs and should be pursued rigorously (Franzosi, 1999).

Subtopic Analysis

Additional topics can be analyzed by focusing on subsets of accounts or by coding additional variables for a topic of special interest. For example, researchers using the Human Relations Area Files have initiated a number of projects in which subsamples of cases with data on special topics are further investigated. These projects include studies of child rearing and time allocation (Munroe & Munroe, 1991).
Using organizational ethnographies, race relations at work could be further analyzed by coding additional concepts for the subset of ethnographies that describe organizations with racially mixed workforces. Similarly, additional variables could be coded for the subset of organizational ethnographies that describe gender-mixed workforces, perhaps focusing on coworker relations or related topics (see Welsh, 1994).

One of the benefits of ethnographies is that they often report the history of events leading to a current situation. Recent developments in sequence analysis allow the identification and comparison of complex causal paths from event histories such as those available in documentary accounts (Abbott & Hrycak, 1990; Griffin, 1993). Such causal sequences can be investigated by coding data from the documents for important types of events and event sequences.

Electronic Scanning and Autocoding

Recent advances in technology have created new opportunities for social scientists interested in the analysis of textual data. Electronic scanning, for instance, has become more reliable and cost-effective. Scanning allows large bodies of textual data to be rendered available for computer-assisted coding. These new computer-assisted technologies make the free searching of large amounts of textual material for ideas and concepts much easier and much less tedious than searching by hand. The limitation of these methods is that not all episodes relevant to a given concept are likely to be discussed using the same words or phrases. Rigorous use of lists of synonyms and key phrases, however, can result in relatively complete searches. These technologies make extensions of the analysis to new topics and concepts less prohibitive in terms of time and effort because the entire document does not have to be reread to search for a new idea. The reduced cost of duplicating data held in such mediums also increases the accessibility of the accumulated documentary record on a topic (see Levinson, 1989).

It is also possible to code somewhat more complex concepts by using autocoding software that “interprets” passages and generates data based on the interpreted meaning of the passage (Bernard & Ryan, 1998). Researchers involved in the study of conflict events have most extensively developed such procedures. An example of this technique is provided by Bond, Jenkins, Taylor, and Schock (1997, p. 563). For this research project, news headlines were analyzed by a software program developed to look for the following information: “Who is doing what to whom, when, where, why, and how?” Answers to each of these five questions are sought in a data set composed of news headlines and first lines of articles. For instance, the following headline was autocoded: “Eight students were arrested today by local police in front of the U.S.
Embassy in Seoul after they staged a teach-in directed against U.S. trade policies.” The autocoding program coded two events from this headline: (a) South Korean police arrest students and (b) students demonstrate against U.S. Embassy. The computer produces this coding by matching the words in the text against lists of acceptable words measuring each concept. Autocoding (in combination with text scanning) allows the coding of relatively simple ideas from large numbers of cases, potentially reaching into the thousands. In addition, the coding is completely consistent in the sense that the computer will code the data the same way each time the program runs. Comparisons with human coders suggest similar or higher reliability for autocoding programs (Bond et al., 1997, pp. 567–569). An additional strength of autocoding systems is that changes can be made to the system, such as adding new codes, and the program can be rerun. Thus, as the researchers perfect the system, they can upgrade the resulting data set. Such recoding would be prohibitively expensive if done by hand.

Autocoding has two major limitations. First, autocoding is useful only for relatively simple coding projects. Thus, a software program can be developed to code five variables from the headlines and first lines of newspaper articles about conflict events. These data points can later be augmented by other data about the newspaper such as the date, city of origin, and ideological orientation of the newspaper. But the resulting data sets are much more limited than the full range of data available from extended documentary accounts. The central strength of documentary accounts, which is their depth of observation, is thus, as yet, largely inaccessible through autocoding systems.

Autocoding is a relatively new technology in the researcher's toolkit, and it is difficult to predict its range of utility. Autocoding has already been used successfully in coding conflict events. Its introduction into the coding of documentary accounts will in all likelihood be relatively slow because of the difficulty of developing software to code large numbers of complex events and relationships, especially where these are described in varying and diverse ways. Autocoding might make its initial contributions to the coding of documentary accounts in the analysis of relatively focused subtopics such as coworker relations as reported in workplace ethnographies or specific types of crime as reported in juvenile delinquency studies
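The word-list matching described above can be illustrated with a minimal sketch. This is not the program used by Bond et al. (1997): the word lists, concept names, and function names below are hypothetical, and the sketch only records which concepts are mentioned in a text. It does not work out who did what to whom, which a real autocoding system would need additional rules to determine.

```python
import re

# Hypothetical word lists mapping surface words to codes for each concept.
# A real project would develop and field-test far larger lists.
CONCEPT_WORDS = {
    "actor": {"students": "students", "police": "police"},
    "action": {
        "arrested": "arrest",
        "arrest": "arrest",
        "teach-in": "demonstrate",
        "protest": "demonstrate",
    },
    "place": {"seoul": "Seoul"},
}


def tokenize(text):
    """Lowercase the text and split it into words, keeping hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def autocode(text, concept_words=CONCEPT_WORDS):
    """Return, for each concept, the codes whose listed words occur in the text.

    Codes are reported once each, in order of first appearance.
    """
    tokens = tokenize(text)
    codes = {}
    for concept, words in concept_words.items():
        found = []
        for token in tokens:
            code = words.get(token)
            if code is not None and code not in found:
                found.append(code)
        codes[concept] = found
    return codes


HEADLINE = (
    "Eight students were arrested today by local police in front of the "
    "U.S. Embassy in Seoul after they staged a teach-in directed against "
    "U.S. trade policies."
)
CODES = autocode(HEADLINE)

# The same event described with words absent from the lists is missed.
MISSED = autocode("Pupils detained by officers after a sit-in.")
```

For the headline, `CODES` holds the actors `students` and `police`, the actions `arrest` and `demonstrate`, and the place `Seoul`, and rerunning `autocode` on the same text always gives the same result. `MISSED` comes back empty for every concept, which illustrates why the completeness of the synonym lists matters. Adding entries to `CONCEPT_WORDS` and rerunning the function over all texts corresponds to the inexpensive recoding described above.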