A Comparative Analysis of Print and Electronic Means of Knowledge Representation
Andres Gregor Zelman
The University of Amsterdam, Amsterdam, The Netherlands
The study describes the results of a computer-assisted textual analysis and comparison of the print and electronic communications of the Self-Organization of the European Information Society (SOEIS) research group. Key architectural features of the SOEIS communications are identified, networks revealed, and systemic properties described, each of which reflects particular biases of the respective media under analysis. The study is motivated by three distinct theoretical traditions: Medium Theory, Actor Network Theory, and Self-Organization Theory. Medium Theory provides an initial basis for the architectural comparison of print and electronic modes of knowledge production; Actor Network Theory enables a description of the complex networked relationships involved in communicating research over print or electronic channels, and a characterization of media themselves as integral actants in processes of knowledge production; and finally, Self-Organization Theory permits a higher-order perspective from which the systemic features of knowledge production can be observed. The mode of information implied by the use of particular media becomes explicit through a comparison between the print and electronic dimensions of the SOEIS research project, and by juxtaposing these results with the theoretical lens we gain an understanding of the biases which may influence processes of (mediated) knowledge production.
Introduction
The study is motivated by a number of current debates concerning the impact of electronic media on the academic environment. Gibbons et al. (1994) describe the changing nature of scientific research as a shift from Mode I to Mode II knowledge production, that is, a move away from knowledge produced in traditional research contexts towards an environment in which knowledge is created in broader trans-disciplinary social and economic contexts. This development, they argue, is related in part to the introduction of electronic media into academic environments. Similarly, two significant OECD publications (1996, 1997) have recognized these changes in the science system as part and parcel of the introduction of new Information and Communication Technologies (ICTs).
The current study is motivated by such claims, and aims to assess this impact on the science system through an analysis and comparison of Mode I and Mode II features of a collective research endeavour known as the Self-Organization of the European Information Society (SOEIS) research project.
The SOEIS group was selected for the analysis because it exhibited evidence of both Mode I and Mode II processes of knowledge production through its print exchange and email communications. Given the centrality of the print medium to the functioning of academic communications (in the case of the SOEIS project this includes the application, milestones and reports), this study aims to determine whether electronic media provide a decidedly different mode of communication from print media, and whether this difference has had any notable impact on SOEIS project communications. The print and electronic communications over the two-year period of the project were examined for architectural parameters, networks of word co-occurrence, and systemic properties, each of which is understood to reflect different media biases. The results of the print and electronic analyses are then juxtaposed. In this way the different roles media play in relation to each other in processes of knowledge production are identified.
Research Focus
Different media enable and constrain human communication in different ways, and one can therefore assume certain biases specific to each medium. Importantly, both print and electronic writing leave examinable traces in the form of published text and electronic archives, and it is precisely these traces and their interrelation that are examined in order to locate structural elements in the data. A central concern here is whether network architectures can be identified, and whether the similarities and differences between print and electronic architectures can be interpreted as exhibiting media bias. Comparing the results of the textual analysis of the print dimension with those of the electronic dimension makes the similarities and differences between these respective modes of communication apparent.
The analysis relies upon textual analysis techniques to map processes of knowledge production. Similar analyses have been performed; Leydesdorff (1989, 1997) provides arguments both for and against using co-word analysis to map the intellectual development of the sciences. In the earlier study, indicators of intellectual structure and organization were identified; in the later study, a comparison of biochemistry articles revealed an intellectual structure, but at the level of the document set the structure was no longer discernible. The textual analysis literature has also emphasized the importance of scientific visualization and the development of software tools to aid in this process (Bradley & Rockwell 1992, 1994). Additionally, Actor Network Theory (ANT) approaches are often used to describe network features of text. As Callon et al. (1986) argue, word and co-word analyses can be framed conceptually with actor-network approaches. They state that “by having recourse to the quantitative, it thus becomes possible to map the degree to which the efforts of actors to build their worlds are met with success. The maps present a pattern of translations that arises from the interaction between the efforts of many actors.” (1986, 225) Importantly, ANT plays a central role in the theoretical orientation of this analysis.
The architectural, network and systemic dimensions explored herein are supported by three intellectual traditions: Medium Theory, Actor Network Theory, and Self-Organization Theory, respectively. Meyrowitz (1994), a central proponent of the Medium Theory tradition, argues that historically, social roles operate as information networks, and that they are necessarily mediated. Here the notion of the information network is used as a heuristic to aid in conceptualizing how the print and electronic information networks (as architectures) were established and maintained over the two years of the research project. ANT treats word use as precisely that which enables communication, and the patterns implied therein as discernible processes of knowledge production. Individual communications compile through collective action, and this perspective adds complexity to the architecture by theorizing the medium as used, and mediation as a process. Finally, Self-Organization Theory enables a macro view of the systemic properties of SOEIS print and electronic communications.
Results
The SOEIS print and electronic output is examined for its basic architectural features, including document size, word frequency, unique-word frequency, and ratio percent. Additionally, the analysis aims to discover network features revealed by differences in overall word distribution, and systemic features such as phases of critical transition in the datasets. For both datasets the time period is roughly the same; both began in approximately September 1997 and ended in December 1999. The results of the print and electronic analyses are described below in tandem.
The print documents include the original project design, milestones, deliverables and final reports of the SOEIS research project. The electronic documents include all 1261 emails exchanged between SOEIS members and associates over the two-year period of the project. Two databases were created: the print documents were divided into four chunks, representing four time periods of six months each; similarly, the electronic communications were divided into four chunks of six months each. Importantly, these two datasets share approximately the same 24-month timeline. Prior to analysis, both the print and electronic documents were filtered for the most commonly occurring words (if, and, but) using an adapted stop-list, and then auto-lemmatized to reduce plural forms to the singular. Additionally, prior to analysing the electronic documents, a preliminary filtering of the data was performed to eliminate redundancy and skewed relations.
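The filtering steps just described can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study: the stop-list contents, the crude plural-reduction rule and the names `STOP_WORDS`, `tokenize`, `reduce_plural` and `preprocess` are all assumptions made for illustration.

```python
import re

# Hypothetical stop-list; the adapted stop-list used in the study is not given.
STOP_WORDS = {"if", "and", "but", "the", "a", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def reduce_plural(word):
    """Very rough plural reduction (an assumption, not a real lemmatizer)."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def preprocess(text, stop_words=STOP_WORDS):
    """Tokenize, drop stop-list words, then reduce plural forms."""
    return [reduce_plural(w) for w in tokenize(text) if w not in stop_words]
```

Applied to each six-month chunk in turn, a function of this kind would yield the word lists on which the later counts are based.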
The collected and collated documents representing the four respective time periods for each medium were then run through the WordSmith program. Each individual text and the full document texts were examined for basic statistics including size, word occurrences, unique-word count, and their ratio percent. Importantly, the WordSmith program calculates the ratio of unique words to word occurrences afresh for every 1000 words throughout the entire dataset and then computes a running average. Accordingly, the ratio is a percentage of new types for every 1000 words, and in this way one can compare ratios across texts of differing lengths. Table 1: Print Architecture and Table 2: Electronic Architecture, below, provide the parameters of the two respective databases.
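The ratio percent described here can be sketched in a few lines of Python. This is an illustration of the idea rather than WordSmith's own code: the function names `plain_ratio` and `chunked_ratio` are hypothetical, and dropping a final partial chunk is an assumption.

```python
def plain_ratio(tokens):
    """Unique words as a percentage of word occurrences over the whole text."""
    if not tokens:
        return 0.0
    return 100.0 * len(set(tokens)) / len(tokens)


def chunked_ratio(tokens, chunk_size=1000):
    """Average of the unique-word ratio computed afresh on each full chunk."""
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(100.0 * len(set(chunk)) / chunk_size)
    if not ratios:
        # Text shorter than one chunk: fall back to the plain ratio.
        return plain_ratio(tokens)
    return sum(ratios) / len(ratios)
```

As a point of reference, the plain ratio for the P1 row of Table 1 is 100 × 2395 / 70638 ≈ 3.39. The chunked variant is the one that stays comparable across texts of different lengths, because the plain ratio falls as a text grows.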
Name | Size (KB) | Word Occurrences | Unique Words | Ratio %
P1.txt | 477 | 70638 | 2395 | 3.39
P2.txt | 932 | 145582 | 4470 | 3.07
P3.txt | 223 | 34560 | 1399 | 4.05
P4.txt | 2219 | 350524 | 6502 | 1.85
P-All.txt | 4073 | 635861 | 9385 | 1.48
Table 1: Print Architecture
In Table 1: Print Architecture, size and word occurrence appear to be related, but there is no necessary relationship between unique words and word occurrences. In P3 the amount of print writing falls sharply, and yet the number of unique words increases in relative terms: because word occurrences go down one would expect unique words to decrease in proportion, but the ratio percent instead rises to its highest value (4.05). A process of codification is evident in P3 in that the lower volume carries a lower absolute number of unique words (1399). In P4 there is a high volume of word occurrences but a particularly low ratio percent; this reduction in the proportion of unique words suggests the development of a stabilized vocabulary in the SOEIS group in the final period of the project.
Name | # Emails | Size (KB) | Word Occurrences | Unique Words | Ratio %
E1.txt | 350 | 788 | 115484 | 3364 | 2.91
E2.txt | 357 | 957 | 135009 | 4412 | 3.27
E3.txt | 293 | 654 | 92912 | 3161 | 3.40
E4.txt | 261 | 969 | 137480 | 4788 | 5.23
E-All.txt | 1261 | 3358 | 495436 | 9599 | 1.94
Table 2: Electronic Architecture
In contrast to the print architecture, Table 2: Electronic Architecture reveals an even volume across the time periods. There is an apparent reduction in file size with E3, but this is not reflected in the relative share of unique words (the ratio percent). With E4 the number of unique words again increases, thereby suggesting a renewed vocabulary among the SOEIS members at the end of the project. The new words here may indicate the raising of new research questions.
Thus, while the analysis of the print database revealed periods of information codification, the electronic communications remain unaffected. This suggests a relationship between the communications medium and its designated function; print communication is representative of the level of formal communications with the European Union (milestones, reports), and the electronic communication houses the informal, supplementary communications. The shifts in the size and unique-word count of the respective databases help indicate the priorities of certain time periods and of each respective medium with respect to its designated role in the SOEIS project.
The four respective wordlists of each database were then examined for overall word distribution. As in the final analysis, the concern here is the overall distribution of collective patterns of word use. This analysis was performed on two levels for each medium. The first level compared the shared occurrence of all words present in the full print and electronic reference corpora: 7142 words for print, and 7076 for electronic. Next, the word lists were compared solely on the basis of the words that were shared by all four texts. With this additional level of specification, 841 shared words were identified in the print database, and 1483 shared words in the electronic. In this way the parameters of information flow present in both print and electronic writing were determined; the networked dimensions are outlined below, thereby adding substance to the architectures outlined above. Below, specificity is reported as the specificity of total word distribution, or more accurately the ratio of the expected information content of the distribution to its maximum information content, and mutual information is reported as Transmission.
File | Unique Words | Specificity | Transmission
Print-All | 7142 | 0.77 | 0.28
Electronic-All | 7076 | 0.99 | 0.38
Print-Select | 841 | 0.80 | 0.07
Electronic-Select | 1483 | 0.99 | 0.11
Table 3: Print / Electronic Network Compare
Skewness may indicate a specificity of the communication in terms of collective word use. This measure suggests that the distributions of word use in the print documents are considerably skewed. It may be that there is a somewhat stronger process of codification in the print documents than in the electronic documents, as implied by the restriction in the number of words used. When the select datasets were compared, it was found that print appears to do the same job with a smaller number of words than the electronic set, thereby suggesting a restriction in the number of words used during the two-year period.
There appears to be a stronger process of codification in the print documents than in the electronic documents. The restriction in the number of words used during the two-year period proves more significant when compared with Transmission, or mutual information. Here mutual information is understood to reflect a reduction of specificity; while specificity refers to the specificity of total word distribution, transmission eliminates the words not shared across the four time periods. Hence there is a reduction in the mutual information shared. There appears to be more transmission between the time dimensions in the electronic case (distribution of rows E1, E2, E3, and E4), but the difference (60%) is mainly caused by words that occur in specific periods and not in others. Again, when the select document sets are observed, the print set appears to accomplish its function using fewer words than the electronic document set.
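The two measures reported in Table 3 can be sketched as follows. The sketch rests on assumptions that the text does not settle: specificity is taken as H / H_max for a list of counts, transmission as T = H(words) + H(periods) − H(words, periods) for a word-by-period count matrix, and logarithms are taken to base 2. All function names are hypothetical.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def specificity(counts):
    """Expected information content relative to its maximum, H / log2(n)."""
    n = len(counts)
    if n < 2:
        return 0.0  # the ratio is undefined for fewer than two categories
    return entropy(counts) / log2(n)


def transmission(matrix):
    """Mutual information in bits between rows (words) and columns (periods)."""
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    cells = [c for row in matrix for c in row]
    return entropy(row_totals) + entropy(col_totals) - entropy(cells)


def shared_words(period_counts):
    """Words occurring in every period, as used for the 'select' lists."""
    common = {w for w, c in period_counts[0].items() if c > 0}
    for counts in period_counts[1:]:
        common &= {w for w, c in counts.items() if c > 0}
    return sorted(common)
```

On this reading, a specificity close to 1 means word use is spread almost evenly, while a lower value means a more skewed distribution. Transmission is 0 when word use is independent of the period and grows as particular words become tied to particular periods.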
Finally, the systems analysis tested for path dependencies in the data. Self-Organization Theory offers a bird's-eye perspective on keyword distributions over the whole text. The analysis for self-organized criticality determines whether there have been crucial transitions in the ways that words are distributed, which would lead one to conclude that particular pathways had been chosen and were necessary for the end product. Entropy statistics were used to assess the texts for critical transitions or path dependencies over the respective datasets. Here the expected information content of each time period is related to that of each previous period, or state of the communication.
The analysis was performed on the shared occurrence of all words present in the full print and electronic reference corpora, and then again on the select lists, containing only those words shared by all four texts, respectively. The data were first analysed for linear transitions (P1-P2-P3-P4, and E1-E2-E3-E4), and then for non-linear associations (P1-P3, P1-P4, P2-P4, and E1-E3, E1-E4, E2-E4). The print results are presented in Figures 1 and 2; the electronic results are presented in Figures 3 and 4.
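One way to relate the expected information content of a later period to that of an earlier one is the Kullback-Leibler divergence between the two word distributions. The Python sketch below assumes that measure, with base-2 logarithms, and compares the direct step between two periods with the route through an intermediate period. Whether this matches the exact test applied in the study is not stated, and the names `expected_information` and `revision_gain` are hypothetical.

```python
from math import log2


def expected_information(posterior, prior):
    """Kullback-Leibler divergence I(posterior : prior) in bits.

    Both arguments are count lists over the same words in the same order.
    """
    q_total = sum(posterior)
    p_total = sum(prior)
    info = 0.0
    for q, p in zip(posterior, prior):
        if q == 0:
            continue  # a zero posterior count contributes nothing
        if p == 0:
            raise ValueError("prior count is zero where posterior is not")
        info += (q / q_total) * log2((q / q_total) / (p / p_total))
    return info


def revision_gain(first, middle, last):
    """Direct information distance minus the distance via the middle period.

    A positive value means the route through the middle period is shorter
    than the direct step, which is read here (as an assumption) as a sign
    of a critical transition or path dependency.
    """
    direct = expected_information(last, first)
    via_middle = (expected_information(last, middle)
                  + expected_information(middle, first))
    return direct - via_middle
```

On the select lists every word occurs in all four periods, so the zero-prior case cannot arise there. A linear chain such as P1-P2-P3 would correspond to `revision_gain(p1, p2, p3)`, with the direct P1-P3 comparison as the first term.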
In this final analysis it was found that there are no visible periods of critical transition at the level of either print or email communication. Thus, neither the print nor the electronic database seems to have any period in which the words being communicated over the two-year period of the SOEIS project change significantly. This reflects the codified nature of project communications, and is suggestive of a lack of newness or innovation in heavily structured EU-funded projects.
Conclusions
The theoretical lens has proven useful in containing the various dimensions under analysis. Differences were found between the print and electronic datasets by reflecting upon the theoretical triad of architectural, network and systemic dimensions. The parameters of the print and electronic datasets were compared on the basis of differences in the ratio percent of unique words to words used. These differences were interpreted in light of the Medium Theory notion of the information network. Actor Network Theory provided a focal point for understanding how individual actions compile to form dynamic information networks. In this respect, the databases were analysed for specificity, to measure the expected information content relative to the maximum information content, and for transmission, to measure changes in the mutual information exchanged. These analyses enable a number of different perspectives on how print and electronic networks differ in the case of the SOEIS research project. Finally, using Self-Organization Theory as a basis, the databases were measured for critical transitions or path dependencies. Overall the analyses have led to a number of interesting insights concerning how print and electronic communications differ, and more specifically about the dynamic relation between Mode I and Mode II types of knowledge production in the SOEIS research project.
In conclusion, the analysis of the print and electronic databases for apparent architectures revealed a codification evident in the print communications. In the case of the electronic communications, the high ratio percent in the final time period indicated a renewal of the SOEIS vocabulary, rather than the restriction implied by print. This provided a rationale for theorizing a relationship between communications media and their designated functions. In this context, print communication is seen to be representative of the level of formal communications with the European Union, while electronic communication is interpreted at the informal level, as providing a supplementary role. The network analysis revealed more transmission between the time dimensions in the electronic case, but the difference was caused by words that occur in specific periods and not in others. Thus, the print database appears to perform the same task as the electronic, but with a smaller number of shared words; the SOEIS print thereby provides a more codified communication than the electronic. Finally, when measured for systemic properties, neither the print nor the electronic database provided evidence of path dependency. It is suggested here that these two media serve to integrate the SOEIS communications systemically and in similar ways, yet print and electronic communications also serve to integrate the SOEIS in decidedly different ways.
References: