Tuesday, 12 June 2012

Linking Minority Language Dictionaries to Open Data | The Journeyler

What is the role of a dictionary?

Is the role of a dictionary to regulate or standardize spelling? Is it to validate a speech variety as a real, bona fide language? Or is it to document and establish the relationships and connections between things (plants, animals, fish, spirits/gods, medicines, etc.) as they are viewed emically, to connect people through collaboration, or to connect related concepts and their classes into documented sets? Or even to connect these things and relationships as they are viewed in one culture to the same things and relationships as they are viewed in another culture, or more broadly cross-culturally?
Dictionaries for minority languages contain little bits of information about societies and the ways those groups of people interact with their environments and among themselves. For example, dictionaries may include the names of fish and places, and clarify terms for relationships among family or social groups. In the data management and internet realms these objects, activities and relationships are often codified, sometimes through LinkedData or RDF. Additionally, there are ever larger data sets which describe the world in which researchers and minority language speakers alike live and make observations.

An important question emerges:

How should dictionaries be linked to other online resources and ontologies?

If one takes a traditional view, dictionaries are static: they are edited and then published as books, PDFs and, in modern cases, versioned datasets. One example of this perspective applied to linking LinkedData to minority language dictionaries would be linking the names of animal species in indigenous/minority languages to resources on those species, using lists (ontologies) which already exist in the LinkedData world. An additional step would be to hyperlink to resources which match these definitions. There is already a lot of LinkedData out there. If minority language communities are "linked" to that data through their dictionaries, then the dictionary not only serves as a record of their culture and language but also bridges that community of dictionary users to knowledge and experience outside the community about things and relationships which exist inside the community.

To continue the example of LinkedData for animal species, terms could be linked to Wikipedia. However, Wikipedia is in general neither specific enough nor stable enough (and one should also ask how stable these other digital resources are). A more specific resource would be the IUCN Red List of Threatened Species (which also makes its data available through LinkedData and RDF).

[Image: Sylvilagus mansuetus (Conejo de la Isla San José, San Jose Brush Rabbit), from The Red List]

Protected Planet is another website which has done some work on species, but it extends its coverage to the larger ecosystem rather than just a list of species. Either of these two resources would provide useful comparative data for minority language speakers. The connectivity would also provide content or format models for how they share their own valuable knowledge. Of course, such linking assumes a common language and would likely also require consultation with experts in the domain of knowledge being linked to, in this case biology.
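The species-linking idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entry URI, the species URI, the headword and the language tag are all placeholders (nothing here is an actual Red List or dictionary address), and the use of the SKOS `closeMatch` property is my choice for the sketch rather than an established practice for dictionaries. The output is in N-Triples, one of the plain-text RDF syntaxes.

```python
# Real vocabulary terms (RDF Schema and SKOS).
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"


def literal(text, lang):
    """Format a language-tagged literal in N-Triples syntax."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def triple(subject, predicate, obj):
    """Format one N-Triples statement; obj is a URI or a ready-made literal."""
    obj_term = obj if obj.startswith('"') else f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


def link_entry(entry_uri, headword, lang, species_uri):
    """Return the statements that label an entry and link it to a species record."""
    return [
        triple(entry_uri, RDFS_LABEL, literal(headword, lang)),
        triple(entry_uri, SKOS_CLOSE_MATCH, species_uri),
    ]


# All values below are hypothetical placeholders, not real identifiers.
lines = link_entry(
    "http://example.org/dictionary/entry/42",
    "HEADWORD",  # stands in for the minority-language term
    "und",       # "undetermined" language tag; a real entry would use its own code
    "http://example.org/species/sylvilagus-mansuetus",
)
print("\n".join(lines))
```

A "close match" rather than an "exact match" is used because folk categories and scientific taxa need not line up one to one; deciding which relation is appropriate is exactly where the consultation with domain experts comes in.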

The Cautions

In language documentation, documenters (and often native minority language speakers in collaboration with these researchers) are interested in getting "snapshots" of language use and of the cultural events, ideas, and practices which describe the uniqueness of that society. Documenters work with archives to preserve this record. The foundational assumption is that this record is static in its baseline form. That is, the baseline contains a certain set of documents, which might be added to or enriched (e.g. via transcriptions or annotations) over time, but which will never change or go away. However, in thinking through dictionaries as part of language development, we must think dynamically rather than statically. For example, let's take SIL International's definition of language development:

Language development is the series of ongoing planned actions taken to ensure that a language continues to serve the changing social, cultural, political, economical and spiritual needs and goals of its speakers.

This definition of language development suggests that as a community or a society changes its pursuits and communicative activities the undergirding technologies enabling that communication should also change with it to support the designs of the communication. I think this is spot on. As a community changes so the terms in their dictionary should also increase to cover these changes. As words are used in new senses, these senses need to be added to the dictionary. Therefore, this perspective on language development should lead us to consider language resources in a dynamic context rather than in a static context. This perspective is both:

  1. an analogy to how language can be viewed in a dynamic, diachronic context
  2. a contrast to how language resources have existed in the past (considering static materials like bound media, printed media and PDFs.)

When resources, and even language use, move into a dynamic medium such as a digital medium, recording and documenting the language takes on new challenges and tasks. In digital media, like web presentations, the communication and the medium are inseparable. The delivery mechanism is part of the total message.
In terms of language documentation, the dynamic presentation of data, or the linking of data to other sources, should be preserved to maintain a record of that language and culture at that time. As language documenters (and archivists) we must ask the question: How is the original work preserved?

To answer this question, some have said that to archive data-driven resources like dictionaries, what is needed is to archive the database (as opposed to the final product, i.e. a book, a PDF or an MS Word document). This is a good strategy if the data-driven resource is static in nature (e.g. a FLEx or Toolbox database) and conventional in its access/production mode (e.g. the FLEx or Toolbox program). It also works if the archive holds a version of the program and many dictionaries in that program's file formats (much like a CD player and many CDs). Access to the data and editorial abilities can then be restored, and tools which depend on these programs (like SIL International's Pathway product) to produce usable products (books, PDFs, websites, etc.) from the dataset can access the data. However, if we hold that the cognitive interaction between the database user and the database is defined by the interaction with the output through a chosen medium, then have we really archived the resource if we do not also archive the output display? No. We have only archived a component of the resource. This is fine, and even preferable, in use cases where there are various kinds of presentations of a static, edited, compiled dataset. However, when not just the presentation is dynamic, but also the information architecture and the visual presentation form driving the communicative interaction, archiving under traditional mindsets and practices can actually lead to failing to document a language. The irony would be to have a 50-year-old dictionary which does not document the language over the course of those 50 years.
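One way to picture "archiving more than the database" is a deposit manifest that records the database export, the presentation code and a capture of the rendered output side by side, each with a checksum. The sketch below is an assumption of mine, not a practice described by any particular archive: the component names, the date and the byte strings are placeholders.

```python
import hashlib
import json


def build_manifest(snapshot_date, components):
    """Describe one archival snapshot.

    components maps a role name to the raw bytes of the file deposited for it.
    """
    return {
        "snapshot_date": snapshot_date,
        "components": {
            role: {
                "sha256": hashlib.sha256(data).hexdigest(),
                "bytes": len(data),
            }
            for role, data in sorted(components.items())
        },
    }


# Placeholder content: a real deposit would read these bytes from files.
manifest = build_manifest(
    "2012-06-12",  # placeholder date for the snapshot
    {
        "database_export": b"<placeholder lexical database export>",
        "presentation_code": b"<placeholder templates and stylesheets>",
        "rendered_output": b"<placeholder capture of the rendered pages>",
    },
)
print(json.dumps(manifest, indent=2))
```

A manifest that lists only `database_export` would correspond to the "component of the resource" case described above.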

Going back to SIL's definition of language development, and considering that many dictionaries today are digital and that, in today's collaborative web-based environment, a society may both desire and be able to use web tools to produce minority language dictionaries, the actual information architecture becomes an important part of "the work". It is no longer just the tokens of the lexeme field which are important. Understanding the why behind how the data is arranged becomes important as well. This matters not just to web-application developers and information architects, but also to the users of the data contained in the various fields of the database portion of the web application. The relationships between these fields are not always transparent to those who view the data contained in the lexeme fields (or other fields). This means that reality for the viewer is determined by the way the viewer experiences their interaction with the data, and that this reality can be very different from what is archived.

If a dynamic-social dictionary is constantly changing to meet the ongoing needs of a speech community in a multitude of ways, then it should be expected to change, at a minimum, in these ways:

  • in the visual presentation of the data - affecting the way that people cognitively interact with the data.
  • in the connected state of the data (both internally and externally) - the technically implemented state of cognitive relationships between database elements which already exist in the culture.
  • in the content of the database (quantity of lexemes, definitions, photos, pronunciations, parts of speech, etc.)
  • in the structure of the database (additional fields or additional relationships added to the database) - as the community changes over time, some database categories will become less obvious or less needed.
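The last two kinds of change in the list above (content and structure) can be detected mechanically by comparing two dated snapshots of the database. The following is a minimal sketch, assuming each snapshot is a plain mapping from an entry id to that entry's fields; the field names and values are placeholders, not data from any real dictionary.

```python
def diff_snapshots(old, new):
    """Compare two snapshots, each mapping entry id -> {field name: value}."""
    old_fields = {field for entry in old.values() for field in entry}
    new_fields = {field for entry in new.values() for field in entry}
    shared = old.keys() & new.keys()
    return {
        # Content changes: entries that appear, disappear or are edited.
        "entries_added": sorted(new.keys() - old.keys()),
        "entries_removed": sorted(old.keys() - new.keys()),
        "entries_changed": sorted(k for k in shared if old[k] != new[k]),
        # Structural changes: fields that appear or disappear across the database.
        "fields_added": sorted(new_fields - old_fields),
        "fields_removed": sorted(old_fields - new_fields),
    }


# Placeholder snapshots for illustration only.
earlier = {
    "e1": {"lexeme": "LEXEME-1", "definition": "first sense"},
    "e2": {"lexeme": "LEXEME-2", "definition": "only sense"},
}
later = {
    "e1": {"lexeme": "LEXEME-1", "definition": "first sense; second sense"},
    "e2": {
        "lexeme": "LEXEME-2",
        "definition": "only sense",
        "external_link": "http://example.org/species/1",  # hypothetical URI
    },
    "e3": {"lexeme": "LEXEME-3", "definition": "new entry"},
}
report = diff_snapshots(earlier, later)
print(report)
```

Changes in visual presentation and in how links are displayed do not show up in a comparison like this, which is part of why the data alone is an incomplete record.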

Given these kinds of changes in the dictionary's data structure, the issue of permanence and diachrony still needs to be addressed to satisfy language documentation practices. The interactive, dynamic, web-based dictionary stands in contrast to the long-standing traditional view of lexicography and publishing practices, which look towards a stable output. In web technology, for users to be exposed to changes in the database, the presentation layer often has to change as well. This is one reason that archiving the code that presents the data is also important in these kinds of use cases. However, this does not completely explain why archiving the visual mode matters to the issue of linking terms or lexemes in lexical databases (online or not) to Open Data. To see this relevance, I think it is important to understand that linking is communication and that communication is a two-way street.

Communication is a two way street

The communicative process is a negotiation of thoughts and ideas between one entity and another. So in the case of LinkedData, and of linking to Open Data in or from minority language dictionaries, the act of associating these traditional knowledge sets with mainline (global) knowledge sets can have the effect of changing the traditional knowledge set or, minimally, of changing the perspective of indigenous/minority language users on the status of traditional knowledge. Let me give a hypothetical example.

David Harrison, in his book When Languages Die (page 44), tells the story of a macro-class of fish in a Native American language. This macro-class happens to follow the scientific distinction (genetic classification) of the fish, whereas the English names of the fish would lead English speakers to subdivide these fish into different classes.

So, if the anthropological taxonomy of fish is represented in a single database or ontology, then changing that database or ontology will affect the documentation of that language or people group. However, if the editors of the dataset, even native speakers of the language, are editing ontologies, are they then editing the record of the culture, even if changing that database will meet the needs of that language group at some future time? Linking data is a two-way street. If "A" is related to "B", then "B" is related to "A", though this may not be expressed in the visual manifestations of the linkage. When a society is confronted with new knowledge because it is linked to it, the community has to choose what to do with that new knowledge, and in web technology it may not be immediately obvious that cognitive and even behavioral change is occurring. Not that change is bad; change is bound to occur. Additionally, when we are talking about a speech community which deals with LinkedData via the web, we are talking about many web-based influences, not just those from the LinkedData. However, the editorial state of the database due to other arrangements of data is something of which to be mindful.
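The point that a link from "A" to "B" implies a relation from "B" back to "A", even when no page displays it, can be made concrete by inverting a table of outgoing links. This is a small sketch under my own assumptions: the entry and resource identifiers are invented placeholders.

```python
from collections import defaultdict


def inverse_links(outgoing):
    """Given source -> list of targets, return target -> sorted list of sources."""
    incoming = defaultdict(set)
    for source, targets in outgoing.items():
        for target in targets:
            incoming[target].add(source)
    return {target: sorted(sources) for target, sources in incoming.items()}


# Hypothetical dictionary entries linking out to hypothetical external resources.
outgoing = {
    "entry:fish-term-1": ["ext:taxon-A"],
    "entry:fish-term-2": ["ext:taxon-A", "ext:protected-area-X"],
}
incoming = inverse_links(outgoing)
print(incoming)
```

The dictionary's pages might only ever show the `outgoing` direction, but anyone holding the link data can compute `incoming`, which is the sense in which the relationship exists whether or not it is visually expressed.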

Acknowledgments

A special thanks to Pat Kelley and Steve Marlett, who have both worked on dictionaries and whose conversations on these issues inspired me to think about them and write something down.


