Author: Yolanda Lara Pinto

1 IGNACIO M. PALACIOS MARTÍNEZ
DEPARTAMENTO DE FILOLOGÍA INGLESA Y ALEMANA
UNIVERSIDADE DE SANTIAGO DE COMPOSTELA
LEARNER SPANISH ON COMPUTER. THE CAES ‘CORPUS DE APRENDICES DE ESPAÑOL’ PROJECT

2 The CAES Project
This presentation is organised in two parts. The first deals with the origin, development and description of the project. The second presents a study based on data extracted from the corpus. This study, which centres on false friends, is a simple example of the kind of research that can be conducted with this tool.

3 The CAES Corpus: General Features
- Computerised corpus of Spanish as a foreign language.
- Financed by the Cervantes Institute (CI).
- Carried out by a research team from the University of Santiago (Guillermo Rojo and Ignacio Palacios as directors).
- Compiled between 2012 and 2014.
- It contains almost 600,000 words.
- Written material only for the time being.

4 The CAES Corpus: General Features
- Five proficiency levels represented: from A1 to C1.
- Learners from six different L1s: English, French, Arabic, Portuguese, Russian and Mandarin Chinese.
- 1423 participants from over twenty different countries (502 male & 921 female).
- Participants' ages ranged from 15 to over 61.

5 Table 1. Main features of the CAES project
Compilers | Rojo, Palacios, et al.
Participants' native language | Arabic 497, Portuguese 361, English 227, French 143, Mandarin Chinese 128, Russian 67
Participants' gender | male 521, female 902
Participants' level | A1 526, A2 421, B1 252, B2 162, C1 62
Participants' main countries represented | Brazil 319, Morocco 312, USA 139, China 127, France 92, Syria 70, Russia 62, Afghanistan 52, Ireland 38, Algeria 32, Portugal 31, Lebanon 26, Jordan 21, Tunisia 16

6 The CAES corpus
Table 2. Participants’ distribution according to their L1 and proficiency level
Level | Arabic | Chinese | French | English | Portuguese | Russian
A1 | 5991891327749466
A2 | 3641008834425758
B1 | 232698512712341
B2 | 991548419911
C1 | 4801826280

7 The CAES Corpus
Table 3. Participants’ distribution according to their proficiency level
Proficiency level | Elements | Sample units
A1 | 155,458 | 526
A2 | 178,834 | 421
B1 | 116,520 | 252
B2 | 80,556 | 162
C1 | 42,350 | 62
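The figures in Table 3 lend themselves to a quick arithmetic check. The following is a minimal sketch, not part of the CAES tooling: it sums the elements and sample units across levels and derives the mean number of elements per sample unit at each level. The derived means are not stated in the slides, and the names `TABLE_3`, `totals` and `mean_elements_per_unit` are hypothetical.

```python
# Figures copied from Table 3: level -> (elements, sample units).
TABLE_3 = {
    "A1": (155_458, 526),
    "A2": (178_834, 421),
    "B1": (116_520, 252),
    "B2": (80_556, 162),
    "C1": (42_350, 62),
}


def totals(table):
    """Return (total elements, total sample units) across all levels."""
    elements = sum(e for e, _ in table.values())
    units = sum(u for _, u in table.values())
    return elements, units


def mean_elements_per_unit(table):
    """Return the mean number of elements per sample unit for each level."""
    return {level: round(e / u, 1) for level, (e, u) in table.items()}


if __name__ == "__main__":
    print(totals(TABLE_3))
    for level, mean in mean_elements_per_unit(TABLE_3).items():
        print(level, mean)
```

On these figures the sample units add up to the 1423 participants reported for the corpus, and the mean length per sample unit rises steadily with proficiency, from roughly 296 elements at A1 to roughly 683 at C1.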

8 The CAES Corpus
Table 4. Participants’ distribution according to their L1
L1 | Elements | Sample units
Arabic | 168,231 | 497
Mandarin Chinese | 53,163 | 128
French | 58,412 | 143
English | 106,968 | 227
Portuguese | 165,231 | 361
Russian | 20,713 | 67

9 The CAES Corpus
Table 5. Participants’ distribution according to their gender
Gender | Elements | Sample units
Male | 207,992 | 521
Female | 365,726 | 902

Table 6. Participants’ distribution according to age
Age | Elements | Sample units
>=15 - ... | ... | ...
>=22 - ... | ... | ...
>=31 - ... | ... | ...
>=41 - ... | ... | ...
>=61 | 25,287 | 65

10 The CAES Corpus: Stages in its compilation
Stage 1: Before the data collection
- A computer programme was created for the data collection so that participants themselves could enter their data directly on the computer.
- A protocol was prepared and distributed among all the centres that participated in the data collection.
- The data collection programme was piloted with several groups of students.
- Participants signed a consent form for the use of the data obtained.

11 CAES Project Figure 1. CAES general interface for data collection

12 CAES project
Stage 2: During the data collection
Participants had to complete a number of written tasks (3 on average). These tasks were designed according to the CEFR descriptors and DELE tests, and in accordance with the CI’s General Curricular Document.
Examples of activities:
- Writing emails to friends & relatives
- Critical review of a book
- Applying for a job
- Booking a hotel room
- Making a complaint
- Writing a funny story

13 CAES project
Stage 3: Text encoding and annotation
- The texts integrated into CAES adopt the format of XML documents.
- The texts were tagged both automatically and manually.
- A total of 702 different tags were used.
- FreeLing, an open-source language analysis tool suite, was used for the automatic tagging, with the necessary adjustments made to map the FreeLing tagging system onto the one our team intended to use.
- Finally, the texts were manually disambiguated.
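To make the idea of XML-encoded, tagged texts concrete, here is a minimal sketch of how such a document might be read. The slides do not describe the actual CAES schema, so the element name `w` and the attributes `l1`, `level`, `lemma` and `pos` are invented for illustration, as are the tag values and the helper `read_tokens`.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of part of one learner sentence; the real CAES
# element names, attributes and tagset are NOT shown in the slides.
SAMPLE = """<text l1="English" level="A1">
  <w lemma="mover" pos="VERB">movía</w>
  <w lemma="a" pos="ADP">a</w>
  <w lemma="Idaho" pos="PROPN">Idaho</w>
</text>"""


def read_tokens(xml_string):
    """Return (text-level metadata, list of token records) from one document."""
    root = ET.fromstring(xml_string)
    meta = dict(root.attrib)  # learner variables stored on the root element
    tokens = [
        {"form": w.text, "lemma": w.get("lemma"), "pos": w.get("pos")}
        for w in root.iter("w")
    ]
    return meta, tokens


if __name__ == "__main__":
    meta, tokens = read_tokens(SAMPLE)
    print(meta)
    for token in tokens:
        print(token)
```

Keeping learner variables on the text element and linguistic annotation on each word is one common layout; it lets a search tool filter by learner profile first and by form, lemma or word class second.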

14 CAES project
Stage 4: The search tool
- It retrieves statistical information and textual examples of elements, lemmas, word classes and grammatical categories, with filters (learner’s L1 and level of proficiency, age, sex, country of origin, etc.).
- It can distinguish between lower-case and upper-case words, and between accented and non-accented forms.
- Searches based on the co-occurrence of several elements can also be conducted.
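The filtering behaviour described above can be sketched in a few lines. This is an illustration of the general technique, not the CAES search tool itself: the record layout, the function names and the toy token list are assumptions, and the forms are loosely based on learner examples quoted in this presentation.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks. Note that this also turns 'ñ' into 'n'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalise(form, match_case, match_accents):
    """Fold accents and/or case unless the search is sensitive to them."""
    if not match_accents:
        form = strip_accents(form)
    if not match_case:
        form = form.lower()
    return form


def search(tokens, form, l1=None, level=None,
           match_case=False, match_accents=False):
    """Return the token records matching a form and the optional filters."""
    target = normalise(form, match_case, match_accents)
    hits = []
    for tok in tokens:
        if l1 is not None and tok["l1"] != l1:
            continue
        if level is not None and tok["level"] != level:
            continue
        if normalise(tok["form"], match_case, match_accents) == target:
            hits.append(tok)
    return hits


# Toy records (hypothetical layout), one per token.
TOKENS = [
    {"form": "suceso", "l1": "French", "level": "B1"},
    {"form": "suceso", "l1": "Portuguese", "level": "A2"},
    {"form": "adición", "l1": "English", "level": "C1"},
    {"form": "compañia", "l1": "French", "level": "C1"},
]

if __name__ == "__main__":
    print(search(TOKENS, "suceso", l1="French"))
    print(search(TOKENS, "compañía"))
    print(search(TOKENS, "compañía", match_accents=True))
```

With accent-insensitive matching, a query for "compañía" also finds the learner spelling "compañia"; switching `match_accents` on restricts the search to the exact accented form, which is useful when the accent errors themselves are the object of study.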

15 CAES project Figure 2. CAES search tool

16 PART II: STUDY ON FALSE FRIENDS
Introduction
- Definition of false friends: lexical items whose forms are identical or similar to words in the L1 but whose meanings are different.
- Classification of false friends: orthographic, phonetic, semantic, contextual, total and partial.
- Total: Sp. librería vs. Eng. library
- Partial: Sp. circulación vs. Eng. circulation

17 STUDY ON FALSE FRIENDS: PURPOSE
- To see the extent to which these lexical items are present in a learner corpus of this size.
- To explore whether or not they are problematic words.
- To investigate how they are actually used and what information we can gather from the corpus material.
- To examine how these lexical items vary from one L1 to another, given that the corpus contains samples from learners of 6 different language backgrounds.

18 STUDY ON FALSE FRIENDS: FINDINGS
False friends do cause difficulties for learners of Spanish. They are mostly found at the initial stages of language learning, that is, at A1 and A2 levels, although they are present across all proficiency levels.
Let’s consider some examples:
- English-Spanish: suburb/suburbio, idiom/idioma, firm/compañía, move/trasladarse, determined/decidido/a, involve/implicar, large/grande
- French-Spanish: campagne/campiña, civilisation/cultura, sentiment/impresión
- Portuguese-Spanish: aula/clase, romance/novela, brincar/bromear, combinar/quedar, balcão/mostrador

19 Table 2. Examples of English-Spanish false friends identified in the corpus
English | Spanish | Corpus example | Students’ level
move | trasladarse | Lawrence nacio en Pincicolla, Florida en 1975 pero movía a Idaho cuando era muy joven. | A1
large | grande | John y los otros hombres que eran en la ceremonia llevaron sombreros largos. | A2
realise | darse cuenta | La comé la comida misteria y realicé que era pollo! | B1
provide | proporcionar | ¿Es posible todavía obtener un lugar en la resendencia universitaria o pudiese aconsejar me con unas agencias que provienen acomodación? | B2
in addition | además | En adición, tuve que ir a la casa de mi hermano. | C1

20 Table 3. Examples of French-Spanish false friends identified in the corpus
French | Spanish | Corpus example | Students’ level
campagne | campiña, campo | Visitamos a Oxford, Dublin y la campaña irlandesa. | A2
se trouver | conocerse | Encontramos en 2001 cuando veni en Pariz por mis estudios. | A2
cuisiner, faire la cuisine | cocinar | A veces hago la cocina en casa. | A2
concours | concurso | Cuando el solo tenía 16 años, fue en la competición de X Factor. | A2
large | ancho/a | Mi maleta es muy larga y de plástica roja. | B1
succès | éxito | esperé sin suceso la salida de mi bolso a la llegada | B1
entendre | oír | Soy madame xxxx habia entendido buenas noticias de vuestra compañia... | C1

21 Table 4. Examples of Portuguese-Spanish false friends identified in the corpus
Portuguese | Spanish | Corpus example | Students’ level
combinar | quedar, concertar | No puedo llegar la hora combinada. | A1
combinar | quedar, concertar | después encontrarme con mis padres en el lugar combinado. | A2
sucesso | éxito | Su marido hico muchas músicas de suceso en Brasil. | A2
contestar | manifestarse, protestar | Escribo les para contestar sobre mi equipaje que no ha venido junto a mí en el viaje. | B1
lecionar | enseñar, impartir clase | Quantos professores lecionan en cada curso? | B2
passar | tener lugar, acontecer | pelicula esa se pasa en una barrio de Salvador de Bahía que nombra la película. | C1
passar | tener lugar, acontecer | La historia se pasa en Brasil en 2012. | B1

22 WORD COINAGES
Interlanguage word | Target language word
hermosidad | hermosura
contadora | contable
opinas | opiniones
excepcionarios | excepcional
excepcionista | excepcional
inhibitó | habitaba
hicimos la decisión | tomamos la decisión

23 WORD COINAGES
Interlanguage word | Target language word
seriosa | seria
inexpectados | inesperados
ensolada | soleada
reservación | reserva
fumante | fumador
solicitación | solicitud
garantir | garantizar

24 CODE-SWITCHING/CODE-MIXING
- “Mi madre es un accountant y ella es muy buena en matemáticas” (A2, English as L1)
- “Me trabajo en un agency” (A1, Russian as L1)
- “a continuar su trabajo en el mundo tercera como un ambassador official de el UN” (A2, English as L1)
- “Entonces fuinos a la Cloud Forest y hacemos el Zip-line y la Tarzan junp” (A2, English as L1)
- “Nosotros fuimos a la carnival de el Lago” (A2, English as L1)
- “Entonves el le compró un anel de diamantes muy hermoso que le custó une pequeña fortuna!” (B1, Portuguese as L1)
- “Vive en un apartamento pero le cuesto mucho pagar la rent” (A1, English as L1)

25 FURTHER WORK
Plans for incorporating new material:
- samples from more learners, including C2-level learners and speakers of more L1s
- spoken data (video recording)
- error-tagging system?

26 FINAL REFLECTIONS
There is still great scope for further development. Learner corpus research has great potential for investigating how learners actually learn a foreign language.
A learner corpus of this nature has multiple applications:
- Research on the acquisition and learning of Spanish as a second language.
- Help for teachers in the planning of lessons.
- Syllabus design.
- Language teaching materials development.
- The field of translation.
- Implementing technological resources for the teaching of Spanish.