HELLO CAMPANIA!

Permanent URI for this community

http://hdl.handle.net/20.500.12084/94

The HELLO Campania Corpus is a corpus of sociolinguistic interviews and linguistic tasks with speakers of minority languages. The speakers belong to 6 groups: Filipino, Sri Lankan, Ukrainian, Senegalese, Ghanese and Bangladeshi.
The corpus was created with the following broad aims:

to study immigrant minority language communities in Campania, Italy, in order to examine their linguistic practices with regard to the choice of Italian, the local dialect and their heritage language(s), and how these relate to external variables;
to analyze contact-induced language change in Tagalog, which is spoken as a heritage language in Italy.

The sociolinguistic interviews were conducted in Italian (in 7 cases the participants used English and in 4 cases French), so the corpus contains a sample of Italian spoken as an L2. For the Filipino group, the corpus also includes Tagalog language data. For a description of the project and the methodology used for data collection, see Moro & Di Salvo (forthcoming).
The corpus consists of 307 audio files, for a total of 95h 25m of recordings (public), and the corresponding transcriptions in ELAN (available upon request).
The collection is organized in 6 bundles by group:

The Filipino bundle contains data for 66 speakers: 32 first generation, 28 second generation, 6 homeland. They completed three tasks: the sociolinguistic interview (in Italian; in Tagalog for homeland speakers only), the Frog story (in Tagalog), and a description of video clips (in Tagalog).
The Sri Lankan bundle contains data for 48 speakers: 30 first generation, 18 second generation.
The Ukrainian bundle contains data for 29 speakers: 28 first generation, 1 second generation.
The Senegalese bundle contains data for 16 speakers (first generation only).
The Ghanese bundle contains data for 10 speakers: 3 first generation, 7 second generation.
The Bangladeshi bundle contains data for 11 speakers (first generation only).
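
For readers who want to check or reuse the figures above, the following is a minimal Python sketch that tallies speakers per bundle and per generation. The counts are copied from the bundle descriptions; the dictionary layout, the generation keys ("first", "second", "homeland") and the function names are illustrative assumptions, not part of the corpus distribution.

```python
# Speaker counts per bundle, copied from the bundle descriptions above.
# The dictionary layout and the generation keys are illustrative assumptions.
BUNDLES = {
    "Filipino": {"first": 32, "second": 28, "homeland": 6},
    "Sri Lankan": {"first": 30, "second": 18},
    "Ukrainian": {"first": 28, "second": 1},
    "Senegalese": {"first": 16},
    "Ghanese": {"first": 3, "second": 7},
    "Bangladeshi": {"first": 11},
}


def speakers_per_bundle(bundles):
    """Return the total number of speakers in each bundle."""
    return {name: sum(counts.values()) for name, counts in bundles.items()}


def speakers_per_generation(bundles):
    """Return the total number of speakers in each generation across bundles."""
    totals = {}
    for counts in bundles.values():
        for generation, n in counts.items():
            totals[generation] = totals.get(generation, 0) + n
    return totals


per_bundle = speakers_per_bundle(BUNDLES)
per_generation = speakers_per_generation(BUNDLES)
total_speakers = sum(per_bundle.values())

if __name__ == "__main__":
    for name, n in per_bundle.items():
        print(f"{name}: {n} speakers")
    print(f"By generation: {per_generation}")
    print(f"Total: {total_speakers} speakers")
```

On the figures listed here, the per-bundle totals match the stated bundle sizes and sum to 180 speakers across the 6 bundles.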

Each bundle contains a metadata file providing information about:

generation, age, gender, educational level and occupation of the participant
linguistic background of the participant (L1(s) and other languages known)
type of recording (sociolinguistic interview, Frog story, description of video clips)
main language used in the recording
date of the recording
length of the recording
who collected the data and who transcribed them
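
As an illustration of how one row of such a metadata file might be modelled when working with the corpus, here is a minimal Python sketch. The class name, field names, types and the example values are all hypothetical; the actual file format and column names of the metadata files are not specified here and may differ.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List

# Recording types named in the list above.
RECORDING_TYPES = {
    "sociolinguistic interview",
    "Frog story",
    "description of video clips",
}


@dataclass
class RecordingMetadata:
    """One recording's metadata. All field names are hypothetical."""

    generation: str            # e.g. "first", "second", "homeland"
    age: int
    gender: str
    educational_level: str
    occupation: str
    recording_type: str        # one of RECORDING_TYPES
    main_language: str         # main language used in the recording
    recording_date: date
    length: timedelta
    collected_by: str
    transcribed_by: str
    first_languages: List[str] = field(default_factory=list)   # L1(s)
    other_languages: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject recording types that are not in the documented list.
        if self.recording_type not in RECORDING_TYPES:
            raise ValueError(f"unknown recording type: {self.recording_type!r}")


# Invented placeholder values, for illustration only; not taken from the corpus.
example = RecordingMetadata(
    generation="second",
    age=25,
    gender="F",
    educational_level="secondary",
    occupation="student",
    recording_type="Frog story",
    main_language="Tagalog",
    recording_date=date(2020, 1, 1),
    length=timedelta(minutes=12, seconds=30),
    collected_by="collector A",
    transcribed_by="transcriber B",
    first_languages=["Tagalog"],
    other_languages=["Italian", "English"],
)
```

Validating the recording type on construction is one design choice among several; it simply makes mislabelled rows fail early when a metadata file is loaded.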