Retooling Digitization Workflows at UNCC

Author: Jocelyn Powers

1 Retooling Digitization Workflows at UNCC: Simple Techniques for Project Management and Metadata Creation
Rita Johnston, Digital Production Librarian, UNC Charlotte
Joseph Nicholson, Metadata Librarian, UNC Charlotte

2 UNCC Special Collections & University Archives Overview
Charlotte College became UNC Charlotte in 1965
Large department includes many areas of responsibility not traditional for archives
Permanent Digital Production Librarian position (responsible for digitization) established in July 2016
Cataloging department dismantled in ??
Transitioning from old content management systems to a new one
(There could be several slides here with images of workspaces, digitization equipment, etc.)

3 Staffing Challenges (Rita)
Few permanent staff to take on the enormous task of building digital collections
Work accomplished by a mix of faculty, staff, and student labor
Need processes and tools simple enough to get the work done while still creating quality metadata that can be transformed to MODS and that leverages linked data

4 Technical Challenges (Joseph)
Technically proficient staff, but no coders or techies
Budget constraints = heavy reliance on student labor
Students and paraprofessionals participate in all parts of workflows
Need tools that students, paraprofessionals, and faculty with varying levels of technical sophistication can easily learn and use

5 Migration from two old repositories…
Atkins Digital
New South Voices (CONTENTdm)

6 …to one new repository
Built on Islandora framework
Will eventually be the sole location for digital collections
Solution for both preservation and access

7 Transition to New Metadata Standard: MODS
Serialized in XML
Similar to MARC
Very granular
Hierarchical
Attributes that refine meaning of elements

8 Project 1: Oral History Interviews
Descriptive practices already established, but needed tweaking
Modernized to fit needs of onward within staffing constraints

9 Digital Legacy of Oral History Program

10 Project 2: Motorsports Photographs
Collection newly acquired in 2014
Over 100,000 photographic images (negatives, slides, born-digital)
Had to learn and adapt along the way

11 Motorsports: Brimming with Challenges/Opportunities
We were asked to digitize this collection as an enticement for the donor to donate it
The digitization position (mine) was funded through a different grant that had nothing to do with motorsports, so time spent on the project had to be limited until July 2016
Our first digitization project involving negatives
We did not begin the project with adequate equipment, and all our images were blurry
Donor willing to assist with metadata creation; however, he was perhaps led to have unrealistic expectations about our rate of production

12 Motorsports Metadata Project
100,000+ images
Item-level metadata needed for every image
Controlled vocabulary for all persons identified, locations, racetracks, events
Donor metadata an essential ingredient
Greatest need: getting a huge volume of metadata in spreadsheets into MODS swiftly and painlessly
Also anticipated: splitting large files of MODS records in a wrapper into individual files

13 Oral Histories NACO Project
NAF name forms used for personal names in oral history project, but many don’t have records
Objective: create NACO records for oral histories project names
Problem: how to use skills of non-catalogers working on project to create NACO records?

14 2 Projects, Similar Processes & Tools
Communication & Project Management: G Suite (Google), Trello
Metadata Creation: FAST, OpenRefine, XSLTs

15 G Suite

16 Introducing G Suite to Oral History Workflow
Sheets for tracking progress on interviews, and eventually for staging ingests
Docs to replace old .txt, .xml, and Word files as working documents
We started out slowly and introduced Sheets and Docs first, plus Drive of course
When I started in 2013, our department had a shared Drive space, which became more widely adopted afterward
Advantages:
Flexible set of tools for collaboration
Easy access by users with different permissions
By 2013, increasingly used across UNCC department and library

17 G Suite in the Motorsports Project
Multiple Sheets to capture donor’s original data, digitization information, and metadata (some Excel too)
Advantages mentioned before: accessibility, multiple collaborators at once, versioning

18 Motorsports Metadata Project: Authority Control
Multi-tabbed spreadsheet in Google Docs with controlled terms for people, tracks, race names, series

19 Motorsports Metadata Procedure: Excel
Images loaded into Excel spreadsheet
Each row represents a metadata record
Controlled vocabulary terms come from the Google Docs spreadsheet
Problem: cells with multiple values

20 Motorsports Metadata Procedure: Excel Code
Tab 2 of spreadsheet: controlled values inserted in columns
Tab 1 of spreadsheet: dropdowns insert controlled terms from Tab 2 with a click (but code is needed to allow multiple values per cell)

21 Authorities Project: Google Forms
Data entered in the form is automatically transferred to a Google spreadsheet

22 Creating a Google Form
1. Create form by choosing “blank” form from main screen
2. Choose color palette and/or design template
3. Give form a name
4. Begin creating questions: multiple choice, short answer, and paragraph questions are options
5. Create additional questions by duplicating previous questions; choose whether responses are required

23 Trello

24 Trello for Communication in Oral History
Filled a need for communication that the spreadsheet wasn’t meeting
Visual representation of each interview
Integrates with G Suite
Free, easy to use

25 Trello Card: Oral History

26 Trello for Communication in Motorsports
The Trello board was successful for oral history, so I created one to manage digitization work for Motorsports as well
Many rotating students work on this project; the board lets them come in and know what they can work on without asking a staff member
Students use the board alongside the digitization spreadsheet to do the work

27 Trello Board: Motorsports Metadata Workflow
Comments in a Twitter-like interface make communicating with project staff about particular tasks very simple

28 Transition to New Subject Vocabulary: FAST = Faceted Application of Subject Terminology
Needed: method of assigning subject terms that does not require deep training in LCSH
FAST:
Derived from Library of Congress Subject Headings (LCSH)
Geographical, topical, chronological, and genre facets recorded as separate terms: no lengthy subject strings
Compatible with existing metadata
Users do not have to put complex LCSH strings together
Example (LCSH and FAST): Minorities; Romania--Transylvania; History--Sources

29 FAST: Gains and Losses
Old repository: LCSH
New repository: FAST
Gains:
All project staff, including students, can assign with ease
Keyword searchable FAST interface
Accelerated workflows
Losses:
Some loss of subject specificity
Browse searches compromised

30 FAST: Application in Oral History Project
Simple enough for students and staff to apply
Can keep a running list of commonly used terms
Vast improvement over old keyword subject terms
We still use some terms not in FAST (but not topics)

31 OpenRefine
Freely available, easily downloadable tool for:
Data cleaning or wrangling
Transformation to other data formats
Adding data (reconciliation with data sets)
Interface very like Excel, but with more features

32 Motorsports Metadata Step 1: Create Spreadsheet, Import into OpenRefine, Create Project

33 Motorsports Metadata Step 2: Clean Data
Quick transformations
Faceting
General Refine Expression Language (GREL)
Clustering
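
To make the clustering step concrete, here is a minimal Python sketch of key-collision clustering, the general idea behind fingerprint-style clustering. It is a simplified stand-in written outside OpenRefine, not OpenRefine's own implementation, and the track names are hypothetical sample data.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: trim, lowercase, drop accents and punctuation, sort unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a key; keep only groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical values as they might appear in a donor spreadsheet column.
names = [
    "Charlotte Motor Speedway",
    "charlotte motor speedway ",
    "Speedway, Charlotte Motor",
    "Daytona International Speedway",
]
clusters = cluster(names)
```

Each cluster is a set of variant spellings that a person would then review and merge to one controlled form.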

34 Motorsports Metadata Step 3: Creating the Template
Spreadsheet column titles are inserted inside MODS elements (sandwiched inside jsonize syntax). In the OpenRefine transformation, data in cells in those columns will be mapped to the designated MODS element.
{{jsonize(cells["Object Type"].value)}}

35 Motorsports Metadata Step 3 Continued
Column titles from the spreadsheet must match column titles in the template exactly (case-sensitive)

36 Constant Data
Constant data (data that remains the same from record to record) can be included in the template. Those elements will be applied to all records created through this process.
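
The template idea on the last three slides can be sketched in Python as follows. This is an illustration only: the `{{Column}}` placeholder syntax, the `render` helper, and the mapping of "Object Type" to `mods:genre` are assumptions made for the example, not the project's actual template or OpenRefine's template language.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical one-record template. Double-brace placeholders hold column titles;
# the typeOfResource element is constant data repeated in every record.
TEMPLATE = (
    "<mods:mods>"
    "<mods:titleInfo><mods:title>{{Title}}</mods:title></mods:titleInfo>"
    "<mods:genre>{{Object Type}}</mods:genre>"
    "<mods:typeOfResource>still image</mods:typeOfResource>"
    "</mods:mods>"
)

PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")


def render(template: str, row: dict) -> str:
    """Fill each placeholder from the row; column titles must match exactly."""
    def fill(match):
        column = match.group(1)
        if column not in row:
            raise KeyError(f"template column {column!r} not found in spreadsheet row")
        return escape(row[column])  # escape &, <, > so the XML stays well-formed
    return PLACEHOLDER.sub(fill, template)


# Hypothetical spreadsheet row.
record = render(TEMPLATE, {"Title": "Pit stop & crew", "Object Type": "negative"})
```

A row whose column is titled "object type" instead of "Object Type" raises a `KeyError` here, which mirrors the case-sensitivity warning above.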

37 Splitting Rows with Multiple Values
Spreadsheet cells with multiple values must be split in OpenRefine so that each value is mapped to its own MODS element.

38 Splitting Columns of Data
Columns with multiple values split twice
After splitting, columns are renamed to match template spreadsheet
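
The reason for splitting can be shown with a short Python sketch: each value in a multi-valued cell ends up in its own element. The semicolon separator, the helper names, and the sample names are assumptions for illustration; in the workflow described here the split itself happens in OpenRefine columns.

```python
from xml.sax.saxutils import escape


def split_cell(cell: str, separator: str = ";"):
    """Split a multi-valued cell and drop blank pieces (separator is an assumption)."""
    return [part.strip() for part in cell.split(separator) if part.strip()]


def name_elements(cell: str) -> str:
    """Emit one mods:name element per value in the cell."""
    return "".join(
        f"<mods:name><mods:namePart>{escape(value)}</mods:namePart></mods:name>"
        for value in split_cell(cell)
    )


# Hypothetical cell holding two people identified in one photograph.
values = split_cell("Driver, Alex; Mechanic, Sam ;")
xml_fragment = name_elements("Driver, Alex; Mechanic, Sam ;")
```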

39 Transforming the Data
1. Click on “Export” and then on “Templating”
2. Paste template in left side of “Templating Export” boxes
Callouts on the screenshot: modsCollection root element here; modsCollection closing element here
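
The wrapper idea from the callouts can be sketched in Python: an opening modsCollection element goes before the records and a closing element after them (in OpenRefine's templating dialog these would go in the Prefix and Suffix boxes). The sample records and the `wrap` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
PREFIX = f'<mods:modsCollection xmlns:mods="{MODS_NS}">'  # root element opens here
SUFFIX = "</mods:modsCollection>"                         # and closes here


def wrap(records):
    """Join per-row records and enclose them in the collection wrapper."""
    return PREFIX + "\n" + "\n".join(records) + "\n" + SUFFIX


# Hypothetical per-row output of the template.
records = [
    "<mods:mods><mods:identifier>img_0001</mods:identifier></mods:mods>",
    "<mods:mods><mods:identifier>img_0002</mods:identifier></mods:mods>",
]

# Parsing the result is a quick well-formedness check.
collection = ET.fromstring(wrap(records))
```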

40 Transforming the Data Continued

41 Post-Transformation Problems
Empty elements may appear in the resulting XML file if metadata records contain fewer elements than the template provides for
Other major problem: the large modsCollection file needs to be split into individual MODS records

42 XSLT = Extensible Stylesheet Language Transformations
Language that transforms XML documents
Transform one flavor/standard of metadata into another
Add, remove, or change elements in a metadata file or group of files
Clean up XML documents
Change XML into HTML, or embed elements from an XML document into HTML
Break up a large file into smaller files
XSLT documents can be very complex or simple, depending on the task that needs to be accomplished
(Can cut some content if short on time.)

43 Problem 1: Empty Elements in XML File
Before transformation: empty elements
Simple XSLT built on the identity transformation copies the entire contents of the file and normalizes spacing, leaving out empty elements and attributes
After transformation: empty elements stripped out
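
For readers without an XSLT processor at hand, the same cleanup can be approximated in Python. This is an analog of the stylesheet described above, not the stylesheet itself; `prune_empty` and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET


def prune_empty(element):
    """Drop blank attributes, then remove children left with no text, children, or attributes."""
    for name in [n for n, v in element.attrib.items() if not v.strip()]:
        del element.attrib[name]
    for child in list(element):
        prune_empty(child)  # clean descendants first so emptied parents are caught too
        if len(child) == 0 and not (child.text or "").strip() and not child.attrib:
            element.remove(child)
    return element


# Hypothetical record with an empty topic and a note whose only attribute is blank.
source = (
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Race start</title></titleInfo>"
    "<subject><topic> </topic></subject>"
    '<note type=""/>'
    "</mods>"
)
cleaned = prune_empty(ET.fromstring(source))
```

Because children are processed before their parent is tested, a `subject` that held only an empty `topic` is removed as well.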

44 Problem 2: Large Metadata File Needs to be Split into Individual Records
Before transformation: 100+ records in a single collection-level metadata file
xsl:result-document creates multiple new files and inserts a new root element; its “href” attribute instructs XSLT to name each file with the mods:identifier in that record
After transformation: collection-level file broken up into individual metadata records, each with the identifier for the image as its filename
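
The split can also be sketched in Python as an analog of the xsl:result-document approach. The function name, the choice to skip records without an identifier, and the sample identifiers are assumptions for the example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)  # keep the mods: prefix in the output files


def split_collection(xml_text: str, out_dir) -> list:
    """Write each mods record to its own file, named after its identifier."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in ET.fromstring(xml_text).findall(f"{{{MODS_NS}}}mods"):
        identifier = (record.findtext(f"{{{MODS_NS}}}identifier") or "").strip()
        if not identifier:
            continue  # assumption: records that cannot be named are skipped
        path = out_dir / f"{identifier}.xml"
        ET.ElementTree(record).write(str(path), encoding="utf-8", xml_declaration=True)
        written.append(path.name)
    return written


# Hypothetical two-record collection.
sample = (
    f'<mods:modsCollection xmlns:mods="{MODS_NS}">'
    "<mods:mods><mods:identifier>img_0001</mods:identifier></mods:mods>"
    "<mods:mods><mods:identifier>img_0002</mods:identifier></mods:mods>"
    "</mods:modsCollection>"
)

with tempfile.TemporaryDirectory() as tmp:
    file_names = split_collection(sample, tmp)
    first_root_tag = ET.parse(str(Path(tmp) / "img_0001.xml")).getroot().tag
```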

45 XSLT: Running the Transformation
1. Identify the XML file you want to apply the XSLT to
2. Choose a destination (folder) for the results of the transformation
3. Click “apply associated”
4. Results of the transformation appear in the folder

46 XSLT: Building other stylesheets
1. Metadata standard-agnostic XSLT could apply to any XML metadata
2. New template added to manipulate MODS rights element; MODS namespace has to be dropped in root element
3. Additional template inserted to adjust geographical subjects

47 Google Form Data to Authority Records Step 1
MARCXML metadata template
OpenRefine transformation

48 Google Form Data to Authority Records Step 2
“Cleaning” XSLT applied
MarcEdit used to transform MARCXML file to MARC21

49 Google Form Data to Authority Records: Results
Method of capturing NACO data from other workflows
Records can be transformed into MADS for the repository via a separate XSLT
(Nota bene: records are, of course, checked carefully before submission to NAF)

50 End Product: Motorsports Metadata Display

51 End Product: Oral History Metadata Display

52 Conclusion
Free & cheap tools that have worked for us:
G Suite
Trello
FAST
OpenRefine
XSLT

53 Thank You!
Rita Johnston, Digital Production Librarian, UNC Charlotte
Joseph Nicholson, Metadata Librarian, UNC Charlotte