This page uses Mermaid flowcharts to document the metadata mappings for our VAULT migration. The left-side nodes are EQUELLA XML paths, or JSON paths in the non-XML diagram. Each progresses to the Invenio field on the right, which has its cardinality in parentheses.
Rounded nodes and dotted lines represent fields we're not migrating; see the bottom of the non-XML mapping for an example.
---
title: MODS (/xml/mods) Mappings
---
flowchart LR
TITLE[titleInfo/title] --> |titles after the 1st, type based on @type| ADTITLE["Additional Titles (0-n)"]
TITLE --> |Split author out of artists books| CREATORS
SUBTITLE[titleInfo/subtitle] -->|type=subtitle| ADTITLE
PARTNUMBER[titleInfo/partNumber] -->|type=other, Number prefix| ADTITLE
ABSTRACTS[abstract] ---> |1st one| DESCRIPTION["Description (0-1)"]
ABSTRACTS --> |abstracts after the 1st, type=abstract| ADDLDESC["Additional Descriptions (0-n)"]
NAMENOTE[name/subNameWrapper/description] --> CREATORNOTE["Combined into creator note"]
NAMEAFFILIATION[name/subNameWrapper/affiliation] -->|Non-CCA affiliation| CREATORNOTE
NAMECONSTITUENT[name/subNameWrapper/constituent] -->|CCA type, e.g. faculty/student| CREATORNOTE
NAMEDEPARTMENT[name/subNameWrapper/department] -->|CCA departments| CREATORNOTE
NAMEGRAD[name/subNameWrapper/gradDate] -->|Graduation/employment dates| CREATORNOTE
CREATORNOTE -->|type=other, Creator note prefix| ADDLDESC
NOTES[noteWrapper/note] -->|See MODS note types, type=other| ADDLDESC
PHYSDESCNOTE[physicalDescriptionNote] -->|type=other| ADDLDESC
BOOKCHAPTER["Faculty Research book chapter"] -->|"'Published in title', type=series"| ADDLDESC
TOC[tableOfContents] -->|type=table-of-contents| ADDLDESC
RTYPE[typeOfResourceWrapper/typeOfResource] -->|Use the 1st in the absence of formBroad| TYPE["Resource Type (1)"]
FORMBROAD[physicalDescription/formBroad] -->|Prefer specificity over typeOfResource| TYPE
NAMES[name/namePart] -->|Parse, org or person, 1 or many| NAMEDETAILS["roleTerm -> role, subName affiliations"]
NAMEDETAILS --> CREATORS["Creators (1-n)"]
DATECREATED[origininfo/dateCreatedWrapper/dateCreated] -->DATELOGIC["Use MODS if present else item.dateCreated"]
SEMCREATED[origininfo/semesterCreated] -->DATELOGIC
RELITEMDATE[relatedItem/part/date] -->DATELOGIC
DATELOGIC -->PUBDATE["Publication Date (1)"]
DATECAPT["origininfo/dateCaptured"] -->|date.type=collected| DATES["Dates (0-n)"]
DATEOTHER["origininfo/dateOtherWrapper/dateOther"] -->|date.type=other| DATES
%% CONTRIBUTORS["Contributors (0-n)"]
ACCESSCONDITION["accessCondition"] -->|default to copyright if no license| RIGHTS["Rights (Licenses) (0-n)"]
FORMBROAD --> SUBJECTSMAP["Subjects Map Lookup"]
FORMSPECIFIC["physicalDescription/formSpecific"] --> SUBJECTSMAP
GENRE["genreWrapper/genre"] --> SUBJECTSMAP
MODSUBJECT["subject"] --> SUBJECTSMAP
PHOTOCLASSIFICATION["photoClassification (CCA/C Subject)"] -->SUBJECTSMAP
SUBJECTSMAP -->|id if found, keyword otherwise| SUBJECTS["Subjects (0-n)"]
OIPUB["originInfo/publisher"] --> PUBLISHER["Publisher (0-1)"]
DBR["relatedItem/title = Design Book Review"] --> |different depending on date| PUBLISHER
HOSTCOLLECTION["relatedItem[@type=host]/title"] -->|Libraries Subcollections e.g. Mudflats| COMM["Communities (1-n)"]
RELATEDURL["relateditem/location"] -->|verify URL, use relateditem@type for relation_type| REL["Related Identifiers/Works (0-n)"]
RELATEDSTANDARDNUM["relatedItem[@type=host]/identifier[@type=ISSN|ISBN]"] -->|relation_type=ispublishedin|REL
DOI["identifier@type=doi"] -->|relation_type=isidenticalto| REL
LOCATIONURL["location/url"] -->|verify URL, relation_type=isidenticalto| REL
EXTENT["physicalDescription/extent"] --> SIZES["Sizes (0-n)"]
LOCATION["location"] --> |physicalLocation->place, copyInformation->description| LOCATIONS["Locations (0-n)"]
ID[identifier] -->|"@type=DOI in Faculty Research collection"| ALTID["Alternate Identifiers (0-n)"]
We are not using the Funders or References fields. Few (if any) VAULT items have funding information, and none have the identifiers that Invenio expects. We don't have reference lists for any items.
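
The "Subjects Map Lookup" step in the diagram above (id if found, keyword otherwise) could look something like the sketch below. The contents of `SUBJECTS_MAP`, the example URL, the function names, and the case-insensitive matching are all assumptions made for illustration; the real subjects map is not shown on this page.

```python
# Hypothetical lookup table: the real subjects map is not shown on this page.
SUBJECTS_MAP = {
    "photography": "https://example.org/subjects/photography",
}


def map_subject(term, subjects_map=SUBJECTS_MAP):
    """Return a subject with an id if the term is in the map, else a keyword."""
    cleaned = term.strip()
    # Assumption: terms are matched case-insensitively.
    key = cleaned.lower()
    if key in subjects_map:
        return {"id": subjects_map[key]}
    return {"subject": cleaned}


def map_subjects(terms):
    """Map every term, skipping blanks and dropping duplicates in order."""
    subjects = []
    for term in terms:
        if not term or not term.strip():
            continue
        subject = map_subject(term)
        if subject not in subjects:
            subjects.append(subject)
    return subjects
```

All five source fields (formBroad, formSpecific, genre, subject, photoClassification) would feed their values into `map_subjects` as one list, so a term that appears in several fields yields a single subject.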
---
title: Local (/xml/local) Mappings
---
flowchart LR
USAGEREQUESTS["usageRequestWrapper/usageRequest"] --> ADDLDESC["Additional Descriptions (0-n)"]
ASERIESV["archivesWrapper/series & subseries"] --> ASERIESI["cca:archive_series"]
VLDEPT["department"] --> CDEPTCODE["cca:course.department"]
VCIDEPT["courseInfo/department"] --> CDEPT["cca:course.department_code"]
VFACULTY["courseInfo/faculty"] --> CINSTRUCTORS["cca:course.instructors"]
VFACULTY --> |If we do not have a mods/name| CREATORS["Creators (1-n)"]
VSECTION["courseInfo/section"] --> CSECTION["cca:course.section"]
VSEMESTER["courseInfo/semester"] --> CTERM["cca:course.term"]
VSECTION --> CSECTIONCALCID["cca:course.section_calc_id constructed from section & term"]
VSEMESTER --> CSECTIONCALCID
VCTITLE["courseInfo/title"] --> CTITLE["cca:course.title"]
VIEWLEVEL["viewLevel"] --> |public -> public, anything else -> restricted| ACCESS["Access"]
courseworktype[courseWorkWrapper/courseWorkType] --> |TODO if we have no mods/typeOfResource, map 1st| TYPE["Resource Type (1)"]
In short, /xml/local/courseInfo maps to a cca:course custom field while /xml/local/archivesWrapper maps to a cca:archive custom field.
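
The viewLevel rule in the diagram above (public stays public, anything else becomes restricted) is small enough to sketch directly. The function name and the choice to apply one level to both the record and its files are assumptions; the diagram only says "Access".

```python
def map_access(view_level):
    """Map a local viewLevel value to an access setting.

    "public" maps to public; any other value, including a missing one,
    maps to restricted.
    """
    normalized = (view_level or "").strip().lower()
    level = "public" if normalized == "public" else "restricted"
    # Assumption: the record and its files share the same access level.
    return {"record": level, "files": level}
```

Defaulting to restricted means an unexpected or empty viewLevel never exposes an item by accident.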
---
title: Journal Articles
---
flowchart TB
GENRE["mods/genre or mods/genreWrapper/genre"] --> |Check for article, journal article, or journal| GENRECHECK{"Is article?"}
GENRECHECK -->|Yes| RELATEDHOST["mods/relatedItem[@type=host]"]
GENRECHECK -->|No| NOJOURNAL["No journal custom field"]
RELATEDHOST --> JOURNALTITLE["title or titleInfo/title"]
RELATEDHOST --> ISSN["identifier[@type=issn]"]
RELATEDHOST --> VOLISSUE["part/detail[@type=volume|number]"]
RELATEDHOST --> PAGES["part/extent[@unit=page]"]
JOURNALTITLE --> JFIELD["journal.title"]
ISSN --> JFIELD2["journal.issn"]
VOLISSUE --> JFIELD3["journal.volume & journal.issue"]
PAGES --> JFIELD4["journal.pages"]
JFIELD --> CUSTOMFIELD["journal custom field"]
JFIELD2 --> CUSTOMFIELD
JFIELD3 --> CUSTOMFIELD
JFIELD4 --> CUSTOMFIELD
We use the journal custom field to capture journal information related to articles in the Faculty Research and Open Access Journal Articles (DBR) collections.
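
As a rough sketch of the journal flow above, the function below checks the genre and then reads the host relatedItem using Python's standard `xml.etree.ElementTree`. Several things here are assumptions rather than facts from this page: the function and constant names, that the XML has no namespaces, that genre values must match exactly after lowercasing, that `detail[@type=volume]` feeds `journal.volume` while `detail[@type=number]` feeds `journal.issue`, and that each value is the element's own text.

```python
import xml.etree.ElementTree as ET

# Genre values that count as an article (assumed to match exactly once lowercased).
ARTICLE_GENRES = {"article", "journal article", "journal"}


def text_of(element, paths):
    """Return the first non-empty text found at any of the given paths."""
    for path in paths:
        found = element.find(path)
        if found is not None and found.text and found.text.strip():
            return found.text.strip()
    return None


def journal_field(mods):
    """Build the journal custom field dict, or None if it does not apply."""
    genres = mods.findall("genre") + mods.findall("genreWrapper/genre")
    if not any((g.text or "").strip().lower() in ARTICLE_GENRES for g in genres):
        return None  # not an article: no journal custom field
    host = mods.find("relatedItem[@type='host']")
    if host is None:
        return None
    candidates = {
        "title": ["title", "titleInfo/title"],
        "issn": ["identifier[@type='issn']"],
        "volume": ["part/detail[@type='volume']"],
        "issue": ["part/detail[@type='number']"],
        "pages": ["part/extent[@unit='page']"],
    }
    journal = {}
    for key, paths in candidates.items():
        value = text_of(host, paths)
        if value:
            journal[key] = value
    return journal or None
```

Only keys with values are included, so an article whose host item lacks an ISSN or page range still gets a partial journal field.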
---
title: Non-XML Mappings
---
flowchart LR
NAME["item.name"] --> |Use 'Untitled' if absent| TITLE["Title (1)"]
VAULTID["item.uuid + item.version"] -->|"isnewversionof VAULT URL"| REL["Related Identifiers/Works (0-n)"]
OWNER["item.owner.id"] -->|TODO set_owner script| PARENT["parent.owned_by.id"]
COLLABORATORS["item.collaborators.id"] --> |TODO share item with them?| PARENT
COLLECTION["item.collection.uuid"] -->|see also mods/relatedItem@type=host| COMM["Communities (1-n)"]
DATECREATED["item.dateCreated"] -->|Use item creation timestamp if no date in MODS| PUBDATE["Publication Date (1)"]
ATTACHMENTS[item.attachments] -->|guess MIME type from filenames| FORMATS["Formats (0-n)"]
ATTACHMENTS -->|File, HTML, & Zip attachments| FILES["Files (0-n)"]
ATTACHMENTS -->|URL, YouTube attachments are related works| REL
STATUS["item.status"] -->|Publish only if item.status = 'live'| PUBLISH["Don't publish draft/archived items"]
There's additional, mostly administrative, metadata in the EQUELLA item JSON outside the XML. We represent file/attachment operations here, too.
While VAULT has version information, we plan only to migrate the most recent, live versions of items, so displaying the version number with no way to access prior iterations can only lead to confusion. The version number will still be accessible in the copies of VAULT metadata we store on migrated items.
We do not use displayFields, displayOptions, DRM, modifiedDate, navigation, rating, or thumbnail in the migration. Most of these are specific to EQUELLA's display features.
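
A few of the non-XML rules in the diagram above (the 'Untitled' fallback, the publication date fallback, and skipping items that are not live) can be sketched together. The function name, the output keys, and the assumption that `item.dateCreated` is an ISO 8601 timestamp beginning with `YYYY-MM-DD` are illustrative and not taken from the migration code.

```python
def map_item(item, mods_date=None):
    """Return partial metadata for a live item, or None for any other status.

    `item` is the EQUELLA item JSON as a dict; `mods_date` is the date
    already derived from MODS, if there was one.
    """
    if item.get("status") != "live":
        return None  # draft and archived items are not published
    return {
        "title": item.get("name") or "Untitled",
        # Assumption: dateCreated is an ISO 8601 timestamp, so its first
        # 10 characters are YYYY-MM-DD.
        "publication_date": mods_date or item["dateCreated"][:10],
    }
```

Passing the MODS date in as an argument keeps the precedence explicit: the item creation timestamp is used only when MODS supplies no date.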