This page uses Mermaid flowcharts to show the metadata mappings for our VAULT migration. The left-side nodes are EQUELLA XML paths (or JSON paths in the non-XML diagram). Each leads to the Invenio field on the right, which shows its cardinality in parentheses.
Rounded nodes and dotted lines represent fields we're not migrating. See the bottom of the non-XML mapping for an example.
---
title: MODS (/xml/mods) Mappings
---
flowchart LR
TITLE[titleInfo/title] --> |titles after the 1st, type based on @type attribute| ATITLETYPE["type defaults to other"]
TITLE --> |Split author out of artists books| CREATORS
SUBTITLE[titleInfo/subtitle] -->|type = subtitle| ATITLETYPE
ATITLETYPE --> ADTITLE["Additional Titles (0-n)"]
ABSTRACTS[abstract] ---> |1st one| DESCRIPTION["Description (0-1)"]
ABSTRACTS --> |abstracts after the 1st, type = abstract| ADDLDESC["Additional Descriptions (0-n)"]
NOTES[noteWrapper/note] -->|See MODS note types, type = other| ADDLDESC
RTYPE[typeOfResourceWrapper/typeOfResource] -->|Use the 1st one| TYPE["Resource Type (1)"]
FORMBROAD[physicalDescription/formBroad] -->|In the absence of typeOfResource| TYPE
NAMES[name/namePart] -->|Parse, org or person, 1 or many| NAMEDETAILS["roleTerm -> role, subName affiliations"]
NAMEDETAILS --> CREATORS["Creators (1-n)"]
DATECREATED[origininfo/dateCreatedWrapper/dateCreated] -->DATELOGIC["Use MODS if present else item.dateCreated"]
SEMCREATED[origininfo/semesterCreated] -->DATELOGIC
DATELOGIC -->PUBDATE["Publication Date (1)"]
DATECAPT["origininfo/dateCaptured"] -->|"date.type = collected"| DATES["Dates (0-n)"]
DATEOTHER["origininfo/dateOtherWrapper/dateOther"] -->|"date.type = other"| DATES
%% CONTRIBUTORS["Contributors (0-n)"]
ACCESSCONDITION["accessCondition"] -->|default to copyright if no license| RIGHTS["Rights (Licenses) (0-n)"]
MODSUBJECT["subject"] -->|look up in subjects map| SUBJECTS["Subjects (0-n)"]
FORMSPECIFIC["physicalDescription/formSpecific"] -->|plain keyword TODO should we map these?| SUBJECTS
GENRE["genreWrapper/genre"] -->|look up in subjects map| SUBJECTS
OIPUB["originInfo/publisher"] --> PUBLISHER["Publisher (0-1)"]
DBR["relatedItem/title = Design Book Review"] --> |different depending on date| PUBLISHER
HOSTCOLLECTION["relatedItem[@type=host]/title"] -->|Libraries Subcollections e.g. Mudflats| COMM["Communities (1-n)"]
EXTENT["physicalDescription/extent"] --> SIZES["Sizes (0-n)"]
%% We don't have structured location information & it does not display in Invenio, skip
%% LOCATIONS["Locations. We only have place names, no IDs or coordinates. (0-n)"]
ID[identifier] -->|@type = DOI in Faculty Research collection| ALTID["Alternate Identifiers (0-n)"]
We are not using the Funders or References fields. Few (no?) VAULT items have funding information and none have the identifiers that Invenio expects. We don't have References lists for any items.
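
The abstract and note rules in the MODS diagram can be sketched in Python as follows. This is a minimal illustration, not our migration code: the function name `map_descriptions` and its list-of-strings inputs are hypothetical, and the output shape (`description`, `additional_descriptions` with a `type.id`) is an assumption about the Invenio record format.

```python
def map_descriptions(abstracts, notes):
    """Sketch: split MODS abstracts and notes into Invenio description fields.

    abstracts, notes: lists of strings already extracted from the XML
    (hypothetical inputs; the real extraction step is not shown).
    """
    # Drop empty or whitespace-only values before applying the rules
    abstracts = [a.strip() for a in abstracts if a and a.strip()]
    notes = [n.strip() for n in notes if n and n.strip()]

    metadata = {}
    if abstracts:
        # 1st abstract becomes the single Description (0-1)
        metadata["description"] = abstracts[0]

    # Abstracts after the 1st are Additional Descriptions with type = abstract
    additional = [
        {"description": a, "type": {"id": "abstract"}} for a in abstracts[1:]
    ]
    # Notes are Additional Descriptions with type = other
    additional += [{"description": n, "type": {"id": "other"}} for n in notes]

    if additional:
        # Assumed field name and shape for Additional Descriptions (0-n)
        metadata["additional_descriptions"] = additional
    return metadata
```

An item with two abstracts and one note would yield one `description` and two `additional_descriptions` entries, while an item with neither yields an empty dict.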
---
title: Local (/xml/local) Mappings
---
flowchart LR
ASERIESV["archivesWrapper/series, archivesWrapper/subseries"] --> ASERIESI["cca:archive_series custom field"]
VLDEPT["department"] --> CDEPT["cca:course.department"]
VCIDEPT["courseInfo/department"] --> CDEPTCODE["cca:course.department_code"]
VFACULTY["courseInfo/faculty"] --> CINSTRUCTORS["cca:course.instructors"]
VFACULTY --> |If we do not have a mods/name| CREATORS["Creators (1-n)"]
VSECTION["courseInfo/section"] --> CSECTION["cca:course.section"]
VSEMESTER["courseInfo/semester"] --> CTERM["cca:course.term"]
VSECTION --> CSECTIONCALCID["cca:course.section_calc_id constructed from section & term"]
VSEMESTER --> CSECTIONCALCID
VCTITLE["courseInfo/title"] --> CTITLE["cca:course.title"]
VIEWLEVEL["viewLevel"] --> |public -> public, anything else -> restricted| ACCESS["Access"]
courseworktype[courseWorkWrapper/courseWorkType] --> |TODO if we have no mods/typeOfResource, map 1st| TYPE["Resource Type (1)"]
Our /local metadata is often mapped to custom fields.
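
As one example from the Local diagram, the `viewLevel` rule (public stays public, anything else becomes restricted) might look like the sketch below. The function name `access_level` is hypothetical, and treating a missing value as restricted and comparing case-insensitively are assumptions, not confirmed behavior.

```python
def access_level(view_level):
    """Sketch: map a VAULT viewLevel value to an Invenio access level."""
    # Assumption: compare case-insensitively and ignore surrounding whitespace
    if view_level and view_level.strip().lower() == "public":
        return "public"
    # Anything else, including a missing viewLevel (assumption), is restricted
    return "restricted"
```

Defaulting to restricted means an unexpected or empty value never exposes an item by accident.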
---
title: Non-XML Mappings
---
flowchart LR
NAME["item.name"] --> |Use 'Untitled' if absent| TITLE["Title (1)"]
VAULTID["item.uuid + item.version"] -->|"'is new version of' VAULT URL"| REL["Related Identifiers/Works (0-n)"]
OWNER["item.owner.id"] -->|TODO set_owner script| PARENT["parent.owned_by.id"]
COLLABORATORS["item.collaborators.id"] --> |TODO share item with them?| PARENT
COLLECTION["item.collection.uuid"] -->|see also mods/relatedItem@type=host| COMM["Communities (1-n)"]
DATECREATED["item.dateCreated"] -->|Use item creation timestamp if no date in MODS| PUBDATE["Publication Date (1)"]
ATTACHMENTS[item.attachments] -->|guess MIME type from filenames| FORMATS["Formats (0-n)"]
ATTACHMENTS -->|File, HTML, & Zip attachments| FILES["Files (0-n)"]
ATTACHMENTS -->|URL, YouTube attachments are related works| REL
STATUS["item.status"] -->|if item.status = 'live'| PUBLISH["Don't publish draft/archived items"]
DISPLAYFIELDS("item.displayFields")
DISPLAYOPTIONS("item.displayOptions")
DRM("item.drm")
MODIFIED("item.modifiedDate")
NAVIGATION("item.navigation")
RATING("item.rating")
THUMBNAIL("item.thumbnail")
There's additional, mostly administrative, metadata in the EQUELLA item JSON outside the XML. We represent file/attachment operations here, too.
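
For the Formats mapping, guessing MIME types from filenames can be done with Python's standard `mimetypes` module. The sketch below is illustrative only: `guess_formats` is a hypothetical helper, and skipping unrecognized filenames and de-duplicating the results are assumptions about the desired behavior.

```python
import mimetypes


def guess_formats(filenames):
    """Sketch: build a de-duplicated list of MIME types from attachment filenames."""
    formats = []
    for name in filenames:
        # guess_type returns a (type, encoding) tuple; type is None if unknown
        mime, _encoding = mimetypes.guess_type(name)
        # Assumption: skip files we cannot identify and list each type once
        if mime and mime not in formats:
            formats.append(mime)
    return formats
```

Because the guess relies only on the file extension, a misnamed attachment would receive the wrong format.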
While VAULT has version information, we plan to migrate only the most recent, live version of each item, so displaying the version number with no way to access prior versions would only cause confusion. The version number will still be accessible in the copies of VAULT metadata we store on migrated items.
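
Selecting the versions to migrate could be sketched as below. This assumes each item is a dict with `uuid`, `version`, and `status` keys mirroring the JSON paths in the diagram, and it interprets "most recent" as the highest version number among live versions. The function name `latest_live_versions` is hypothetical.

```python
def latest_live_versions(items):
    """Sketch: keep the highest-numbered live version of each item."""
    latest = {}
    for item in items:
        # Draft, archived, and other non-live versions are not migrated
        if item.get("status") != "live":
            continue
        current = latest.get(item["uuid"])
        # Assumption: "most recent" means the highest version number
        if current is None or item["version"] > current["version"]:
            latest[item["uuid"]] = item
    return list(latest.values())
```

Items with no live version are dropped entirely, which matches the rule of not publishing draft or archived items.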