Spark OAI Harvester

Last updated August 18, 2017. Created by Peter Murray on August 18, 2017.

The DPLA is launching an open-source tool for fast, large-scale data harvests from OAI repositories. The tool uses the Apache Spark distributed processing engine to speed up and scale up harvesting, and to support complex analysis of the harvested data. It is helping us improve our internal workflows and provide better service to our hubs. The Spark OAI Harvester is freely available, and we hope that others working with interoperable cultural heritage or science data will find uses for it in their own projects.

Technology
Package Type: 
License: 
Development Status: 
Operating System: