Data Curation and Preservation Issues (Facilities, Digital Repository Systems and High-Performance Computing)
The
rapid growth of digital research data has transformed the way knowledge is
created, shared and preserved. Across disciplines, researchers generate vast
quantities of data through experiments, simulations, observations and
computational analyses. As a result, data curation and preservation have become
critical components of the research lifecycle. The key factors influencing
successful data curation are the availability of appropriate facilities, robust
digital repository systems and advanced computational infrastructure such as
high-performance computing (HPC). However, each of these components presents
unique challenges that institutions must address.
One
of the fundamental requirements for effective data preservation is the
availability of adequate facilities and infrastructure. Research institutions
need reliable storage environments, backup systems, secure networks and
disaster recovery mechanisms to protect digital assets (Masenya & Ngulube, 2019; Shah et al., 2021). However,
maintaining such facilities can be costly and technically demanding. According
to Zareef and Jabeen (2025), many
institutions struggle to balance increasing data volumes with the
infrastructure needed to manage them effectively. Without secure storage
facilities and redundancy mechanisms, digital collections are at risk of
corruption, accidental deletion or catastrophic loss (Shah et al., 2021).
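One common safeguard against silent corruption and accidental deletion is a periodic fixity check: a checksum is recorded for every file at ingest and later compared against the file as it currently exists. The following is a minimal sketch of that idea, not a description of any particular institution's workflow. The function names (`sha256_of`, `build_manifest`, `audit`) and the in-memory manifest format are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root with a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Recorded at ingest but no longer present.
        "missing": sorted(set(manifest) - set(current)),
        # Present, but the content no longer matches the recorded checksum.
        "corrupted": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Present now but never recorded in the manifest.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In practice the manifest would be stored separately from the data it describes, ideally alongside a redundant copy, so that a single storage failure cannot damage both.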
Closely
linked to physical and digital infrastructure are digital repository systems,
which serve as platforms for storing, managing and disseminating research data.
Repositories play a crucial role in ensuring that datasets remain discoverable,
accessible and reusable over time (Shah et al., 2021). Effective
repositories support metadata creation, version control, access management and
preservation workflows. However, repositories often lack clear governance
structures, resulting in inconsistent metadata practices and poor preservation
planning (Rothfritz et al., 2026). This
reduces the discoverability of data and hampers the long-term usability of stored
content. In addition, keeping repository platforms such as DSpace or EPrints up to date
is a recurring problem, and outdated installations lead to security and usability issues.
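Inconsistent metadata practices can be made visible with a simple completeness check run over repository records before or after deposit. The sketch below is illustrative only: the list of required fields is a hypothetical local policy that borrows element names from Dublin Core, and the record structure (plain dictionaries) is an assumption rather than the export format of DSpace, EPrints or any other platform.

```python
# Hypothetical minimum metadata policy; the field names follow Dublin Core
# elements, but which ones are mandatory is an assumption for this example.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier", "rights")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in one record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def completeness_report(records: list[dict]) -> dict[str, list[str]]:
    """Map each incomplete record's identifier (or list position) to its gaps."""
    report = {}
    for index, record in enumerate(records):
        gaps = missing_fields(record)
        if gaps:
            key = str(record.get("identifier") or f"record-{index}")
            report[key] = gaps
    return report
```

A report of this kind gives repository managers a concrete list of records to repair, which is one small way to turn a governance policy into routine practice.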
The increasing use of high-performance computing (HPC) has
introduced both opportunities and challenges for data curation. HPC systems
enable researchers to process and analyze massive datasets generated through
scientific simulations, artificial intelligence applications and data-intensive
research (Almeida & Okon, 2025). While these
systems significantly enhance research capacity, they also produce
unprecedented volumes of data that require effective management and
preservation strategies. Several technical and organizational obstacles
continue to affect the integration of HPC and digital preservation systems.
These include insufficient storage capacity, difficulties in transferring large
datasets, lack of standardized metadata practices and inadequate coordination
among researchers, information professionals and information technology
specialists (Almeida & Okon, 2025; Yoon et al., 2025). As Arms (2008) observed, modern
science has entered a data-intensive era in which data volumes often exceed
traditional storage and management capabilities. As a result, institutions must
develop integrated approaches that connect HPC environments with data
repositories and preservation infrastructures.
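The difficulty of moving large datasets between an HPC system and a repository can be estimated with simple arithmetic: transfer time is the dataset size in bits divided by the effective network throughput. The sketch below is a rough planning aid under stated assumptions. The function name, the decimal definition of a terabyte (10^12 bytes) and the default efficiency factor of 0.7 are all illustrative choices, not measured values.

```python
def transfer_time_hours(size_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move size_tb terabytes over a link.

    size_tb    : dataset size in decimal terabytes (1 TB = 10**12 bytes)
    link_gbps  : nominal link speed in gigabits per second
    efficiency : assumed fraction of the nominal speed actually achieved
    """
    if size_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600
```

Under these assumptions, a 100 TB dataset sent over a 10 Gbit/s link needs roughly 22 hours at the full nominal rate and about 32 hours at 70% efficiency, which illustrates why transfer capacity has to be planned together with storage capacity.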
To address these issues, institutions are increasingly
adopting best practices such as implementing trusted digital repositories,
utilizing cloud-based storage solutions, applying FAIR (Findable, Accessible,
Interoperable and Reusable) data principles and investing in scalable
cyberinfrastructure (Wilkinson et al., 2016). Advances in
cloud computing, automated metadata generation and distributed storage
technologies are also helping institutions improve preservation efficiency and
resilience.
In conclusion, facilities, digital repository systems and
high-performance computing are essential components of modern data curation and
preservation. While they provide the infrastructure needed to manage and
safeguard valuable research data, they also introduce significant technical,
financial and organizational challenges. Future success in digital preservation
will depend on sustained investment in infrastructure, stronger collaboration
among stakeholders and continued adoption of innovative technologies that
support long-term data stewardship.
https://www.youtube.com/watch?v=64-mBFdWTtM
References
Almeida, F., & Okon, E. (2025). Assessing the impact of
high‑performance computing
on digital transformation: Benefits, challenges, and size‑dependent differences. The Journal of Supercomputing.
https://doi.org/10.1007/s11227-025-07281-z
Arms, W. Y. (2008). Cyberscholarship: High Performance
Computing Meets Digital Libraries. Journal of Electronic Publishing, 11(1).
https://doi.org/10.3998/3336451.0011.103
Masenya & Ngulube. (2019). Digital preservation
practices in academic libraries in South Africa in the wake of the digital
revolution. South African Journal of Information Management, 21(1), 1–9. https://doi.org/10.4102/sajim.v21i1.1011
Rothfritz, L., Matthias, L., Pampel, H., & Wrzesinski, M.
(2026). Current challenges and future directions for institutional
repositories: A systematic literature review. An Annual Review of Information
Science and Technology (ARIST) paper. Journal of the Association for
Information Science and Technology.
Shah, U. A., Hussain, M., Saddiqa, M., & Yar, M. S.
(2021). Problems and Challenges in the Preservation of Digital Contents: An Analytical Study.
Library Philosophy and Practice.
https://digitalcommons.unl.edu/libphilprac/5628
Wilkinson, M. D., et al. (2016). The FAIR Guiding
Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
Yoon, A., Kim, J., & Donaldson, D. R. (2025). Big data
curation framework: Curation actions and challenges. Journal of Information Science, 51(1), 205–223. https://doi.org/10.1177/01655515221133528
Zareef, M., & Jabeen, M. (2025). Systematic literature
review of digital curation services in academic libraries (2001–2023): A
global perspective. Journal of Information Science, 1–29.
https://doi.org/10.1177/01655515241305348
Reader comment: Power surges are often underrated, but they have caused catastrophic data losses on servers connected directly to the power grid.