Trusted Repository Certification and ICPSR


All right, great. Well, thanks for having
me for this webinar, and thanks for attending. So today I’m going to talk
about trusted repository certification, especially in the context of the recent
certification we received from CoreTrustSeal. We received that just a
couple of months ago and we’re very excited about it, and so today I’m going to
talk about why trusted repository certification matters for both us and
our entire community; what certification involves, what we’ve
gone through, and what the process looks like, so if you’re looking at going through
certification yourself as a repository you can hopefully learn from the
experience we’ve had, or if you’re a user you’ll understand
what’s involved with repository certification. And then lastly I’ll speak
about the benefits that we’ve seen for us and our stakeholders. So the first
thing to discuss is why repository certification matters. So for us at
ICPSR we’ve been around since 1962, we started with 22 members, now we’re
this worldwide consortium of, I want to say, over 780 members. And we
cover a wide range of disciplines across the social and behavioral
sciences. We have many, many files, 10,000+ studies, and many of these are
restricted studies, and so we do a lot to protect confidentiality and ensure
the data are stored correctly. We also provide services on top of those data. We
curate them, we add related citations through our bibliography of data-related
literature, and we provide usage metrics, teaching modules, and the like.
There are lots and lots of users; we have approximately 60,000 active MyData
accounts, and these collections span thematic holdings from addiction
and HIV to child care and early education to health and medical care. So we
have lots of data, and we’ve been doing this for a while. First, we think it’s really
important to safeguard our collections, and going through repository
certification can help us prepare for that and continue to safeguard them well.
Even with 55 years of experience and knowledge and staff, we find it still
useful to benchmark our own practices against these common standards, and
especially standards that hold true across disciplines, so other
repositories in the biomedical or other sciences have gone through these same
certification processes and used very similar standards regardless of data
type. We also find that it’s important to improve our internal documentation,
and repository certification can help. It improves our own transparency
both internally and to our users, and another aspect is that going through
the certification process shows that we care, that we’re not just making these
things up on our own and that we’re actually going through these different
processes to improve ourselves for our users. That’s for ICPSR. As for other
stakeholders, we have several funders, as I think many other
repositories do. This is a list of many of the funders that we’ve
had or currently have relationships with. And these funders are trusting us as
a repository to safely and securely share and preserve their data. This is a
chart from a few years ago; it was presented
to the Advisory Committee to the Director of the US National Institutes of Health
by the Associate Director for Data Science in 2016, and what this shows
is that funders make an enormous investment in data repositories. Here
it’s over a billion US dollars. And so by making this investment they’re
placing a lot of trust in us, and going through these certification processes
helps show to the funders especially that we are following best practices and
following the best procedures for ensuring that this huge investment that
they’re making continues on and is usable by their designated communities
over time. Another group of stakeholders is researchers. There
are more repositories available now than ever, I imagine. This is re3data.org, a registry of research data repositories. There are over 2,400
repositories indexed right now, and I’m sure there are probably hundreds or even
thousands more that aren’t indexed here. So there are lots and lots of different
options for researchers, both to find data and especially to share their data. Many of these claim to safely and
securely preserve the data and make the data available over time, with claims like
“we’ll keep it forever,” “it’s guaranteed,” “we promise.” Those are explicit claims. There are also implicit or tacit claims, where nothing is said but
there’s that assumed preservation or trustworthiness behind them. Interestingly, here is a
quote from the Trustworthy Repositories Audit & Certification (TRAC) manual from
2007: “Claims of trustworthiness are easy to make but are thus far
difficult to justify or objectively prove.” So it’s easy to say, “trust us,”
but it’s harder and more intensive to actually go through that process and
prove that you’re trustworthy and that’s important.
And by trustworthy, we mean a trusted digital repository is one whose mission
is to provide reliable long-term access to managed digital resources to its
designated community, both now and in the future. This is an example I found
in a book from the Society of American Archivists’ Trends in Archives Practice
Series, Module 8, Becoming a Trusted Digital Repository. In that, I think in
the preface, the author Bruce Ambacher discussed how the evaluation of
trustworthiness of repositories changed with the transition to digital. So
with the physical items a trustworthy archives was physically impressive and
imposing, kind of like this building the National Archives here in the United
States. Donors could judge by that building and by the physical space
whether it was secure, what the environment was, the shelving, the
containers, even the size and expertise of staff so donors could, when they were
physically giving their items, they could make these assessments themselves about
the trustworthiness of that physical archive. This is just an example from
within that building, here’s the actual Constitution behind this thick plate
glass with guards, and the Constitution was lowered down
into a secure vault. Likewise they could do the same if they’re depositing
physical items, they could assess the containers, how well it’s stored and the
like. But with digital data and with digital repositories or data
repositories, making that judgment is not as straightforward. Usually you’re
accessing it through a computer or a portal and there’s no similar physical
objective metric where you with your own physical eyes can assess whether the
data are stored safely, securely, and the like. You almost have to just trust that
archive. And so that’s really the value of these assessments: they
provide a verifiable process so that users and others can make that
similar assessment of, “is the archive actually going through the processes
that they say they’re going through?” and making sure that your data are
safeguarded. So, “If we want to be able to share data we need to store them in a
trustworthy data repository. Data created and used by scientists should be managed,
curated, and archived in such a way to preserve the initial investment.
Researchers must be certain that data held in archives remain useful and
meaningful into the future.” This comes from the introduction to the CoreTrustSeal requirements. And these trustworthy certification processes do a number of
things. One, they provide a transparent view into the repository. Two, they
improve the processes and procedures of the repository itself, and they help it
measure against a community standard, and once measured, they help the repository
improve against that community standard, not only
within that discipline but, as I said before,
across disciplines. And then the last thing here is that they help promote trust
among funding agencies, data producers, and data users that the data will be
available for the long term. So that sets the stage for why it’s
important for our stakeholders. Next I’m going to go into what assessment
involves, what it takes from the perspective of a data repository. Assessments have been developed since the mid-’90s. This is from the
Society of American Archivists’ Trends in Archives Practice series, the module
called “Becoming a Trusted Digital Repository,” and what it shows is
a figure from the 2008 Annual Meeting of the Society of American Archivists showing
the development of digital record standards across multiple communities,
and multiple assessments and standards have come up over the years;
there are some very intensive, detailed certification standards, and there are more
lightweight but still good standards to use. In the data repository world,
from ICPSR’s perspective, we’ve gone through several assessments. Today I’m
going to talk about this more basic certification the CoreTrustSeal. We’ve
also gone through an internal assessment against what’s called the Trustworthy
Repositories Audit and Certification (TRAC), which has since developed into an international
standard, ISO 16363, and there are other alternatives out there. So we’ve done this over time, and over
time we’ve learned quite a bit from both the internal test audits and
external certifications and assessments. We started in 2005 with
this TRAC assessment and in 2009 we did a Data Seal of Approval certification.
Data Seal of Approval was a precursor of the CoreTrustSeal assessment or
certification process. We then again went through a re-up, or update, of the Data
Seal of Approval in 2013 and also did a separate World Data System certification. And then more recently the Data Seal of Approval and World Data System
certification merged and developed into this CoreTrustSeal certification, and
we completed that recently and are very happy about that.
Even though there are different certification processes to go through,
there are common elements of assessment. One, typically they look at the organization and
its framework: governance, staffing, policies, finances. Another area
they look at is technical infrastructure: security and system design. And then
another area is the treatment of the data: providing access, how you
ensure that the integrity of the data remains over time, what the
processes are, and how you ensure that there’s preservation. So I’m going to go
through the CoreTrustSeal process that we went through just recently. Like
I said, it was developed by a partnership working group of the
Research Data Alliance as a merger of these two certifications, and there are
16 different criteria or guidelines you use when going through this assessment. They speak to three areas. One, the organizational infrastructure, so there
are six questions about that. Two, digital object management, with eight questions,
and then the third area is technology, with a couple of questions about
that. So specifically ICPSR had to go through
and answer questions and provide documentation about these areas of
organizational infrastructure, so do we have an explicit mission to provide
access to and preserve data in our domain, show that we maintain all applicable
licenses covering data access and use, show that we have a continuity plan to
ensure ongoing access and preservation, show that we’re ensuring to the extent
possible that data are created, curated and accessed in compliance with
disciplinary and ethical norms, show that we have adequate funding and sufficient
numbers of qualified staff and that we’re adopting mechanisms to secure
ongoing expert guidance and feedback. The next area was showing that we
guarantee the integrity and authenticity of the data, that we’re accepting data and
metadata based on defined criteria, that we’re applying documented processes and
procedures, and we’re assuming responsibility for the long term
preservation and management, that we have the appropriate expertise, that the archiving process takes place according to defined workflows, that
we’re enabling users to discover the data and refer to them in a persistent
way through proper citation, and enabling reuse of the data over time. And from the
technology side, showing that the repository functions on well-supported operating
systems as well as software technologies designed to serve the
designated community, and showing that the technical infrastructure provides
for protection of the facility and its data. So these different areas we were
asked to go through, document, and explain how we’re meeting each criterion, and
then also provide transparent references, so public references
to documents that we have on the web, or if it’s private documentation, at
least show a reference to it so that external users could at least see that
there’s internal documentation and ask about it, and if it’s public, of course
they could link out to it. So this is an example of one of the criteria, R5. So
the guideline text is, “The repository has adequate funding and sufficient numbers
of qualified staff.” They provide guidance about that text, so they’re
asking whether the range and depth of expertise of both the organization and its staff,
including any relevant affiliations, is appropriate to the mission. They
actually provide more guidance than this, with great documentation
on their website. What we did then is go through what they were asking, go through
the guidance that they provided about each of the areas we thought
needed a response or would be good to respond to, and then in this document (it’s an online form that you fill out) we summarized our response to that guideline or criterion, and so here we talked
about our external counsel; you can see we have a footnote to that which links
to a public website. We talk about our staff and the expertise there, and then
again we linked out to the public-facing documentation. Here is the reference
list, with the URLs, in the format that we provided it in. What happens then
is that this goes to an external reviewer, or a couple of external reviewers, who then
review what we’ve submitted, especially looking at the public documentation, and assess
whether we’re meeting what they’re asking
for or not, and to what level of adherence. Okay so, talking about the
effort and resources required to go through a certification process like
CoreTrustSeal. For us for this recent round of review it was three to five
days of my time, and that amount of time is largely because we’ve gone
through certifications before, I think earlier I mentioned the other
reviews and certifications starting in 2006, and so because we’ve gone through
the process before and had already created documentation the amount of time
was very minimal. And we also expect that for recertifying in the future, which with
CoreTrustSeal is done every three years, I believe, that amount of time
will remain minimal or may even decrease simply because we’ve gone through it, we
have the documentation in place and we know the process. If you’re doing this
fresh from scratch it would take longer. I can’t say exactly how much
longer but [for] others I’ve spoken to it’s been several weeks maybe even longer. And a
lot of that is not just the review but coming up with documentation that you’re
going to either post publicly or have available internally, but it’s definitely
worth it especially for that first pass. So what does it look like
in terms of the process? Once we were done completing our own internal review
and filling out all of the documentation and ensuring we had everything ready we
submitted the application through the CoreTrustSeal
website. They then responded and asked us to pay an administrative fee of
one thousand euros, which we did, and once
we had paid the application fee, they reached out
again and said they were assigning reviewers, in our case two external
reviewers. That process took a couple of months for the reviewers to go through
and check what we wrote up and look at our external evidence, and then the CoreTrustSeal Board got back to us along with the reviews. The reviews didn’t have individual names associated with them, just the scores they
assigned and the comments that they provided, which were very helpful.
We then updated our application as needed. Because we’d gone through this
before, I think the number of comments that they
provided was pretty minimal but still useful. And then we updated our
application, resubmitted, after which time the CoreTrustSeal Board got back to us
quickly and said that we had received the certification and gave us a logo or
an image that we could post to our website, and we also put up a blog post
about that. So if you go to our home page there’s a nice CoreTrustSeal
Core Certified image which links to a blog post describing what we went
through to be certified. So that takes us into the third part which is
the benefits to our stakeholders and really I think applies to any data
repository stakeholders. And because the CoreTrustSeal certification
process had built upon these earlier certifications or
self-audits we’ve gone through, I’m going to talk about the benefits learned along
the way, so not just of going through CoreTrustSeal but of these earlier
certifications and discuss both the
lessons we’ve learned, the changes we’ve made, and the overall benefits. So I
mentioned in 2004-2005 we went through this test audit that was more detailed
than the CoreTrustSeal but asked very similar questions and
because it was our first audit, even though we’ve been around for
decades, we found some ways that we could improve. And I should emphasize
it was a very positive overall review showing that we could be trusted
as a repository but they provided very useful feedback and the link at the
bottom of the page takes you directly to the final report of that self audit.
So some things we discovered, which we’d probably also have discovered if
CoreTrustSeal had been our first audit, were that we didn’t have
explicit succession and disaster plans and that we needed those. They pointed
out that we received funding from many different funders, especially through
grants, and asked us to review our funding stability again. They asked us
about our acquisition of preservation rights from depositors, and asked us to
more formally state those. They pointed out the need for more process and
procedural documentation especially related to preservation workflows. And
they noted one machine room issue, so actually where our physical servers were
located. So again that test audit report is publicly available here, I think we’ve
also archived it at the institutional repository at Michigan
where we’re at. So some changes we made based on that external audit, or
certification audit, was we hired a Digital Preservation Officer
whose task was specifically to ensure that preservation needs were met within
the building. We created policies including several
related to access, preservation, disaster planning. We changed our deposit process
to be more explicit about ICPSR’s right to preserve content, and we
continued to diversify funding and addressed the machine room issue. So we were able to really learn a lot, a huge amount I would say, from that
initial audit. And so several years after that we went through
some more audits and went through the Data Seal of Approval, which was the
precursor to CoreTrustSeal. It was not as intensive, and we didn’t have
external auditors coming into our building, but it allowed us to self-document and then, similar to CoreTrustSeal, have a peer data repository
review what we were doing, what we said we were doing, and ask questions and make
suggestions. So we learned more, the number of changes we made were fewer, but
we still updated our processes and further improved what we were doing again
to ensure that we were trustworthy to the community as a data repository. So a
few of the things we recognized: we needed to make our policies more public,
including static and linkable terms of use; we reinforced work on succession
planning, so we belong to a partnership of other
social science data archives and we entered into a partnership agreement
where if something were to happen to us or to them we would ensure that the data
would move on safely to another archive. And it underscored the need to comply
with the Open Archival Information System (OAIS) reference model, a community standard that
many data archives or repositories use. So again we’ve made that certification
report public so anyone can see it. We’ve archived it here at this address.
And so more recently for the CoreTrustSeal findings we were very happy that
the Secretariat approved certification, and based on the comments we continue to
fine-tune our processes. A few of the things we noticed: we make
our metadata records for collections available and make our data
continuously available, but some of our older versions aren’t immediately
available; you have to request them. And so we’re trying to improve our systems
to make older versions of collections more immediately available for users.
Likewise, for persistent identifiers and citations, we’re moving to providing file-level identifiers and citations, not just study-level ones. So CoreTrustSeal
is one of many assessments that we’ve gone through, and what matters is the continual
improvements that we make based on these assessments. The certification report
is again public; anyone can view it, and we link to it from our website. So
just to close, let me talk about some benefits. So what did we learn? That first test
audit resulted in the greatest number of changes and the greatest increase in
awareness, and fewer changes were made as a result of CoreTrustSeal and its
precursors, probably also because they’re not as detailed.
That said if we were just starting now I would imagine that we would be making
many more changes based on the solid questions that CoreTrustSeal is asking.
So simply because we’ve gone through a number of audits before
we didn’t need to make as many changes and it didn’t
take as much time. Other assessments that we continue to go through
continue to surface these additional issues, and I’m sure over time we’ll
always surface new things to improve upon, and that’s one of the benefits:
we’re not just resting on our laurels. Continuing with these benefits, again,
by documenting our processes we’re opening this transparent
view into our repository, so having external auditors look at or peer review
what we say we’re doing, and then ask questions, is always good; it helps
us improve our own internal processes and procedures, and it especially
uncovers internal, sometimes tacit, knowledge that’s passed down from staff
member to staff member that’s not documented or not well documented, and so
it’s helped us improve our own internal processes even if they’re not
ever made publicly available. Another area is measuring against
the community standards, so while we’ve been around for 50 plus years we’re now
measuring against this basis that many other data repositories in the community
are using, especially cross-disciplinary repositories, and we’re learning from
them. And that’s probably been one of the most amazing parts of
going through these assessments is it’s a wonderful learning experience
for the organization and for myself. I think this is the last slide: the difficult-to-quantify aspects of going through a trusted repository
certification or audit, but still valuable aspects. So even though it’s
harder to quantify, I do think that by going through this process
and showing our stakeholders the results of the process, it does
improve trust. Certainly it’s improving the transparency of what we’re
doing. It’s a teaching opportunity, especially for new staff within a
repository; we’ve had (as many repositories do) continual turnover from
retirement, from staff taking positions elsewhere, and so it provides for
training. As I mentioned, it improves our own processes and procedures, and it aligns us
with community standards. The last thing is that we’ve found it’s really important
that the leadership within a data repository is on board for these
certification assessments, that they both find this important and place
emphasis on it within the repository so that the lessons you learn can be
implemented across all of the repository staff and functions broadly. And so
that’s it. I want to thank you for listening to this presentation, and I’m
happy to open it up to questions at this point. [Pause for participants to enter questions.] All right, so the first question is, “if
someone is interested in going through this audit and needs institutional
approval, how do you suggest they start?” So my suggestion is, again, it points to
that importance of leadership approval or involvement.
I would suggest first showing and talking with your immediate supervisor
and then gaining support eventually from the director or someone in senior
leadership for doing this repository assessment. So what helped for us
is, one, showing the value, and communicating the value of the time it
would take to complete the assessments, so we showed external evidence of value,
we showed what we thought internal benefits would be, and I think
there was also a push from external stakeholders
when we initially began these external assessments, so all of those
things that the director level was hearing contributed to senior leadership
supporting this and then providing funding and support for staff to be
involved in the assessment. So thanks for that question. So another question that we received, “what are the pros and cons of depositing data in a trusted repository versus just
another repository at someone’s home institution or elsewhere?”
Hopefully I’ve explained the benefits of a trusted repository,
being able to document that what they say they’re doing they’re actually doing.
I think certainly the huge benefit is that by depositing there you
can rest better assured that your data will be safeguarded, that the integrity
and usability of the data will be more assured over time, and that the
repository then by going through that assessment can also potentially make
better use of services or procedures for you as the depositor and for their
users so they’re more attuned to their community. That’s not to say that
repositories that haven’t gone through a certification process are not as good
as a trusted repository, it’s just that by going through the certification process
you’re making public and more transparent that you’re actually
complying with these standards. So then another question was just
about the level of effort required to attain certification. As I mentioned
because we’d gone through this process before with earlier certifications it
didn’t take that much time. There were 16 questions that we documented and we were
able to knock it out in three to five days for one person doing that. Of
course we had most of this documentation out there already on our public website.
But I’ve heard others where it’s been a couple weeks, sometimes a month of one
person doing this, but really that’s I think very manageable especially for an
organization whose job it should be to have these processes in place. So
even if you weren’t going through a certification process having these
procedures and workflows in place is beneficial. And that’s probably
another thing to point out is even if we didn’t have an external certification
that we could point to on our website just going through the process ourselves
for our own benefit has been more than worth it. All right, Anna, do you see any more
questions?
[Anna] No it looks like you’ve answered all the questions I saw.
[Jared] All right, well, with that, thanks again, everyone, and feel free to email me or
contact me if you have any further questions, and thanks Anna for running
this.
[Anna] Thank you, Jared, this was tremendous. I actually learned a lot that I didn’t
even expect to learn, so this is really helpful for me too. Thank you to everyone
who is on the line today. If you have any additional questions, please do email
Jared. You can also message ICPSR Help at any time. We
will also be sending the slides and the recording to everyone who is present
today but again if you do have any questions please feel free to reach out.
Thank you so much and have a wonderful rest of your day and special thanks to
Jared for this great presentation. Have a great rest of your day.

Daniel Ostrander
