In a recent blog post from Clarke & Esposito, Colleen Scollans, Michael Clarke, and Pam Harley powerfully articulated why data sovereignty matters for associations and societies working with external publishers (typically commercial publishers and large university presses). 

Writing about data challenges in general, they say:

“... [Association and society] access to granular, individualized data across the publishing enterprise remains limited. Common gaps include:

  • Journal- and institution-level subscription data about exact parameters of a “big deal” or “transformative agreements” – including, in the case of consortia deals, which institutions can access their content
  • Audience data – who is viewing content and what content are they viewing?
  • Customer data – who has subscribed to emails and registered for an account on the publisher platform?
  • Marketing performance, SEO, and web page analytics (e.g., Google Analytics) – How are people discovering my content, how are they engaging onsite, and how do my marketing campaigns impact usage trends?

Data sovereignty is not just about receiving reports from your publisher. While reports are valuable, what societies truly need is direct access to raw data on a regular basis. This allows a society to integrate the data with their own systems, perform custom analyses, and leverage it for business intelligence (BI) and marketing purposes.”

Hum has seen first-hand how access to comprehensive data can transform an organization's ability to serve its community and drive innovation. Here, we offer suggestions on the sorts of rights you’ll want to preserve as you build your first-party data strategy and enshrine your data rights with your publishing partners.

Access to Your First-Party Data Matters More Than Ever

The rise of AI and machine learning has made having access to your data more critical than ever. Your publication data isn't just about tracking usage anymore. It's the foundation for many things, including:

  • Training specialized AI models that understand your field's unique terminology and concepts
  • Creating personalized recommendations that connect your readers with more of your relevant content (including non-journal content)
  • Identifying emerging research trends and potential breakthrough areas
  • Developing predictive analytics for submission patterns
  • Improving peer reviewer recruitment and peer review management overall
  • Speeding up peer review itself
  • Building content summarization tools tailored to different audience segments
  • Linking up engagement data from your publishing platforms with data from your other platforms to drive things like:
    • Improved event programming
    • Ideas for new and improved educational programs
    • Increased registrations for meetings, events, courses, and certifications
    • Increased member engagement (and therefore reduced member churn)

Without direct access to your data, you're not just missing out on current opportunities; you're ceding the value from future AI-driven innovations to your commercial publishing partners.

Contract Considerations for Data Sovereignty

What follows is not legal advice. These are suggestions for points associations and societies may want to raise with their legal counsel as they negotiate agreements with commercial publishers.

Since publishing agreements tend to be for 3-5 years, you need to be planning for the future today. When negotiating with commercial publishers, consider working with your legal team to include these key concepts:

1. Data Ownership

It should be clear in your contract language who owns the first-party data your publishing partner captures as readers interact with your content. This data should include everything your publishing partner captures across their digital platforms, now and in the future. For publishers with sophisticated tech stacks, today that would be things like:

  • Individual user interaction data
    • A unique profile ID (used to track behavior over multiple sessions)
    • What country/region a reader is from
    • What institution a reader is from
    • What pieces of content a reader has looked at (at the level of a DOI)
    • What topics (keywords) are associated with the content they viewed
    • How much of, and which parts of, a particular piece of content a particular reader viewed (e.g., only the abstract or other summary, or the whole thing)
    • Whether they took any other action on a particular piece of content (e.g., downloaded a PDF, shared it, liked it, and so on). The available actions will be defined by the platform your commercial publishing partner uses.
  • Marketing performance metrics
    • What ads or interstitials they were shown, and of those, which they clicked on
  • Subscription and access data
    • Which institutions have access to your non-open-access content (if applicable)?
    • From which institutions have you had visitors to your open-access content?
    • What business models are in place in each case (e.g., S2O, Read-and-Publish)?
  • Author and reviewer information
    • Names, emails, titles, and affiliated institutions of all authors of all submitted manuscripts and of all reviewers, by journal and by manuscript.
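
To make the individual user interaction fields listed above more concrete, here is a minimal sketch in Python of how one such record might be modeled. The class name `InteractionEvent`, every field name, and all sample values are hypothetical; they illustrate the kind of data described and are not taken from any publisher's platform.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class InteractionEvent:
    """One reader interaction with one piece of content (hypothetical schema)."""
    profile_id: str                 # unique profile ID, stable across sessions
    country: Optional[str]          # reader's country/region, if known
    institution: Optional[str]      # reader's institution, if known
    doi: str                        # content viewed, identified at the DOI level
    topics: List[str] = field(default_factory=list)           # keywords for the content
    sections_viewed: List[str] = field(default_factory=list)  # e.g. "abstract"
    actions: List[str] = field(default_factory=list)          # e.g. "pdf_download"


# Made-up values for illustration only.
event = InteractionEvent(
    profile_id="p-001",
    country="CA",
    institution="Example University",
    doi="10.5555/example.1",
    topics=["oncology"],
    sections_viewed=["abstract"],
    actions=["pdf_download"],
)

# A plain dictionary is convenient for loading into a database or BI tool.
record = asdict(event)
```

Writing the fields down this explicitly, in whatever form suits your team, can make it easier to check a draft contract against the data you actually expect to receive.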

Ideally, you as the society will own this data. Even if you can’t use it yet, you should preserve your rights, as new technologies will make mining this data much easier in the near future.

It is possible your publishing partner may wish to co-own this data. Co-ownership is possible, and if you go that route you will want to make it clear whether this data can be licensed or made available to third parties by your publishing partner, and if so, under what terms (for example, if it’s licensed or otherwise exploited on a commercial basis, is your society entitled to a share of the revenues?).

2. Data Access and Delivery

Owning something isn’t enough; you have to be able to get it in a format usable by you. First, decide what that format will be. The answer to this will depend on the level of sophistication of your society’s tech stack and your in-house technical expertise.

The easier a society makes it for a commercial publisher to provide data in a standard, scalable way, the easier it will be for the commercial publisher to agree to make this happen.

Perhaps the two easiest options are:

  • Real-time API access to specified data streams, or
  • Monthly data exports in an industry-standard format (such as .xls or .csv)

In either case, you’ll want:

  • Documentation of data schemas and field definitions
  • Access to raw data – not just summary reports
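
As a rough illustration of why documented schemas and raw, row-level data matter, the sketch below checks a monthly .csv export against an agreed field list and then runs a simple custom analysis on the rows. It is a minimal Python example under stated assumptions: the field names in `EXPECTED_FIELDS`, the helper functions, and the sample rows are all hypothetical, and a real export would follow whatever schema your publishing partner documents.

```python
import csv
import io

# Hypothetical field list, standing in for the publisher's documented schema.
EXPECTED_FIELDS = ["profile_id", "institution", "doi", "action", "timestamp"]


def load_export(csv_text, expected_fields=EXPECTED_FIELDS):
    """Parse a CSV export and fail loudly if documented columns are missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [f for f in expected_fields if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Export is missing documented fields: {missing}")
    return list(reader)


def events_by_institution(rows):
    """A custom analysis that row-level data allows: count events per institution."""
    counts = {}
    for row in rows:
        counts[row["institution"]] = counts.get(row["institution"], 0) + 1
    return counts


# Made-up sample standing in for one monthly export file.
SAMPLE = """profile_id,institution,doi,action,timestamp
p-001,Example University,10.5555/example.1,view,2025-01-03T10:00:00Z
p-002,Example University,10.5555/example.2,pdf_download,2025-01-04T12:30:00Z
p-003,Sample College,10.5555/example.1,view,2025-01-05T09:15:00Z
"""

rows = load_export(SAMPLE)
summary = events_by_institution(rows)
```

A summary report would hand you only something like `summary`; with the raw rows you can re-cut the same data by DOI, action, or time period whenever a new question comes up.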

3. Data Usage Rights

If your society is not the exclusive owner of the data, it’s good practice to be explicit about your rights to use the data. These should include, but not be limited to, the right to:

  • Use all collected data for any purpose so long as you preserve any privacy commitments made at the time the data were collected
  • Combine provided data with other data sources
  • Create derivative works and products using the data
  • Train AI models using the data
  • Share data with third-party vendors and partners, subject to their agreement to preserve any privacy commitments that might be relevant.

4. Privacy and Compliance

As a society, you will want to make sure the first-party data you are being provided by your publishing partner was properly collected. This means you’ll want the publisher to warrant that they will: 

  • Ensure all data collection complies with all relevant privacy laws
  • Obtain necessary consents for data sharing (this may include making explicit in their privacy policy that they share data collected on their site with their society partners)
  • Maintain appropriate security measures
  • Alert you in cases where data requests are filed by users (for example, if there is a ‘request to be forgotten’ filed under GDPR).

Note that your publishing partner, in sharing this data, may well ask you to warrant the final two items, as well as warranting that you will respect the privacy commitments made when the data was collected.

Common Pitfalls to Avoid

  1. Not defining minimum, specific data requirements
  2. Accepting aggregate data only
  3. Neglecting data quality requirements
  4. Overlooking data format specifications
  5. Failing to future-proof your agreement by leaving out data types or uses that have yet to be imagined

You May Get Pushback

As Colleen, Michael, and Pam noted in their blog post,

“In our experience, commercial publishers often say they agree with the principle of sharing data with their society partners. However, when societies request data access, they often face administrative walls. These walls are reinforced by teams like Legal, Information Security and Privacy, Marketing, Technology, and Product, and they all need to align before data can be shared. And that alignment may take a very (very, very, very …) long time. Or one team may simply say “it’s against our policy” and block progress.

When societies request their data, publishers typically raise these objections:

  1. We don’t have a mechanism to share behavioral data or customer data.
  2. Privacy laws prevent us from sharing user or customer data.
  3. We can’t share institutional subscription details because it is business proprietary information.

These barriers are not insurmountable…

In private, publishers sometimes dismiss the urgency of data sharing, suggesting that associations are not ready or would not know how to use the data effectively anyway. Our experience shows the contrary – many associations are actively developing strategies to use data as a strategic asset. Some associations may not be ready today, but they will be in the near future, and wish to take steps now (e.g., in publishing contracts) to secure data rights.”

To this we’d add another publisher concern that we have heard expressed privately: publishers don’t want to get into disagreements about publishing or communication strategies with their society partners, and are reluctant to share data that might lead to such disagreements. The solution, of course, is to be data-driven together: societies and publishers can work together to analyze trends, agree on ways to benefit authors and readers, and spot opportunities to improve existing products or create new ones.

Look Ahead to Your Future

Data sovereignty isn't just about owning and having access to your data; it’s about controlling your future. As AI continues to transform scholarly publishing, organizations that have full access to their data will be best positioned to innovate, serve their communities, and fulfill their missions effectively and efficiently. Don’t let another 5-year publishing contract get signed without addressing this issue.

Don’t Just Get Your Data. Get Your Data.

Interested in learning more about using your data to drive results? Gain inspiration from societies, associations, and publishers who are putting their data to work and getting real results, or schedule a demo to see Hum in action.