Documentation

One of Mathdoc’s missions is to develop digital services and libraries for specialised mathematical communities and librarians, in cooperation with various Scientific and Technical Information (IST) operators.

Numdam, the French digital mathematics library, was developed to provide access to, and preserve, the output of the main French and European mathematics journals.

Among other documentary services, Mathdoc has created several websites for accessing digital resources from external partners:

Mathdoc has also designed the Mini-DML, LiNuM and MDML digital libraries, which serve as prototypes for larger projects such as Geodesic (currently under development). The EuDML European digital library is the result of a partnership among twelve institutions, including Mathdoc, which played a major role.

Mathdoc was actively involved in developing the Catalogue Fusionné des Périodiques de Mathématiques (CFP), the combined catalogue of mathematical periodicals, and Portail Math. It continues to manage these services together with Mathrice and the Réseau National des Bibliothèques de Mathématique (RNBM), France’s national network of mathematics libraries.

Selection and acquisitions

Whether for Numdam or for other digital libraries such as Geodesic, Mathdoc relies on the expertise of its scientific council members to prioritise the collections to acquire. Documents and metadata are collected in one of three ways: metadata aggregation, born-digital acquisition chains, or digitisation.

Metadata harvesting

Mathdoc uses the OAI-PMH protocol for data exchange. Through metadata harvesting, Mathdoc strives to unify access to digital documents in collections produced and hosted by other organisations, making them more visible and easier to access. In practice, this means aggregating the metadata of these collections into virtual libraries specialised in mathematics, such as Mini-DML for articles, LiNuM for books, and EuDML at the European level.
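The harvesting loop that OAI-PMH implies can be sketched as follows. This is a minimal illustration and not Mathdoc's harvester. It assumes the standard `oai_dc` metadata format, and both the base URL and the `fetch` callable are placeholders you would supply. The loop requests `ListRecords` and follows `resumptionToken` elements until the repository signals the last page with an empty or absent token.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_page(xml_text: str) -> tuple[list[dict], Optional[str]]:
    """Return the records of one ListRecords response and its resumption token."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text or "" for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token_el = root.find(".//oai:resumptionToken", NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    # An absent or empty resumptionToken marks the last page of the list.
    return records, token or None


def harvest(base_url: str, fetch: Callable[[str], str],
            prefix: str = "oai_dc") -> Iterator[dict]:
    """Yield every record of a repository, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        records, token = parse_page(fetch(f"{base_url}?{urlencode(params)}"))
        yield from records
        if token is None:
            return
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Against a live endpoint, `fetch` could be `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`. Passing the fetch function in as a parameter keeps the parsing logic testable offline.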

For the Geodesic project, Mathdoc librarians work in consultation with the RNBM and members of Mathdoc’s scientific council to identify the sources of publications to be aggregated, and in close collaboration with the IT development team for the implementation of harvesting processes.

Aggregation of sources

Mathdoc has developed Portail Math in partnership with Mathrice and the RNBM. Among other services, this documentary portal aggregates the main sources of mathematical documentation in its digital library. The RNBM regularly forwards new requests for sources to aggregate to Mathdoc’s Documentation sector.

Acquisitions of native digital documents

Many journals already in Numdam are now published electronically. To continue these collections, digital acquisition chains are being set up with partner publishers. This acquisition method can also be used for new collections to be integrated into Numdam.

Once their collections are published, publishers can make their data available to Mathdoc: full texts in PDF format, metadata in XML format and bibliographies in BibTeX format, if possible. Once the files are received, Mathdoc merges the metadata into an open XML format suitable for uploading to Numdam.
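The sketch below makes the conversion step concrete by turning one simple BibTeX entry into a `<bibitem>` element. The target tag names (`bauteur`, `bnom`, `bprenom`, `btitre`, `bediteur`, `bannee`) are borrowed from the Volphys example in the Cataloguing section. The field mapping is an illustrative assumption, and so is the regular-expression parser, which handles only flat `field = {value}` pairs without nested braces. This is not Mathdoc's actual converter.

```python
import re
import xml.etree.ElementTree as ET

# Flat "field = {value}" pairs only; nested braces are not supported in this sketch.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def split_name(name: str) -> tuple[str, str]:
    """Split 'Last, First' or 'First Last' into (last, first)."""
    if "," in name:
        last, _, first = name.partition(",")
    else:
        first, _, last = name.rpartition(" ")
    return last.strip(), first.strip()


def bibtex_to_bibitem(entry: str) -> ET.Element:
    """Build a Volphys-style <bibitem> from a single BibTeX entry (hypothetical mapping)."""
    fields = {k.lower(): v.strip() for k, v in FIELD_RE.findall(entry)}
    item = ET.Element("bibitem")
    for name in re.split(r"\s+and\s+", fields.get("author", "")):
        if not name.strip():
            continue
        last, first = split_name(name)
        author = ET.SubElement(item, "bauteur")
        ET.SubElement(author, "bnom").text = last
        ET.SubElement(author, "bprenom").text = first
    # Assumed correspondence between BibTeX fields and Volphys tags.
    for field, tag in (("title", "btitre"), ("publisher", "bediteur"), ("year", "bannee")):
        if field in fields:
            ET.SubElement(item, tag).text = fields[field]
    return item
```

`ET.tostring(bibtex_to_bibitem(entry), encoding="unicode")` returns the serialised element, ready to be placed inside a `<biblio>` block.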

Scanning

The RNBM helps Mathdoc assemble complete, good-quality runs of the collections to be digitised, so that they can be integrated into Numdam or into other digital libraries, such as LiNuM and NUMiR, when these are being built.

Whatever the acquisition method, when no publishing contract assigns digital publication rights, Mathdoc librarians seek out the authors or their successors in title to obtain the copyright transfer needed to digitise the collections and disseminate them in the various digital libraries.

Scanning

Scanning operations managed or carried out by Mathdoc are part of Numdam’s collection acquisition policy, but they can also serve other requests. For large digitisation operations involving many volumes (possibly volumes that can be disbound), Mathdoc calls on external service providers and coordinates the projects. For occasional digitisation of old and valuable documents, Mathdoc now has a suitable planetary scanner.

Scanning operations by external service providers

Preparations

If the cost of the service exceeds a certain threshold, the first step is to draft a public call for tenders, including a CCTP (special technical clauses). This stage prepares the production phase. The second step is to gather the collections. Mathdoc does not own any documentary collections, so it borrows them from its partners: RNBM libraries, local libraries, university and municipal libraries, publishers, etc. The owners of these collections must therefore be identified, and lending and transport conditions agreed with them. The third step is to create a so-called ‘count’ file. This spreadsheet holds three levels of data: a description of the batch, a description of the documents it contains and a description of the articles in each volume.

The count file also defines how item identifiers are built, in the form ACRONYME_aaaa__V_F_Ax_0:
ACRONYME is the journal acronym used in Numdam (for example, AST for Astérisque)
aaaa is the year, which can also take the form aaaa-aaaa
V is the volume number, if one exists
F is the number of the fascicle, if one exists
A is a code for articles with their own pagination
x is the number of the article in the volume if accompanied by an A, or the first page of the article in case of continuous pagination
S is used for special volumes
0 is the location of the article on the page
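The identifier pattern can be expressed as a pair of small functions. This sketch relies only on the list above and on the Astérisque example in the Cataloguing section, where `AST_1973__1_` identifies the volume and `AST_1973__1__1_0` its first article. The slot between the year and the volume is left empty, as in the template. The S code for special volumes is not modelled, and the function names are hypothetical.

```python
def volume_id(acronym: str, year: str, volume: str = "", fascicle: str = "") -> str:
    """Build ACRONYME_aaaa__V_F (the slot between year and volume stays empty)."""
    return f"{acronym}_{year}__{volume}_{fascicle}"


def article_id(acronym: str, year: str, volume: str, fascicle: str,
               position: int, order: int = 0, own_pagination: bool = False) -> str:
    """Build ACRONYME_aaaa__V_F_Ax_0.

    position is the article number when the article has its own pagination
    (then prefixed with A), otherwise its first page under continuous pagination.
    order is the position of the article on that page.
    """
    x = f"A{position}" if own_pagination else str(position)
    return f"{volume_id(acronym, year, volume, fascicle)}_{x}_{order}"


def parse_article_id(identifier: str) -> dict:
    """Split an article identifier back into its named fields."""
    acronym, year, _empty, volume, fascicle, x, order = identifier.split("_")
    return {"acronym": acronym, "year": year, "volume": volume,
            "fascicle": fascicle, "article": x, "order": int(order)}
```

The volume identifier is simply the article identifier without its last two fields, so the article identifier is built from the volume identifier.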

These identifiers serve as permanent links to Numdam’s articles and guarantee stable access, even if the data are moved to a new server.

Production

The production phase is carried out by the service provider:
scanning of all pages as black-and-white 600 dpi TIFF files*
creating the “article” files: PDF, DjVu and multi-page TIFF
producing one OCR.xml file per article
creating one XML metadata file per volume, which gathers information about the volume, the articles and the carefully marked-up bibliographic references

* Single page files accurately reproduce the original document, including blank pages. They are kept for archival purposes and may be used to reprint the originals, or to re-create the PDF of a corrupted article when necessary.

Quality assessment

When production is complete, Mathdoc carries out a two-part quality assessment: one part is exhaustive, the other is based on samples. The exhaustive check is a series of automatic analyses run on all delivered files, using the initial count file as a reference, to identify errors quickly. It also checks that the .xml files are correct. The sample check allows a detailed review of the visual quality of the files and the page format, and verifies that full-text search works. A web interface keeps track of all checked files and all errors found. Sample sizes and the number of acceptable errors are set according to the AFNOR X06-021 standard (principles of statistical control of batches). The result of the assessment is recorded in a receipt file, which serves as the basis for exchanges between Mathdoc and the data producer. The producer uses it to correct the errors found, until Mathdoc’s requirements are fully met.
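The two checks can be sketched as follows. The sketch assumes that delivered files are named `<article identifier>.<extension>` in a single directory, and that the exhaustive check only tests file presence and XML well-formedness. The sample size and the acceptance number are parameters, because the real values must be read from the AFNOR X06-021 tables, which are not reproduced here. The function names are hypothetical.

```python
import random
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Iterable, Sequence


def exhaustive_check(expected_ids: Iterable[str], delivery: Path,
                     extensions: Sequence[str] = (".pdf", ".djvu", ".tif")):
    """Compare a delivery with the count file and flag XML files that do not parse."""
    delivered = {p.name for p in delivery.iterdir() if p.is_file()}
    missing = [ident + ext for ident in expected_ids for ext in extensions
               if ident + ext not in delivered]
    malformed = []
    for name in sorted(delivered):
        if name.endswith(".xml"):
            try:
                ET.parse(delivery / name)
            except ET.ParseError:
                malformed.append(name)
    return missing, malformed


def accept_batch(items: Sequence, sample_size: int, acceptance_number: int,
                 is_defective: Callable[[object], bool], seed=None):
    """Single sampling plan: accept if the defects found in the sample <= acceptance number."""
    rng = random.Random(seed)
    sample = rng.sample(list(items), min(sample_size, len(items)))
    defects = sum(1 for item in sample if is_defective(item))
    return defects <= acceptance_number, defects
```

In practice, `is_defective` stands for the human review done through the web interface. The function returns the defect count along with the decision, so the count can be written to the receipt file.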

Scanning operations for local partners

In 2012, Mathdoc scanned the entire Revue d’Écologie Alpine in partnership with the Laboratoire d’Ecologie Alpine (LECA) and the Université Joseph Fourier (UJF), which funded the project as part of its mission to promote local scientific and technical heritage.

Mathdoc also worked with the Observatoire des Sciences de l’Univers de Grenoble (OSUG) to bring the Revue de Géologie Alpine into the Numdam production chain. The volumes are available on a website dedicated to alpine geology at the Grenoble Observatory, which Mathdoc administers.

For the Institut de recherche sur l’enseignement des mathématiques de Grenoble (IREM), Mathdoc initiated and supervised the scanning of the IREM publications from 1974 to 2014. This covered scanning, quality assessment, data enrichment and data delivery. The data were delivered to IREM, and Mathdoc did not put them online itself.

In-house digitisation

For occasional scanning operations, Mathdoc acquired a dedicated scanner in January 2022 to digitise valuable mathematical heritage documents (old and precious books). This in-house digitisation can fill gaps in digital library collections or meet external digitisation requests.

Cataloguing and enrichment

Cataloguing consists of analysing documents and describing them so that they can be identified in a library catalogue or a digital library. It combines a physical description, through bibliographic records, and an intellectual description, through indexing with keywords or with a classification such as the Mathematics Subject Classification. Whatever the acquisition source, document metadata are enriched and full texts are indexed to support multi-criteria and full-text searches.

Cataloguing

.xml cataloguing

The first task in data enrichment is to produce a detailed catalogue of all documents to be included in the digital library, Numdam in particular. To this end, Mathdoc librarians developed a Document Type Definition (DTD) called “Volphys” in 2003 to structure the .xml files that form the backbone of the digital library. Here is an example of .xml cataloguing following the Volphys DTD:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE volphys SYSTEM "http://www.numdam.org/dtd/volphys.dtd">
<volphys>
	<notice>
		<idvol>AST_1973__1_</idvol>
		<revue>
			<issn>0303-1179</issn>
			<acronumdam>AST</acronumdam>
			<titre_revue>Astérisque</titre_revue>
		</revue>
		<tome>
			<numero>1</numero>
			<annee>1973</annee>
			<titre_vol>Trois problèmes sur les sommes trigonométriques</titre_vol>
		</tome>
		<fascicule/>
		<resp/>
		<editeur>Société mathématique de France</editeur>
		<pages>94</pages>
		<numerisation>
			<idphys>AST_1973__1_</idphys>
			<datescan>2017-06-26</datescan>
			<infos>Numérisation à défilement à 600 dpi noir et blanc</infos>
		</numerisation>
	</notice>
	<article type="normal">
		<idart>AST_1973__1__1_0</idart>
		<ordreart>1</ordreart>
		<pagedeb systnum="arabe" pagination="normal" typepag="normal">1</pagedeb>
		<pagefin systnum="arabe" pagination="normal" typepag="normal">87</pagefin>
		<nbpages>87</nbpages>
		<ordre>0</ordre>
		<auteur>
			<nom>Meyer</nom>
			<prenom>Yves</prenom>
		</auteur>
		<titre xml:lang="fr">Trois problèmes sur les sommes trigonométriques</titre>
		<langue>fr</langue>
		<cphys>page0005.tif … / …page0091.tif</cphys>
		<biblio>
<bibitem>[1] <bauteur><bnom>Borevich</bnom>, <bprenom>Z. I.</bprenom></bauteur> et <bauteur><bnom>Shafarevich</bnom>, <bprenom>I. R.</bprenom></bauteur> <btitre>Number theory</btitre>. <bediteur>Academic Press</bediteur>, <bannee>1966</bannee>.</bibitem>
		</biblio>
	</article>
</volphys>

We are currently moving to the JATS standard for articles and to BITS for books, as these are now the most widely used formats. JATS is a standard of the National Information Standards Organization (NISO), a non-profit organization dedicated to standards in publishing, libraries and access to information, and BITS is built on JATS.

This cataloguing serves the end goals of online publication: browsing the collections and searching by title, author, year, full text and bibliography.

Cataloguing standards

Mathdoc also coordinates compliance, at the national level, with the cataloguing standards recommended by the Agence bibliographique de l’enseignement supérieur (Abes), the French bibliographic agency for higher education. It also coordinates the harmonisation of practices for describing the journals listed in the CFP. This is one of the objectives of the CFP operational committee, an RNBM working group co-facilitated by Mathdoc.
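The sketch below illustrates both reading Volphys records and the move towards JATS. It parses a Volphys file such as the Astérisque example above and maps each `<article>` to a minimal JATS `<article-meta>` fragment. The element names come from that example and from the JATS tag set. The mapping itself is a hypothetical simplification that leaves out abstracts, bibliographies and other identifiers, and it is not Mathdoc's conversion tool.

```python
import xml.etree.ElementTree as ET


def volphys_articles(data: bytes) -> list[dict]:
    """Extract basic article metadata from a Volphys XML document."""
    root = ET.fromstring(data)
    year = root.findtext("notice/tome/annee", default="")
    articles = []
    for art in root.iterfind("article"):
        articles.append({
            "id": art.findtext("idart", default=""),
            "title": art.findtext("titre", default=""),
            "first_page": art.findtext("pagedeb", default=""),
            "last_page": art.findtext("pagefin", default=""),
            "authors": [(a.findtext("nom", default=""), a.findtext("prenom", default=""))
                        for a in art.iterfind("auteur")],
            "year": year,
        })
    return articles


def to_jats_article_meta(article: dict) -> ET.Element:
    """Map one article to a minimal JATS <article-meta> fragment (hypothetical mapping)."""
    meta = ET.Element("article-meta")
    ET.SubElement(meta, "article-id", {"pub-id-type": "publisher-id"}).text = article["id"]
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = article["title"]
    contribs = ET.SubElement(meta, "contrib-group")
    for surname, given in article["authors"]:
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    ET.SubElement(ET.SubElement(meta, "pub-date"), "year").text = article["year"]
    ET.SubElement(meta, "fpage").text = article["first_page"]
    ET.SubElement(meta, "lpage").text = article["last_page"]
    return meta
```

The input is passed as bytes, so the parser honours the ISO-8859-1 declaration at the top of Volphys files.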

Enrichment tools

Tools have been developed in-house to make it easier to work with metadata. Among them are the “refinement” tools that make it possible to improve and unify metadata through the following operations:

correcting the syntax and spelling of titles
rewriting in LaTeX the mathematical formulae within titles, summaries and bibliographies
correcting tags within bibliographies
unifying the Numdam author database: adding IdRef, ORCID and zbMATH author identifiers, deduplicating and merging records (work under development, see below)
adding links between articles marked as “continuation of”, “erratum of”, etc.
adding and/or correcting ISSNs (print or electronic) in collaboration with Abes (via the Cidemis tool)
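The following sketch shows how the author-unification step might work. Records that share an IdRef identifier are merged outright. The remaining records are grouped by a normalised name key (accents stripped, case folded, first-name initial) to produce candidate duplicates for a librarian to review. The record layout and the key definition are assumptions. Homonyms share name keys, so these groups are only candidates and should not be merged automatically.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Strip accents and fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def name_key(surname: str, given: str) -> str:
    """Accent-free, case-folded 'surname|initial' key."""
    return f"{fold(surname)}|{fold(given)[:1]}"


def group_authors(records: list[dict]) -> tuple[list[list[dict]], list[list[dict]]]:
    """Return (groups sharing an IdRef, candidate groups sharing a name key)."""
    by_idref = defaultdict(list)
    by_key = defaultdict(list)
    for rec in records:
        if rec.get("idref"):
            by_idref[rec["idref"]].append(rec)
        else:
            by_key[name_key(rec["surname"], rec["given"])].append(rec)
    merged = list(by_idref.values())
    # Only keys shared by several records are worth a librarian's review.
    candidates = [group for group in by_key.values() if len(group) > 1]
    return merged, candidates
```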

Matching

A matching process links each reference in an article’s bibliography to its entries in the zbMATH and MathSciNet databases. Where they exist, links are also added to Crossref, EuDML and Numdam, or to a website that provides the full text of the cited work.
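A matching process of this kind can be sketched as a scoring function. It compares a parsed reference with candidate records, for instance the results of a zbMATH or MathSciNet search, on normalised title similarity, author surname and year. The weights, the threshold and the record fields below are arbitrary assumptions for illustration, and no database API is called. Mathdoc's actual matching rules are not described here.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def normalise(text: str) -> str:
    """Lower-case and keep only word characters separated by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def score(reference: dict, candidate: dict) -> float:
    """Weighted similarity in [0, 1]; the weights are arbitrary."""
    title = SequenceMatcher(None, normalise(reference["title"]),
                            normalise(candidate["title"])).ratio()
    year = 1.0 if reference.get("year") == candidate.get("year") else 0.0
    surnames = {normalise(s) for s in candidate.get("surnames", [])}
    author = 1.0 if normalise(reference.get("surname", "")) in surnames else 0.0
    return 0.6 * title + 0.2 * year + 0.2 * author


def best_match(reference: dict, candidates: list[dict],
               threshold: float = 0.85) -> Optional[dict]:
    """Return the best-scoring candidate, or None if no score reaches the threshold."""
    scored = [(score(reference, c), c) for c in candidates]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= threshold else None
```

Returning `None` below the threshold leaves doubtful references unlinked rather than linking them to the wrong entry.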

Adding author IDs

A project to retrieve author records from the IdRef database was recently launched to address duplication and homonymy in Numdam’s author database. Since zbMATH and ORCID author identifiers are linked to IdRef, Mathdoc can also retrieve this information.
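Before an identifier retrieved this way is attached to an author record, its format can be checked cheaply. ORCID iDs end with a check character computed with the ISO/IEC 7064 MOD 11-2 algorithm, as documented by ORCID, and the sketch below validates it. It checks only the layout and the checksum, not whether the iD exists or belongs to the right person. The function names are hypothetical.

```python
import re

ORCID_RE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_char(base_digits: str) -> str:
    """ISO/IEC 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if orcid has the 0000-0000-0000-000X layout and a correct check character."""
    if not ORCID_RE.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")`, the sample iD used in ORCID's documentation, returns `True`.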

Re-indexing

Owners of digitised collections often consult Mathdoc to make their collections more visible through Numdam. Mathdoc’s status as an “associated division” of the BnF led to the recovery of several collections from Gallica.

The BnF collections that have been re-indexed include the complete works of mathematicians available via the Gallica-Math Oeuvres Complètes website, the Journal de Mathématiques Pures et Appliquées and the Répertoire bibliographique des Sciences Mathématiques. Dedicated websites have been built for these collections, which will eventually be included in Numdam.

The Journal de Mathématiques Pures et Appliquées (JMPA) was recently reprocessed for three purposes: to complete the BnF’s collection, to extend the processing applied to the 1935-1946 volumes, and finally to integrate the entire collection into Numdam. This work included detailed cataloguing of each volume, scanning of the volumes missing from Gallica, and optical character recognition of the BnF’s scanned files.

Data curation

For the Geodesic project, curation will consist of completing the harvested collections, correcting and enriching the metadata, deduplicating the articles and adding links to the full text of documents distributed in open access. The project aims to provide unified access to the profusion of open-access digital documents scattered across the web.

History and ISSN

In accordance with the recommendations of the ISO 8:2019 standard (Information and documentation: Presentation and identification of periodicals), Mathdoc has improved the presentation of Numdam’s collections. For each journal title or monograph series, Numdam now shows its title history: title changes, publisher, publication periods, ISSN and eISSN and, if available, a link to the current edition.

History graphs from CFP

The partnership with the RNBM, central to Mathdoc since its beginnings, led to the creation of the Catalogue Fusionné des Périodiques de Mathématiques (CFP), the combined catalogue of mathematical periodicals. The CFP also serves as a management tool for the Plan de conservation partagée des périodiques de mathématiques (PCMath), the shared conservation plan for mathematics periodicals. The catalogue is also valuable for its display of journal histories, which were retrieved to populate the histories in Numdam.

ISSN

Building the histories required searching the bibliographic catalogues of the BnF and Sudoc, as well as the ISSN Portal, to which Mathdoc subscribes. The information found was combined and cross-checked to make it as accurate as possible. Where publishers’ websites provide this information, they are also treated as authoritative sources.
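One objective check that can be applied to each ISSN or eISSN gathered from these sources is its check digit, defined in ISO 3297. The first seven digits are weighted from 8 down to 2, and the check character is (11 - sum mod 11) mod 11, written X when it equals 10. The sketch below is a minimal illustration. It cannot tell which catalogue is right when two sources give different ISSNs that are both valid.

```python
import re

ISSN_RE = re.compile(r"(\d{4})-?(\d{3})([\dXx])")


def issn_check_char(first_seven: str) -> str:
    """ISO 3297 check character: weights 8..2, modulus 11, 10 written as X."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(issn: str) -> bool:
    """True if issn has the NNNN-NNNC layout (hyphen optional) and a correct check character."""
    match = ISSN_RE.fullmatch(issn.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    return issn_check_char(digits) == match.group(3).upper()
```

Applied to Astérisque's ISSN from the Volphys example, `is_valid_issn("0303-1179")` returns `True`.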

Some documents, such as seminars, are of primary importance to mathematicians but were not subject to legal deposit when they were published. For these, it was almost always necessary to request an eISSN, in order to comply with the regulations on the distribution of digital documents. These requests were made through Cidemis (CIrcuit dématérialisé des DEMandes ISSN), an Abes tool designed for this type of request. Cidemis is also used to request corrections to records, for example when an error appears in the Sudoc catalogue or in the ISSN database.

Dissemination and reporting

Open-access dissemination of research materials is one of the main objectives of Mathdoc’s documentary activities, whether in Numdam or in other digital libraries such as EuDML, Geodesic or Portail Math. Interoperability with other platforms also adds value to these collections and improves their visibility.

OAI-PMH servers

All Numdam metadata are exposed through Numdam’s OAI-PMH server, which ensures interoperability with other platforms such as Gallica, BASE and EuDML. The database containing the bibliographic references of all articles from the participating journals can be searched and browsed completely free of charge. The database itself is owned by Mathdoc, but its metadata are released under a CC0 licence, which places them in the public domain.

KBART files

Mathdoc contributes to BACON (National Knowledge Base), a CC0-licensed warehouse of reference metadata managed by Abes. Its objective is to improve the reporting of electronic resources, to facilitate access to them and to promote metadata sharing among scholarly communication stakeholders.

To this end, Mathdoc supplies its own data to Abes as KBART files. For each journal, these tabular files give the start date (and end date, if applicable), the ISSN and eISSN, the dates from which the journal is freely available online, the document type, any moving wall (embargo period), etc.
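KBART files are tab-separated UTF-8 text files with a fixed header row. The sketch below writes title rows with Python's `csv` module. The 25 column names follow the KBART Phase II recommended practice (NISO RP-9-2014). The function name is hypothetical, and this is not Mathdoc's export code.

```python
import csv

# Column order of the KBART Phase II recommended practice (NISO RP-9-2014).
KBART_FIELDS = [
    "publication_title", "print_identifier", "online_identifier",
    "date_first_issue_online", "num_first_vol_online", "num_first_issue_online",
    "date_last_issue_online", "num_last_vol_online", "num_last_issue_online",
    "title_url", "first_author", "title_id", "embargo_info", "coverage_depth",
    "notes", "publisher_name", "publication_type",
    "date_monograph_published_print", "date_monograph_published_online",
    "monograph_volume", "monograph_edition", "first_editor",
    "parent_publication_title_id", "preceding_publication_title_id", "access_type",
]


def write_kbart(rows: list[dict], out) -> None:
    """Write rows as tab-separated KBART lines; fields not given are left empty."""
    writer = csv.DictWriter(out, fieldnames=KBART_FIELDS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

To get the UTF-8 file that KBART expects, pass a file opened with `open(path, "w", encoding="utf-8", newline="")`. A row key that is not a KBART column raises `ValueError`, which catches typos in field names early.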

Mathdoc’s KBART file was initially created manually, to harmonise Mathdoc’s metadata with those of higher education and research documentary institutions, and is now produced automatically. The KBART file of centre Mersenne is also available on BACON.

In 2021, Mathdoc signed a partnership agreement with Mir@bel, a project within the Open Science movement that aims to highlight the content of scientific periodicals (journals or seminars) accessible online. Since then, records have been created in Mir@bel for the journal collections of Numdam and centre Mersenne, and Mathdoc regularly updates these data via KBART files.