vendredi 28 septembre 2018

SRS barn raising: 4th report

This is the fourth progress report of the GDAL SRS barn effort.

The in-progess task at the time of the last report was the addition of a method createFromPROJString() that takes a PROJ string and builds ISO-19111 corresponding objects. It has now been completed for ProjectedCRS. Not all arbitrary valid PROJ strings can be currently mapped to ISO-19111 objects, but at least strings expressing CRS definitions can. A few constructs involving Helmert transforms are also parsed as CoordinateOperation.

A few random tasks completed:
  • Map time-dependent Helmert transforms, and fixes for existing Helmert transforms
  • Map Molodensky and Abridged Molodensky transformation methods from/to PROJ strings
  • Map Longitude rotation transformation method
  • Update to recognize VRF and TRF WKT2-2018 keywords
  • Add import/export of VERTICALEXTENT and TIMEEXTENT WKT2 elements
  • Add import/export of WKT2:2018 USAGE node
  • Progress in implementation of "esoteric" ISO-19111 classes: DatumEnsemble, DynamicVerticalReferenceFrame, OrdinalCS, DerivedProjectedCRS, EngineeringCRS, ParametricCRS, DerivedVerticalCRS
  • Several fixes

The most interesting task regarding the high-level objectives of the roadmap of the barn campaign is the creation of a proj.db SQLite3 database containing CRS (and related objects) and coordinate operations (datum shifts, ...) definitions. The workflow to build it is:
  1. Import EPSG dataset PostgreSQL .sql dumps
  2. Run scripts/build_db.py that will ingest those dumps in a temporary SQLite3 database and then extract needed information from it and marshall it a more digestable form for our final proj.db. At the end the script, outputs new .sql scripts in the data/sql subdirectory of the PROJ directory. We keep in version control those text files, for better tracability of changes.
  3. At make time, we build the proj.db database by importing those .sql scripts.
Steps 1 and 2 are done (typically by PROJ developers) each time you need to update to a newer version.
Step 3 is done automatically at PROJ build time, from a git clone or a tarball of an official release.

The proj.db structure allows for multiple authorities. Instead of having a single code column to identify and reference objects, we use a tuple (authority_name, code) as a key, both columns being of text type for better generality. So ('EPSG', '4326') or ('IGNF', 'LAMB1') are possible. At that time, only EPSG derived objects are in the database. Import of other dictionaries are task for later.

Having a database is good, but using it is better. So the next task was to implement a factory class able to build an object in our ISO-19111 modelling from its (auth_name, code). This is now completed for all object categories of the database. The in-progress task is the generalization/augmentation of the current method createOperation(sourceCRS, targetCRS) that creates coordinate method without external input than the definition of the CRS as a createOperations(sourceCRS, targetCRS, context) that also uses the proj.db database to find registered coordinate operations (typically between the geographic CRS), and take into account specified area of interest and desired accuracy, to propose a list of candidate coordinate operations (chaining for example the reverse projection, a datum shift, and a forward projection). I'm also working on a new `projinfo`utility that will be similar in purpose to the `gdalsrsinfo`, offering the possibility to ingest PROJ strings (legacy PROJ.4 format and new PROJ pipelines), WKT 1, WKT 2, authority codes and output as PROJ (legacy and new), WKT 1, WKT2:2015, WKT2:2018. A mode to list coordinate operations possible between two CRS will also be available.

One interesting statistics: the number of lines of C++ code (including blank lines and comments) added to PROJ per this work is now greater than the number of historical C code: 47 000 lines (14 K being tests) vs 43 000.

mercredi 29 août 2018

SRS barn raising: 3rd report

This is the third progress report of the GDAL SRS barn effort.

In the last month, the main task was to continue, and finish, the mapping of all GDAL currently supported projection methods between their different representations as WKT 2, WKT 1 and PROJ string: LCC_2SP, LCC_2SP_Belgium, Modified Azimuthal Equidistant, Guam Projection, Bonne, (Lambert) Cylindrical Equal Area,    GaussSchreiberTransverseMercator, CassiniSoldner, EckertI to VI, EquidistantCylindricalSpherical, Gall, GoodeHomolosine, InterruptedGoodeHomolosine, GeostationarySatelliteSweepX/Y, International Map of the World Polyconic, Krovak North Oriented and Krovak, LAEA, Mercator_1SP and Mercator_2SP, WebMercator (including GDAL WKT1 import/export tricks), Mollweide, ObliqueStereographic and Orthographic, American polyconic, PolarSterographicVariantA and PolarSterographicVariantB, Robinson and Sinusoidal, Stereographic, VanDerGrinten, Wagner I to VII, QuadrilateralizedSphericalCube, SphericalCrossTrackHeight, Aitoff, Winkel I and II, Winkel Tripel, Craster_Parabolic, Loximuthal and Quartic_Authalic

The task was tedious, but necessary.  For some cases, this involved cross-checking formulas in the EPSG "Guidance Note 7, part 2 Coordinate Conversions & Transformations including Formulas", PROJ implementation and Snyder "Map Projections - A Working Manual" because of ambiguities in some projection names. Typically the ObliqueStereographic in EPSG is not the Oblique Stereographic of Snyder. The former is implemented as the Oblique Stereographic Alternative (sterea) in PROJ, and the later as the Oblique Stereographic (stere). The parameter names in WKT 2 / EPSG tend also to be much more specific that in GDAL WKT 1. When in GDAL WKT1, you have mostly a "latitude_of_origin" parameter mapping to the lat_0 PROJ parameter, in WKT2, parameter names tend to better reflect the mathematical characteristics of the projection, distinguishing between "Latitude of natural origin", "Latitude of projection centre" or "Latitude of false origin"

The currently ongoing task is now to implement a method that takes a PROJ string and builds ISO-19111 corresponding objects. Done for GeographicCRS (+proj=longlat), in progress for Projected CRS. When this will be completed we will have the infrastructure to convert in all directions between PROJ strings, WKT 1 and WKT 2

When digging into PROJ code, I also uncovered a number of issues in the Helmert implementation (confusing for rotational parameters regarding the "Position Vector" vs "Coordinate frame" convention), the handling of the not-so-well-known +geoc flag for geocentric latitudes and the handling of vertical units for geographic CRS with the new PROJ API. All of those fixes have been independantly merged in PROJ master, so as to be available for the upcoming PROJ 5.2.0, which should be released around mid-september (to remove any confusion, this release will not include yet all the WKT 2 related work)

jeudi 26 juillet 2018

SRS barn raising: 2nd report

This is the second progress report of the GDAL SRS barn effort.

Since the previous report, a number of topics have been addressed:
- extension of the class hierarchy to implement BoundCRS (the generalization of the WKT1 TOWGS84 concept. This concept only exists in WKT 2 and has not been modeled in ISO-19111, so I went on my own modelling), TimeCRS, DerivedGeodeticCRS
- implementation of the exportToPROJ() method for CRS and related objects, and CoordinateOperation
- addition of all documentation needed at class and method level so that Doxygen passes without warnings
- implementation of CoordinateOperation::createOperation() method that can instanciate a transformation between two CRSs. For now, it does not use yet the operations of the EPSG database, but it can already take into account the hints of BoundCRS.
- implementation of a number of Transformations: geocentric translation, position vector transformation, coordinate frame rotation, NTv2,  GravityRelatedHeightToGeographic3D, VERTCON.
- start of mapping all GDAL currently supported projection methods. For now: UTM, Transverse Mercator, Transerve Mercator South Oriented, Two Point Equidistant, Tunisia Mapping Grid, Albers Conic Equal Area, Lambert Conic Conformal 1SP, New Zealand Map Grid. Several tens are obviously still missing.
- addition of a isEquivalentTo() method to compare the various objects.
- and of course, extensive unit testing of all the above work.

The result of this continued work can be followed in this pull request.

As a related effort, I've coordinated with the OGC CRS Working Group to provide my comments on the upcoming ISO:19168 / WKTv2 2018 revision.

lundi 18 juin 2018

The barn is raising

Thanks to the support given by the sponsors of the GDAL SRS barn effort, I have been able to kick in the first works in the past weeks. The work up to now has been concentrated on the PROJ front.

The first step was to set a foundation of C++ classes that implement the ISO-19111 / OGC Topic 2 "Referencing by coordinates" standard. Actually I have anticipated the future adoption of the 18-005r1 2018 revision of the standard that takes into account the latest advances in the modelling of coordinate reference systems (in particular dynamic reference frames, geoid-based vertical coordinate reference systems, etc.), which will be reflected in the corresponding update of the WKT2:2018 standard and future updates of the EPSG dataset. If you are curious, you can skim through the resulting PROJ class hierarchy which is really close to the abstract specification (a number of those classes currenty lack a real implementation for now). With the agreement of the newly born PROJ project steering committee, I have opted for C++11 which offers a number of useful modern features to reduce boilerplate and concentrate on the interesting aspects of the work.

On the functional front, there is already support to read WKT1 (its GDAL variant for now) and WKT2 strings and build a subset of the before mentionned C++ objects. And conversely to dump those C++ objects as WKT1 and WKT2 strings. In particular you can import from WKT1 and export to WKT2, or the reverse (within the limitations of each format). So this abstract modelling (quite close to WKT2 of course) effectively serves its purpose to help being independant from the actual representation of the CRS. As I mentionned an early adoption of the OGC Topic 2 standard, similarly I've taken into account the future WKT2:2018 (OGC 18-010) standard that aligns well with the abstract specification. In the API, the user can select if he wants to export according to the currently adopted version WKT2:2015 (OGC 12-063r5), or with the future WKT2:2018 revision.

The result of those first steps can be followed in this pull request.

Another task that has been accomplished is the addition of the Google Test C++ testing framework to PROJ (thanks to Mateusz Loskot for his help with the CMake integration), so all those new features can be correctly tested locally and on all platforms supported by PROJ continuous integration setup.

There are many future steps to do just on the PROJ front :
  • implement remaining classes
  • code documentation
  • comprehensive support of projection methods (at least the set currently supported by GDAL)
  • import from and export to PROJ strings for CRS definitions and coordinate operations
  • use of the EPSG dataset

vendredi 15 juin 2018

SQLite and POSIX advisory locks

Those past couple days, I was working on implementing multi-layer transaction support for GeoPackage datasources (for QGIS 3.4). Multi-layer transaction is an advanced functionality of QGIS (you have to enable it in project settings), initially implemented for PostgreSQL connections where several layers can be edited together so as to have atomic modifications when editing them. Modifications are automatically sent to the database, using SQL savepoints to implement undo/redo operations, instead of being queued in memory and committed at once when the user stops editing  the layer.

While debugging my work during development, I stumbled upon a heisenbug. From time to time, the two auxiliary files attached to a SQLite database opened in Write Ahead Logging (WAL) mode, suffixed -wal and -shm, would suddenly disappear, whereas the file was still being opened by QGIS. As those files are absolutely required, the consequence of this was that following operations on the database failed: new readers (in the QGIS process) would be denied opening the file, and QGIS could not commit any new change to it. When the file was closed, the file returned again in a proper state (which shows the robustness of sqlite). After some time, I realized that my issue arised exactly when I observed the database being edited by QGIS with an external ogrinfo on it (another way to reproduce the issue would be to open a second QGIS instance on the same file and close it). I indeed used ogrinfo to check that the state of the database was consistent during the editing operations. Okay, so instead of a random bug, I had now a perfectly reproducable bug. Half of the way to solve it, right ?

How come ogrinfo, which involves read-only operations, could cause those -wal and -shm files to disappear ? I had some recollection of code I had written in the OGR SQLite driver regarding this. When a dataset opened in read-only mode is closed by OGR, it checks if there's a -wal file still existing (which could happen if a database had not been cleanly closed, like a killed process), and if so, it re-opens it temporarily in update mode, does a dummy operation on it, and close it. If the ogrinfo process is the only one that had a connection on the database, libsqlite would remove the -wal and -shm files automatically (OGR does not directly remove the file, it relies on libsqlite wisdom to determine if they can be removed or not). But wait, in my above situation, ogrinfo was not the exclusive process operating on the database: QGIS was still editing it.... Would that be a bug in the venerable libsqlite ??? (spoiler: no)

I tried to reproduce the situation with replacing QGIS by a plain sqlite console opening the file, and doing a ogrinfo on it. No accidental removal of the -wal and -shm files. Okay, so what is the difference between QGIS and the sqlite console (beside QGIS having like one million extra line of code;-)). Well, QGIS doesn't directly use libsqlite3 to open GeoPackage databases, but uses the OGR GPKG driver. So instead of opening with QGIS or a sqlite3 console, what if I opened with the OGR GPKG driver ? Bingo, in that situation, I could also reproduce the issue. So something in OGR was the culprit. I will save you of the other details, but at the end it turned out that if OGR was opening itself a .gpkg file using standard file API, whereas libsqlite3 was opening it, chaos would result. This situation can happen since for example when opening a dataset, OGR has to open the underlying file to at least read its header and figure out which driver would handle it. So the sequence of operation is normally:
1) the GDALOpenInfo class opens the file
2) the OGR GeoPackage driver realizes this file is for it, and use the sqlite3_open() API to open it
3) the GDALOpenInfo class closes the file it has opened in step 1 (libsqlite3 still manages its own file handle)

When modifying the above sequence, so that 3) is executed before 2), the bug would not appear. At that point, I had some recollection that sqlite3 used POSIX advisory locks to handle concurrent accesses, and that there were some issues with that POSIX API. Digging into the sqlite3.c source code revealed a very interesting 86 line long comment about how POSIX advisory locks are broken by design. The main brokenness are they are advisory and not compulsory of course, but as this is indicated in the name, one cannot really complain about that being a hidden feature. The most interesting finding was: """If you close a file descriptor that points to a file that has locks, all locks on that file that are owned by the current process are released.""" Bingo: that was just what OGR was doing.
My above workaround (to make sure the file is closed before sqlite opens it and set its locks) was OK for a single opening of a file in a process. But what if the user wants to open a second connection on the same file (which arises easily in the QGIS context) ? The rather ugly solution I came off was that the OGR GPKG driver would warn the GDALOpenInfo not to try to open a given file while it was still opened by the driver and pass it the file header it would be supposed to find if it could open the file, so that the driver identification logic can still work. Those fixes are queued for GDAL 2.3.1, whose release candidate is planned next Friday.

Conclusions:
  • never ever open (actually close) a SQLite database with regular file API while libsqlite3 is operating on it (in the same process)
  • POSIX advisory locks are awful.