jeudi 26 septembre 2024

Those concepts in the geospatial field that have cost us millions of $$$

Every domain has its baggage of concepts, which at first sight don't appear to be that terrible, but which are in practice.

Let's start with a non-geospatial example. A lot of programming languages have the concept of a "null" pointer. It is available in the C and C++ programming languages in particular, but in Java as well, or Python, etc.. Tony Hoare, null's creator, regrets its invention: "I call it my billion-dollar mistake." While very useful in some contexts, that feature also happens to cause a lot of bugs, with severe criticity in some cases. The Rust language has for example (almost) removed the null pointer concept and replace it with optional references, where the compiler enforces that the code explictly checks for the validity of the reference before using it. At the minute, I'm writing this, I'm debating about that very issue with a colleague.

The geospatial field is not free of concepts that are a never-ending source of troubles, and I will come with my own list, with my perspective of software developer.

- Geometry model. Point, lines, polygons. What could be most simple than that? Well, the commonly-used OGC Simple Features model allows those objects to be empty. How is that useful in practice? Hard to tell: NULL geometries are (somewhat paradoxically given the above paragraphy) actually a better replacement. My own perception of that "feature" is that it is mostly a huge headache that goes into your way when trying to write robust code. One evidence of that is that the same standard that invented it neglected to specify how to export an empty point in the Well Known Binary representation. Over the years, the tradition has become to use the Not-A-Number value for the X and Y value of an empty point. Which by itself may cause interesting consequences for applications that would neglect to make a special case. A Not-A-Number is ... well ... not a number, and for example it is not equal to itself ... (and in the IEEE-754 representation, there are litteraly billions of different binary potential encodings of a NotANumber). Everything you learnt at school in your math courses will break with it (this is actually quite general when crunching numbers with computers. Finite precision break a lot of the expected properties of ideal numbers). An empty line has the undesirable property of not having a start or end point: any algorithm must be ready for that. Another annoyance is that there is not just one "geometry empty" object, but a "point empty", a "line empty", a "polygon empty", etc. What is the expected intersection of an empty line with an empty polygon ? An empty line, an empty polygon, an empty point, ... ? The developers of the GEOS library or the PostGIS spatial extension have unfortunately to regularly debate at length about those topics. One can but think there would be a better use of their expertise and time than having to deal with such esoteric subjects (I didn't ask them, so they may actually be thrilled by them. You never know...)

- Coordinate reference system (CRS) axis order.  CRS, such as WGS 84 geographic, or UTM 31N / ETRS89 have several axis. For geographic CRS, this will be the longitude, the latitude, and optionally the ellipsoidal height. When expressing coordinates in a CRS, one must decide in which order they are specified. They are lengthy debates whether this should be longitude first, latitude second, or the reverse. The ISO 19111 / OGC Abstract Topic 2 specification or geodetic registries have decided to not take a firm stance on that, and have allowed authorities responsible for CRS definition and maintenance, to submit CRS definitions with the axis order they wish. Excellent! Well no. The issue is that while non-geomaticians user may chose to express a coordinate in prose like "50 degree of latitude north, 15 degree of longitude east", or "15 degree of longitude east, 50 degree of latitude north", that doesn't mean it is a good idea that the software systems reflect that liberty of speech. Some GIS formats have no way of clearly expressing the CRS, or if they have, they might use an incomplete way of specifying it, in particular lacking the way to express the axis order. The usual practice is to generally specify the longitude as the first value (as corresponding to the X/horizontal axis of a Cartesian plan) and latitude next (Y/vertical axis), refleting the natural mapping to make a graphical representation. Other formats (GML in particular) require that the coordinates are expressed in the order of the CRS definition, which require access to a database to get the axis order, given that in GML vector files, the CRS is only referenced through a code, and not defined inline. Depending on whether the persons responsible to design the protocol/file format, the order may be the "GIS friendly one" (longitude-latitude), or the "CRS pedantic one" (latitude-longitude for example for geographic CRS defined by the EPSG geodetic registry). This is an eternal source of confusion. Sometimes with absurd situations. The OGC GeoPackage file format captures a full definition of the CRS used in the vector tables it contains, including in particular the official axis order, but to reflect the long-GIS tradition, as an amendment, specify that the encoding of coordinates in its (extension of) the WKB format mentionned in the previous paragraph should be longitude-latitude (for geographic CRS) or easting-northing (for projected CRS). I will not blame anyone particular for this. This is an "overall system error". In the ideal situation, a courageous geomatician in a standard organization or in a geodetic registry should have said "here, we are geomaticians: geographic CRS are always longitude-latitude, and projected CRS are always easting-northing.  It is your responsibility as users of our system to provide data always in that order".  Failing to have access to a time-travel machine to warn in advance my glorious predecessors about the incoming catastrophe, the only solution I see to solve the issue it is to ask all population to relocate on the line of longitude=latitude, and exclude any mapping outside of it.

- Raster cell registration issues, a.k.a pixel-centre versus pixel-corner, or pixel-is-point versus pixel-is-area, a.k.a the half-pixel shift error. A raster cell is both an entity you reference with a (column, line) integer pair, so perceived as a discrete point, but when displayed, it actually occupies a non-zero area on your display device. When that raster cell is registered against geospatial coordinates, one debate is: "what exact place in that cell does this (longitude, latitude) or (easting, northing) refer to? Is that the center of the pixel, or maybe its top-left corner?" . Obviously, whenever there is a choice, file format and service specifications never agree together. The GDAL software has courgeously decided to "standardize" its internal model to the convention where that the georeferenced coordinate associated to a pixel qualifies the location of the top-left corner. GDAL format drivers do the necessary dance of applying and de-applying a half-pixel shift to go into that convention ... when they have access to the convention of the underlying format.

A temptative conclusion could be that any proposed standard or specification should go to the step of having an actual real-world implementation of it, not just a "working prototype" ("toy implementation" more casually), to check whether some apparently minor details are not a major source of inconvenience.